Configuring the Sensor

Common configuration

Refer to the separate guide for security-related configuration options.

We use Helm to manage the deployment of Soveren Sensors. Refer to our Helm chart for all values that can be tuned for the Soveren Sensor.

To customize the values applied to your Soveren Sensor, create a values.yaml file in the folder you use for custom Helm configuration.

After you've updated the values.yaml file, don't forget to run a helm upgrade command with -f path_to/values.yaml as a command-line option (see the updating guide).
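For example (the release name, chart reference, and namespace are placeholders; substitute the ones from your installation):

```shell
helm upgrade <RELEASE_NAME> <CHART_REFERENCE> \
  --namespace <NAMESPACE> \
  -f path_to/values.yaml
```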

Only use values.yaml to override specific values!

Avoid using a complete copy of our values.yaml from the repository. This can lead to numerous issues in production that are difficult and time-consuming to resolve.

Sensor token

You should use values.yaml to set the token for the sensor.

digger:
  token: <TOKEN>
crawler:
  token: <TOKEN>

The token value is used to send metadata to the Soveren Cloud and to check for over-the-air updates of the detection model.

Use unique tokens for different deployments

If you're managing multiple Soveren deployments, please create unique tokens for each one. Using the same token across different deployments can result in data being mixed and lead to interpretation errors that are difficult to track.

You can also use Kubernetes secrets or store the token value in HashiCorp Vault and retrieve it at runtime using various techniques. Check the Securing Sensors page for instructions on how to do this.

Custom volumes

You can mount custom volumes, e.g., for secrets or configuration. To do this, define volumeMounts and volumes for each pod.

Defining volumeMounts and volumes
crawler:
  volumeMounts: []
  volumes: []
Example of how you can set up custom volume mounts
volumeMounts:
  - name: all-in-one
    mountPath: /etc/config
  - name: secrets-store-inline
    mountPath: /etc/secret
volumes:
  - name: all-in-one
    projected:
      sources:
      - secret:
          name: mysecret
          items:
            - key: username
              path: my-group/my-username
      - configMap:
          name: myconfigmap
          items:
            - key: config
              path: my-group/my-config
  - name: secrets-store-inline
    csi:
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: "my-provider"

Binding components to nodes

The Soveren Sensor consists of two types of components:

  • Interceptors, which are distributed to each node via DaemonSet. Interceptors are exclusively used by the Data-in-motion (DIM) sensors.

  • Components instantiated only once per cluster via Deployments; these include digger, crawler, kafka, detectionTool and prometheusAgent. These can be thought of as the centralized components.

The centralized components consume a relatively large yet steady amount of resources. Their resource consumption is not significantly affected by variations in traffic volume and patterns. In contrast, the resource requirements for Interceptors can vary depending on traffic.

Given these considerations, it may be beneficial to isolate the centralized components on specific nodes. For example, you might choose nodes that are more focused on infrastructure monitoring rather than on business processes. Alternatively, you could select nodes that offer more resources than the average node.

If you know exactly which nodes host the workloads you wish to monitor with Soveren, you can also limit the deployment of Interceptors to those specific nodes.

First, you'll need to label the nodes that Soveren components will utilize:

kubectl label nodes <your-node-name> nodepool=soveren

After labeling, you have two options for directing the deployment of components: using nodeSelector or affinity.

Option 1: using nodeSelector
nodeSelector:
  nodepool: soveren
Option 2: using affinity
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: nodepool
          operator: In
          values:
          - soveren

The affinity option is conceptually similar to nodeSelector but allows for a broader set of constraints.
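Either snippet goes under the corresponding component's key in values.yaml. As a sketch, assuming each component accepts its own nodeSelector block (verify the exact layout against the chart):

```yaml
# Assumption: per-component nodeSelector keys; check the Helm chart
digger:
  nodeSelector:
    nodepool: soveren
kafka:
  nodeSelector:
    nodepool: soveren
interceptor:
  nodeSelector:
    nodepool: soveren
```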

Resources

We do not recommend changing the requests values. They are calibrated to the minimum resources each component needs to function properly.

On the other hand, the limits for different containers can vary significantly and are dependent on the volume of collected data. There is no one-size-fits-all approach to determining them, but it's crucial to monitor actual usage and observe how quickly the data map is constructed by the product. The general trade-off here is: the more resources you allocate, the quicker the map is built.

It's important to note that the Soveren Sensor does not persist any data. It is normal for components to restart and virtual storage to be flushed. The ephemeral-storage values are set to prevent the overuse of virtual disk space.

Detailed breakdown of resources:

| Container | CPU requests | CPU limits | MEM requests | MEM limits | Ephemeral storage limits |
|---|---|---|---|---|---|
| interceptor | 50m | 1000m | 64Mi | 1536Mi | 100Mi |
| digger | 100m | 1500m | 100Mi | 768Mi | 100Mi |
| detection-tool | 200m | 2200m | 2252Mi | 2764Mi | 200Mi |
| kafka | 100m | 400m | 650Mi | 1024Mi | 10Gi |
| kafka-exporter | 100m | 400m | 650Mi | 1024Mi | 10Gi |
| prometheus | 75m | 75m | 192Mi | 400Mi | 100Mi |

Pods containing interceptor are deployed as a DaemonSet. To estimate the required resources, you will need to multiply the values by the number of nodes.

| Container | CPU requests | CPU limits | MEM requests | MEM limits | Ephemeral storage limits |
|---|---|---|---|---|---|
| crawler | 100m | 1500m | 100Mi | 768Mi | 100Mi |
| detection-tool | 200m | 2200m | 2252Mi | 4000Mi | 200Mi |
| kafka | 100m | 400m | 650Mi | 1024Mi | 10Gi |
| kafka-exporter | 100m | 400m | 650Mi | 1024Mi | 10Gi |
| prometheus | 75m | 75m | 192Mi | 400Mi | 100Mi |
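If you decide to raise a limit, override just that value in values.yaml. A sketch, assuming the chart exposes a standard Kubernetes resources block per component (the exact key path is an assumption; verify it in the chart):

```yaml
# Assumption: standard resources layout under each component
digger:
  resources:
    limits:
      cpu: 2000m
      memory: 1024Mi
```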

Kafka

In our testing, Kafka proved to be somewhat heap-hungry, which is why we limit its heap usage separately from the main memory limits.

Default heap settings for Kafka
kafka:
  embedded:
    env:
    - name: KAFKA_HEAP_OPTS
      value: -Xmx512m -Xms512m

The rule of thumb: if you increase the limits memory value for the kafka container N-fold, increase the heap N-fold as well.
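For example, doubling the kafka memory limit from 1024Mi to 2048Mi means doubling the heap from 512m to 1024m (the resources key path shown here is an assumption; verify it against the chart):

```yaml
kafka:
  embedded:
    env:
    - name: KAFKA_HEAP_OPTS
      value: -Xmx1024m -Xms1024m
  # Assumption: standard resources layout for the kafka container
  resources:
    limits:
      memory: 2048Mi
```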

The Soveren Sensor is designed to avoid persisting any information during runtime or between restarts. All containers are allocated a certain amount of ephemeral-storage to limit potential disk usage. Kafka is a significant consumer of ephemeral-storage as it temporarily holds collected information before further processing by other components.

There may be scenarios where you'd want to use persistentVolume for Kafka. For instance, the disk space might be shared among various workloads running on the same node, and your cloud provider may not differentiate between persistent and ephemeral storage usage.

Enabling persistent volume for Kafka
kafka:
  embedded:
    persistentVolume:
      # Create/use Persistent Volume Claim for server component.
      # Uses empty dir if set to false.
      enabled: false
      # Array of access modes.
      # Must match those of existing PV or dynamic provisioner.
      # Ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
      accessModes:
        - ReadWriteOnce
      annotations: {}
      storageClass: ""
      # Bind the Persistent Volume using labels.
      # Must match all labels of the targeted PV.
      matchLabels: {}
      # Size of the volume.
      # The size should be determined based on the metrics you collect and the retention policy you set.
      size: 10Gi

Local metrics

You can collect metrics from the Soveren Sensor locally and create your own dashboards.

Collecting metrics in your own Prometheus instance
prometheusAgent:
  additionalMetrics: 
    enabled: "true"
    # The name that you want to assign to your local Prometheus
    name: "<PROMETHEUS_NAME>"
    # The URL which will be receiving the metrics
    url: "<PROMETHEUS_URL>"

Log level

By default, the log levels for all Soveren Sensor components are set to error. To adjust the verbosity of the logs according to your monitoring needs, you can specify different log levels for individual components.

Tuning the log level
digger:
  cfg:
    log:
      level: info

You can adjust the log level for all components except Kafka.

DIM configuration

Multi-cluster deployment

For each Kubernetes cluster, you'll need a separate DIM sensor. When deploying DIM sensors across multiple clusters, they will be identified by the tokens and names assigned during their creation.

Use a separate sensor for each cluster

Sometimes you may want to automate the naming of your clusters in Soveren during deployment.

Automating the cluster name configuration
digger:
  clusterName: <NAME>

Without this setting, Soveren defaults to the Sensor's name defined in the Soveren app.

Namespace filtering

At times, you may want to limit the Soveren Sensor to specific namespaces for monitoring. You can achieve this by either specifying allowed namespaces (the "allow list") or by excluding particular ones (the "exclude list").

The syntax is as follows:

  • If nothing is specified, all namespaces will be monitored.
  • An asterisk (*) represents "everything."
  • action: allow includes the specified namespace for monitoring.
  • action: deny excludes the specified namespace from monitoring.
Filtering out namespaces from monitoring
digger:
  cfg:
    kubernetesfilterlist:
      definitions:
        # - namespace: default
        #   action: allow
        # - namespace: kube-system
        #   action: deny
        - namespace: "*"
          action: allow

When defining names, you can use wildcards and globs such as foo*, /dev/sd?, and devspace-[1-9], as defined in the Go path package.

Once the filter list contains any definitions, the Sensor monitors only the namespaces explicitly allowed, ignoring all others.

End with allow * if you have any deny definitions

If you've included deny definitions in your filter list and want to monitor all other namespaces, make sure to conclude the list with:

      - namespace: "*"
        action: allow

Failing to do so could result in the Sensor not monitoring any namespaces if only deny definitions are present.
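For instance, a filter list that excludes kube-system but keeps monitoring everything else looks like this:

```yaml
digger:
  cfg:
    kubernetesfilterlist:
      definitions:
        - namespace: kube-system
          action: deny
        # The trailing allow-all keeps every other namespace monitored
        - namespace: "*"
          action: allow
```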

Service mesh and encryption

Soveren can monitor connections encrypted through service meshes like Linkerd or Istio.

The Sensor will automatically detect if a service mesh is deployed on the node. Fine-tuning is only necessary if your mesh implementation uses non-standard ports.

Example of non-standard Linkerd port
interceptor:
  cfg:
    # if the port of Linkerd differs from the default (4140)
    conntracker:
      linkerdPort: <PORT>

TLS interception

Soveren can intercept encrypted traffic from applications running in containers that use the OpenSSL library.

Enabling application-level TLS interception
interceptor:
  cfg:
    openssl:
      enabled: true
      # strategy: "round-robin" # default, can be set to "all"
      # processesCount: 1 # default, can be increased slightly (e.g. to 5)

TLS interception is a highly experimental feature and requires more resources than intercepting unencrypted traffic. It is therefore disabled by default.

updateStrategy

You can adjust the update strategy for Interceptors.

Using updateStrategy for Interceptors (DaemonSet)
interceptor:
  updateStrategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1

DAR configuration

Deployment

We recommend creating a separate sensor for each type of asset that you want to monitor. For example, one sensor for S3 buckets, one for Kafka, and one for each database type.

We recommend using a separate DAR sensor for each asset type (S3, or Kafka, or a database variant)

You can also have multiple sensors covering the same type of asset, for performance reasons. While it is possible to use one sensor for all types, this approach can complicate the resolution of potential performance bottlenecks and other issues.

Instead of passing the credentials directly, you can use secrets to pass the whole connection string or configuration section.

S3 buckets

To enable S3 bucket discovery and scanning, you must provide the sensor with credentials for access. This can be done either directly by providing an access key or by configuring a specific role that the sensor will assume at runtime.

Soveren supports various S3 implementations, such as AWS and MinIO. For each S3 implementation, ensure you create and configure a separate DAR sensor.

A separate DAR sensor must be deployed for each S3 implementation, such as AWS or MinIO

You can also use secrets to pass the configuration to the sensor.

S3 scanning configuration
crawler:
  cfg:
    s3:
      enabled: false
      # S3 storage URL, including schema. Leave empty for AWS
      url: ""
      # S3 security token service URL, including schema. Leave empty for AWS
      # For MinIO, leave it blank or set equal to url (see above)
      stsurl: ""
      # S3 storage type. Acceptable values: aws, yandexcloud, minio
      #
      # WARNING!
      # A separate DAR sensor must be deployed for each S3 implementation,
      # such as AWS or MinIO.
      #
      type: "aws"
      # Access key ID
      accessKeyId: ""
      # Secret access key
      secretAccessKey: ""
      # Interval between checking if the bucket info is updated
      checkinterval: "12h"
      # Max number of attempts for retrying requests to the storage
      retrymaxattempts: 5
      # Max delay before the next retry request to the storage
      retrymaxbackoffdelay: "20s"
      # Folder ID. Relevant only for Yandex Cloud
      # If left blank, there will be no links to the cloud console available in the UI
      folderid: ""
      # Fully Qualified Domain Name (FQDN) self-hosted s3 service console URL
      # (including schema).
      # If empty, links to self-hosted console won't be available
      consoleurl: ""
      # Fully Qualified Domain Name (FQDN) for self-hosted s3 service domain
      # (including schema).
      # If empty, connections from assets to self-hosted s3 won't be found
      bucketsdomain: ""
      # Set false if AWS is used, otherwise true
      # If true, get owner info from the buckets list;
      # if false, use separate API to get info about the current AWS user
      getbucketownerfromlist: false
      s3role:
        # Role to assume when accessing the storage
        enabled: false
        # The Amazon Resource Name (ARN) of the role to assume.
        rolearn: ""
        # An identifier for the assumed role session.
        rolesessionname: SoverenCrawlerSession
        # The duration of the role session
        # Min: 15 minutes
        # Max: max session duration set for the role in the IAM.
        # If you specify a value higher than that, the operation fails
        duration: 15m0s
        # Access policy
        policy: ""
      # Yandex Cloud parameters needed for correctly inferring
      # the public/non-public status of buckets
      yandexcloud:
        # The private key obtained when creating authorized keys
        # for the provided service account
        # Example:
        # privatekey: |-
        #   PLEASE DO NOT REMOVE THIS LINE!
        #   -----BEGIN PRIVATE KEY-----
        #   ...
        #   -----END PRIVATE KEY-----
        privatekey: ""
        # ID of the public key obtained when creating authorized keys
        # for the provided service account
        publickeyid: ""
        # Service account ID that the IAM token will be requested for.
        # This service account must have the iam.serviceAccounts.tokenCreator role
        serviceaccountid: ""
      clienttlsconfig:
        # Skip tls certificate verification for self-hosted s3 service
        insecureskipverify: false

The user must be granted s3:ListAllMyBuckets, as well as the following minimal actions on all buckets that need to be monitored:

  • s3:GetBucketPolicyStatus

  • s3:GetBucketPolicy

  • s3:GetBucketAcl

  • s3:GetObject

  • s3:GetEncryptionConfiguration

  • s3:GetBucketTagging

  • s3:ListBucket
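If you manage access through an IAM policy, the actions above translate into a sketch like the following (the bucket ARNs are placeholders for your own buckets):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListAllBuckets",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Sid": "ScanMonitoredBuckets",
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketPolicyStatus",
        "s3:GetBucketPolicy",
        "s3:GetBucketAcl",
        "s3:GetObject",
        "s3:GetEncryptionConfiguration",
        "s3:GetBucketTagging",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::<BUCKET_NAME>",
        "arn:aws:s3:::<BUCKET_NAME>/*"
      ]
    }
  ]
}
```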

Kafka

To enable Kafka scanning, you must provide the sensor with the instance name and address, as well as the necessary access credentials.

You can also use secrets to pass the configuration to the sensor.

Kafka scanning configuration
crawler:
  cfg:
    kafka:
      enabled: true
      elements:
        # Name of the Kafka instance
        - instancename: "<YOUR KAFKA INSTANCE NAME>"
          # Kafka broker network addresses
          brokers: ["<YOUR KAFKA INSTANCE BROKER 1>", "<YOUR KAFKA INSTANCE BROKER 2>", ..., "<YOUR KAFKA INSTANCE BROKER N>"]
          tls: false
          tlsconfig:
            # Skip server certificate verification
            insecureskipverify: true
            # Path to PEM file with CA certificate
            # use volumeMounts and volumes options to mount certificates from secrets
            cafile: ""
            # Paths to PEM files for TLS certificate based client authentication
            # use volumeMounts and volumes options to mount certificates from secrets
            certfile: ""
            keyfile: ""
          sasl: false
          user: "<YOUR SASL USER>"
          password: "<YOUR SASL PASSWORD>"

SQL databases

To enable database scanning, you must provide the sensor with the instance name and the connection string containing necessary access credentials.

Currently we support PostgreSQL, SQL Server, and MySQL.

PostgreSQL

PostgreSQL configuration
crawler:
  cfg:
    database:
      postgres:
        enabled: true
        elements:
          - name: "<YOUR POSTGRESQL INSTANCE NAME>"
            # postgresql://[user[:password]@][netloc][:port][/dbname]
            # If /dbname is specified then only this database will be scanned by the sensor
            connectionString: "<YOUR POSTGRESQL INSTANCE CONNECTION STRING>"
            # Default database name to use if not specified in the connection string
            # If this value is empty, the dbname from the connection string will be used
            defaultDBName: ""

The user must have SELECT permissions on the following tables:

  • information_schema.role_table_grants

  • pg_catalog.pg_stat_ssl

  • pg_catalog.pg_database

  • pg_catalog.pg_roles

  • pg_catalog.pg_auth_members

  • pg_catalog.pg_tables

  • pg_catalog.pg_class

  • pg_catalog.pg_namespace

  • pg_catalog.pg_attribute

  • (optional) pg_catalog.pg_hba_file_rules

In addition, the user must also have SELECT permissions on any databases, schemas, or tables to be scanned for sensitive data.
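As a sketch, the permissions can be set up like this (the user name is illustrative; most of the pg_catalog and information_schema relations listed above are readable by any role by default):

```sql
-- Illustrative user name; pick your own
CREATE USER soveren_read_only_user PASSWORD '<PASSWORD>';

-- For every database you want scanned:
GRANT CONNECT ON DATABASE db_to_monitor TO soveren_read_only_user;

-- Run inside that database, for every schema to scan:
GRANT USAGE ON SCHEMA public TO soveren_read_only_user;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO soveren_read_only_user;

-- Optional: pg_hba_file_rules is restricted by default
GRANT SELECT ON pg_catalog.pg_hba_file_rules TO soveren_read_only_user;
```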

SQL Server

SQL Server configuration
crawler:
  cfg:
    database:
      mssql:
        enabled: true
        elements:
          - name: "<YOUR SQL SERVER INSTANCE NAME>"
            # sqlserver://[user[:password]@][netloc][:port][/dbname]
            # If /dbname is specified then only this database will be scanned by the sensor
            connectionString: "<YOUR SQL SERVER INSTANCE CONNECTION STRING>"
            # Default database name to use if not specified in the connection string
            # If this value is empty, the dbname from the connection string will be used
            defaultDBName: ""

The user must have SELECT permissions on the following tables:

  • information_schema.COLUMNS

  • sys.dm_exec_connections

  • sys.tables

  • sys.schemas

  • sys.dm_db_partition_stats

In addition, the user must also have SELECT permissions on any databases, schemas, or tables to be scanned for sensitive data.
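As a sketch, the permissions can be set up like this (the login and user names are illustrative; note that reading sys.dm_exec_connections requires a server-level permission):

```sql
-- Illustrative names; pick your own
CREATE LOGIN soveren_read_only_user WITH PASSWORD = '<PASSWORD>';

-- sys.dm_exec_connections requires VIEW SERVER STATE
GRANT VIEW SERVER STATE TO soveren_read_only_user;

-- In every database you want scanned:
USE db_to_monitor;
CREATE USER soveren_read_only_user FOR LOGIN soveren_read_only_user;
GRANT SELECT ON SCHEMA::dbo TO soveren_read_only_user;
```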

MySQL

MySQL configuration
crawler:
  cfg:
    database:
      mysql:
        enabled: true
        elements:
          - name: "<YOUR MYSQL INSTANCE NAME>"
            # mysql://[soveren_read_only_user[:password]@][netloc][:port][/dbname]
            # If /dbname is specified then only this database will be scanned by the sensor
            connectionString: "<YOUR MYSQL INSTANCE CONNECTION STRING>"
            # Default database name to use if not specified in the connection string
            # If this value is empty, the dbname from the connection string will be used
            defaultDBName: ""

The soveren_read_only_user must have SELECT permissions on any databases and tables to be scanned for sensitive data, as well as on some system tables.

This is how you should configure permissions for the soveren_read_only_user
CREATE ROLE `soveren_read_only_role`;
CREATE USER `soveren_read_only_user`@`%`;

SET DEFAULT ROLE `soveren_read_only_role` TO `soveren_read_only_user`@`%`;

GRANT USAGE ON *.* TO `soveren_read_only_role`@`%`;

-- For all databases that you want to monitor:
GRANT SELECT ON `DB_TO_MONITOR`.* TO `soveren_read_only_role`@`%`;

-- If you plan to use the Access Control feature:
-- (optional, the scanning will still work)
GRANT SELECT ON `mysql`.`db` TO `soveren_read_only_role`@`%`;
GRANT SELECT ON `mysql`.`role_edges` TO `soveren_read_only_role`@`%`;
GRANT SELECT ON `mysql`.`tables_priv` TO `soveren_read_only_role`@`%`;
GRANT SELECT ON `mysql`.`user` TO `soveren_read_only_role`@`%`;

NoSQL databases

To enable database scanning, you must provide the sensor with the instance name and the connection string containing necessary access credentials.

Currently we support MongoDB.

MongoDB

MongoDB configuration
crawler:
  cfg:
    nosqldatabase:
      mongodb:
        enabled: true
        elements:
          - name: "<YOUR MONGODB INSTANCE NAME>"
            # mongodb://[user[:password]@][netloc][:port][/dbname]
            # If /dbname is specified then only this database will be scanned by the sensor
            connectionString: "<YOUR MONGODB INSTANCE CONNECTION STRING>"

The user must have the following permissions:

  • getCmdLineOpts — globally;

  • listCollections — on databases to be scanned for sensitive data;

  • find, collStats — on databases and collections to be scanned for sensitive data.
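As a sketch, these privileges can be granted through a custom role in mongosh (the role and user names are illustrative):

```javascript
// Run in mongosh as an administrator
db.getSiblingDB("admin").createRole({
  role: "soverenScanner",            // illustrative role name
  privileges: [
    // getCmdLineOpts is a cluster-level action
    { resource: { cluster: true }, actions: ["getCmdLineOpts"] },
    // empty db/collection grants on all non-system databases and collections;
    // narrow this to specific databases if you prefer
    { resource: { db: "", collection: "" },
      actions: ["listCollections", "find", "collStats"] }
  ],
  roles: []
});

db.getSiblingDB("admin").createUser({
  user: "soveren_read_only_user",    // illustrative user name
  pwd: "<PASSWORD>",
  roles: [{ role: "soverenScanner", db: "admin" }]
});
```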