Skip to content

Infrastructure fit: requirements and recommendations

How well will Soveren fit into your technical infrastructure? There are several key aspects that you need to consider when evaluating Soveren.

Deployment

  1. Kubernetes version 1.21 and higher: these are the versions that Soveren has been tested with. It can operate on older versions, but functionality may be suboptimal.

  2. We use Helm for deployment. However, clients often adapt our charts to integrate the resulting manifests into their own deployment systems.

  3. 4 CPU, 16 Gb nodes. We optimized our limits for nodes of this size. These are standard general-purpose nodes from popular cloud platforms:

  4. The components of the Soveren Sensor are entirely self-sufficient. For instance, you don't need expertise in Kafka for our messaging and processing system to operate; we handle everything.

DIM Sensor specifics

Traffic analysis

Must-haves for Soveren DIM Sensor to work are Kubernetes and HTTP/JSON

At present, Soveren DIM Sensor only works in Kubernetes and monitors only HTTP traffic with JSON payloads (setting the correct content-type is crucial). These are necessities for Soveren to run and start providing insights.

  1. We exclusively analyze HTTP traffic.

  2. If the internal cluster traffic is encrypted with a service mesh (Istio or Linkerd), we automatically recognize and monitor it.

  3. The requests must have the content-type properly set, namely:

    • application/json: our preferred mode of operation;

    • application/x-www-form-urlencoded: we also recognize form data.

  4. Both the request and response, as well as the resulting request/response pair, must be under 1 Mb in size. Anything larger is excluded from processing.

  5. It's acceptable for JSON or form payloads to be compressed (e.g., gzipped), provided the resulting size doesn't exceed 1 Mb (otherwise, they're also not processed).

Resource usage

The components of the Soveren Sensor exhibit varied resource usage patterns. Here's the general rule for all of them: the more resources you can allocate, the faster the data map will be built. In extreme cases, there might be restarts of some components, but they only affect the data map building time — not the core functionality of Soveren, and certainly not your cluster's stability.

  • Interceptors often utilize 100% of the allocated CPU resources as they aggressively attempt to capture all the available traffic. The memory they consume is used for storing the packets while assembling the request/response pairs. However, the amount of resources allocated to the Interceptors only affects the time it takes to capture enough representative traffic to build a comprehensive data map. The more resources you can allocate, the faster the map will be built, and vice versa.

Remember that the Interceptors are present on each node (they are a DaemonSet). To get the full picture, you need to multiply their requests (and limits!) by the number of nodes in the cluster. However, the actual load on a particular Interceptor is heavily influenced by the amount of requests and the payloads it is collecting.

  • Digger is usually larger than any of the Interceptors, but there's only one instance of Digger in the cluster. Digger requires memory to read the collected payloads from Kafka, process them, and send the results to the Soveren Cloud. If the cluster is sizable and the load is high, then resource usage will be noticeable, reaching the limits that you've set. If this situation persists, we recommend increasing the limits — however, as with the Interceptors, this should only affect the time it takes to build the data map.

  • Kafka also exists as a single instance per cluster. It is more memory-hungry than Digger because it needs to hold all the traffic that Soveren collects until it is processed by Digger and other components.

  • Detection-tool is the most resource-intensive component of the Soveren Sensor because it hosts the data detection model. There is also one instance per cluster. It's not unusual for it to consume 1 CPU and 2 Gb of memory — this is the benchmark for the most resources that any of the Soveren Sensor components can consume even under moderate load.

The detection-tool processes only a portion of the traffic — the part that Digger samples and sends for data detection. Because of this, the data map needs some time to build.

DAR Sensor specifics

Currently, Soveren DAR Sensor supports AWS S3, Kafka and PostgreSQL (both cloud-based and on-premise). Some metadata may not be available depending on the cloud where your infrastructure is located.

For S3, sensitive data detection is available for the following formats:

  • JSON

  • NDJSON

  • CSV

  • YAML

  • Text-based logs

GZipped JSON and logs are also supported.

For Kafka, sensitive data detection is available for JSON messages.