Configuring the Soveren Agent
We use Helm for managing the deployment of Soveren Agents. To customize the values applied to your Soveren Agent, create a values.yaml file in the folder that you use for custom Helm configuration.
You can change a number of things in the Soveren Agent deployment; you can always check our repository for the full list of possible values. Don't forget to run helm upgrade after you've updated the values.yaml file, passing -f path_to/values.yaml as a command-line option.
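For example, the upgrade command could look like this (the release name soveren-agent, the chart reference soveren/agent, and the namespace soveren are placeholders; use the names from your installation):

```shell
helm upgrade soveren-agent soveren/agent -n soveren -f path_to/values.yaml
```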
Use values.yaml only for the values that you want to override! Never use a complete copy of our values.yaml from the repository: this leads to a lot of glitches in production that are hard and time-consuming to track down. Keep in your values.yaml only the values that you want to change.
To save you some keystrokes when installing or updating the Agent, we suggest placing the following snippet into the values.yaml file:
```yaml
digger:
  token: <TOKEN>
```
Digger is the component of the Agent that actually sends metadata to the Soveren Cloud. The Detection tool receives over-the-air updates of part of its model from the Soveren Cloud. These are the two places where the token value is used (the Detection tool gets the token value from Digger).
You can adjust resource usage limits for each of the Soveren Agent's components.
As a rule of thumb, we do not recommend changing the requests values. They are set to the minimum that each component reasonably needs to function.
The limits, however, differ widely between the Agent's components and are heavily traffic-dependent. There is no universal recipe for determining them, except to keep an eye on the actual usage and check how fast the data map is built by the product. The general tradeoff is this: the more resources you allow, the faster the map is built.
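For example, assuming the Agent is deployed into a namespace called soveren (substitute your own) and your cluster runs metrics-server, you can watch the actual consumption like this:

```shell
kubectl -n soveren top pods
```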
The Soveren Agent does not persist any data, so it is completely normal if a component restarts and its virtual storage is flushed. The ephemeral-storage limits are there just to make sure that the virtual disk space is not overused. You can safely get rid of any of them.
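As a sketch, one way to drop such a default is Helm's convention of overriding a key with null, which removes it from the merged values (shown here for the interceptor; the same works for the other components):

```yaml
interceptor:
  resources:
    limits:
      ephemeral-storage: null  # overriding a default key with null removes it in Helm
```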
The interceptors are placed on each node of the cluster as a DaemonSet. Their ability to collect traffic is proportional to how many resources they are allowed to use.
They capture HTTP requests and responses with Content-type: application/json, reading from the virtual network interfaces of the host and building request/response pairs. Thus the memory they use is directly proportional to how large those requests and responses are.
The reading is done in a non-blocking fashion, leveraging the libpcap library. If there is not enough CPU, the interceptors may not have enough time to read the traffic and build enough request/response pairs relevant for building the data map.
The default configuration is the following. You are encouraged to observe the actual usage for a while and tune the limits up or down:
```yaml
interceptor:
  resources:
    requests:
      cpu: "100m"
      memory: "128Mi"
    limits:
      cpu: "1000m"
      memory: "2048Mi"
      ephemeral-storage: "100Mi"
```
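If, say, the observed usage shows that the interceptors are starved for CPU under heavy traffic, your override in values.yaml should contain only the value you change (the 2000m figure below is purely illustrative):

```yaml
interceptor:
  resources:
    limits:
      cpu: "2000m"  # illustrative value; tune it based on the observed usage
```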
Kafka is the only component not built by Soveren; it is used pretty much as is. It can grow very large in terms of the ephemeral storage it consumes.
The default values here are as follows. Under normal circumstances you don't need to touch any of them.
```yaml
kafka:
  embedded:
    resources:
      requests:
        cpu: "100m"
        memory: "650Mi"
        ephemeral-storage: "5Gi"
      limits:
        cpu: "200m"
        memory: "1024Mi"
        ephemeral-storage: "10Gi"
```
Heap usage by Kafka
In our testing, Kafka was found to be somewhat heap-hungry. That's why we limit its heap usage separately from the main memory limits. You don't need to change it, but here's what is set as the default:
```yaml
kafka:
  embedded:
    env:
      - name: KAFKA_HEAP_OPTS
        value: -Xmx512m -Xms512m
```
Digger is a component that reads data from Kafka, sends relevant requests and responses to the Detection tool, and collects the results. It then forms a metadata packet and sends it to the Soveren Cloud, which builds all those beautiful product dashboards.
Digger employs all sorts of data sampling algorithms to make sure that all endpoints and assets in the data map are uniformly covered. In particular, Digger looks into the Kafka topics and moves the offsets there according to what has already been covered.
Normally you should not need to change the resource values for Digger, but here they are:
```yaml
digger:
  resources:
    requests:
      cpu: "100m"
      memory: "100Mi"
    limits:
      cpu: "1500m"
      memory: "768Mi"
      ephemeral-storage: "100Mi"
```
The Detection tool does all the heavy lifting when it comes to detecting data types in the flows and their sensitivity. It runs custom-built machine learning models using Python for that.
The values for the Detection tool resource consumption are adjusted for optimal performance regardless of the traffic nature. However, in some cases with a lot of heavy traffic it might make sense to increase the limits, so we encourage you to monitor the actual usage and adjust accordingly.
```yaml
detectionTool:
  resources:
    requests:
      cpu: "100m"
      memory: "1680Mi"
    limits:
      cpu: "1100m"
      memory: "2304Mi"
      ephemeral-storage: "200Mi"
```
We run a Prometheus agent to collect some metrics and check the basic performance of the Soveren Agent. The values here are pretty generic and fit most cases.
```yaml
prometheusAgent:
  resources:
    requests:
      memory: "192Mi"
      cpu: "75m"
    limits:
      memory: "400Mi"
      cpu: "75m"
      ephemeral-storage: "100Mi"
```
Sending metrics to local Prometheus
If you want to monitor the metrics that the Soveren Agent collects, here's how to do that:
```yaml
prometheusAgent:
  additionalMetrics:
    enabled: "true"
    name: "<PROMETHEUS_NAME>"
    url: "<PROMETHEUS_URL>"
```
- <PROMETHEUS_NAME> is the name that you want to give to your local Prometheus;
- <PROMETHEUS_URL> is the URL that will receive the metrics.
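As a purely hypothetical illustration, for an in-cluster Prometheus that accepts pushed metrics, the snippet could be filled in like this (both the name and the URL below are made up; use whatever matches your setup):

```yaml
prometheusAgent:
  additionalMetrics:
    enabled: "true"
    name: "in-cluster-prometheus"                              # made-up name
    url: "http://prometheus.monitoring.svc:9090/api/v1/write"  # made-up URL; check your Prometheus setup
```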
Proxying the traffic
Sometimes you might want to direct the traffic between the Soveren Agent and the Soveren Cloud through a proxy, e.g. for additional control, so that only the allowed traffic goes outside of your cluster.
To do that, just specify the corresponding top-level value in your values.yaml, where <PROXY_HOST> and <PROXY_PORT> are the address of your proxy service and the port dedicated to listening, respectively.
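A minimal sketch of what this could look like (the httpsProxy key name is our assumption here; check the full list of values in our repository for the exact top-level key):

```yaml
# assumed top-level key; verify against the full list of values in the repository
httpsProxy: "http://<PROXY_HOST>:<PROXY_PORT>"
```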
Sometimes it makes sense to confine the monitoring done by the Soveren Agent to dedicated namespaces. You can do that by explicitly listing the allowed namespaces (the allow list) or by excluding particular ones (the deny list).
The syntax is like this:
- if nothing is specified, then all namespaces will be covered;
- the asterisk (*) means everything;
- action: allow includes the namespace into monitoring;
- action: deny excludes the namespace from monitoring.
Here's an example of how you can do this:
```yaml
digger:
  cfg:
    kubernetesfilterlist:
      definitions:
        # - namespace: default
        #   action: allow
        # - namespace: kube-system
        #   action: deny
        - namespace: "*"
          action: allow
```
When defining names, you can use wildcards and globs like devspace-[1-9], as defined in the Go path package.
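For instance, a sketch that monitors only the matching dev namespaces (the devspace-[1-9] pattern comes from the example above) could look like this:

```yaml
digger:
  cfg:
    kubernetesfilterlist:
      definitions:
        - namespace: "devspace-[1-9]"  # glob pattern, matched as in the Go path package
          action: allow
```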
The default policy of the Agent is to work with explicitly mentioned namespaces and ignore everything else. Don't forget to allow * if you set any deny rules: if you've placed some deny definitions into the filter list and want everything else to be monitored, please make sure you've ended the list with the following:
```yaml
- namespace: "*"
  action: allow
```
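Put together, a sketch that excludes kube-system (taken from the commented example above) but monitors everything else would be:

```yaml
digger:
  cfg:
    kubernetesfilterlist:
      definitions:
        - namespace: kube-system
          action: deny
        - namespace: "*"  # without this, only explicitly allowed namespaces would be monitored
          action: allow
```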
Service mesh and encryption
Soveren can monitor connections encrypted with a service mesh like Linkerd or Istio.
The Agent will automatically detect if there is a service mesh deployed in the cluster or on the node. You only need fine-tuning if your mesh implementation uses non-standard ports.
For example, for Linkerd you might need something like this in your values.yaml:
```yaml
interceptor:
  cfg:
    # if the port of Linkerd differs from the default (4140)
    conntracker:
      linkerdPort: <PORT>
```
Changing the log level
By default, the log level of all Soveren Agent components is set to error. You can change this by specifying a different log level for individual components, like this:
```yaml
[digger|interceptor|detectionTool|prometheusAgent]:
  cfg:
    log:
      level: error
```
(You need to create a separate config section for each component: digger, interceptor, detectionTool, or prometheusAgent. The syntax is the same for all of them.)
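For example, to get more verbose logs from Digger alone, the override could look like this (the debug level name is an assumption; error is the only level confirmed above):

```yaml
digger:
  cfg:
    log:
      level: debug  # assumed level name; the default is error
```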
We don't manage the log level of Kafka; by default it's INFO.