Soveren consists of two major parts:
- Soveren Agent which you deploy in your Kubernetes cluster. It intercepts and parses structured HTTP JSON traffic, gathers metadata about the flows and sends it to the Soveren Cloud. This metadata describes how the payload was structured (which fields it contained), which sensitive data types were detected, and which services were involved in the communication. No part of the actual payload values is included in the metadata.
- Soveren Cloud is hosted and managed by Soveren. It provides dashboards to gain visibility into sensitive data flows, as well as summary statistics and metrics.
Soveren Agent consists of several parts:
- Interceptors which are distributed to all worker nodes of the cluster through the DaemonSet. They capture the traffic from virtual interfaces of the pods with the help of a packet capturing mechanism;
- Messaging system which receives data from the Interceptors. It consists of a Kafka instance, which stores request/response data, and Digger, which passes data to the Detector and then on to the Soveren Cloud;
- Sensitive Data Detector (or simply Detector) which discovers data types and their sensitivity with the help of proprietary machine learning algorithms.
In Kubernetes (K8s) terms, all components of the Soveren Agent are pods which are deployed to the worker nodes.
In general, Interceptors are present on each node of the cluster. Kafka, Digger and Detector are deployed onto a separate node.
In Kubernetes terms, the pods are as follows:
- Interceptors: many of them, one per worker node;
- Kafka as part of the Messaging system, one per deployment;
- Digger as another part of the Messaging system, one per deployment;
- Detection-tool as Detector, one per deployment.
Let’s look in more detail into what those components do and how they talk to each other.
The end-to-end flow
The flow of the Soveren Agent looks like this:
Interceptors collect relevant traffic from the pods. They observe only HTTP requests with
Interceptors match requests to individual endpoints with responses coming from them, and build request+response pairs.
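The pairing step can be pictured with a minimal sketch. It assumes, purely for illustration, that every captured HTTP message carries a connection identifier and that responses on a connection arrive in the same order as the requests. The `Message` and `pair_messages` names are hypothetical and do not describe the Interceptor's actual implementation.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    conn_id: str   # hypothetical key, e.g. "src_ip:src_port->dst_ip:dst_port"
    kind: str      # "request" or "response"
    summary: str   # e.g. "GET /users/42" or "200 OK"


def pair_messages(messages):
    """Pair each response with the oldest unanswered request on the same connection."""
    pending = defaultdict(deque)  # conn_id -> requests still awaiting a response
    pairs = []
    for msg in messages:
        if msg.kind == "request":
            pending[msg.conn_id].append(msg)
        elif pending[msg.conn_id]:
            pairs.append((pending[msg.conn_id].popleft(), msg))
        # A response with no pending request is dropped in this sketch.
    return pairs
```

A response seen without its request (for example, because the capture started mid-connection) simply produces no pair here.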
Interceptors write the request/response pairs to the Kafka topic using binary Kafka protocol.
(An Interceptor is also present on the node where Kafka, Digger and Detection-tool might be deployed, because that node can also host other pods whose traffic is subject to monitoring.)
Digger reads each request/response pair from the topic and decides whether it should undergo detailed analysis of the data types present and their sensitivity. (Intelligent sampling may be applied here if the load becomes very high.) If so, Digger sends the pair to the Detection-tool and gets back the result.
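The source does not describe how the sampling works, so the following is only a sketch of one possible rule: analyse the first few pairs seen for each endpoint, then only every K-th pair afterwards. The `Sampler` class and its parameters are assumptions made for illustration, not Digger's real logic.

```python
from collections import Counter


class Sampler:
    """Decide which request/response pairs get detailed analysis (illustrative only)."""

    def __init__(self, always_first=3, keep_every=10):
        self.always_first = always_first  # analyse the first N pairs per endpoint
        self.keep_every = keep_every      # afterwards analyse every K-th pair
        self.seen = Counter()             # endpoint -> number of pairs seen so far

    def should_analyse(self, endpoint):
        self.seen[endpoint] += 1
        n = self.seen[endpoint]
        if n <= self.always_first:
            return True
        return (n - self.always_first) % self.keep_every == 0
```

A counter-based rule keeps the decision deterministic, which makes the behaviour easy to reason about; a production sampler could equally be probabilistic or load-driven.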
Digger forms a metadata package describing the processed request/response pair and sends it to the Soveren Cloud. This connection uses gRPC and protobuf.
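To illustrate what "metadata without payload values" can mean, here is a minimal sketch that reduces a JSON body to its field paths and attaches the detected data types and the services involved. The record layout and the `field_paths` and `build_metadata` names are hypothetical; the real package is sent over gRPC as protobuf, for which a plain dictionary stands in here.

```python
import json


def field_paths(value, prefix=""):
    """Return the sorted field paths of a JSON value, without any of its values."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths.update(field_paths(child, path))
    elif isinstance(value, list):
        for child in value:
            # "[]" marks "any element of this array"
            paths.update(field_paths(child, f"{prefix}[]"))
    return sorted(paths)


def build_metadata(source, destination, body, detected_types):
    """Assemble a hypothetical metadata record for one request/response pair."""
    return {
        "source": source,
        "destination": destination,
        "fields": field_paths(json.loads(body)),
        "data_types": sorted(detected_types),
    }
```

Only key names and detected type labels survive; the values themselves never enter the record.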
Interceptors are deployed as a DaemonSet. Normally they are present as pods on each worker node of the cluster.
Pods with the Interceptors have hostNetwork set to true (more on that below). That gives them access to the underlying host, that is, to the virtual machine they are running on. Given that access, the Interceptors read data from the network namespaces of the host, using the PCAP library.
Next, the Interceptors need to know which interfaces to read. For that, they are given the list of relevant IP addresses that should be present on their host, which they can then match with the interfaces they actually observe on the host. Digger, a part of the messaging system, uses the Kubernetes (K8s) API to obtain the addressing information.
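The matching can be sketched as a simple set lookup. This assumes the Interceptor already holds a mapping from each observed interface to the IP address associated with it; how that association is established is not described here, and `select_interfaces` is a hypothetical name.

```python
import ipaddress


def select_interfaces(host_interfaces, pod_ips):
    """Return the names of host interfaces whose address is in the relevant IP list.

    host_interfaces: mapping of interface name -> IP address observed on the host
    pod_ips: IP addresses expected on this host (in the text, supplied via Digger)
    """
    wanted = {ipaddress.ip_address(ip) for ip in pod_ips}
    return sorted(
        name
        for name, ip in host_interfaces.items()
        if ipaddress.ip_address(ip) in wanted
    )
```

Parsing the addresses with `ipaddress` avoids mismatches caused by differently formatted but equivalent address strings.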
Interceptors then read data from the virtual interfaces available to them in the network namespace of the host. This reading happens in a non-blocking fashion. If the host is busy with higher-priority tasks, the OS may deprive the Interceptor of CPU and memory resources, which in turn may result in only partial coverage of the traffic by the Interceptor.
The K8s API also provides naming of pods and other metadata to the Interceptors. As a result, later on in the Soveren Cloud the assets are called by their DNS/K8s names instead of IP addresses, which makes data that the Soveren app displays more accessible.
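As a small illustration of the naming step, the sketch below resolves an IP address to a pod name when one is known and falls back to the raw address otherwise. The index layout and the `display_name` helper are assumptions for illustration only.

```python
def display_name(ip, pod_index):
    """Return 'namespace/pod' for a known IP, falling back to the raw IP.

    pod_index: hypothetical mapping of IP -> {"namespace": ..., "name": ...},
    built from pod metadata obtained through the K8s API.
    """
    pod = pod_index.get(ip)
    return f"{pod['namespace']}/{pod['name']}" if pod else ip
```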
For Interceptors to be able to read from the host, the containers they run in require the following permissions (changing them breaks the interception; they are listed here for reference):
securityContext:
  privileged: true
dnsPolicy: ClusterFirstWithHostNet
hostNetwork: true
hostPID: true
Soveren Cloud is a SaaS managed by Soveren. It offers a set of dashboards that provide various views into the metadata collected by the Soveren Agent. There are analytics and statistics on which relevant data types have been observed and how sensitive they were, which services were involved, and whether there were any violations of pre-set policies and configurations in terms of allowed data types.