Soveren Data-in-Motion (DIM) Sensor¶
Soveren DIM Sensor intercepts and analyzes structured traffic over HTTP and gRPC. It collects metadata about data flows, identifying field structures, detected sensitive data types, and involved services.
Soveren DIM Sensor comprises several key parts:
-
Interceptors: Distributed across all nodes in the cluster via a DaemonSet, Interceptors capture traffic from pod’s virtual interfaces using a packet capturing mechanism.
-
Processing and messaging system: This system includes a Kafka instance that stores request/response data and a component called Digger which forwards data for detection and eventually sends the resulting metadata to the Soveren Cloud.
-
Sensitive data detector (Detector): Employs proprietary machine learning algorithms to identify data types and gauge their sensitivity.
In Kubernetes terms, the Soveren DIM Sensor introduces the following pods to the cluster:
-
Interceptors: One per worker node.
-
Kafka: Part of the Processing and messaging system, deployed once per setup.
-
Digger: The core component of the Processing and messaging system, deployed once per setup.
-
Detection-tool (Detector): Deployed once per setup.
We also employ Prometheus Agent for metrics collection, this component is not shown here.
Let's delve deeper into the main components' operations and communications.
The Soveren DIM Sensor follows this sequence of operations:
-
Interceptors collect relevant traffic from pods, focusing on HTTP and gRPC.
-
Interceptors pair requests to individual endpoints with their respective responses, creating request/response pairs.
-
Interceptors transfer these pairs to Kafka using the binary Kafka protocol.
-
Digger reads the request/response pair from Kafka, evaluates it for detailed analysis of data types and their sensitivity (employing intelligent sampling for high load scenarios). If necessary, Digger forwards the pair to the Detection-tool and retrieves the result.
-
Digger assembles a metadata package describing the processed request/response pair and transmits it to the Soveren Cloud using gRPC protocol and protobuf.
The Kubernetes API provides pod names and other metadata to the Digger. Consequently, Soveren Cloud identifies services by their Kubernetes names rather than IP addresses, enhancing data comprehensibility in the Soveren app.