Data Flow Engine
The Data Flow Engine (DFE) enables customers to create workflows based on platform events and ingested data. These workflows can be used, for example, to automate KPI calculations in third-party apps.
To access this service, you need the respective roles listed in Data Flow Engine roles and scopes.
Workflows created with the Data Flow Engine are called streams. They can be used to filter and process ingested data, react to events, calculate KPIs, aggregate data, modify the data model, write data into platform data lakes, and send notifications or create platform events. DFE streams consist of a chain of different apps provided by DFE. These apps can be combined and configured to perform real-time computation by consuming and processing an unbounded amount of data. The set of apps is continually extended and may be complemented by custom apps in the future.
The DFE is implemented based on the Spring Cloud Data Flow architecture. Refer to the Spring Cloud Data Flow documentation to understand the idea behind the engine. Similar concepts can be found under different terminologies; for example, sources, processors, and sinks are also referred to as triggers, rules, and actions.
Workflows are defined in the Data Flow Engine as message-based streams between configurable apps. In this context, three types of apps are distinguished:
- Source: Inputs data in order to trigger a flow.
- Processor: Filters or redirects the data.
- Sink: Executes an action based on the arriving data.
A working stream contains one source app and one sink app, and can also contain one or more processors between them. When multiple processors are used, they have to be arranged in a sensible order. Using these components, the model of a DFE stream can be outlined as: source → processor(s) → sink.
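This source → processor(s) → sink model can be sketched as a chain of Python generators. This is purely an illustration of the message flow, not the actual DFE implementation; the app names and message fields are borrowed from the examples in this document.

```python
# Illustrative sketch of the DFE stream model: source | processor | sink.
# The real apps are configured declaratively on the platform.

def time_series_source():
    """Source: feeds messages into the stream (here, two fixed readings)."""
    yield {"entity": "engine-1", "temperature": 65}
    yield {"entity": "engine-1", "temperature": 92}

def filter_processor(messages, threshold=80):
    """Processor: forwards only messages that fulfil the configured expression."""
    for msg in messages:
        if msg["temperature"] > threshold:
            yield msg

def event_sink(messages):
    """Sink: performs the final action for each arriving message."""
    events = []
    for msg in messages:
        events.append(f"OVERHEAT event for {msg['entity']}")
    return events

# Wire the apps together, mirroring: TimeSeriesSource | FilterProcessor | EventSink
events = event_sink(filter_processor(time_series_source()))
print(events)  # only the 92 degree reading passes the filter
```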
The apps listed below are already available, but DFE can be enhanced based on general usage or custom needs.
Source apps are used to listen to different source entities or events. This type of app can only be placed at the beginning of a stream. A source may provide a filtering option to decrease the number of messages fed into the stream. There are two types of sources available for the Data Flow Engine:
The TimeSeriesSource monitors time series uploads and triggers a stream. This is the entry point for time series data into a DFE stream.
The TimerSource sends timestamp messages based on a fixed delay, a date, or a cron expression.
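A fixed-delay TimerSource can be pictured as a generator that emits one timestamp message per interval. The function name and message shape below are illustrative assumptions, not the platform's API.

```python
# Minimal sketch of a fixed-delay TimerSource (names are illustrative).
import time
from datetime import datetime, timezone

def timer_source(delay_s, count):
    """Emit `count` timestamp messages, one every `delay_s` seconds."""
    for _ in range(count):
        yield {"timestamp": datetime.now(timezone.utc).isoformat()}
        time.sleep(delay_s)

messages = list(timer_source(0.01, 3))
print(len(messages))  # 3
```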
Processor apps can be placed inside a DFE stream. They can interrupt the stream and suppress the data flow but, apart from adding metadata such as the Marker's label, do not modify the data (JSON). The Data Flow Engine provides the following processors; for more detailed information refer to Processors:
The ChangeDetector filters out messages in which a specified variable has the same value as in the previous message.
The FilterProcessor filters out messages that do not fulfil a specified expression.
The MessageSampler limits the number of messages coming from one device in a given time period to prevent overload of the downstream flow.
The Hysteresis processor introduces a dead band after an event has occurred to prevent message spamming caused by oscillations around the threshold.
The Marker adds a label to incoming messages before forwarding them.
The Sequence processor filters out messages coming from parallel streams if they do not match a specified pattern.
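The behaviour of two of these processors can be sketched in a few lines of Python. These are hypothetical reimplementations of the described semantics, not the apps themselves; the `key`, `device`, and `state` fields are assumed message shapes.

```python
# Hedged sketches of ChangeDetector and MessageSampler semantics.
import time

def change_detector(messages, key):
    """Forward a message only when `key` differs from the previous message."""
    last = object()  # sentinel: the first message always passes
    for msg in messages:
        if msg[key] != last:
            last = msg[key]
            yield msg

def message_sampler(messages, period_s, clock=time.monotonic):
    """Forward at most one message per device within `period_s` seconds."""
    last_sent = {}
    for msg in messages:
        now = clock()
        device = msg["device"]
        if now - last_sent.get(device, float("-inf")) >= period_s:
            last_sent[device] = now
            yield msg

readings = [{"device": "d1", "state": s} for s in ("ok", "ok", "hot", "hot", "ok")]
print([m["state"] for m in change_detector(readings, "state")])
# -> ['ok', 'hot', 'ok']
```

The injectable `clock` parameter is only there so the sampler's time window can be exercised deterministically.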
Sinks are the ending apps of a DFE stream. They only have inputs and perform the last step of a stream. This can be a notification or an event inside the platform. There are two types of sinks available for the Data Flow Engine - for more detailed information refer to Sinks:
When the EventSink receives a message, it creates an event based on the configured parameters and forwards it to the Event Management Service.
The EmailSink sends an e-mail when it receives a message from the stream.
Data Flow Server
The Data Flow Server (DFS) acts as a management service for the Data Flow Engine. It provides functionalities for listing and modifying streams and their components. To deploy a new DFE stream, a user has to call the API endpoints on the Data Flow Server.
Use the Data Flow Engine API for realizing the following tasks:
- Create or delete streams
- Get the list of existing streams or find an exact stream by name
- Deploy or undeploy a stream
- Get the list of runtime apps or find an exact app by name
- Get the specific app instances
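Creating and deploying a stream could look roughly as follows. The base URL, endpoint paths, and payload shape are assumptions modelled on the Spring Cloud Data Flow REST API; consult the Data Flow Engine API specification for the actual endpoints and authentication.

```python
# Hedged sketch of Data Flow Server requests (endpoints are hypothetical).
from urllib.parse import quote

BASE_URL = "https://gateway.example.mindsphere.io/api/dataflowserver/v3"  # hypothetical

def create_stream_request(name, definition):
    """Build the request to register a new stream definition."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/streams/definitions",
        "json": {"name": name, "definition": definition},
    }

def deploy_stream_request(name):
    """Build the request to deploy an already registered stream."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/streams/deployments/{quote(name)}",
        "json": {},
    }

# Stream names must not exceed 15 characters (see the restrictions above).
req = create_stream_request("overheat", "TimeSeriesSource | FilterProcessor | EventSink")
print(req["url"])
```

The requests are built but not sent here, so the sketch stays self-contained; in practice they would be passed to an HTTP client with the platform's authentication headers.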
The Data Flow Engine is able to:
- Subscribe to time series data flows from MindSphere
- Send events when a specific stream has been triggered
- Filter messages containing irrelevant information
- MindSphere can store any number of stream definitions, but the number of concurrently deployed streams is restricted according to the purchased plan.
- In region Europe 1, FIFO queues are not available and message ordering is not guaranteed by the Data Flow Server.
- Owing to the asynchronous nature of messaging, the processing time of a message is non-deterministic and depends on system load.
- The stream name must not be longer than 15 characters.
- There is a delay of up to 30 minutes between deleting an asset and the deletion of dependent streams.
A client wants to monitor an engine in a factory. The engine is a critical part of the manufacturing process and has a history of overheating (Simple Scenario). Conclusions about the engine's condition can be drawn from how much it resonates during operation (Extended Scenario).
The client creates a Data Flow Engine stream with a TimeSeriesSource, a FilterProcessor and an EventSink. The TimeSeriesSource listens to a heat sensor placed on the engine. The FilterProcessor is set to only forward messages with temperatures higher than an allowed threshold. The EventSink is configured to use a built-in event instead of a custom one. The sink stays dormant at first, because the filter does not forward messages until the temperature threshold is exceeded. Once it is exceeded, the client receives notifications about the engine overheating.
The same client wants to monitor the resonance of the engine. The client sets up the TimeSeriesSource and the EventSink as for monitoring the temperature, but inserts a FilterProcessor and a MessageSampler between them. The FilterProcessor is set to only forward messages when the resonance is higher than a threshold (but still lower than what is expected at the end of the engine's lifecycle). The MessageSampler is set to only forward messages once per day, so that the client gets a daily reminder but can wait for a good offer from the engine manufacturers without being bombarded with notifications.
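The extended scenario's message flow can be sketched as follows. The message shape, threshold value, and the simplified once-per-day sampling are illustrative assumptions.

```python
# Sketch of the extended scenario: resonance filter followed by a daily sampler.
def filter_resonance(messages, threshold):
    """FilterProcessor analogue: forward only readings above the threshold."""
    for msg in messages:
        if msg["resonance"] > threshold:
            yield msg

def daily_sampler(messages):
    """MessageSampler analogue, set to forward at most one message per day."""
    seen_days = set()
    for msg in messages:
        if msg["day"] not in seen_days:
            seen_days.add(msg["day"])
            yield msg

readings = [
    {"day": 1, "resonance": 0.2},   # normal operation: filtered out
    {"day": 1, "resonance": 0.8},   # first elevated reading: daily reminder
    {"day": 1, "resonance": 0.9},   # same day: sampled away
    {"day": 2, "resonance": 0.85},  # next day: new daily reminder
]
alerts = list(daily_sampler(filter_resonance(readings, 0.5)))
print(len(alerts))  # 2
```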
Except where otherwise noted, content on this site is licensed under the MindSphere Development License Agreement.