Skip to content

Data Flow Engine

Idea

The Data Flow Engine (DFE) enables customers to create workflows based on platform events and ingested data. These workflows can be used for example to automate KPI calculations in third party apps.

Access

For accessing this service, you need to have the respective roles listed in Data Flow Engine roles and scopes.

Basics

Workflows created with the Data Flow Engine are called streams. They can be used to filter and process ingested data, react on events, calculate KPIs, aggregate data, modify the data model, write data into platform data lakes and to send notifications or create platform events. DFE streams consist of a chain of different apps provided by DFE. These apps can be combined and configured to do real time computation by consuming and processing an unbound amount of data. The number of apps is continually enhanced and can be extended by custom apps in the future.

The DFE is implemented based on the Spring Cloud Data Flow architecture. Refer to the Spring Cloud Data Flow documentation to understand the idea behind the engine. Similar concepts can be found under different terminologies, for example sources, processors, and sinks are also referred to as trigger, rules, and actions.

DFE Streams

Workflows are defined in the Data Flow Engine as message based streams between configurable apps. In this context, three types of apps are distinguished:

  • Source: Inputs data in order to trigger a flow.
  • Processor: Filters or redirects the data.
  • Sink: Executes an action based on the arriving data.

A working stream contains one source app and one sink app. It can also contain one or more processors between them. When using multiple processors, they have to be arranged in sensible order. Using these components the model of a DFE stream can be outlined as follows:

Schematic diagram of a DFE stream

The apps listed below are already available, but DFE can be enhanced based on general usage or custom needs.

Sources

Source apps are used to listen to different source entities or events. This type of application can only stand at the beginning of a stream. A source may provide a filtering option to decrease the number of messages fed into the stream. There are two types of sources available for the Data Flow Engine:

  • TimeSeriesSource
    The TimeSeriesSource monitors time series uploads and triggers a stream. This is the entry point for time series data into a DFE stream.
  • TimerSource
    TimerSource sends timestamp messages based on a fixed delay, date or cron expression.

Processors

Processor apps can be placed inside a DFE stream. They can interrupt the stream and suppress data flow, but do not modify the data (JSON). The Data Flow Engine provides the following processors - for more detailed information refer to Processors:

  • ChangeDetector
    The ChangeDetector filters out messages, where a specified variable is the same as in the previous message.
  • FilterProcessor
    The FilterProcessor filters out messages, if a specified expression is not fulfilled.
  • MessageSampler
    The MessageSampler limits the maximum number of messages coming from one device in a given time period to prevent overload of the downstream flow.
  • Hysteresis
    The Hysteresis processor introduces a little slack space after an event has occurred to prevent message spamming because of oscillations around the threshold.
  • Marker
    The Marker adds a label to incoming messages before forwarding them.
  • Sequence
    The Sequence processor filters out messages coming from parallel streams, if they don't match a specified pattern.

Sinks

Sinks are the ending apps of a DFE stream. They only have inputs and perform the last step of a stream. This can be a notification or an event inside the platform. There are two types of sinks available for the Data Flow Engine - for more detailed information refer to Sinks:

  • EventSink
    When the EventSink receives a message, it creates an event based on the configured parameters and forwards it to the Event Management Service.
  • EmailSink
    The EmailSink sends an e-mail when it receives a message from the stream.

Data Flow Server

The Data Flow Server (DFS) acts as a management service for the Data Flow Engine, it provides functionalities for listing and modifying streams and their components. To deploy a new DFE stream, a user has to call the API endpoints on the Data Flow Server.

Features

Use the Data Flow Engine API for realizing the following tasks:

  • Create or delete streams
  • Get the list of existing streams or find an exact stream by name
  • Deploy or undeploy a stream
  • Get the list of runtime apps or find an exact app by name
  • Get the specific app instances

The Data Flow Engine is able to:

  • Subscribe to time series data flows from MindSphere
  • Send events when a specific stream has been triggered
  • Filter messages containing irrelevant information

Limitations

  • MindSphere can store any number of stream definitions, but the number of concurrently deployed streams is restricted according to the purchased plan.
  • In region Europe 1, FIFO queues are not available and message ordering is not guaranteed by the Data Flow Server.
  • Owing to the asynchronous nature of messages, the processing time of a message is non-deterministic depending on system load.
  • The stream name must not be longer than 15 characters.
  • There is a delay of up to 30 minute delay between deleting an asset and deletion of dependent streams.

Example Scenarios

A client wants to monitor an engine in a factory. The engine is a critical part of the manufacturing process and has a history of overheating (Simple Scenario). Conclusions about the engine's condition can be drawn from how much it resonates during operation (Extended Scenario).

Simple Scenario

The client creates a Data Flow Engine stream with a TimeSeriesSource, a FilterProcessor and an EventSink. The TimeSeriesSource listens to a heat sensor placed on the engine. The FilterProcessor is set to only forward messages of temperatures higher than an allowed threshold. The EventSink chooses a built-in event, instead of customizing one. The sink stays dormant for a while, because the filter does not forward messages until the temperature threshold is reached. Once the threshold is reached, the client receives notifications about the engine overheating.

Extended Scenario

The same client wants to monitor the resonance of the engine. The client sets up the TimeSeriesSource and the EventSink as for monitoring the temperature, but inserts a FilterProcessor and a MessageSamplerFilter between them. The FilterProcessor is set to only forward messages when the resonance is higher than a threshold (but still lower than what is expected at the end of the lifecycle of the engine). The MessageSamplerFilter is set to only forward messages once per day so that the client gets a daily reminder, but can wait for a good offer from the engine manufacturers without being bombarded with notifications.

Any questions left?

Ask the community


Except where otherwise noted, content on this site is licensed under the MindSphere Development License Agreement.