Spring Cloud Data Flow provides native support for applications built with Spring Cloud Stream or Spring Cloud Task.
Given a stream definition such as http | log, Data Flow expects http to be a stream source with an output destination configured. Likewise, log, as a sink, must have an input destination configured.
To run this pipeline without Data Flow, you must manually configure these applications with Spring Cloud Stream binding properties so that:
- The output of http is the same named destination as the input of log.
- The applications have the same message broker connection properties.
- A unique consumer group is defined.
Then you can deploy these applications individually to the platform of your choice.
Data Flow takes care of all this, and more, for you.
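As a rough illustration of the manual configuration described above, the following sketch assembles the command-line properties for the two applications. It assumes the RabbitMQ binder and that the applications expose bindings named output and input. The destination name, group name, broker host, and jar file names are hypothetical placeholders, not values taken from this guide.

```shell
#!/usr/bin/env bash
# Sketch: wire http | log by hand with Spring Cloud Stream binding properties.
# DESTINATION, GROUP, BROKER_HOST, and the jar file names are hypothetical.
DESTINATION="http-log"
GROUP="log-consumers"
BROKER_HOST="localhost"

# Source: publish to the shared destination.
HTTP_ARGS=(
  "--spring.cloud.stream.bindings.output.destination=${DESTINATION}"
  "--spring.rabbitmq.host=${BROKER_HOST}"
)

# Sink: consume from the same destination, in its own consumer group.
LOG_ARGS=(
  "--spring.cloud.stream.bindings.input.destination=${DESTINATION}"
  "--spring.cloud.stream.bindings.input.group=${GROUP}"
  "--spring.rabbitmq.host=${BROKER_HOST}"
)

# Print the commands rather than run them, since the jars are placeholders.
echo "java -jar http-source-rabbit.jar ${HTTP_ARGS[*]}"
echo "java -jar log-sink-rabbit.jar ${LOG_ARGS[*]}"
```

Both applications point at the same destination and the same broker host, and only the sink declares a consumer group, which mirrors the three conditions in the list above.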
For task applications, Data Flow initializes a database schema for Spring Cloud Task and Spring Batch and provides the necessary JDBC connection properties when launching a task to let the task track its execution status. The Data Flow UI also provides views of this information.
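For comparison, running a task outside Data Flow would mean supplying the JDBC connection properties yourself. The sketch below uses the standard Spring Boot spring.datasource.* properties. The database URL, credentials, driver, and jar name are hypothetical, and it assumes the schema for Spring Cloud Task already exists in that database.

```shell
#!/usr/bin/env bash
# Sketch: JDBC properties a task needs to record its execution status.
# All values below are hypothetical placeholders.
DB_URL="jdbc:postgresql://localhost:5432/dataflow"
DB_USER="dataflow"

TASK_ARGS=(
  "--spring.datasource.url=${DB_URL}"
  "--spring.datasource.username=${DB_USER}"
  "--spring.datasource.driver-class-name=org.postgresql.Driver"
)

# Print the command rather than run it, since my-task.jar is a placeholder.
# A password would normally be supplied through the environment instead.
echo "java -jar my-task.jar ${TASK_ARGS[*]}"
```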
The Data Flow model has subsequently been extended to support applications that do not necessarily conform to the standard conventions and must be manually configured. You can find more details in the Application DSL page and also in the Polyglot Recipe page.
The Spring team provides and supports a selection of pre-packaged applications that are used to assemble various data integration and processing pipelines and to support production Spring Cloud Data Flow development, learning and experimentation.
To use an application in Data Flow, you must first register it. Stream Processing with Data Flow explains how to register an individual application. This can be one of the pre-packaged applications or a custom application.
The pre-packaged applications page covers how to bulk register pre-packaged applications.
If you want to create a file to bulk register only the applications that you use, including your own applications, write one entry per application in the following format:
<type>.<name>=<app-url>
Here, <type> is a supported application type (source, processor, sink, task, or app), <name> is the registration name, and <app-url> is the location of the executable artifact.
The URL can be any standard URL or can use one of the Data Flow maven:// or docker:// formats described in Stream Processing with Data Flow.
To optimize performance, you may package application metadata, which contains the names and descriptions of exposed application properties, in a separate companion artifact. This is not required, but because the metadata is typically accessed before the application binary is needed, it allows more efficient use of network resources when using Data Flow. In this case, add a registration entry for the metadata artifact as <type>.<name>.metadata=<app-url>.
Here is a snippet of a bulk registration file used to register Maven artifacts:
sink.cassandra=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:3.2.1
sink.cassandra.metadata=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:jar:metadata:3.2.1
sink.s3=maven://org.springframework.cloud.stream.app:s3-sink-rabbit:3.2.1
sink.s3.metadata=maven://org.springframework.cloud.stream.app:s3-sink-rabbit:jar:metadata:3.2.1
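Before importing a hand-written registration file, it can help to check that every entry has the expected shape. The following is a minimal sketch of such a check, not a Data Flow tool. The function names are hypothetical, and the pattern assumes that registration names contain only letters, digits, hyphens, and underscores, which may be stricter than what Data Flow itself accepts.

```shell
#!/usr/bin/env bash
# Sketch: check that entries follow <type>.<name>=<app-url>
# or <type>.<name>.metadata=<app-url>.

# Succeeds if a single entry matches the assumed format.
validate_entry() {
  local entry="$1"
  local pattern='^(source|processor|sink|task|app)\.[A-Za-z0-9_-]+(\.metadata)?[[:space:]]*=[[:space:]]*[^[:space:]]+$'
  [[ "$entry" =~ $pattern ]]
}

# Checks every entry in a file and reports the ones that do not match.
validate_file() {
  local file="$1" line status=0
  while IFS= read -r line || [[ -n "$line" ]]; do
    # Skip blank lines and comment lines.
    if [[ -z "$line" || "$line" == \#* ]]; then
      continue
    fi
    if ! validate_entry "$line"; then
      echo "invalid entry: $line" >&2
      status=1
    fi
  done < "$file"
  return "$status"
}
```

For example, calling validate_file with the path to your registration file returns a non-zero status and prints each offending line to standard error if any entry uses an unsupported type or lacks a URL.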
From the Data Flow Shell, you can bulk register the applications:
dataflow:>app import --uri file://path-to-my-app-registration.properties
You can also pass the --local option (which is true by default) to indicate whether the properties file location should be resolved within the shell process itself. If the location should be resolved from the Data Flow Server process, specify --local false.
When using either app register or app import, if an app is already registered with the provided name, type, and version, it is not overridden by default. If you would like to override the pre-existing app URI or metadata-uri coordinates, include the --force option.
Note, however, that once downloaded, applications can be cached locally on the Data Flow server, based on the resource location. If the resource location does not change (even though the actual resource bytes may be different), the application is not re-downloaded. When using maven:// resources, on the other hand, using a constant location can still circumvent caching (if using -SNAPSHOT versions).
Moreover, if a stream is already deployed and uses some version of a registered app, then (forcibly) re-registering a different app has no effect until the stream is deployed again.