Spring Cloud Data Flow provides native support for applications built with Spring Cloud Stream or Spring Cloud Task.
Given a stream definition such as http | log, Data Flow expects http to be a stream source with an output destination configured. Likewise, log, as a sink, must have an input destination configured.
To run this pipeline without Data Flow, you must manually configure these applications with Spring Cloud Stream binding properties so that:
- The output of http is the same named destination as the input of log
- The applications have the same message broker connection properties
- A unique consumer group is defined
Then deploy these applications individually to the platform of your choice.
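As a rough illustration of the manual configuration described above, the two applications might each be given properties along the following lines. This is a sketch under stated assumptions: the destination name http-to-log, the group name log-consumers, and the RabbitMQ host are hypothetical, and the binding names output and input assume applications that use the classic Source and Sink bindings (applications written in the functional style use different binding names).

```properties
# Sketch only: destination, group, and broker host below are hypothetical values.
# The two sections represent two separate configurations, one per application.

# --- http (source) ---
# Publish to a named destination on the broker.
spring.cloud.stream.bindings.output.destination=http-to-log
# Broker connection (must match the sink's settings).
spring.rabbitmq.host=rabbit.example.com
spring.rabbitmq.port=5672

# --- log (sink) ---
# Consume from the same named destination the source publishes to.
spring.cloud.stream.bindings.input.destination=http-to-log
# Consumer group for this sink.
spring.cloud.stream.bindings.input.group=log-consumers
# Same broker connection as the source.
spring.rabbitmq.host=rabbit.example.com
spring.rabbitmq.port=5672
```

The point of the sketch is that the destination name and the broker settings have to be kept in sync by hand across both applications.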
Data Flow takes care of all this for you, and more!
For task applications, Data Flow initializes the database schema for Spring Cloud Task and Spring Batch. When launching a task, it also provides the necessary JDBC connection properties so that the task can track its execution status. The Data Flow UI provides views of this information.
The Data Flow model has subsequently been extended to support applications that don't necessarily conform to the standard conventions and must be manually configured. You can find more details in the Application DSL page and also in the Polyglot Recipe.
The Spring team provides and supports a selection of pre-packaged applications used to assemble various data integration and processing pipelines and to support production Spring Cloud Data Flow development, learning and experimentation.
In order to use an application in Data Flow, you must first register it. Stream Processing with Data Flow explains how to register an individual application. This can be one of the pre-packaged applications or a custom application.
The pre-packaged applications page covers how to bulk register pre-packaged applications.
If you want to create a file to bulk register only the applications that you use, including your own applications, the file contains one entry per line in the following format:
<type>.<name>=<app-url>
Here, <type> is a supported application type (source, processor, sink, task, or app), <name> is the registration name, and <app-url> is the location of the executable artifact.
The URL may be any standard URL or may use one of the Data Flow maven:// or docker:// formats described in Stream Processing with Data Flow.
To optimize performance, you may package application metadata, which contains the names and descriptions of exposed application properties, in a separate, smaller artifact. This is not required, but because the metadata is typically accessed before the application binary is needed, it allows more efficient use of network resources when using Data Flow. In this case, add a registration entry for the metadata artifact in the following form:
<type>.<name>.metadata=<app-metadata-url>
Here is a snippet of a bulk registration file used to register Maven artifacts:
sink.cassandra=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:2.1.2.RELEASE
sink.cassandra.metadata=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:jar:metadata:2.1.2.RELEASE
sink.s3=maven://org.springframework.cloud.stream.app:s3-sink-rabbit:2.1.2.RELEASE
sink.s3.metadata=maven://org.springframework.cloud.stream.app:s3-sink-rabbit:jar:metadata:2.1.2.RELEASE
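A file that also registers your own applications could follow the same pattern. The sketch below mixes one pre-packaged entry (taken from the snippet above) with two custom entries. The registration names mytransform and mybatchjob, the com.example Maven coordinates, and the example.com URL are hypothetical placeholders, not real artifacts.

```properties
# Pre-packaged application (coordinates from the snippet above)
sink.cassandra=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:2.1.2.RELEASE

# Hypothetical custom processor, resolved from a Maven repository
processor.mytransform=maven://com.example:my-transform-processor:1.0.0

# Hypothetical custom task, fetched from a standard HTTPS URL
task.mybatchjob=https://example.com/artifacts/my-batch-job-1.0.0.jar
```

Keeping only the applications you actually use in such a file keeps the registry small and makes the import repeatable across environments.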
From the Data Flow Shell, you can bulk register the applications:
dataflow:>app import --uri file://path-to-my-app-registration.properties
You can also pass the --local option (which is true by default) to indicate whether the properties file location should be resolved within the shell process itself. If the location should be resolved from the Data Flow Server process, specify --local false.
When using either app register or app import, if an app is already registered with the provided name, type, and version, it is not overridden by default. If you would like to override the pre-existing app uri or metadata-uri coordinates, include the --force option.
Note, however, that once downloaded, applications may be cached locally on the Data Flow server, based on the resource location. If the resource location does not change (even though the actual resource bytes may be different), the application is not downloaded again. With maven:// resources, on the other hand, a constant location may still circumvent caching if it uses a -SNAPSHOT version.
Moreover, if a stream is already deployed and using some version of a registered app, then (forcibly) re-registering a different app has no effect until the stream is deployed again.