Using Applications with Spring Cloud Data Flow
Spring Cloud Data Flow provides native support for applications built with Spring Cloud Stream or Spring Cloud Task.
Given a stream definition such as `http | log`, Data Flow expects `http` to be a stream source with an output destination configured. Likewise, `log`, as a sink, must have an input destination configured.
To run this pipeline without Data Flow, you must manually configure these applications with Spring Cloud Stream binding properties so that:

- The output of `http` is bound to the same named destination as the input of `log`.
- The applications share the same message broker connection properties.
- A unique consumer group is defined.
Then you can deploy these applications individually to the platform of your choice.
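For illustration, a minimal sketch of the binding properties each application would need follows. The destination name (`http-ingest`), the consumer group (`my-stream`), and the RabbitMQ broker settings are all assumptions, and the `input`/`output` binding names assume the conventional channel names used by the pre-packaged applications:

```
# http source: publish to a shared destination (name is illustrative)
spring.cloud.stream.bindings.output.destination=http-ingest

# log sink: consume from the same destination as part of a consumer group
spring.cloud.stream.bindings.input.destination=http-ingest
spring.cloud.stream.bindings.input.group=my-stream

# both applications must share the same broker connection (RabbitMQ shown)
spring.rabbitmq.host=localhost
spring.rabbitmq.port=5672
```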
Data Flow takes care of all this, and more, for you.
For task applications, Data Flow initializes a database schema for Spring Cloud Task and Spring Batch and provides the necessary JDBC connection properties when launching a task to let the task track its execution status. The Data Flow UI also provides views of this information.
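The connection properties that Data Flow supplies are the standard Spring Boot datasource properties. As a rough sketch, a launched task receives values along these lines (the URL, credentials, and driver below are placeholder assumptions for a MariaDB-backed server):

```
# illustrative values only; Data Flow injects its own connection settings
spring.datasource.url=jdbc:mariadb://localhost:3306/dataflow
spring.datasource.username=spring
spring.datasource.password=secret
spring.datasource.driverClassName=org.mariadb.jdbc.Driver
```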
The Data Flow model has subsequently been extended to support applications that do not necessarily conform to the standard conventions and must be manually configured. You can find more details in the Application DSL page and also in the Polyglot Recipe page.
Pre-packaged Applications
The Spring team provides and supports a selection of pre-packaged applications that are used to assemble various data integration and processing pipelines and to support production Spring Cloud Data Flow development, learning and experimentation.
Application Registration
To use an application in Data Flow, you must first register it. Stream Processing with Data Flow explains how to register an individual application. This can be one of the pre-packaged applications or a custom application.
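For example, registering a single pre-packaged source from the Data Flow Shell might look like the following (the artifact coordinates and version are illustrative):

```
dataflow:>app register --name http --type source --uri maven://org.springframework.cloud.stream.app:http-source-rabbit:3.2.1
```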
Bulk Registration
The pre-packaged applications page covers how to bulk register pre-packaged applications.
If you want to create a file to bulk register only the applications that you use, including your own, each line of the file has the format `<type>.<name>=<app-url>`, where `<type>` is a supported application type (source, processor, sink, task, or app), `<name>` is the registration name, and `<app-url>` is the location of the executable artifact.
The URL can be any standard URL or can use one of the Data Flow `maven://` or `docker://` formats described in Stream Processing with Data Flow.
To optimize performance, you may package application metadata, which contains the names and descriptions of exposed application properties, in a separate companion artifact. This is not required, but, since the metadata is typically accessed before the application binary is needed, it allows more efficient use of network resources when using Data Flow. In this case, add a registration entry for the metadata artifact as `<type>.<name>.metadata=<app-metadata-url>`.
Here is a snippet of a bulk registration file used to register Maven artifacts:
```
sink.cassandra=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:3.2.1
sink.cassandra.metadata=maven://org.springframework.cloud.stream.app:cassandra-sink-rabbit:jar:metadata:3.2.1
sink.s3=maven://org.springframework.cloud.stream.app:s3-sink-rabbit:3.2.1
sink.s3.metadata=maven://org.springframework.cloud.stream.app:s3-sink-rabbit:jar:metadata:3.2.1
```
From the Data Flow Shell, you can bulk register the applications:
```
dataflow:>app import --uri file://path-to-my-app-registration.properties
```
You can also pass the `--local` option (which is `true` by default) to indicate whether the properties file location should be resolved within the shell process itself. If the location should be resolved from the Data Flow Server process, specify `--local false`.
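For instance, to have the Data Flow Server, rather than the shell, resolve the file location:

```
dataflow:>app import --uri file://path-to-my-app-registration.properties --local false
```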
When using either `app register` or `app import`, if an app is already registered with the provided name, type, and version, it is not overridden by default. If you would like to override the pre-existing app URI or `metadata-uri` coordinates, include the `--force` option.
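For example, re-importing the same bulk registration file while overriding any pre-existing registrations:

```
dataflow:>app import --uri file://path-to-my-app-registration.properties --force
```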
Note, however, that, once downloaded, applications can be cached locally on the Data Flow server, based on the resource location. If the resource location does not change (even though the actual resource bytes may be different), it is not re-downloaded. When using `maven://` resources, on the other hand, using a constant location can still circumvent caching (if you use `-SNAPSHOT` versions).
Moreover, if a stream is already deployed and uses some version of a registered app, then (forcibly) re-registering a different app has no effect until the stream is deployed again.