Frequently Asked Questions

Application Starters

Where can I find the latest Spring Cloud Stream and Spring Cloud Task application starters?

The latest releases of the Stream and Task application starters are published to Maven Central and Docker Hub. You can find the latest release versions from the Spring Cloud Stream App Starters and Spring Cloud Task App Starters project sites.

Where can I find the documentation for the latest application releases?

You can find the documentation for the latest releases on the same Spring Cloud Stream App Starters and Spring Cloud Task App Starters project sites.

Can I patch and extend the out-of-the-box applications?

Yes. You can find more details in the reference guide section on Patching Application Starters as well as documentation on Functional Composition.

Can I build a new application based on the same infrastructure as the out-of-the-box applications?

Yes. You can find more details in the FAQ section of the Spring Cloud Stream App Starters reference guide.

Where can I download the latest applications?

See the stream and task apps project sites.

Where are the Docker images hosted?

See stream and task apps in Docker Hub.

Data Flow

How are streaming applications and Spring Cloud Data Flow (SCDF) related?

Streaming applications are standalone, and they communicate with other applications through message brokers, such as RabbitMQ or Apache Kafka. They run independently and no runtime dependency between applications and SCDF exists. However, based on user actions, SCDF interacts with the platform runtime to update the currently running application, query the current status, or stop the application from running.

How are task and batch applications and Spring Cloud Data Flow (SCDF) related?

Though batch and task applications are standalone Spring Boot applications, to record their execution status, you must connect both SCDF and the batch applications to the same database. The individual batch applications (deployed by SCDF), in turn, update their execution status in the shared database. SCDF then uses that database to show the execution history and other details about the batch applications in its dashboard. You can also construct your batch and task applications to connect to the SCDF database only for recording execution status while performing their actual work against another database.
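As a sketch of this wiring (the hostname, schema, and credentials below are placeholders, and the JDBC URL assumes a MariaDB-compatible database), both the server and the task applications can be pointed at the same database through standard Spring Boot datasource properties:

```shell
# Hypothetical example: dbhost, the dataflow schema, and the credentials are placeholders.
# Start the Data Flow server against the shared database:
java -jar spring-cloud-dataflow-server.jar \
  --spring.datasource.url=jdbc:mariadb://dbhost:3306/dataflow \
  --spring.datasource.username=scdf \
  --spring.datasource.password=secret

# Task applications deployed by SCDF inherit this datasource by default; to point
# them at it explicitly, the same properties can be passed at launch time, e.g.:
#   task launch myTaskDefinition --properties "app.*.spring.datasource.url=jdbc:mariadb://dbhost:3306/dataflow"
```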

What is the relationship of Composed Task Runner and SCDF?

Composed tasks delegate the running of the collection of tasks to a separate application, named the Composed Task Runner (CTR). The CTR orchestrates the launching of Tasks defined in the composed task graph. To use composed tasks, you must connect SCDF, CTR, and batch applications to a shared database. Only then can you track all of their execution history from SCDF’s dashboard.

Does SCDF use a message broker?

No. The Data Flow and Skipper servers do not interact with the message broker. Streaming applications deployed by Data Flow connect to the message broker to publish and consume messages.

What is the role of Skipper in Spring Cloud Data Flow (SCDF)?

SCDF delegates and relies on Skipper for the life cycle management of streaming applications. With Skipper, applications contained within the streaming data pipelines are versioned and can be updated (on a rolling basis) and rolled back to previous versions.
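For instance, assuming a deployed stream named mystream whose log application should move to a newer version (the stream name and version are illustrative), the update and rollback operations can be driven from the Data Flow shell:

```shell
# Upgrade the log application in the stream to a newer version on a rolling basis
# (Skipper performs the upgrade behind the scenes; the version shown is illustrative).
dataflow:>stream update --name mystream --properties "version.log=2.1.1.RELEASE"

# If the new version misbehaves, roll the stream back to the previous version.
dataflow:>stream rollback --name mystream
```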

What tools are available to interact with Spring Cloud Data Flow (SCDF)?

You can use the Data Flow dashboard (a web-based GUI), the Data Flow shell, the RESTful APIs exposed by the Data Flow server, or the Java DSL (which builds on the REST APIs) to interact with Spring Cloud Data Flow.

Why is Spring Cloud Data Flow (SCDF) not in Spring Initializr?

Initializr's goal is to provide a getting-started experience for creating a Spring Boot application. It is not the goal of Initializr to create a production-ready server application. We tried this in the past but could not make it work, because we need very fine-grained control over the dependent libraries. As such, we ship the binaries directly instead. We expect users to either use the binaries as-is or extend them by building SCDF locally from source.

Can Spring Cloud Data Flow (SCDF) work with an Oracle database?

Yes. You can read more about the supported databases here.

When and where should I use Task properties versus arguments?

If the configuration for each task execution remains the same across all task launches, you can set the properties when you create the task definition. The following example shows how to do so:

task create myTaskDefinition --definition "timestamp --format='yyyy'"

If the configuration for each task execution changes for each task launch, you can use the arguments at task launch time, as the following example shows:

task launch myTaskDefinition "--server.port=8080"

When you use Spring Cloud Data Flow to orchestrate the launches of a task application that uses Spring Batch, you should use arguments to set the Job Parameters required for your batch job.

Remember: If your argument is a non-identifying parameter, suffix the argument with --.

How do I configure remote Maven repositories?

You can set Maven properties, such as the local Maven repository location, remote Maven repositories, authentication credentials, and proxy server properties, through command-line properties when starting the Data Flow server. Alternatively, you can set them by using the SPRING_APPLICATION_JSON environment property for the Data Flow server.

If apps are resolved from a Maven repository, the remote Maven repositories must be configured explicitly, except for a local Data Flow server. The other Data Flow server implementations (which use Maven resources for app artifact resolution) have no default value for remote repositories. The local server uses https://repo.spring.io/libs-snapshot as the default remote repository.

To pass the properties as commandline options, run the server with a command similar to the following:

java -jar <dataflow-server>.jar --maven.localRepository=mylocal \
--maven.remote-repositories.repo1.url=https://repo1 \
--maven.remote-repositories.repo1.auth.username=repo1user \
--maven.remote-repositories.repo1.auth.password=repo1pass \
--maven.remote-repositories.repo2.url=https://repo2 --maven.proxy.host=proxyhost \
--maven.proxy.port=9018 --maven.proxy.auth.username=proxyuser \
--maven.proxy.auth.password=proxypass

You can also use the SPRING_APPLICATION_JSON environment property:

export SPRING_APPLICATION_JSON='{ "maven": { "local-repository": "local","remote-repositories": { "repo1": { "url": "https://repo1", "auth": { "username": "repo1user", "password": "repo1pass" } },
"repo2": { "url": "https://repo2" } }, "proxy": { "host": "proxyhost", "port": 9018, "auth": { "username": "proxyuser", "password": "proxypass" } } } }'

Here is the same content in nicely formatted JSON:

SPRING_APPLICATION_JSON='{
  "maven": {
    "local-repository": "local",
    "remote-repositories": {
      "repo1": {
        "url": "https://repo1",
        "auth": {
          "username": "repo1user",
          "password": "repo1pass"
        }
      },
      "repo2": {
        "url": "https://repo2"
      }
    },
    "proxy": {
      "host": "proxyhost",
      "port": 9018,
      "auth": {
        "username": "proxyuser",
        "password": "proxypass"
      }
    }
  }
}'

Depending on the Spring Cloud Data Flow server implementation, you may have to pass the environment properties by using the platform-specific environment-setting capabilities. For instance, in Cloud Foundry, you would set them with cf set-env.
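For example, on Cloud Foundry the JSON shown above might be applied as follows (the application name dataflow-server and the repository URL are illustrative):

```shell
# Set SPRING_APPLICATION_JSON on the Data Flow server app (name is illustrative),
# then restage so the new environment takes effect.
cf set-env dataflow-server SPRING_APPLICATION_JSON '{"maven":{"remote-repositories":{"repo1":{"url":"https://repo1"}}}}'
cf restage dataflow-server
```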

How do I enable DEBUG logs for platform deployments?

Spring Cloud Data Flow builds upon the Spring Cloud Deployer SPI, and each platform-specific Data Flow server uses the respective SPI implementation. When troubleshooting deployment-specific issues, such as network errors, it is useful to enable DEBUG logs in the underlying deployer and in the libraries it uses.

To enable DEBUG logs for the local-deployer, start the server as follows:

java -jar <dataflow-server>.jar --logging.level.org.springframework.cloud.deployer.spi.local=DEBUG

(where org.springframework.cloud.deployer.spi.local is the global package for everything local-deployer related.)

To enable DEBUG logs for the cloudfoundry-deployer, set the following environment variable and restage the Data Flow server. The Cloud Foundry deployer uses cf-java-client, so you must also enable DEBUG logs for that library. You can then see more detail around requests and responses, as well as detailed stack traces for failures.

cf set-env dataflow-server JAVA_OPTS '-Dlogging.level.cloudfoundry-client=DEBUG'
cf restage dataflow-server

(where cloudfoundry-client is the global package for everything cf-java-client related.)

To also review the Reactor logs, which are used by cf-java-client, the following commands are helpful:

cf set-env dataflow-server JAVA_OPTS '-Dlogging.level.cloudfoundry-client=DEBUG -Dlogging.level.reactor.ipc.netty=DEBUG'
cf restage dataflow-server

(where reactor.ipc.netty is the global package for everything reactor-netty related.)

Similar to the local-deployer and cloudfoundry-deployer options as discussed above, there are equivalent settings available for Kubernetes. See the respective link:https://github.com/spring-cloud?utf8=%E2%9C%93&q=spring-cloud-deployer[SPI implementations] for more detail about the packages to configure for logging.
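As a sketch, the Kubernetes equivalent typically means raising the log level of the Kubernetes deployer package on the server (the package name shown is the one used by the spring-cloud-deployer-kubernetes project; verify it against the SPI implementation you are running):

```shell
# Enable DEBUG logs for the Kubernetes deployer by raising the logging level
# of its package when starting the Data Flow server.
java -jar <dataflow-server>.jar --logging.level.org.springframework.cloud.deployer.spi.kubernetes=DEBUG
```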

How do I enable DEBUG logs for application deployments?

The streaming applications in Spring Cloud Data Flow are Spring Cloud Stream applications, which are in turn based on Spring Boot. They can be independently set up with logging configurations.

For instance, if you must troubleshoot the header and payload specifics that are being passed around source, processor, and sink channels, you should deploy the stream with the following options:

dataflow:>stream create foo --definition "http --logging.level.org.springframework.integration=DEBUG | transform --logging.level.org.springframework.integration=DEBUG | log --logging.level.org.springframework.integration=DEBUG" --deploy

(where org.springframework.integration is the global package for everything Spring Integration related, which is responsible for messaging channels.)

These properties can also be specified with deployment properties when deploying the stream, as follows:

dataflow:>stream deploy foo --properties "app.*.logging.level.org.springframework.integration=DEBUG"

How do I remote debug deployed applications?

The Data Flow local server lets you debug the deployed applications. This is accomplished by enabling the remote debugging feature of the JVM through deployment properties, as shown in the following example:

stream deploy --name mystream --properties "deployer.fooApp.local.debugPort=9999"

The preceding example starts the fooApp application in debug mode, allowing a remote debugger to be attached on port 9999. By default, the application starts in 'suspend' mode and waits for the remote debug session to be attached (started). To start without suspending, you can provide an additional debugSuspend property with the value n.

Also, when there is more than one instance of the application, the debug port for each instance is the value of debugPort + instanceId.

Unlike with other properties, you must NOT use a wildcard for the application name, since each application must use a unique debug port.
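For example, to attach a debugger without suspending the application at startup (the stream and application names match the earlier example), the two properties can be combined:

```shell
# Open debug port 9999 on fooApp but let it start immediately (debugSuspend=n)
# instead of waiting for a debugger to attach.
dataflow:>stream deploy --name mystream --properties "deployer.fooApp.local.debugPort=9999,deployer.fooApp.local.debugSuspend=n"
```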

Is it possible to aggregate Local deployments into a single log?

Given that each application is a separate process that maintains its own set of logs, accessing individual logs can be inconvenient, especially in the early stages of development, when logs are accessed more often. Since it is also a common pattern to rely on a local SCDF server that deploys each application as a local JVM process, you can redirect the stdout and stderr of the deployed applications to the parent process. Thus, with a local SCDF server, the application logs appear in the logs of the running local SCDF server.

Typically when you deploy the stream, you see something resembling the following in the server logs:

2017-06-28 09:50:16.372  INFO 41161 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalAppDeployer       : Deploying app with deploymentId mystream.myapp instance 0.
   Logs will be in /var/folders/l2/63gcnd9d7g5dxxpjbgr0trpw0000gn/T/spring-cloud-dataflow-5939494818997196225/mystream-1498661416369/mystream.myapp

However, by setting local.inheritLogging=true as a deployment property, you can see the following:

2017-06-28 09:50:16.372  INFO 41161 --- [nio-9393-exec-7] o.s.c.d.spi.local.LocalAppDeployer       : Deploying app with deploymentId mystream.myapp instance 0.
   Logs will be inherited.

The application logs then appear alongside the server logs. The following example shows how to set the deployment property:

stream deploy --name mystream --properties "deployer.*.local.inheritLogging=true"

The preceding command enables log redirection for each application in the stream. The following command enables log redirection only for the application named 'myapp':

stream deploy --name mystream --properties "deployer.myapp.local.inheritLogging=true"

Likewise, you can use the same property to redirect and aggregate the logs of launched task applications.
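For example, a task launch might enable the same redirection as follows (the task name is illustrative):

```shell
# Redirect the task application's output into the local SCDF server's logs.
dataflow:>task launch mytask --properties "deployer.*.local.inheritLogging=true"
```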

NOTE: Log redirect is only supported with local-deployer.

Streaming

Can I connect to existing RabbitMQ queues?

Follow the steps in the reference guide to connect with existing RabbitMQ queues.

What is the Apache Kafka versus Spring Cloud Stream compatibility?

See the compatibility matrix in the Wiki.

Batch

What is a Composed Task Runner (CTR)?

The Composed Tasks feature in Spring Cloud Data Flow (SCDF) delegates the running of a composed task to a separate application, named the Composed Task Runner (CTR). The CTR parses the composed task graph DSL and, for each node in the graph, makes a RESTful call against the specified Spring Cloud Data Flow instance to launch the associated task definition. For each task definition that is run, the CTR polls the database to verify that the task completed. Once a task is complete, the CTR either continues to the next task in the graph or fails, based on how the DSL specified that the sequence of tasks should be run.
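As a brief example, the following shell commands define and launch a composed task in which task2 runs only after task1 completes successfully (the composed task name and the task labels are illustrative; timestamp is an out-of-the-box task application):

```shell
# Define a composed task: run task1 and, if it succeeds (&&), run task2.
dataflow:>task create mycomposedtask --definition "task1: timestamp && task2: timestamp"

# Launching the composed task starts the CTR, which launches task1 and task2 in turn.
dataflow:>task launch mycomposedtask
```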

How do I restart a Spring Batch Job from the beginning rather than from where it failed?

In short, you need to create a new Job Instance for the new task launch. You can do so by changing an existing identifying job parameter or by adding a new identifying job parameter on the next task launch. The following example shows a typical task launch:

task launch myBatchApp --arguments="team=yankees"

Assuming that the preceding task launch fails, we can launch the task again, and a new job instance is created if we change the value of the team parameter, as the following example shows:

task launch myBatchApp --arguments="team=cubs"

However, the preferred way is to write your task or batch application such that it can handle being restarted with a new job instance. One way to do this is to set a JobParametersIncrementer for your batch job, as discussed in the Spring Batch reference guide.