Stream Distributed Tracing

This section describes how to trace the applications that were deployed as part of a Stream data pipeline.

The Data Flow distributed tracing architecture is built around the Spring Cloud Sleuth library, which provides an API for distributed tracing solutions and integrates with OpenZipkin Brave.

Spring Cloud Sleuth can trace the messages in your streaming pipeline and export the tracing information to an external system for analysis and visualization. It supports OpenZipkin-compatible systems such as Zipkin Server and Wavefront Distributed Tracing.

All Spring Cloud Stream Applications are pre-configured to support distributed message tracing and to export traces to Zipkin Server, Wavefront Tracing, or both. Tracing export is disabled by default. To enable it, set management.metrics.export.wavefront.enabled=true for Wavefront, spring.zipkin.enabled=true for Zipkin Server, or both. Detailed instructions are provided below. For the Sleuth configuration options, consult the Spring Cloud Sleuth properties.

The following image shows the general architecture of how streaming applications are traced:

Stream Distributed Tracing Architecture

For streaming applications based on Spring Cloud Function versions prior to 3.1.x, Spring Cloud Sleuth relies on Spring Integration for tracing instrumentation. This can produce unnecessary (noisy) trace information for some internal Spring Integration components.

Starting with Spring Cloud Function 3.1, the Spring Cloud Sleuth instrumentation provides tracing information that is better tailored to Spring Cloud Function (SCF) based applications.

Instrument Custom Applications

To enable distributed tracing for your custom streaming application, you must add the following dependencies to your streaming application:

<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-sleuth</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-sleuth-zipkin</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-wavefront</artifactId>
    </dependency>
</dependencies>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>${release.train.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>

You must also turn off tracing export by default, so that it is enabled only when explicitly requested. Add the following properties to your application.properties:

management.metrics.export.wavefront.enabled=false
spring.zipkin.enabled=false

Visualize Distributed Tracing

You can export the tracing information to an external system for analysis and visualization. Spring Cloud Sleuth supports OpenZipkin-compatible systems such as Wavefront Distributed Tracing and Zipkin Server.

Visualize with Wavefront

You can use Wavefront to visualize the distributed tracing data collected from your deployed streaming pipelines. Wavefront offers several dashboards and browsers for viewing information about your applications and services, and you can navigate from one to another to gather more information.

Wavefront uses the concepts of application and service to group distributed traces. In Data Flow, a Wavefront application maps to a streaming pipeline and a Wavefront service maps to a streaming application within that pipeline. Therefore, all deployed Spring Cloud Stream Application Starters are configured with the following two properties:

wavefront.application.name: The name of the stream that contains the applications that send the traces.
wavefront.application.service: The name or label of the application that reports the traces.
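
The mapping described above can be sketched in plain Java. This is an illustration only, not Data Flow's implementation: the class and method names (WavefrontTagSketch, tagsFor) are hypothetical, and only the two property keys and the stream name come from the text.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: illustrates the stream -> application and app -> service mapping.
final class WavefrontTagSketch {

    // Returns the two Wavefront properties for one application in a stream.
    static Map<String, String> tagsFor(String streamName, String appLabel) {
        Map<String, String> tags = new LinkedHashMap<>();
        // The Wavefront "application" is the stream that contains the app.
        tags.put("wavefront.application.name", streamName);
        // The Wavefront "service" is the name or label of the app itself.
        tags.put("wavefront.application.service", appLabel);
        return tags;
    }

    public static void main(String[] args) {
        // One entry per application in a three-step pipeline.
        for (String app : new String[] {"time", "filter", "log"}) {
            System.out.println(tagsFor("scdf-stream-traces", app));
        }
    }
}
```

All applications of one stream share the same application name and differ only in the service value, which is why searching by stream name in Wavefront returns the traces of the whole pipeline.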

To find the traces for your streams, open Applications/Traces in the Wavefront dashboard menu.

Then search for application names that match your deployed stream names. For example, if you have deployed a stream pipeline named scdf-stream-traces, you can select it by that name to see its traces collected in Wavefront.

Click the Search button. The Wavefront dashboard then shows a view similar to the following image:

SCDF Tracing Wavefront

Visualize with Zipkin Server

The Zipkin Server allows collection and visualization of distributed tracing data from your deployed streaming pipelines. Zipkin offers different dashboards and browsers to view information.

You can reach the Zipkin UI at http://your-zipkin-hostname:9411/zipkin. By default, this is http://localhost:9411/zipkin.

If you have a trace ID in a log file, you can jump directly to it. Otherwise, you can query based on attributes such as service, operation name, tags and duration. Some interesting data will be summarized for you, such as the percentage of time spent in a service, and whether or not operations failed.
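
Zipkin Server also exposes these lookups over its HTTP API (v2), which can be handy for scripting. The sketch below only builds the request URIs and does not contact a server. The class and method names, the sample trace ID, and the service name filter are placeholders, and the base URL assumes the local default mentioned above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that builds Zipkin v2 API URIs; it performs no network calls.
final class ZipkinQuerySketch {

    // URI that fetches a single trace by its ID.
    static URI traceById(String baseUrl, String traceId) {
        return URI.create(baseUrl + "/api/v2/trace/" + traceId);
    }

    // URI that searches recent traces for one service, limited to a number of results.
    static URI tracesByService(String baseUrl, String serviceName, int limit) {
        String encoded = URLEncoder.encode(serviceName, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v2/traces?serviceName=" + encoded + "&limit=" + limit);
    }

    public static void main(String[] args) {
        String base = "http://localhost:9411"; // assumed local Zipkin Server
        System.out.println(traceById(base, "463ac35c9f6413ad")); // placeholder trace ID
        System.out.println(tracesByService(base, "filter", 10)); // placeholder service name
    }
}
```

The resulting URIs can be passed to any HTTP client, for example java.net.http.HttpClient, to retrieve the trace data as JSON.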

Stream Tracing Visualization - Zipkin Send

The Zipkin UI also presents a Dependency diagram showing how many traced requests went through each application. This can be helpful for identifying aggregate behavior including error paths or calls to deprecated services.

Stream Tracing Visualization - Zipkin Dependencies

Platform Installations

The following sections explain how to configure distributed tracing for different platform deployments of Spring Cloud Data Flow.

Local

This section describes how to view distributed traces for streams that use Wavefront or Zipkin Server as the trace store. Wavefront is a cloud offering, but you can still deploy Data Flow locally and point it to a cloud-managed Wavefront system.

Wavefront

To install Data Flow with Wavefront support, follow the Monitoring with Wavefront Docker Compose instructions. Doing so brings up Spring Cloud Data Flow, Skipper, and Apache Kafka.

Wavefront is a SaaS offering, so you need to create a user account first. With that account, you can set the WAVEFRONT_KEY and WAVEFRONT_URI environment variables, as explained later in this document.

Once all the containers are running, deploy a simple stream that uses Kafka:

dataflow:>stream create scdf-stream-tracing --definition "time --fixed-delay=10 --time-unit=MILLISECONDS | filter --expression=payload.contains('3') | log" --deploy
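
As a rough illustration of what the filter step in this definition does, the following plain-Java sketch applies the equivalent of the SpEL expression payload.contains('3') to a few made-up payloads. The class name, method name, and sample values are hypothetical and not part of Data Flow; the real time source emits formatted timestamps.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical stand-in for the filter application in the stream definition.
final class FilterStepSketch {

    // Java equivalent of the SpEL expression payload.contains('3').
    static final Predicate<String> CONTAINS_THREE = payload -> payload.contains("3");

    // Returns only the payloads that would be forwarded to the log sink.
    static List<String> passedToLog(List<String> payloads) {
        return payloads.stream().filter(CONTAINS_THREE).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample payloads standing in for the time source output.
        List<String> samples = List.of("10:15:22", "10:15:23", "10:15:24");
        System.out.println(passedToLog(samples));
    }
}
```

Only some messages pass the filter, so not every message emitted by the time source reaches the log application.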

Then follow the visualize with Wavefront instructions.

Zipkin Server

You need the latest Stream Application starters (2020.0.3-SNAPSHOT or newer). Use the STREAM_APPS_URI variable to select the right version of the apps.

To enable message trace collection with Zipkin Server, follow the Zipkin Server Docker Compose instructions. Doing so brings up Spring Cloud Data Flow, Skipper, Apache Kafka, and Zipkin Server, and enables message tracing with it.

Once all the containers are running, you can access the Spring Cloud Data Flow Dashboard at http://localhost:9393/dashboard.

To see the dashboard in action, deploy a simple stream that uses Kafka:

dataflow:>stream create stream2 --definition "time --fixed-delay=10 --time-unit=MILLISECONDS | filter --expression=payload.contains('3') | log" --deploy

Open the Zipkin UI at http://localhost:9411/zipkin and then follow the visualize with Zipkin Server instructions.

Kubernetes

This section describes how to view distributed traces for streams deployed on Kubernetes, using either a cloud-managed Wavefront system or Zipkin Server.

Wavefront

Wavefront is a SaaS offering. You need to create a user account first and obtain the API-KEY and WAVEFRONT-URI assigned to your account.

Follow the general Data Flow Kubernetes installation instructions.

Then add the following properties to your Spring Cloud Data Flow server configuration (for example, src/kubernetes/server/server-config.yaml) to enable the Wavefront integration:

management:
  metrics:
    export:
      wavefront:
        enabled: true
        api-token: <YOUR API-KEY>
        uri: <YOUR WAVEFRONT-URI>
        source: demo-scdf-source

Then follow the visualize with Wavefront instructions.

Zipkin Server

Assuming that Zipkin Server is running at http://your-zipkin-server:9411 (it can be part of the Kubernetes cluster or an external service), add the following environment variables to your Spring Cloud Data Flow deployment configuration:

env:
  - name: SPRING_CLOUD_DATAFLOW_APPLICATIONPROPERTIES_STREAM_SPRING_ZIPKIN_ENABLED
    value: 'true'
  - name: SPRING_CLOUD_DATAFLOW_APPLICATIONPROPERTIES_STREAM_SPRING_ZIPKIN_BASEURL
    value: 'http://your-zipkin-server:9411'
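
The environment variable names above follow Spring Boot's convention for binding properties from the environment: replace periods with underscores, remove dashes, and convert to uppercase. The sketch below applies that rule to the corresponding Data Flow property names. The class and method names are hypothetical, and the helper is an illustration rather than Spring Boot's own code.

```java
import java.util.Locale;

// Hypothetical helper: derives an environment variable name from a property name.
final class EnvVarNameSketch {

    // Periods become underscores, dashes are dropped, and the result is uppercased.
    static String toEnvVar(String property) {
        return property.replace(".", "_").replace("-", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // Common prefix for properties that Data Flow applies to all stream applications.
        String prefix = "spring.cloud.dataflow.applicationProperties.stream.";
        System.out.println(toEnvVar(prefix + "spring.zipkin.enabled"));
        System.out.println(toEnvVar(prefix + "spring.zipkin.baseUrl"));
    }
}
```

The same rule can be used to derive environment variable names for other stream application properties that you want the Data Flow server to pass on.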

Then follow the visualize with Zipkin Server instructions.

Cloud Foundry

This section describes how to view distributed traces for streams on Cloud Foundry that use Wavefront as the trace store.

Wavefront

Wavefront is a SaaS offering. You need to create a user account first and obtain the API-KEY and WAVEFRONT-URI assigned to your account.

To configure the Data Flow Server to send metrics data from stream applications to the Wavefront monitoring system, follow the manifest-based Wavefront configuration instructions.

Then follow the visualize with Wavefront instructions.