Spring Cloud Sleuth: A Simple Approach to Adding Distributed Tracing
So you and your team have just migrated from a monolithic application to a set of microservices built with Spring Boot and running on a distributed platform. On the old platform, you could track incoming requests and each step of the process flow through a log file on that single server; how do you manage that now on a distributed platform? Distributed tracing via Spring Cloud Sleuth is here to help!
Distributed tracing tags and monitors a variety of activities throughout a system (across multiple services) using a unique identifier. The power of distributed tracing is that this unique identifier is carried into each of the established services, and can be done through a variety of paths (e.g. RESTful requests, messaging protocols, etc.). If your service stack is built on Spring Boot, Spring Cloud Sleuth (https://spring.io/projects/spring-cloud-sleuth) is a great way to add distributed tracing into your application suite with very little additional configuration needed. Spring Cloud Sleuth works with a wide range of libraries and can also report back to a Zipkin instance (https://zipkin.io/), if so configured.
Implementation & Usage
To start, we need to add the following dependencies to the application's pom.xml file (where release.train.version is the latest stable Spring Cloud release train version, e.g. 2021.0.1):
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.springframework.cloud</groupId>
      <artifactId>spring-cloud-dependencies</artifactId>
      <version>${release.train.version}</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>

<dependencies>
  <dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
  </dependency>
</dependencies>
For most services, adding the Spring Cloud Sleuth dependency is enough to get distributed tracing up and running. For requests into the service, you'll now see two new unique IDs added to log statements, appearing after the log-level component:
2022-03-16 11:40:00.400 INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Received request for work
These IDs are the "trace" and "span" IDs, both of which are unique and auto-generated by Spring Cloud Sleuth if not provided by an upstream call. The trace ID is the parent ID (representing the entire request chain across all services), and the span ID is the unique ID for the current span (e.g. the current service, or a key logic block within a service). When Spring Cloud Sleuth is used across multiple services, the trace ID is automatically included as a header on outgoing requests (for both HTTP and messaging protocols), so it can be carried into all downstream services. The default header propagation format is B3 (https://github.com/openzipkin/b3-propagation), so you'll notice headers like X-B3-TraceId and X-B3-SpanId on HTTP requests, or a single b3 header for messaging protocols.
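To make the B3 format concrete, here is a minimal plain-Java sketch (not Sleuth code; the class and method names are illustrative) of how the single b3 header value is laid out: "{traceId}-{spanId}-{samplingState}-{parentSpanId}", with the last two fields optional.

```java
// Illustrative only: splits a single "b3" header value into its
// hyphen-delimited fields (traceId, spanId, samplingState, parentSpanId).
class B3HeaderDemo {

    static String[] parse(String b3) {
        return b3.split("-");
    }

    public static void main(String[] args) {
        String[] fields = parse("d8dff210398819ab-c5d3e679e3ec9933-1");
        System.out.println("traceId = " + fields[0]);
        System.out.println("spanId  = " + fields[1]);
        System.out.println("sampled = " + fields[2]);
    }
}
```

In practice you never parse this header yourself; Sleuth's instrumentation reads and writes it automatically on each hop.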
With these IDs, you'll be able to search a centralized logging platform (e.g. an Elasticsearch/Logstash/Kibana stack, or AWS CloudWatch) to see all log statements, across all services, relevant to a specific request. These IDs can also be used for performance monitoring and debugging, as you can see how long it took to process the entire request as well as each step within it. Spans can also be started and ended manually via simple Java code or Spring annotations, which is useful when you want a span to cover a specific block of logic.
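The manual approach can be sketched as follows, assuming the Spring Cloud Sleuth 3.x Tracer API (the span name and surrounding method are illustrative):

```java
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.sleuth.Span;
import org.springframework.cloud.sleuth.Tracer;

public class ManualSpanExample {

    @Autowired
    private Tracer tracer;

    public void doWork() {
        // Start a new span under the current trace.
        Span span = tracer.nextSpan().name("db-query").start();
        // Put the span in scope so log statements pick up its ID.
        try (Tracer.SpanInScope ws = tracer.withSpan(span)) {
            // ... work covered by the new span ID ...
        } finally {
            span.end();
        }
    }
}
```

Ending the span in a finally block ensures it is reported even if the work throws.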
For performance monitoring, let's say that you have a service that takes in an HTTP request, makes a database query, and then returns a response. With just the basic code, each piece of the process flow will be logged with the same trace and span IDs:
2022-03-16 11:40:00.400  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Received request
2022-03-16 11:40:00.401  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Calling JPA method
2022-03-16 11:40:00.600  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Starting DB query
2022-03-16 11:40:02.800  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Ended DB query
2022-03-16 11:40:02.900  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Finished JPA method
2022-03-16 11:40:03.100  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Completed HTTP request
However, if we update that DAO object to include the “@NewSpan” annotation on the public method, we’ll get separate span IDs for that individual piece of the flow:
2022-03-16 11:40:00.400  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Received request
2022-03-16 11:40:00.401  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Calling JPA method
2022-03-16 11:40:00.600  INFO [TestApplication,d8dff210398819ab,c5d3e679e3ec9933] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Starting DB query
2022-03-16 11:40:02.800  INFO [TestApplication,d8dff210398819ab,c5d3e679e3ec9933] 74330 --- [nio-8080-exec-1] c.e.demo.dao.DataService : Ended DB query
2022-03-16 11:40:02.900  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Finished JPA method
2022-03-16 11:40:03.100  INFO [TestApplication,d8dff210398819ab,d8dff210398819ab] 74330 --- [nio-8080-exec-1] c.e.demo.controller.TestController : Completed HTTP request
With the separate span IDs, it can be easier to filter to just the log statements that are relevant for that piece of the flow (which can be incredibly useful when there are many log statements, or if that particular span covers a sizable number of classes).
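The annotated DAO described above can be sketched like this, assuming spring-cloud-sleuth's @NewSpan annotation on a Spring-managed bean (the class and method names are illustrative):

```java
import org.springframework.cloud.sleuth.annotation.NewSpan;
import org.springframework.stereotype.Service;

@Service
public class DataService {

    // Each call to this method runs in a new span (with its own span ID)
    // under the current trace, as seen in the log output above.
    @NewSpan("db-query")
    public String runQuery() {
        // ... perform the database query ...
        return "result";
    }
}
```

Note that the annotation only takes effect when the method is called through the Spring proxy, i.e. from outside the bean itself.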
Common Pitfalls
In my experience, there have only been a couple of scenarios where Spring Cloud Sleuth didn’t work as expected:
- If the application uses a custom log pattern, Spring Cloud Sleuth won't be able to automatically insert the IDs into log statements. Either add the trace and span ID fields to the custom pattern, or switch back to the standard pattern.
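For example, with the default Logback setup the IDs live in the MDC, so a custom pattern can pull them back in via %X placeholders. A sketch for application.properties, assuming the Sleuth 3.x MDC keys traceId and spanId:

```properties
# Restore the [app,traceId,spanId] block that Sleuth adds to the log level.
logging.pattern.level=%5p [${spring.application.name:},%X{traceId:-},%X{spanId:-}]
```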
- Spring Cloud Sleuth can only instrument Spring beans. If you create a client with the "new" keyword instead, Spring Cloud Sleuth can't hook into that object and the trace context will be lost.
- Good implementation:
@Autowired
private RestTemplate restTemplate;

public String makeARequest() {
    return restTemplate.getForObject("http://example.com", String.class);
}
- Bad implementation:
public String makeARequest() {
    return new RestTemplate().getForObject("http://example.com", String.class);
}
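The good implementation relies on RestTemplate being registered as a Spring bean in the first place; a minimal configuration sketch (the class name is illustrative):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.client.RestTemplate;

@Configuration
public class RestClientConfig {

    // Exposing RestTemplate as a bean lets Sleuth's auto-configuration
    // instrument it, so outgoing requests carry the trace headers.
    @Bean
    public RestTemplate restTemplate() {
        return new RestTemplate();
    }
}
```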
Roundup
This article just scratches the surface of what's available within Spring Cloud Sleuth. Still, you can see that with minimal code changes, your application suite can quickly harness the power of distributed tracing, making tracing and debugging via logs much easier in a distributed environment.