How I write Go (HTTP) Services (Part 2)

Source

The source code used throughout this series can be found here: https://github.com/anthoturc/token-service/.

So Far…

In Part 1, I covered how I set up Go web services, including:

  1. HTTP Server Config
  2. Graceful shutdown
  3. Managing Application Configuration

If you haven’t read that article yet, check it out here! This post will cover the hows and whys of telemetry and set up tracing in our HTTP web service.


Telemetry

If you Google “telemetry” you’ll get hits related to “measurement” or “sensors” within a system. For a long time, the term referred to physical instruments attached to a car or plane. In software the idea is largely the same! In fact, you have probably run into prompts asking your permission to send anonymized telemetry/stats. E.g. JetBrains IDEs may ask for this so that the core devs can figure out which parts of the IDE are not as enjoyable as they could be. You can think of it as:

The data used to measure the health and performance of a live system.

Another way to put it: application monitoring! In our case, that means logs, metrics, and traces.

Why Telemetry

For the same reasons that cars, airplanes, and satellites have sensors. It is so important to know what your system is doing at all times! If anything goes wrong with your car, your mechanic will probably use an OBD2 scanner to pull diagnostics from the car’s onboard computer. Similarly, if paying customers are using your web service, then you should monitor the health of the service. If something does go wrong — and it will — you need to know how to prevent it from happening again. It is that simple.

However, telemetry data isn’t a silver bullet. I mean, imagine having to sift through thousands of logs or metrics to try to find the number of faults (think 5xx) that took place in the last 24 hours. If you have a low-traffic service — like one or two requests an hour — then it might not be a big deal to manually inspect all your metrics (I still wouldn’t want to). Even if you use something like Grafana or Prometheus to visualize that data, the data and patterns in and of themselves aren’t all that useful unless you are reviewing them regularly.

Telemetry becomes powerful when you can do things like set up alerting on those metrics/logs. For example, you can have your service emit a Fault metric each time an unexpected error takes place. If enough of those happen in a one-minute period (i.e., a threshold is crossed), you can set an alarm for someone to go and inspect what is going on more closely and take action to get the system back to a stable state. You can also use telemetry to get an idea of the lifecycle of a request in your system. If you have 5-10 microservices talking to each other, it gets hard to trace requests if you don’t instrument your system properly and use tooling to ingest and display that information.
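
To make the fault-metric idea concrete, here is a minimal sketch using the OpenTelemetry metrics API. The faults counter name, the handler, and the doWork stub are all my own invention for illustration, and the sketch assumes a MeterProvider has been registered the same way we register a TracerProvider below (otherwise the calls are harmless no-ops):

package main

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
)

// faultCounter counts unexpected errors; an alarm can fire when the
// per-minute sum crosses a threshold.
var faultCounter, _ = otel.Meter("token-service").Int64Counter("faults")

// doWork is a stand-in for real application logic.
func doWork(ctx context.Context) error { return nil }

func handleRequest(rw http.ResponseWriter, r *http.Request) {
	if err := doWork(r.Context()); err != nil {
		faultCounter.Add(r.Context(), 1) // emit one Fault data point
		http.Error(rw, "internal error", http.StatusInternalServerError)
		return
	}
	rw.WriteHeader(http.StatusOK)
}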

Setup Telemetry

For this guide, I am going to opt for the opentelemetry.io Go SDK.

To keep things simple, I will just focus on creating spans that will be helpful to trace API requests. Let’s set that up:

// telemetry.go
package main

import (
	"context"
	"fmt"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	tracesdk "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.24.0"
	"go.opentelemetry.io/otel/trace"
)

const (
	service = "token-service" // This guide was originally written for building a minimal token service
)

func tracerProvider(envName string) (*tracesdk.TracerProvider, error) {
	// WithInsecure sends spans over plain HTTP; by default the exporter
	// targets localhost:4318, the OTLP/HTTP port.
	exp, err := otlptrace.New(context.Background(), otlptracehttp.NewClient(otlptracehttp.WithInsecure()))
	if err != nil {
		return nil, fmt.Errorf("trace provider: %w", err)
	}

	tp := tracesdk.NewTracerProvider(
		tracesdk.WithBatcher(exp), // Batch the spans for improved performance
		tracesdk.WithResource(resource.NewWithAttributes( // The service is the trace provider's resource
			semconv.SchemaURL,
			semconv.ServiceNameKey.String(service),
			attribute.String("environment", envName),
			attribute.Int64("id", 1),
		)),
	)

	return tp, nil
}

Great! We have a TracerProvider. The TracerProvider is what gives our application the ability to retrieve Tracers. The opentelemetry docs mention that the TracerProvider should be accessed in a central place. main.go is a great place to put things like this.

Back in main.go we need to register this TracerProvider globally.

// main.go
package main

import (
	// -- snip --

	"go.opentelemetry.io/otel"
)

func main() {
	// -- snip --

	tp, err := tracerProvider(envName)
	if err != nil {
		log.Fatalf("failed to initialize trace provider: %v", err)
	}
	otel.SetTracerProvider(tp)

	// -- snip --
}
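
Since WithBatcher buffers spans in memory, it is worth pairing this with the graceful shutdown from Part 1 so buffered spans are flushed on exit. A minimal sketch, assuming the usual context, log, and time imports are in scope (the five-second timeout is an arbitrary value I picked):

// main.go (sketch): place right after otel.SetTracerProvider(tp).
defer func() {
	// Bound the flush with a timeout so shutdown cannot hang forever.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := tp.Shutdown(ctx); err != nil {
		log.Printf("failed to shut down trace provider: %v", err)
	}
}()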

We can now retrieve Tracers from this provider anywhere in the application. Let’s add a small helper for creating spans.

// telemetry.go
package main

// -- snip --

// NewSpan will create a span and context from the global trace provider.
func NewSpan(ctx context.Context, name string) (context.Context, trace.Span) {
	// otel.Tracer returns a Tracer from whatever provider was registered
	// via otel.SetTracerProvider.
	ctx, span := otel.Tracer("").Start(ctx, fmt.Sprintf("%s-%s", service, name))
	return ctx, span
}

Our application logic can now use NewSpan anywhere we need it. Here is a short example:

package main

import (
	"net/http"
)

type TokenService struct{}

// CreateToken will generate a new AuthToken.
func (ts *TokenService) CreateToken(rw http.ResponseWriter, r *http.Request) {
	_, span := NewSpan(r.Context(), "create-token")
	defer span.End()

	// -- snip --
}

In this case, I don’t have a use for the context, but if I were going to make several function calls (e.g., to a DB or another API), then I would pass the context along.
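
If I did need that, it would look something like the sketch below; saveToken is a hypothetical helper I am inventing for illustration (it assumes a context import). Because the returned context carries the parent span, the inner span shows up nested under create-token in the trace:

// Hypothetical sketch: propagating the span context so child spans nest.
func (ts *TokenService) CreateToken(rw http.ResponseWriter, r *http.Request) {
	ctx, span := NewSpan(r.Context(), "create-token")
	defer span.End()

	ts.saveToken(ctx) // pass the derived context down
}

// saveToken is an invented helper; any downstream call would look similar.
func (ts *TokenService) saveToken(ctx context.Context) {
	// This span becomes a child of "create-token" because ctx carries it.
	_, span := NewSpan(ctx, "save-token")
	defer span.End()

	// -- snip --
}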

Example Usage

Now we have these spans that are generated when we make API calls, but we don’t have a way, as humans, to consume them. We can fix that pretty easily using Jaeger’s all-in-one tracing image. In this case, I will open a terminal and run:

docker run --name jaeger --rm -e COLLECTOR_OTLP_ENABLED=true -p 16686:16686 -p 4317:4317 -p 4318:4318 jaegertracing/all-in-one:latest

Note: I am intentionally skipping what Jaeger is, its history, etc. For now, you can think of it as a tool we can use to visualize spans. Uber was kind enough to open-source this tool! For more information, check out the docs: https://www.jaegertracing.io/docs/1.20/.

Now I can run my service and make an API call.

go run .
curl -X POST localhost:8080/api/token

A span will have been generated, and I can use the Jaeger UI (at localhost:16686) to view it!

Two of the span’s tags, as JSON:

{
  "key": "internal.span.format",
  "type": "string",
  "value": "otlp"
}
{
  "key": "environment",
  "type": "string",
  "value": "dev"
}

We get a lot of solid information right out of the box, e.g., how long the API call took and a few tags that provide context about what happened in this request. Sure, there isn’t much now, but we can easily add more information if we need to using attributes.
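
As a quick sketch of that, attributes can be attached to a span from inside the handler. The token.type and token.ttl_seconds keys below are made up for illustration, and attribute is the same go.opentelemetry.io/otel/attribute package imported in telemetry.go:

// Hypothetical sketch: enriching a span with custom attributes.
func (ts *TokenService) CreateToken(rw http.ResponseWriter, r *http.Request) {
	_, span := NewSpan(r.Context(), "create-token")
	defer span.End()

	// Invented keys for illustration; use whatever describes your
	// request. These show up as tags in the Jaeger UI.
	span.SetAttributes(
		attribute.String("token.type", "bearer"),
		attribute.Int64("token.ttl_seconds", 3600),
	)

	// -- snip --
}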

Conclusion

In this post, we covered:

  1. The how and why of telemetry
  2. Configuring OpenTelemetry to generate spans
  3. Using Jaeger to visualize the generated spans

The next few posts in this series will cover how to go further, e.g., Docker-izing the application and deploying it. Stay tuned!