Observability
DealDome's Go backend is instrumented with three complementary observability tools: OpenTelemetry for distributed tracing, Sentry for error tracking, and BetterStack for structured logging. Together they give the team full visibility into what the system is doing, what went wrong, and why.
Overview
Running AI agents, Shopify API calls, and database queries across multiple stores means a lot can go wrong — and when it does, you need to find the problem fast. DealDome's observability stack is built on three pillars:
| Pillar | Tool | Purpose |
|---|---|---|
| Tracing | OpenTelemetry | Distributed traces across HTTP handlers, database queries, AI agent calls, and Shopify API requests |
| Error tracking | Sentry | Captures panics and errors with full stack traces and context |
| Logging | BetterStack | Structured JSON log aggregation, search, and alerting |
Each layer covers a different angle. Traces show you the full journey of a request. Sentry tells you when something broke. Logs give you the raw detail to reconstruct exactly what happened.
Tracing with OpenTelemetry
OpenTelemetry (OTel) provides distributed tracing across the entire backend. Every incoming HTTP request, database query, AI agent invocation, and external API call gets its own span, and they are all linked together under a single trace.
What gets traced
- HTTP requests — every Fiber handler is wrapped with OTel middleware that creates a root span with method, path, status code, and duration.
- Database operations — `pgx` queries are traced with the SQL statement, parameters, and execution time.
- AI agent invocations — calls to the Claude API include the agent name, model, token usage, and tool calls as span attributes.
- Shopify API calls — outbound requests to the Shopify Admin API are traced with the endpoint, store domain, and response status.
Traces are exported to an OTel-compatible collector, so you can use any backend that speaks OTLP — Jaeger, Grafana Tempo, Honeycomb, or Axiom all work out of the box.
Tracing setup in Go
```go
import (
	"context"
	"os"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
	"go.opentelemetry.io/otel/sdk/resource"
	sdktrace "go.opentelemetry.io/otel/sdk/trace"
	semconv "go.opentelemetry.io/otel/semconv/v1.21.0"
)

func initTracer(ctx context.Context) (*sdktrace.TracerProvider, error) {
	// Export spans over OTLP/HTTP to the configured collector.
	exporter, err := otlptracehttp.New(ctx,
		otlptracehttp.WithEndpoint(os.Getenv("OTEL_EXPORTER_ENDPOINT")),
	)
	if err != nil {
		return nil, err
	}

	// Batch spans off the hot path and tag every span with the service name.
	tp := sdktrace.NewTracerProvider(
		sdktrace.WithBatcher(exporter),
		sdktrace.WithResource(resource.NewWithAttributes(
			semconv.SchemaURL,
			semconv.ServiceNameKey.String("dealdome-api"),
		)),
	)
	otel.SetTracerProvider(tp)
	// Callers should defer tp.Shutdown(ctx) so buffered spans are flushed on exit.
	return tp, nil
}
```
Tracing adds minimal overhead to each request. The OTel SDK batches spans and exports them asynchronously, so the hot path is only a few microseconds of span creation; in practice the production performance impact is negligible.
Error tracking with Sentry
Sentry captures panics and errors across the backend. When something goes wrong, Sentry collects the full stack trace, request context, and any custom tags — then groups similar errors together so you are not drowning in duplicate alerts.
What Sentry captures
- Panics — a recovery middleware catches any panic in HTTP handlers and reports it to Sentry before returning a 500 response.
- Explicit errors — the codebase calls `sentry.CaptureException(err)` for errors that should not happen but are not fatal (e.g., a failed Shopify API call after retries).
- Context enrichment — each error report includes the user ID, chat ID, agent name, and request path so you can trace it back to the exact interaction.
Sentry also integrates with the team's alerting workflow — critical errors trigger notifications so issues get addressed quickly.
Sentry integration
```go
import (
	"os"

	"github.com/getsentry/sentry-go"
)

func initSentry() error {
	return sentry.Init(sentry.ClientOptions{
		Dsn:         os.Getenv("SENTRY_DSN"),
		Environment: os.Getenv("APP_ENV"),
		// Sample 20% of transactions for performance tracing.
		TracesSampleRate: 0.2,
		EnableTracing:    true,
	})
}
```
Logging with BetterStack
BetterStack handles structured log aggregation and search. The Go backend emits JSON-formatted logs with consistent fields, and BetterStack ingests them in real time for querying, filtering, and alerting.
How logging works
All logs are structured as JSON with standard fields: `level`, `msg`, `timestamp`, and contextual fields like `agent`, `chat_id`, `store`, or `request_id`. This makes it easy to filter logs in BetterStack — for example, finding all errors from the Roos agent in the last hour.
Log levels follow the standard hierarchy:
- debug — verbose output for local development.
- info — normal operations (request handled, task completed).
- warn — something unexpected but recoverable (rate limit hit, retry triggered).
- error — something broke and needs attention.
Structured logging
```go
import (
	"log/slog"
	"os"
)

func initLogger() *slog.Logger {
	// Emit JSON-formatted logs to stdout for BetterStack to ingest.
	return slog.New(slog.NewJSONHandler(os.Stdout, &slog.HandlerOptions{
		Level: slog.LevelInfo,
	}))
}
```
Configuration
All three observability services are configured through environment variables. Here is what you need to set:
| Name | Type | Description |
|---|---|---|
| `OTEL_EXPORTER_ENDPOINT` | string | The OTLP-compatible endpoint for exporting traces (e.g., `otel-collector.internal:4318` or a hosted provider URL). |
| `SENTRY_DSN` | string | The Sentry DSN (Data Source Name) for your project. Found in your Sentry project settings under Client Keys. |
| `BETTERSTACK_TOKEN` | string | The BetterStack source token for log ingestion. Created in the BetterStack dashboard under Sources. |
Environment setup
```shell
OTEL_EXPORTER_ENDPOINT=otel-collector.internal:4318
SENTRY_DSN=https://abc123@o456.ingest.sentry.io/789
BETTERSTACK_TOKEN=bst_live_a1b2c3d4e5f6...
```
All three services initialise at startup. If any of the tokens are missing, the corresponding service is disabled gracefully — the backend still runs, you just lose that observability layer. This makes local development easier since you do not need all three services running to work on the codebase.