Overview
Syllabus
Intro
Metrics deep-dive: Outline 1. Goals & timeline 2. Data & semantic models 3. Export features & costs
Goals: Who is this for? Platform engineer: install SDKs and collectors; configure resources (e.g., cloud.region), metrics receivers, and export pipelines. Software engineer: use metrics APIs, write instrumentation packages. End user: observe and monitor!
OpenTelemetry mandates a strong separation of the API, the SDK, and exporters. Decoupling these avoids vendor "lock-in".
Goals: OpenCensus requirements are met. SDK has programmable processing interfaces: • configurable aggregation • high performance
Measurements are sum totals: "cumulative" reporting, symbol: Σ
Resources: attributes describing the process or entity producing metric and trace data, an OpenTelemetry-wide concept; distinct from per-timeseries attributes (a.k.a. "labels", "tags").
Data model: Support both Δ and Σ. Temporality can be Cumulative or Delta. Instrument Temporality: describes whether a metric instrument input at the API is a change or a total. Aggregation Temporality: describes whether output sums and counts are reset on collection.
Data model: Why temporality? Instrument temporality? This choice makes the application stateless with respect to the SDK. Counter instruments capture transactional changes (Δ), while SumObserver instruments capture current totals (Σ). Aggregation temporality? Offers a trade between memory and reliability costs when configuring an export pipeline.
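The relationship between the two temporalities can be sketched in a few lines of Python. This is only an illustration under simple assumptions (one timeseries, no resets, no missed intervals); the function names `deltas_to_cumulative` and `cumulative_to_deltas` are hypothetical and not part of any OpenTelemetry API.

```python
from itertools import accumulate


def deltas_to_cumulative(deltas):
    """Running totals (cumulative) from a sequence of per-interval changes (delta)."""
    return list(accumulate(deltas))


def cumulative_to_deltas(totals, start=0):
    """Per-interval changes (delta) from a sequence of running totals (cumulative)."""
    deltas = []
    previous = start
    for total in totals:
        deltas.append(total - previous)
        previous = total
    return deltas


changes = [3, 0, 5, 2]                    # what a Counter reports per interval
totals = deltas_to_cumulative(changes)    # what a cumulative exporter would report
round_trip = cumulative_to_deltas(totals)
```

Note where the state lives: producing totals from changes requires remembering the running sum, while producing changes from totals requires remembering the previous total. That is the memory cost the slide refers to.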
In both Prometheus and Statsd APIs, the Counter instruments capture changes in a sum (Δ). In OpenTelemetry: Counter.Add(): increments are non-negative (monotonic). UpDownCounter.Add(): positive and negative increments.
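A toy illustration of the difference between the two Add() contracts follows. These classes are hypothetical stand-ins written for this sketch, not the OpenTelemetry SDK; in particular, raising an exception on a negative increment is a choice made here for clarity, and a real SDK may handle invalid input differently.

```python
class Counter:
    """Toy monotonic counter: add() accepts only non-negative increments."""

    def __init__(self):
        self.total = 0

    def add(self, increment):
        if increment < 0:
            raise ValueError("Counter increments must be non-negative")
        self.total += increment


class UpDownCounter:
    """Toy non-monotonic counter: add() accepts positive and negative increments."""

    def __init__(self):
        self.total = 0

    def add(self, increment):
        self.total += increment


requests_served = Counter()
requests_served.add(1)
requests_served.add(4)

active_connections = UpDownCounter()
active_connections.add(3)    # three connections opened
active_connections.add(-2)   # two connections closed
```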
Data model: Individual measurements. In both Prometheus and Statsd APIs, the Gauge and Histogram instruments capture individual measurements and have the same semantic type. OpenTelemetry creates a new distinction: ValueRecorder.Record(): application calls the SDK. ValueObserver.Observe(): SDK calls the application.
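The direction of the call is the whole distinction, which a small sketch can make concrete. The classes below are hypothetical toys, not the OpenTelemetry API: `record()` is invoked by application code, while `collect()` stands in for the SDK invoking a registered callback at collection time. The `sensor` dictionary is an invented data source.

```python
class ValueRecorder:
    """Synchronous: the application calls record() when a measurement occurs."""

    def __init__(self):
        self.values = []

    def record(self, value):
        self.values.append(value)


class ValueObserver:
    """Asynchronous: the SDK side calls the application's callback on collection."""

    def __init__(self, callback):
        self._callback = callback

    def collect(self):
        # In this sketch, collect() plays the role of the SDK's collection pass.
        return self._callback()


# Application calls the SDK: one call per request handled.
request_latency_ms = ValueRecorder()
request_latency_ms.record(12.5)
request_latency_ms.record(7.0)

# SDK calls the application: the value is read only when collection happens.
sensor = {"celsius": 41.0}
temperature = ValueObserver(lambda: sensor["celsius"])
first_reading = temperature.collect()
sensor["celsius"] = 43.5
second_reading = temperature.collect()
```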
Data model: Why restrict Gauge? Prometheus and Statsd Gauges are sometimes used to capture cumulative sums, sometimes individual values, i.e., different semantic kinds of data. In OpenTelemetry: SumObserver.Observe(): observe a monotonic sum. UpDownSumObserver.Observe(): observe a non-monotonic sum.
Export: Variable-boundary histograms. The histogram data point type is a work in progress. Choose one: DDSketch, Circllhist, HDR Histogram, ...; just use consistent parameters. with sub-linear bucket
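To show why consistent parameters matter, here is a minimal sketch of a logarithmic bucket mapping in the style of DDSketch. It is an assumption-laden illustration, not the OpenTelemetry histogram data point: the function names and the default `relative_accuracy` of 0.01 are chosen for this example, and only positive values are handled.

```python
import math


def bucket_index(value, relative_accuracy=0.01):
    """Map a positive value to a logarithmically sized bucket index."""
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
    return math.ceil(math.log(value) / math.log(gamma))


def bucket_upper_bound(index, relative_accuracy=0.01):
    """Upper boundary of the bucket with the given index."""
    gamma = (1 + relative_accuracy) / (1 - relative_accuracy)
    return gamma ** index


# Two processes using the same parameter agree on bucket boundaries,
# so their histograms can be merged by adding counts per index.
index_a = bucket_index(250.0)
index_b = bucket_index(250.0)

# A different parameter yields different boundaries and therefore different indexes.
index_coarse = bucket_index(250.0, relative_accuracy=0.05)
```

Because bucket widths grow geometrically, the number of buckets needed grows with the logarithm of the value range rather than with the range itself.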
Export: Configurable cardinality control. Metric attributes (a.k.a. labels, tags) can be both expensive and valuable; what controls do we have? • Erase high-cardinality attributes in the export pipeline (in the Collector, in the SDK) through aggregation. • A stateless export pipeline sets the aggregation temporality equal to the instrument temporality, shifting memory requirements to an agent or the downstream service.
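Erasing an attribute through aggregation amounts to dropping the key and summing the points that become indistinguishable. The following is a minimal sketch for sum-valued points only; `drop_attribute` and the sample attribute names (`route`, `user_id`) are hypothetical, not Collector or SDK configuration.

```python
from collections import defaultdict


def drop_attribute(points, attribute):
    """Remove one attribute key and sum the points that now share an attribute set."""
    merged = defaultdict(int)
    for attributes, value in points:
        reduced = tuple(sorted((k, v) for k, v in attributes.items() if k != attribute))
        merged[reduced] += value
    return dict(merged)


points = [
    ({"route": "/cart", "user_id": "u1"}, 2),
    ({"route": "/cart", "user_id": "u2"}, 3),
    ({"route": "/home", "user_id": "u1"}, 1),
]

# Three timeseries collapse to two once the high-cardinality key is erased.
reduced = drop_attribute(points, "user_id")
```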
Export: In-process stateless cumulative exporter. An on-host collector agent gathers host metrics itself. The in-process export pipeline is stateless; the collector must keep long-term state.
Export: In-process stateless Prometheus exporter. A Prometheus Remote Write exporter is configured. The in-process export pipeline is stateless; the collector must keep long-term state.
Export: Push/Pull, Stateless/Cumulative compared
Pull/Cumulative (default): • Export to Prometheus • Easier reliability (average dropped data) • High-cardinality costs long-term memory
Push/Stateless (optional): • Similar to Trace export • Idempotency tokens • High-cardinality costs
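The mention of idempotency tokens can be illustrated with a toy receiver. With pushed deltas, a retried batch would be double-counted unless the receiver can recognize it; with cumulative reporting, a lost report is simply covered by the next total. The class below is a hypothetical sketch, not the OTLP protocol, and the token strings are invented.

```python
class DeltaReceiver:
    """Toy receiver that accumulates pushed deltas and ignores repeated batches."""

    def __init__(self):
        self.total = 0
        self._seen_tokens = set()

    def push(self, token, delta):
        """Apply a delta once per token; return False if the batch was a duplicate."""
        if token in self._seen_tokens:
            return False
        self._seen_tokens.add(token)
        self.total += delta
        return True


receiver = DeltaReceiver()
first = receiver.push("batch-1", 5)
second = receiver.push("batch-2", 3)
retry = receiver.push("batch-2", 3)   # sender retried after a lost acknowledgement
```

The set of seen tokens is itself state that the receiver must keep, which is one reason the stateless in-process pipeline shifts cost downstream rather than eliminating it.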
Export: Collector export pipeline on Kubernetes. OpenTelemetry Collector with Host and Kubernetes metrics receivers. Configure with OpenMetrics, Statsd, and OTLP receivers.
Taught by
CNCF [Cloud Native Computing Foundation]