chroprof - Profiling for Chronos

This repo contains a usable profiler for Chronos. For the time being, it requires a modified version of Chronos V4 that has profiling hooks enabled. Before attempting to use it, make sure you understand the limitations listed at the end of this README. Some of the rationale for the design and implementation of the profiler can be found here.

  1. Enabling profiling
  2. Looking at metrics
  3. Enabling profiling with Prometheus metrics
  4. Limitations

Enabling Profiling

Compile-time flag. Profiling requires the -d:chronosProfiling compile-time flag. If you do not pass it, importing chroprof will fail.

Enabling the profiler. The profiler must be enabled separately for each event loop thread. To enable it, call enableProfiling() from the thread that will run your event loop:

import chroprof

enableProfiling()

Looking at Metrics

At any time during execution, you can get a snapshot of the profiler metrics by calling getMetrics(). This returns a MetricsTotals object, which is a table mapping FutureTypes to AggregateMetrics. You may then print, log, or do whatever you like with those, including exporting them to Prometheus.

getMetrics() will return the metrics for the event loop that is running (or ran) on the calling thread.
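
As a rough illustration of what one might do with such a snapshot, the sketch below ranks the entries of a table by execution time and prints the heaviest ones. It deliberately does not import chroprof: ProcId, ProcMetrics, and the fields execTime and callCount are hypothetical stand-ins chosen for this example, and the proc names in the sample data are made up. Check the real FutureType and AggregateMetrics definitions before adapting it.

```nim
import std/[algorithm, tables, times]

type
  # Hypothetical stand-ins, NOT chroprof's real types. The README only states
  # that MetricsTotals is a table mapping FutureType to AggregateMetrics.
  ProcId = string
  ProcMetrics = object
    execTime: Duration # assumed field: time spent running on the event loop
    callCount: int # assumed field: number of invocations

proc heaviest(
    snapshot: Table[ProcId, ProcMetrics], k: int
): seq[(ProcId, ProcMetrics)] =
  ## Returns up to `k` entries of `snapshot`, ordered by descending execTime.
  for id, m in snapshot.pairs:
    result.add((id, m))
  result.sort(
    proc(a, b: (ProcId, ProcMetrics)): int =
      cmp(b[1].execTime, a[1].execTime)
  )
  if result.len > k:
    result.setLen(k)

# Made-up sample data standing in for a real snapshot.
let snapshot = {
  "fetchBlock": ProcMetrics(execTime: initDuration(milliseconds = 120), callCount: 40),
  "storeBlock": ProcMetrics(execTime: initDuration(milliseconds = 75), callCount: 12),
  "heartbeat": ProcMetrics(execTime: initDuration(milliseconds = 3), callCount: 300),
}.toTable

for (id, m) in heaviest(snapshot, 2):
  echo id, ": ", m.execTime.inMilliseconds, " ms over ", m.callCount, " calls"
```

With the real profiler, the table would come from getMetrics() on the event loop thread instead of from hand-written sample data.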

Enabling profiling with Prometheus metrics

You can export metrics on the top-k async procs that are occupying the event loop thread the most by enabling the profiler's nim-metrics collector:

import chroprof/collector

# Exports metrics for the 50 heaviest procs
enableProfilerMetrics(50)

With the help of Grafana, you can then visualize these metrics and readily identify bottlenecks:

Grafana screenshot

The cumulative chart on the left shows that two procs (the bottom one turning out to be a child of the top one) dominated execution time at a certain point, whereas the chart on the right shows a number of peaks and anomalies which, in the context of a bug, may help identify the cause.

Limitations

  • Nested waitFor calls are not supported (they will crash your app);
  • Prometheus metrics only work with refc, because nim-metrics only works with refc;
  • The Prometheus metrics collector can only be enabled for one event loop; i.e., you cannot have multiple loops in different threads publishing metrics to Prometheus.
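
To make the first limitation concrete, here is a minimal sketch that assumes "nested" means calling waitFor from code the event loop is already running, such as the body of an async proc. It uses only the stock Chronos API; the proc names child, parentNested, and parentAwaited are made up for illustration. The nested variant is defined but never called.

```nim
import chronos

proc child(): Future[int] {.async.} =
  await sleepAsync(10.milliseconds)
  return 42

proc parentNested(): Future[int] {.async.} =
  # Anti-pattern: waitFor re-enters the event loop from code that the loop
  # is already running. Shown for illustration only; never called below.
  return waitFor child()

proc parentAwaited(): Future[int] {.async.} =
  # Preferred: await suspends this proc instead of re-entering the loop.
  return await child()

# The only waitFor sits at the top level, outside any async proc.
let answer = waitFor parentAwaited()
doAssert answer == 42
```

Keeping a single waitFor at the program's entry point, and using await everywhere else, avoids the nested case altogether.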