Profiling for Chronos

Giuliano Mega 95da0353d2 Merge pull request #1 from codex-storage/feat/new-children-semantics Refine children semantics and cover with test cases		2024-03-07 14:39:19 -03:00

chroprof - Profiling for Chronos

This repo contains a usable profiler for Chronos. For the time being, it requires a modified version of Chronos V4 which has profiling hooks enabled. Before attempting to use this, make sure you understand the limitations. Some of the rationale for the design and implementation of the profiler can be found here.

Enabling profiling
Looking at metrics
Enabling profiling with Prometheus metrics
Limitations

Enabling Profiling

Compile-time flag. Profiling requires the -d:chronosProfiling compile-time flag. If you do not pass it, importing chroprof will fail.
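
If you would rather not pass the flag on every build, one option is to set it in the project's NimScript configuration. The fragment below is a minimal sketch that assumes your project has its own config.nims; only the define name comes from this README.

```nim
# config.nims (hypothetical project-level configuration file)
# Equivalent to passing -d:chronosProfiling on the command line.
switch("define", "chronosProfiling")
```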

Enabling the profiler. The profiler must be enabled per event loop thread. To enable it, call enableProfiling() from the thread that will run your event loop:

import chroprof

enableProfiling()

Looking at Metrics

At any time during execution, you can get a snapshot of the profiler metrics by calling getMetrics(). This returns a MetricsTotals object, which is a table mapping FutureTypes to AggregateMetrics. You may then print, log, or do whatever you like with those, including exporting them to Prometheus.

getMetrics() will return the metrics for the event loop that is running (or ran) on the calling thread.
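
Putting the two steps together, a minimal sketch might look like the following. It assumes the program is built with -d:chronosProfiling against the modified Chronos, that MetricsTotals supports len from std/tables (the README only says it is a table), and that `work` is a hypothetical async proc standing in for your own code.

```nim
import std/tables
import chronos
import chroprof

# Hypothetical async proc; any Chronos async proc would do.
proc work() {.async.} =
  await sleepAsync(10.milliseconds)

# Enable the profiler on the thread that runs the event loop.
enableProfiling()

# A single, non-nested waitFor drives the loop to completion.
waitFor work()

# Snapshot of the metrics for the loop that ran on this thread.
let totals = getMetrics()

# Assumption: MetricsTotals behaves like a std/tables Table here.
echo "profiled future types: ", totals.len
```

Because getMetrics() reports on the calling thread's loop, the snapshot is taken from the same thread that called enableProfiling().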

Enabling profiling with Prometheus metrics

You can export metrics on the top-k async procs that are occupying the event loop thread the most by enabling the profiler's nim-metrics collector:

import chroprof/collector

# Exports metrics for the 50 heaviest procs
enableProfilerMetrics(50)

With the help of Grafana, one can visualize and readily identify bottlenecks:

The cumulative chart on the left shows that two procs (with the bottom one turning out to be a child of the top one) were dominating execution time at a certain point, whereas the one on the right shows a number of peaks and anomalies which, in the context of a bug, may help identify the cause.
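
To make the "top-k" idea concrete, the sketch below ranks procs by total execution time and keeps the k heaviest. It is not the collector's implementation: the `ProcTotal` type, the `topK` proc, and the plain table of proc names to nanoseconds are hypothetical stand-ins for the profiler's own types.

```nim
import std/[algorithm, tables]

type ProcTotal = tuple[name: string, execNanos: int64]

proc topK(totals: Table[string, int64], k: int): seq[ProcTotal] =
  ## Returns up to `k` entries, heaviest execution time first.
  var entries: seq[ProcTotal]
  for name, nanos in totals:
    entries.add((name: name, execNanos: nanos))
  # Sort in descending order of execution time.
  entries.sort(proc(a, b: ProcTotal): int = cmp(b.execNanos, a.execNanos))
  # Clamp k so that negative or oversized values are handled safely.
  let n = max(0, min(k, entries.len))
  result = entries[0 ..< n]

when isMainModule:
  # Made-up numbers, for illustration only.
  let demo = {"fetchBlock": 900'i64, "storeBlock": 300'i64, "ping": 50'i64}.toTable
  for entry in topK(demo, 2):
    echo entry.name, " ", entry.execNanos
```

Exporting only the k heaviest procs keeps the number of published series bounded regardless of how many async procs the application defines.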

Limitations

Nested waitFor calls are not supported (as in, they will crash your app).
Prometheus metrics only work with refc because nim-metrics only works with refc.
The Prometheus metrics collector can only be enabled for one event loop; i.e., you cannot have multiple loops in different threads publishing metrics to Prometheus.