Branch: refs/heads/main
Home: https://github.com/WebKit/WebKit
Commit: 1e34af6e2696cfee789e5dfc6322e10a30718b53
https://github.com/WebKit/WebKit/commit/1e34af6e2696cfee789e5dfc6322e10a30718b53
Author: Marcus Plutowski <[email protected]>
Date: 2025-12-15 (Mon, 15 Dec 2025)
Changed paths:
M Source/bmalloc/CMakeLists.txt
M Source/bmalloc/bmalloc.xcodeproj/project.pbxproj
M Source/bmalloc/libpas/libpas.xcodeproj/project.pbxproj
M Source/bmalloc/libpas/src/libpas/pas_allocation_mode.h
M Source/bmalloc/libpas/src/libpas/pas_bitfit_page_inlines.h
M Source/bmalloc/libpas/src/libpas/pas_config.h
M Source/bmalloc/libpas/src/libpas/pas_local_allocator_inlines.h
A Source/bmalloc/libpas/src/libpas/pas_stats.c
A Source/bmalloc/libpas/src/libpas/pas_stats.h
M Source/bmalloc/libpas/src/libpas/pas_try_allocate_common.h
M Source/bmalloc/libpas/src/libpas/pas_utils.h
Log Message:
-----------
[libpas] Introduce stat-counter system for libpas
https://bugs.webkit.org/show_bug.cgi?id=299311
rdar://160953463
Reviewed by Dan Hecht.
This patch adds a new stat-counter system to libpas, with the intention
of tracking allocator-internal statistics and performance data. A
motivating example is measuring how many allocations are made per
size-class, for the purpose of better tuning libpas' configuration
parameters. We've done this in the past, but always ad hoc, which is
inefficient and less trustworthy than a built-in and maintained system
would be.
By default, these are compiled out of the build, and even when built in
they include a runtime configuration option (PAS_STATS_ENABLE) which can
be set to turn some/all stat-counters on,
e.g. `PAS_STATS_ENABLE=counter_a,counter_b`, or `PAS_STATS_ENABLE=1` to
turn on all available stat counters.
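As a rough illustration of the selection rule described above, a per-counter
check could look like the sketch below. `example_stat_enabled` is a
hypothetical helper, not the function libpas uses. Exact-token matching with
no whitespace trimming, and treating an unset variable as "all off", are
assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not the libpas implementation): returns true when
   `spec`, the value of PAS_STATS_ENABLE, turns on the counter `name`.
   A NULL spec (variable unset) is assumed to mean "nothing enabled". */
bool example_stat_enabled(const char* spec, const char* name)
{
    size_t name_length;
    const char* cursor;

    assert(name && *name);
    if (!spec)
        return false;
    if (!strcmp(spec, "1"))
        return true; /* "1" enables every available counter. */

    name_length = strlen(name);
    cursor = spec;
    while (*cursor) {
        /* Length of the current comma-separated token. */
        size_t token_length = strcspn(cursor, ",");
        if (token_length == name_length && !strncmp(cursor, name, name_length))
            return true;
        cursor += token_length;
        if (*cursor == ',')
            cursor++; /* Step over the separator. */
    }
    return false;
}
```

A caller would pass `getenv("PAS_STATS_ENABLE")` as `spec`, ideally once at
startup so that the hot path only tests a cached flag.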
By default, stats are logged to stdout (as json blobs), but setting
`PAS_STATS_LOG_FILE` will instead log them out to the specified file.
Since not all environments in which libpas runs have a clean 'exit' hook
(e.g. Safari tends to call terminate() in order to ensure it exits
quickly) this system instead logs all counters periodically based on
the total number of stat-count-events which have taken place across all
threads. I tried doing this in the scavenger but that doesn't work very
well since the scavenger runs infrequently enough that we end up
missing a lot of counter events near the end of a test.
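The count-based trigger can be sketched with one shared atomic counter. This
is a minimal illustration, not the code in pas_stats.c: the names and the
interval of 1024 events are assumptions, since the real interval is not
stated here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical logging interval; the real value is not given in this message. */
#define EXAMPLE_EVENTS_PER_LOG 1024u
static_assert(EXAMPLE_EVENTS_PER_LOG > 0, "interval must be nonzero");

/* Total stat events recorded so far, across all threads. */
_Atomic uint64_t example_total_events;

/* Call once per recorded stat event, from any thread. Returns true for
   exactly one caller per EXAMPLE_EVENTS_PER_LOG events; that caller would
   then log all counters. */
bool example_note_event_and_should_log(void)
{
    uint64_t previous = atomic_fetch_add_explicit(
        &example_total_events, 1, memory_order_relaxed);
    return !((previous + 1) % EXAMPLE_EVENTS_PER_LOG);
}
```

Because the fetch-add hands each caller a unique previous value, only one
thread sees each multiple of the interval, so no extra lock is needed to
choose who logs.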
A rough json schema with the current counters:
```
{
"pid": <INT>,
"time_ns": <INT>,
"per_stat_data": {
"malloc_info_allocations": {
"total_count": <INT>,
"count_by_heap_type": [<INT>, <INT>, <INT>],
"count_by_size": [<INT>, <INT>, <INT>, <INT>, ..., <INT>]
},
"malloc_info_bytes": {
<SAME AS FOR malloc_info_allocations>
}
}
}
```
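For illustration, an array field such as `count_by_heap_type` could be
rendered with bounded `snprintf` calls, as sketched below. The function name
and the `-1` error convention are hypothetical; this message does not show
how pas_stats.c actually formats its payloads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical formatter: writes `counts` into `out` as a JSON array such as
   [1, 2, 3]. Returns the number of characters written (excluding the NUL),
   or -1 if `out` is too small. */
int example_format_count_array(
    char* out, size_t out_size, const uint64_t* counts, size_t count)
{
    size_t used = 0;
    size_t index;
    int written;

    assert(out && (counts || !count));

    written = snprintf(out, out_size, "[");
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    used += (size_t)written;

    for (index = 0; index < count; index++) {
        /* Every element after the first is preceded by a separator. */
        written = snprintf(out + used, out_size - used, "%s%llu",
            index ? ", " : "", (unsigned long long)counts[index]);
        if (written < 0 || (size_t)written >= out_size - used)
            return -1;
        used += (size_t)written;
    }

    written = snprintf(out + used, out_size - used, "]");
    if (written < 0 || (size_t)written >= out_size - used)
        return -1;
    used += (size_t)written;
    return (int)used;
}
```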
This is meant to be extensible and flexible enough to handle different
kinds of stat counters; I would have preferred to do this with
C++, and did look into doing so, but the interface layer ends up being
slower and un-ergonomic so I ultimately went with C to better fit in
with the rest of libpas. This means that we use everyone's favorite
macro-for-each (PAS_STATS_FOR_EACH_COUNTER) to define new counters.
I wasn't able to find a clean way to incorporate the definition of the
PAS_RECORD_STAT_<statname> shims, so registering a new counter does
require changes in two places -- but they're close by so I think it's
acceptable.
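The for-each-macro idiom, and why the shims need a second registration, can
be sketched as follows. Everything prefixed `EXAMPLE_` or `example_` is
hypothetical; the actual shape of PAS_STATS_FOR_EACH_COUNTER and of the
PAS_RECORD_STAT_<statname> shims is not shown in this message.

```c
#include <assert.h>
#include <stdint.h>

/* Place 1: the counter list. Each entry is expanded once per use below. */
#define EXAMPLE_FOR_EACH_COUNTER(macro) \
    macro(malloc_info_allocations) \
    macro(malloc_info_bytes)

/* Expansion A: one enum value per counter, plus a trailing count. */
#define EXAMPLE_DECLARE_ENUM(name) example_stat_##name,
enum example_stat {
    EXAMPLE_FOR_EACH_COUNTER(EXAMPLE_DECLARE_ENUM)
    example_stat_count
};
#undef EXAMPLE_DECLARE_ENUM

/* Expansion B: a name table indexed by the enum. */
#define EXAMPLE_DECLARE_NAME(name) #name,
const char* const example_stat_names[] = {
    EXAMPLE_FOR_EACH_COUNTER(EXAMPLE_DECLARE_NAME)
};
#undef EXAMPLE_DECLARE_NAME

static_assert(
    sizeof(example_stat_names) / sizeof(example_stat_names[0]) == example_stat_count,
    "name table and enum must stay in sync");

uint64_t example_stat_totals[example_stat_count];

/* Place 2: per-counter shims. A macro cannot emit another #define, so these
   cannot be generated from the list above and are written by hand. */
#define EXAMPLE_RECORD_STAT_malloc_info_allocations(size) \
    (example_stat_totals[example_stat_malloc_info_allocations] += 1)
#define EXAMPLE_RECORD_STAT_malloc_info_bytes(size) \
    (example_stat_totals[example_stat_malloc_info_bytes] += (size))
```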
Since this framework is meant for use inside of an allocator, it should
have low overhead (both for enabled and disabled counters) and make
minimal use of heap-allocated memory -- however, there is room for
improvement on both counts.
Re.: heap-allocated memory:
on the logging path we currently do rely on heap allocations to make it easier
for people to add new counters, as using a fixed-size static allocation per
counter would mean every counter would need to pre-compute the theoretical
maximum size of its json payload.
Normally the utility heap would be a good fit for this use-case, but to avoid
reentrancy we do not use libpas to allocate this memory -- even by going
through the system heap. Instead, we call `malloc` directly. These buffers are
cached, so the `malloc` calls should be rare, but it would be better to remove
that dependency entirely.
That exception notwithstanding, pas_stats.h doesn't malloc anything anywhere
else, so if we do figure out a way to improve this then we only need to change
`pas_stats_ensure_print_buffer` to match.
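A cached, grow-only buffer of this kind might look like the sketch below. It
is not the body of `pas_stats_ensure_print_buffer`: the struct, the function
name, the 256-byte starting size, and the doubling policy are all assumptions
made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cached buffer; the real pas_stats layout is not shown here. */
struct example_print_buffer {
    char* data;
    size_t capacity;
};

/* Grows the cached buffer to at least `needed` bytes using the system
   allocator directly. Returns false and leaves the buffer intact on failure. */
bool example_ensure_print_buffer(struct example_print_buffer* buffer, size_t needed)
{
    size_t new_capacity;
    char* new_data;

    assert(buffer);
    if (buffer->capacity >= needed)
        return true; /* Common case: reuse the cached allocation. */

    new_capacity = buffer->capacity ? buffer->capacity : 256;
    while (new_capacity < needed) {
        if (new_capacity > (size_t)-1 / 2)
            return false; /* Doubling would overflow. */
        new_capacity *= 2;
    }

    new_data = (char*)realloc(buffer->data, new_capacity);
    if (!new_data)
        return false;
    buffer->data = new_data;
    buffer->capacity = new_capacity;
    return true;
}
```

Since the buffer only grows, the system allocator is hit a logarithmic number
of times relative to the largest payload ever printed.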
Re.: performance:
the current design is not bad but does introduce a lot of atomic traffic and
cross-core contention. Ideally, we would instead have a per-thread 'local stat
counter cache' which we would then periodically accumulate into a global
stat-counter object. Individual stat counters would have to be aware of this
scheme, since each one would need to implement its own accumulate function.
Even better than thread-local would be if we had something like Linux' rseqs,
as we could then store this data per-CPU and avoid any migration whatsoever.
In both cases though, we would risk under-counting statistics unless we
implemented an analog of what pas-TLCs do where they iterate over other
threads' TLCs and collect data out. Doing so generically across all kinds of
stat counters seems like a challenge.
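To make the proposed per-thread cache concrete, here is a minimal sketch for
a single scalar counter. It describes the possible future design, not the
current code; the names and the flush threshold of 64 are assumptions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical number of local events to batch before touching shared state. */
#define EXAMPLE_FLUSH_THRESHOLD 64u
static_assert(EXAMPLE_FLUSH_THRESHOLD > 0, "threshold must be nonzero");

_Atomic uint64_t example_global_count;       /* Shared, contended. */
_Thread_local uint64_t example_local_count;  /* Private to each thread. */

/* Accumulates this thread's pending events into the global counter. */
void example_flush_local(void)
{
    if (!example_local_count)
        return;
    atomic_fetch_add_explicit(
        &example_global_count, example_local_count, memory_order_relaxed);
    example_local_count = 0;
}

/* Hot path: a plain increment, with one atomic per threshold's worth of events. */
void example_record(void)
{
    if (++example_local_count >= EXAMPLE_FLUSH_THRESHOLD)
        example_flush_local();
}

uint64_t example_read_global(void)
{
    return atomic_load_explicit(&example_global_count, memory_order_relaxed);
}
```

After 100 calls to `example_record` on one thread, the global counter reads
64 until that thread flushes again. That gap is the under-counting risk
described above: a thread that exits or stalls before flushing takes its
pending events with it unless something collects them.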
Canonical link: https://commits.webkit.org/304490@main