> Diagnostic command for zeroing unused parts of the heap
> 
> I propose to add a new diagnostic command `System.zero_unused_memory` which 
> zeros out all unused parts of the heap. The name of the command is 
> intentionally GC/heap agnostic because in the future it might be extended to 
> also zero unused parts of the Metaspace and/or CodeCache.
> 
> Currently `System.zero_unused_memory` triggers a full GC and afterwards zeros 
> unused parts of the heap. Zeroing can help snapshotting technologies like 
> [CRIU][1] or [Firecracker][2] to shrink the snapshot size of VMs/containers 
> with running JVM processes because pages which only contain zero bytes can be 
> easily removed from the image by making the image *sparse* (e.g. with 
> [`fallocate -p`][3]).
> 
> Note that uncommitting unused heap parts in the JVM doesn't help in the 
> context of virtualization (e.g. KVM/Firecracker): from the host's 
> perspective the uncommitted pages are still dirty and usually contain 
> non-zero data, so they can't easily be removed from the snapshot image. More details 
> can be found in my FOSDEM talk ["Zeroing and the semantic gap between host 
> and guest"][4].
> 
> Furthermore, removing pages which only contain zero bytes (i.e. "empty 
> pages") from a snapshot image not only decreases the image size but also 
> speeds up the restore process because empty pages don't have to be read from 
> the image file. Instead, they are backed by the kernel's zero page until they 
> are first written to. This also decreases the initial memory footprint 
> of a restored process.
> 
> An additional argument for memory zeroing is security. By zeroing unused heap 
> parts, we can make sure that secrets contained in unreferenced Java objects 
> are deleted. This is currently impossible to achieve from Java: even if a 
> Java program zeroes out arrays with sensitive data after use, it can never 
> guarantee that the GC hasn't already moved the corresponding object, leaving 
> an old, unreferenced copy of that data somewhere in the heap.
> 
> A prototype implementation for this proposal for Serial, Parallel, G1 and 
> Shenandoah GC is available in the linked pull request.
> 
> [1]: https://criu.org
> [2]: https://github.com/firecracker-microvm/firecracker/blob/main/docs/snapshotting/snapshot-support.md
> [3]: https://man7.org/linux/man-pages/man1/fallocate.1.html
> [4]: https://fosdem.org/2024/schedule/event/fosdem-2024-3454-zeroing-and-the-semantic-gap-between-host-and-guest/

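As a rough illustration of the zero-page argument in the description above, the following Java sketch counts how many page-sized blocks of a file contain only zero bytes, i.e. how many blocks a sparsifying tool could in principle drop from a snapshot image. The class name `ZeroPageCounter`, its method names, and the fixed 4 KiB page size are assumptions made for this example; none of it is part of the pull request.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the JDK or of the pull request.
final class ZeroPageCounter {
    // Assumed page size; 4 KiB is common on Linux/x86-64 but not universal.
    static final int PAGE_SIZE = 4096;

    /** Returns {totalPages, zeroPages} for the given file. */
    static long[] countPages(Path image) throws IOException {
        long total = 0;
        long zero = 0;
        byte[] page = new byte[PAGE_SIZE];
        try (InputStream in = Files.newInputStream(image)) {
            int n;
            // readNBytes returns 0 only at end of stream, so a short final
            // block is still counted as one (partial) page.
            while ((n = in.readNBytes(page, 0, PAGE_SIZE)) > 0) {
                total++;
                if (isAllZero(page, n)) {
                    zero++;
                }
            }
        }
        return new long[] {total, zero};
    }

    /** True if the first len bytes of buf are all zero. */
    static boolean isAllZero(byte[] buf, int len) {
        for (int i = 0; i < len; i++) {
            if (buf[i] != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        long[] r = countPages(Path.of(args[0]));
        System.out.printf("%d of %d pages contain only zero bytes%n", r[1], r[0]);
    }
}
```
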
Volker Simonis has updated the pull request incrementally with one additional 
commit since the last revision:

  Fix build error on macOS (with clang, if we use 'override' for a virtual
  method we have to use it for all methods to avoid
  '-Winconsistent-missing-override')

-------------

Changes:
  - all: https://git.openjdk.org/jdk/pull/18521/files
  - new: https://git.openjdk.org/jdk/pull/18521/files/b06aa327..4264e53d

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=18521&range=01
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=18521&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/18521.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/18521/head:pull/18521

PR: https://git.openjdk.org/jdk/pull/18521
