Michael Dürig created OAK-5469:
----------------------------------

             Summary: TarMK: scaling the content
                 Key: OAK-5469
                 URL: https://issues.apache.org/jira/browse/OAK-5469
             Project: Jackrabbit Oak
          Issue Type: Epic
          Components: segment-tar
            Reporter: Michael Dürig
             Fix For: 1.8


Production experience has shown that big repositories are prone to thrashing:

{quote}
Monitoring showed a massive level of major page faults, load averages several 
times the number of cores, system CPU levels well above 50% and extreme levels 
of IO. As more IOPS were provisioned the instance consumed all available IOPS. 
The TechOps team reported many TB of read IO per hour and hardly any write IO.

Investigation revealed that the repository size was just larger than the 
available RAM on the machine. The instance was running in memory-mapped mode 
and the IO was due to major page faults mapping pages of memory in and out. 
This was made worse by the transparent huge page settings, which caused huge 
pages to be mapped proactively on major page faults. Compaction reduced the 
repository size to less than RAM. The TechOps team now monitors the total tar 
file size and doesn't let it exceed the RAM on the machine, scheduling 
compactions to keep within limits. Since the default for TarMK was to run 
memory mapped rather than on heap, the JVM had no visibility into the mayhem 
being caused at OS level.
{quote}

This epic is about improving the scalability of TarMK with respect to the 
amount of content. Below are some initial points to consider; let's create 
issues and link them to this epic as we go.

* What kind of internal / external monitoring do we need to understand and 
ideally predict thrashing? Can we monitor the working set (active pages)? The 
number of segments in the segment cache might be a good starting point (see 
the JMX sketch after this list).
* (How) can we reproduce the thrashing (easily enough)? Can we scale it down 
(i.e. to an instance with less RAM)?
* What is the impact of transparent huge pages (and of switching them off)? 
How much do we suffer from read amplification? What would be the impact of not 
memory mapping but instead increasing the size of the segment buffer 
accordingly? Both approaches aim at having finer grained control over the data 
actually loaded into RAM (see the THP sketch after this list).
* What other OS level tweaks should / can we look at? 
* Can we reduce the working set by keeping it more compact? E.g. by running 
GC/compaction, reducing read amplification (see above), improving 
de-duplication of values, storing values more efficiently (e.g. dates and 
booleans), or compressing buffers (e.g. segments) on the fly (see the 
compression sketch after this list)?
* How do we test with big repositories?
  * What is a big repository? (Potential target: 100 GB segment store - 500M 
nodes, TBC)
  * What to measure (indicators of size): size on disk (after compaction), 
number of JCR nodes, number of node records (reachable vs. waste)
  * How to measure?
    * {{oak-run debug}} (needs improvements for better scalability)
    * one-line tool to provide all the info?
  * How to obtain big repositories (generate or re-use existing)?
  * What to analyze / monitor / debug?
    * Possible limits: number of nodes (relative to RAM) for which thrashing 
starts to occur, max. number of direct children, max. concurrent requests 
during online garbage collection.
    * Platform monitoring: 
      * basic: disk size, IO, CPU, memory
      * Assess the impact of hardware upgrades on performance, e.g. what 
impact does doubling RAM/IO/CPU have on our test scenarios?
      * in depth: page faults, writes / reads per process, working set of 
nodes, commit statistics, incoming requests vs. Oak operations, other hiccups 
(see the page-fault sketch after this list)
      * Tools: [Ganglia|http://ganglia.info/], 
[jHiccup|https://github.com/giltene/jHiccup], 
[AppDynamics|https://www.appdynamics.com/]
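
To make the monitoring point more concrete, here is a minimal sketch that 
samples cache statistics over JMX from within the same JVM. The ObjectName 
pattern and the attribute names are assumptions and have to be checked against 
the MBeans actually registered by the Oak version in use.

{code:java}
import java.lang.management.ManagementFactory;

import javax.management.MBeanServer;
import javax.management.ObjectName;

/**
 * Minimal sketch: sample Oak's cache statistics MBeans to watch the segment
 * cache (element count, hit rate) as a rough proxy for the working set.
 * The ObjectName pattern and attribute names below are assumptions and may
 * need adjusting to the MBeans registered by the Oak version in use.
 */
public class SegmentCacheSampler {

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();

        // Assumed pattern: cache statistics registered under Oak's JMX domain.
        ObjectName pattern = new ObjectName("org.apache.jackrabbit.oak:type=CacheStats,*");

        for (ObjectName name : server.queryNames(pattern, null)) {
            // "ElementCount" and "HitRate" follow CacheStatsMBean naming;
            // verify against the running instance.
            Object elements = server.getAttribute(name, "ElementCount");
            Object hitRate = server.getAttribute(name, "HitRate");
            System.out.println(name + ": elements=" + elements + ", hitRate=" + hitRate);
        }
    }
}
{code}

The same query works against a remote instance via a JMXConnector; together 
with OS level numbers it should tell us whether the segment cache is a usable 
indicator.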
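
For the transparent huge page question, a minimal sketch that logs the THP 
setting so it can be correlated with page fault behaviour. The sysfs path is 
the usual location on Linux; it may be absent on other platforms or older 
kernels.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Minimal sketch: read the transparent huge page (THP) setting from sysfs
 * (Linux only) so test runs can record whether THP was active.
 */
public class ThpCheck {

    public static void main(String[] args) throws IOException {
        Path thp = Paths.get("/sys/kernel/mm/transparent_hugepage/enabled");
        if (Files.isReadable(thp)) {
            // Typical content: "always madvise [never]" -- the bracketed value is the active one.
            System.out.println("THP: " + new String(Files.readAllBytes(thp)).trim());
        } else {
            System.out.println("THP setting not available on this platform");
        }
    }
}
{code}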
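
For the on-the-fly compression idea, a minimal sketch that compresses a 
segment sized buffer (assuming the 256 KB maximum segment size) with 
java.util.zip to get a first feel for achievable ratios and CPU cost. This is 
only an illustration of the idea, not a proposal for how segment-tar should 
implement it.

{code:java}
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/**
 * Minimal sketch: compress and decompress a segment sized buffer to estimate
 * the ratio and CPU cost of compressing buffers on the fly. Real measurements
 * would use actual segments read from the store instead of synthetic data.
 */
public class SegmentCompressionSketch {

    public static void main(String[] args) throws Exception {
        byte[] segment = new byte[256 * 1024];
        // Synthetic, somewhat repetitive content standing in for record data.
        for (int i = 0; i < segment.length; i++) {
            segment[i] = (byte) (i % 64);
        }

        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(segment);
        deflater.finish();
        byte[] compressed = new byte[segment.length];
        int compressedLength = deflater.deflate(compressed);
        deflater.end();

        Inflater inflater = new Inflater();
        inflater.setInput(compressed, 0, compressedLength);
        byte[] restored = new byte[segment.length];
        int restoredLength = inflater.inflate(restored);
        inflater.end();

        System.out.println("original=" + segment.length
                + " compressed=" + compressedLength
                + " restored=" + restoredLength);
    }
}
{code}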
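
For the in-depth platform monitoring, a minimal sketch that samples the major 
page fault counter of a process from /proc (Linux only). Per proc(5) the 12th 
field of /proc/<pid>/stat is majflt; a steadily climbing value while the 
repository is larger than RAM is exactly the thrashing signature quoted above.

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Minimal sketch: periodically report the major page faults of a process by
 * reading /proc/<pid>/stat (Linux only). Pass the pid as the first argument,
 * or run it inside the Oak JVM without arguments to sample "self".
 */
public class MajorFaultSampler {

    public static void main(String[] args) throws IOException, InterruptedException {
        String pid = args.length > 0 ? args[0] : "self";
        long previous = readMajorFaults(pid);
        while (true) {
            Thread.sleep(10_000);
            long current = readMajorFaults(pid);
            System.out.println("major page faults in last 10s: " + (current - previous));
            previous = current;
        }
    }

    private static long readMajorFaults(String pid) throws IOException {
        String stat = new String(Files.readAllBytes(Paths.get("/proc/" + pid + "/stat")));
        // The process name (second field) is parenthesised and may contain
        // spaces, so only split the part after the closing parenthesis.
        String[] fields = stat.substring(stat.lastIndexOf(')') + 2).split(" ");
        // fields[0] is field 3 (state), so field 12 (majflt) is fields[9].
        return Long.parseLong(fields[9]);
    }
}
{code}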