#25100: Make CollecTor's webstats module use less RAM and CPU time
-------------------------------+--------------------------------
 Reporter:  karsten            |          Owner:  iwakeh
     Type:  enhancement        |         Status:  needs_revision
 Priority:  High               |      Milestone:
Component:  Metrics/CollecTor  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+--------------------------------

Comment (by iwakeh):

 Replying to [comment:8 karsten]:
 > Replying to [comment:7 iwakeh]:
 > > True, so far we didn't trade memory for time, but got some improvements that could be picked easily even winning some time here.
 > > Keeping counts of different sanitized lines in memory could also help and might be only a small change; I'm looking into this next.
 >
 > Aha! That sounds very promising, too. Maybe even leave out the date part from sanitized lines and keep a bag of dates containing sanitized lines. Something like `Map<String, Bag<LocalDate>>` (yes, I know that there's no `Bag` type in Java; time to add Apache Commons Collections?). And later when we write sanitized logs, we simply put in the date.

 Depending on the target scenarios, it might also be very fruitful, and a reusable approach for other CollecTor modules, not to implement 'compression' (which the above is) by hand, but rather to use some in-memory database that compresses the highly redundant data at hand.

 Reasoning: the above-mentioned 8867 logs from weschniakowsky and meronense combined are just 60M when xz-compressed and roughly 20G (plus/minus x) uncompressed. If the in-memory db achieves a compression about ten times less efficient than xz, we would still need only about 600M. Plus we'd get some SQL(-like) query support in addition.

 If it works, we'd have a useful approach to recycle widely in metrics' code base.

 Thoughts?

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/25100#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
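
For illustration only, the quoted `Map<String, Bag<LocalDate>>` idea could be sketched with plain JDK collections, using a nested `Map<LocalDate, Integer>` in place of a `Bag`. Everything here is an assumption and not CollecTor code: the class name `SanitizedLineCounts`, the `{date}` placeholder that stands in for the removed date part, and the ISO date format used when putting the date back in.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: counts date-less sanitized lines per date. */
public class SanitizedLineCounts {

  /** Assumed marker for the date part that was left out of a line. */
  static final String DATE_PLACEHOLDER = "{date}";

  /** Date-less line -> (date -> occurrences); the inner map acts as a bag. */
  private final Map<String, Map<LocalDate, Integer>> counts = new TreeMap<>();

  /** Records one occurrence of a date-less line on the given date. */
  public void add(String lineWithoutDate, LocalDate date) {
    counts.computeIfAbsent(lineWithoutDate, k -> new TreeMap<>())
        .merge(date, 1, Integer::sum);
  }

  /** Returns the number of distinct date-less lines kept in memory. */
  public int distinctLines() {
    return counts.size();
  }

  /** Rebuilds the lines for one date: puts the date back in and
   *  repeats each line as often as it was counted for that date. */
  public List<String> linesFor(LocalDate date) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<String, Map<LocalDate, Integer>> e : counts.entrySet()) {
      int n = e.getValue().getOrDefault(date, 0);
      String line = e.getKey().replace(DATE_PLACEHOLDER, date.toString());
      for (int i = 0; i < n; i++) {
        result.add(line);
      }
    }
    return result;
  }
}
```

With this layout, a line that repeats many times across many dates is stored once as a string plus one small counter per date, which is the hand-written 'compression' the comment refers to.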
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs