#22983: add a descriptor interface and implementation for web-logs ---------------------------------+----------------------------------- Reporter: iwakeh | Owner: metrics-team Type: enhancement | Status: needs_revision Priority: Medium | Milestone: metrics-lib 2.1.0 Component: Metrics/metrics-lib | Version: Severity: Normal | Resolution: Keywords: | Actual Points: Parent ID: | Points: Reviewer: | Sponsor: ---------------------------------+----------------------------------- Changes (by karsten):
* status: needs_review => needs_revision Comment: Hmm, no, as of today I'm not convinced that this is a good idea. It might be and I'm not seeing it yet, or how it would work for other sanitized descriptors. But I'd rather merge something simple in the next couple of days and not rush this somewhat major design change. I agree that there's no actual sanitizing code in metrics-lib. But except for this new interface, there's also no other interface in metrics-lib where one can plug in sanitizing code. Avoiding duplicate code is good, but keeping interfaces small and intuitive is good, too. Let's open a ticket for that and do it in a separate step when we have more time to think about it. Here's some more feedback on the `LogDescriptor` interface: - The documentation of `TYPE` says that the name should include a string. But who should make sure that this is the case? The application developer? Can you rephrase that to say what is expected here? - Some of the method descriptions are written in 3rd person ("Returns ..."), some in 2nd person ("Return ..."). I believe that 3rd person is preferred, though we're not doing that consistently in the rest of metrics-lib. But we should at least try to do it consistently in one interface. - "yyymmdd" -> "yyyymmdd" in `getLogDate()` - I'm unclear what to expect as return value from `getLogType()`. What types exist? Do I get a class name, or what? - Maybe rename `getLogMillis()` to `getLogDateMillis()` to indicate that we're just returning the milliseconds of the date at 00:00:00 UTC, not the milliseconds of whatever time of the day the log was produced. - Please avoid abbreviations like "msec" and instead write "milliseconds since the epoch", for consistency. - The JavaDoc for `getRawDescriptorBytes()` should not copy the known compression types, but refer to `getCompressionType()` for the list. Right now, there's already an inconsistency between the list regarding `bz2`. - Are uncompressed logs supported? The documentation for `getRawDescriptorBytes()` doesn't indicate that, nor does `getCompressionType()` say what it would return in that case. - Shouldn't `getRawDescriptorBytes()` return the uncompressed bytes and a separate method `getCompressedBytes()` return the compressed bytes? Thinking of being consistent with other descriptors where `getRawDescriptorBytes()` returns uncompressed bytes, too. Not sure about this one. - "added in future" -> "added in the future" - We briefly discussed above to include the first, say, 100 unrecognized lines in `getUnrecognizedLines()`, but the documentation says it doesn't reply any. Why? Because it's not implemented yet? - As explained above, let's take out the `Sanitizer` subinterface and related methods. I'd like to wait for your response and a revised branch before doing another review of the remaining code. I'm not around most of Saturday but can take a look after that. Or on Monday, if you'd rather enjoy your weekend. :) Thanks! -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/22983#comment:12> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online _______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs