#24218: Implement new metrics-web module for IPv6 relay statistics --------------------------------+-------------------------------- Reporter: karsten | Owner: metrics-team Type: enhancement | Status: needs_revision Priority: Medium | Milestone: Component: Metrics/Statistics | Version: Severity: Normal | Resolution: Keywords: | Actual Points: Parent ID: | Points: Reviewer: iwakeh | Sponsor: --------------------------------+-------------------------------- Changes (by iwakeh):
* status: needs_review => needs_revision Comment: Starting with the SQL as this is most important. I'll add a ticket later (unless I find an existing one) for (finally) writing up sql coding guide lines and adapting all existing scripts. From reading intensely through the older sql scripts (for the tech report) I know that the naming in init-servers-ipv6.sql is consistent with the old naming, but there are parts that are really hard to read and often only understandable when reading the Java code in addition. Here some topics that should go into the guide doc (b/c of the upcoming guide doc, I'm trying to be verbose): * Use names (not numbers) for group-by and order-by, which is fine in this script. * types and function names etc. shouldn't be used as column or field identifiers, for example `count`, `date`, `timestamp`, and `server` (the enum type defined in the script). For the server enum I'd suggest using `server_enum`, so the column (e.g. in table statuses) can be defined as `server server_enum NOT NULL,`, which makes immediately clear, what that column will contain. Below I add comments to the various table definitions. * Names should be self-explanatory as much as reasonably possible. Some names only make sense after reading the table's comment and for others only from reading the java code in addition. For example, `count` in table aggregated first of all shouldn't be used because of the function `count()` and secondly it's meaning can only be derived from the table's comment. Changing `count` to `server_count' would make the meaning obvious. Similarly, `advertised_bandwidth` to `advertised_bandwidth_bytes`. * For multi-line comments C-style `/* ...*/` could and maybe should be used and `-- ...` only for one-liners. (That's of minor importance, though.) * Indentation of code in functions for readability. Detailed comments (suggesting name changes, asking questions): * server_descriptors: {{{ #!sql CREATE TABLE server_descriptors ( digest BYTEA PRIMARY KEY, -- digest of what? Maybe: sha1_desc_digest advertised_bandwidth INTEGER, -- in bytes? Maybe adv_bandwidth_bytes? announced BOOLEAN NOT NULL, -- announced_ipv6 exiting BOOLEAN -- exit_flag or exit_relay (making obvious that this will be null for bridges) ); }}} * server enum: {{{ #!sql CREATE TYPE server_enum AS ENUM ('relay', 'bridge'); }}} * statuses {{{ #!sql CREATE TABLE statuses ( status_id SERIAL PRIMARY KEY, server server NOT NULL, -- rather: server server_enum NOT NULL, timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL, -- valid_after running INTEGER NOT NULL, -- running_count (otherwise I'd assume this to be a boolean) UNIQUE (server, timestamp) ); }}} * status_entries {{{ #!sql CREATE TABLE status_entries ( status_id INTEGER REFERENCES statuses (status_id) NOT NULL, digest BYTEA NOT NULL, -- as above in server_descriptors guard BOOLEAN, -- guard_relay exit BOOLEAN, -- exit_relay confirmed BOOLEAN, -- confirmed_ipv6_relay UNIQUE (status_id, digest) ); }}} * aggregated {{{ #!sql CREATE TABLE aggregated ( -- aggregated_flags_ipv6 status_id INTEGER REFERENCES statuses (status_id) NOT NULL, guard BOOLEAN, -- cf. above exit BOOLEAN, -- cf. above announced BOOLEAN, -- cf. above exiting BOOLEAN, -- cf. above confirmed BOOLEAN, -- cf. above count INTEGER NOT NULL, -- server_count or server_count_sum advertised_bandwidth BIGINT, -- adv_bandwidth_bytes or adv_bandwidth_bytes_sum CONSTRAINT aggregated_unique UNIQUE (status_id, guard, exit, announced, exiting, confirmed) ); }}} * function aggregate: Maybe, call it `aggregate_flags_ipv6`? For the aggregate function I got the following error running the script: {{{ psql --dbname=reviewipv6 -f modules/servers-ipv6/src/main/resources/init- servers-ipv6.sql CREATE TABLE CREATE TYPE CREATE TABLE CREATE TABLE CREATE TABLE psql:modules/servers-ipv6/src/main/resources/init-servers-ipv6.sql:75: ERROR: syntax error at or near "ON" LINE 9: ON CONFLICT ON CONSTRAINT aggregated_unique ^ CREATE VIEW }}} * servers_ipv6: Maybe: `ipv6servers`? And, the `date` column need to be renamed. Some java related questions (from only skimming the code): The package name as mentioned at the top. Why are the classes not declared public? Couldn't the 'ParsedNetwork' of the class names ParsedNetworkStatus and ParsedNetworkDescriptor be ommitted here? General naming: Module name should be same as java package (especially without dashes) and maybe change java package to `ipv6servers`? It seems more readable (to me) than `serversipv6`. I continue with an in-depth Java review when the SQL is settled. The structure looks fine regarding the upcoming re-factoring of metrics- web. -- Ticket URL: <https://trac.torproject.org/projects/tor/ticket/24218#comment:8> Tor Bug Tracker & Wiki <https://trac.torproject.org/> The Tor Project: anonymity online
_______________________________________________ tor-bugs mailing list tor-bugs@lists.torproject.org https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs