[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

--- Comment #4 from christ...@quelltextlich.at ---
(In reply to ewulczyn from comment #3)
> Another thing that came up in our research group meeting today is to add the
> browser session cookie.

“browser session cookie” can mean two things:

* The whole HTTP Cookie header:
  Adding the whole HTTP Cookie header would add more bytes to the log lines
  than I'd be comfortable with.
  (Just doing the back-of-the-envelope computation: we're currently at around
  700 bytes per log line that goes through kafka. Adding the HTTP Cookie
  header would add somewhere around 200-500 bytes [1] on top of those 700
  bytes for around 1/3 of requests. So that would be quite a considerable
  increase.)

* Really only the session identifier:
  (So for example on enwiki, only the value of “enwikiSession”. No
  centralnotice_* cookie values, no centralauth_* cookie values.)
  That would be less of a problem in terms of data size. But it needs to get
  extracted on the varnish machines themselves. So the same objection as in
  comment #1 applies.

Regardless of which of the above interpretations you aimed for, the

> But we should not track people without their consent. So getting their
> consent is more important to me.

from comment #1 still stands for me.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
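For reference, the back-of-the-envelope computation in comment #4 can be spelled out as a minimal sketch. It uses only the figures quoted there (about 700 bytes per log line, 200-500 extra bytes, about 1/3 of requests carrying cookies). The constant and function names are made up for illustration, and the result is a rough estimate rather than a measurement: averaged over all requests it comes to roughly 10% to 24% more data.

```python
# Rough estimate only; all numbers are the approximate figures from comment #4.
BASE_BYTES_PER_LINE = 700        # approximate current log line size
COOKIE_BYTES_RANGE = (200, 500)  # approximate size of the HTTP Cookie header
FRACTION_WITH_COOKIE = 1 / 3     # approximate share of requests sending cookies


def average_increase(cookie_bytes: float) -> float:
    """Average extra bytes per log line, spread over all requests."""
    return cookie_bytes * FRACTION_WITH_COOKIE


def relative_increase(cookie_bytes: float) -> float:
    """Average extra bytes as a fraction of the current line size."""
    return average_increase(cookie_bytes) / BASE_BYTES_PER_LINE


if __name__ == "__main__":
    for cookie_bytes in COOKIE_BYTES_RANGE:
        print(
            f"{cookie_bytes} B of cookies: "
            f"+{average_increase(cookie_bytes):.0f} B per line on average, "
            f"{relative_increase(cookie_bytes):.1%} more data"
        )
```

Lines that do carry cookies would individually grow by about 29% to 71% (200/700 to 500/700); the lower averaged figure only applies to the total volume.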
[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

ewulc...@wikimedia.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ewulc...@wikimedia.org

--- Comment #3 from ewulc...@wikimedia.org ---
Another thing that came up in our research group meeting today is to add the
browser session cookie. I added this to the etherpad.
[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

christ...@quelltextlich.at changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #1 from christ...@quelltextlich.at ---
Not sure where to respond, since the discussion covers Trello, Etherpad,
Email, IRC, and now bugzilla. Responding in bugzilla, since this is at least
a public medium that cannot be changed.

New fields / headers

* page_id
  Several people want this. Even I want it :-)
  It would be helpful for so many things, including per-page pageviews.
  It seems the to-be-written XAnalytics extension would be the place to do
  it [1].
  Feasible: Yes.
  Effort: Once the XAnalytics extension is there, ~4 man-days.
          (Only a few hours of coding)

* unique id token
** Is it possible to move the unique app install id (currently appended by
   Wikimedia apps to the URI requested) to a dedicated key=value in
   x-analytics?
  It would be possible to do the rewriting on the varnishes. But we try to
  do as little processing on the varnishes as possible, so I would not want
  to parse out things there.
  We could do it in the ETL step, but ETL is not there yet, and we have some
  tasks to do before we can start implementing it.
  But we should not track people without their consent. So getting their
  consent is more important to me.
  Feasible: No, as the “user consent issue” is too big right now.
  Effort: Once the ETL step is there, ~2 man-days.
          (Only a few hours of coding)

* logged in flag
  Since this information is (currently) sent only as a Cookie (and not as a
  plain HTTP header), it would also need the assistance of, for example, the
  to-be-written XAnalytics extension. See above.
  (We could do the rewriting on varnish, but as we try to do as little as
  possible on the varnishes, this does not sound too thrilling.)
  (Note that this information is not sent to bits or upload, so it would not
  allow tracking media consumption per user.)
  Feasible: Yes.
  Effort: Once the XAnalytics extension is there, ~3 man-days.
          (Only a few hours of coding)

---

(The etherpad also asks about format changes. But since this bug is about
adding fields, I guess format changes are out of scope for this bug.)

[1] https://gerrit.wikimedia.org/r/#/c/157841/
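To make the "unique id token" rewriting from comment #1 more concrete, here is a minimal sketch of what moving a query parameter into x-analytics could look like in a later ETL step. Everything specific in it is an assumption for illustration: the parameter name `appInstallID`, the key `app_install_id`, the function name, and the `key=value;key=value` layout of the x-analytics field. It only shows the mechanics and does not address the consent question raised above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names, chosen for illustration only.
APP_INSTALL_PARAM = "appInstallID"
X_ANALYTICS_KEY = "app_install_id"


def move_param_to_x_analytics(uri: str, x_analytics: str) -> tuple[str, str]:
    """Strip the install id from the URI and append it to x-analytics.

    Assumes x-analytics is a ';'-separated list of key=value pairs.
    Returns the inputs unchanged if the parameter is not present.
    """
    parts = urlsplit(uri)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    values = [value for key, value in pairs if key == APP_INSTALL_PARAM]
    if not values:
        return uri, x_analytics

    # Rebuild the URI without the install id parameter.
    remaining = [(key, value) for key, value in pairs if key != APP_INSTALL_PARAM]
    new_uri = urlunsplit(parts._replace(query=urlencode(remaining)))

    # Append the id as its own key=value field.
    fields = [field for field in x_analytics.split(";") if field]
    fields.append(f"{X_ANALYTICS_KEY}={values[0]}")
    return new_uri, ";".join(fields)


if __name__ == "__main__":
    # Made-up request values.
    print(move_param_to_x_analytics(
        "/w/api.php?action=query&appInstallID=abc123", "foo=bar"))
```

Doing this on already-collected log lines keeps the parsing off the varnishes, which is the reason comment #1 prefers the ETL step.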
[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

--- Comment #2 from christ...@quelltextlich.at ---
The estimations from comment #1 assume that having those fields in HDFS (not
udp2log) is sufficient.