[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs

2014-11-04 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

--- Comment #4 from christ...@quelltextlich.at ---
(In reply to ewulczyn from comment #3)
> Another thing that came up in our research group meeting today is to add the
> browser session cookie.

“browser session cookie” can mean two things:

* The whole HTTP Cookie header:

Adding the whole HTTP Cookie header would add more bytes to the log
lines than I'd be comfortable with.

(Just a back-of-the-envelope computation: we're currently at
around 700 bytes per log line that goes through kafka. Adding the HTTP
Cookie header would add somewhere around 200-500 bytes [1] on top of
those 700 bytes for around 1/3 of requests. That would be quite a
considerable increase.)
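The estimate above can be spelled out as a few lines of arithmetic. This is a minimal Python sketch that uses only the figures quoted in the comment (700 bytes per line, 200-500 bytes of Cookie header, about 1/3 of requests carrying one). The constant and function names are made up for illustration.

```python
from typing import Tuple

# Figures taken from the comment above; names are illustrative only.
BASE_LINE_BYTES = 700             # current average log line size through kafka
COOKIE_BYTES_RANGE = (200, 500)   # estimated size of the Cookie header
SHARE_WITH_COOKIE = 1 / 3         # fraction of requests that carry a Cookie header


def average_growth(cookie_bytes: int) -> Tuple[float, float]:
    """Return (extra bytes per average log line, relative increase)."""
    extra = SHARE_WITH_COOKIE * cookie_bytes
    return extra, extra / BASE_LINE_BYTES


for cookie_bytes in COOKIE_BYTES_RANGE:
    extra, ratio = average_growth(cookie_bytes)
    # Prints "+67 B/line ... (+10%)" for 200 B and "+167 B/line ... (+24%)" for 500 B.
    print(f"{cookie_bytes} B cookie -> +{extra:.0f} B/line on average (+{ratio:.0%})")
```

So averaged over all requests, the log volume would grow by roughly 10-24%, while the individual lines that do carry a Cookie header would grow by roughly 29-71%.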

* Really only the session identifier:

(For example, on enwiki, only the value of “enwikiSession”; no
centralnotice_* cookie values and no centralauth_* cookie values.)

That would be less of a concern in terms of data size. But the value
would need to be extracted on the varnish machines themselves, so the
same objection as in comment #1 applies.
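To illustrate what "only the session identifier" means, here is a minimal Python sketch that picks a single named cookie out of a Cookie header and drops all the others. It only shows the logic: in practice this extraction would have to run on the varnish machines (not in Python), which is exactly the objection above. The function name `extract_cookie` is hypothetical, and the parsing assumes the usual `name=value; name=value` Cookie header layout.

```python
from typing import Optional


def extract_cookie(cookie_header: str, name: str) -> Optional[str]:
    """Return the value of the cookie called `name`, or None if it is absent.

    All other cookies (centralnotice_*, centralauth_*, ...) are ignored.
    """
    for pair in cookie_header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if sep and key == name:
            return value
    return None


header = "centralnotice_bucket=1; enwikiSession=abc123; centralauth_User=Example"
print(extract_cookie(header, "enwikiSession"))  # abc123
```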

Regardless of which of the above interpretations you aimed for, my
remark from comment #1 still stands:

> But we should not track people without their consent. So getting their
> consent is more important to me.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs

2014-10-30 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

ewulc...@wikimedia.org changed:

       What|Removed |Added
-----------+--------+----------------------
         CC|        |ewulc...@wikimedia.org

--- Comment #3 from ewulc...@wikimedia.org ---
Another thing that came up in our research group meeting today is to add the
browser session cookie. I added this to the etherpad.


[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs

2014-10-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

christ...@quelltextlich.at changed:

       What|Removed |Added
-----------+--------+----------------------
     Status|NEW     |RESOLVED
 Resolution|---     |FIXED

--- Comment #1 from christ...@quelltextlich.at ---
Not sure where to respond, since this is spread across Trello,
Etherpad, email, IRC, and now bugzilla. I'm responding in bugzilla,
since it is at least a public medium that cannot be changed.

> New fields / headers
> * page_id

Several people want this. Even I want it :-)
It would be helpful for many things, including per-page pageview counts.
The to-be-written XAnalytics extension seems to be the right place to
do it [1].

Feasible: Yes.
Effort: Once the XAnalytics extension is there, ~4 man-days.
(Only a few hours of coding)
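Once the XAnalytics extension emits page_id, the consuming side (for example in HDFS) only needs to read it back out of the X-Analytics header. A minimal Python sketch, assuming the header is a list of `key=value` pairs separated by `;` and that the key is literally `page_id`. Both the key name and the helper names are assumptions for illustration, not the extension's confirmed output.

```python
from typing import Dict, Optional


def parse_x_analytics(header: str) -> Dict[str, str]:
    """Split an X-Analytics value like 'a=1;b=2' into a dict (assumed layout)."""
    fields: Dict[str, str] = {}
    for pair in header.split(";"):
        key, sep, value = pair.strip().partition("=")
        if key and sep:
            fields[key] = value
    return fields


def page_id_of(header: str) -> Optional[int]:
    """Return the (hypothetical) page_id field as an int, or None if missing."""
    value = parse_x_analytics(header).get("page_id")
    return int(value) if value and value.isdigit() else None


print(page_id_of("zero=250-99;page_id=12345"))  # 12345
print(page_id_of("-"))                          # None
```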

> * unique id token
> ** Is it possible to move the unique app install id (currently
> appended by Wikimedia apps to the URI requested) to a dedicated
> key=value in x-analytics?

It would be possible to do the rewriting on the varnishes.
But we try to do as little processing on the varnishes as possible, so
I would not want to parse things out there.
We could do it in the ETL step, but ETL is not there yet, and we have
some tasks to finish before we can start implementing it.

But we should not track people without their consent. So getting their
consent is more important to me.

Feasible: No, as the “user consent issue” is too big right now.
Effort: Once the ETL step is there, ~2 man-days.
(Only a few hours of coding)
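For reference, the ETL-side rewriting discussed above (moving the install id out of the URI and into a dedicated key=value pair in X-Analytics) could look roughly like this. It only illustrates the effort side and does nothing about the consent issue. A minimal Python sketch; the query parameter name `appInstallID`, reusing that name as the X-Analytics key, the `;`-separated X-Analytics layout, and treating `-` as an empty header are all assumptions for illustration.

```python
from typing import List, Optional, Tuple
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical name of the query parameter carrying the app install id.
INSTALL_ID_PARAM = "appInstallID"


def move_install_id(uri: str, x_analytics: str) -> Tuple[str, str]:
    """Strip the install id from the URI query and append it to X-Analytics."""
    parts = urlsplit(uri)
    kept: List[Tuple[str, str]] = []
    install_id: Optional[str] = None
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key == INSTALL_ID_PARAM:
            install_id = value
        else:
            kept.append((key, value))
    if install_id is None:
        # Nothing to move; leave both fields untouched.
        return uri, x_analytics
    new_uri = urlunsplit(parts._replace(query=urlencode(kept)))
    pair = f"{INSTALL_ID_PARAM}={install_id}"
    new_xa = pair if x_analytics in ("", "-") else f"{x_analytics};{pair}"
    return new_uri, new_xa


print(move_install_id("/w/api.php?action=mobileview&page=Foo&appInstallID=abc-123",
                      "zero=250-99"))
# ('/w/api.php?action=mobileview&page=Foo', 'zero=250-99;appInstallID=abc-123')
```

Note that `urlencode` re-encodes the remaining query parameters, so their percent-encoding may differ from the original URI. A real ETL step would have to decide whether that matters.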

> * logged in flag

Since this information is (currently) sent only as a cookie (and not
as a plain HTTP header), it would also need help from, for example,
the to-be-written XAnalytics extension. See above.
(We could do the rewriting on varnish, but as we try to do as little
as possible on the varnishes, this does not sound too thrilling.)

(Note that this information is not sent to bits or upload, so it would
not allow tracking media consumption per user.)

Feasible: Yes.
Effort: Once the XAnalytics extension is there, ~3 man-days.
(Only a few hours of coding)

---

(The etherpad also asks about format changes. But since this bug is
about adding fields, I guess format changes are out of scope for this
bug.)


[1] https://gerrit.wikimedia.org/r/#/c/157841/


[Bug 72651] Spike: Assess feasibility and effort to add fields to webrequest logs

2014-10-29 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=72651

--- Comment #2 from christ...@quelltextlich.at ---
The estimations from comment #1 assume that having those fields in HDFS
(not udp2log) is sufficient.