Would it be possible to generate a log or statistics of searches on
Wikipedia using the Go button that did not immediately reach an article?
Properly anonymized of course. I think it would be useful for finding
missing articles and redirects to create. There would be a lot of crap of
course, but
On Thu, Jan 14, 2010 at 9:37 AM, Apoc 2400 apoc2...@gmail.com wrote:
Would it be possible to generate a log or statistics of searches on
Wikipedia using the Go button that did not immediately reach an article?
Properly anonymized of course. I think it would be useful for finding
missing
Magnus Manske wrote:
On Thu, Jan 14, 2010 at 9:37 AM, Apoc 2400 apoc2...@gmail.com wrote:
Would it be possible to generate a log or statistics of searches on
Wikipedia using the Go button that did not immediately reach an article?
Properly anonymized of course. I think it would be useful
Robert Stojnic wrote:
Magnus Manske wrote:
On Thu, Jan 14, 2010 at 9:37 AM, Apoc 2400 apoc2...@gmail.com wrote:
Would it be possible to generate a log or statistics of searches on
Wikipedia using the Go button that did not immediately reach an article?
Also, searches made using either button
On Thu, Jan 14, 2010 at 3:27 PM, Nikola Smolenski smole...@eunet.rs wrote:
Robert Stojnic wrote:
Magnus Manske wrote:
On Thu, Jan 14, 2010 at 9:37 AM, Apoc 2400 apoc2...@gmail.com wrote:
Would it be possible to generate a log or statistics of searches on
Wikipedia using the Go button that did
This sounds like a good idea, although we could probably argue about
cut-offs. However, since this needs to be done in-house (and not on the
toolserver etc., because I imagine we cannot distribute raw logs), I imagine
it is going to go very slowly, as there is no one working on it or planning
to work on
On Thu, Jan 14, 2010 at 10:47 AM, Magnus Manske
magnusman...@googlemail.com wrote:
Suggestion:
* log search and SHA1 IP hash (anonymous!)
*Any* mapping of the IP is not anonymous. Please see the AOL search
results where unique IDs were connected between searches to disclose
information.
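To illustrate the point about hashed IPs, here is a minimal Python sketch. It assumes the log stores an unsalted SHA-1 of the dotted-quad address string, which is only a guess at what the suggestion meant. The helper names `sha1_ip` and `recover_ip` and the documentation-range addresses are hypothetical. Because the input space is small, a logged hash can be reversed by enumerating candidates, and the same address always yields the same hash, so searches remain linkable.

```python
import hashlib
import ipaddress


def sha1_ip(ip: str) -> str:
    """Unsalted SHA-1 of the address string (assumed logging scheme)."""
    return hashlib.sha1(ip.encode("ascii")).hexdigest()


def recover_ip(target_hash: str, network: str):
    """Enumerate every host in `network` and return the one whose hash matches."""
    for addr in ipaddress.ip_network(network).hosts():
        if sha1_ip(str(addr)) == target_hash:
            return str(addr)
    return None


# A hash as it might appear in a log (address taken from a documentation range).
logged = sha1_ip("192.0.2.57")

# Searching a /24 takes a few hundred hashes; the whole IPv4 space is larger
# but still finite and enumerable.
print(recover_ip(logged, "192.0.2.0/24"))
```

The sketch only searches one small network to stay fast; the same loop over a larger range differs only in running time.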
2010/1/14 Bryan Tong Minh bryan.tongm...@gmail.com:
On Thu, Jan 14, 2010 at 4:47 PM, Magnus Manske
magnusman...@googlemail.com wrote:
* log search and SHA1 IP hash (anonymous!)
There are only 2 billion unique addresses and they can all be found in
half an hour probably.
A count of search
On Thu, Jan 14, 2010 at 11:01 AM, David Gerard dger...@gmail.com wrote:
2010/1/14 Bryan Tong Minh bryan.tongm...@gmail.com:
On Thu, Jan 14, 2010 at 4:47 PM, Magnus Manske
magnusman...@googlemail.com wrote:
* log search and SHA1 IP hash (anonymous!)
There are only 2 billion unique addresses
On Thu, Jan 14, 2010 at 11:15 AM, Gregory Maxwell gmaxw...@gmail.com wrote:
Here is what I would suggest disclosing:
#start_datetime end_datetime hits search_string
2010-01-01-0:0:4 2010-01-13-23-59-50 39284 naked people
2010-01-01-0:0:4 2010-01-13-23-59-50 23950 hot grits
...
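A report in roughly this shape could be produced as in the following Python sketch. This is an illustration under assumptions, not an existing tool: the function name `aggregate_searches`, the `min_hits` cut-off, and the ISO-style timestamps are all hypothetical choices. Only per-query counts over the whole period are emitted, and queries seen fewer than `min_hits` times are dropped.

```python
from collections import Counter

HEADER = "#start_datetime end_datetime hits search_string"


def aggregate_searches(records, min_hits=2):
    """Aggregate (timestamp, search_string) pairs into report lines.

    Timestamps are assumed to be ISO-formatted strings, so that plain
    string comparison orders them chronologically.
    """
    records = list(records)
    if not records:
        return [HEADER]
    start = min(ts for ts, _ in records)
    end = max(ts for ts, _ in records)
    counts = Counter(query for _, query in records)
    lines = [HEADER]
    # most_common() yields queries from most to least frequent.
    for query, hits in counts.most_common():
        if hits >= min_hits:  # cut-off: rare queries are not disclosed
            lines.append(f"{start} {end} {hits} {query}")
    return lines


sample = [
    ("2010-01-01T00:00:04", "hot grits"),
    ("2010-01-02T10:00:00", "hot grits"),
    ("2010-01-13T23:59:50", "rare personal query"),
]
for line in aggregate_searches(sample):
    print(line)
```

Where to set the cut-off is a policy question; the value 2 here is only a placeholder.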
2010/1/14 Gregory Maxwell gmaxw...@gmail.com:
On Thu, Jan 14, 2010 at 11:15 AM, Gregory Maxwell gmaxw...@gmail.com wrote:
Here is what I would suggest disclosing:
#start_datetime end_datetime hits search_string
2010-01-01-0:0:4 2010-01-13-23-59-50 39284 naked people
2010-01-01-0:0:4
* search queries are logged in a standardized fashion (for grouping),
e.g. lowercase, single spaces, no leading/trailing spaces, special
chars converted to spaces, etc.
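The normalization described in this bullet might look like the Python sketch below. The function name `normalize_query` and the `fold_case` switch are assumptions added for illustration (the switch allows case to be preserved on case-sensitive wikis), and "special chars" is interpreted here as anything that is not a letter, digit, or whitespace.

```python
import re


def normalize_query(query: str, fold_case: bool = True) -> str:
    """Normalize a search string so that equivalent queries group together."""
    text = query.lower() if fold_case else query
    # Convert special characters (including underscores) to spaces.
    text = re.sub(r"[^\w\s]|_", " ", text)
    # Collapse whitespace runs to single spaces and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(normalize_query("  Hot   Grits!! "))
print(normalize_query("Hot_Grits", fold_case=False))
```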
Wiktionary is case-sensitive and so case-folding there may not be
appropriate; I personally would be interested in seeing
On 01/14/2010 05:51 PM, Aryeh Gregor wrote:
On Thu, Jan 14, 2010 at 12:22 PM, Conrad Irwin
conrad.ir...@googlemail.com wrote:
Wiktionary is case-sensitive and so case-folding there may not be
appropriate; I personally would be interested in seeing these logs
before even the NFC normalizers
On Thu, Jan 14, 2010 at 12:22 PM, Conrad Irwin
conrad.ir...@googlemail.com wrote:
Wiktionary is case-sensitive and so case-folding there may not be
appropriate; I personally would be interested in seeing these logs
before even the NFC normalizers get to them (given a lack of any other
source
Such people would be able to deny searching for such terms; I don't see
this as posing any more problems than the history dumps. Thinking
further though, it would be possible to tie a search to an IP address or
User when a page is created with the search term (as it is highly likely
if there
Aryeh Gregor wrote:
The logs are taken from the Squids, long before MediaWiki touches
them, so they shouldn't be normalized at all.
Search isn't cached, so it may be easier to just log it at the backend.
I expect many people use queries like "please tell me how many people
live in China", as
Hi everyone,
As y'all may know already, I've been working for the foundation for
almost a month now. I thought FOSDEM would be a great opportunity for me
to meet part of the European community. I am also really interested in
learning more about the adoption of free and open source software in
On Thu, Jan 14, 2010 at 6:32 PM, Platonides platoni...@gmail.com wrote:
Sampled search logs are unlikely to reveal them though, since what they
are repeating are the non-keywords, not the full query.
Sampling is fine, but aggregated logs aren't likely to… that's the
primary reason for reporting