and ended up on the
wiki page for Hillary Clinton.
Full details here:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Why_People_Use_Search_Engines
—Trey
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
___
Wikimedia-search mailing list
That's very cool! Have you stress-tested it at all? Like, what happens if
you search 10 Wikipedias at once? (Because you know I want to search 10
wikis at once.)
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Sep 21, 2015 at 11:15 AM, Erik Bernhardson <
e
o the
onboarding template, and someone added a link to the communication section
of the Discovery process page, so a lot of that is there, but not optimally
organized.
A quick look at the history shows that Mikhail got an older copy of the
onboarding page that didn't have that info. Hmm.
—Tr
n
> to exist uncorrected in the non-enwiki.
>
> Perhaps a better approach to handling non-English queries is user-specified
> alternate languages.
More details:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_Searching
—Trey
On Thu, Sep 10, 2015 at 12:28 PM, Mikhail Popov
wrote:
> The moving average is computed over 17 days. That seems like an arbitrary
> choice, and it largely is, but I played around with that number and 17
> yielded what seemed to me the best results. I tried 7 and such and wasn't
> satisfied with ho
in a recent meeting. Nice, but still rough around the edges.
Very cool all around!
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Thu, Sep 10, 2015 at 11:56 AM, Mikhail Popov
wrote:
>
>
> Yinz are popular now!
>
> Cheers~
>
> --
> *Mikhail Popov*
Smth! It looks great!
Quick question—what's the period of the moving average? Is it a week, or
more? (A week makes sense, but it looks like more, based on how much data
drops out at the beginning.) If it's not a week, can I suggest a week? It
may further smooth out some of the daily bumps.
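For reference, here is a minimal sketch of a trailing moving average in Python. It is not the dashboard's actual code, and the daily counts are made up. A window of w days has no value for the first w - 1 days. That is why a 17-day window drops more data at the start of the chart than a 7-day one. A window equal to the 7-day cycle also flattens a pure day-of-week pattern completely:

```python
from typing import List, Sequence


def trailing_moving_average(values: Sequence[float], window: int) -> List[float]:
    """Mean of each day and the (window - 1) days before it.

    Days without a full window behind them are dropped, so the output is
    window - 1 points shorter than the input.
    """
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily counts with a weekday/weekend cycle (five 10s, two 4s).
daily = [10, 10, 10, 10, 10, 4, 4] * 4  # 28 days

weekly = trailing_moving_average(daily, 7)   # 22 points, every one equal to 58/7
longer = trailing_moving_average(daily, 17)  # 12 points, still wobbles
```

With these toy numbers, the 7-day average is a flat line at 58/7 (about 8.29). The 17-day average still moves up and down, because each window holds two full weeks plus a partial one.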
:TJones_(WMF)/Notes/Language_Detection_Evaluation#ElasticSearch_language_detection_plugin.2C_with_spaces
[2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation#Always_.22English.22_detector
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Sep 7, 2015 at
vising, because it's some
combination of easier, better, faster, and good enough.
I'm open to suggestions. Next week I'll ask Dan & Erik about how much
effort to put into alternatives.
—Trey
On Fri, Sep 4, 2015 at 7:
ffect of a good language detector on zero results
rate (i.e., simulate sending queries to the right place and see how much of
a difference it makes).
Moderately pretty pictures included.
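Here is a rough sketch of that kind of simulation. It assumes a hypothetical has_results(wiki, query) lookup and a detector that returns a language code (or None). The "<lang>wiki" naming just mirrors enwiki/dewiki, and none of these names come from the actual analysis. A zero-results query counts as rescued if re-running it on the detected language's wiki finds something:

```python
from typing import Callable, Optional, Sequence, Tuple

# (query text, wiki it was run on, language guessed by a detector or None)
Query = Tuple[str, str, Optional[str]]
HasResults = Callable[[str, str], bool]  # (wiki, query) -> any hits?


def zero_results_rate(queries: Sequence[Query], has_results: HasResults) -> float:
    """Fraction of queries that got no results on the wiki they were run on."""
    misses = sum(1 for text, wiki, _ in queries if not has_results(wiki, text))
    return misses / len(queries)


def routed_zero_results_rate(queries: Sequence[Query], has_results: HasResults) -> float:
    """Same rate, but a zero-results query counts as a hit if re-running it
    on the wiki for its detected language finds something."""
    misses = 0
    for text, wiki, lang in queries:
        if has_results(wiki, text):
            continue
        target = f"{lang}wiki" if lang else None
        if target is None or target == wiki or not has_results(target, text):
            misses += 1
    return misses / len(queries)


# Toy "indexes": each wiki only knows one word.
index = {"enwiki": {"cat"}, "dewiki": {"katze"}, "frwiki": {"chien"}}


def toy_has_results(wiki: str, text: str) -> bool:
    return text in index.get(wiki, set())


queries = [
    ("cat", "enwiki", "en"),
    ("katze", "enwiki", "de"),
    ("chien", "enwiki", "fr"),
    ("xyzzy", "enwiki", None),  # detector gave up
]

before = zero_results_rate(queries, toy_has_results)
after = routed_zero_results_rate(queries, toy_has_results)
```

In the toy data, routing cuts the zero results rate from 0.75 to 0.25. The detector is only consulted for queries that already failed, so a wrong guess never turns a hit into a miss.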
—Trey
t does take up time, though, and based only on data from the morning of
the deployment, it may not give a representative preview. It's still fun to
peek, though. ;)
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Mon, Aug 31, 2015 at 2:05 PM, Mikhail Popov wrote:
> Hi
terms-query
And, for reference, ES has stop word lists for >30 languages:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html
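Conceptually, a stop filter just drops listed words from the token stream. Below is a minimal Python sketch of the idea. The tiny word list is illustrative and stands in for the much larger per-language lists Elasticsearch provides; this is not how ES implements it:

```python
from typing import FrozenSet, Iterable, List

# A tiny illustrative stop list; the Elasticsearch stop token filter ships
# much larger predefined lists per language (e.g. "_english_").
EXAMPLE_STOP_WORDS: FrozenSet[str] = frozenset({"a", "an", "and", "of", "the", "to"})


def remove_stop_words(tokens: Iterable[str], stop_words: FrozenSet[str]) -> List[str]:
    """Drop tokens found in the stop list, comparing in lowercase."""
    return [t for t in tokens if t.lower() not in stop_words]


tokens = "The History of the Roman Empire".split()
print(remove_stop_words(tokens, EXAMPLE_STOP_WORDS))  # ['History', 'Roman', 'Empire']
```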
Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
On Fri, Aug 28, 2015 at 1:34 AM, David Causse wrote:
> On 27
ort the results into a more
AND-ish order.
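One simple way to get a more AND-ish order out of an OR query is sketched below. It assumes each document is reduced to a set of words and ranks matches by how many distinct query terms they contain. The documents an AND query would have returned therefore come first. This is an illustration only, not CirrusSearch's actual scoring, and the document ids and word sets are hypothetical:

```python
from typing import Dict, List, Set


def and_ish_order(query_terms: List[str], docs: Dict[str, Set[str]]) -> List[str]:
    """Return ids of docs matching any query term (OR semantics), ordered by
    how many distinct query terms each contains. Docs containing every term,
    i.e. what an AND query would return, come first; ties keep input order."""
    terms = set(query_terms)
    overlap = {doc_id: len(terms & words) for doc_id, words in docs.items()}
    hits = [doc_id for doc_id, n in overlap.items() if n > 0]
    return sorted(hits, key=lambda doc_id: -overlap[doc_id])


docs = {
    "d1": {"roman", "coins"},
    "d2": {"roman", "empire", "history"},
    "d3": {"empire", "state", "building"},
    "d4": {"cats"},
}
print(and_ish_order(["roman", "empire", "history"], docs))  # ['d2', 'd1', 'd3']
```

Here d2 (all three terms) ranks first, followed by d1 and d3 (one term each). d4 matches nothing and is dropped.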
—Trey
[1] https://en.wikipedia.org/wiki/Stop_words
[2] https://code.google.com/p/stop-words/
[3] https://web.archive.org/web/*/http://tonyb.sk/_my/ir/stop-words-collection-2014-02-24.zip
[4] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Surv
There were some DOI queries, but none of
the other usual suspects.
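To flag DOI-like queries in a sample like this, a rough heuristic is to look for the "10." prefix, a numeric registrant code, a slash, and a suffix. The pattern below approximates that shape. It is an assumption, not a DOI validator:

```python
import re

# "10." + 4-9 digit registrant code + "/" + non-space suffix.
# Approximate: it will miss some odd DOIs and may match some non-DOIs.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")


def looks_like_doi(query: str) -> bool:
    """True if the query contains something shaped like a DOI."""
    return DOI_PATTERN.search(query) is not None


for q in ["10.1000/182", "doi:10.1038/nphys1170", "10.5 inches", "hillary clinton"]:
    print(q, looks_like_doi(q))
```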
—Trey
___
Wikimedia-search mailing list
Wikimedia-search@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikimedia-search
And that's in line with the previous experiment. If you have a 32% zero
results rate, reducing it by 38% (32% * (1-.38)) gives 19.84%. So, allow a
little rounding error in the "32", "38" and "19", and this is right on the
money.
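The same arithmetic as a quick check. The 38% is a relative reduction, so it is applied as a multiplier rather than subtracted. The absolute drop works out to about 12 percentage points:

```python
def apply_relative_reduction(rate: float, reduction: float) -> float:
    """Cut a rate by a relative fraction: rate * (1 - reduction)."""
    return rate * (1 - reduction)


baseline = 0.32   # zero results rate before the change
reduction = 0.38  # relative improvement, not percentage points
new_rate = apply_relative_reduction(baseline, reduction)
print(f"{new_rate:.2%}")                                             # 19.84%
print(f"{(baseline - new_rate) * 100:.2f} percentage points lower")  # 12.16 percentage points lower
```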
—Trey
P.S.: 2 + 2 = 5, for
ox components and the advantage this gives—again, especially in
comparison to the way they naively adapted the queries to Wikipedia search
terms. A commensurate level of effort put into the wiki searches would give
much, much better results.
Still very interesting food for thought in terms
entury is a different
question, but there are some interesting things to think about here.
So, can anyone get me a copy of the full paper?
Thanks for the pointer, Tilman!
—Trey
> On 25 August 2015 at 10:54, Tilman Bayer wrot
A summary... for those who haven't been able to keep up with the voluminous
emails:
https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Results_Queries
—Trey
rsions"), how long versions
live, how many we support concurrently, how to fall back from unsupported
versions, etc., etc. But Dan seems like he needs more hobbies, so I'm
throwing it out there.
—Trey
On Thu, Jul 30, 2015 at
e variations in declension.
- jawiki has lots of " film" searches.
- ruwiki has a few non-Cyrillic searches
- itwiki has lots of queries that are multi-word phrases with underscores
instead of spaces
- eswiki and frwiki have a fair number of build-up searches and searches in
Arabic, and f
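MediaWiki treats underscores and spaces in page titles as equivalent. One hypothetical way to handle the itwiki underscore queries is therefore to normalize them before searching. Below is a minimal sketch; the function name is made up, and this is not a description of what CirrusSearch actually does:

```python
import re


def normalize_title_query(query: str) -> str:
    """Treat underscores as spaces (as in MediaWiki page titles),
    then collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", query.replace("_", " ")).strip()


print(normalize_title_query("Storia_di_Roma"))  # Storia di Roma
```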
Thanks for all the technical details! So much going on... so much to learn!
I didn't know/remember that the suggester only works on titles and redirects.
Then, obviously, using just that would be great! That's gotta be a 98%+
reduction in text.
I like your reasonable process—it's quite reasonable!
Y
nsion for queries longer than x characters, or z tokens or
something. Doing OR expansion on hundreds of words—they often look like
excerpts from books or articles—is a waste of our computational resources.
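Here is a sketch of that cutoff. The values standing in for x and z are made up, and the real thresholds would need tuning against actual query logs. The function name is also hypothetical:

```python
# Hypothetical thresholds standing in for "x characters" and "z tokens".
MAX_CHARS = 300
MAX_TOKENS = 20


def should_or_expand(query: str, max_chars: int = MAX_CHARS, max_tokens: int = MAX_TOKENS) -> bool:
    """Allow OR expansion only for queries under both limits, so very long
    queries (e.g. pasted excerpts) skip the expensive expansion."""
    return len(query) <= max_chars and len(query.split()) <= max_tokens


print(should_or_expand("roman empire history"))  # True
print(should_or_expand("word " * 500))           # False
```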
—Trey
On Wed, Jul 29, 2015 at 6:45