[Wikimedia-search] Why People Use Search Engines Instead of Wikimedia Search

2015-09-22 Thread Trey Jones
and ended up on the wiki page for Hillary Clinton. Full details here: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Why_People_Use_Search_Engines —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation ___ Wikimedia-search mailing l

Re: [Wikimedia-search] Asynchronously calling elasticsearch

2015-09-21 Thread Trey Jones
That's very cool! Have you stress-tested it at all? Like, what happens if you search 10 wikipedias at once? (Because you know I want to search 10 wikis at once.) Trey Jones Software Engineer, Discovery Wikimedia Foundation On Mon, Sep 21, 2015 at 11:15 AM, Erik Bernhardson < e

Re: [Wikimedia-search] IRC norms

2015-09-15 Thread Trey Jones
o the onboarding template, and someone added a link to the communication section of the Discovery process page, so a lot of that is there, but not optimally organized. A quick look at the history shows that Mikhail got an older copy of the onboarding page that didn't have that info. Hmm. —Tr

[Wikimedia-search] Some Results of Cross-Language Wiki Searching

2015-09-11 Thread Trey Jones
n > to exist uncorrected in the non-enwiki. > Perhaps a better approach to handling non-English queries is user-specified > alternate languages. More details: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Cross_Language_Wiki_Searching —Trey Trey Jones Softw

Re: [Wikimedia-search] Smoothing in dashboard(s)

2015-09-10 Thread Trey Jones
On Thu, Sep 10, 2015 at 12:28 PM, Mikhail Popov wrote:
> The moving average is computed over 17 days. That seems like an arbitrary
> choice and it largely is but I played around with that number and 17
> yielded what seemed to me the best results. I tried 7 and such and wasn't
> satisfied with ho
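To make the window-size discussion concrete, here is a minimal sketch of a trailing moving average. It is an assumption that the dashboard uses a trailing (rather than centered) window; the function name and the sample data are hypothetical and are not taken from the dashboard code.

```python
def moving_average(values, window):
    """Trailing moving average over `window` points.

    The first window - 1 inputs produce no output, which is why a longer
    window drops more data at the start of the series.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    averages = []
    running_sum = 0.0
    for i, value in enumerate(values):
        running_sum += value
        if i >= window:
            # Drop the point that just fell out of the window.
            running_sum -= values[i - window]
        if i >= window - 1:
            averages.append(running_sum / window)
    return averages


# Made-up daily series with a pure weekly cycle (hypothetical data).
daily = [float(day % 7) for day in range(60)]

weekly = moving_average(daily, 7)       # window equal to the cycle length
seventeen = moving_average(daily, 17)   # the 17-day window from the thread

print(len(daily) - len(weekly))     # points lost with a 7-day window
print(len(daily) - len(seventeen))  # points lost with a 17-day window
```

On this toy series a 7-day window always covers exactly one full weekly cycle, so every output value is identical, while the 17-day window loses ten more points at the start. Real traffic is noisier, so this only illustrates the trade-off being discussed.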

Re: [Wikimedia-search] Congratulations WDQS team

2015-09-10 Thread Trey Jones
in a recent meeting. Nice, but still rough around the edges. Very cool all around! Trey Jones Software Engineer, Discovery Wikimedia Foundation On Thu, Sep 10, 2015 at 11:56 AM, Mikhail Popov wrote: > > > Yinz are popular now! > > Cheers~ > > -- > *Mikhail Popov*

Re: [Wikimedia-search] Smoothing in dashboard(s)

2015-09-10 Thread Trey Jones
Smth! It looks great! Quick question—what's the period of the moving average? Is it a week, or more? (A week makes sense, but it looks like more, based on how much data drops out at the beginning.) If it's not a week, can I suggest a week? It may further smooth out some of the daily bumps

Re: [Wikimedia-search] Analysis of ElasticSearch language detection plugin against enwiki zero-results queries

2015-09-08 Thread Trey Jones
:TJones_(WMF)/Notes/Language_Detection_Evaluation#ElasticSearch_language_detection_plugin.2C_with_spaces [2] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Language_Detection_Evaluation#Always_.22English.22_detector Trey Jones Software Engineer, Discovery Wikimedia Foundation On Mon, Sep 7, 2015 at

Re: [Wikimedia-search] Analysis of ElasticSearch language detection plugin against enwiki zero-results queries

2015-09-04 Thread Trey Jones
vising, because it's some combination of easier, better, faster, and good enough. I'm open to suggestions. Next week I'll ask Dan & Erik about how much effort to put into alternatives. —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation On Fri, Sep 4, 2015 at 7:

[Wikimedia-search] Analysis of ElasticSearch language detection plugin against enwiki zero-results queries

2015-09-04 Thread Trey Jones
ffect of a good language detector on zero results rate (i.e., simulate sending queries to the right place and see how much of a difference it makes). Moderately pretty pictures included. —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation

Re: [Wikimedia-search] On frequency of A/B tests and peeking at the data early

2015-09-01 Thread Trey Jones
t does take up time, though, and based only on data from the morning of the deployment it may not give a representative preview. It's still fun to peek, though. ;) Trey Jones Software Engineer, Discovery Wikimedia Foundation On Mon, Aug 31, 2015 at 2:05 PM, Mikhail Popov wrote: > Hi

Re: [Wikimedia-search] Completion suggestion API demo

2015-08-28 Thread Trey Jones
terms-query And, for reference, ES has stop word lists for >30 languages: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-tokenfilter.html Trey Jones Software Engineer, Discovery Wikimedia Foundation On Fri, Aug 28, 2015 at 1:34 AM, David Causse wrote: > Le 27

Re: [Wikimedia-search] Completion suggestion API demo

2015-08-27 Thread Trey Jones
ort the results into a more AND-ish order. —Trey [1] https://en.wikipedia.org/wiki/Stop_words [2] https://code.google.com/p/stop-words/ [3] https://web.archive.org/web/*/http://tonyb.sk/_my/ir/stop-words-collection-2014-02-24.zip [4] https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Surv

[Wikimedia-search] Zero Results Rate—One Month Followup

2015-08-26 Thread Trey Jones
There were some DOI queries, but none of the other usual suspects. —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation ___ Wikimedia-search mailing list Wikimedia-search@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikimedia-search

Re: [Wikimedia-search] Completion suggestion API demo

2015-08-26 Thread Trey Jones
And that's in line with the previous experiment. If you have a 32% zero results rate, reducing it by 38% (32% * (1-.38)) gives 19.84%. So, allow a little rounding error in the "32", "38" and "19", and this is right on the money. —Trey P.S.: 2 + 2 = 5, for
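The arithmetic in this message can be checked with a short sketch. The function name is hypothetical; the point is that the 38% is a relative reduction applied to the 32% rate, not a subtraction of 38 percentage points.

```python
def reduced_rate(rate_pct, relative_reduction):
    """Apply a relative reduction (0.38 means 38%) to a rate given in percent."""
    return rate_pct * (1.0 - relative_reduction)


new_rate = reduced_rate(32.0, 0.38)
print(f"{new_rate:.2f}%")  # prints 19.84%
```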

Re: [Wikimedia-search] Academic paper comparing Wikipedia's search engine with natural language question search engines

2015-08-26 Thread Trey Jones
ox components and the advantage this gives—again, especially in comparison to the way they naively adapted the queries to Wikipedia search terms. A commensurate level of effort put into the wiki searches would give much much better results. Still very interesting food for thought in terms

Re: [Wikimedia-search] Academic paper comparing Wikipedia's search engine with natural language question search engines

2015-08-25 Thread Trey Jones
entury is a different question, but there are some interesting things to think about here. So, can anyone get me a copy of the full paper? Thanks for the pointer, Tilman! —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation > On 25 August 2015 at 10:54, Tilman Bayer wrot

Re: [Wikimedia-search] 500K multilingual wikipedia zero-results queries

2015-07-31 Thread Trey Jones
A summary... for those who haven't been able to keep up with the voluminous emails: https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Results_Queries —Trey

Re: [Wikimedia-search] Enable or disable full text search query rewriting by default for API clients.

2015-07-31 Thread Trey Jones
rsions"), how long versions live, how many we support concurrently, how to fallback from unsupported versions, etc., etc. But Dan seems like he needs more hobbies, so I'm throwing it out there. —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation On Thu, Jul 30, 2015 at

Re: [Wikimedia-search] 500K multilingual wikipedia zero-results queries

2015-07-30 Thread Trey Jones
e variations in declension.
- jawiki has lots of " film" searches.
- ruwiki has a few non-Cyrillic searches
- itwiki has lots of queries that are multi-word phrases with underscores instead of spaces
- eswiki and frwiki have a fair number of build up searches and searches in Arabic, and f

Re: [Wikimedia-search] testing the value of a reverse index

2015-07-30 Thread Trey Jones
Thanks for all the technical details! So much going on... so much to learn! I didn't know/remember that suggester only works on titles and redirects. Then, obviously, using just that would be great! That's gotta be a 98%+ reduction in text. I like your reasonable process—it's quite reasonable! Y

Re: [Wikimedia-search] 500K multilingual wikipedia zero-results queries

2015-07-29 Thread Trey Jones
nsion for queries longer than x characters, or z tokens or something. Doing OR expansion on hundreds of words—they often look like excerpts from books or articles—is a waste of our computational resources. —Trey Trey Jones Software Engineer, Discovery Wikimedia Foundation On Wed, Jul 29, 2015 at 6:45
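The length cutoff suggested in this message could look roughly like the sketch below. The function name, the default thresholds (standing in for the unspecified "x characters" and "z tokens"), and the whitespace tokenization are all assumptions for illustration; none of them come from CirrusSearch or Elasticsearch.

```python
def should_expand_with_or(query, max_chars=200, max_tokens=30):
    """Return True if the query is short enough to be worth OR expansion.

    max_chars and max_tokens are placeholder values, not tuned thresholds.
    Tokens are split on whitespace, which is a simplification that would
    not suit languages written without spaces.
    """
    tokens = query.split()
    return len(query) <= max_chars and len(tokens) <= max_tokens


print(should_expand_with_or("hillary clinton"))
print(should_expand_with_or("word " * 300))  # looks like a pasted excerpt
```

Checking both characters and tokens guards against two different cases: a long run of text with few spaces, and a query made of many short words.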