Re: [Wikimedia-search] Completion suggestion API demo
Hi!

> I uploaded a small HTML page to compare both approaches:
> http://cirrus-browser-bot.wmflabs.org/suggest.html

This is very cool! From my very short testing, it seems to work pretty
nicely.

--
Stas Malyshev
smalys...@wikimedia.org
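The thread doesn't spell out which endpoints the demo page queries, so as a point of reference, here is a minimal sketch of hitting the existing prefix search through the public opensearch API; the completion-suggester side of the comparison would swap in whatever endpoint the demo actually uses.

    // Minimal sketch: querying the existing prefix search through the
    // public opensearch API, as a baseline to compare suggestions against.
    // The demo page's own suggest endpoint isn't documented in this thread.
    async function prefixSuggest(query: string): Promise<string[]> {
      const url =
        "https://en.wikipedia.org/w/api.php" +
        "?action=opensearch&format=json&origin=*" +
        "&limit=10&search=" + encodeURIComponent(query);
      const resp = await fetch(url);
      // opensearch responses are [query, titles, descriptions, urls]
      const [, titles] =
        (await resp.json()) as [string, string[], string[], string[]];
      return titles;
    }

    prefixSuggest("albert einst").then((titles) => console.log(titles));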
Re: [Wikimedia-search] Completion suggestion API demo
I ran some zero result rate tests against this API today. It is a huge
reduction in the zero result rate over the existing prefix search: from
32% to 19% (on a 1% sample of prefix searches for an entire day).

On Wed, Aug 26, 2015 at 12:34 PM, Stas Malyshev smalys...@wikimedia.org wrote:
> Hi!
>
>> I uploaded a small HTML page to compare both approaches:
>> http://cirrus-browser-bot.wmflabs.org/suggest.html
>
> This is very cool! From my very short testing, it seems to work pretty
> nicely.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
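For context on how a number like this is produced, a minimal sketch of the calculation follows; the log record shape used here is an assumption for illustration, not the actual CirrusSearch request log schema.

    // Illustrative only: the real measurement ran over CirrusSearch request
    // logs, whose schema differs from this simplified record shape.
    interface PrefixSearchLogEntry {
      query: string;
      resultCount: number;
    }

    // Take a ~1% random sample of the log, then compute the share of
    // sampled queries that returned no results at all.
    function zeroResultRate(
      log: PrefixSearchLogEntry[],
      sampleRate = 0.01
    ): number {
      const sample = log.filter(() => Math.random() < sampleRate);
      const zeros = sample.filter((e) => e.resultCount === 0).length;
      return sample.length > 0 ? zeros / sample.length : NaN;
    }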
[Wikimedia-search] Final results of the first A/B test
Hey all,

Several weeks ago we ran an A/B test to try to decrease the number of searches on Wikipedia returning zero results. This consisted of a small config change that reduced the confidence needed for our systems to provide search results, along with a change to the smoothing algorithm used to bump the quality of the results now provided. Our intent was not only to reduce the zero results rate but also to prototype the actual process of A/B testing and identify issues we could fix to make future tests easier and more reliable - this was the first A/B test we had run. 5% of searches were registered as a control group, and an additional 5% were subject to the reduced confidence and smoothing algorithm change.

An initial analysis over the first day's 7m events was run on 7 August, and a final analysis looking at an entire week of data was completed yesterday. You can read the full results at:

https://github.com/wikimedia-research/SuggestItOff/blob/master/initial_analysis/presentation.Rmd
https://github.com/wikimedia-research/SuggestItOff/blob/master/final_analysis/presentation.pdf

Based on what we've seen, we conclude that there is, at best, a negligible effect from this change - and it's hard to tell whether there is an effect at all. Accordingly, we recommend the default behaviour be used for all users, and the experiment disabled.

This may sound like a failure, but it's actually not. For one thing, we've learned that these config variables probably aren't our venue for dramatic changes - the defaults are pretty sensible. If we're looking for dramatic changes in the rate, we should be applying dramatic changes to the system's behaviour. In addition, we identified a lot of process issues we can fix for the next round of A/B tests, making it easier to analyse the data that comes in and making the results easier to rely on. These include:

1. Whom the gods would destroy, they first bless with real-time analytics. The dramatic difference between the outcomes of the initial and final analyses speaks partly to the small size of the effect seen and the power analysis issues mentioned below, but it's also a good reminder that a single day of data, however many datapoints it contains, is rarely the answer. User behaviour varies dramatically depending on the day of the week or the month of the year - a week should be seen as the minimum testing period for us to have any confidence in what we see.

2. Power analysis is a must-have. Our hypothesis for the negligible and totally varying size of the effect is simply the amount of data we had; when you're looking at millions upon millions of events for a pair of options, you're going to see patterns - because with enough data stared at for long enough, pretty much anything can happen. That doesn't mean it's /real/. In the future we need to be setting our sample size using proper, a priori power analysis - or switching to Bayesian methods, where this sort of problem doesn't appear. (A sketch of such a calculation follows this message.)

These are not new lessons for the org as a whole (at least, I hope not), but they are nice reminders, and I hope that sharing them allows us to start building an org-wide understanding of how we A/B test and the costs of not doing things in a very deliberate way.

Thanks,

--
Oliver Keyes
Count Logula
Wikimedia Foundation
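On point 2, here is a minimal sketch of what an a priori power calculation for comparing two proportions looks like, using the standard normal approximation; the 32%/31% rates in the usage example are illustrative numbers, not figures from this test.

    // A-priori power analysis sketch for a two-proportion comparison
    // (e.g. control vs. test zero result rates), using the standard
    // normal approximation for the required sample size per group.
    function sampleSizePerGroup(
      p1: number,        // expected rate in the control group
      p2: number,        // smallest changed rate worth detecting in test
      zAlpha = 1.96,     // normal quantile for two-sided alpha = 0.05
      zBeta = 0.8416     // normal quantile for power = 0.80
    ): number {
      const variance = p1 * (1 - p1) + p2 * (1 - p2);
      const delta = p1 - p2;
      return Math.ceil(((zAlpha + zBeta) ** 2 * variance) / delta ** 2);
    }

    // e.g. to reliably detect a drop in a zero results rate from 32% to 31%:
    console.log(sampleSizePerGroup(0.32, 0.31)); // ~34,000 events per group

The point of the exercise is that the required sample size is fixed before the test runs; collecting millions of extra events afterwards doesn't license reading small fluctuations as effects.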
Re: [Wikimedia-search] Measuring user satisfaction while reducing it at the same time?
Can you think of a way of consistently identifying a user from page to page, but only in the trace following them landing on the search page, that does not include page parameters?

On 26 August 2015 at 16:30, Max Semenik maxsem.w...@gmail.com wrote:
> While doing CR for
> https://gerrit.wikimedia.org/r/#/c/232896/3/modules/ext.wikimediaEvents.search.js
> I came to have serious doubts about this approach. In brief, it attempts
> to track user satisfaction with search results by measuring how long
> people stay on pages. It does that by appending fromsearch=1 to links for
> 0.5% of users. However, this results in page views being uncached,
> increasing HTML load time by a factor of 4-5 and, consequently, pushing
> even short pages' first paint outside the comfort zone of 1 second - and
> that's measured from the office, with a ping of 2-3 ms to ulsfo.
>
> My concern here is that as a result we're trying to measure the very
> metric we're screwing with, making the experiment inaccurate. Can we come
> up with a way of measurement that's less intrusive, or alter the
> requirements of the experiment?
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])

--
Oliver Keyes
Count Logula
Wikimedia Foundation
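One way the question could be answered, sketched under the assumption that a per-tab token is acceptable: sessionStorage survives page-to-page navigation within a tab, so a token minted on the search page can follow the user without touching page URLs or cache keys. The names below are hypothetical, not the actual wikimediaEvents schema.

    // Hypothetical sketch: carry a search-session token in sessionStorage
    // instead of in the URL, so page URLs (and the caching layer's keys)
    // stay untouched.
    const KEY = "searchSessionToken";

    function onSearchResultsPage(): void {
      // Mint a token when the user lands on the search page...
      sessionStorage.setItem(KEY, Math.random().toString(36).slice(2));
    }

    function onAnyPageView(): void {
      // ...and on later pages in the same tab, attach it to the logged
      // event rather than to the page URL, keeping the view cacheable.
      const token = sessionStorage.getItem(KEY);
      if (token !== null) {
        console.log("log satisfaction event", {
          token,
          page: location.pathname,
        });
      }
    }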
[Wikimedia-search] Zero Results Rate—One Month Followup
Hey everyone,

I've re-run my big wiki zero result rate numbers to see what has changed in the last month. The results are here:

https://www.mediawiki.org/wiki/User:TJones_(WMF)/Notes/Survey_of_Zero-Results_Queries#One_Month_Followup

Since I was only looking at the 52 big wikis (100K+ articles), the zero results rate is under 20% (good news), but it hasn't gone down in a month (bad news).

I looked very briefly at the full-text zero results rate for the rest of the wikis for yesterday. That zero results rate was 56.6%! Lots of hits from wikidata and itwiki-things! There were some DOI queries, but none of the other usual suspects.

—Trey

Trey Jones
Software Engineer, Discovery
Wikimedia Foundation
Re: [Wikimedia-search] Measuring user satisfaction while reducing it at the same time?
Nice catch, Max. Thanks for reporting it. Do you have any suggestions for how we could alleviate this issue?

Thanks,
Dan

On 26 August 2015 at 13:30, Max Semenik maxsem.w...@gmail.com wrote:
> While doing CR for
> https://gerrit.wikimedia.org/r/#/c/232896/3/modules/ext.wikimediaEvents.search.js
> I came to have serious doubts about this approach. In brief, it attempts
> to track user satisfaction with search results by measuring how long
> people stay on pages. It does that by appending fromsearch=1 to links for
> 0.5% of users. However, this results in page views being uncached,
> increasing HTML load time by a factor of 4-5 and, consequently, pushing
> even short pages' first paint outside the comfort zone of 1 second - and
> that's measured from the office, with a ping of 2-3 ms to ulsfo.
>
> My concern here is that as a result we're trying to measure the very
> metric we're screwing with, making the experiment inaccurate. Can we come
> up with a way of measurement that's less intrusive, or alter the
> requirements of the experiment?
>
> --
> Best regards,
> Max Semenik ([[User:MaxSem]])

--
Dan Garry
Lead Product Manager, Discovery
Wikimedia Foundation
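For readers without access to the patch, a rough reconstruction of the mechanism Max describes - not the actual Gerrit change: appending a query parameter gives sampled users distinct URLs, and a URL-keyed cache treats each one as a miss.

    // Rough reconstruction, not the actual patch: a fraction of users get
    // fromsearch=1 appended to search result links. The cache keys on the
    // full URL, so these page views bypass the cache entirely.
    // (Sampling here is per call for simplicity; the real test would
    // bucket users so the same user is sampled consistently.)
    const SAMPLE_RATE = 0.005; // 0.5% of users

    function instrumentResultLinks(links: HTMLAnchorElement[]): void {
      if (Math.random() >= SAMPLE_RATE) {
        return; // 99.5% keep the unmodified, cacheable URLs
      }
      for (const link of links) {
        const url = new URL(link.href);
        url.searchParams.set("fromsearch", "1"); // distinct URL => cache miss
        link.href = url.toString();
      }
    }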