I realize I never responded to this thread, shame on me!

Jorge/Giovanni, Kelvin looks pretty cool -- thanks for sharing it. When we
use Quepid, we sometimes do it at places with existing relevancy test
scripts like Kelvin. Quepid and test scripts tend to satisfy different
niches. Beyond testing, Quepid is a GUI that helps you explain,
investigate, and sandbox. That's nice for fuzzier/more qualitative
judgments, especially when you want to collaborate with non-technical
stakeholders. It's been our replacement for the "spreadsheet" that a lot
of our clients used before Quepid -- the one where the non-technical
folks would list queries and which results looked good or bad.

Scripts work very well for getting a pass/fail response. It's nice that
Kelvin gives you a "temperature" instead of necessarily a pass/fail; that
level of fuzziness is definitely useful.

We certainly see value in both (and will probably be doing more to
integrate Quepid with continuous integration/scripting).
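
To make the CI idea concrete, here's a rough sketch (hypothetical Python,
not Quepid or Kelvin code) of gating a build on an overall relevancy error
count. The Solr URL, core name, test file, and field names are all
assumptions for the sake of the example:

    # Hypothetical sketch: fail a build when the relevancy error count rises.
    # Assumes a Solr core named "products" and test cases stored as JSON
    # objects like {"query": "nike sportwatch", "must_contain": "nike"}.
    import json
    import sys
    import urllib.parse
    import urllib.request

    SOLR = "http://localhost:8983/solr/products/select"  # assumed URL/core

    def count_errors(case):
        # Fetch the top 10 results and count titles missing the expected term.
        params = urllib.parse.urlencode(
            {"q": case["query"], "rows": 10, "wt": "json"})
        with urllib.request.urlopen(SOLR + "?" + params) as resp:
            docs = json.load(resp)["response"]["docs"]
        return sum(1 for d in docs
                   if case["must_contain"] not in str(d.get("title", "")).lower())

    with open("relevancy_cases.json") as f:
        cases = json.load(f)

    total = sum(count_errors(c) for c in cases)
    print("relevancy errors:", total)
    sys.exit(1 if total > 25 else 0)  # arbitrary threshold; tune to taste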

Cheers,
-Doug


On Mon, May 5, 2014 at 2:47 AM, Jorge Luis Betancourt González <
jlbetanco...@uci.cu> wrote:

> One good thing about kelvin is that it's more of a programmatic task, so
> you could execute the scripts after a few changes or a deployment and get
> a general idea of whether the new changes have impacted the search
> experience; sure, the changing catalog is still a problem, but I like
> being able to execute a few commands and, presto, get it done. This could
> become a must-run test in the test suite of the app. I kind of do this
> already, but testing from the user interface, using the test library
> provided by symfony2 (the framework I'm using) and its functional tests.
> It's not test-driven search relevancy per se, but we make sure not to
> mess up some basic queries we use to test the search feature.
>
> ----- Original Message -----
> From: "Giovanni Bricconi" <giovanni.bricc...@banzai.it>
> To: "solr-user" <solr-user@lucene.apache.org>
> Cc: "Ahmet Arslan" <iori...@yahoo.com>
> Sent: Friday, April 11, 2014 5:15:56 AM
> Subject: Re: Solr relevancy tuning
>
> Hello Doug
>
> I have just watched the quepid demonstration video, and I strongly agree
> with your introduction: it is very hard to involve marketing/business
> people in repeated testing sessions, and spreadsheets or other kinds of
> files are not the right tool to use.
> Currently I'm quite alone in my tuning task, and having a visual approach
> could be beneficial for me; you are giving me many good inputs!
>
> I see that kelvin (my scripted tool) and quepid follow the same path. In
> quepid someone quickly watches the results and applies colours to them;
> in kelvin you enter one or more queries (network cable, ethernet cable)
> and state that the results must contain ethernet in the title, or must
> come from a list of product categories.
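>
> For example, a single check boils down to something like this (just an
> illustrative Python sketch, not kelvin's actual file format; the field
> names are made up):
>
>     # One illustrative test case and the check it implies (made-up fields,
>     # not kelvin's real schema).
>     case = {
>         "queries": ["network cable", "ethernet cable"],
>         "title_must_contain": "ethernet",
>     }
>
>     # `docs` stands in for the results Solr returned for one of the queries.
>     docs = [
>         {"title": "Cat6 ethernet cable 2m"},
>         {"title": "USB charging cable"},
>     ]
>
>     failures = [d for d in docs
>                 if case["title_must_contain"] not in d["title"].lower()]
>     print("failing results:", len(failures))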
>
> I also do diffs of results, before and after changes, to check what is
> going on; but I have to do that in a very unix-scripted way.
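>
> In case it helps anyone, the kind of thing I script is roughly the
> following -- sketched here in Python, with made-up file names and a
> made-up format of one list of document ids per query:
>
>     # Compare the ids returned for each query before and after a change.
>     import json
>
>     with open("results_before.json") as f:
>         before = json.load(f)  # e.g. {"ethernet cable": ["id1", "id2"], ...}
>     with open("results_after.json") as f:
>         after = json.load(f)
>
>     for query in sorted(before):
>         gone = [i for i in before[query] if i not in after.get(query, [])]
>         new = [i for i in after.get(query, []) if i not in before[query]]
>         if gone or new:
>             print(query, "| dropped:", gone, "| added:", new)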
>
> Have you considered placing a counter of total red/bad results in
> quepid? I use this index to get a quick overview of a change's impact
> across all queries. Actually I repeat the tests in production from time
> to time, and if I see the "kelvin temperature" rising (the number of
> errors going up) I know I have to check what's going on, because new
> products may be having a bad impact on the index.
>
> I also keep counters of products with low-quality images or no images at
> all, or with listings that are too short; sometimes they are useful for
> understanding better what will happen if you change some bq/fq in the
> application.
>
> I also see that after changes in quepid someone has to check the "gray"
> results and assign them a colour. In kelvin's case the conditions can
> sometimes do a bit of magic (new product names still contain SM-G900F),
> but sometimes they introduce false errors (the new product name contains
> only Galaxy 5 and not the product code SM-G900F). So some checks are
> needed either way, but with quepid everybody can do the check, while
> with kelvin you have to change some lines of a script, and not everybody
> is able/willing to do that.
>
> The idea of a static index is a good suggestion; I will try to have one
> in the next round of search engine improvements.
>
> Thank you Doug!
>
>
>
>
> 2014-04-09 17:48 GMT+02:00 Doug Turnbull <
> dturnb...@opensourceconnections.com>:
>
> > Hey Giovanni, nice to meet you.
> >
> > I'm the person that did the Test Driven Relevancy talk. We've got a
> > product Quepid (http://quepid.com) that lets you gather good/bad
> > results for queries and do a sort of test-driven development against
> > search relevancy. Sounds similar to your existing scripted approach.
> > Have you considered keeping a static catalog for testing purposes? We
> > had a project with a lot of updates and date-dependent relevancy. This
> > lets you create some test scenarios against a static data set.
> > However, one downside is you can't recreate problems from production
> > in your test setup exactly -- you have to find a similar issue that
> > reflects what you're seeing.
> >
> > Cheers,
> > -Doug
> >
> >
> > On Wed, Apr 9, 2014 at 10:42 AM, Giovanni Bricconi <
> > giovanni.bricc...@banzai.it> wrote:
> >
> > > Thank you for the links.
> > >
> > > The book is really useful; I will definitely have to spend some time
> > > reformatting the logs to get access to the number of results found,
> > > session ids, and much more.
> > >
> > > I'm also quite happy that my test cases produce results similar to
> > > the precision reports shown at the beginning of the book.
> > >
> > > Giovanni
> > >
> > >
> > > 2014-04-09 12:59 GMT+02:00 Ahmet Arslan <iori...@yahoo.com>:
> > >
> > > > Hi Giovanni,
> > > >
> > > > Here are some relevant pointers:
> > > >
> > > > http://www.lucenerevolution.org/2013/Test-Driven-Relevancy-How-to-Work-with-Content-Experts-to-Optimize-and-Maintain-Search-Relevancy
> > > >
> > > >
> > > > http://rosenfeldmedia.com/books/search-analytics/
> > > >
> > > > http://www.sematext.com/search-analytics/index.html
> > > >
> > > >
> > > > Ahmet
> > > >
> > > >
> > > > On Wednesday, April 9, 2014 12:17 PM, Giovanni Bricconi <
> > > > giovanni.bricc...@banzai.it> wrote:
> > > > I have been working on an e-commerce site for about a year, and
> > > > unfortunately I have no "information retrieval" background, so I am
> > > > probably missing some important practices about relevance tuning
> > > > and search engines.
> > > > During this period I have had to fix many "bugs" about bad search
> > > > results, which I have solved sometimes by tuning edismax weights,
> > > > sometimes by creating ad hoc query filters or query boosting; but I
> > > > am still not able to figure out what the correct process for
> > > > improving search result relevance should be.
> > > >
> > > > These are the practices I am following; I would really appreciate
> > > > any comments on them, and any hints about the practices you follow
> > > > in your own projects:
> > > >
> > > > - In order to have a measure of search quality I have written many
> > > > test cases such as "if the user searches for <<nike sport watch>>,
> > > > the search results should display at least four <<tom tom>>
> > > > products with the words <<nike>> and <<sportwatch>> in the title".
> > > > I have written a tool that reads such tests from json files,
> > > > applies them to my application, and then counts the number of
> > > > results that do not match the criteria stated in the test cases.
> > > > (For those interested, this tool is available at
> > > > https://github.com/gibri/kelvin but it is still quite a prototype.)
> > > >
> > > > - I use this count as a quality index. I have tried various times
> > > > to change the edismax weights to lower the overall number of
> > > > errors, or to add new filters/boostings to the application to try
> > > > to decrease the error count.
> > > >
> > > > - The upside of this is that at least you have a number to look
> > > > at, and a quick way of checking the impact of a modification.
> > > >
> > > > - The downside is that you have to maintain the test cases: I now
> > > > have about 800 tests and my product catalogue changes often, which
> > > > implies that some products exit the catalogue and some test cases
> > > > can't pass anymore.
> > > >
> > > > - I am populating the test cases using errors reported by users,
> > > > and I feel that this is driving the test cases too much toward
> > > > pathological cases. Moreover, I don't have many tests for the cases
> > > > that are working well now.
> > > >
> > > > I would like to use search logs as a driver to generate tests, but
> > > > I feel I haven't picked the right path. Using the top queries,
> > > > manually reviewing results, and then writing tests is a slow
> > > > process; moreover, many top queries are ambiguous or are driven by
> > > > site ads.
> > > >
> > > > Many, many queries are unique to a single user. How do you deal
> > > > with these cases?
> > > >
> > > > How are you using your logs to find test cases to fix? Are you
> > > > looking for queries where the user does not "open" any of the
> > > > returned results? Which KPI have you chosen to find queries that
> > > > are not providing good results? And what are you using as a KPI for
> > > > the search as a whole, besides the conversion rate?
> > > >
> > > > Can you suggest any other practices you are using in your projects?
> > > >
> > > > Thank you very much in advance
> > > >
> > > > Giovanni
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Doug Turnbull
> > Search & Big Data Architect
> > OpenSource Connections <http://o19s.com>
> >
>
>



-- 
Doug Turnbull
Search & Big Data Architect
OpenSource Connections <http://o19s.com>
