[ https://issues.apache.org/jira/browse/SOLR-10359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alessandro Benedetti updated SOLR-10359: ---------------------------------------- Summary: User Events Logger Component (was: Users Events Logger Component) > User Events Logger Component > ---------------------------- > > Key: SOLR-10359 > URL: https://issues.apache.org/jira/browse/SOLR-10359 > Project: Solr > Issue Type: New Feature > Security Level: Public(Default Security Level. Issues are Public) > Reporter: Alessandro Benedetti > Labels: CTR, evaluation > > *Introduction* > Being able to evaluate the quality of your search engine is becoming more and > more important day by day. > This issue is to put a milestone to integrate online evaluation metrics with > Solr. > *Scope* > Scope of this issue is to provide a set of components able to : > 1) Collect Search Results impressions ( results shown per query) > 2) Collect Users events ( user interactions on the search results per query > e.g. clicks, bookmarking,ect ) > 3) Calculate evaluation metrics on demand, such as Click Through Rate, DCG ... > *Technical Design* > A SearchComponent can be designed : > *UsersEventsLoggerComponent* > A property (such as storeDir) will define where the data collected will be > stored. > Different data structures can be explored, to keep it simple, a first > implementation can be a Lucene Index. > *Data Model* > The user event can be modelled in the following way : > <query> - the user query the event is related to > <result_id> - the ID of the search result involved in the interaction > <result_position> - the position in the ranking of the search result involved > in the interaction > <timestamp> - time when the interaction happened > <relevancy_rating> - 0 for impressions, a value between 1-5 to identify the > type of user event, the semantic will depend on the domain and use cases > <test_group> - this can identify a variant, in A/B testing > *Impressions Logging* > When the SearchComponent is assigned to a request handler, everytime it > processes a request and return to the user a result set for a query, the > component will collect the impressions ( results returned) and index them in > the auxiliary lucene index. > This will happen in parallel as soon as you return the results to avoid > affecting the query time. > Of course an impact on CPU load and memory is expected, will be interesting > to minimise it. > * User Events Logging * > An UpdateHandler will be exposed to accept POST requests and collect user > events. > Everytime a request is sent, the user event will be indexed in the underline > auxiliary Lucene Index. > * Stats Calculation * > A RequestHandler will be exposed to be able to calculate stats and > aggregations for the metrics : > /evaluation?metric=ctr&stats=query&compare=testA,testB > This request could calculate the CTR for our testA and testB to compare. > Showing stats in total and per query ( to highlight the queries with > lower/higher CTR). > The calculations will happen separating the <test_group> for an easy > comparison. > Will be important to keep it as simple as possible for a first version, to > then extend it as much as we like -- This message was sent by Atlassian JIRA (v6.3.15#6346) --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org