Re: How to control the number of grouped results [DRUPAL]

2019-12-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
This parameter refers to the Solr request, for example: https://lucene.apache.org/solr/guide/7_0/result-grouping.html#grouping-by-query Drupal should expose it in its API, I guess? Cheers, Diego From: solr-user@lucene.apache.org At: 12/02/19 14:47:06 To: solr-user@lucene.apache.org
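The sketch below builds the raw Solr request that a Drupal integration would have to pass through. group, group.query and group.limit are standard result-grouping parameters. The host, the collection name `mycollection` and the price queries are hypothetical placeholders.

```python
from urllib.parse import urlencode


def grouping_url(base_url, queries, limit):
    """Build a Solr /select URL that groups results by query.

    Each entry in `queries` becomes one group.query parameter, and
    group.limit caps the number of documents returned inside each group.
    """
    params = [("q", "*:*"), ("group", "true"), ("group.limit", str(limit))]
    # group.query can be repeated: Solr returns one group per query.
    params += [("group.query", q) for q in queries]
    return base_url + "/select?" + urlencode(params)


# Hypothetical endpoint and group queries, for illustration only.
url = grouping_url("http://localhost:8983/solr/mycollection",
                   ["price:[0 TO 99.99]", "price:[100 TO *]"], limit=3)
print(url)
```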

Re:LTR: Normalize Feature Weights

2019-04-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Kamal, You can use a MinMaxNormalizer [1] and take min and max from historical data. For the original score, this won't guarantee that the value will **always** be between 0 and 1, but it should hold in the majority of cases. If the 0..1 constraint is not strict, I would rather use a
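This is the arithmetic behind the advice, assuming the normalizer applies plain min-max scaling, (value - min) / (max - min). The historical min and max below are invented. The last value shows why the 0..1 range is not guaranteed: a score above the historical max normalizes to more than 1.

```python
def min_max_normalize(value, lo, hi):
    """Return (value - lo) / (hi - lo), i.e. plain min-max scaling."""
    if hi <= lo:
        raise ValueError("max must be greater than min")
    return (value - lo) / (hi - lo)


# Hypothetical min/max of the original score, taken from historical data.
HIST_MIN, HIST_MAX = 2.0, 12.0

for score in [2.0, 7.0, 12.0, 15.0]:
    # 15.0 lies outside the historical range, so it maps above 1.
    print(score, min_max_normalize(score, HIST_MIN, HIST_MAX))
```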

Re:BM25F in Solr

2019-03-20 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
If you want a 'global' IDF across different fields, one possible solution is to use a copyField to copy all the fields into a common field (e.g., title, authors, body, and footer all copied into a copyField called text); then you should be able to use it with a function query or by implementing your
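This toy sketch shows why the copyField helps. It assumes whitespace tokenization and Lucene's BM25 IDF formula. A term's document frequency in the combined `text` field counts every document that contains the term in any source field, so all fields end up sharing one IDF. The field names and documents are invented.

```python
import math


def with_copy_field(doc, sources, dest="text"):
    """Mimic a copyField: concatenate the source fields into one field."""
    return {**doc, dest: " ".join(doc.get(f, "") for f in sources)}


def doc_freq(docs, field, term):
    """Number of documents whose `field` contains `term` (whitespace tokens)."""
    return sum(term in doc.get(field, "").lower().split() for doc in docs)


def idf(num_docs, df):
    """BM25-style IDF: log(1 + (N - df + 0.5) / (df + 0.5))."""
    return math.log(1 + (num_docs - df + 0.5) / (df + 0.5))


# Toy corpus with hypothetical field names.
docs = [
    {"title": "solr ranking", "body": "learning to rank"},
    {"title": "lucene", "body": "solr internals"},
    {"title": "search", "body": "bm25 scoring"},
]
docs = [with_copy_field(d, ["title", "body"]) for d in docs]

for field in ["title", "body", "text"]:
    df = doc_freq(docs, field, "solr")
    print(field, df, round(idf(len(docs), df), 3))
```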

search devroom @ FOSDEM 2019

2018-12-03 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi all, I just noticed this and wanted to share it with you: Full-text search is everywhere nowadays and FOSDEM 2019 will have a dedicated devroom for search on Sunday the 3rd of February. We would like to invite submissions of presentations from developers, researchers, and users of

Re: solr and diversification

2018-10-04 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
relevance. Joel Bernstein http://joelsolr.blogspot.com/ On Thu, Sep 27, 2018 at 1:39 PM Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net> wrote: Yeah, I think Kmeans might be a way to implement the

Re: solr and diversification

2018-09-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
threshold. I would allow the strategy to be defined and selected from the request. From: solr-user@lucene.apache.org At: 09/27/18 18:25:43 To: Diego Ceccarelli (BLOOMBERG/ LONDON), solr-user@lucene.apache.org Subject: Re: solr and diversification I've thought about this problem a littl

solr and diversification

2018-09-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi, I'm considering writing a component for diversifying the results. I know that diversification can be achieved by using grouping, but I'm thinking about something different and query-biased. The idea is to have something that gets applied after the normal retrieval and selects the top k
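Maximal Marginal Relevance (MMR) is one well-known way to do a post-retrieval, query-biased selection of the top k. It is a swapped-in technique here and not necessarily what the proposed component would do. In this sketch the relevance scores, the topic labels and the `same_topic` similarity are all hypothetical.

```python
def mmr(candidates, relevance, similarity, k, lam=0.7):
    """Greedy Maximal Marginal Relevance (MMR) re-ranking.

    candidates: doc ids from the normal retrieval, e.g. the top N.
    relevance:  dict doc id -> query relevance score.
    similarity: function(doc_a, doc_b) -> similarity in [0, 1].
    lam trades off relevance (lam=1) against novelty (lam=0).
    """
    selected, remaining = [], list(candidates)
    while remaining and len(selected) < k:
        def marginal(doc):
            # Penalize a doc by its similarity to the closest already-picked doc.
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy example: two near-duplicate "a" docs outrank everything else.
relevance = {"a1": 0.9, "a2": 0.85, "b1": 0.6, "c1": 0.5}
topic = {"a1": "a", "a2": "a", "b1": "b", "c1": "c"}


def same_topic(x, y):
    return 1.0 if topic[x] == topic[y] else 0.0


print(mmr(list(relevance), relevance, same_topic, k=3))  # ['a1', 'b1', 'c1']
```

With `lam=1.0` the same call degenerates to plain relevance order (`['a1', 'a2', 'b1']`). The single `lam` knob could be one of the strategies selectable from the request.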

Re: Learning to rank - Bad Request

2018-07-16 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Akshay, Did you start Solr with learning to rank enabled? ./bin/solr -e techproducts -Dsolr.ltr.enabled=true If you don't pass -Dsolr.ltr.enabled=true, LTR will not be available. Cheers, Diego From: solr-user@lucene.apache.org At: 07/16/18 09:00:39 To: solr-user@lucene.apache.org Subject: Re:

Re:LTR performance issues

2018-05-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello ilayaraja, I think it would be good to move this discussion to the Jira item: https://issues.apache.org/jira/browse/SOLR-8776?attachmentOrder=asc You can add your comments there; on that page I also explained how it works. On the performance you are right: at the moment it is slow.

Re:the number of docs in each group depends on rows

2018-05-04 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello, I'm not 100% sure, but I think that if you have multiple shards, the number of docs matched in each group is *not* guaranteed to be exact. Increasing the rows will increase the amount of partial information that each shard sends to the federator and make the number more precise. For
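This toy model reproduces the effect described, assuming each shard sends back only its top `rows` groups with local counts. It is an illustration and not Solr's actual distributed grouping code. The shard contents are invented.

```python
from collections import Counter


def shard_top_groups(shard_docs, rows):
    """A shard reports only its `rows` largest groups, with local counts."""
    return dict(Counter(shard_docs).most_common(rows))


def merge(shards, rows):
    """The federator sums whatever partial counts the shards sent back."""
    total = Counter()
    for docs in shards:
        total.update(shard_top_groups(docs, rows))
    return total


# Each list holds the group value of every matching doc on that shard.
shard1 = ["A"] * 5 + ["B"] * 4 + ["C"] * 1
shard2 = ["C"] * 6 + ["A"] * 2 + ["B"] * 1

print(merge([shard1, shard2], rows=2))  # B and C are undercounted
print(merge([shard1, shard2], rows=3))  # exact: A=7, B=5, C=7
```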

Re: Learning to Rank (LTR) with grouping

2018-04-18 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I just updated the PR to upstream. I still have to fix some things in distributed mode, but the unit tests in non-distributed mode work. Hope this helps, Diego From: solr-user@lucene.apache.org At: 04/15/18 03:37:54 To: solr-user@lucene.apache.org Subject: Re: Learning to Rank (LTR) with

Re:Support LTR RankQuery with Grouping

2018-04-06 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
The patch has not been merged yet; it is available here: https://github.com/apache/lucene-solr/pull/162 You can try to apply the patch to the current master and see if it fixes the issue. Please let us know if you have any questions. Cheers, Diego From: solr-user@lucene.apache.org At: 04/05/18

Re:Defining Document Transformers in Solr Configuration

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I don't think you can define a docTransformer in the SolrConfig at the moment; I agree it would be a cool feature. Maybe one possibility could be to use the update request processors [1] and precompute the fields at index time. It would be more expensive in disk space and indexing time, but then it

Re:SOLR Similarity Difference

2018-02-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Rick, I don't think the issue is BM25 vs TFIDF (the old similarity); it seems more due to the "matching" logic. You are asking to match: "(Action AND Technical AND Temporaries AND t/a AND CTR AND Corporation)" This (in theory) means that you want to retrieve **only** the documents that

Re:FileDictionaryFactory:- pick source file from solr instead of zk config.

2018-02-26 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
A similar problem came up with learning to rank models and was fixed by https://issues.apache.org/jira/browse/SOLR-11250 Maybe it can be useful. From: solr-user@lucene.apache.org At: 02/26/18 13:13:28 To: solr-user@lucene.apache.org Subject: FileDictionaryFactory:- pick source file from

Benchmarking Solr Query performance

2018-02-09 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi all, We would like to benchmark https://issues.apache.org/jira/browse/SOLR-11831 The patch improves the performance of grouped queries asking for only one result per group (i.e., group.limit=1). I remember seeing a page showing a benchmark of the query performance on Wikipedia,

Re:skip slow tests?

2018-02-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
ant -Dtests.slow=false From: solr-user@lucene.apache.org At: 02/02/18 17:07:14 To: solr-user@lucene.apache.org Subject: skip slow tests? Hi *, Some (slow) tests in Solr are annotated with @Slow. Is there a way to run ant test skipping them? thanks, Diego

skip slow tests?

2018-02-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi *, Some (slow) tests in Solr are annotated with @Slow. Is there a way to run ant test while skipping them? Thanks, Diego

Re: Searching for an efficient and scalable way to filter query results using non-indexed and dynamic range values

2018-02-01 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Luigi, I don't know that part of Lucene well; I would check blog posts and the code to understand if you can use NumericDocValues (my gut says yes). Also, I don't know if it is important, but please note that if you index all the documents at the beginning, your scores will be different -

Re:Searching for an efficient and scalable way to filter query results using non-indexed and dynamic range values

2018-01-31 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Luigi, What about using an updatable DocValues field [1] for the field x? You could initially set it to -1 and then update it for the docs in step j. Range queries should still work and the update should be fast. Cheers [1]
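This sketch shows the request bodies the suggestion implies. It uses Solr's JSON atomic-update syntax (`{"set": ...}`), which Solr can apply in place when `x` is a numeric docValues-only field (indexed=false, stored=false). The doc ids and values are hypothetical. Keeping -1 as the "not yet computed" sentinel means a filter such as `x:[0 TO *]` skips documents that step j has not touched.

```python
import json


def initial_doc(doc_id):
    # x starts at the sentinel -1, meaning "not computed yet".
    return {"id": doc_id, "x": -1}


def in_place_updates(values):
    """Atomic 'set' updates for field x, one per document id."""
    return [{"id": doc_id, "x": {"set": v}} for doc_id, v in values.items()]


def range_filter(lo, hi):
    """Filter query on x; -1 docs fall outside any range starting at 0."""
    return f"x:[{lo} TO {hi}]"


# Hypothetical ids/values; the payload would be POSTed to /update.
payload = json.dumps(in_place_updates({"doc1": 42, "doc7": 3}))
print(payload)
print(range_filter(0, 100))
```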

Re: LTR original score feature

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I think it really depends on the particular use case. Sometimes the absolute score is a good feature, sometimes not. If you are using the default BM25, I think that increasing the number of terms in the query will increase the average document score in the results. So maybe I would normalize the
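The excerpt is cut off before the normalization is spelled out, so the sketch below is only one hypothetical option: divide the original score by the number of query terms, to offset the query-length effect described above. Whitespace splitting stands in for the real analyzer.

```python
def normalized_original_score(score, query):
    """Divide the original score by the number of query terms.

    Whitespace splitting is a simplification: the field's analyzer may
    produce a different number of terms.
    """
    terms = query.split()
    return score / len(terms) if terms else 0.0


print(normalized_original_score(12.0, "solr learning to rank"))  # 3.0
```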

Re: ***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
In theory it should be possible if you are indexing the positions of the tokens in your field, but I am not aware of any Solr query that allows you to weight the matches based on their position. Does anyone know if it is possible? From: solr-user@lucene.apache.org At: 01/29/18 11:25:36 To:

Re:***UNCHECKED*** Limit Solr search to number of character/words (without changing index)

2018-01-26 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Zahid, If you want to allow searching only when the query is shorter than a certain number of terms/characters, I would probably check that before calling Solr; otherwise you could write a QueryParserPlugin (see [1]) that checks the query is valid before processing it. See also:
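This is a minimal sketch of the "check before calling Solr" option. The limits `MAX_TERMS` and `MAX_CHARS` are hypothetical values, and the term count uses simple whitespace splitting rather than Solr's analysis.

```python
MAX_TERMS = 5   # hypothetical limits
MAX_CHARS = 50


def query_allowed(query, max_terms=MAX_TERMS, max_chars=MAX_CHARS):
    """Client-side check run before the query is sent to Solr."""
    q = query.strip()
    return 0 < len(q) <= max_chars and len(q.split()) <= max_terms


for q in ["red shoes", "a b c d e f", "x" * 51, "   "]:
    print(repr(q), query_allowed(q))
```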

RE: Using lucene to post-process Solr query results

2018-01-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
And you want to show users only the Lucene documents that matched the original query sent to Solr? (What if a Lucene document matches only part of the query?) From: solr-user@lucene.apache.org At: 01/23/18 13:55:46 To: Diego Ceccarelli (BLOOMBERG/ LONDON), solr-user

Re: Using lucene to post-process Solr query results

2018-01-23 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Rahul, can you provide more details on how you decide that the smaller Lucene objects are part of the same Solr document? From: solr-user@lucene.apache.org At: 01/23/18 09:59:17 To: solr-user@lucene.apache.org Subject: Re: Using lucene to post-process Solr query results Hi Rahul, Looks like

Re:Frequently Used Search Terms.

2018-01-18 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Fiz, It is not possible at the moment; you will have to log the queries (from Solr, or before you send them) and use external tools to do that. There is a Jira item on that if you are interested: https://issues.apache.org/jira/browse/SOLR-10359 Diego From: solr-user@lucene.apache.org At:
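This is a minimal sketch of the "external tool" side, assuming the queries have already been extracted from the logs into one raw query string per line. Extracting `q` from Solr's request logs is not shown here, and the log contents are invented.

```python
from collections import Counter


def top_queries(log_lines, n=10):
    """Most frequent queries in a log with one raw query string per line.

    Queries are lower-cased and whitespace-collapsed so that trivial
    variants are counted together; empty lines are skipped.
    """
    normalized = (" ".join(line.lower().split()) for line in log_lines)
    return Counter(q for q in normalized if q).most_common(n)


# Hypothetical log contents.
log = ["Solr LTR", "solr  ltr", "bm25", "", "Solr LTR"]
print(top_queries(log))  # [('solr ltr', 3), ('bm25', 1)]
```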

Re: Learning to Rank (LTR) with grouping

2018-01-11 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
not the case then how to proceed with using fix in master-solr-8776 with branch_6_6 can a new patch be created for this? Thank you, Roopa On Mon, Dec 11, 2017 at 9:54 AM, Roopa Rao <roop..

Re: Personalized search parameters

2018-01-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
I'm assuming that you are implementing the cosine similarity and you have two vectors containing (term, tfidf) pairs. The two vectors could have different sizes because they only contain the terms that have tfidf != 0. If you want to compute cosine similarity between the two lists you just have
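This sketch assumes each vector is a Python dict mapping a term to its tfidf weight. A term missing from one vector has weight 0 there, so only shared terms contribute to the dot product. Each norm is still computed over that vector's own terms. The example weights are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {term: tfidf}."""
    # Iterate over the smaller vector; missing terms contribute 0.
    small, large = (u, v) if len(u) <= len(v) else (v, u)
    dot = sum(w * large[t] for t, w in small.items() if t in large)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


user = {"solr": 3.0, "ltr": 4.0}
print(cosine(user, {"solr": 3.0, "java": 4.0}))  # 0.36
```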

Re: Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
From: solr-user@lucene.apache.org At: 01/05/18 15:35:46 To: solr-user@lucene.apache.org Subject: Re: Personalized search parameters In particular, we have to retrieve the documents with a normal search followed by a result reranking phase where we calculate the cosine similarity between the

Re:Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Why do you want the personalization to happen in the Similarity? The Similarity will score all the docs matching your query, so it has to be really fast. Unless your personalization is very simple (e.g., tf/idf computed in a different way based on the user), I would not put it there. Did you consider

Re: SOLR 7.2 and LTR

2017-12-29 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(Exec

Re: SOLR 7.2 and LTR

2017-12-28 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) at java.lang.Thread.run(Unknown Source) Best regards, Dariusz Wojtas On Thu, Dec 28, 2017 at 1:03 PM, Diego Ceccarelli (BLOOMBERG/ LONDON) <dceccarel...@bloomberg.net> wrote: > Hello Dariusz, > Can you look into the solr l

Re:SOLR 7.2 and LTR

2017-12-28 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello Dariusz, Can you look into the Solr logs for a stack trace or ERROR logs? From: solr-user@lucene.apache.org At: 12/27/17 19:01:29 To: solr-user@lucene.apache.org Subject: SOLR 7.2 and LTR Hi, I am using SOLR 7.0 and use the ltr parser. The configuration I use works nicely under SOLR

Re: Learning to Rank (LTR) with grouping

2017-12-11 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Roopa, If you look at the diff: https://github.com/apache/lucene-solr/pull/162/files I didn't change much in SolrIndexSearcher, so you can try to skip that file when applying the patch and redo the changes afterwards. Alternatively, the feature branch is available here:

Re:Given path of Ranklib model in Solr Model Json

2017-11-08 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello isspek, Unfortunately not; it would be nice to patch RankLib to output the model in JSON. FYI, I have a script to convert the XML into the JSON format: https://github.com/bloomberg/lucene-solr/blob/ltr-demo-lucene-solr/py-solr-buzzwords/tree_model.py Cheers, Diego From:
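The sketch below is not the linked script. It is an independent illustration that rests on several assumptions:

- RankLib's ensemble XML uses `<tree weight>`, nested `<split pos="left|right">`, `<feature>`, `<threshold>` and leaf `<output>` elements.
- The target is Solr's `MultipleAdditiveTreesModel` JSON layout (`trees`, `weight`, `root`, `feature`, `threshold`, `left`/`right`, leaf `value`).
- RankLib feature ids are 1-based positions in `feature_names`.
- Both tools send a value equal to the threshold down the same branch.

Check all of these against your RankLib and Solr versions. The model name and feature names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def _node(split, feature_names):
    """Convert one RankLib <split> element into a Solr tree node."""
    output = split.find("output")
    if output is not None:  # leaf node
        return {"value": output.text.strip()}
    children = {child.get("pos"): child for child in split.findall("split")}
    feature_id = int(split.find("feature").text.strip())
    return {
        # Assumption: RankLib feature ids are 1-based positions in feature_names.
        "feature": feature_names[feature_id - 1],
        "threshold": split.find("threshold").text.strip(),
        "left": _node(children["left"], feature_names),
        "right": _node(children["right"], feature_names),
    }


def ranklib_to_solr(xml_text, feature_names, model_name):
    """Build a MultipleAdditiveTreesModel-style dict from RankLib XML."""
    # RankLib writes "##" comment lines before the XML; drop them.
    body = "\n".join(line for line in xml_text.splitlines()
                     if not line.startswith("##"))
    ensemble = ET.fromstring(body)
    trees = [{"weight": tree.get("weight"),
              "root": _node(tree.find("split"), feature_names)}
             for tree in ensemble.findall("tree")]
    return {
        "class": "org.apache.solr.ltr.model.MultipleAdditiveTreesModel",
        "name": model_name,
        "features": [{"name": name} for name in feature_names],
        "params": {"trees": trees},
    }


sample = """## LambdaMART
<ensemble>
  <tree id="1" weight="0.1">
    <split>
      <feature> 2 </feature>
      <threshold> 0.5 </threshold>
      <split pos="left"><output> -1.25 </output></split>
      <split pos="right"><output> 0.75 </output></split>
    </split>
  </tree>
</ensemble>"""

# Hypothetical model and feature names.
model = ranklib_to_solr(sample, ["originalScore", "isBook"], "myModel")
print(json.dumps(model, indent=2))
```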

vespa

2017-09-27 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi all, Yesterday Yahoo open-sourced Vespa ("The open big data serving engine: Store, search, rank and organize big data at user serving time."). Looking at the API, they provide search. I did a quick search in the code for Lucene and got only 5 results. Does anyone know more about the

Re: Is there a way to delete multiple documents using wildcard?

2017-09-21 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
https://wiki.apache.org/solr/FAQ#How_can_I_delete_all_documents_from_my_index.3F Also have a look at the last post here: https://gist.github.com/nz/673027 I think there's a way to disallow delete by *:* in solrconfig.xml, but I can't find it (I would take a look in the solrconfig just in
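This sketch builds a delete-by-query body in Solr's JSON update format, `{"delete": {"query": ...}}`, with a wildcard on the id. You would POST it to the collection's `/update` handler and then commit. The `tmp_` prefix is hypothetical. The `*:*` refusal is only a client-side safety net and is not the solrconfig.xml option the post is looking for.

```python
import json


def delete_by_query(query, allow_all=False):
    """Build a JSON delete-by-query body for Solr's /update handler.

    The *:* check below is a client-side guard only.
    """
    if query.strip() == "*:*" and not allow_all:
        raise ValueError("refusing to delete every document")
    return json.dumps({"delete": {"query": query}})


# Wildcard delete of every doc whose id starts with a hypothetical prefix.
print(delete_by_query("id:tmp_*"))
```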

Re: Rescoring from 0 - full

2017-09-21 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Dariusz, If you use *:*, you'll rerank only the top N random documents; as Emir said, that will probably not produce interesting results. If you want to replace the original score, you can take a look at the learning to rank module [1], which would allow you to reassign a new score to the top
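These helpers build the request parameters for the two options. They use the `{!rerank ...}` and `{!ltr ...}` query-parser syntax from the Solr Reference Guide. The queries and the model name `myModel` are hypothetical. In both cases only the top `reRankDocs` results of `q` are touched, which is why `q` should be a meaningful query rather than `*:*`.

```python
def rerank_params(user_query, rerank_query, docs=1000, weight=3):
    """Parameters for Solr's rerank query parser (adds a weighted score)."""
    return {
        "q": user_query,
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={docs} reRankWeight={weight}}}",
        "rqq": rerank_query,
    }


def ltr_params(user_query, model, docs=100):
    """Parameters for the LTR query parser, which rescores the top docs with a model."""
    return {"q": user_query, "rq": f"{{!ltr model={model} reRankDocs={docs}}}"}


print(rerank_params("title:solr", "body:ranking"))
print(ltr_params("title:solr", "myModel"))  # hypothetical model name
```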

Re: Learn To Rank Questions

2017-06-02 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi, Sorry for the delay; here are my replies: 1. I'm not a Spark user yet (but I'm working on that :)) 2. I'm not sure I understand how you would use a feature that is not a float in a model; in my experience, all the learning to rank methods train and predict from a list of floats.
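One common way to feed a non-float (categorical) feature into such a model is one-hot encoding: the category becomes several 0.0/1.0 floats. This is a generic technique offered as an illustration. The field names `bm25`, `price`, `category` and the category vocabulary are hypothetical.

```python
def one_hot(value, vocabulary):
    """Encode a categorical value as floats, one slot per known category."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def to_feature_vector(doc, categories):
    """Numeric features pass through as floats; 'category' is expanded."""
    return [float(doc["bm25"]), float(doc["price"])] + one_hot(doc["category"], categories)


# Hypothetical document and category vocabulary.
doc = {"bm25": 7.5, "price": 10, "category": "book"}
print(to_feature_vector(doc, ["book", "music", "video"]))
# [7.5, 10.0, 1.0, 0.0, 0.0]
```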

Support RankQuery in grouping

2017-05-11 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi All, At the moment RankQueries [1] are not supported when you perform grouping: if you perform a ReRankQuery and ask for the groups, reranking will be ignored in the scoring. In SOLR-8776, I added support for ReRankQueries in grouping and I opened a PR on GitHub [2]. ReRankQueries are

Re: How to train the model using user clicks when use ltr(learning to rank) module?

2017-01-06 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi Jeffery, I submitted a patch to the README of the learning to rank example folder, trying to better explain how to produce a training set given a log with interaction data. The patch is available here: https://issues.apache.org/jira/browse/SOLR-9929 And you can see the new version of the
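This rough sketch turns interaction data into training lines in the RankLib/SVM-rank text format (`label qid:N 1:v1 2:v2 # comment`). It is not the procedure from the patched README. It assumes the feature values for each (query, doc) pair were already extracted, for example with Solr's `[features]` transformer. It also uses a naive label (clicked = 1, otherwise 0) with no position-bias correction. The log and feature values are invented.

```python
def training_lines(impressions, features):
    """Turn logged impressions into RankLib / SVM-rank style lines.

    impressions: (query, doc_id, clicked) tuples, grouped by query.
    features:    dict (query, doc_id) -> list of feature values.
    Labels are naive (1 = clicked, 0 = not clicked).
    """
    qids, lines = {}, []
    for query, doc_id, clicked in impressions:
        qid = qids.setdefault(query, len(qids) + 1)  # first query -> qid 1
        values = features[(query, doc_id)]
        vector = " ".join(f"{i}:{v}" for i, v in enumerate(values, start=1))
        lines.append(f"{1 if clicked else 0} qid:{qid} {vector} # {doc_id}")
    return lines


log = [("solr", "d1", True), ("solr", "d2", False), ("lucene", "d3", True)]
feats = {("solr", "d1"): [0.9, 1.0], ("solr", "d2"): [0.4, 0.0],
         ("lucene", "d3"): [0.7, 1.0]}
for line in training_lines(log, feats):
    print(line)
```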

Re: Solr Support for BM25F

2016-04-14 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Hi David, I implemented BM25F for Europeana on Solr 4.x a couple of years ago; you can find it here: https://github.com/europeana/contrib/tree/master/bm25f-ranking Maybe I should contribute it back. Please do not hesitate to contact me if you need help :) Cheers, Diego From:
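This simplified textbook BM25F is not the Europeana implementation linked above. It shows the idea. Per-field term frequencies are length-normalized and boosted, then summed into one pseudo-frequency, and only after that saturated with k1. It uses Lucene-style IDF and omits the constant (k1 + 1) factor. The field names, statistics, boosts and b values are invented.

```python
import math


def bm25f(query_terms, doc, stats, weights, b, k1=1.2):
    """Simplified BM25F score of one document.

    doc:     {field: list of tokens}
    stats:   {"N": num_docs, "df": {term: doc freq}, "avglen": {field: avg length}}
    weights: {field: boost w_f};  b: {field: length-normalization b_f}
    """
    score = 0.0
    for term in query_terms:
        tf = 0.0
        for field, tokens in doc.items():
            norm = 1 - b[field] + b[field] * len(tokens) / stats["avglen"][field]
            tf += weights[field] * tokens.count(term) / norm
        if tf == 0:
            continue
        df = stats["df"].get(term, 0)
        idf = math.log(1 + (stats["N"] - df + 0.5) / (df + 0.5))
        score += idf * tf / (k1 + tf)  # saturate the combined frequency once
    return score


doc = {"title": ["solr", "ranking"], "body": ["learning", "to", "rank", "with", "solr"]}
stats = {"N": 100, "df": {"solr": 10}, "avglen": {"title": 2.0, "body": 5.0}}
b = {"title": 0.75, "body": 0.75}
print(bm25f(["solr"], doc, stats, {"title": 2.0, "body": 1.0}, b))
```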