Re: Multiple merge-runs from same set of segments

2021-05-24 Thread Ravikumar Govindarajan
Thanks Patrick for the help! > May I know what Lucene version you're using? We are using an older version of Lucene as of now (4.7.x), and I believe the FilterCodecReader of the current version is akin to FilterAtomicReader and should do the job for us! If it is not available, I'm not sure whether the

Re: Multiple merge-runs from same set of segments

2021-05-24 Thread Patrick Zhai
Hi Ravi, 1. May I know what Lucene version you're using? As far as I know, SortingMergePolicy has been deprecated and replaced by IndexWriterConfig.setIndexSort in newer Lucene versions. So if "setIndexSort" is available, I would suggest using that to achieve the sorted index (as you might ha

Re: Multiple merge-runs from same set of segments

2021-05-24 Thread Ravikumar Govindarajan
Thanks Michael! This was just what I was looking for! Just a couple of questions. - When we call addIndexes(IndexReader...), does the merge happen via the MergePolicy? We use a SortingMergePolicy and would like to maintain the sort order in newly created segments too. - Concurrency is a

Re: Multiple merge-runs from same set of segments

2021-05-24 Thread Michael McCandless
Are you trying to rewrite your already created index into a different segment geometry? Maybe have a look at the new IndexRearranger tool? It is already doing something like what you enumerated below, including mocking LiveDocs to get the right

Re: Lucene/Solr and BERT

2021-05-24 Thread Michael Wechner
Hi Russ, I would like to use it for detecting duplicate questions. I am currently using the sbert.net project you mention below to compute embeddings of size 768 for indexing and querying. sbert has an example listed using "util.pytorch_cos_sim(A,B)" as a brute-force approach
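
For readers following this thread, the brute-force approach mentioned in the preview above can be sketched roughly as follows. This is only an illustration in plain Python, not the sbert.net example itself: it does not call util.pytorch_cos_sim, and the names cosine_similarity, find_duplicates, and the threshold value are hypothetical choices made for this sketch. It assumes the embeddings are already available as equal-length lists of floats.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def find_duplicates(embeddings, threshold=0.9):
    """Compare every pair of embeddings (O(n^2)) and return the pairs
    whose cosine similarity is at least `threshold`.

    `threshold` is an assumed cut-off for this sketch, not a recommended value.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            score = cosine_similarity(embeddings[i], embeddings[j])
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    # Tiny 3-dimensional stand-ins for real 768-dimensional embeddings.
    toy_embeddings = [
        [1.0, 0.0, 0.0],
        [2.0, 0.0, 0.0],  # same direction as the first vector
        [0.0, 1.0, 0.0],  # orthogonal to the first two
    ]
    for i, j, score in find_duplicates(toy_embeddings):
        print(f"questions {i} and {j} look like duplicates (score={score:.3f})")
```

Because every pair is compared, the cost grows quadratically with the number of questions, which is why a brute-force comparison like this is usually only practical for small collections.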