Re: Not able to reproduce race condition issue to justify implementation of optimistic concurrency

2018-11-16 Thread Arnold Bronley
Thanks for replying, Chris. 1) depending on the number of CPUs / load on your solr server, it's possible you're just getting lucky. it's hard to "prove" with a multithreaded test that concurrency bugs exist. - Agreed. However, across 200k total calls, the race condition did not happen even once - I

Re: is SearchComponent the correct way?

2018-11-16 Thread Mikhail Khludnev
On Tue, Nov 13, 2018 at 6:36 AM John Thorhauer wrote: > Mikhail, > > Where do I implement the buffering? I cannot do it in the collect() > method. Could you clarify why exactly? Notice my statement about one segment only. > I cannot see how I can get access to what I need in the finish() >

Re: Soft commits and new Searcher

2018-11-16 Thread Walter Underwood
Thanks. I don’t need openSearcher=false on soft commits. I was just musing about it. Keeping the same query result cache would be very similar to using an HTTP cache in front of Solr. Which means that it should be done with an HTTP cache, because those are straightforward and very fast. It

Re: Soft commits and new Searcher

2018-11-16 Thread Shawn Heisey
On 11/16/2018 12:21 PM, Shawn Heisey wrote: On 11/16/2018 11:54 AM, Walter Underwood wrote: I’ve been reading all the documentation and articles I can find, and they all say that soft commit makes documents visible for searching. They don’t specifically say that they invalidate the caches

Re: Soft commits and new Searcher

2018-11-16 Thread Shawn Heisey
On 11/16/2018 11:54 AM, Walter Underwood wrote: Does a soft commit always open a new Searcher? In general, yes.  To quote the oft-referenced blog post ... hard commits are about durability, soft commits are about visibility. I actually don't know if "openSearcher=false" would work on a soft
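Shawn's distinction (hard commits for durability, soft commits for visibility) maps onto request parameters of Solr's update handler. The sketch below only builds the URLs and sends nothing; the host, port, and collection name are hypothetical, and because the thread leaves open whether openSearcher=false means anything on a soft commit, the sketch does not combine the two.

```python
from urllib.parse import urlencode


def commit_url(base_url: str, collection: str, soft: bool) -> str:
    """Build an update-handler URL that requests an explicit commit.

    soft=True  -> softCommit=true (about visibility)
    soft=False -> commit=true&openSearcher=false (a hard commit for durability
                  that leaves the currently open searcher in place)
    """
    if soft:
        params = {"softCommit": "true"}
    else:
        params = {"commit": "true", "openSearcher": "false"}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


# Hypothetical local Solr and collection name.
print(commit_url("http://localhost:8983/solr/", "mycoll", soft=True))
print(commit_url("http://localhost:8983/solr/", "mycoll", soft=False))
```

Keeping the hard commit at openSearcher=false is the common pairing when a separate soft-commit interval controls visibility.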

Soft commits and new Searcher

2018-11-16 Thread Walter Underwood
Does a soft commit always open a new Searcher? I’ve been reading all the documentation and articles I can find, and they all say that soft commit makes documents visible for searching. They don’t specifically say that they invalidate the caches and/or open a new Searcher. I guess I can see a

Re: Not able to reproduce race condition issue to justify implementation of optimistic concurrency

2018-11-16 Thread Chris Hostetter
1) Depending on the number of CPUs / load on your Solr server, it's possible you're just getting lucky. It's hard to "prove" with a multithreaded test that concurrency bugs exist. 2) A lot depends on what your updates look like (i.e. the impl of SolrDocWriter.atomicWrite()), and what the

Not able to reproduce race condition issue to justify implementation of optimistic concurrency

2018-11-16 Thread Arnold Bronley
Hi, Before implementing an optimistic concurrency solution, I had written a test case to check whether two threads atomically writing two different fields (say f1 and f2) of the same document (say d) run into a conflict. Thread t1 atomically writes counter c1 to field f1 of document d, commits and
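To make the failure mode in this thread concrete, here is a toy, single-process model of a read-modify-write race and of a version check that catches it. It is a sketch under stated assumptions, not Solr's implementation: `ToyStore`, `put`, and `two_clients` are hypothetical names, each write replaces the whole document (rather than being an atomic update), and only two of Solr's `_version_` rules are modelled: 0 means "no check", and a value greater than 1 must match the stored version exactly. Solr's special meanings for 1 and for negative values are left out, and versions here start above 1 so those cases never arise.

```python
import threading
from typing import Dict


class VersionConflict(Exception):
    """Stale expected version (Solr reports this case as HTTP 409 Conflict)."""


class ToyStore:
    """In-memory model of version-checked, whole-document writes; not Solr code."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}
        self._lock = threading.Lock()
        self._last_version = 100  # start above 1 so Solr's "1" rule never applies

    def get(self, doc_id: str) -> dict:
        with self._lock:
            return dict(self._docs[doc_id])

    def put(self, doc_id: str, doc: dict, expected_version: int = 0) -> int:
        """Replace the stored document and return its new version.

        expected_version > 1 must equal the stored version, else VersionConflict.
        Any other value is treated here as "no check" (last writer wins).
        """
        with self._lock:
            stored = self._docs.get(doc_id, {}).get("_version_")
            if expected_version > 1 and stored != expected_version:
                raise VersionConflict(f"{doc_id}: expected {expected_version}, found {stored}")
            self._last_version += 1
            self._docs[doc_id] = {**doc, "_version_": self._last_version}
            return self._last_version


def two_clients(check: bool) -> dict:
    """Force the risky interleaving: both clients read, then A writes, then B writes."""
    store = ToyStore()
    store.put("d", {"f1": 0, "f2": 0})
    a = store.get("d")  # both clients read the same snapshot ...
    b = store.get("d")  # ... before either of them writes
    store.put("d", {**a, "f1": a["f1"] + 1}, a["_version_"] if check else 0)
    try:
        store.put("d", {**b, "f2": b["f2"] + 1}, b["_version_"] if check else 0)
    except VersionConflict:
        b = store.get("d")  # re-read the fresh document and retry once
        store.put("d", {**b, "f2": b["f2"] + 1}, b["_version_"])
    return store.get("d")


print(two_clients(check=False))  # A's increment of f1 is silently lost
print(two_clients(check=True))   # B's stale write is rejected, re-read, retried
```

Because the interleaving is forced, the outcome here is deterministic; in a real multithreaded test the same window may or may not be hit on a given run, which is the point Chris makes above. The model also shows why the shape of the update matters: the lost update comes from clients sending whole documents they read earlier.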

Re: Extracting important multi term phrases from the text

2018-11-16 Thread Alexandre Rafalovitch
Good catch Pratik. It is in Javadoc, but not in the reference guide: https://lucene.apache.org/core/6_3_0/analyzers-common/org/apache/lucene/analysis/shingle/ShingleFilterFactory.html . I'll try to fix that later (SOLR-12996). Regards, Alex. On Fri, 16 Nov 2018 at 10:44, Pratik Patel wrote:

Re: Extracting important multi term phrases from the text

2018-11-16 Thread David Hastings
Thanks, I would be really curious to see your URL call if you don't mind. I am just getting started with the SKG stuff, and this conversation in particular has really helped. On Fri, Nov 16, 2018 at 10:44 AM Pratik Patel wrote: > @Markus @Walter, @Alexandre is right. The culprit was not

Re: Extracting important multi term phrases from the text

2018-11-16 Thread Pratik Patel
@Markus @Walter, @Alexandre is right. The culprit was not the StopWord Filter, it was ShingleFilter. I could not find the parameter fillerToken in the documentation; is it a new addition? BTW, I tried that and it works. Thanks! I still ended up using a pattern replacement filter because I did not want any

RE: indexing multiple levels of data

2018-11-16 Thread Martin Frank Hansen (MHQ)
Hi Jan, Thanks for your quick reply! I was afraid you would suggest this. I have already moved much of the indexing application out of Solr, which gives me the desired flexibility, but I am a bit concerned about the time it takes to do so. Right now I have about 20,000 XML documents

Re: indexing multiple levels of data

2018-11-16 Thread Jan Høydahl
Hi Martin, For a complex use case such as this, I would recommend that you write a separate indexer application that crawls the files, looks up the correct metadata XMLs based on given business rules, and then constructs the full Solr document to send to Solr. Even parsing full text from PDF etc. I would
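A minimal sketch of the kind of external indexer Jan describes, assuming a hypothetical XML layout: case metadata as `<case id="...">` elements with a `<title>` child, and file metadata as `<file id="..." case="..." path="..."/>` elements that point at their case. The real files in this thread will look different, the function and Solr field names are made up for illustration, and full-text extraction is left out entirely.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_solr_docs(case_xml: str, files_xml: str) -> List[dict]:
    """Join each file's metadata with its parent case and emit one flat doc per file."""
    cases: Dict[str, dict] = {}
    for case in ET.fromstring(case_xml).iter("case"):
        cases[case.get("id")] = {
            "case_id": case.get("id"),
            "case_title": case.findtext("title", default=""),
        }

    docs = []
    for f in ET.fromstring(files_xml).iter("file"):
        doc = {"id": f.get("id"), "file_path": f.get("path")}
        # Denormalize: copy the parent case's metadata onto every file document.
        # Files whose case is unknown simply get no case fields.
        doc.update(cases.get(f.get("case"), {}))
        docs.append(doc)
    return docs


case_xml = '<cases><case id="C-1"><title>Permit</title></case></cases>'
files_xml = ('<files><file id="F-1" case="C-1" path="a.pdf"/>'
             '<file id="F-2" case="C-9" path="b.pdf"/></files>')
for d in build_solr_docs(case_xml, files_xml):
    print(d)
```

The resulting list could be serialized with `json.dumps` and posted to the collection's `/update` handler. Flattening case fields onto each file is one design choice; Solr's nested child documents are the main alternative when the case/file hierarchy itself must be queried.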

Re: Extracting important multi term phrases from the text

2018-11-16 Thread David Hastings
Which function of the SKG are you using? significantTerms? On Thu, Nov 15, 2018 at 7:09 PM Alexandre Rafalovitch wrote: > I think the underscore actually comes from the Shingles (parameter > fillerToken). Have you tried setting it to an empty string? > > Regards, >Alex. > On Thu, 15 Nov 2018
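Why an underscore can show up at all: a stop filter that removes a word leaves an empty token position behind, and Lucene's ShingleFilter fills such positions with its filler token, which defaults to `_`. The toy function below imitates only that one behavior in plain Python. It is not Lucene's algorithm: `toy_shingles` is a hypothetical name, it emits no unigrams (ShingleFilter's outputUnigrams defaults to true), and it simply skips shingles that begin or end on an empty position. For example, "hall of fame" with "of" removed becomes the position list `["hall", None, "fame"]`.

```python
from typing import List, Optional


def toy_shingles(tokens: List[Optional[str]], size: int,
                 filler: str = "_", sep: str = " ") -> List[str]:
    """Emit word n-grams of `size`; None marks a position vacated by a stopword.

    Simplifications: no unigrams are produced, and any shingle that would
    start or end on a vacated position is skipped.
    """
    out = []
    for i in range(len(tokens) - size + 1):
        window = tokens[i:i + size]
        if window[0] is None or window[-1] is None:
            continue
        out.append(sep.join(filler if t is None else t for t in window))
    return out


positions = ["hall", None, "fame"]  # "hall of fame" after stopword removal
print(toy_shingles(positions, 3))             # ['hall _ fame']
print(toy_shingles(positions, 3, filler=""))  # ['hall  fame']
```

In this toy, an empty filler removes the underscore but still leaves both separators in place, which is the kind of leftover a follow-up pattern replacement could clean up.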

indexing multiple levels of data

2018-11-16 Thread Martin Frank Hansen (MHQ)
Hi, I am trying to add metadata and files to Solr, but am experiencing some problems. The data is divided into two levels, cases and files. For each case the metadata is given in an XML document, while the metadata for the files is given in another XML document, and the actual files are kept in yet