continuing gc when too many search threads

2014-09-19 Thread Li Li
I have an index of about 30 million short strings; the index size is about 3 GB on disk. I have given the JVM 5 GB of memory, with default settings, on Ubuntu 12.04 with Sun JDK 7. When I use 20 threads, it's OK. But if I run 30 threads, after a while the JVM does nothing but GC. The Lucene version is 4.10.0.

Case sensitivity

2014-09-19 Thread John Cecere
Is there a way to set up Lucene so that both case-sensitive and case-insensitive searches can be done without having to generate two indexes? -- John Cecere Principal Engineer - Oracle Corporation 732-987-4317 / john.cec...@oracle.com

Re: Case sensitivity

2014-09-19 Thread Paul Libbrecht
two fields?
paul
On 19 Sept. 2014, at 15:07, John Cecere wrote:
> Is there a way to set up Lucene so that both case-sensitive and
> case-insensitive searches can be done without having to generate two indexes?
>
> --
> John Cecere
> Principal Engineer - Oracle Corporation
> 732-987-4317 / j

Re: Case sensitivity

2014-09-19 Thread John Cecere
I've considered this, but there are two problems with it. First of all, it feels like I'm still taking up twice the storage, I'm just doing it using a single index rather than two of them. This doesn't sound like it's buying me anything. The second problem with this is simply that I haven't figu

Re: Case sensitivity

2014-09-19 Thread Ian Lea
PerFieldAnalyzerWrapper is the way to mix and match fields and analyzers. Personally I'd simply store the case-insensitive field with a call to toLowerCase() on the value and equivalent on the search string. You will of course use more storage, but you don't need to store the text contents for bo
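The toLowerCase() suggestion above can be illustrated with a minimal sketch. This is plain Java rather than Lucene code: the class `TwoFieldIndex`, the field names `exact` and `lower`, and the method names are all hypothetical, and each value is treated as a single term with no tokenization. It only shows the idea of indexing the same value twice, once as-is and once lowercased, and lowercasing the query for the case-insensitive field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java stand-in for a two-field index; none of these names are Lucene API.
final class TwoFieldIndex {
    // field name -> (indexed term -> ids of documents holding that term)
    private final Map<String, Map<String, List<Integer>>> postings = new HashMap<>();

    // Index one value under both an exact field and a lowercased field.
    void add(int docId, String value) {
        put("exact", value, docId);
        put("lower", value.toLowerCase(Locale.ROOT), docId);
    }

    // Case-sensitive search: the query is matched as typed.
    List<Integer> searchCaseSensitive(String query) {
        return lookup("exact", query);
    }

    // Case-insensitive search: the query is lowercased the same way as the indexed value.
    List<Integer> searchCaseInsensitive(String query) {
        return lookup("lower", query.toLowerCase(Locale.ROOT));
    }

    private void put(String field, String term, int docId) {
        postings.computeIfAbsent(field, f -> new HashMap<>())
                .computeIfAbsent(term, t -> new ArrayList<>())
                .add(docId);
    }

    private List<Integer> lookup(String field, String term) {
        Map<String, List<Integer>> terms = postings.get(field);
        if (terms == null) {
            return new ArrayList<>();
        }
        List<Integer> ids = terms.get(term);
        return ids == null ? new ArrayList<>() : ids;
    }
}
```

In a real Lucene setup the two fields would live in one index, with PerFieldAnalyzerWrapper (or a manual toLowerCase() call, as described above) deciding how each field is normalized.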

Re: Case sensitivity

2014-09-19 Thread Sujit Pal
Hi John,
Take a look at the PerFieldAnalyzerWrapper. As the name suggests, it allows you to create different analyzers per field.
-sujit
On Fri, Sep 19, 2014 at 6:50 AM, John Cecere wrote:
> I've considered this, but there are two problems with it. First of all, it
> feels like I'm still taki

Re: Quickest way to collect one field from the searched docs....

2014-09-19 Thread Sujit Pal
Hi Shouvik, not sure if you have already considered this, but you could put the database primary key for the record into the index - ie, reverse your insert to do DB first, get the record_id and then add this to the Lucene index as "record_id" field. During retrieval you can minimize the network tr
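The "DB first, then index" ordering described above might look roughly like the following sketch. It is plain Java with in-memory stand-ins, not Lucene or JDBC code: the class `RecordIdFlow`, its methods, and the `text` field are hypothetical; only the `record_id` field name comes from the message.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory illustration only; a real system would use a database and a Lucene index.
final class RecordIdFlow {
    // Hypothetical stand-in for the database table: primary key -> row text.
    private final Map<Long, String> table = new HashMap<>();
    private long nextId = 1;

    // Hypothetical stand-in for the search index: one field map per document.
    private final List<Map<String, String>> index = new ArrayList<>();

    // Insert into the "database" first, then index the text together with the returned key.
    long insert(String text) {
        long recordId = nextId++;
        table.put(recordId, text);

        Map<String, String> doc = new HashMap<>();
        doc.put("text", text);
        doc.put("record_id", Long.toString(recordId));
        index.add(doc);
        return recordId;
    }

    // Return only the record_id values of documents whose text contains the term.
    List<Long> matchingRecordIds(String term) {
        List<Long> ids = new ArrayList<>();
        for (Map<String, String> doc : index) {
            if (doc.get("text").contains(term)) {
                ids.add(Long.parseLong(doc.get("record_id")));
            }
        }
        return ids;
    }
}
```

The point of the ordering is that a search hands back only primary keys, so the full records can be fetched from the database in a separate step.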

Re: Quickest way to collect one field from the searched docs....

2014-09-19 Thread Shouvik Bardhan
Sujit, thanks for the response. I have already done what you said. My issue is that after setting up the data in lucene index and the DB, when a query comes and say it matches 25 million docs in Lucene, then I need to get all the 25 million values of this field (record_id in your example) quickly.

Re: Quickest way to collect one field from the searched docs....

2014-09-19 Thread Neil Bacon
Hi
Have you looked at DocFieldValue / DocField? It's fast for this use case.
Regards
Neil
Sent from my mobile doovalaki
On 20/09/2014 6:44 am, Shouvik Bardhan wrote:
> Sujit, thanks for the response. I have already done what you said. My issue is that after setting up the data in lucene index and