Re: Dealing with special cases in analyser

2010-03-17 Thread Paul Taylor
Grant Ingersoll wrote: On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote: Grant Ingersoll wrote: What's your current chain of TokenFilters? How many exceptions do you expect? That is, could you enumerate them? Very few, yes I could enumerate them, but not sure what exactly y

Re: OutOfMemory ParallelMultisearcher

2010-03-17 Thread Jamie
Hi Ian, thanks for the info. It's difficult to reuse searchers as my users are performing realtime searches, so I need to open an IndexReader for every live search query. I've since tracked the OutOfMemory issue down to sorting on date. I am using too high a precision (down to the second) which i
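Jamie's fix was to lower the date precision used for sorting. The sketch below uses plain Java with no Lucene classes. It formats each timestamp at day rather than second resolution before it would be indexed as the sort field, so many documents share one term. Fewer unique terms generally means a smaller sort cache per field. The `yyyyMMdd` key has the same shape as the string Lucene's `DateTools` produces at `Resolution.DAY`. Whether day resolution is fine enough depends on the application, and the class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DatePrecision {
    // One sort term per distinct second (the high-precision case).
    static final DateTimeFormatter SECOND =
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss").withZone(ZoneOffset.UTC);
    // One sort term per UTC day.
    static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Formats a timestamp as a lexicographically sortable key at the given resolution.
    static String toSortKey(Instant t, DateTimeFormatter resolution) {
        return resolution.format(t);
    }

    // Counts the distinct sort keys (unique terms) a list of timestamps produces.
    static int distinctKeys(List<Instant> times, DateTimeFormatter resolution) {
        Set<String> keys = new HashSet<>();
        for (Instant t : times) {
            keys.add(toSortKey(t, resolution));
        }
        return keys.size();
    }
}
```

Three timestamps from the same day give three keys at second resolution but a single key (`20100317`) at day resolution.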

Re: exact query match?

2010-03-17 Thread Erick Erickson
You might get some joy from WhitespaceAnalyzer, but beware of case and punctuation. You could pre-process your indexing and querying to remove non-alphanumerics. Or you could create your own analyzer, see SynonymAnalyzer in Lucene In Action, and there's another example here: http://mext.at/?p=26.
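Erick's pre-processing suggestion is sketched below in plain Java. It assumes the same function runs on the document text at index time and on the query string at query time. The class name is hypothetical. Lower-casing handles the case problem, and collapsing every run of non-letter, non-digit characters into one space handles punctuation.

```java
import java.util.Locale;

class ExactMatchNormalizer {
    // Lower-cases, turns each run of characters that are neither letters nor digits
    // (Unicode-aware) into a single space, then trims the ends.
    static String normalize(String text) {
        return text.toLowerCase(Locale.ROOT)
                   .replaceAll("[^\\p{L}\\p{N}]+", " ")
                   .trim();
    }

    // Exact match after normalizing both sides identically.
    static boolean exactMatch(String fieldValue, String query) {
        return normalize(fieldValue).equals(normalize(query));
    }
}
```

With this in place, "X-11 Windowing System!" and "x 11 windowing system" normalize to the same string. One option would be to index that normalized string as a single untokenized field, so an exact match becomes a single-term lookup.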

exact query match?

2010-03-17 Thread Joachim De Beule
Hi All, I have a corpus of documents which I want to search for phrases. I only want to get those documents that exactly contain a phrase. for example if: doc1 = "x 11 windowing system" doc2 = "x windowing system" doc3 = "the x 11 windowing system" then I want the query "x 11 windowing system" t
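The message is cut off before it says which documents the query should return. The plain-Java sketch below therefore shows both readings on whitespace tokens. One check asks whether the query tokens appear contiguously and in order anywhere in the document, which is what a zero-slop phrase query tests. The other asks whether the whole document equals the query token for token. The names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PhraseCheck {
    // Splits on runs of whitespace.
    static List<String> tokens(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    // True if the query tokens occur contiguously and in order in the document.
    static boolean containsPhrase(String doc, String query) {
        return Collections.indexOfSubList(tokens(doc), tokens(query)) >= 0;
    }

    // True only if the whole document is exactly the query, token for token.
    static boolean equalsExactly(String doc, String query) {
        return tokens(doc).equals(tokens(query));
    }
}
```

For the query "x 11 windowing system", `containsPhrase` accepts doc1 and doc3 and rejects doc2, while `equalsExactly` accepts only doc1.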

RE: Increase number of available positions?

2010-03-17 Thread Steven A Rowe
Hi Rene, On 03/17/2010 at 11:17 AM, Rene Hackl-Sommer wrote: > [quoted XML query, markup lost in the archive: t293, t4979, L_2; t293, t4979, L_3] > Shouldn't this query only leave documents, where t293 and t4979 are in > the same L_2, but not within the same L_3? I'

Re: Get info whether a field is multivalued

2010-03-17 Thread Stefan Trcek
On Wednesday 17 March 2010 18:42:10 mark harwood wrote: > Not the fastest thing in the world but works: > > Term startTerm=new Term("myFieldName",""); > TermEnum te=reader.terms(startTerm); > BitSet docsRead=new BitSet(reader.maxDoc()); >

Re: Get info whether a field is multivalued

2010-03-17 Thread mark harwood
Not the fastest thing in the world but works: Term startTerm=new Term("myFieldName",""); TermEnum te=reader.terms(startTerm); BitSet docsRead=new BitSet(reader.maxDoc()); boolean multiValued=false;
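The idea in mark's snippet is to walk every term of the field and mark each document it appears in, using a `BitSet`. A document that is seen a second time, under a different term, has more than one value. The plain-Java simulation below replaces the TermEnum/TermDocs walk with a hypothetical map from term to document IDs so it runs without Lucene. The check counts distinct terms per document, so it is only meaningful for untokenized fields. A single tokenized value that produces several terms would also be reported as multivalued.

```java
import java.util.BitSet;
import java.util.List;
import java.util.Map;

class MultiValuedCheck {
    // postings: term text -> IDs of documents containing that term in the field
    // (a stand-in for iterating TermEnum/TermDocs over one field).
    static boolean isMultiValued(Map<String, List<Integer>> postings, int maxDoc) {
        BitSet docsRead = new BitSet(maxDoc);
        for (List<Integer> docs : postings.values()) {
            for (int doc : docs) {
                if (docsRead.get(doc)) {
                    return true; // already seen under another term: more than one value
                }
                docsRead.set(doc);
            }
        }
        return false;
    }
}
```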

Get info whether a field is multivalued

2010-03-17 Thread Stefan Trcek
Hello, is there an API that indicates whether a field is multivalued, just like IndexReader.getFieldNames(IndexReader.FieldOption fldOption) does it for fields being indexed/stored/termvector? Of course I could track it at index time. Stefan
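Stefan's fallback of tracking this at index time could look like the sketch below. It assumes the indexing code can see each document's field values as a map before it builds the Lucene `Document`. How the resulting set is persisted alongside the index is left open, and the class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class MultiValuedTracker {
    // Names of fields that had more than one value in at least one document.
    private final Set<String> multiValued = new TreeSet<>();

    // Call once per document, passing field name -> values added to that document.
    void record(Map<String, List<String>> fieldValues) {
        for (Map.Entry<String, List<String>> e : fieldValues.entrySet()) {
            if (e.getValue().size() > 1) {
                multiValued.add(e.getKey());
            }
        }
    }

    Set<String> multiValuedFields() {
        return Collections.unmodifiableSet(multiValued);
    }
}
```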

RE: Batch Indexing - best practice?

2010-03-17 Thread Murdoch, Paul
Thanks. Timing the different parts of the indexing process led me to the real cause of the problem. I wasn't reusing my threaded indexWriter. By keeping the indexWriter open, I'm now able to index 500 documents in less than 1 second. That's a huge improvement. Thanks again, Paul

Re: Dealing with special cases in analyser

2010-03-17 Thread Grant Ingersoll
On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote: > Grant Ingersoll wrote: >> What's your current chain of TokenFilters? How many exceptions do you >> expect? That is, could you enumerate them? >> > Very few, yes I could enumerate them, but not sure what exactly you are > suggesting, what I

Re: Dealing with special cases in analyser

2010-03-17 Thread Paul Taylor
Grant Ingersoll wrote: What's your current chain of TokenFilters? How many exceptions do you expect? That is, could you enumerate them? Very few, yes I could enumerate them, but not sure what exactly you are suggesting, what I was going to do would be add to the charConvertMap (when I pos
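Grant's question about whether the exceptions can be enumerated points towards checking an explicit exception list before the general character conversion runs. The sketch below is plain Java. In a real analyzer this logic would live inside a custom TokenFilter. The protected tokens and character mappings in the usage line are hypothetical examples, not Paul's actual special cases or charConvertMap.

```java
import java.util.Map;
import java.util.Set;

class SpecialCaseNormalizer {
    private final Set<String> protectedTokens;    // enumerated exceptions, left untouched
    private final Map<Character, String> charMap; // general character conversions

    SpecialCaseNormalizer(Set<String> protectedTokens, Map<Character, String> charMap) {
        this.protectedTokens = protectedTokens;
        this.charMap = charMap;
    }

    String normalize(String token) {
        if (protectedTokens.contains(token)) {
            return token; // special case: skip conversion entirely
        }
        StringBuilder out = new StringBuilder();
        for (char c : token.toCharArray()) {
            String mapped = charMap.get(c);
            // Apply the mapping if there is one, otherwise just lower-case the character.
            out.append(mapped != null ? mapped : String.valueOf(Character.toLowerCase(c)));
        }
        return out.toString();
    }
}
```

For example, with `new SpecialCaseNormalizer(Set.of("!!!"), Map.of('!', "", '&', "and"))`, the token "!!!" survives intact, while "Rock&Roll" becomes "rockandroll" and "Help!" becomes "help".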

Re: Increase number of available positions?

2010-03-17 Thread Rene Hackl-Sommer
Hi, I was looking at SpanNotQuery to see if I could make do without the position increment gaps. A search requirement that's causing me some trouble to implement is when two terms are supposed to be on the same L_2, yet on different L_3's (L_3's are hierarchically below L_2). With the positi

Re: Dealing with special cases in analyser

2010-03-17 Thread Grant Ingersoll
What's your current chain of TokenFilters? How many exceptions do you expect? That is, could you enumerate them? On Mar 12, 2010, at 5:27 AM, Paul Taylor wrote: > Hi, I'm using a custom analyser based on standardanalyser with good results > to search artists (i.e. rolling stones/beatles) but i

London open-source search social - 6th April

2010-03-17 Thread Richard Marr
Hi all, We're meeting up at the Elgin just by Ladbroke Grove on the 6th for a bit of relaxed chat about search, and related technology. Come along, we're nice. http://www.meetup.com/london-search-social/calendar/12781861/ It's a regular event, so if you want prior warning about future meetups you

Re: OutOfMemory ParallelMultisearcher

2010-03-17 Thread Ian Lea
Hi, Caching searchers at some level should help keep memory usage down - and will help performance too. Searchers themselves don't generally consume large amounts of memory, but if you've got loads of them then obviously things will add up. Unless you can change the whole design of your app (sin
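Ian's suggestion to cache searchers need not conflict with Jamie's real-time requirement. A searcher can be reused until the index actually changes. The minimal sketch below is generic over the searcher type so it runs without Lucene. It assumes the index version comes from something like `IndexReader.getCurrentVersion(Directory)` in the 2.x/3.x API, and the opener function is hypothetical. A real implementation must also close the old reader once in-flight queries on it finish, which this sketch omits.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SearcherCache<S> {
    // A cached searcher plus the index version it was opened against.
    private static final class Entry<T> {
        final long version;
        final T searcher;
        Entry(long version, T searcher) {
            this.version = version;
            this.searcher = searcher;
        }
    }

    private final ConcurrentHashMap<String, Entry<S>> cache = new ConcurrentHashMap<>();
    private final Function<String, S> opener; // opens a searcher for an index path

    SearcherCache(Function<String, S> opener) {
        this.opener = opener;
    }

    // Returns the cached searcher for indexPath, opening a new one only when the
    // index version differs from the one the cached searcher was opened at.
    // NOTE: the replaced searcher is not closed here.
    S get(String indexPath, long currentVersion) {
        Entry<S> e = cache.compute(indexPath, (path, old) ->
                (old != null && old.version == currentVersion)
                        ? old
                        : new Entry<S>(currentVersion, opener.apply(path)));
        return e.searcher;
    }
}
```

Repeated queries against an unchanged index share one searcher, and only a version change triggers a new open.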

Re: score and multiValued fields

2010-03-17 Thread Marc Sturlese
Confirmed, supposition 2 is the right one. Erick Erickson wrote: > > Have you looked at: > http://lucene.apache.org/java/2_4_0/scoring.html > > even though it's for > 2.4, > I don't think there's any relevant changes for 3.x... > > I'm pretty