IndexWriter.MaxFieldLength.UNLIMITED at what price?

2009-12-10 Thread Rob Staveley (Tom)
I was wondering where I might read about the cost of using IndexWriter.MaxFieldLength.UNLIMITED versus IndexWriter.MaxFieldLength.LIMITED. Are there any consequences over and above the obvious one that you are going to analyse more content in your IndexWriter when you have more than 10,000 chara

Re: IndexWriter.MaxFieldLength.UNLIMITED at what price?

2009-12-10 Thread Michael McCandless
LIMITED is basically an insurance policy, protecting you from accidentally indexing an immense document, leading to OOME. It also protects you in case your analyzer is accidentally letting in bogus terms (say, if you indexed a large exe file, or there was a large base64-encoded attachment on an em
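
The "insurance policy" idea can be illustrated with a small plain-Java model. This is only a sketch and does not use the Lucene API: the class name, the method `tokensToIndex`, and the whitespace tokenization are assumptions made for illustration, and the cap of 10,000 is taken from the figure mentioned in the question. It shows how a cap silently drops everything past the limit, while an unlimited setting analyses the whole input.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of a per-field token cap; NOT the Lucene implementation.
class FieldLengthCapSketch {
    // Hypothetical constants mirroring the LIMITED / UNLIMITED choice in the thread.
    static final int LIMITED = 10_000;
    static final int UNLIMITED = Integer.MAX_VALUE;

    // Keeps at most maxTokens whitespace-separated tokens; later tokens are dropped.
    static List<String> tokensToIndex(String text, int maxTokens) {
        List<String> kept = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty string from split
            }
            if (kept.size() >= maxTokens) {
                break; // cap reached: the rest of the document is ignored
            }
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 25_000; i++) {
            sb.append("t").append(i).append(' ');
        }
        String huge = sb.toString();
        System.out.println("LIMITED keeps   " + tokensToIndex(huge, LIMITED).size());
        System.out.println("UNLIMITED keeps " + tokensToIndex(huge, UNLIMITED).size());
    }
}
```

The trade-off in the model matches the one described in the reply: the cap bounds memory and work per document, at the price of making the tail of a very long document unsearchable.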

Re: heap memory issues when sorting by a string field

2009-12-10 Thread Michael McCandless
The big challenge w/ a global sort ords is handling reopen, ie, for apps that need to frequently reopen their index. If you have a static index, then this approach is a good one -- you make a one time investment, after indexing, to compute your global ords, store them on disk. Then at searching y

Re: Problem searching field with % as value

2009-12-10 Thread kanayo
Hi, Thanks Ian for your reply. What I did was to put a check in my indexing such that when the string to index is just % it is stored un_analyzed, and otherwise it is analyzed. I then use a TermQuery to search for it, and it is now working. Thanks a lot for your tip. Cheers. Ian Lea wrote: > > If y

Re: heap memory issues when sorting by a string field

2009-12-10 Thread Michael McCandless
On Thu, Dec 10, 2009 at 2:05 AM, Ganesh wrote: > I think, This problem will happen for all sorted fields. I am sorting on > integer field. Integer field should take much less RAM than String, today, for sorting. And there's no efficiency gained by doing this globally (per segment is just fine).

RE: IndexWriter.MaxFieldLength.UNLIMITED at what price?

2009-12-10 Thread Rob Staveley (Tom)
Many thanks, Mike. UNLIMITED is right for me then. Happily, it is a reasonably controlled environment. -Original Message- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: 10 December 2009 10:03 To: java-user@lucene.apache.org Subject: Re: IndexWriter.MaxFieldLength.UNLIMI

Re: heap memory issues when sorting by a string field

2009-12-10 Thread Toke Eskildsen
Regarding LUCENE-1990, I've been experimenting a bit with packed positive integer arrays. I'd love to take a look at it, but I simply do not have the time before next year. My approach was to pack the bits right after each other, but it does present certain challenges. Going for smallest possib
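
A minimal sketch of what "packing the bits right after each other" could look like, assuming a fixed number of bits per value stored back to back in a `long[]`. This is not the LUCENE-1990 code; the class name `PackedIntsSketch` and its methods are hypothetical. The challenge the message alludes to shows up in the branch that handles a value straddling two 64-bit blocks.

```java
// Fixed-width positive integers stored contiguously in a long[]; illustration only.
class PackedIntsSketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedIntsSketch(int valueCount, int bitsPerValue) {
        if (bitsPerValue < 1 || bitsPerValue > 63) {
            throw new IllegalArgumentException("bitsPerValue must be in 1..63");
        }
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        long totalBits = (long) valueCount * bitsPerValue;
        this.blocks = new long[(int) ((totalBits + 63) / 64)];
    }

    void set(int index, long value) {
        if (value < 0 || value > mask) {
            throw new IllegalArgumentException("value does not fit in " + bitsPerValue + " bits");
        }
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which long holds the first bit
        int offset = (int) (bitPos & 63);   // bit offset inside that long
        blocks[block] = (blocks[block] & ~(mask << offset)) | (value << offset);
        int spill = offset + bitsPerValue - 64; // bits that overflow into the next long
        if (spill > 0) {
            long highMask = (1L << spill) - 1;
            int lowBits = bitsPerValue - spill; // bits already stored in the first long
            blocks[block + 1] = (blocks[block + 1] & ~highMask) | (value >>> lowBits);
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int offset = (int) (bitPos & 63);
        long value = blocks[block] >>> offset;
        int spill = offset + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    int blockCount() {
        return blocks.length;
    }

    public static void main(String[] args) {
        PackedIntsSketch ords = new PackedIntsSketch(100, 13);
        for (int i = 0; i < 100; i++) {
            ords.set(i, i * 80L);
        }
        System.out.println("value 99 = " + ords.get(99) + ", longs used = " + ords.blockCount());
    }
}
```

With 13 bits per value, 100 values occupy 21 longs rather than 100 ints, which is the kind of saving that makes the approach attractive for sort ordinals.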

TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Konstantyn Smirnov
Hi all, in Lucene 2.3.2 there was a method in TermEnum, skipTo( term ). In 3.0.0 it's missing... Is there any other way to skip terms? - Konstantyn Smirnov, CTO http://www.poiradar.ru www.poiradar.ru http://www.poiradar.com.ua www.poiradar.com.ua http://www.poiradar.com www.poirad

Re: TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Ian Lea
From the 2.9.1 javadocs: "Deprecated. This method is not performant and will be removed in Lucene 3.0. Use IndexReader.terms(Term) to create a new TermEnum positioned at a given term." It is recommended to upgrade to 2.9.1, fix all the deprecations (see the javadocs) and then upgrade, with a rec

Re: TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Erick Erickson
An easy way to find this kind of thing is to go to the 2.9.1 documentation and see where the deprecation alert sends you. For instance, this is from TermEnum.skipTo(Term target): *Deprecated.* *This method is not performant and will be removed in Lucene 3.0. Use IndexReader.terms(Term)

RE: TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Uwe Schindler
This method was always a linear scan and very slow (because it was never implemented correctly). If you want to skip, close the termenum and retrieve a new one with IndexReader.terms(Term seekTo) [internally e.g. NumericRangeQuery does this, look into source code]. See also the deprecation message
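
The difference between the old linear scan and a fresh seek can be modelled in plain Java. This is a sketch only, not Lucene code: a `TreeSet<String>` stands in for the sorted term dictionary, `scanTo` imitates stepping through an enumeration term by term, and `seekTo` imitates asking for a new enumeration positioned at the first term at or after the target. All names here are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.TreeSet;

// Plain-Java model of "linear scan" versus "seek" over a sorted term dictionary.
class TermSeekSketch {
    // Linear scan: visit terms one at a time until one is >= target.
    static String scanTo(Iterator<String> terms, String target) {
        while (terms.hasNext()) {
            String term = terms.next();
            if (term.compareTo(target) >= 0) {
                return term;
            }
        }
        return null; // ran off the end of the dictionary
    }

    // Seek: let the sorted structure locate the first term >= target directly.
    static String seekTo(TreeSet<String> dictionary, String target) {
        return dictionary.ceiling(target);
    }

    public static void main(String[] args) {
        TreeSet<String> dictionary =
                new TreeSet<>(Arrays.asList("apple", "banana", "cherry", "date"));
        System.out.println(scanTo(dictionary.iterator(), "c"));
        System.out.println(seekTo(dictionary, "c"));
    }
}
```

Both calls land on the same term; the scan touches every term before it, while the seek costs a logarithmic lookup in this model. For a target only a handful of terms ahead the scan may still be cheap, which is the trade-off behind the follow-up question in this thread.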

Re: TermEnum.skipTo in 3.0.0 replacement

2009-12-10 Thread Konstantyn Smirnov
thanks guys! :) another question, what is faster indexReader.terms( t ) or 10 times termEnum.next() ? - Konstantyn Smirnov, CTO http://www.poiradar.ru www.poiradar.ru http://www.poiradar.com.ua www.poiradar.com.ua http://www.poiradar.com www.poiradar.com http://www.poiradar.de www.poirad

"IN" Query for NumericFields

2009-12-10 Thread comparis . ch - Roman Baeriswyl
Hi, I do have some indices where I need to get results based on a fixed number list (not a range) Let's say I have a field named "CategoryID" and I now need all results where "CategoryID" is 1,3 or 7. In Lucene 2.4 I created a QueryParser which looked like: "CategoryID:(1 3 7)". But the Query

Re: "IN" Query for NumericFields

2009-12-10 Thread Shashi Kant
Have you looked at BooleanQuery? Create individual TermQuery and OR them using BooleanQuery. On Thu, Dec 10, 2009 at 10:34 AM, comparis.ch - Roman Baeriswyl < roman.baeris...@comparis.ch> wrote: > Hi, > > I do have some indices where I need to get results based on a fixed number > list (not a ran

RE: "IN" Query for NumericFields

2009-12-10 Thread Uwe Schindler
You can override QP's newTermQuery method. Look into the list archives and search for both keywords. There it is also explained how to use NumericRangeQuery with QP. The ideal solution to hit exact terms is to use NumericRangeQuery with upper and lower bounds identical and inclusive. Uwe - Uwe

RE: "IN" Query for NumericFields

2009-12-10 Thread comparis . ch - Roman Baeriswyl
I tried Query q = new BooleanQuery(); ((BooleanQuery)q).Add(NumericRangeQuery.NewLongRange("CategoryID", 1, 1, true, true), BooleanClause.Occur.MUST); ((BooleanQuery)q).Add(NumericRangeQuery.NewLongRange("CategoryID", 3, 3, true, true), BooleanClause.Occur.MUST); ((BooleanQuery)q).Add(NumericRan

RE: "IN" Query for NumericFields

2009-12-10 Thread Uwe Schindler
Cannot be :-) Is the precstep identical? - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: comparis.ch - Roman Baeriswyl [mailto:roman.baeris...@comparis.ch] > Sent: Thursday, December 10, 2009 5:24 PM > T

RE: "IN" Query for NumericFields

2009-12-10 Thread Uwe Schindler
Sorry, if you have an IN query, it must be BooleanClause.Occur.SHOULD, as the CategoryID can be 1, 3 or 7. Your query should not match any doc (I verified this). - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message-
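
Why MUST clauses cannot express an IN query can be seen in a small plain-Java model. This sketch does not use Lucene: the map from document id to a single CategoryID value and the `match` method are assumptions for illustration. With `requireAll` set (the MUST case) a document would need to equal 1, 3 and 7 at once, so nothing matches; without it (the SHOULD case) any one of the values is enough.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model of MUST (all clauses) versus SHOULD (any clause) semantics.
class InQuerySketch {
    // categoryByDoc maps a document id to its single CategoryID value.
    static Set<Integer> match(Map<Integer, Long> categoryByDoc, boolean requireAll, long... wanted) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, Long> doc : categoryByDoc.entrySet()) {
            int matchedClauses = 0;
            for (long value : wanted) {
                if (doc.getValue() == value) {
                    matchedClauses++;
                }
            }
            boolean ok = requireAll ? matchedClauses == wanted.length : matchedClauses > 0;
            if (ok) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, Long> docs = new LinkedHashMap<>();
        docs.put(0, 1L);
        docs.put(1, 3L);
        docs.put(2, 7L);
        docs.put(3, 5L);
        System.out.println("MUST:   " + match(docs, true, 1, 3, 7));
        System.out.println("SHOULD: " + match(docs, false, 1, 3, 7));
    }
}
```

In the model, the MUST variant returns no documents and the SHOULD variant returns documents 0, 1 and 2, which is the behaviour described in the reply.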

Re: "IN" Query for NumericFields

2009-12-10 Thread Matthew Hall
I suspect he's running the query through an analyzer that is dropping out single digit numerics, which would basically be a query that pulls back everything from the indexes.. or at least I think so. Uwe Schindler wrote: Sorry, if you have an IN query, it must be BooleanClause.Occur.SHOULD, as

Re: MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1

2009-12-10 Thread Karl Wettin
https://issues.apache.org/jira/browse/LUCENE-2144 9 dec 2009 kl. 23.22 skrev Uwe Schindler: This is a bug in InstantiatedIndex. The termDoc(null) was added to get all documents. This was never implemented in Instantiated Index. Can you open an issue? There maybe other queries fail because

Re: MatchAllDocsQuery and InstantiatedIndex on Lucene 2.9.1

2009-12-10 Thread Jason Fennell
Thanks for the fast reply & patch! On Thu, Dec 10, 2009 at 12:20 PM, Karl Wettin wrote: > https://issues.apache.org/jira/browse/LUCENE-2144 > > 9 dec 2009 kl. 23.22 skrev Uwe Schindler: > > > This is a bug in InstantiatedIndex. The termDoc(null) was added to get all >> documents. This was never

Re: Index file compatibility and a migration plan to lucene 3

2009-12-10 Thread Nigel
I have a follow-up question to this thread on Field.Store.COMPRESS in 2.9.1 and beyond. I'm getting a bit confused between the changes in 2.9.1 and 3.0 so I want to make sure I know what's going on. We also use old-style compressed fields and are about to upgrade to 2.9.1. Is the following accur

Recover special terms from StandardTokenizer

2009-12-10 Thread Weiwei Wang
Hi, all, I designed a ftp search engine based on Lucene. I did a few modifications to the StandardTokenizer. My problem is: C++ is tokenized as c from StandardTokenizer and I want to recover it from the TokenStream from StandardTokenizer What should I do? -- Weiwei Wang Alex Wang 王巍巍 Room

Returns nothing when sorting

2009-12-10 Thread Michel Nadeau
Hi ! I have a quite small Lucene 3.0.0 index with around 400,000 documents in it. I'm trying to sort my results like this : TopDocs td; td = searcher.search(q, cluCF, 10, cluSort); ScoreDoc[] hits = td.scoreDocs; My cluCF is a ChainedFilter containing at least one filter, and cluSort is a float