Upgrading from v2.2.0 to v2.3.2

2008-08-26 Thread Mark Lassau
I am a developer on the JIRA issue tracker, and we are considering upgrading our Lucene version from v2.2.0 to v2.3.2. I have been charged with the risk analysis and project work. I have read the change lists and the bugs reported on the Lucene issue tracker (JIRA, of course ;), and

Re: Upgrading from v2.2.0 to v2.3.2

2008-08-26 Thread tom
AUTOMATIC REPLY Tom Roberts is out of the office till 2nd September 2008. LUX reopens on 1st September 2008.

Deleted document terms

2008-08-26 Thread John Patterson
Hi, I just discovered some strange behaviour with deleted documents. I do a search for documents with a certain query and delete one using IndexWriter.deleteDocuments(Term) using a key for the term. Then I repeat the search and the document is still there because I use a custom HitCollector

Re: How to get all terms with a special field and document after indexed

2008-08-26 Thread Michael McCandless
It sounds like TermVectors may apply here? The TermVectors for a doc are like a miniature inverted index just for that one document. It lets you retrieve all terms and their frequencies, plus optionally offset and position information for each term occurrence. Mike Beijing2008 wrote:
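A sketch of what Mike describes, using the Lucene 2.3-era term vector API (the field name "body", the index path, and doc id 0 are assumptions; the field must have been indexed with term vectors enabled):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.TermFreqVector;

public class ShowTermVector {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("/path/to/index");
        // Works only if the field was indexed with term vectors, e.g.
        // new Field("body", text, Field.Store.NO, Field.Index.TOKENIZED,
        //           Field.TermVector.YES)
        TermFreqVector tfv = reader.getTermFreqVector(0, "body");
        if (tfv != null) {
            String[] terms = tfv.getTerms();         // distinct terms, sorted
            int[] freqs = tfv.getTermFrequencies();  // parallel frequencies
            for (int i = 0; i < terms.length; i++) {
                System.out.println(terms[i] + " x " + freqs[i]);
            }
        }
        reader.close();
    }
}
```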

Re: Clarification about segments

2008-08-26 Thread Michael McCandless
Before 2.3, each doc was in fact a separate segment in memory, and then these segments were merged together to flush a single segment in the Directory. As of 2.3, IndexWriter now writes directly into RAM the data structures that are needed to create the segment, and then flushing the

Re: Deleted document terms

2008-08-26 Thread Michael McCandless
John Patterson wrote: I just discovered some strange behaviour with deleted documents. I do a search for documents with a certain query and delete one using IndexWriter.deleteDocuments(Term) using a key for the term. Then I repeat the search and the document is still there because I

Re: Upgrading from v2.2.0 to v2.3.2

2008-08-26 Thread Michael McCandless
Mark Lassau wrote: I am a developer on the JIRA Issue tracker, and we are considering upgrading our Lucene version from v2.2.0 to v2.3.2. I have been charged with doing the risk analysis, and project work. I have read the change lists, and the bugs reported on the Lucene Issue Tracker

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday, 26 August 2008, Andre Rubin wrote: For some reason, the TermQuery is not returning any results, even when querying for a single word (like on*). Sorry, I meant PrefixQuery. Also, do not add the * to the search string when creating the PrefixQuery. Regards Daniel --
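Daniel's suggestion as a minimal sketch (the field name "label" is taken from later messages in this thread, and the index path is an assumption; note there is no trailing *):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PrefixQuery;

public class PrefixSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        // PrefixQuery takes the bare prefix -- do NOT append "*":
        PrefixQuery q = new PrefixQuery(new Term("label", "on"));
        Hits hits = searcher.search(q);
        System.out.println(hits.length() + " matches");
        searcher.close();
    }
}
```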

Re: Deleted document terms

2008-08-26 Thread Kalani Ruwanpathirana
Hi John, Are you sure you made the id tokenized while indexing? I could overcome this issue by having a tokenized field, which was used for the deletion as below. document.add(new Field("id", id, Field.Store.YES, Field.Index.TOKENIZED)); Thanks On Tue, Aug 26, 2008 at 2:15 PM, Michael

Re: Deleted document terms

2008-08-26 Thread John Patterson
That was the problem - the id was not tokenized. Thanks for your help. Kalani Ruwanpathirana wrote: Hi John, Are you sure you made the id tokenized while indexing? I could overcome this issue by having a tokenized field, which was used for the deletion as below. document.add(new

Re: Deleted document terms

2008-08-26 Thread Michael McCandless
Normally an ID should be indexed as Field.Index.UN_TOKENIZED. Mike John Patterson wrote: That was the problem - the id was not tokenized. Thanks for your help. Kalani Ruwanpathirana wrote: Hi John, Are you sure you made the id tokenized while indexing? I could overcome this issue
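A sketch of Mike's advice for Lucene 2.3: index the id as UN_TOKENIZED so a deleteDocuments(Term) lookup can match it exactly (the index path, field names, and id value are illustrative):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class DeleteById {
    public static void main(String[] args) throws Exception {
        IndexWriter writer = new IndexWriter("/path/to/index",
                new StandardAnalyzer(), true);

        Document doc = new Document();
        // UN_TOKENIZED: the id is indexed verbatim as a single term,
        // so a Term("id", ...) lookup will match it later.
        doc.add(new Field("id", "DOC-123", Field.Store.YES,
                Field.Index.UN_TOKENIZED));
        doc.add(new Field("body", "some searchable text", Field.Store.NO,
                Field.Index.TOKENIZED));
        writer.addDocument(doc);

        // Delete by the exact id term; this works only because the id
        // field was not run through the analyzer at index time.
        writer.deleteDocuments(new Term("id", "DOC-123"));
        writer.close();
    }
}
```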

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
A few more case-sensitivity questions. Based on the discussion on http://markmail.org/message/q7dqr4r7o6t6dgo5 and on this thread, is it right to say that a field, if either UN_TOKENIZED or NO_NORMS-ized, doesn't get analyzed while indexing? Which means we need to case-normalize (down-case)

Add IndexReader#getCommitPoints()

2008-08-26 Thread Noble Paul നോബിള്‍ नोब्ळ्
Is there any way to know what commit points are available in an index? This would be helpful for providing a rollback feature to roll back to a commit point. --Noble

Re: Add IndexReader#getCommitPoints()

2008-08-26 Thread Michael McCandless
Yes -- there's now (in trunk) a static IndexReader.listCommits(Directory) method. Mike Noble Paul നോബിള്‍ नोब्ळ् wrote: Is there any way to know what commit points are available in an index? This would be helpful for providing a rollback feature to roll back to a commit point.
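A sketch of how the trunk API Mike mentions could be used; at the time of this thread it was not in any release, so the exact class names (IndexCommit in particular) are assumptions based on trunk:

```java
import java.util.Collection;
import java.util.Iterator;
import org.apache.lucene.index.IndexCommit;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.store.FSDirectory;

public class ListCommitPoints {
    public static void main(String[] args) throws Exception {
        FSDirectory dir = FSDirectory.getDirectory("/path/to/index");
        // Trunk-only at the time: enumerates every commit point still
        // present in the index (requires a deletion policy that keeps them).
        Collection commits = IndexReader.listCommits(dir);
        for (Iterator it = commits.iterator(); it.hasNext();) {
            IndexCommit commit = (IndexCommit) it.next();
            System.out.println(commit.getSegmentsFileName());
        }
    }
}
```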

Re: SpanQuery and FilteredQuery

2008-08-26 Thread Eran Sevi
Hi Chris, I asked exactly the same question a little while ago and got a pretty good answer from Paul Elschot. Try searching the archives for 'Filtering a SpanQuery'. It was around the 13/5/08. Hope it helps, Eran. On Mon, Aug 25, 2008 at 8:18 PM, Christopher M Collins [EMAIL PROTECTED]wrote:

Re: How do TeeTokenizer and SinkTokenizer work?

2008-08-26 Thread Grant Ingersoll
On Aug 25, 2008, at 7:29 PM, Teruhiko Kurosaka wrote: Thank you, Grant and (Koji) Sekiguchi-san. But I don't understand how the input from reader1 and reader2 are mixed together. Will sink1 first return the reader1 text, and then the reader2 text? It depends on the order the fields are added. If

RE: Case Sensitivity

2008-08-26 Thread Dino Korah
I think I should rephrase my question. [ Context: Using the out-of-the-box StandardAnalyzer for indexing and searching. ] Is it right to say that a field, if either UN_TOKENIZED or NO_NORMS-ized ( field.setOmitNorms(true) ), doesn't get analyzed while indexing? Which means that when we search, it

RE: How to search

2008-08-26 Thread Jiao, Jason (NSN - CN/Cheng Du)
The lucene FAQ says: What wildcard search support is available from Lucene? Lucene supports wild card queries which allow you to perform searches such as book*, which will find documents containing terms such as book, bookstore, booklet, etc. Lucene refers to this type of a query as a 'prefix

Luke issues Unknown format version: -6

2008-08-26 Thread Jiao, Jason (NSN - CN/Cheng Du)
Hi there, I use Luke v0.8.1, which is built on Lucene 2.3.0. First, I ran lucene/demo/IndexFiles to build an index successfully. Then I used Luke to open the index, but Luke reports Unknown format version: -6. I checked the Lucene documentation, which says Lucene 2.3.2 does not contain any new

Re: Luke issues Unknown format version: -6

2008-08-26 Thread Michael McCandless
I think you need to triple-check your CLASSPATH? It seems like you are somehow getting an older version of Luke. The file format definitely did not change from 2.3.0 to 2.3.2. Mike Jiao, Jason (NSN - CN/Cheng Du) wrote: Hi there, I use Luke v0.8.1, which is built on Lucene

Combining Wildcard and Term Queries?

2008-08-26 Thread Chris Bamford
Can you combine these two queries somehow so that they behave like a PhraseQuery? I have a custom query parser which takes a phrase like *at sat and produces a BooleanQuery consisting of a WildcardQuery ('*at') and a TermQuery ('sat'). This works, but matches more widely than expected (by

Re: MultiPhrase search

2008-08-26 Thread Andre Rubin
That worked great! Thanks Daniel. I just have one more use case: I want the same prefix search as before, plus another match in another field. I was using MultiFieldQueryParser.parse(), but then I have the same problem with the One Tw* query, because MultiFieldQueryParser.parse() returns a

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Daniel Naber
On Tuesday, 26 August 2008, Chris Bamford wrote: Can you combine these two queries somehow so that they behave like a PhraseQuery? You can use MultiPhraseQuery, see http://lucene.apache.org/java/2_3_2/api/core/org/apache/lucene/search/MultiPhraseQuery.html Regards Daniel --
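A rough sketch of how MultiPhraseQuery could cover the '*at sat' case from this thread, assuming Lucene 2.3 (the field name "body" is an assumption; a leading wildcard means enumerating every term in the field, which can be slow on large indexes):

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;
import org.apache.lucene.search.MultiPhraseQuery;

public class WildcardPhrase {
    // Builds a query matching phrases like "cat sat" or "mat sat".
    public static MultiPhraseQuery build(IndexReader reader) throws Exception {
        List matches = new ArrayList();
        // Enumerate all terms in the field and keep those ending in "at".
        TermEnum te = reader.terms(new Term("body", ""));
        try {
            do {
                Term t = te.term();
                if (t == null || !t.field().equals("body")) break;
                if (t.text().endsWith("at")) matches.add(t);
            } while (te.next());
        } finally {
            te.close();
        }
        MultiPhraseQuery q = new MultiPhraseQuery();
        // Position 0: any of the *at terms; position 1: exactly "sat".
        q.add((Term[]) matches.toArray(new Term[matches.size()]));
        q.add(new Term("body", "sat"));
        return q;
    }
}
```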

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday, 26 August 2008, Andre Rubin wrote: I just have one more use case: I want the same prefix search as before, plus another match in another field. Not sure if I'm following you, but you can create your own BooleanQuery programmatically, and then add the original PrefixQuery and any

Re: MultiPhrase search

2008-08-26 Thread Andre Rubin
Now I was the one who didn't follow: How do I add a query to an existing query? Let me be more clear on my use case: I have two documents: 1) label:One Two Three type:sequence 2) label:One Two FOUR type:other I want to be able to make the same kind of search as you described earlier using

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Chris Bamford
Daniel, That sounds like what I'm after - but how do I get hold of the IndexReader so I can call IndexReader.terms(Term) ? The code where I am doing this work is getFieldQuery(String field, String queryText) of my custom query parser ... Thanks, - Chris Daniel Naber wrote: On Dienstag,

Re: Combining Wildcard and Term Queries?

2008-08-26 Thread Daniel Naber
On Tuesday, 26 August 2008, Chris Bamford wrote: That sounds like what I'm after - but how do I get hold of the IndexReader so I can call IndexReader.terms(Term)? The code where I am doing this work is getFieldQuery(String field, String queryText) of my custom query parser ... QueryParser

Re: MultiPhrase search

2008-08-26 Thread Daniel Naber
On Tuesday, 26 August 2008, Andre Rubin wrote: Now I was the one who didn't follow: How do I add a query to an existing query? Something like this should work: BooleanQuery bq = new BooleanQuery(); PrefixQuery pq = new PrefixQuery(...); bq.add(pq, BooleanClause.Occur.MUST); TermQuery tq =
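Daniel's truncated fragment, completed into a self-contained sketch (the field names "label" and "type" and the values are taken from Andre's example earlier in the thread):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.TermQuery;

public class CombinedQuery {
    public static BooleanQuery build() {
        BooleanQuery bq = new BooleanQuery();
        // Prefix part: matches "Tw*"-style terms in the label field.
        bq.add(new PrefixQuery(new Term("label", "tw")),
               BooleanClause.Occur.MUST);
        // Exact part: restricts results to a given type.
        bq.add(new TermQuery(new Term("type", "sequence")),
               BooleanClause.Occur.MUST);
        return bq;
    }
}
```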

Re: MultiPhrase search

2008-08-26 Thread Andre Rubin
Thanks again Daniel, It's working now. But for some reason, TermQuery is not working for me (I think because I have special characters in the query). I replaced the TermQuery with the query below and I got the results I was expecting. Thanks String escapedType = QueryParser.escape(type);

lucene 3.0 feature list?

2008-08-26 Thread Darren Govoni
Hi, Sorry if I missed this somewhere or maybe it's not released yet, but I was anxiously curious about Lucene 3.0's expected features/improvements. Is there a list yet? thanks! Darren

Re: lucene 3.0 feature list?

2008-08-26 Thread Karl Wettin
On 27 Aug 2008 at 00:52, Darren Govoni wrote: Hi, Sorry if I missed this somewhere or maybe it's not released yet, but I was anxiously curious about Lucene 3.0's expected features/improvements. Is there a list yet? If everything goes as planned then Lucene 3.0 will be the same as Lucene

system design for big numbers

2008-08-26 Thread Giovanni Mascia
I've been wandering for a while through this list and other Lucene resources on the web trying to figure out the possible outlines of a search solution which could fit my case. But as a Lucene newbie I decided to ask for your help. Now this is the scenario. I am building a webmail application

Re: Upgrading from v2.2.0 to v2.3.2

2008-08-26 Thread Mark Lassau
Mike, Thanks for the prompt response. Michael McCandless wrote: Mark Lassau wrote: I am a developer on the JIRA Issue tracker, and we are considering upgrading our Lucene version from v2.2.0 to v2.3.2. I have been charged with doing the risk analysis, and project work. I have read the

Re: system design for big numbers

2008-08-26 Thread Otis Gospodnetic
Giovanni, You could try the approach you described - one index per user. When I built Simpy (see http://simpy.com ) a few years ago I chose the same approach and I never regretted it. The hardware behind Simpy is very modest, usage is high, and I never had problems with too many indices open

Re: Re: system design for big numbers

2008-08-26 Thread tom
AUTOMATIC REPLY Tom Roberts is out of the office till 2nd September 2008. LUX reopens on 1st September 2008.

Re: Case Sensitivity

2008-08-26 Thread Otis Gospodnetic
Dino, you lost me half-way through your email :( NO_NORMS does not mean the field is not tokenized. UN_TOKENIZED does mean the field is not tokenized. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

Re: Case Sensitivity

2008-08-26 Thread Otis Gospodnetic
Dino, If a field is not tokenized then it is indexed as is. For example: Dino Korah would get indexed just like that. It would not get split into multiple tokens, it would not be lowercased, it would not have any stop words removed from it, etc. Otis -- Sematext -- http://sematext.com/ --
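Otis's point as a sketch (Lucene 2.3-era API; the field names are illustrative): an UN_TOKENIZED field is indexed verbatim as a single term, so any case-normalization has to be done by the application at both index and query time:

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;

public class CaseHandling {
    public static void main(String[] args) {
        Document doc = new Document();
        // Indexed as the single term "Dino Korah" -- no splitting,
        // no lowercasing, no stop-word removal.
        doc.add(new Field("author", "Dino Korah",
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        // For case-insensitive exact lookup, down-case at index time...
        doc.add(new Field("authorLc", "Dino Korah".toLowerCase(),
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        // ...and again at query time:
        TermQuery q = new TermQuery(
                new Term("authorLc", "DINO Korah".toLowerCase()));
        System.out.println(q);
    }
}
```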