Re: Field.omitTF

2008-12-18 Thread John Wang
Thanks Mark for the pointer! -John On Thu, Dec 18, 2008 at 6:13 PM, Mark Miller wrote: > No, not a bug, certainly it's the intended behavior (though the name is a > bit tricky isn't it? I've actually thought about that in the past myself). > If you check out the javadoc on Fieldable you'll find: >

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Mark Miller
Mark Miller wrote: TrieRangeQuery has been added to contrib. Super awesome, super efficient, large scale sorting. Sorry. It's way past my bedtime. Large scale numerical range searching. Sorting on the brain.

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Mark Miller
Well, look at the issues and see for yourself :) It's a subjective call I think. Here's my take: There are not going to be too many sweeping changes in the next release. There are tons of little bug fixes and improvements, but not a lot of the bullet point type stuff that you mention in your wish

Re: optimize: went from 14488449 to 38449

2008-12-18 Thread 1world1love
Ganesh - yahoo wrote: > > Optimize will remove the deletes and rearrange the document numbers. > > Have you done some deletes before deleting 1.3 million docs? > > No, that is the crazy part. I haven't done anything to this index since it was first compiled until I did the deletes. That is

Re: optimize: went from 14488449 to 38449

2008-12-18 Thread Ganesh
Optimize will remove the deletes and rearrange the document numbers. Have you done some deletes before deleting 1.3 million docs? Regards Ganesh - Original Message - From: "1world1love" To: Sent: Friday, December 19, 2008 9:49 AM Subject: optimize: went from 14488449 to 38449 Ok

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Ganesh
Does Lucene 2.9 have real time search? Any improvements in sorting? Any facility to store a payload per document (without updating document)? Please highlight the important features. Regards Ganesh - Original Message - From: "Michael McCandless" To: Sent: Friday, December 19, 2008 3:

optimize: went from 14488449 to 38449

2008-12-18 Thread 1world1love
Ok. This is crazy. I have an index with 14,488,449 docs in it. Today I did a CheckIndex on it and everything looked fine. I made a copy of the index, ran a delete on about 1.3 million docs and then did an optimize and now my doc count is 38449. The index was originally built with 2.3, but I am no

Re: Field.omitTF

2008-12-18 Thread Mark Miller
No, not a bug, certainly it's the intended behavior (though the name is a bit tricky isn't it? I've actually thought about that in the past myself). If you check out the javadoc on Fieldable you'll find: /** Expert: * * If set, omit term freq, positions and payloads from postings for this f

Re: Field.omitTF

2008-12-18 Thread John Wang
Thanks Mark! I don't think it is documented (at least not in the docs I've read). Should this be considered a bug or ... ? Thanks -John On Thu, Dec 18, 2008 at 2:05 PM, Mark Miller wrote: > Drops positions as well. > > - Mark > > > > On Dec 18, 2008, at 4:57 PM, "John Wang" wrote: > > Hi: >> In

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Michael McCandless
Well... there are a couple threads on java-dev discussing this "now": http://www.nabble.com/2.9-3.0-plan---Java-1.5-td20972994.html http://www.nabble.com/2.9,-3.0-and-deprecation-td20099343.html though they seem to have petered out. Also we have 29 open issues for 2.9: https://issues.a

Re: Field.omitTF

2008-12-18 Thread Mark Miller
Drops positions as well. - Mark On Dec 18, 2008, at 4:57 PM, "John Wang" wrote: Hi: In Lucene 2.4, when Field.omitTF() is called, payload is disabled as well. Is this intentional? My understanding is payload is independent from the term frequencies. Thanks -John

Field.omitTF

2008-12-18 Thread John Wang
Hi: In Lucene 2.4, when Field.omitTF() is called, payload is disabled as well. Is this intentional? My understanding is payload is independent from the term frequencies. Thanks -John

Approximate release date for Lucene 2.9

2008-12-18 Thread Kay Kay
Hi - I am just curious - what is the approximate release target date that we have for Lucene 2.9 (currently in beta in dev)? - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: j

Re: Combining results of multiple indexes

2008-12-18 Thread Erick Erickson
I would recommend, very strongly, that you don't rely on the doc IDs being the same in two different indexes. Doc IDs are just incremented by one for each doc added, but optimization can change the doc ID, and is guaranteed to change at least some of them if there are deletions from your inde
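The renumbering described in this message can be pictured with a toy model. The sketch below is plain Java, not Lucene code: the class name `DocIdRenumbering` and the helper `compact` are hypothetical, and the list index merely stands in for a doc ID. It assumes IDs are assigned in insertion order and that removing deleted docs closes the gaps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only (not Lucene API): the position in the list plays the role of the doc ID.
class DocIdRenumbering {
    // Drops deleted entries; every survivor after a deleted slot moves down to a smaller position.
    static List<String> compact(List<String> docs, boolean[] deleted) {
        List<String> survivors = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (!deleted[i]) {
                survivors.add(docs.get(i));
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("a", "b", "c", "d");
        boolean[] deleted = {false, true, false, false}; // "b" was deleted
        List<String> after = compact(docs, deleted);
        System.out.println("c before: " + docs.indexOf("c"));  // prints c before: 2
        System.out.println("c after: " + after.indexOf("c"));  // prints c after: 1
    }
}
```

Under this model, two indexes that received different deletions would disagree on IDs after compaction, which is why a stable key stored in a field is the safer join handle.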

Re: Combining results of multiple indexes

2008-12-18 Thread Michael McCandless
These results are surprising. I'd expect single IndexWriter with 2 threads to do better than a single thread, but in your test two threads are significantly worse than one. Is it possible there's a bottleneck outside of Lucene in sourcing the documents? How many segments are produced a

RE: double metaphone for misspellings

2008-12-18 Thread Geoff Hendrey
I would think that if the place names are English, which those in Boston would be, then they would be reasonable candidates for soundex and double metaphone. I am considering an approach where I store SOUNDEX, refined SOUNDEX, doublemetaphone, and I'll look into ngram as well, and search against ea

Re: RESOLVED: help: java.lang.ArrayIndexOutOfBoundsException ScorerDocQueue.downHeap

2008-12-18 Thread Paul Elschot
On Wednesday 17 December 2008 22:49:08, 1world1love wrote: > Just an FYI in case anyone runs into something similar. > > Essentially I had indexes that I have been searching from a java > stored procedure in Oracle without issue for a while. All of a sudden, > I started getting the error I alluded

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Hi, I noticed that the doc ID is the same. So, if I have a HitCollector, just collect the doc IDs from both Searchers (for the two indexes) and find the intersection between them, it would work. Also, getting the doc IDs is fast even when there is a large number of hits. Of course, I am using somethin
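The collect-then-intersect idea in this message could look roughly like the following. This is a plain-Java sketch rather than Lucene code: `DocIdIntersection`, `collect` and `intersect` are hypothetical names, and it assumes the doc IDs of the two indexes really do line up, which other replies in this thread caution against. In Lucene 2.4 the bit would presumably be set from inside a HitCollector's `collect(int doc, float score)` callback.

```java
import java.util.BitSet;

// Plain-Java sketch (not Lucene API): one BitSet per searcher, then a bitwise AND.
class DocIdIntersection {
    // Stand-in for what a collector would accumulate: one bit per matching doc ID.
    static BitSet collect(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) {
            bits.set(id);
        }
        return bits;
    }

    // Returns the doc IDs present in both sets, leaving the inputs untouched.
    static BitSet intersect(BitSet a, BitSet b) {
        BitSet result = (BitSet) a.clone();
        result.and(b);
        return result;
    }

    public static void main(String[] args) {
        BitSet first = collect(1, 3, 5, 8);
        BitSet second = collect(3, 4, 8);
        System.out.println(intersect(first, second)); // prints {3, 8}
    }
}
```

A BitSet keeps memory proportional to the highest doc ID rather than to the number of hits, which suits large result sets.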

Re: lucene 2.4 sorting slowness

2008-12-18 Thread Chris Salem
That makes it much faster (<100ms after the first run). Thanks a lot. Also, the index will be updated often throughout the day; will keeping the IndexReader open recognize updates to the index? Sincerely, Chris Salem Development Team Main Sequence Technologies, Inc. PCRecruiter.net - PCRecruite

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Thanks. Yep the code is very easy. However, it takes about 3 mins to complete merging. Looks like I will need to have an out of band merging of indexes once they are closed (planning to store about 50mil entries in each index partition). However, as the data is being indexed, is there any oth

Re: How to search on all fields with one query ?

2008-12-18 Thread Erick Erickson
A lot depends upon what you mean by "search across all fields". For single-term queries, that's pretty straightforward, but for, say, (a AND b) what does it mean to "search across all fields"? Should you get a hit if a appears only in field1 and b appears only in field2? Or should you only get a

Re: TR: How to search on all fields with one query ?

2008-12-18 Thread Paul Libbrecht
I would do query expansion: - receive the query, parse it the way you want (e.g. use QueryParser) - then expand your query along the various fields If using different analyzers per field (e.g. soundex), you'll have to adjust things when coming into the term-query. paul On 18-Dec-08 at 16:0
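The expansion step suggested in this message can be sketched on query strings. The code below is plain Java, not Lucene API: `FieldExpansion` and `expand` are hypothetical names, and it assumes terms are already analyzed, contain no characters that need escaping, and share one analyzer across fields. Lucene 2.4 also ships MultiFieldQueryParser, which covers similar ground.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (not Lucene API): expands each term across every field.
class FieldExpansion {
    // Each term becomes an OR group over all fields; the groups are then ANDed together.
    static String expand(List<String> terms, List<String> fields) {
        List<String> groups = new ArrayList<>();
        for (String term : terms) {
            List<String> perField = new ArrayList<>();
            for (String field : fields) {
                perField.add(field + ":" + term);
            }
            groups.add("(" + String.join(" OR ", perField) + ")");
        }
        return String.join(" AND ", groups);
    }

    public static void main(String[] args) {
        String query = expand(Arrays.asList("a", "b"), Arrays.asList("field1", "field2"));
        System.out.println(query); // prints (field1:a OR field2:a) AND (field1:b OR field2:b)
    }
}
```

With this shape a document matches when `a` appears in one field and `b` in another; requiring both terms in the same field would need the grouping inverted (one AND group per field, ORed together).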

TR: How to search on all fields with one query ?

2008-12-18 Thread DELAVENNE Guillaume
Hi, I'm a beginner with Lucene. I'm working on a Lucene PoC project at Generali France. I have 40 fields (at most ten words per field) in my index of about 6 million documents. I need to search for a word in all fields. Must I create a "content" field with all the information from the other fields?

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Ariel
Thank you, it works very well. Regards Ariel On Thu, Dec 18, 2008 at 8:22 AM, Erick Erickson wrote: > Use the setSort that takes an array of Sort objects... > > On Thu, Dec 18, 2008 at 8:11 AM, Ariel wrote: > > > What I am doing is this: > > > >Sort sort = new Sort(); > >

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Erick Erickson
Use the setSort that takes an array of Sort objects... On Thu, Dec 18, 2008 at 8:11 AM, Ariel wrote: > What I am doing is this: > >Sort sort = new Sort(); >sort.setSort("year", true); >hits = searcher.search(pquery,sort); > > > How I must put my code to sort

Re: How to search documents taking in account the dates ???

2008-12-18 Thread John Byrne
Hi, I think this should do it... SortField dateSortField = new SortField("year", false);//the second argument reverses the sort direction if set to true SortField scoreSortField= new SortField(null, SortField.SCORE, false); // value of null for field, since 'score' is not reall
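The ordering that the two SortField objects in this message aim for (year first, score as tie-breaker) can be modelled without Lucene. The sketch below is plain Java: `YearThenScore`, `Hit` and `sortedIds` are hypothetical names, and it assumes ascending year with higher scores first among equal years. It only illustrates the intended result order, not how Lucene computes it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Plain-Java model (not Lucene API) of "sort by year, then by score".
class YearThenScore {
    static final class Hit {
        final String id;
        final int year;
        final float score;

        Hit(String id, int year, float score) {
            this.id = id;
            this.year = year;
            this.score = score;
        }
    }

    // Primary key: year ascending. Tie-breaker: score descending (most relevant first).
    static final Comparator<Hit> ORDER = new Comparator<Hit>() {
        @Override
        public int compare(Hit a, Hit b) {
            int byYear = Integer.compare(a.year, b.year);
            if (byYear != 0) {
                return byYear;
            }
            return Float.compare(b.score, a.score);
        }
    };

    // Sorts a copy of the hits and returns their ids in result order.
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        Collections.sort(copy, ORDER);
        List<String> ids = new ArrayList<>();
        for (Hit hit : copy) {
            ids.add(hit.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 2007, 0.5f),
                new Hit("b", 2008, 0.9f),
                new Hit("c", 2007, 0.8f));
        System.out.println(sortedIds(hits)); // prints [c, a, b]
    }
}
```

Swapping the arguments of `Integer.compare` would give newest-first ordering, comparable to passing `true` for the reverse flag on the year field.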

Re: Combining results of multiple indexes

2008-12-18 Thread Erick Erickson
You will be stunned at how easy it is. The merging code should be a dozen lines (and that only if you are merging 6 or so indexes) See IndexWriter.addIndexes or IndexWriter.addIndexesNoOptimize Best Erick On Thu, Dec 18, 2008 at 5:03 AM, Preetham Kajekar wrote: > Hi, > I tried out a single

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Ariel
What I am doing is this: Sort sort = new Sort(); sort.setSort("year", true); hits = searcher.search(pquery,sort); How should I write my code to sort first by date and then by score? Greetings Ariel On Thu, Dec 18, 2008 at 4:48 AM, Ian Lea wrote: > Lucene let

RE: double metaphone for misspellings

2008-12-18 Thread Max Metral
Somehow I seem to have missed (and can't find) your original mail, but it seems like you're asking about using double metaphone for place names. We've done this on our site (http://boston.povo.com) for street and place names, and I can't say we've been happy with the results. We're toying with ngr

Re: Order of fields returned by Document.getFields()

2008-12-18 Thread Grant Ingersoll
On Dec 17, 2008, at 11:56 AM, Yonik Seeley wrote: On Wed, Dec 17, 2008 at 10:32 AM, Patrick Johnstone wrote: As I said in the original email, my issue is that I don't think Lucene is returning the fields in the original order anymore. Hmmm, you're right. http://wiki.apache.org/jakarta-luce

Re: Persian (Farsi) Language Analyzer

2008-12-18 Thread Grant Ingersoll
I don't know of any. I'd google for "Persian Lucene" or "Farsi Lucene". When I did that, I did see some researchers who did some experiments w/ Lucene and Persian. On Dec 17, 2008, at 8:12 AM, Ian Vink wrote: I have ported the Java version of the Arabic analyzer recently committed to Lu

Re: IndexReader delete

2008-12-18 Thread Ganesh
I am planning to keep indexing and searching in a single process and expose the search functionality as a service. In any case, i want the deletion to be done by reader, so that it could be reflected immediately in search. If it is done by writer, then i need to commit the changes, reopen the se

Re: addIndexesNoOptimize question

2008-12-18 Thread Michael McCandless
This was an attempt on addIndexesNoOptimize's part to "respect" the maxMergeDocs (which prevents large segments from being merged) you had set on IndexWriter. However, the check was too pedantic, and was removed as of 2.4, under this issue: https://issues.apache.org/jira/browse/LUCENE

Re: Combining results of multiple indexes

2008-12-18 Thread Preetham Kajekar
Hi, I tried out a single IndexWriter used by two threads to index different fields. It is slower than using two separate IndexWriters. These are my findings: All Fields (9) using 1 IndexWriter 1 Thread - 38,000 objects per sec 5 Fields using 1 IndexWriter 1 Thread - 62,000 objects per sec A

Re: IndexReader delete

2008-12-18 Thread Ian Lea
Well, if the indexing is happening in a separate process then that will have locked the index and you won't be able to delete by reader in your search process. I'd suggest passing the deletions to the indexer process. In my experience everything works smoother when all index modifications happen

Re: Cache Used by IndexReader/IndexSearcher

2008-12-18 Thread Ian Lea
Hi Are all the queries broadly similar or are the later ones more complex? What happens if you switch the order and run the later queries first? Any complications like sorting? Has your jvm got enough memory? There is no IndexSearcher cache that you can increase. -- Ian. On Wed, Dec 17, 20

Re: How to search documents taking in account the dates ???

2008-12-18 Thread Ian Lea
Lucene lets you sort by multiple fields, including score. See the javadocs for Sort and SortField, specifically SortField.SCORE. -- Ian. On Wed, Dec 17, 2008 at 8:15 PM, Ariel wrote: > Hi: > This solution has a problem. > The results are sorted by the year criterion but I need that after sort