RE: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Doron Cohen
Yes, this is correct: when the doc is updated this way, the unstored fields are lost. This update scenario was also discussed in http://www.nabble.com/Updating-documents-tf70183.html#a189951. "Alan Boshier" <[EMAIL PROTECTED]> wrote on 14/09/2006 15:39:31: > > Thanks for your help - I think I may
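To illustrate the point about unstored fields, here is a minimal toy sketch. It is not the Lucene API: the class UnstoredFieldLossSketch, the ToyField type, and the retrieve and deleteAndReAdd methods are all hypothetical names made up for this illustration. It only assumes what the message states, namely that a document retrieved from the index carries its stored fields and nothing else, so deleting the original and re-adding the retrieved copy drops everything that was unstored.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: these classes are hypothetical and are NOT the Lucene API.
final class UnstoredFieldLossSketch {

    /** A field value plus a flag saying whether the original text is kept for retrieval. */
    static final class ToyField {
        final String value;
        final boolean stored;

        ToyField(String value, boolean stored) {
            this.value = value;
            this.stored = stored;
        }
    }

    /** Models retrieving a document from a search: only stored fields come back. */
    static Map<String, ToyField> retrieve(Map<String, ToyField> indexedDoc) {
        Map<String, ToyField> retrieved = new LinkedHashMap<>();
        for (Map.Entry<String, ToyField> entry : indexedDoc.entrySet()) {
            if (entry.getValue().stored) {
                retrieved.put(entry.getKey(), entry.getValue());
            }
        }
        return retrieved;
    }

    /** Models the update: the old copy is deleted and whatever was retrieved is re-added. */
    static Map<String, ToyField> deleteAndReAdd(Map<String, ToyField> indexedDoc) {
        return retrieve(indexedDoc);
    }

    public static void main(String[] args) {
        Map<String, ToyField> doc = new LinkedHashMap<>();
        doc.put("title", new ToyField("Release notes", true));  // stored
        doc.put("sortKey", new ToyField("20060914", false));    // unstored
        Map<String, ToyField> updated = deleteAndReAdd(doc);
        System.out.println("Fields after update: " + updated.keySet());
    }
}
```

In this model the usual way around the loss is to rebuild the document from the original source data rather than from the retrieved copy, so that the unstored fields can be supplied again.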

Boosting specific Searchable

2006-09-14 Thread Shane
When using the MultiSearcher to search over a set of indexes, I would like to increase the boost factor for documents coming from a specific index. Using the example below, I would like to tell the MultiSearcher to boost documents coming from index0: Searcher[] searchers = new Searcher[3]; se

RE: best way to get specific results

2006-09-14 Thread Lee_Gary
Hi, I have the same situation where I'm interested in returning a subset of results from the whole set, such as results 500 to 550. However, I have already implemented a Filter that will return the results I want without additional query processing needed (i.e. no need to use the IndexSearcher.sear
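The "results 500 to 550" case can be sketched in plain Java, independent of any search library. This is only an illustration under assumptions: the class ResultWindowSketch and its window method are hypothetical names, the ranked results are assumed to be available as an ordinary list, and the range is treated as zero-based with an exclusive upper bound, so 500 to 550 yields 50 entries.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: slices a window out of an already ranked list.
final class ResultWindowSketch {

    /**
     * Returns the entries at positions [from, to) of the ranked list.
     * Positions past the end of the list are clamped, so a window that
     * starts beyond the last result is simply empty.
     */
    static <T> List<T> window(List<T> ranked, int from, int to) {
        if (from < 0 || to < from) {
            throw new IllegalArgumentException("need 0 <= from <= to");
        }
        int start = Math.min(from, ranked.size());
        int end = Math.min(to, ranked.size());
        // Copy the view so the caller does not hold on to the full result list.
        return new ArrayList<>(ranked.subList(start, end));
    }

    public static void main(String[] args) {
        List<Integer> docIds = new ArrayList<>();
        for (int i = 0; i < 600; i++) {
            docIds.add(i);
        }
        List<Integer> page = window(docIds, 500, 550);
        System.out.println("Window size: " + page.size());
    }
}
```

Clamping instead of throwing on an out-of-range window is a design choice for this sketch; a caller that needs to detect "past the last page" can compare the returned size with the requested one.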

RE: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Alan Boshier
Thanks for your help - I think I may have stumbled on the answer but if someone can confirm it I would be most grateful. My guess is that, if we do the following 1. Retrieve a Document instance D from the index using e.g. IndexSearcher.search() 2. Delete the original Document corresponding t

RE: UTF8 accents & umlauts filter?

2006-09-14 Thread Binkley, Peter
We use ICU4J to do the filtering based on Unicode blocks. See http://icu.sourceforge.net/userguide/Transform.html for a sense of what you can do. It's worth it for us because we need to normalize Cyrillic as well as Roman text; it might be overkill for other situations. But it does good work. The f

Re: SV: SV: SV: Changing the Scoring api

2006-09-14 Thread Chris Hostetter
I obviously misunderstood your goal ... my reading of your question was that you wanted the sum of the scores of individual terms (based on the tf and idf) to matter, and you wanted the field norm values of the docs to be taken into account (for "date boosting" purposes), but you did not want doc

Re: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Yonik Seeley
On 9/14/06, Alan Boshier <[EMAIL PROTECTED]> wrote: That was my understanding (that they had to be indexed) but making them stored seems to have fixed the problem we were seeing, which is odd. Not being an expert on how lucene works internally, I'm struggling to see how this change could have ma

Re: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Erick Erickson
from the 2.0 javadoc, the Sort class, so I don't know if it applies. <<>> Is it possible you're tokenizing it? I'm at a loss as to why *storing* it would change the behavior, but I guess it's a possibility. Erick On 9/14/06, Alan Boshier <[EMAIL PROTECTED]> wrote: That was my un

Re: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Yonik Seeley
On 9/14/06, Alan Boshier <[EMAIL PROTECTED]> wrote: Is it a requirement when creating a field for sorting to make it stored? No, stored doesn't matter... it must be indexed though. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server

Field boosting in MemoryIndex

2006-09-14 Thread Garrick Toubassi
Hi, I am playing with MemoryIndex for a situation in which I have a large number of small, ephemeral documents that I need to fire queries at. It appears to be at least 5x faster than RAMDirectory for my usage, which is large enough to be interesting. However MemoryIndex does not seem to support

RE: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Alan Boshier
That was my understanding (that they had to be indexed) but making them stored seems to have fixed the problem we were seeing, which is odd. Not being an expert on how Lucene works internally, I'm struggling to see how this change could have made any difference. -----Original Message----- Fro

RE: Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Mordo, Aviran (EXP N-NANNATEK)
AFAIK, the field has to be indexed, but I don't think it has to be stored (but then again maybe I'm wrong). Aviran http://www.aviransplace.com -----Original Message----- From: Alan Boshier [mailto:[EMAIL PROTECTED] Sent: Thursday, September 14, 2006 11:39 AM To: java-user@lucene.apache.org Subjec

ParallelMultiSearcher and docFreq

2006-09-14 Thread Yura Smolsky
Hello. Here is the situation. I have a ParallelMultiSearcher object initialized with two or more RemoteSearchables. I run a PrefixQuery search on some keyword field, say "link". When I run a search starting with just the letter "w" (link:w*), I should get about 5k results. As far as I know when I perform

Newbie question: lucene sorting problems and stored fields

2006-09-14 Thread Alan Boshier
Hi, we are seeing intermittent problems with searches that use sorted fields (in Lucene 1.4.3). If we add the fields to our Documents as 'unstored' then we start to see results that have been sorted by Document ID. The problem goes away if we add the fields as 'stored'. Is it a requirement when

SV: SV: SV: Changing the Scoring api

2006-09-14 Thread Marcus Falck
Apparently some modifications in the DisjunctionSumScorer class seem to give me exactly what I'm looking for. So it was possible =) -----Original Message----- From: Marcus Falck [mailto:[EMAIL PROTECTED] Sent: 14 September 2006 09:56 To: java-user@lucene.apache.org Subject: SV: S

Using Lucene to index Meta-data from txt, html, PDF etc files.

2006-09-14 Thread Aditya Gollakota
Hi Guys, Just wondering how you would go about indexing meta-data from files. I've used the demo package IndexHTML.java and have updated the HTMLDocument.java with the following: DataInput input = new DataInputStream(new BufferedInputStream(new FileInputStream(f))); Content content = Conten

SV: SV: SV: Changing the Scoring api

2006-09-14 Thread Marcus Falck
Yeah Hoss, you are right, this isn't Java, it's the .NET port. But I have to ask on this mailing list since it contains a lot of people with a lot more insight into Lucene than the .NET user list. And I have a hard time believing that they wouldn't have ported the scoring parts correctly. First