Re: Question regarding adding documents

2008-01-06 Thread Daniel Noll
On Monday 07 January 2008 11:35:59 chris.b wrote: > is it possible to add a document to an index and, while doing so, get the > terms in that document? If so, how would one do this? :x My first thought would be: when adding fields to the document, use the Field constructors which accept a TokenStream…
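Daniel's suggestion is to tokenize the text yourself first, so you can both record the terms and hand the same stream to the document via the Field constructor that accepts a TokenStream. Since Lucene itself isn't on hand here, this is a minimal plain-Java sketch of the idea, with a trivial lowercase-whitespace tokenizer standing in for a real Analyzer (the class and method names are illustrative, not Lucene API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class CaptureTerms {
    // Stand-in for an analyzer: lowercase whitespace tokenization.
    // In real Lucene you would run the text through your Analyzer once,
    // capture the terms, and pass the resulting TokenStream to
    // new Field(name, tokenStream) so indexing sees the same tokens.
    public static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```

The point is that tokenization happens exactly once, so the term list you keep is guaranteed to match what went into the index.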

Fullwidth alphanumeric characters, plus a question on Korean ranges

2008-01-06 Thread Daniel Noll
Hi all. We discovered that fullwidth letters are not treated as <ALPHANUM> and fullwidth digits are not treated as <NUM>. This in itself is probably easy to fix (including the filter for normalising these back to the normal versions), but while sanity checking the blocks in StandardTokenizer.jj I found some s…
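The normalisation Daniel mentions is mechanical: the fullwidth ASCII variants occupy U+FF01..U+FF5E, each exactly 0xFEE0 above its halfwidth counterpart at U+0021..U+007E. A hedged sketch of such a filter's core mapping, in plain Java rather than as a Lucene TokenFilter:

```java
public class FullwidthNormalizer {
    // Fullwidth ASCII variants live at U+FF01..U+FF5E, offset 0xFEE0
    // from their halfwidth counterparts U+0021..U+007E.
    public static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                c = (char) (c - 0xFEE0);
            } else if (c == '\u3000') { // ideographic space -> ASCII space
                c = ' ';
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

Wrapped in a TokenFilter, this would let the tokenizer's existing letter/digit rules apply to the normalised text.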

Question regarding adding documents

2008-01-06 Thread chris.b
Is it possible to add a document to an index and, while doing so, get the terms in that document? If so, how would one do this? :x thanks :) -- View this message in context: http://www.nabble.com/Question-regarding-adding-documents-tp14656336p14656336.html Sent from the Lucene - Java Users mail…

Re: Merging Lucene documents

2008-01-06 Thread Erick Erickson
Well, I wonder if offsets mean anything in this context. I suppose another way of asking this is "why do you care?". What is it that you're trying to make happen? Or prevent from happening? The offsets are used for a variety of purposes, but what does it mean for two offsets to be "close" in this…

Re: Merging Lucene documents

2008-01-06 Thread Developer Developer
Hi Erick, No, you are not off base. You are on track, but here is my problem. I have a requirement to create one Lucene document per site, i.e. suppose I crawl www.xxx.com, which has 1000 pages in it. If I use Nutch then it will create 1000 Lucene documents, i.e. 1 document per page. My requirement is…

Re: Merging Lucene documents

2008-01-06 Thread Erick Erickson
I don't get what you mean about extracting tokenstreams. Tokenstreams are, as far as I understand, an analysis-time class. That is, either when originally indexing the document or when analyzing a query. If you do not have the entire document stored in the index, you have to do something like reco…

Merging Lucene documents

2008-01-06 Thread Developer Developer
Hello Friends, I have a unique requirement: merging two or more Lucene indexed documents into just one indexed document. For example: Document newDocument = doc1 + doc2 + doc3. In order to do this I am planning to extract token streams from each document (i.e. doc1, doc2 and doc3), and use them to…
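As Erick's replies in this thread suggest, the practical route is not extracting token streams but re-indexing: read the stored field values of the per-page documents and add them as one new document. A hedged sketch of that merge step, modelling each "document" as a map of stored field name to text (class and method names here are illustrative, not Lucene or Nutch API):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DocMerger {
    // Concatenates the text of like-named stored fields across documents,
    // which is roughly what re-adding the stored values of doc1..docN
    // as the fields of one new Lucene Document would do.
    public static Map<String, String> merge(List<Map<String, String>> docs) {
        Map<String, StringBuilder> acc = new LinkedHashMap<>();
        for (Map<String, String> doc : docs) {
            for (Map.Entry<String, String> e : doc.entrySet()) {
                StringBuilder sb =
                    acc.computeIfAbsent(e.getKey(), k -> new StringBuilder());
                if (sb.length() > 0) sb.append(' ');
                sb.append(e.getValue());
            }
        }
        Map<String, String> merged = new LinkedHashMap<>();
        acc.forEach((k, v) -> merged.put(k, v.toString()));
        return merged;
    }
}
```

Note this only works if the fields were stored; fields that were indexed but not stored cannot be recovered this way, which is the caveat Erick raises.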

Query processing with Lucene

2008-01-06 Thread Marjan Celikik
Dear all, maybe this topic has already been discussed (if so, can I get a reference please?)... I would like to know how Lucene actually processes a query. For example, take a 2-word query "x y". Does Lucene fetch the lists of "x" and "y" and intersect them, or does it do something more fancy, f…
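The baseline Marjan describes is the textbook approach for a conjunctive (AND) query: walk the two sorted posting lists of document ids in lockstep and keep the ids present in both. A minimal sketch (Lucene's actual scorers add skipping and scoring on top of this, so treat it as the conceptual baseline only):

```java
import java.util.ArrayList;
import java.util.List;

public class Intersect {
    // Classic merge-style intersection of two sorted doc-id lists,
    // as used to evaluate an AND of two terms.
    public static List<Integer> intersect(int[] xDocs, int[] yDocs) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < xDocs.length && j < yDocs.length) {
            if (xDocs[i] == yDocs[j]) {
                out.add(xDocs[i]);
                i++;
                j++;
            } else if (xDocs[i] < yDocs[j]) {
                i++;
            } else {
                j++;
            }
        }
        return out;
    }
}
```

This runs in time linear in the combined list lengths; skip lists let an implementation do better when one list is much shorter than the other.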

Re: CachingWrapperFilter: why cache per IndexReader?

2008-01-06 Thread Timo Nentwig
On Wednesday 02 January 2008 08:03:48 Chris Hostetter wrote: > 1) there is a semi-articulated goal of moving away from "under the > covers" weakref caching to more explicit and controllable caching ... YES! BTW, why has caching been removed from QueryFilter at all? Isn't caching the only sens…
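The reason the cache is keyed per IndexReader is that a filter's cached doc-id set is only meaningful against the reader it was computed from: internal document numbers change when the index changes. A hedged sketch of the weak-reference pattern the thread is discussing, in plain Java (names are illustrative; this is not the CachingWrapperFilter source):

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

public class PerReaderCache {
    // Cached bit sets are valid only for the reader they were computed
    // against, so the cache is keyed by reader. WeakHashMap lets an entry
    // be dropped once the reader itself is garbage collected -- the
    // "under the covers" weakref caching Hostetter refers to.
    private final Map<Object, BitSet> cache = new WeakHashMap<>();

    public synchronized BitSet getOrCompute(Object reader,
                                            Function<Object, BitSet> filter) {
        return cache.computeIfAbsent(reader, filter);
    }
}
```

The trade-off the thread debates is exactly this implicitness: weak references make eviction automatic but invisible, whereas an explicit cache gives the application control over lifetime.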