On Monday 07 January 2008 11:35:59 chris.b wrote:
> is it possible to add a document to an index and, while doing so, get the
> terms in that document? If so, how would one do this? :x
My first thought would be: when adding fields to the document, use the Field
constructors which accept a TokenStream
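The idea above can be sketched without Lucene at all: wrap the token source in a pass-through layer that records each term as the indexer consumes it. This is a minimal, Lucene-free sketch of the caching idiom (the class and method names here are illustrative, not Lucene's actual API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// A pass-through token source that records every token it emits,
// mimicking what a caching TokenStream wrapper does at indexing time.
public class RecordingTokenSource implements Iterator<String> {
    private final Iterator<String> delegate;
    private final List<String> seen = new ArrayList<>();

    public RecordingTokenSource(Iterator<String> delegate) {
        this.delegate = delegate;
    }

    @Override public boolean hasNext() { return delegate.hasNext(); }

    @Override public String next() {
        String token = delegate.next();
        seen.add(token);          // remember the term on its way to the index
        return token;
    }

    public List<String> seenTerms() { return seen; }

    public static void main(String[] args) {
        RecordingTokenSource src =
            new RecordingTokenSource(Arrays.asList("quick", "brown", "fox").iterator());
        while (src.hasNext()) src.next();    // the indexer would consume tokens here
        System.out.println(src.seenTerms()); // [quick, brown, fox]
    }
}
```

After indexing, `seenTerms()` holds exactly the terms that went into the document, which is what the original question asks for.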
Hi all.
We discovered that fullwidth letters are not treated as letters and fullwidth
digits are not treated as digits.
This in itself is probably easy to fix (including the filter for normalising
these back to the normal versions) but while sanity checking the blocks in
StandardTokenizer.jj I found some s
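The normalising filter mentioned above is straightforward, because the fullwidth ASCII variants (U+FF01..U+FF5E) sit at a fixed offset of 0xFEE0 from their halfwidth counterparts (U+0021..U+007E). A minimal sketch of the mapping, independent of any Lucene filter class:

```java
// Map fullwidth ASCII variants back to their halfwidth counterparts;
// the two Unicode ranges differ by a fixed offset of 0xFEE0.
public class FullwidthNormalizer {
    public static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\uFF01' && c <= '\uFF5E') {
                sb.append((char) (c - 0xFEE0)); // e.g. 'Ａ' (U+FF21) -> 'A' (U+0041)
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("ＡＢＣ１２３")); // ABC123
    }
}
```

In a real analyzer chain this logic would live in a TokenFilter so both fullwidth letters and digits are folded before tokenization decisions are made.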
is it possible to add a document to an index and, while doing so, get the
terms in that document? If so, how would one do this? :x
thanks :)
--
View this message in context:
http://www.nabble.com/Question-regarding-adding-documents-tp14656336p14656336.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
Well, I wonder if offsets mean anything in this context. I suppose another
way of asking this is "why do you care?". What is it that you're trying
to make happen? Or prevent from happening?
The offsets are used for a variety of purposes, but what does it mean
for two offsets to be "close" in this
Hi Eric,
No, you are not off base. You are on track, but here is my problem.
I have a requirement to create one Lucene document per site, i.e. suppose I
crawl www.xxx.com, which has 1000 pages in it. If I use Nutch, it will
create 1000 Lucene documents, i.e. one document per page. My requirement is
I don't get what you mean about extracting tokenstreams. Tokenstreams
are, as far as I understand, an analysis-time class. That is, either when
originally indexing the document or when analyzing a query.
If you do not have the entire document stored in the index, you have to
do something like reco
Hello Friends,
I have a unique requirement of merging two or more Lucene indexed documents
into just one indexed document. For example:
Document newDocument = doc1 + doc2 + doc3
In order to do this I am planning to extract TokenStreams from each document
(i.e. doc1, doc2 and doc3), and use them to
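Conceptually, merging the documents' term streams amounts to concatenating them in order. A minimal Lucene-free sketch of that step (real code would wrap the result in a TokenStream and hand it to a Field, and would also need to shift positions and offsets of the later documents so they don't collide with the earlier ones):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Concatenate the token lists of several source documents into the token
// list of one merged document, preserving per-document order.
public class DocumentMerger {
    public static List<String> merge(List<List<String>> docs) {
        List<String> merged = new ArrayList<>();
        for (List<String> doc : docs) {
            merged.addAll(doc);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("page", "one");
        List<String> doc2 = Arrays.asList("page", "two");
        System.out.println(merge(Arrays.asList(doc1, doc2))); // [page, one, page, two]
    }
}
```

This is also one way to satisfy the one-document-per-site requirement above: analyze each crawled page, then feed the concatenated stream into a single Lucene document for the site.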
Dear all,
Maybe this topic has already been discussed (if so, can I get a reference
please?)... I would like to know how Lucene actually processes the
query. For example, take a 2-word query "x y". Does Lucene fetch the
lists of "x" and "y" and intersect them, or does it do something more
fancy, f
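The baseline the question describes is the classic two-pointer intersection of sorted postings lists: for a conjunctive "x AND y", walk both doc-ID lists once, advancing whichever pointer holds the smaller ID. (Real engines layer optimizations such as skip lists on top of this; the sketch below is only the basic algorithm.)

```java
import java.util.ArrayList;
import java.util.List;

// Two-pointer intersection of two sorted postings lists (doc IDs).
public class PostingsIntersect {
    public static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {      // doc contains both terms
                out.add(a[i]);
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;                 // advance the list with the smaller doc ID
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] x = {1, 4, 7, 9};
        int[] y = {2, 4, 9, 12};
        System.out.println(intersect(x, y)); // [4, 9]
    }
}
```

The whole walk is linear in the combined list lengths, which is why postings are kept sorted by doc ID in the first place.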
On Wednesday 02 January 2008 08:03:48 Chris Hostetter wrote:
> 1) there is a semi-articulated goal of moving away from "under the
> coveres" weakref caching to more explicit and controllable caching ...
YES!
BTW, why has caching been removed from QueryFilter at all? Isn't caching the
only sens
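For readers unfamiliar with the "under the covers" weakref caching being debated: the usual idiom is a map from index reader to computed filter bits, keyed weakly so entries disappear once the reader itself is garbage-collected. A minimal sketch using `java.util.WeakHashMap` (the class and method names here are illustrative, not Lucene's actual API):

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

// Cache a computed value per reader, keyed weakly: when the reader is no
// longer referenced anywhere else, its cache entry becomes collectible too.
public class WeakFilterCache<R, V> {
    private final Map<R, V> cache = new WeakHashMap<>();

    public synchronized V get(R reader, Function<R, V> compute) {
        // computeIfAbsent runs the function only on a cache miss.
        return cache.computeIfAbsent(reader, compute);
    }

    public static void main(String[] args) {
        WeakFilterCache<Object, String> cache = new WeakFilterCache<>();
        Object reader = new Object();
        String first = cache.get(reader, r -> "bits-for-reader");
        String second = cache.get(reader, r -> "recomputed");
        System.out.println(first == second); // true: second call was a cache hit
    }
}
```

The appeal of an explicit, controllable cache (the goal quoted above) is that eviction becomes a deliberate policy decision rather than a side effect of garbage collection timing.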