I'm still not sure I understand ...
If the first document includes "Lucene in Action. Lucene" (two sentences,
the 2nd one with Lucene only) and the second "Lucene for Dummies", then what
exactly do you want to get for the queries "\"Lucene in Action\"" and
"\"Lucene\""?
If I understand correctly,
Not good!
Can you describe how your threads work with Lucene?
Is this just a local filesystem (disk) under Vista?
Mike
On Wed, Apr 14, 2010 at 7:41 AM, jm wrote:
> Hi,
>
> I am trying to chase an issue in our code and it is proving quite
> difficult. We have seen two instances (see below) where
I don't think there's an existing tool, but it shouldn't be too hard to create.
Create a new SegmentInfos(), then call its .read(oldDir) to read all
segments. Look up the SegmentInfo(s) you want to copy and call their
.files() methods to see which files to copy. Copy them. Remove all
other segm
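The SegmentInfos-based recipe above needs Lucene itself on the classpath. As a rough standalone sketch of just the copying step, one can exploit the fact that a segment's files share the segment name as a filename prefix. This is a simplification: it ignores the segments_N metadata and shared doc stores, which the SegmentInfos route handles; the class and method names here are illustrative, not Lucene API.

```java
import java.io.IOException;
import java.nio.file.*;

public class ExtractSegments {
    // Simplified sketch: Lucene segment files share the segment name as a
    // filename prefix (e.g. "_ab.cfs", "_ab.fnm"). Copy every file whose
    // name starts with one of the requested segment names. A real tool
    // should use SegmentInfos.read()/.files() as described above, which
    // also rewrites the segments_N metadata for the new index.
    public static int copySegmentFiles(Path oldIndex, Path newIndex,
                                       String... segments) throws IOException {
        Files.createDirectories(newIndex);
        int copied = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(oldIndex)) {
            for (Path file : stream) {
                String name = file.getFileName().toString();
                for (String seg : segments) {
                    if (name.startsWith(seg + ".")) {
                        Files.copy(file, newIndex.resolve(name),
                                   StandardCopyOption.REPLACE_EXISTING);
                        copied++;
                        break;
                    }
                }
            }
        }
        return copied;
    }
}
```

Invoked with the segment names from the proposed command line (e.g. `copySegmentFiles(oldDir, newDir, "_ab", "_g9")`), it copies only those segments' files.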
Is there a program available that makes a new index with one or more
segments from an existing index? (The immediate use case for this is
doing forensics on corrupted indexes.)
The user interface would be:
extract -segments _ab,_g9 oldindex newindex
This would copy the files for segments _ab and
Thanks, the problem was with the tokenizer, which didn't index any numbers, so
I tried writing my own, and it works perfectly! :)
Sincerely,
Kristjan Siimson
On Wed, Apr 14, 2010 at 2:12 PM, Uwe Schindler wrote:
> You can add the terms with Field.Index.NOT_ANALYZED multiple times to the
> same fiel
One addition:
If you are indexing millions of numeric fields, you should also try to reuse
NumericField and Document instances (as described in JavaDocs). NumericField
creates internally a NumericTokenStream and lots of small objects (attributes),
so GC cost may be high. This is just another ide
Hi Tomislav,
indexing with NumericField takes longer (at least with the default precision
step of 4, which splits a 32-bit integer into 8 subterms of 4 bits each). So
you produce 8 times more terms during indexing that must be handled by the
indexer. If you have lots of docum
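As a pure-Java illustration of why precisionStep=4 produces 8 subterms per 32-bit value (this is not Lucene's actual trie encoding, which also tags each term with its shift and uses a compact byte representation):

```java
public class TrieTerms {
    // Sketch: with precisionStep p, a 32-bit value is indexed once per
    // p-bit shift, i.e. 32/p terms of decreasing precision. For p = 4
    // that is 8 subterms, which is the extra indexing work described above.
    public static int[] trieTerms(int value, int precisionStep) {
        int n = 32 / precisionStep;
        int[] terms = new int[n];
        for (int i = 0; i < n; i++) {
            int shift = i * precisionStep;
            terms[i] = value >>> shift; // lower-precision prefix of the value
        }
        return terms;
    }
}
```

For example, `trieTerms(0xFF, 4)` yields the 8 prefixes 0xFF, 0xF, 0, 0, 0, 0, 0, 0, one per 4-bit shift.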
You can add the terms with Field.Index.NOT_ANALYZED multiple times to the same
field. If you use an analyzer like WhitespaceAnalyzer and you analyze your
terms, you must also pass the query term through the analyzer when building a
TermQuery. This may explain why you don't get those IDs.
But fo
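A toy, Lucene-free sketch of that pitfall, with lowercasing standing in for the analyzer (class and method names are illustrative):

```java
import java.util.*;

public class AnalyzerConsistency {
    // Toy version of the point above: if terms are analyzed at index time
    // (here simply lowercased), an exact-match TermQuery must be built from
    // the analyzed form of the query term, or it finds nothing.
    public static String analyze(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    // Simulates a TermQuery lookup against the set of indexed terms.
    public static boolean found(Set<String> indexedTerms, String queryTerm,
                                boolean analyzeQuery) {
        String probe = analyzeQuery ? analyze(queryTerm) : queryTerm;
        return indexedTerms.contains(probe);
    }
}
```

An ID indexed as `analyze("ID-12345")` is only found when the query term goes through the same analysis; the raw term misses.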
Hmmm... Seems like a lot of work to be done. I will try these options and
update.
Thanks a lot.
Best.
--
View this message in context:
http://n3.nabble.com/Problem-with-search-tp717137p719604.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
--
Run this:
svn co https://svn.apache.org/repos/asf/lucene/java/branches/lucene_2_9
lucene.29x
Then apply the patch and run "ant jar-core", which should create the
lucene-core-2.9.2-dev.jar.
Mike
On Wed, Apr 14, 2010 at 1:28 PM, Woolf, Ross wrote:
> How do I get to the 2.9.x branch?
From your PyLucene thread it looks like this may be a known mem leak
in PyLucene 2.4 (fixed in 2.9)?
Mike
On Wed, Apr 14, 2010 at 11:13 AM, Herbert Roitblat wrote:
> Thanks, Michael.
>
> I have not had a chance to try your whittled example yet. Another problem
> captured my attention.
>
> What
Hi Kristjan,
which Tokenizer and Filters are you using for the ID field?
Rene
On 14.04.2010 21:15, Kristjan Siimson wrote:
Hello,
I have a document for which I'd like to index an array of IDs. For
example, there is a product that belongs to categories with IDs 12, 15, 16,
145, 148. I'd li
Hello,
I have a document for which I'd like to index an array of IDs. For
example, there is a product that belongs to categories with IDs 12, 15, 16,
145, 148. I'd like to index these categories, and then be able to use them
in queries, so that I can search for a product whose name is "Bottle" a
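The multi-valued-field approach suggested in the replies (add the field repeatedly with Field.Index.NOT_ANALYZED) can be sketched without Lucene as a tiny inverted index. `MultiValueField` and its methods are illustrative names, not Lucene API:

```java
import java.util.*;

public class MultiValueField {
    // Tiny inverted index: field -> term -> doc ids. A field added several
    // times for the same document behaves like Lucene's multi-valued field:
    // a term query on any one of the values matches the document.
    private final Map<String, Map<String, Set<Integer>>> index = new HashMap<>();

    public void add(int docId, String field, String value) {
        index.computeIfAbsent(field, f -> new HashMap<>())
             .computeIfAbsent(value, v -> new TreeSet<>())
             .add(docId);
    }

    // Docs matching an exact term in a field (like a TermQuery).
    public Set<Integer> term(String field, String value) {
        return index.getOrDefault(field, Map.of()).getOrDefault(value, Set.of());
    }

    // Intersection of two result sets (like a BooleanQuery with two MUST clauses).
    public Set<Integer> and(Set<Integer> a, Set<Integer> b) {
        Set<Integer> out = new TreeSet<>(a);
        out.retainAll(b);
        return out;
    }
}
```

Indexing doc 1 with name "Bottle" and categories 12, 15, 16, 145, 148, the query name:Bottle AND category:15 finds doc 1, while category:99 finds nothing.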
Hi,
is it normal for indexing time to increase up to 10 times after
introducing NumericField instead of Field (for two fields)?
I've changed two date fields from String representation (Field) to
NumericField, now it is:
doc.add(new NumericField("time").setIntValue(date.getTime()/24/3600))
and a
How do I get to the 2.9.x branch? Every link I take from the Lucene site takes
me to the trunk, which I assume is the 3.x version. I've tried to look around
svn but can't find anything labeled 2.9.x. Is there a daily build of 2.9.x, or
do I need to build it myself? I would like to try out the
Actually, doc1 contains the two terms to be searched: "Lucene in Action" and
"Lucene". When I pass "Lucene in Action", I want it to show the result; but
when I pass only the term "Lucene", that document should not be returned. In
short, the term "Lucene" should not find the phrase "Lucene in
Action", sinc
Thanks, Michael.
I have not had a chance to try your whittled example yet. Another problem
captured my attention.
What I have done is use a single reader over and over. It does not seem to
make any difference. I don't close it at all, now. It sped up my process a
bit (12 docs/second rathe
Hi Franz,
The likely problem is that you're using an index-time analyzer that strips out
the parentheses. StandardAnalyzer, for example, does this; WhitespaceAnalyzer
does not.
Remember that hits are the result of matches between index-analyzed terms and
query-analyzed terms. Except in the c
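A minimal stand-in for the two tokenization styles (simple regex splits, not the real StandardAnalyzer/WhitespaceAnalyzer) shows why the parenthesized value matches the prefix:

```java
import java.util.*;

public class PrefixMatchDemo {
    // StandardAnalyzer-like behaviour: split on non-word characters and
    // lowercase, so "(testvalue)" becomes the token "testvalue".
    public static List<String> stripPunctuation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // WhitespaceAnalyzer-like behaviour: split on whitespace only, so
    // "(testvalue)" stays intact, parentheses and all.
    public static List<String> whitespaceOnly(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    // Simulates a PrefixQuery: does any indexed token start with the prefix?
    public static boolean prefixMatches(List<String> tokens, String prefix) {
        for (String t : tokens) if (t.startsWith(prefix)) return true;
        return false;
    }
}
```

With punctuation stripped, "(testvalue)" is indexed as "testvalue" and the prefix "test" matches it; with whitespace-only tokenization the token keeps its leading "(" and the prefix no longer matches, giving only the one hit for "test value".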
Hi all,
say I have an index with one field named "category". There are two documents,
one with value "(testvalue)" and one with value "test value".
Now someone searches for "test". My search engine uses
org.apache.lucene.search.PrefixQuery and finds 2 documents. Maybe he expected
only one hit;
Hi,
I am trying to chase an issue in our code and it is proving quite
difficult. We have seen two instances (see below) where we get the
same error. I have been trying to reproduce it but it has been impossible
so far. I have several threads, some might be creating indices and
adding documents, others c
It looks like the mailing list software stripped your image attachments...
Alas these fixes are only committed on 3.1.
But I just posted the patch on LUCENE-2387 for 2.9.x -- it's a tiny
fix. I think the other issue was part of LUCENE-2074 (though this
issue included many other changes) -- Uwe c
I don't know if that proposal is the most efficient one, but you can try it.
In general, what you're looking for is a GROUP BY Bill-Id feature and then
select the most recent one, right? Only you don't need all the Versions of
the same Bill, and therefore you can hold the most recent Version-Id onl
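A minimal sketch of that grouping pass in plain Java (the bill/version field names are hypothetical, and this assumes version ids are numeric, so "most recent" means the highest id):

```java
import java.util.*;

public class LatestVersionPerBill {
    // One pass over (billId, versionId) pairs, keeping only the highest
    // version id seen per bill -- the GROUP BY Bill-Id idea from above.
    public static Map<String, Integer> latestVersions(List<String[]> rows) {
        Map<String, Integer> latest = new HashMap<>();
        for (String[] row : rows) {            // row = {billId, versionId}
            String billId = row[0];
            int version = Integer.parseInt(row[1]);
            latest.merge(billId, version, Math::max);
        }
        return latest;
    }
}
```

Given versions 1, 3, 2 of bill A and version 2 of bill B, the map ends up holding A → 3 and B → 2: one entry per bill, each the most recent version id.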