On 21 Nov 2005, at 19:39, [EMAIL PROTECTED] wrote:
These are the results for the StandardTokenizer:
input - output token - output type
1. 1.2 - 1.2 -
2. 1.2. - 1.2 -
3. a.b - a.b -
4. a.b. - a.b. -
5. www.apache.org - www.apache.org -
6. www.apac
On 11/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Neither. It'll throw an exception.
Just don't rely on it to throw an exception either though... the
checking is not comprehensive.
One should treat sorting on a field with more than one value per
document as undefined.
-Yonik
Sorry for the bad looking table. Retrying...
input string - output token (output type)
1. 1.2 - 1.2 ()
2. 1.2. - 1.2 ()
3. a.b - a.b ()
4. a.b. - a.b. ()
5. www.apache.org - www.apache.org ()
6. www.apache.org. - www.apache.org. ()
--- java-user@lucene.apache.org
wrote:
These are the results for
These are the results for the StandardTokenizer:
input - output token - output type
1. 1.2 - 1.2 -
2. 1.2. - 1.2 -
3. a.b - a.b -
4. a.b. - a.b. -
5. www.apache.org - www.apache.org -
6. www.apache.org. - www.apache.org. -
Number 6 should still be
On 21 Nov 2005, at 16:12, John Powers wrote:
If I sort on a field called sequence, but at document creation time I add in
//create doc A
doc.add(Field.Text("sequence", "32"));
doc.add(Field.Text("sequence", "3"));
doc.add(Field.Text("sequence", "932"));
//create doc B
doc.add(Field.Text("seq
If I add keywords to a document at the same time, will they stay in that
order?
Create New doc A
doc.add(Field.Text("category", "toys"));
doc.add(Field.Text("sequence", "235"));
doc.add(Field.Text("category", "bears"));
doc.add(Field.Text("sequence", "63"));
doc.add(Field.Text("category", "truc
Thx Peter.. worked like a charm!
-Original Message-
From: Peter Kim [mailto:[EMAIL PROTECTED]
Sent: Monday, November 21, 2005 4:32 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene Index Changed event
You can check IndexReader.getCurrentVersion() to see if the index
changed from the
You can check IndexReader.getCurrentVersion() to see if the index
changed from the last time you checked. The index's version number
changes whenever the index is updated.
Peter
> -Original Message-
> From: Aigner, Thomas [mailto:[EMAIL PROTECTED]
> Sent: Monday, November 21, 2005 3:48 P
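A minimal sketch of the polling approach described above, in plain Java so that it runs without Lucene on the classpath. The class name IndexChangeDetector and the LongSupplier wiring are assumptions made for illustration and are not part of Lucene. In real use the supplier would presumably wrap a call to IndexReader.getCurrentVersion() on your index, and a true result would be the cue to close and reopen the searcher.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: remembers the last version seen and reports changes. */
class IndexChangeDetector {
    private final LongSupplier versionSource; // assumed to return the current index version
    private long lastSeen;

    IndexChangeDetector(LongSupplier versionSource) {
        this.versionSource = versionSource;
        this.lastSeen = versionSource.getAsLong();
    }

    /** Returns true if the version differs from the last check, and remembers the new one. */
    boolean hasChanged() {
        long current = versionSource.getAsLong();
        if (current == lastSeen) {
            return false;
        }
        lastSeen = current;
        return true;
    }
}
```

Calling hasChanged() before each search, or from a timer, keeps the check cheap: the searcher is only reopened when the version number has actually moved.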
On 21 Nov 2005, at 16:09, Yonik Seeley wrote:
The Analyzer extensions seem fine, but much more general purpose
than my need.
For your need (a global increment), isn't expanding analyzer
actually easier?
analyser = new OldAnalyzer() {
public int getPositionIncrementGap(String field) {
If I sort on a field called sequence, but at document creation time I add in
//create doc A
doc.add(Field.Text("sequence", "32"));
doc.add(Field.Text("sequence", "3"));
doc.add(Field.Text("sequence", "932"));
//create doc B
doc.add(Field.Text("sequence", "1"));
doc.add(Field.Text("sequence", "300
> > For position increments, it doesn't have to be tracked. The patch to
> > DocumentWriter could also be:
> >
> > int position = fieldPositions[fieldNumber];
> > + if (position>0) position+=analyzer.getPositionIncrementGap(fieldName)
>
> This could be thwarted with tokens using zer
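The effect of the proposed one-line patch can be sketched in plain Java. This is only an illustration under stated assumptions: PositionGapSketch and positions() are hypothetical names, each inner list stands for the tokens of one value added under the same field name, and every token is assumed to have a position increment of 1. It is not the DocumentWriter code itself.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of token positions across repeated values of one field. */
class PositionGapSketch {
    /**
     * Assigns a position to every token. Before each value after the first
     * (detected by position > 0, as in the quoted patch), the gap is added.
     */
    static List<Integer> positions(List<List<String>> values, int gap) {
        List<Integer> out = new ArrayList<>();
        int position = 0;
        for (List<String> value : values) {
            if (position > 0) {
                position += gap; // same condition as the quoted patch
            }
            for (int i = 0; i < value.size(); i++) {
                out.add(position);
                position++; // assumes a position increment of 1 per token
            }
        }
        return out;
    }
}
```

With values ["a", "b"] and ["c"] and a gap of 100, the tokens land at positions 0, 1 and 102, so a phrase query cannot match across the boundary between the two values.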
Hi all,
Is there an index changed event that I can jump on that will
tell me when my index has been updated so I can close and reopen my
searcher to get the new changes?
I can't seem to find the event, but see some tools that might
accomplish this (DLESE DPC software components?).
: " By default, no more than 10,000 terms will be
: indexed for a field."
:
: Given your note, then the docs do not mean that no
: more than 10,000 terms will be indexed, but that some
: smaller number of terms will be indexed and only the
: first 10,000 occurrences will be tallied.
It means that
On 21 Nov 2005, at 12:55, Yonik Seeley wrote:
On 11/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
Modifying Analyzer as you have suggested would
require DocumentWriter additionally keep track of the field names
and note when one is used again.
For position increments, it doesn't have to be t
On Monday 21 November 2005 14:28, [EMAIL PROTECTED] wrote:
> Just to make sure that I understand this correctly,
> the docs say:
>
> " By default, no more than 10,000 terms will be
> indexed for a field."
>
> Given your note, then the docs do not mean that no
> more than 10,000 terms will be ind
On 11/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote:
> Modifying Analyzer as you have suggested would
> require DocumentWriter additionally keep track of the field names
> and note when one is used again.
For position increments, it doesn't have to be tracked. The patch to
DocumentWriter could als
Jay Booth wrote:
I had a similar problem with threading, the problem turned out to be that in
the back end of the FSDirectory class I believe it was, there was a
synchronized block on the actual RandomAccessFile resource when reading a
block of data from it... high-concurrency situations caused t
Hi, Karl,
There have been quite a few discussions regarding the "too many open files"
problem. From my understanding, it is due to Lucene trying to open multiple
segments at the same time (during search/merging segments), and the
operating system wouldn't allow opening that many file handles.
If
Thanks Jay Booth,
I thought as much. I just verified that I'm not reaching 100% CPU, and I
found out that when using RAMDirectory and 100 threads the CPU usage is 60%,
average request time 40 times more than with one thread, but number of requests
the same. I think I'll have to do something like you sug
I had a similar problem with threading, the problem turned out to be that in
the back end of the FSDirectory class I believe it was, there was a
synchronized block on the actual RandomAccessFile resource when reading a
block of data from it... high-concurrency situations caused threads to stack
up
On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote:
> It is rather sad if 10 threads reach the CPU limit. I'll check it and get
> back to you.
It's about performance and throughput though, not about number of
threads it takes to reach saturation.
In a 2 CPU box, I would say that the ideal situation
gekkokid,
does 1.4.3 benefit from multi-threading?
>
Sorry for not being clear. My tests show that neither version benefits
from multi-threading, but it is possible that I'm CPU bound, as Yonik kindly
reminded me.
is 1.9 the version in the source repository?
1.9 is the version in source re
This is expected behavior: you are probably quickly becoming CPU bound
(which isn't a bad thing). More threads only help when some threads
are waiting on IO, or if you actually have a lot of CPUs in the box.
-Yonik
Now hiring -- http://forms.cnet.com/slink?231706
On 11/21/05, Oren Shir <[EMAIL
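One way to act on this advice is to size the worker pool to the number of CPUs instead of adding threads past saturation. The sketch below is plain Java and makes several assumptions: CpuBoundPool, busyWork() and runAll() are hypothetical names, and busyWork() is only a stand-in for a CPU-bound query, not a Lucene search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical harness: runs CPU-bound tasks on one thread per available CPU. */
class CpuBoundPool {
    /** Stand-in for a CPU-bound query: pure computation, no IO waits. */
    static long busyWork(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * i % 7;
        }
        return acc;
    }

    /** Submits the given number of tasks and collects their results in order. */
    static List<Long> runAll(int tasks, int n) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                futures.add(pool.submit(() -> busyWork(n)));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If the real workload spends time waiting on IO, a larger pool may still help, which matches the caveat in the message above.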
Oren Shir wrote:
I tested this in version 1.4.3 and 1.9rc1, and they are both the same in
this aspect. 1.9rc1 is faster, but does not benefit from multi threading.
some newbie questions i have,
does 1.4.3 benefit from multi-threading?
is 1.9 the version in the source repository?
_gk
---
> > To get a higher limit. Of course, you could also change the Lucene source
> > file and recompile it. Note that you CANNOT just set the property in your
> > code, in general, as the Lucene class puts it into a static final int,
> > meaning it examines the value of the property (once) at
Hi,
I tried stressing Lucene in a controlled environment: one static
IndexSearcher for an index that doesn't change, and in same process I create
a number of Threads that call this Searcher concurrently for a limited time.
I expected the number of successful queries to increase when using more
thr
On 21 Nov 2005, at 08:37, Michael Curtin wrote:
That's probably because there is a limit built into Lucene where it
ignores any tokens in a field past the first 10,000. There is a
property you can set to increase this limit. I don't have the
source in front of me right now, but if you go
> When I go and retrieve the term frequency vectors, for
> any document under about 90k, everything looks as
> expected. However for larger documents (I haven't
> found the exact point, but I know that those over 128k
> qualify) the sum of the term frequencies in the vector
> seems to max out at 1
Just to make sure that I understand this correctly,
the docs say:
" By default, no more than 10,000 terms will be
indexed for a field."
Given your note, then the docs do not mean that no
more than 10,000 terms will be indexed, but that some
smaller number of terms will be indexed and only the
fi
Hi,
I am using lucene 1.4.3. The basic functionality of the search is
simple, put in the keyword as java and it will display you all the books
having java keyword.
Now I have to add a feature which also shows the name of top authors (let's
say top 5 authors) with the number of books,
On 21 Nov 2005, at 04:26, Erik Hatcher wrote:
What about adding an offset to Field, setPositionOffset(int
offset)? Looking at DocumentWriter, it looks like this would be
the simplest thing that could work, without precluding the
interesting option of modifying Analyzer to allow with flags
Yonik,
Thanks for your carefully thought out and detailed reply.
On 20 Nov 2005, at 12:00, Yonik Seeley wrote:
Does it make sense to add an IndexWriter setting to
specify a default position increment gap to use when multiple fields
are added in this way?
Per-field might be nice...
The good
By default, documents get truncated at 10,000 terms (maybe there is
an off-by-one where it is going to 10,001 though?).
To increase this, and I always do, set the max field length on your
IndexWriter, and re-index. In 1.4.3, you set the maxFieldLength
variable of IndexWriter directly. We'
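To see why the term frequencies of a large document appear to stop growing, here is a small plain-Java simulation of the truncation. It is a sketch under assumptions, not Lucene code: FieldTruncationSketch and termFrequencies() are hypothetical names, and the cut-off is modelled as "keep the first maxFieldLength tokens", without trying to reproduce any possible off-by-one.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of a per-field token limit applied at indexing time. */
class FieldTruncationSketch {
    /** Counts term frequencies, ignoring every token past the first maxFieldLength. */
    static Map<String, Integer> termFrequencies(List<String> tokens, int maxFieldLength) {
        Map<String, Integer> freqs = new HashMap<>();
        int limit = Math.min(tokens.size(), maxFieldLength);
        for (int i = 0; i < limit; i++) {
            freqs.merge(tokens.get(i), 1, Integer::sum);
        }
        return freqs;
    }
}
```

In this model a 15,000-token document indexed with a limit of 10,000 yields frequencies that sum to 10,000, while raising the limit above the document length and re-indexing yields the full 15,000.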
On Monday 21 November 2005 02:16, [EMAIL PROTECTED] wrote:
> Hi. I was wondering if anyone else has seen this
> before. I'm using lucene 1.4.3 and have indexed
> about 3000 text documents using the statement:
>
> doc.add(Field.Text("contents", new FileReader(f),
> true));
>
> When I go and ret