Re: Inconsistent StandardTokenizer behaviour

2005-11-21 Thread Erik Hatcher
On 21 Nov 2005, at 19:39, [EMAIL PROTECTED] wrote: These are the results for the StandardTokenizer (input - output token - output type): 1. 1.2 - 1.2; 2. 1.2. - 1.2; 3. a.b - a.b; 4. a.b. - a.b.; 5. www.apache.org - www.apache.org; 6. www.apac

Re: How does lucene choose a field for sort?

2005-11-21 Thread Yonik Seeley
On 11/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > Neither. It'll throw an exception. Just don't rely on it to throw an exception either though... the checking is not comprehensive. One should treat sorting on a field with more than one value per document as undefined. -Yonik Now hiring --
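For reference, a minimal sketch of the safe pattern implied here, against the 1.4-era API (the index path, field names, and query term are made up for illustration): give every document exactly one untokenized value in the sort field.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;

    public class SingleValuedSort {
        // At index time: one untokenized "sequence" value per document,
        // zero-padded so a string sort also orders numerically.
        static void addSequence(Document doc, int seq) {
            String padded = String.valueOf(100000 + seq).substring(1);
            doc.add(Field.Keyword("sequence", padded));
        }

        public static void main(String[] args) throws Exception {
            IndexSearcher searcher = new IndexSearcher("/path/to/index");
            Sort sort = new Sort(new SortField("sequence", SortField.STRING));
            Hits hits = searcher.search(new TermQuery(new Term("contents", "java")), sort);
            System.out.println(hits.length() + " hits");
            searcher.close();
        }
    }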

Re: Inconsistent StandardTokenizer behaviour

2005-11-21 Thread yahootintin . 11533894
Sorry for the bad looking table. Retrying... input string - output token (output type):
1. 1.2 - 1.2 ()
2. 1.2. - 1.2 ()
3. a.b - a.b ()
4. a.b. - a.b. ()
5. www.apache.org - www.apache.org ()
6. www.apache.org. - www.apache.org. ()
--- java-user@lucene.apache.org wrote: This is the results for
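Those results can be reproduced with a small driver; a sketch against the 1.4-era analysis API (the field name is arbitrary), printing each token's text and type:

    import java.io.StringReader;
    import org.apache.lucene.analysis.Token;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class TokenizerCheck {
        public static void main(String[] args) throws Exception {
            String[] inputs = { "1.2", "1.2.", "a.b", "a.b.", "www.apache.org", "www.apache.org." };
            StandardAnalyzer analyzer = new StandardAnalyzer();
            for (int i = 0; i < inputs.length; i++) {
                TokenStream stream = analyzer.tokenStream("f", new StringReader(inputs[i]));
                StringBuffer line = new StringBuffer(inputs[i] + " ->");
                for (Token t = stream.next(); t != null; t = stream.next()) {
                    line.append(" ").append(t.termText()).append(" (").append(t.type()).append(")");
                }
                System.out.println(line);
            }
        }
    }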

Inconsistent StandardTokenizer behaviour

2005-11-21 Thread yahootintin . 11533894
These are the results for the StandardTokenizer (input - output token - output type):
1. 1.2 - 1.2 -
2. 1.2. - 1.2 -
3. a.b - a.b -
4. a.b. - a.b. -
5. www.apache.org - www.apache.org -
6. www.apache.org. - www.apache.org. -
Number 6 should still be

Re: How does lucene choose a field for sort?

2005-11-21 Thread Erik Hatcher
On 21 Nov 2005, at 16:12, John Powers wrote: If I sort on a field called sequence, but at document creation time I add in //create doc A doc.add(Field.Text("sequence", "32")); doc.add(Field.Text("sequence", "3")); doc.add(Field.Text("sequence", "932")); //create doc B doc.add(Field.Text("seq

Custom sort/basic question

2005-11-21 Thread John Powers
If I add keywords to a document at the same time, will they stay in that order? Create New doc A doc.add(Field.Text("category", "toys")); doc.add(Field.Text("sequence", "235")); doc.add(Field.Text("category", "bears")); doc.add(Field.Text("sequence", "63")); doc.add(Field.Text("category", "truc
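A sketch of what the question is getting at, using the 1.4-era API (whether the pairs can be relied on is exactly what the thread is asking): stored values for a repeated field name come back from Document.getValues() in the order they were added, so parallel fields can be lined up by index.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class ParallelFields {
        // Build a document with parallel multi-valued fields. Stored values for a
        // repeated field name are kept in insertion order, so after retrieving the
        // document, getValues("category")[i] pairs with getValues("sequence")[i].
        static Document makeDoc() {
            Document doc = new Document();
            doc.add(Field.Keyword("category", "toys"));
            doc.add(Field.Keyword("sequence", "235"));
            doc.add(Field.Keyword("category", "bears"));
            doc.add(Field.Keyword("sequence", "63"));
            return doc;
        }

        public static void main(String[] args) {
            Document doc = makeDoc();
            String[] categories = doc.getValues("category");
            String[] sequences = doc.getValues("sequence");
            for (int i = 0; i < categories.length; i++) {
                System.out.println(categories[i] + " -> " + sequences[i]);
            }
        }
    }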

RE: Lucene Index Changed event

2005-11-21 Thread Aigner, Thomas
Thx Peter.. worked like a charm! -Original Message- From: Peter Kim [mailto:[EMAIL PROTECTED] Sent: Monday, November 21, 2005 4:32 PM To: java-user@lucene.apache.org Subject: RE: Lucene Index Changed event You can check IndexReader.getCurrentVersion() to see if the index changed from the

RE: Lucene Index Changed event

2005-11-21 Thread Peter Kim
You can check IndexReader.getCurrentVersion() to see if the index changed from the last time you checked. The index's version number changes whenever the index is updated. Peter > -Original Message- > From: Aigner, Thomas [mailto:[EMAIL PROTECTED] > Sent: Monday, November 21, 2005 3:48 P
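A sketch of that pattern (1.4-era API; the index path is hypothetical): remember the version you opened the searcher against, and reopen only when IndexReader.getCurrentVersion() reports a change.

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.IndexSearcher;

    public class SearcherHolder {
        private final String indexDir;
        private IndexSearcher searcher;
        private long version;

        public SearcherHolder(String indexDir) throws IOException {
            this.indexDir = indexDir;
            this.searcher = new IndexSearcher(indexDir);
            this.version = IndexReader.getCurrentVersion(indexDir);
        }

        // Call before each search; reopens the searcher only if the index changed.
        public synchronized IndexSearcher getSearcher() throws IOException {
            long current = IndexReader.getCurrentVersion(indexDir);
            if (current != version) {
                searcher.close();
                searcher = new IndexSearcher(indexDir);
                version = current;
            }
            return searcher;
        }
    }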

Re: Spans, appended fields, and term positions

2005-11-21 Thread Erik Hatcher
On 21 Nov 2005, at 16:09, Yonik Seeley wrote: The Analyzer extensions seem fine, but much more general purpose than my need. For your need (a global increment), isn't expanding analyzer actually easier? analyser = new OldAnalyzer() { public int getPositionIncrementGap(String field) {
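Spelled out, the anonymous-subclass idea looks roughly like this. Note it assumes a Lucene build that includes the getPositionIncrementGap hook being proposed in this thread (it is not in the released 1.4.3 API); the gap value and base analyzer are arbitrary.

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    public class GapAnalyzerExample {
        // An analyzer that asks DocumentWriter to leave a large position gap
        // between successive values of the same field, so phrase and span
        // queries cannot match across the boundary between appended values.
        static Analyzer makeAnalyzer() {
            return new StandardAnalyzer() {
                public int getPositionIncrementGap(String fieldName) {
                    return 1000;
                }
            };
        }
    }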

How does lucene choose a field for sort?

2005-11-21 Thread John Powers
If I sort on a field called sequence, but at document creation time I add in //create doc A doc.add(Field.Text("sequence", "32")); doc.add(Field.Text("sequence", "3")); doc.add(Field.Text("sequence", "932")); //create doc B doc.add(Field.Text("sequence", "1")); doc.add(Field.Text("sequence", "300

Re: Spans, appended fields, and term positions

2005-11-21 Thread Yonik Seeley
> > For position increments, it doesn't have to be tracked. The patch to > > DocumentWriter could also be: > > > > int position = fieldPositions[fieldNumber]; > > + if (position>0) position+=analyzer.getPositionIncrementGap(fieldName) > > This could be thwarted with tokens using zer

Lucene Index Changed event

2005-11-21 Thread Aigner, Thomas
Hi all, Is there an index changed event that I can jump on that will tell me when my index has been updated so I can close and reopen my searcher to get the new changes? I can't seem to find the event, but see some tools that might accomplish this (DLESE DPC software components?).

Re: TermFrequencies vector limits?

2005-11-21 Thread Chris Hostetter
: " By default, no more than 10,000 terms will be : indexed for a field." : : Given your note, then the docs do not mean that no : more than 10,000 terms will be indexed, but that some : smaller number of terms will be indexed and only the : first 10,000 occurrances will be tallied. It means that

Re: Spans, appended fields, and term positions

2005-11-21 Thread Erik Hatcher
On 21 Nov 2005, at 12:55, Yonik Seeley wrote: On 11/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: Modifying Analyzer as you have suggested would require DocumentWriter additionally keep track of the field names and note when one is used again. For position increments, it doesn't have to be t

Re: TermFrequencies vector limits?

2005-11-21 Thread Paul Elschot
On Monday 21 November 2005 14:28, [EMAIL PROTECTED] wrote: > Just to make sure that I understand this correctly, > the docs say: > > " By default, no more than 10,000 terms will be > indexed for a field." > > Given your note, then the docs do not mean that no > more than 10,000 terms will be ind

Re: Spans, appended fields, and term positions

2005-11-21 Thread Yonik Seeley
On 11/21/05, Erik Hatcher <[EMAIL PROTECTED]> wrote: > Modifying Analyzer as you have suggested would > require DocumentWriter additionally keep track of the field names > and note when one is used again. For position increments, it doesn't have to be tracked. The patch to DocumentWriter could als

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Doug Cutting
Jay Booth wrote: I had a similar problem with threading. The problem turned out to be that in the back end of the FSDirectory class, I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused t

Re: Urgent - File Lock in Lucene 1.2

2005-11-21 Thread jian chen
Hi, Karl, There have been quite a few discussions regarding the "too many open files" problem. From my understanding, it is due to Lucene trying to open multiple segments at the same time (during search/merging segments), and the operating system wouldn't allow opening that many file handles. If
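For what it's worth, on later releases (1.4.x, not 1.2) the usual mitigations are the compound file format and a modest mergeFactor, since both cut the number of files held open per segment; a sketch, with a made-up index path:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class FewerFilesWriter {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
            writer.setUseCompoundFile(true); // pack each segment into one .cfs file
            writer.mergeFactor = 10;         // keep fewer segments open at once during merges
            writer.optimize();               // collapse the existing segments
            writer.close();
        }
    }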

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Oren Shir
Thanks Jay Booth, I thought as much. I just verified that I'm not reaching 100% CPU, and I found that when using RAMDirectory and 100 threads the CPU usage is 60%, the average request time is 40 times higher than with one thread, but the number of requests is the same. I think I'll have to do something like you sug
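The RAMDirectory variant mentioned above is a small change at searcher-construction time (a sketch against the 1.4-era API, with a hypothetical path); the whole index is copied into the heap, so concurrent reads no longer contend on a single on-disk file:

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class InMemorySearch {
        public static void main(String[] args) throws Exception {
            FSDirectory fsDir = FSDirectory.getDirectory("/path/to/index", false);
            RAMDirectory ramDir = new RAMDirectory(fsDir); // copy the whole index into memory
            IndexSearcher searcher = new IndexSearcher(ramDir);
            System.out.println("docs in index: " + searcher.maxDoc());
            searcher.close();
        }
    }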

RE: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Jay Booth
I had a similar problem with threading. The problem turned out to be that in the back end of the FSDirectory class, I believe it was, there was a synchronized block on the actual RandomAccessFile resource when reading a block of data from it... high-concurrency situations caused threads to stack up

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Yonik Seeley
On 11/21/05, Oren Shir <[EMAIL PROTECTED]> wrote: > It is rather sad if 10 threads reach the CPU limit. I'll check it and get > back to you. It's about performance and throughput though, not about number of threads it takes to reach saturation. In a 2 CPU box, I would say that the ideal situation

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Oren Shir
gekkokid, > does 1.4.3 benefit from multi-threading? Sorry for not being clear. My tests show that both versions do not benefit from multi-threading, but it is possible that I'm CPU bound, as Yonik kindly reminded me. > is 1.9 the version in the source repository? 1.9 is the version in source re

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Yonik Seeley
This is expected behavior: you are probably quickly becoming CPU bound (which isn't a bad thing). More threads only help when some threads are waiting on IO, or if you actually have a lot of CPUs in the box. -Yonik Now hiring -- http://forms.cnet.com/slink?231706 On 11/21/05, Oren Shir <[EMAIL

Re: Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread gekkokid
Oren Shir wrote: I tested this in versions 1.4.3 and 1.9rc1, and they are both the same in this respect. 1.9rc1 is faster, but does not benefit from multi-threading. Some newbie questions I have: does 1.4.3 benefit from multi-threading? is 1.9 the version in the source repository? _gk ---

Re: TermFrequencies vector limits?

2005-11-21 Thread Michael Curtin
> > To get a higher limit. Of course, you could also change the Lucene source > > file and recompile it. Note that you CANNOT just set the property in your > > code, in general, as the Lucene class puts it into a static final int, > > meaning it examines the value of the property (once) at

Throughput doesn't increase when using more concurrent threads

2005-11-21 Thread Oren Shir
Hi, I tried stressing Lucene in a controlled environment: one static IndexSearcher for an index that doesn't change, and in the same process I create a number of threads that call this Searcher concurrently for a limited time. I expected the number of successful queries to increase when using more thr
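A rough sketch of that kind of harness (hypothetical index path, field, and query term); all threads share the one IndexSearcher, which is safe for concurrent use:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;

    public class SearchStress {
        public static void main(String[] args) throws Exception {
            final IndexSearcher searcher = new IndexSearcher("/path/to/index");
            final long stopAt = System.currentTimeMillis() + 60000; // run for one minute
            int numThreads = Integer.parseInt(args[0]);
            Thread[] threads = new Thread[numThreads];
            for (int i = 0; i < numThreads; i++) {
                threads[i] = new Thread() {
                    public void run() {
                        int queries = 0;
                        try {
                            while (System.currentTimeMillis() < stopAt) {
                                searcher.search(new TermQuery(new Term("contents", "java")));
                                queries++;
                            }
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                        System.out.println(getName() + ": " + queries + " queries");
                    }
                };
                threads[i].start();
            }
            for (int i = 0; i < numThreads; i++) {
                threads[i].join();
            }
            searcher.close();
        }
    }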

Re: TermFrequencies vector limits?

2005-11-21 Thread Erik Hatcher
On 21 Nov 2005, at 08:37, Michael Curtin wrote: That's probably because there is a limit built into Lucene where it ignores any tokens in a field past the first 10,000. There is a property you can set to increase this limit. I don't have the source in front of me right now, but if you go

Re: TermFrequencies vector limits?

2005-11-21 Thread Michael Curtin
> When I go and retrieve the term frequency vectors, for > any document under about 90k, everything looks as > expected. However for larger documents (I haven't > found the exact point, but I know that those over 128k > qualify) the sum of the term frequencies in the vector > seems to max out at 1

Re: TermFrequencies vector limits?

2005-11-21 Thread marigoldcc
Just to make sure that I understand this correctly, the docs say: " By default, no more than 10,000 terms will be indexed for a field." Given your note, then the docs do not mean that no more than 10,000 terms will be indexed, but that some smaller number of terms will be indexed and only the fi

Grouping results on the basis of a field

2005-11-21 Thread Samarendra Pratap
Hi, I am using lucene 1.4.3. The basic functionality of the search is simple: put in a keyword such as “java” and it will display all the books having that keyword. Now I have to add a feature which also shows the names of the top authors (let's say the top 5) with the number of books,
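One straightforward way to do this (a sketch; it assumes each book document stores an "author" field, and it walks the whole result set, which is fine for moderate hit counts) is to count authors over the hits and then keep the five largest counts:

    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.Hits;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    public class AuthorCounts {
        // Returns a map of author -> number of matching books for the given query.
        static Map countAuthors(IndexSearcher searcher, Query query) throws Exception {
            Map counts = new HashMap();
            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                Document doc = hits.doc(i);
                String author = doc.get("author");
                Integer n = (Integer) counts.get(author);
                counts.put(author, new Integer(n == null ? 1 : n.intValue() + 1));
            }
            return counts; // sort the entries by value and keep the top 5 for display
        }
    }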

Re: Spans, appended fields, and term positions

2005-11-21 Thread Erik Hatcher
On 21 Nov 2005, at 04:26, Erik Hatcher wrote: What about adding an offset to Field, setPositionOffset(int offset)? Looking at DocumentWriter, it looks like this would be the simplest thing that could work, without precluding the interesting option of modifying Analyzer to allow with flags

Re: Spans, appended fields, and term positions

2005-11-21 Thread Erik Hatcher
Yonik, Thanks for your carefully thought out and detailed reply. On 20 Nov 2005, at 12:00, Yonik Seeley wrote: Does it make sense to add an IndexWriter setting to specify a default position increment gap to use when multiple fields are added in this way? Per-field might be nice... The good

Re: TermFrequencies vector limits?

2005-11-21 Thread Erik Hatcher
By default, documents get truncated at 10,000 terms (maybe there is an off-by-one where it goes to 10,001, though?). To increase this (and I always do), set the max field length on your IndexWriter and re-index. In 1.4.3, you set the maxFieldLength variable of IndexWriter directly. We'
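In 1.4.3 that is a plain public field on the writer, so the change is one line at indexing time (a sketch; the index path, analyzer, and new limit are placeholders):

    import java.io.File;
    import java.io.FileReader;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class BigFieldIndexer {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), true);
            writer.maxFieldLength = 1000000; // default is 10,000 terms per field
            File f = new File(args[0]);
            Document doc = new Document();
            doc.add(Field.Text("contents", new FileReader(f), true)); // true = store term vector
            writer.addDocument(doc);
            writer.close();
        }
    }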

Re: TermFrequencies vector limits?

2005-11-21 Thread Paul Elschot
On Monday 21 November 2005 02:16, [EMAIL PROTECTED] wrote: > Hi. I was wondering if anyone else has seen this > before. I'm using lucene 1.4.3 and have indexed > about 3000 text documents using the statement: > > doc.add(Field.Text("contents", new FileReader(f), > true)); > > When I go and ret