Message no tvx file

2005-11-20 Thread Daniel Cortes
Hello, my index works fine, but I have now activated the last parameter of the add function of IndexWriter: writer.add(Field.UnStored("CONTENTS",content,true)); Now I frequently get the message "no tvx file". What can I do? Thanks for any replies.

TermFrequencies vector limits?

2005-11-20 Thread marigoldcc
Hi. I was wondering if anyone else has seen this before. I'm using lucene 1.4.3 and have indexed about 3000 text documents using the statement: doc.add(Field.Text("contents", new FileReader(f), true)); When I go and retrieve the term frequency vectors, for any document under about 90k, everyth

Re: High CPU utilization with sort

2005-11-20 Thread Chris Hostetter
: In tests for our implementation (25 concurrent connections generating search/sort requests), we've seen performance in terms of requests/second drop by a factor of 10, compared to similar tests executing only search
In case it's not clear from Yonik's response: reuse the same IndexReader/In

RE: What is stemming?

2005-11-20 Thread anton
You can read about stemmers at http://snowball.tartarus.org/ -----Original Message----- From: Koji Sekiguchi Sent: Monday, November 21, 2005 2:37 AM To: java-user@lucene.apache.org Subject: RE: What is stemming? Gekkokid, Daniel, Giovanni, Thank you very much for your

Re: Re-Opening IndexSearcher

2005-11-20 Thread Yonik Seeley
Karl, You are opening IndexSearchers in this code but not closing them. If GC & finalizers don't happen to run before you run out of file handles, you will get exceptions. You could close the IndexSearcher after every request, but it would lead to very poor performance. Better to keep a single
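A minimal sketch of the "keep one searcher open and share it" idea from the reply above. It deliberately avoids the Lucene API: SharedSearcherHolder is a hypothetical helper, and the Supplier stands in for whatever code opens the real IndexSearcher (which would also need to handle the IOException that opening can throw).

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

/** Hypothetical holder: opens one expensive resource lazily and hands out the same instance. */
final class SharedSearcherHolder<T extends Closeable> implements Closeable {
    // Assumption: in real code this would open an IndexSearcher on the index directory.
    private final Supplier<T> opener;
    private T current;

    SharedSearcherHolder(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Returns the shared instance, opening it on first use only. */
    synchronized T get() {
        if (current == null) {
            current = opener.get();
        }
        return current;
    }

    /** Closes the current instance so that the next get() opens a fresh one. */
    synchronized void reopen() {
        close();
    }

    /** Releases the underlying file handles explicitly instead of waiting for GC. */
    @Override
    public synchronized void close() {
        if (current != null) {
            try {
                current.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                current = null;
            }
        }
    }
}
```

Every request would call get() and never close the result itself; reopen() is meant for the moment the index has changed. Note that this sketch does not wait for searches still in flight on the old instance, so a real version would need some form of reference counting.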

Re: High CPU utilization with sort

2005-11-20 Thread Yonik Seeley
On 11/20/05, Jeff Rodenburg wrote: > Why are numeric fields more onerous in filling the field-cache? Float.parseFloat() or Integer.parseInt() has to be called for each unique term. -Yonik
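To make the cost concrete, here is a rough sketch of what filling an int-valued sort cache involves. It is not Lucene's FieldCache code: IntFieldCacheSketch and its termToDocs map are hypothetical stand-ins for walking the index's terms and their documents.

```java
import java.util.List;
import java.util.Map;

final class IntFieldCacheSketch {
    /**
     * termToDocs maps each unique term text of the sort field to the ids of the
     * documents containing it (a stand-in for enumerating terms in a real index).
     * Returns one parsed value per document id; documents without a term stay 0.
     */
    static int[] fill(Map<String, List<Integer>> termToDocs, int maxDoc) {
        int[] values = new int[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : termToDocs.entrySet()) {
            // One string-to-number parse per unique term: this is the extra work
            // that a plain string sort field does not need.
            int parsed = Integer.parseInt(entry.getKey());
            for (int docId : entry.getValue()) {
                values[docId] = parsed;
            }
        }
        return values;
    }
}
```

The array is built once per searcher and field, which is why only the first sorted query pays for it and why a field with many distinct numeric values takes longer to warm.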

Re: High CPU utilization with sort

2005-11-20 Thread Jeff Rodenburg
Why are numeric fields more onerous in filling the field-cache? On 11/20/05, Yonik Seeley wrote: > I haven't done measurements, but the first query with a sort on a particular field will involve filling the field-cache and that can take a while (especially for numeric f

RE: What is stemming?

2005-11-20 Thread Koji Sekiguchi
Gekkokid, Daniel, Giovanni, Thank you very much for your explanation. Now I'm very clear! Thank you again, Koji

Re-Opening IndexSearcher

2005-11-20 Thread Karl Koch
Hello, how do I close and open an IndexSearcher object in order to free resources that cause my system to throw an IOException saying "Too many open files" as well as trouble with an index lock file? I have the following code: synchronized public static Hits search(String queryString, String[]

Urgent - File Lock in Lucene 1.2

2005-11-20 Thread Karl Koch
Hello group, I am running Lucene 1.2 and I have the following error message. I got this message when performing a search: Failed to obtain file lock on /tmp/qcop-msg-qpe I am running Lucene 1.2 on a Sharp Zaurus PDA with embedded Linux. When I look through the exceptions I have before that I ca

Re: High CPU utilization with sort

2005-11-20 Thread Yonik Seeley
I haven't done measurements, but the first query with a sort on a particular field will involve filling the field-cache and that can take a while (especially for numeric fields). If you haven't already, you should compare the query times of a "warmed" searcher. Sorted queries will still take long

Re: Spans, appended fields, and term positions

2005-11-20 Thread Yonik Seeley
> It depends on > Document.fields() of a stored and retrieved document: does it return > all the appended field parts as separate Fields, or does it only > return one Field with all parts appended? Separate fields. Stored fields are returned back to you verbatim. -Yonik

High CPU utilization with sort

2005-11-20 Thread Jeff Rodenburg
I've read many comments from users on the list indicating that sorting may/will be performance-heavy. Is high CPU utilization with a sorted search one of the expected performance hits? In tests for our implementation (25 concurrent connections generating search/sort requests), we've seen performan

Re: Spans, appended fields, and term positions

2005-11-20 Thread Paul Elschot
One more thing to consider: the field length in the index. Probably the added position increment between appended parts of a field should not be reflected in the total field size as indexed. This would also be a consideration for queries and for the field norms: when multiple fields are used they

Re: What is stemming?

2005-11-20 Thread Giovanni Novelli
[Afaik] Lucene stemming is based on Snowball (http://snowball.tartarus.org/) and snowball is an implementation of Porter's algorithm ( http://www.tartarus.org/~martin/PorterStemmer/) so, if I'm not wrong, you should refer to them.

Re: What is stemming?

2005-11-20 Thread Daniel Naber
On Sunday 20 November 2005 16:48, Koji Sekiguchi wrote: > Could someone explain what "stemming" is? Stemming usually means cutting off characters from the end of the word, e.g. walked -> walk, walking -> walk. However, this does not necessarily produce a real word, e.g. a stemmer could also cha
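The "cut off characters from the end" idea can be illustrated with a toy suffix stripper. This is only a sketch under simple assumptions: ToySuffixStemmer, its three suffixes, and the minimum stem length of 3 are invented for illustration, and it is far cruder than the Porter or Snowball stemmers mentioned elsewhere in this thread.

```java
import java.util.Locale;

final class ToySuffixStemmer {
    // Assumed suffix list, checked in this order; real stemmers use many more rules.
    private static final String[] SUFFIXES = {"ing", "ed", "s"};
    // Assumed lower bound so that short words such as "sing" are left alone.
    private static final int MIN_STEM_LENGTH = 3;

    /** Strips the first matching suffix if at least MIN_STEM_LENGTH characters remain. */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : SUFFIXES) {
            int stemLength = w.length() - suffix.length();
            if (w.endsWith(suffix) && stemLength >= MIN_STEM_LENGTH) {
                return w.substring(0, stemLength);
            }
        }
        return w;
    }
}
```

With these rules, stem("walked") and stem("walking") both give "walk", while stem("running") gives "runn", which shows that the output of a stemmer need not be a real word.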

Re: What is stemming?

2005-11-20 Thread gekkokid
Hello fellow Lucener :), firstly, I'm no tutor, but I will try my best to explain. If anyone believes I'm wrong, please say so, so that our friend doesn't get the wrong idea. Here it goes. O_o Stemming is reducing the word to the root form, whereas lemmatisation is concerned with linguistics, I believe l

Re: Spans, appended fields, and term positions

2005-11-20 Thread Yonik Seeley
> Does it make sense to add an IndexWriter setting to > specify a default position increment gap to use when multiple fields > are added in this way? Per-field might be nice... The good news is that Analyzer is an abstract class, and not an Interface, so we could add something to it without break

RE: reusing MultiSearcher vs. reusing contained IndexSearchers

2005-11-20 Thread Alexey Lef
We have a very similar situation. We cache an IndexSearcher for each index and a MultiSearcher for each combination. Both IndexSearchers and MultiSearchers are created on demand (i.e. we don't prebuild every possible combination). I think caching MultiSearchers is not necessary because it is just
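The on-demand caching described in this message might look roughly like the following. It is a sketch with assumed names, not the poster's code: SearcherCache is hypothetical, S stands in for IndexSearcher, M for MultiSearcher, and the two functions stand in for opening an index and wrapping a list of searchers. The poster goes on to question whether caching the combined searchers is needed at all, so the second map may well be optional.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache: one searcher per index, one combined searcher per index combination. */
final class SearcherCache<S, M> {
    private final Function<String, S> openSingle;   // assumed: opens a searcher for one index
    private final Function<List<S>, M> combine;     // assumed: wraps several searchers into one
    private final Map<String, S> singles = new ConcurrentHashMap<>();
    private final Map<List<String>, M> combinations = new ConcurrentHashMap<>();

    SearcherCache(Function<String, S> openSingle, Function<List<S>, M> combine) {
        this.openSingle = openSingle;
        this.combine = combine;
    }

    /** Returns the cached searcher for one index, opening it on first request. */
    S single(String indexName) {
        return singles.computeIfAbsent(indexName, openSingle);
    }

    /** Returns the cached combined searcher; the order of the requested names does not matter. */
    M combination(Collection<String> indexNames) {
        // Sorting and de-duplicating the names gives one canonical key per combination.
        List<String> key = new ArrayList<>(new TreeSet<>(indexNames));
        return combinations.computeIfAbsent(key, names -> {
            List<S> parts = new ArrayList<>();
            for (String name : names) {
                parts.add(single(name));
            }
            return combine.apply(parts);
        });
    }
}
```

Nothing is prebuilt: a combination is created the first time it is requested and reuses the per-index searchers that are already open.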

What is stemming?

2005-11-20 Thread Koji Sekiguchi
Hello, Luceners! What is "stemming"? I have Lucene in Action and found the following definitions on page 103: - reducing words to a root form (stemming) - changing words into the basic form (lemmatization) but I cannot see the difference between them. I'm also confused by the following words on

Spans, appended fields, and term positions

2005-11-20 Thread Erik Hatcher
I'm working on building a custom highlighter for a client, which may eventually be generalizable. In my work, I've come across some issues I'd like to discuss. One issue is of appended fields allowing querying across boundaries. For example, if I index two fields with the same name:

Re: reusing MultiSearcher vs. reusing contained IndexSearchers

2005-11-20 Thread Erik Hatcher
Oh perhaps create a custom MultiSearcher that requires you specify which indexes (by an int[], perhaps?) to search for any given query. Erik On 20 Nov 2005, at 03:45, Oren Shir wrote: Hi, I'm searching several indexes combinations on random. Which method is better: 1) Keep one IndexSe

reusing MultiSearcher vs. reusing contained IndexSearchers

2005-11-20 Thread Oren Shir
Hi, I'm searching several combinations of indexes at random. Which method is better: 1) Keep one IndexSearcher for each index, and create a new MultiSearcher for each request according to the combination needed. 2) Keep one MultiSearcher with the scope of all the indexes, and pay the toll of searchin