Re: new version of NewMultiFieldQueryParser

2004-10-27 Thread sergiu gordea
Bill Janssen wrote: I'm not sure this solution is very robust Thanks, but I'm pretty sure it *is* robust. Can you please offer a specific critique? Always happy to learn and improve :-). Try to see the behavior if you want to have a single term query, just something like: "robust

Searchable Solutions Please

2004-10-27 Thread Karthik N S
Hi guys, apologies. Using Lucene search, how can the following hits be obtained? Search word = 'kids watches'. Hits on returned docs should include: kid's, kid watch, junior watches. Solutions please. Thanks in advance. WITH WARM REGARD

Documents with 1 word are given unfair lengthNorm()

2004-10-27 Thread Kevin A. Burton
WRT my blog post: It seems the problem is that the distribution for lengthNorm() starts at 1 and moves down from there. 1.0f would work but HUGE documents would be normalized and so would distort the results. What would you think of using this implementation for lengthNorm: public float

document ID and performance

2004-10-27 Thread Yan Pujante
Hello I wrote the following test programs: I index 150,000 documents in Lucene and I build each document using this method. private Document buildDocument(String documentID, String body) { Document document = new Document(); document.add(Field.Keyword("docID", documentID)); document.a

Re: Poor Lucene Ranking for Short Text

2004-10-27 Thread Daniel Naber
On Wednesday 27 October 2004 22:47, Kevin A. Burton wrote: > If the current behavior is all that happens this is fine... this way I > can just get this behavior for new documents that are added. You'll have to try it out, I'm not sure what exactly will happen. > Also... why isn't this the defaul

Locks and Readers and Writers

2004-10-27 Thread yahootintin . 1247688
Hi, I'm getting: java.io.IOException: Lock obtain timed out I have a writer service that opens the index to delete and add docs. I have a reader service that opens the index for searching only. This error occurs when the reader service opens the index (this takes about 500ms). Meanwhile

weights on multi index searches

2004-10-27 Thread Ravi
Can I give weights to different indexes when I search against multiple indexes? The final score of a document should be a linear combination of the weights on each index and the individual score for that index. Is this possible in Lucene? Thanks, Ravi.
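
The linear combination described in this question can be sketched in plain Java. This is only an illustration of the arithmetic, not a Lucene feature: the class name, the method name and the sample weights and scores are all hypothetical, and how the per-index scores are obtained is left open.

```java
// Hypothetical sketch: combines per-index scores with per-index weights.
// Nothing here is Lucene API; the inputs are assumed to come from
// separate searches against each index.
class WeightedIndexScore {
    // Returns sum over i of weights[i] * scores[i].
    static double combine(double[] weights, double[] scores) {
        if (weights.length != scores.length) {
            throw new IllegalArgumentException("one weight per index score is required");
        }
        double total = 0.0;
        for (int i = 0; i < weights.length; i++) {
            total += weights[i] * scores[i];
        }
        return total;
    }

    public static void main(String[] args) {
        double[] weights = {0.7, 0.3};   // assumed weights for two indexes
        double[] scores = {0.5, 1.0};    // assumed scores of one document in each index
        System.out.println(combine(weights, scores));
    }
}
```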

Re: Looking for consulting help on project

2004-10-27 Thread David Spencer
Suggestions [a] Try invoking the VM w/ an option like "-XX:CompileThreshold=100" or even a smaller number. This encourages the hotspot VM to compile methods sooner, thus the app will take less time to "warm up". http://java.sun.com/docs/hotspot/VMOptions.html#additional You might want to sea

Re: new version of NewMultiFieldQueryParser

2004-10-27 Thread Bill Janssen
> I'm not sure this solution is very robust Thanks, but I'm pretty sure it *is* robust. Can you please offer a specific critique? Always happy to learn and improve :-). > I think I already sent an email with a better code... Pretty vague. Can you send a URL for that message in the archiv

Re: Poor Lucene Ranking for Short Text

2004-10-27 Thread Kevin A. Burton
Daniel Naber wrote: (Kevin complains about shorter documents ranked higher) This is something that can easily be fixed. Just use a Similarity implementation that extends DefaultSimilarity and that overrides lengthNorm: just return 1.0f there. You need to use that Similarity for indexing and sea
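
To see why returning 1.0f changes the ranking of short documents, the two behaviours can be compared side by side. The sketch below is a stand-in written without Lucene on the classpath: the class and method names are hypothetical, and the 1/sqrt(numTerms) formula is assumed to match the default length normalization of that era. In a real index the change would go into a Similarity subclass, as suggested in the message.

```java
// Stand-in for illustration only; these are not Lucene classes.
class LengthNormSketch {
    // Assumed default shape: 1 / sqrt(numTerms), so a one-term field gets 1.0
    // and longer fields get progressively smaller norms.
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // The suggested fix: ignore field length entirely.
    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        int[] lengths = {1, 4, 100, 10000};
        for (int numTerms : lengths) {
            System.out.println(numTerms + " terms: default="
                    + defaultLengthNorm(numTerms)
                    + " flat=" + flatLengthNorm(numTerms));
        }
    }
}
```

With the flat version, a one-word document no longer gets an advantage over a long one, which is the trade-off discussed in this thread.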

Highlighter problem: null as result

2004-10-27 Thread Miro Max
Hello, I'm trying to use the highlighter from the sandbox and I have a problem with some of the results it returns. Normally, when I search my index for e.g. "motor", I get circa 150 results --> these results are OK. But when I use the highlighter I get some results as "null" values from the

Re: Stopwords in Exact phrase

2004-10-27 Thread Justin Swanhart
Your analyzer will have removed the stopword when you indexed your documents, so Lucene won't be able to do this for you. You will need to implement a second pass over the results returned by Lucene and check to see if the stopword is included, perhaps with String.indexOf() On Wed, 27 Oct 2004 1
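
A minimal sketch of such a second pass, assuming the original text of each hit is available as a String (for example from a stored field). The class and method names are hypothetical and not part of Lucene, and the check is a plain case-insensitive substring test, so it would also match the phrase inside a longer word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Lucene: filters hit texts after the search.
class PhraseSecondPass {
    // Keeps only the texts that contain the full phrase, stopword included.
    static List<String> keepExactPhrase(List<String> hitTexts, String phrase) {
        List<String> kept = new ArrayList<String>();
        String needle = phrase.toLowerCase();
        for (String text : hitTexts) {
            // indexOf returns -1 when the phrase does not occur in the text.
            if (text.toLowerCase().indexOf(needle) >= 0) {
                kept.add(text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample texts are made up for the example.
        List<String> hits = Arrays.asList(
                "Melbourne IT announced results",
                "Melbourne weather today");
        System.out.println(keepExactPhrase(hits, "Melbourne IT"));
    }
}
```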

Re: Stopwords in Exact phrase

2004-10-27 Thread Erik Hatcher
On Oct 27, 2004, at 3:36 PM, Ravi wrote: Is there a way to include stopwords in an exact phrase search? For example, when I search on "Melbourne IT", Lucene only searches for Melbourne ignoring "IT". But you want stop words removed for general term queries? Have a look at how Nutch does its thing -

Stopwords in Exact phrase

2004-10-27 Thread Ravi
Is there a way to include stopwords in an exact phrase search? For example, when I search on "Melbourne IT", Lucene only searches for Melbourne ignoring "IT". Thanks, Ravi.

Re: Poor Lucene Ranking for Short Text

2004-10-27 Thread Daniel Naber
On Wednesday 27 October 2004 20:20, Kevin A. Burton wrote: > http://www.peerfear.org/rss/permalink/2004/10/26/PoorLuceneRankingForShortText/ (Kevin complains about shorter documents ranked higher) This is something that can easily be fixed. Just use a Similarity implementation that extends De

Poor Lucene Ranking for Short Text

2004-10-27 Thread Kevin A. Burton
http://www.peerfear.org/rss/permalink/2004/10/26/PoorLuceneRankingForShortText/

Re: IndexWriter Constructor question

2004-10-27 Thread Justin Swanhart
You could always modify your own local copy if you want to change the behavior of the parameter, or just do: IndexWriter w = new IndexWriter(indexDirectory, new StandardAnalyzer(), !(IndexReader.indexEx

IndexWriter Constructor question

2004-10-27 Thread Armbrust, Daniel C.
Wouldn't it make more sense if the constructor for the IndexWriter always created an index if it doesn't exist - and the boolean parameter should be clear (instead of create) So instead of this (from javadoc): IndexWriter public IndexWriter(Directory d, Analyzer a,

RE: Indexing process causes Tomcat to stop working

2004-10-27 Thread Armbrust, Daniel C.
So, are you creating the indexes from inside the tomcat runtime, or are you creating them on the command line (which would be in a different runtime than tomcat)? What happens to tomcat? Does it hang - still running but not responsive? Or does it crash? If it hangs, maybe you are running ou

Re: Backup strategies

2004-10-27 Thread Justin Swanhart
I would suggest that you create a lock file for your index writing process. If the lock file is encountered, close the IndexWriter until the lock file is removed. After you create the lock file, wait a few seconds to make sure the writer process has quiesced, then create a snapshot of the filesystem

RE: Indexing process causes Tomcat to stop working

2004-10-27 Thread James Tyrrell
Aad, D'oh forgot to mention that mildly important info. Rather than re-index I am just creating a new index each time, this makes things easier to roll-back etc (which is what my boss wants). the command line is something like I have wondered about whether sessions could be a problem, but

RE: Indexing process causes Tomcat to stop working

2004-10-27 Thread Aad Nales
James, How do you kick off your reindex? Could it be a session timeout? cheers, Aad Hello, I am a Java/Lucene/Tomcat newbie. I know that does not bode well as a start to a post, but I really am in dire straits as far as Lucene goes, so bear with me. I am working on indexing and replacing searc

Boost value

2004-10-27 Thread Michael Hartmann
Hello, I am working on Lucene and tried to understand the calculation of the score value. As far as I understand it works as follows: (1) idf = ln(numDocs/(docFreq+1)) (2) queryWeight = idf * boost (3) sumOfSquaredWeights = queryWeight * queryWeight (4) norm = 1/sqrt(sumOfSquaredWeights)
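
The four steps listed in this message can be traced with a short single-term calculation. The sketch follows the poster's formulas exactly as written, which may not match Lucene's actual implementation. The class name, method names and sample numbers are hypothetical.

```java
// Hypothetical walk-through of steps (1) to (4) as stated in the message,
// for a query with a single term. Not taken from the Lucene source.
class ScoreSteps {
    // (1) idf = ln(numDocs / (docFreq + 1))
    static double idf(int numDocs, int docFreq) {
        return Math.log((double) numDocs / (docFreq + 1));
    }

    // (2) queryWeight = idf * boost
    static double queryWeight(double idf, double boost) {
        return idf * boost;
    }

    // (3) sumOfSquaredWeights = queryWeight * queryWeight
    static double sumOfSquaredWeights(double queryWeight) {
        return queryWeight * queryWeight;
    }

    // (4) norm = 1 / sqrt(sumOfSquaredWeights)
    static double norm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    public static void main(String[] args) {
        // Assumed sample values: 100 documents, term occurs in 9, boost of 2.
        double idf = idf(100, 9);
        double qw = queryWeight(idf, 2.0);
        double ssw = sumOfSquaredWeights(qw);
        double norm = norm(ssw);
        System.out.println("idf=" + idf + " queryWeight=" + qw
                + " sumOfSquaredWeights=" + ssw + " norm=" + norm);
    }
}
```

For a single positive term, step (4) reduces to 1/queryWeight, so under these formulas the boost cancels out once the query weight is multiplied by the norm.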

Re: Backup strategies

2004-10-27 Thread Christoph Kiehl
Christiaan Fluit wrote: I have no practical experience with backing up an online index, but I would try to find out the details of the write lock mechanism used by Lucene at the file level. You can then create a backup component that write-locks the index and does a regular file copy of the inde

Re: Backup strategies

2004-10-27 Thread Christiaan Fluit
Christoph Kiehl wrote: I'm curious about your strategy to back up indexes based on FSDirectory. If I do a file based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current in

Backup strategies

2004-10-27 Thread Christoph Kiehl
Hi, I'm curious about your strategy to back up indexes based on FSDirectory. If I do a file based copy I suspect I will get corrupted data because of concurrent write access. My current favorite is to create an empty index and use IndexWriter.addIndexes() to copy the current index state. But I'm

Indexing process causes Tomcat to stop working

2004-10-27 Thread James Tyrrell
Hello, I am a Java/Lucene/Tomcat newbie. I know that does not bode well as a start to a post, but I really am in dire straits as far as Lucene goes, so bear with me. I am working on indexing and replacing search functionality for a website (about 10 gig in size, although only about 7 gig is indexed