je-analysis.jar

2007-08-01 Thread Jun.Chen
Dear All, does anyone have je-analysis.jar? If so, could you send it to me? I don't currently have download access on my computer. Thank you very much! Yours truly, Daniel

Using Nutch APIs in Lucene

2007-08-01 Thread Srinivasarao Vundavalli
How can we use Nutch APIs in Lucene? For example, using FetchedSegments we can get a ParseText, from which we can get the content of the document. Can we use these classes (FetchedSegments, ParseText) in Lucene, and if so, how? Thank You

Solr's NumberUtils doesn't work

2007-08-01 Thread Mohammad Norouzi
Hi, I am using NumberUtils to encode and decode numbers while indexing and searching. When I decode a number retrieved from the index, it throws an exception for some fields. The exception message is: Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 1 at
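That exception is the usual symptom of decoding a value that was never encoded with the matching NumberUtils method (or was indexed as plain text), so the decoder reads past the end of a too-short string. The sketch below is an illustrative stand-in, not Solr's actual NumberUtils implementation: it maps an int to a fixed-width hex string whose lexicographic order matches numeric order, and its decoder fails the same way on an un-encoded value.

```java
public class SortableNumDemo {
    // Illustrative only: Solr's NumberUtils uses a different, more
    // compact scheme. Shift the int into [0, 2^32) and format it as
    // fixed-width hex so lexicographic order equals numeric order.
    static String encode(int n) {
        long shifted = (long) n - Integer.MIN_VALUE;
        return String.format("%08x", shifted);
    }

    static int decode(String s) {
        // A value that was indexed as plain text (e.g. "7") is shorter
        // than the fixed width the decoder expects -- the same failure
        // mode as the StringIndexOutOfBoundsException reported above.
        if (s.length() < 8) {
            throw new StringIndexOutOfBoundsException(
                "String index out of range: " + s.length());
        }
        return (int) (Long.parseLong(s, 16) + Integer.MIN_VALUE);
    }

    public static void main(String[] args) {
        String a = encode(-5), b = encode(42);
        System.out.println(a.compareTo(b) < 0); // true: sort order preserved
        System.out.println(decode(b));          // 42
        try {
            decode("7");                        // plain, un-encoded value
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("decode failed: " + e.getMessage());
        }
    }
}
```

The practical check, then, is that every field decoded with NumberUtils was also written through the corresponding encode call.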

Re: Can I do boosting based on term postions?

2007-08-01 Thread Cedric Ho
Thanks for the quick response =) On 8/1/07, Shailendra Sharma <[EMAIL PROTECTED]> wrote: > Yes, it is easily doable through the "Payload" facility. During the indexing process > (mainly tokenization), you need to push this extra information into each > token. And then you can use BoostingTermQuery for using

Re: High CPU usage during index and search

2007-08-01 Thread karl wettin
It sounds like you have a fairly busy system, perhaps 100% load on the process is not that strange, at least not during short periods of time. A simpler solution would be to nice the process a little bit in order to give your background jobs some more time to think. Running a profiler is still t

Re: Size of field?

2007-08-01 Thread Erick Erickson
Glad it worked out for you. Did you ever have any insight into what was magical about 87,300? Although now that I re-read your mail, that was the number of characters, so I can imagine that your corpus averaged 8.73 characters/word. Best, Erick On 8/1/07, Eduardo Botelho <[EMAIL PROTECTED]>

Re: More IP/MAC indexing questions

2007-08-01 Thread Mike Klaas
On 1-Aug-07, at 11:34 AM, Joe Attardi wrote: On 8/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote: Use a SpanNearQuery with a slop of 0 and specify true for ordering. What that will do is require that the segments you specify must appear in order with no gaps. You have to construct this your

Re: Size of field?

2007-08-01 Thread Eduardo Botelho
Hi Erick!! You're right, I just used setMaxFieldLength() and everything works fine. You saved my life, thanks! (y) On 7/30/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > See IndexWriter.setMaxFieldLength(). 87,300 is odd, since the default > max field length, last I knew, was 10,000. But this sounds li

Re: More IP/MAC indexing questions

2007-08-01 Thread Erick Erickson
I suspect you're going to have to deal with wildcards if you really want this functionality. Erick On 8/1/07, Joe Attardi <[EMAIL PROTECTED]> wrote: > > On 8/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > > > Use a SpanNearQuery with a slop of 0 and specify true for ordering. > > What that w

Re: More IP/MAC indexing questions

2007-08-01 Thread Joe Attardi
On 8/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Use a SpanNearQuery with a slop of 0 and specify true for ordering. > What that will do is require that the segments you specify must appear > in order with no gaps. You have to construct this yourself since there's > no support for SpanQueri

Re: More IP/MAC indexing questions

2007-08-01 Thread Erick Erickson
Think of a custom analyzer class rather than a custom query parser. The QueryParser uses your analyzer, so it all just "comes along". Here's the approach I'd try first, off the top of my head. Yes, break the IP etc. up into octets and index them tokenized. Use a SpanNearQuery with a slop

Re: More IP/MAC indexing questions

2007-08-01 Thread Joe Attardi
Hi Erick, First, consider using your own analyzer and/or breaking the IP addresses > up by substituting ' ' for '.' upon input. Do you mean breaking the IP up into one token for each segment, like ["192", "168", "1", "100"] ? > But on to your question. Please post what you mean by > "a large n

Re: More IP/MAC indexing questions

2007-08-01 Thread Erick Erickson
First, consider using your own analyzer and/or breaking the IP addresses up by substituting ' ' for '.' upon input. Otherwise, you'll have endless issues as time passes. But on to your question: please post what you mean by "a large number". 10,000? 1,000,000,000? We have no clue from your po
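The substitution Erick suggests amounts to a trivial per-field analysis step: split each address into octet tokens so every octet becomes a separate indexed term, which a SpanNearQuery with slop 0 and inOrder=true can then match as a consecutive run. A stand-alone sketch of just the tokenization (in a real setup this logic would live inside a custom Analyzer/Tokenizer; the class and method names here are illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class OctetTokenizerDemo {
    // Split an IPv4 address (or a colon/dash-separated MAC address)
    // into its component tokens, one term per octet.
    static List<String> octets(String address) {
        return Arrays.asList(address.split("[.:\\-]"));
    }

    public static void main(String[] args) {
        System.out.println(octets("192.168.1.100"));     // [192, 168, 1, 100]
        System.out.println(octets("00:1a:2b:3c:4d:5e")); // [00, 1a, 2b, 3c, 4d, 5e]
    }
}
```

Indexing the tokens this way also makes per-octet wildcard and range matching possible, which whole-address strings do not.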

Re: IndexReader deletes more than expected

2007-08-01 Thread Mark Miller
On 8/1/07, Ridwan Habbal <[EMAIL PROTECTED]> wrote: > > but what about running it on a multi-threaded app like a web application? Here > is the code: If you are targeting a multi-threaded webapp, then I strongly suggest you look into using either Solr or the LuceneIndexAccessor code. You will want

RE: IndexReader deletes more than expected

2007-08-01 Thread Steven Parkes
If I'm reading this correctly, there's something a little wonky here. In your example code, you close the IndexWriter and then, without creating a new IndexWriter, you call addDocument again. This shouldn't be possible (what version of Lucene are you using?) Assuming for the time being that you ar

IndexReader deletes more than expected

2007-08-01 Thread Ridwan Habbal
Hi, I got unexpected behavior while testing Lucene. To briefly describe the problem: using IndexWriter I add docs with a field named ID in consecutive order (1, 2, 3, 4, etc.), then close that index. I then open a new IndexReader and call IndexReader.deleteDocuments(Term). The term is simply new Term("ID

More IP/MAC indexing questions

2007-08-01 Thread Joe Attardi
Hi again, everyone. First of all, I want to thank everyone for their extremely helpful replies so far. Also, I just started reading the book "Lucene in Action" last night. So far it's an awesome book, so a big thanks to the authors. Anyhow, on to my question. As I've mentioned in several of my pre

RE: Searching with too many clauses + Out of Memory

2007-08-01 Thread Chandan Tamrakar
What heap size are you allocating for your app? -Original Message- From: Harini Raghavan [mailto:[EMAIL PROTECTED] Sent: Wednesday, August 01, 2007 2:29 PM To: java-user@lucene.apache.org Subject: Searching with too many clauses + Out of Memory Hi Everyone, I am using Compass 1.

Crawling in Nutch

2007-08-01 Thread Srinivasarao Vundavalli
Hi, in which field does Nutch store the content of a document while indexing? I am using this Nutch index to search in Lucene, so I want to know the field in which the content of the document is present. Thank You

Re: Problem Search using lucene

2007-08-01 Thread Michael Wechner
Chhabra, Kapil wrote: You just have to make sure that what you are searching is indexed (and esp. in the same format/case). Use Luke (http://www.getopt.org/luke/) to browse through your index. Does Luke also work with Nutch? Thanks, Michael This might give you an insight of what you hav

Searching with too many clauses + Out of Memory

2007-08-01 Thread Harini Raghavan
Hi Everyone, I am using Compass 1.1 M2, which supports Lucene 2.2, to store and search huge amounts of company, executive and employment data. There are some use cases where I need to search for executives/employments on the result set of a company search. But when I try to create a compass query to sear
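Two knobs are usually involved in this combination of symptoms: Lucene's clause limit (BooleanQuery.setMaxClauseCount, default 1024; queries that expand past it throw TooManyClauses, and raising it increases memory pressure) and the JVM heap ceiling, which is Chandan's question above. The heap side is a launch flag; the values and jar name below are purely illustrative:

```shell
# Start with a 256 MB heap and allow it to grow to 1 GB.
# 1024m and myapp.jar are illustrative placeholders -- tune the heap
# to the actual index size and available RAM.
java -Xms256m -Xmx1024m -jar myapp.jar
```

Raising -Xmx only buys headroom; if the query genuinely expands into tens of thousands of clauses, restructuring it (e.g. with a filter) is the more durable fix.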