speeding up lucene search

2004-07-20 Thread Anson Lau
Hello guys, What are some general techniques to make lucene search faster? I'm thinking about splitting up the index. My current index has approx 1.8 million documents (small documents) and index size is about 550MB. Am I likely to get much gain out of splitting it up and use a multiparallelsea

Token or not Token, PerFieldAnalyzer

2004-07-20 Thread Florian Sauvin
I still don't understand something. My analyzer contains a tokenizer that turns "hello world" into [hello] [world]. Is this analyzer applied to non-tokenized fields? What exactly is done on a field when the boolean token is set to true? -- Florian ---

Sorting on tokenized fields

2004-07-20 Thread Florian Sauvin
I see in the Javadoc that it is only possible to sort on fields that are not tokenized, I have two questions about that: 1) What happens if the field is tokenized, is sorting done anyway, using the first term only? 2) Is there a way to do some sorting anyway, by concatenating all the tokens in

Re: Lucene vs. MySQL Full-Text

2004-07-20 Thread Florian Sauvin
On Jul 20, 2004, at 12:29 PM, Tim Brennan wrote: Someone came into my office today and asked me about the project I am trying to use Lucene for -- "why aren't you just using a MySQL full-text index to do that" -- after thinking about it for a few minutes, I realized I don't have a great answer. MySQL b

Re: Lucene vs. MySQL Full-Text

2004-07-20 Thread Daniel Naber
On Tuesday 20 July 2004 21:29, Tim Brennan wrote: > Does anyone out there have > anything more concrete they can add? Stemming is still on the MySQL TODO list: http://dev.mysql.com/doc/mysql/en/Fulltext_TODO.html Also, for most people it's easier to extend Lucene than MySQL (as MySQL is writt

Re: Tokenizers and java.text.BreakIterator

2004-07-20 Thread Grant Ingersoll
Answering my own question, I think it is b/c Tokenizer's work with a Reader and you would have to read in the whole document in order to use the BreakIterator, which operates on a String... >>> [EMAIL PROTECTED] 07/20/04 03:23PM >>> Hi, Was wondering if anyone uses java.text.BreakIterator#getWo
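The constraint described in this post can be sketched with the JDK alone. The example below is a minimal illustration, not Lucene code: it assumes the input arrives as a `Reader`, buffers it completely into a `String`, and only then hands it to `BreakIterator`. The class name `ReaderToBreakIterator` and the methods `drain` and `countWords` are hypothetical names chosen for this sketch.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.Locale;

class ReaderToBreakIterator {
    /** Reads the entire Reader into memory; this is the buffering cost the post refers to. */
    static String drain(Reader reader) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts boundary-delimited segments that start with a letter or digit. */
    static int countWords(Reader reader, Locale locale) {
        String text = drain(reader); // the whole document must be in memory first
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        int count = 0;
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new StringReader("one two three"), Locale.ENGLISH));
    }
}
```

For small documents the extra buffering is unlikely to matter; for very large inputs it means memory use grows with document size, which a streaming tokenizer avoids.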

Limiting Term Queries

2004-07-20 Thread Shawn Konopinsky
Is it possible to limit a term query? For example: I am indexing documents with (amongst other things) a string in one field and with a number in another field. All combinations of strings and numbers are allowed and neither field is unique. I would like a way to query Lucene to pull out all uniq

Re: lucene cutomized indexing

2004-07-20 Thread Grant Ingersoll
It seems to me the answer to this is not necessarily to open up the API, but to provide a mechanism for adding Writers and Readers to the indexing/searching process at the application level. These readers and writers could be passed to Lucene and used to read and write to separate files (thus,

Lucene vs. MySQL Full-Text

2004-07-20 Thread Tim Brennan
Someone came into my office today and asked me about the project I am trying to use Lucene for -- "why aren't you just using a MySQL full-text index to do that" -- after thinking about it for a few minutes, I realized I don't have a great answer. MySQL builds inverted indexes for (in theory) doing th

Tokenizers and java.text.BreakIterator

2004-07-20 Thread Grant Ingersoll
Hi, Was wondering if anyone uses java.text.BreakIterator#getWordInstance(Locale) as a tokenizer for various languages? Does it do a good job? It seems like it does, at least for languages where words are separated by spaces or punctuation, but I have only done simple tests. Anyone have any t
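For readers who want to try the idea from this post, here is a minimal JDK-only sketch of word splitting with `java.text.BreakIterator.getWordInstance(Locale)`. It is not a Lucene `Tokenizer`; the class name `BreakIteratorWords` and the method `words` are hypothetical, and the filter that drops whitespace and punctuation segments is an assumption about what a tokenizer would want to keep.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BreakIteratorWords {
    /** Splits text at word boundaries and keeps only segments containing a letter or digit. */
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            // Boundaries also delimit spaces and punctuation; skip those segments.
            if (segment.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! Lucene rocks.", Locale.ENGLISH));
    }
}
```

How well this works for languages without whitespace-separated words depends on the JDK's boundary rules for the given locale, so it is worth testing against real text before relying on it.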

Re: lucene cutomized indexing

2004-07-20 Thread Erik Hatcher
On Jul 20, 2004, at 2:10 PM, John Wang wrote: I have already provided my opinion on this one - I think it would be fine to allow Token to be public. I'll let others respond to the additional requests you've made. Great, what processes need to be in place before this gets in the code base? You're

Re: lucene cutomized indexing

2004-07-20 Thread John Wang
That is exactly what they did and that's probably what I have to do. But that means we are diverging from the Lucene code base, and future fixes and enhancements need to be synchronized, and that may be a pain. -John On Tue, 20 Jul 2004 20:03:05 +0200, Daniel Naber <[EMAIL PROTECTED]> wrote: > On Tu

Re: lucene cutomized indexing

2004-07-20 Thread John Wang
On Tue, 20 Jul 2004 13:40:28 -0400, Erik Hatcher <[EMAIL PROTECTED]> wrote: > On Jul 20, 2004, at 12:12 PM, John Wang wrote: > > There are few things I want to do to be able to customize lucene: > > > [...] > > > > 3) to be able to customize analyzers to add more information to the > > Token w

Re: join two indexes

2004-07-20 Thread Daniel Naber
On Tuesday 20 July 2004 19:19, Sergio wrote: > I want to join two Lucene indexes but I don't know how to do that. There are two "addIndexes" methods in IndexWriter, which you can use to write your own small merge tool (a ready-to-use tool for index merging doesn't exist AFAIK). Regards Daniel

Re: Very slow IndexReader.open() performance

2004-07-20 Thread Doug Cutting
Optimization should not require huge amounts of memory. Can you tell a bit more about your configuration: What JVM? What OS? How many fields? What mergeFactor have you used? Also, please attach the output of 'ls -l' of your index directory, as well as the stack trace you see when OutOfMemo

Re: lucene cutomized indexing

2004-07-20 Thread Daniel Naber
On Tuesday 20 July 2004 18:12, John Wang wrote: > They make sure during deployment their "versions" > gets loaded before the same classes in the lucene .jar. I don't see why people cannot just make their own lucene.jar. Just remove the "final" and recompile. After all, Lucene is Open Source. Rega

Very slow IndexReader.open() performance

2004-07-20 Thread Mark Florence
Hi -- We have a large index (~4m documents, ~14gb) that we haven't been able to optimize for some time, because the JVM throws OutOfMemory, after climbing to the maximum we can throw at it, 2gb. In fact, the OutOfMemory condition occurred most recently during a segment merge operation. maxMergeD

Re: lucene cutomized indexing

2004-07-20 Thread Erik Hatcher
On Jul 20, 2004, at 12:12 PM, John Wang wrote: There are few things I want to do to be able to customize lucene: [...] 3) to be able to customize analyzers to add more information to the Token while doing tokenization. I have already provided my opinion on this one - I think it would be fine

Here is how to search multiple indexes

2004-07-20 Thread Don Vaillancourt
Here is the code that I use to do multi-index searches:
// create a multi index searcher
IndexSearcher indexes[] = new IndexSearcher[n]; // where n is the number of indexes to search
for (int i = 0; i < n; i++) {
    // use whichever IndexSearcher constructor you want
    // blah is the

Syntax of Query

2004-07-20 Thread Hetan Shah
Hey guys, Need some help with creating a query. Here is the scenario: Field 1: Field 2: Field 3: MultiSelect 1 : MultiSelect 2 :

join two indexes

2004-07-20 Thread Sergio
Hi, I want to join two Lucene indexes but I don't know how to do that. For example, I have a student index and a school index. In the school index I have the studentId field. How can I do that? Any ideas would be welcome. Thx, Sergio.

Re: No change in the indexing time after increase the merge factor

2004-07-20 Thread Otis Gospodnetic
All Lucene articles that I know of were written before IndexWriter.minMergeDocs was added. Check IndexWriter javadoc for more info, but this is another field you can tune. Otis --- Praveen Peddi <[EMAIL PROTECTED]> wrote: > I performed lucene indexing with 25,000 documents. > We feel that index

Re: lucene cutomized indexing

2004-07-20 Thread John Wang
Hi Daniel: There are few things I want to do to be able to customize lucene: 1) to be able to plug in a different similarity model (e.g. bayesian, vector space etc.) 2) to be able to store certain fields in its own format and provide corresponding readers. I may not want to store every fiel

Re: Post-sorted inverted index?

2004-07-20 Thread Doug Cutting
You can define a subclass of FilterIndexReader that re-sorts documents in TermPositions(Term) and document(int), then use IndexWriter.addIndexes() to write this in Lucene's standard format. I have done this in Nutch, with the (as yet unused) IndexOptimizer. http://cvs.sourceforge.net/viewcvs.p

Re: lucene cutomized indexing

2004-07-20 Thread Daniel Naber
On Tuesday 20 July 2004 17:28, John Wang wrote: >I have asked to make the Lucene API less restrictive many many many > times but got no replies. I suggest you just change it in your source and see if it works. Then you can still explain what exactly you did and why it's useful. From the deve

No change in the indexing time after increase the merge factor

2004-07-20 Thread Praveen Peddi
I performed Lucene indexing with 25,000 documents. We feel that indexing is slow, so I am trying to tune it. My configuration is as follows: Machine: Windows XP, 1GB RAM, 3GHz # of documents: 25,000 App Server: Weblogic 7.0 Lucene version: Lucene 1.4 final I ran the indexer with merge factor of 10

lucene cutomized indexing

2004-07-20 Thread John Wang
Hi: I am trying to store some database-like field values in Lucene. I have my own way of storing field values in a customized format. I guess my question is whether we can make the Reader/Writer classes, e.g. FieldReader, FieldWriter, DocumentReader/Writer classes non-final? I have a

Re: The indexer

2004-07-20 Thread Erik Hatcher
On Jul 20, 2004, at 10:07 AM, Ian McDonnell wrote: As for indexing data from mysql - there have been lots of discussions of that recently, so check the archives. Basically you read the data, and index it with Lucene's API. And you are responsible for keeping it >in sync. The problem i am having

Re: The indexer

2004-07-20 Thread Erik Hatcher
On Jul 20, 2004, at 9:29 AM, Ian McDonnell wrote: Basically i add details about a movie clip as various fields in an sql db using a jsp form. When the form submits i want to add the details into the db and also want the fields to be stored as a searchable lucene index on the server. Is this pos

Re: The indexer

2004-07-20 Thread Ian McDonnell
Yeah that last part of your reply seems to be what i'm trying to do(you're going to have to excuse me as i'm a total newbie to Lucene and am only finding my feet with it). I searched the archives and went back through it manually just there, but didnt find any relevant posts in the archive. >As

Re: The indexer

2004-07-20 Thread Ian McDonnell
Basically i add details about a movie clip as various fields in an sql db using a jsp form. When the form submits i want to add the details into the db and also want the fields to be stored as a searchable lucene index on the server. Is this possible? Ian --- Erik Hatcher <[EMAIL PROTECTED]>

Re: The indexer

2004-07-20 Thread Erik Hatcher
On Jul 20, 2004, at 8:44 AM, Ian McDonnell wrote: Can Lucenes indexer be used to store info in fields in a mysql db? I'm not quite clear on your question. You want to store a Lucene index (aka Directory) within mysql? Or, you want to index data from your existing mysql database into a Lucene in

The indexer

2004-07-20 Thread Ian McDonnell
Can Lucenes indexer be used to store info in fields in a mysql db? If so can anybody point me to an example or some documentation relating to it. Ian

Re: Post-sorted inverted index?

2004-07-20 Thread Erik Hatcher
On Jul 20, 2004, at 1:27 AM, Aphinyanaphongs, Yindalon wrote: I gather from reading the documentation that the scores for each document hit are computed at query time. I have an application that, due to the complexity of the function, cannot compute scores at query time. Would it be possible f

Re: Query across multiple fields scenario not handled by "MultiFieldQueryParser"

2004-07-20 Thread Thomas Plümpe
Daniel, > > Does anybody here know which changes I > > would have to make to QueryParser.jj to get the functionality described? > > I haven't tried it but I guess you need to change the getXXXQuery() methods so > they return a BooleanQuery. For example, getFieldQuery currently might return > a
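The suggestion quoted here, returning a boolean combination from the per-term query methods, can be illustrated at the query-string level. The sketch below does not touch QueryParser.jj or any Lucene class. It assumes the desired behaviour is "each term must match in at least one of the fields", which the truncated post does not confirm, and the names `MultiFieldExpansion`, `expandTerm` and `expandAll` are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class MultiFieldExpansion {
    /** Expands one term into an OR group over all fields, e.g. (title:foo body:foo). */
    static String expandTerm(String term, List<String> fields) {
        // Note: no escaping of query-syntax characters is attempted in this sketch.
        return fields.stream()
                .map(field -> field + ":" + term)
                .collect(Collectors.joining(" ", "(", ")"));
    }

    /** Requires every term (+) while letting each one match in any field. */
    static String expandAll(List<String> terms, List<String> fields) {
        return terms.stream()
                .map(term -> "+" + expandTerm(term, fields))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(expandAll(List.of("lucene", "search"), List.of("title", "body")));
    }
}
```

The resulting string uses standard Lucene query syntax, so it could be fed to an ordinary single-field parser; doing the same expansion inside the parser, as the post suggests, would avoid re-parsing.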