Hello guys,
What are some general techniques to make lucene search faster?
I'm thinking about splitting up the index. My current index has approx 1.8
million documents (small documents) and index size is about 550MB. Am I
likely to get much gain out of splitting it up and using a
multiparallelsea
I still don't understand something, my analyzer contains a tokenizer,
turning "hello world" into [hello] [world]
is this analyzer applied on non-tokenized field? What exactly is done
on a field when the boolean token is set to true?
--
Florian
---
I see in the Javadoc that it is only possible to sort on fields that
are not tokenized, I have two questions about that:
1) What happens if the field is tokenized, is sorting done anyway,
using the first term only?
2) Is there a way to do some sorting anyway, by concatenating all the
tokens in
On Jul 20, 2004, at 12:29 PM, Tim Brennan wrote:
Someone came into my office today and asked me about the project I am
trying to use Lucene for -- "why aren't you just using a MySQL full-text
index to do that" -- after thinking about it for a few minutes, I
realized I don't have a great answer.
MySQL b
On Tuesday 20 July 2004 21:29, Tim Brennan wrote:
> Does anyone out there have
> anything more concrete they can add?
Stemming is still on the MySQL TODO list:
http://dev.mysql.com/doc/mysql/en/Fulltext_TODO.html
Also, for most people it's easier to extend Lucene than MySQL (as MySQL is
writt
Answering my own question, I think it is because Tokenizers work with a Reader and you
would have to read in the whole document in order to use the BreakIterator, which
operates on a String...
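To make the point above concrete, here is a minimal sketch using only the JDK (no Lucene classes), assuming it is acceptable to buffer the whole document in memory. The class name BreakIteratorWords and the helpers readAll and words are hypothetical names made up for this illustration, and the code uses current Java syntax rather than what a 2004 JDK offered. It drains a Reader into a String, hands the String to BreakIterator.getWordInstance(Locale), and keeps only the segments that start with a letter or digit, since the iterator also reports whitespace and punctuation as segments.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class for illustration; not part of Lucene.
final class BreakIteratorWords {

    // Drain the Reader into a String so it can be passed to setText(String).
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    // Split text at word boundaries for the given locale, dropping
    // segments that are only whitespace or punctuation.
    static List<String> words(String text, Locale locale) {
        BreakIterator it = BreakIterator.getWordInstance(locale);
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String segment = text.substring(start, end);
            if (Character.isLetterOrDigit(segment.codePointAt(0))) {
                out.add(segment);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = readAll(new StringReader("hello world"));
        System.out.println(words(text, Locale.ENGLISH)); // prints [hello, world]
    }
}
```

Wiring this into an actual Lucene Tokenizer would still mean buffering the full Reader contents first, which is the cost described above.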
>>> [EMAIL PROTECTED] 07/20/04 03:23PM >>>
Hi,
Was wondering if anyone uses java.text.BreakIterator#getWo
Is it possible to limit a term query?
For example: I am indexing documents with (amongst other things) a
string in one field and with a number in another field. All combinations
of strings and numbers are allowed and neither field is unique. I would
like a way to query Lucene to pull out all uniq
It seems to me the answer to this is not necessarily to open up the API, but to
provide a mechanism for adding Writers and Readers to the indexing/searching process
at the application level. These readers and writers could be passed to Lucene and
used to read and write to separate files (thus,
Someone came into my office today and asked me about the project I am
trying to use Lucene for -- "why aren't you just using a MySQL full-text
index to do that" -- after thinking about it for a few minutes, I
realized I don't have a great answer.
MySQL builds inverted indexes for (in theory) doing th
Hi,
Was wondering if anyone uses java.text.BreakIterator#getWordInstance(Locale) as a
tokenizer for various languages? Does it do a good job? It seems like it does, at
least for languages where words are separated by spaces or punctuation, but I have
only done simple tests.
Anyone have any t
On Jul 20, 2004, at 2:10 PM, John Wang wrote:
I have already provided my opinion on this one - I think it would be
fine to allow Token to be public. I'll let others respond to the
additional requests you've made.
Great, what processes need to be in place before this gets in the code
base?
You're
That is exactly what they did and that's probably what I have to do.
But that means we are diverging from the lucene code base, and future
fixes and enhancements need to be synchronized, and that may be a pain.
-John
On Tue, 20 Jul 2004 20:03:05 +0200, Daniel Naber
<[EMAIL PROTECTED]> wrote:
> On Tu
On Tue, 20 Jul 2004 13:40:28 -0400, Erik Hatcher
<[EMAIL PROTECTED]> wrote:
> On Jul 20, 2004, at 12:12 PM, John Wang wrote:
> > There are a few things I want to do to be able to customize lucene:
> >
> [...]
> >
> > 3) to be able to customize analyzers to add more information to the
> > Token w
On Tuesday 20 July 2004 19:19, Sergio wrote:
> i want to join two lucene indexes but i don't know how to do that.
There are two "addIndexes" methods in IndexWriter which you can use to
write your own small merge tool (a ready-to-use tool for index merging
doesn't exist AFAIK).
Regards
Daniel
Optimization should not require huge amounts of memory. Can you tell a
bit more about your configuration: What JVM? What OS? How many
fields? What mergeFactor have you used?
Also, please attach the output of 'ls -l' of your index directory, as
well as the stack trace you see when OutOfMemo
On Tuesday 20 July 2004 18:12, John Wang wrote:
> They make sure during deployment their "versions"
> gets loaded before the same classes in the lucene .jar.
I don't see why people cannot just make their own lucene.jar. Just remove
the "final" and recompile. After all, Lucene is Open Source.
Rega
Hi -- We have a large index (~4m documents, ~14gb) that we haven't been
able to optimize for some time, because the JVM throws OutOfMemory, after
climbing to the maximum we can throw at it, 2gb.
In fact, the OutOfMemory condition occurred most recently during a segment
merge operation. maxMergeD
On Jul 20, 2004, at 12:12 PM, John Wang wrote:
There are a few things I want to do to be able to customize lucene:
[...]
3) to be able to customize analyzers to add more information to the
Token while doing tokenization.
I have already provided my opinion on this one - I think it would be
fine
Here is the code that I use to do multi-index searches:
// create a multi index searcher
IndexSearcher indexes[] = new IndexSearcher[n]; // where n is the number of indexes to search
for (int i = 0; i < n; i++)
{
// use whichever IndexSearcher constructor you want
// blah is the
Hey guys,
Need some help with creating a query. Here is the scenario:
Field 1:
Field 2:
Field 3:
MultiSelect 1 :
MultiSelect 2 :
Hi,
i want to join two lucene indexes but i don't know how to do that.
For example i have a student index and a school index.
In the school index i have the studentId field.
How to do that ?
Any idea will be welcome.
Thx, Sergio.
All Lucene articles that I know of were written before
IndexWriter.minMergeDocs was added. Check IndexWriter javadoc for more
info, but this is another field you can tune.
Otis
--- Praveen Peddi <[EMAIL PROTECTED]> wrote:
> I performed lucene indexing with 25,000 documents.
> We feel that index
Hi Daniel:
There are a few things I want to do to be able to customize lucene:
1) to be able to plug in a different similarity model (e.g. Bayesian,
vector space etc.)
2) to be able to store certain fields in their own format and provide
corresponding readers. I may not want to store every fiel
You can define a subclass of FilterIndexReader that re-sorts documents
in TermPositions(Term) and document(int), then use
IndexWriter.addIndexes() to write this in Lucene's standard format. I
have done this in Nutch, with the (as yet unused) IndexOptimizer.
http://cvs.sourceforge.net/viewcvs.p
On Tuesday 20 July 2004 17:28, John Wang wrote:
> I have asked to make the Lucene API less restrictive many many many
> times but got no replies.
I suggest you just change it in your source and see if it works. Then you can
still explain what exactly you did and why it's useful. From the deve
I performed lucene indexing with 25,000 documents.
We feel that indexing is slow, so I am trying to tune it.
My configuration is as follows:
Machine: Windows XP, 1GB RAM, 3GHz
# of documents: 25,000
App Server: Weblogic 7.0
lucene version: lucene 1.4 final
I ran the indexer with merge factor of 10
Hi:
I am trying to store some database-like field values into lucene.
I have my own way of storing field values in a customized format.
I guess my question is whether we can make the Reader/Writer
classes, e.g. FieldReader, FieldWriter, DocumentReader/Writer classes
non-final?
I have a
On Jul 20, 2004, at 10:07 AM, Ian McDonnell wrote:
As for indexing data from mysql - there have been lots of discussions
of that recently, so check the archives. Basically you read the data,
and index it with Lucene's API. And you are responsible for keeping
it in sync.
The problem i am having
On Jul 20, 2004, at 9:29 AM, Ian McDonnell wrote:
Basically i add details about a movie clip as various fields in an sql
db using a jsp form. When the form submits i want to add the details
into the db and also want the fields to be stored as a searchable
lucene index on the server.
Is this pos
Yeah that last part of your reply seems to be what i'm trying to do (you're going to
have to excuse me as i'm a total newbie to Lucene and am only finding my feet with
it). I searched the archives and went back through them manually just there, but didn't
find any relevant posts in the archive.
>As
Basically i add details about a movie clip as various fields in an sql db using a jsp
form. When the form submits i want to add the details into the db and also want the
fields to be stored as a searchable lucene index on the server.
Is this possible?
Ian
--- Erik Hatcher <[EMAIL PROTECTED]>
On Jul 20, 2004, at 8:44 AM, Ian McDonnell wrote:
Can Lucene's indexer be used to store info in fields in a mysql db?
I'm not quite clear on your question. You want to store a Lucene index
(aka Directory) within mysql?
Or, you want to index data from your existing mysql database into a
Lucene in
Can Lucene's indexer be used to store info in fields in a mysql db?
If so, can anybody point me to an example or some documentation relating to it?
Ian
On Jul 20, 2004, at 1:27 AM, Aphinyanaphongs, Yindalon wrote:
I gather from reading the documentation that the scores for each
document hit are computed at query time. I have an application that,
due to the complexity of the function, cannot compute scores at query
time. Would it be possible f
Daniel,
> > Does anybody here know which changes I
> > would have to make to QueryParser.jj to get the functionality described?
>
> I haven't tried it but I guess you need to change the getXXXQuery() methods so
> they return a BooleanQuery. For example, getFieldQuery currently might return
> a