Mark Harwood wrote:
Isn't that what Query.extractTerms is for? Isn't it
implemented by all primitive Queries?..
As of last week, yes. I changed the SpanQueries to
implement this method and then refactored the
Highlighter package's QueryTermExtractor to make use
of this (it radically simplified the code in there).
Thanks guys as always... Lucene (and especially the people behind
it) is top notch.
Less than 6 hours from the time I figured out that the bug was in
Lucene (and not in my code, which is usually the case) - and it's already
fixed (I'm going to assume - I'll test it tomorrow when I get to work).
addIndexes(Dir[]) was the only user of mergeSegments() that passed an
endpoint that wasn't the end of the segment list, and hence the only
caller to mergeSegments() that will see a change of behavior.
Given that, I feel comfortable enough to commit this.
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
OK, the following patch seems to work for me!
You might want to try it out on your larger test, Dan.
The first part probably isn't necessary (the base=start instead of
start+1), but the second part is.
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
Index: org
I realized what the real problem was during the drive home.
merged segments are added after all other segments, instead of the
spot where the original segments resided.
I'll propose a patch soon...
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
Spoke too soon... the loop counter goes down to zero, but it looks
like the segments are added in order.
for (int i = input.readInt(); i > 0; i--) { // read segmentInfos
  SegmentInfo si =
    new SegmentInfo(input.readString(), input.readInt(), directory);
  addElement(si); // elements are appended in the order they are read
}
Ah Ha! I found the problem.
SegmentInfos.read(Directory directory) reads the segment info in reverse order!
I gotta go home now... I'll look into the right fix later (it depends
on what else uses that method...)
FYI, I managed to reproduce it with only 3 documents in each index.
-Yonik
http://incubator.apache.org/solr Solr, The Open Source Lucene Search Server
On 4/5/06, Dan Armbrust <[EMAIL PROTECTED]> wrote:
> Yonik Seeley wrote:
> > For your test case, try lowering numbers, such as maxBufferedDocs=2,
> > mergeFactor=2 or 3
> > to create more segments more quickly and cause more merges with fewer
> > documents.
>
> Good suggestion. A merge factor of 2 made it happen much more quickly.
Doug Cutting wrote:
I assume that your merge factor when calling addIndexes() is less than
90. If it's 90, then what you're doing is the same as Lucene would
automatically do. I think you could save yourself a lot of trouble if
you simply lowered your merge factor substantially and then ind
Yonik Seeley wrote:
For your test case, try lowering numbers, such as maxBufferedDocs=2,
mergeFactor=2 or 3
to create more segments more quickly and cause more merges with fewer documents.
Good suggestion. A merge factor of 2 made it happen much more quickly.
Bug is filed:
http://issues.ap
> Out of interest, does indexing time speed up much on 64-bit hardware?
I was able to speed up indexing on a 64-bit platform by taking advantage of
the larger address space to parallelize the indexing process. One thread
creates index segments with a set of RAMDirectories and another thread
merges t
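A rough sketch of that kind of pipeline, assuming a BlockingQueue hand-off and placeholder writer/analyzer names (none of this is the poster's actual code):

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hand-off channel between the two threads (any thread-safe hand-off works).
BlockingQueue queue = new LinkedBlockingQueue();

// Thread 1: build one segment's worth of documents in RAM, then hand it off.
RAMDirectory ram = new RAMDirectory();
IndexWriter ramWriter = new IndexWriter(ram, new StandardAnalyzer(), true);
// ... ramWriter.addDocument(...) for a batch of documents ...
ramWriter.close();
queue.put(ram);

// Thread 2: fold finished RAM segments into the on-disk index.
Directory done = (Directory) queue.take();
diskWriter.addIndexes(new Directory[] { done }); // diskWriter: the disk-based IndexWriter (assumed)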
On 4/5/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
> As others have noted, this should work correctly.
One slight oddity I noticed with addIndexes(Dir[]) is that merging
starts at one past the first new segment added (not the first new
segment). It doesn't seem like that should hurt much though.
Dan Armbrust wrote:
My indexing process works as follows (and some of this is a hold-over from
the time before Lucene had a compound file format - so bear with me).
I open up a File based index - using a merge factor of 90, and in my
current test, the compound index format. When I have added 100
On 4/5/06, Dan Armbrust <[EMAIL PROTECTED]> wrote:
> I will try to come up with a reproducible test case.
If you reproduce it, I'll fix it :-)
For your test case, try lowering numbers, such as maxBufferedDocs=2,
mergeFactor=2 or 3
to create more segments more quickly and cause more merges with fewer documents.
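In code, Yonik's suggestion amounts to something like this (the directory and analyzer are placeholders):

import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.index.IndexWriter;

IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true); // dir: any Directory
writer.setMaxBufferedDocs(2); // flush a new segment after every 2 buffered docs
writer.setMergeFactor(2);     // merge as soon as 2 segments exist at a level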
: Well, I set out to write JUnit test case to quickly show this... but
: I'm having a heck of a time doing it. With relatively small numbers of
: documents containing very few fields... I haven't been able to recreate
: the out-of-order problem. However, with my real process, with a ton
: more
Yonik Seeley wrote:
On 4/5/06, Dan Armbrust <[EMAIL PROTECTED]> wrote:
I'll continue to try to generate a test case that gets the docs out of
order... but if someone in the know could answer authoritatively whether
I browsed the code for IndexWriter.addIndexes(Dir[]), and it looks
like it should preserve order.
On 4/5/06, Dan Armbrust <[EMAIL PROTECTED]> wrote:
> I haven't been able to recreate
> the out-of-order problem. However, with my real process, with a ton
> more data, I can recreate it every single time I index (it even gets the
> same documents out of order, consistently).
If you have enough fi
On 4/5/06, Dan Armbrust <[EMAIL PROTECTED]> wrote:
> I'll continue to try to generate a test case that gets the docs out of
> order... but if someone in the know could answer authoritatively whether
I browsed the code for IndexWriter.addIndexes(Dir[]), and it looks
like it should preserve order.
Chris Hostetter wrote:
: exactly the same as how I insert them. Lucene is supposed to maintain
: document order, even across index merges, correct?
Lucene definitely maintains index order for document additions -- but I
don't know if any similar claim has been made about merging whole indexes.
On Wednesday, 05 April 2006 13:02, Max Pfingsthorn wrote:
> The setMaxBufferedDocs and related parameters help a lot already to
> fully exploit my RAM when indexing, but since I'm running a fairly small
> index of around 4 docs and I'm optimizing it relatively often, I was
> wondering if there i
Daniel, you are very clever! Your solution reminds me of this:
No temptation has overtaken you but such as is common to man; and God is
faithful, who will not allow you to be tempted beyond what you are able, but
with the temptation will provide the way of escape also, so that you will be
able to endure it.
I don't know if there is any way for a Custom Sort to access the Lucene
score -- but another approach that works very well is to use the
FunctionQuery classes from Solr...
http://incubator.apache.org/solr/docs/api/org/apache/solr/search/function/package-summary.html
...you can make a FunctionQuer
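Since the FunctionQuery link above covers the Solr side, here is one hedged way to reach both the score and a date in plain Lucene, using a HitCollector plus FieldCache; the "days" field name and the decay formula are invented for illustration:

import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.HitCollector;

// "days" is assumed to be an indexed, un-tokenized integer field (document age in days).
final int[] age = FieldCache.DEFAULT.getInts(reader, "days"); // reader: the searcher's IndexReader
searcher.search(query, new HitCollector() {
  public void collect(int doc, float score) {
    // Damp the raw score by a simple recency factor; tune 365.0f to taste.
    float adjusted = score * (1.0f / (1.0f + age[doc] / 365.0f));
    // ... keep the top-N (adjusted, doc) pairs, e.g. in a PriorityQueue ...
  }
});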
: exactly the same as how I insert them. Lucene is supposed to maintain
: document order, even across index merges, correct?
Lucene definitely maintains index order for document additions -- but I
don't know if any similar claim has been made about merging whole indexes.
: this until I'm done w
: Thanks for your answer, you're right, file paths are pretty much
: unique. Anyway I don't want this total-field-cache-loading situation to occur
: in any circumstances - it's too expensive. My app usually crawls while
: user searches are performed. Crawl involves additions and deletions so
: IndexS
On 4/5/06, Bruno Grilheres <[EMAIL PROTECTED]> wrote:
> Thanks for your answer, I was not aware of the SOLR project,
>
> There was a big typo here, I meant less than 10 Go of PDF files per day
> during one month => i.e. less than 300 Go of PDF files.
Sorry, I'm not sure what the "Go" abbreviation means.
Thanks for your answer, I was not aware of the SOLR project,
There was a big typo here, I meant less than 10 Go of PDF files per day
during one month => i.e. less than 300 Go of PDF files.
I made some tests with PDF files: 100 Mo of native PDF are converted to
3Mo of index in lucene [The text wa
Hi,
I need to change the Lucene sorting to give just a bit more relevance to
the recent documents (but I don't want to sort by date). I'd like to mix
the Lucene score with the date of the document.
I'm following the example in "Lucene in Action", chapter 6. I'm trying
to extend the SortCompa
I'm using Lucene 1.9.1, and I'm seeing some odd behavior that I hope
someone can help me with.
My application counts on Lucene maintaining the order of the documents
exactly the same as how I insert them. Lucene is supposed to maintain
document order, even across index merges, correct?
My i
On 05.04.2006, at 17:15, Bill Janssen wrote:
Or, as I suggested a couple of days ago, a 1.9.2 release could be
offered.
That would be a good idea, because the current nightly builds have a lot
of deprecated methods removed which were available in 1.9.1.
A lot of work just for this... :-(
> Hi.
>
> Is it correct that in Release 1.9.1 a WRITE_LOCK_TIMEOUT is hardcoded
> and there is no way to set it from outside?
>
> I've seen a check-in in the CVS from a few days ago which added
> getters/setters for this, but ... there is no release containing
> this, right?
>
> So, my que
On 4/5/06, Bruno Grilheres <[EMAIL PROTECTED]> wrote:
> 1) High volume of data indexation but only with add and delete
> functionality (approximately 10 PDF) => scalable architecture HDFS
> seems good.
> 2) Specific analysis chain and a given set of meta-data indexation.
> 3) Language Recognition
Hi.
Is it correct that in Release 1.9.1 a WRITE_LOCK_TIMEOUT is hardcoded
and there is no way to set it from outside?
I've seen a check-in in the CVS from a few days ago which added
getters/setters for this, but ... there is no release containing
this, right?
So, my question is: Is it s
On 4/5/06, Artem Vasiliev <[EMAIL PROTECTED]> wrote:
> The int[] array here contains references to String[], and to populate
> it all the field values still need to be loaded and compared/sorted
Terms are stored and iterated in sorted order, so no sorting needs to be done.
It's still the case that
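To see that order directly, a TermEnum walk over one field yields its terms already sorted ("myField" is a placeholder name):

import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermEnum;

TermEnum terms = reader.terms(new Term("myField", "")); // positions at the field's first term
try {
  while (terms.term() != null && terms.term().field().equals("myField")) {
    // terms.term().text() values arrive in lexicographic order
    terms.next();
  }
} finally {
  terms.close();
}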
http://regain.sourceforge.net/ ?
- Original Message -
From: "Delip Rao" <[EMAIL PROTECTED]>
To:
Sent: Wednesday, April 05, 2006 2:23 PM
Subject: searching offline
Hi,
I have a large collection of text documents that I want to search
using Lucene. Is there any command line utility th
Red Piranha: http://red-piranha.sourceforge.net/
-Original Message-
From: Delip Rao [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 05, 2006 6:53 PM
To: java-user@lucene.apache.org
Subject: searching offline
Hi,
I have a large collection of text documents that I want to search
using l
Hi,
I have a large collection of text documents that I want to search
using Lucene. Is there any command line utility that will allow me to
search this static collection of documents?
Writing one is an option but I want to know if anyone has already done this.
Thanks in advance,
Delip
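For what it's worth, the demo class org.apache.lucene.demo.SearchFiles that ships with Lucene does roughly this already. A minimal sketch of such a tool (the index path and field names are assumptions):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class SearchCli {
  public static void main(String[] args) throws Exception {
    IndexSearcher searcher = new IndexSearcher("/path/to/index"); // assumed location
    Query q = new QueryParser("contents", new StandardAnalyzer()).parse(args[0]);
    Hits hits = searcher.search(q);
    for (int i = 0; i < hits.length(); i++) {
      System.out.println(hits.score(i) + "\t" + hits.doc(i).get("path"));
    }
    searcher.close();
  }
}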
You understood me right, Erik. Your solution is working well, thanks.
Venu
-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 05, 2006 6:03 PM
To: java-user@lucene.apache.org
Subject: Re: Which Analyzer to use when searching on Keyword fields
Venu,
Venu,
I presume you're asking about what Analyzer to use with QueryParser.
QueryParser analyzes all term text, but you can fake it for Keyword
(non-tokenized) fields by using PerFieldAnalyzerWrapper, specifying
the KeywordAnalyzer for the fields you indexed as such.
The KeywordAnalyzer c
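A minimal sketch of that wiring (field names and the stemming analyzer are placeholders; KeywordAnalyzer is core in recent Lucene, while on 1.4.3 an equivalent single-token analyzer, as shown in Lucene in Action, plays the same role):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

// The default analyzer handles the tokenized/stemmed fields; Keyword fields
// are routed to KeywordAnalyzer so their text passes through as one token.
PerFieldAnalyzerWrapper wrapper =
    new PerFieldAnalyzerWrapper(new MyStemmingAnalyzer()); // placeholder for the subclassed Analyzer
wrapper.addAnalyzer("id", new KeywordAnalyzer()); // "id": a field indexed as Keyword
Query q = new QueryParser("contents", wrapper).parse(userInput);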
Hi,
I am using Lucene 1.4.3. Some of my fields are indexed as Keywords. I
have also subclassed Analyzer in order to add stemming etc. I am not sure
if the input is tokenized when I am searching on keyword fields; I don't
want it to be. Do I need to have a special case in the overridden method
(Anal
Hi all,
I have a question about memory/fileio settings and the FSDirectory.
The setMaxBufferedDocs and related parameters help a lot already to fully
exploit my RAM when indexing, but since I'm running a fairly small index of
around 4 docs and I'm optimizing it relatively often, I was wonder
You need to make sure that both the indexing and the searching processes use
the same lock directory.
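In Lucene of this vintage the lock files default to java.io.tmpdir, which is local to each machine, so over NFS the two processes can miss each other's locks. A sketch of pointing both JVMs at one shared location (the path is illustrative):

// Must run in BOTH the indexing and the searching JVM, before any index is opened.
System.setProperty("org.apache.lucene.lockDir", "/u01/export/locks"); // a directory both machines see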
-Original Message-
From: Supriya Kumar Shyamal [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 05, 2006 4:16 PM
To: java-user@lucene.apache.org
Subject: FS lock on NFS mounted filesystem for
Hi All,
I got a strange problem during the indexer process running on a Redhat ES4
Linux machine...
java.io.FileNotFoundException: /u01/export/index/books/_2s.fnm (No such
file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFi
Hi All,
I have to develop a prototype of a search/indexation system with the
following characteristics,
1) High volume of data indexation but only with add and delete
functionality (approximately 10 PDF) => scalable architecture HDFS
seems good.
2) Specific analysis chain and a given set of meta-data indexation.
3) Language Recognition
>Isn't that what Query.extractTerms is for? Isn't it
>implemented by all primitive Queries?..
As of last week, yes. I changed the SpanQueries to
implement this method and then refactored the
Highlighter package's QueryTermExtractor to make use
of this (it radically simplified the code in there).
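As a quick illustration of the method under discussion, a caller might gather a query's primitive terms like this (rewriting first, since extractTerms expects a rewritten, primitive query; reader and query are assumed to exist):

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import org.apache.lucene.index.Term;

Set terms = new HashSet();
query.rewrite(reader).extractTerms(terms); // reader: an open IndexReader
for (Iterator it = terms.iterator(); it.hasNext();) {
  Term t = (Term) it.next();
  System.out.println(t.field() + ":" + t.text());
}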