Thanks Mark for the pointer!
-John
On Thu, Dec 18, 2008 at 6:13 PM, Mark Miller wrote:
> No, not a bug; it's certainly the intended behavior (though the name is a
> bit tricky, isn't it? I've actually thought about that in the past myself).
> If you check out the javadoc on Fieldable you'll find:
>
Mark Miller wrote:
TrieRangeQuery has been added to contrib. Super awesome, super
efficient, large-scale sorting.
Sorry. It's way past my bedtime. Large-scale numerical range searching.
Sorting on the brain.
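As I understand it, TrieRangeQuery gets its speed by indexing each number at several precisions, so a range can be matched with a few coarse terms in the middle and finer terms only at the edges, instead of one term per distinct value. The plain-Java sketch below is not the contrib code; the class and method names are made up, and it uses a precision step of one bit to show only the range-splitting idea:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of trie-style range splitting with a 1-bit precision step.
// Hypothetical names; this is not TrieRangeQuery's implementation.
class TrieRangeSketch {

    /** All values whose high bits equal prefix: [prefix << shift, ((prefix + 1) << shift) - 1]. */
    static final class Block {
        final long prefix;
        final int shift;

        Block(long prefix, int shift) {
            this.prefix = prefix;
            this.shift = shift;
        }

        long size() {
            return 1L << shift;
        }

        @Override
        public String toString() {
            long lo = prefix << shift;
            return "[" + lo + ".." + (lo + size() - 1) + "]@shift" + shift;
        }
    }

    /** Splits inclusive [lo, hi] (0 <= lo, hi < 2^62), taking the largest aligned block that fits at each step. */
    static List<Block> split(long lo, long hi) {
        List<Block> blocks = new ArrayList<>();
        while (lo <= hi) {
            int shift = 0;
            // Grow the block while lo stays aligned and the block still ends at or before hi.
            while (shift < 62
                    && (lo & ((1L << (shift + 1)) - 1)) == 0
                    && lo + (1L << (shift + 1)) - 1 <= hi) {
                shift++;
            }
            blocks.add(new Block(lo >>> shift, shift));
            lo += 1L << shift;
        }
        return blocks;
    }

    public static void main(String[] args) {
        // 14 distinct values, but only 6 blocks ("terms") to look up.
        System.out.println(split(1, 14));
    }
}
```

Widening the range mostly adds coarse blocks, so the block count grows with the number of bits rather than with the number of values covered.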
Well, look at the issues and see for yourself :)
It's a subjective call, I think. Here's my take:
There are not going to be too many sweeping changes in the next release.
There are tons of little bug fixes and improvements, but not a lot of
the bullet-point type stuff that you mention in your wish
Ganesh - yahoo wrote:
>
> Optimize will remove the deletes and rearrange the document numbers.
>
> Had you done any deletes before deleting the 1.3 million docs?
>
>
No, that is the crazy part. I haven't done anything to this index since it
was first compiled until I did the deletes. That is
Optimize will remove the deletes and rearrange the document numbers.
Had you done any deletes before deleting the 1.3 million docs?
Regards
Ganesh
- Original Message -
From: "1world1love"
To:
Sent: Friday, December 19, 2008 9:49 AM
Subject: optimize: went from 14488449 to 38449
Ok
Does Lucene 2.9 have real-time search? Any improvements in sorting? Any
facility to store a payload per document (without updating the document)?
Please highlight the important features.
Regards
Ganesh
- Original Message -
From: "Michael McCandless"
To:
Sent: Friday, December 19, 2008 3:
Ok. This is crazy. I have an index with 14,488,449 docs in it. Today I did a
CheckIndex on it and everything looked fine. I made a copy of the index, ran
a delete on about 1.3 million docs and then did an optimize and now my doc
count is 38449.
The index was originally built with 2.3, but I am no
No, not a bug; it's certainly the intended behavior (though the name is a
bit tricky, isn't it? I've actually thought about that in the past
myself). If you check out the javadoc on Fieldable you'll find:
/** Expert:
*
* If set, omit term freq, positions and payloads from postings for
this f
Thanks Mark! I don't think it is documented (at least not in the docs I've read).
Should this be considered a bug, or ...?
Thanks
-John
On Thu, Dec 18, 2008 at 2:05 PM, Mark Miller wrote:
> Drops positions as well.
>
> - Mark
>
>
>
> On Dec 18, 2008, at 4:57 PM, "John Wang" wrote:
>
> Hi:
>> In
Well... there are a couple of threads on java-dev discussing this "now":
http://www.nabble.com/2.9-3.0-plan---Java-1.5-td20972994.html
http://www.nabble.com/2.9,-3.0-and-deprecation-td20099343.html
though they seem to have petered out.
Also we have 29 open issues for 2.9:
https://issues.a
Drops positions as well.
- Mark
On Dec 18, 2008, at 4:57 PM, "John Wang" wrote:
Hi:
In Lucene 2.4, when Field.omitTF() is called, payloads are disabled as
well. Is this intentional? My understanding is that payloads are
independent of
term frequencies.
Thanks
-John
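Mark's answer follows from how postings are laid out: a payload is attached to a term position, so once positions are dropped there is nowhere left to keep the payloads. The toy model below is plain Java, not Lucene's on-disk format, and every name in it is hypothetical; the frequency of 1 for a tf-omitted posting reflects my understanding of how such fields are reported.

```java
import java.util.Arrays;

// Toy model of one postings entry; NOT Lucene's internal data structure.
class PostingSketch {
    final int doc;           // document number
    final int freq;          // term frequency in this document
    final int[] positions;   // one position per occurrence
    final byte[][] payloads; // payloads hang off positions, one per occurrence

    PostingSketch(int doc, int freq, int[] positions, byte[][] payloads) {
        this.doc = doc;
        this.freq = freq;
        this.positions = positions;
        this.payloads = payloads;
    }

    /** What survives when term freqs are omitted: only the doc number (freq reported as 1). */
    PostingSketch omitTf() {
        // Payloads are stored per position, so dropping positions drops payloads too.
        return new PostingSketch(doc, 1, new int[0], new byte[0][]);
    }

    public static void main(String[] args) {
        PostingSketch full = new PostingSketch(7, 2, new int[] {3, 9}, new byte[][] {{1}, {2}});
        PostingSketch slim = full.omitTf();
        System.out.println(slim.doc + " " + slim.freq + " " + Arrays.toString(slim.positions));
    }
}
```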
Hi:
In Lucene 2.4, when Field.omitTF() is called, payloads are disabled as
well. Is this intentional? My understanding is that payloads are independent of
term frequencies.
Thanks
-John
Hi -
I am just curious: what is the approximate target release date
for Lucene 2.9 (currently in beta, in dev)?
-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: j
I would recommend, very strongly, that you don't rely on the doc IDs being
the same in two different indexes. Doc IDs are just incremented by one
for each doc added, but
optimization can change the doc IDs, and is guaranteed to change at
least some of them if there are deletions from your inde
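To see why the IDs shift, here is a small plain-Java model (not Lucene code) of compacting away deleted documents: every surviving document that comes after a deletion moves down. It assumes survivors keep their relative order; the point is only that doc IDs are positions, not stable keys.

```java
import java.util.Arrays;
import java.util.BitSet;

// Models how removing deleted docs renumbers the survivors.
class DocIdRemapSketch {

    /** Returns newId[oldId], or -1 for a deleted doc, assuming survivors keep their order. */
    static int[] remap(int maxDoc, BitSet deleted) {
        int[] newId = new int[maxDoc];
        int next = 0;
        for (int old = 0; old < maxDoc; old++) {
            newId[old] = deleted.get(old) ? -1 : next++;
        }
        return newId;
    }

    public static void main(String[] args) {
        BitSet deleted = new BitSet();
        deleted.set(1);
        deleted.set(3);
        System.out.println(Arrays.toString(remap(6, deleted))); // prints [0, -1, 1, -1, 2, 3]
    }
}
```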
These results are surprising.
I'd expect a single IndexWriter with 2 threads to do better than a
single thread, but in your test two threads are significantly worse
than one.
Is it possible there's a bottleneck outside of Lucene in sourcing the
documents?
How many segments are produced a
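One way to test the sourcing question is to separate reading documents from indexing them with a bounded queue: one producer thread reads, N worker threads index. If throughput barely changes as N grows, that suggests the source is the bottleneck. This is a plain-Java sketch; the "indexing" step is just a counter standing in for calls on a single shared IndexWriter (which, as far as I know, is safe to use from multiple threads), and the string documents are placeholders.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Decouples document sourcing from indexing so each side can be measured separately.
class IndexPipelineSketch {
    private static final String POISON = "\u0000EOF"; // end-of-stream marker

    /** Runs one producer and {@code workers} consumers; returns how many docs were "indexed". */
    static long run(int docs, int workers) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong indexed = new AtomicLong();

        Thread[] consumers = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            consumers[i] = new Thread(() -> {
                try {
                    for (String doc = queue.take(); !doc.equals(POISON); doc = queue.take()) {
                        // Placeholder for the real work, e.g. adding the doc via a shared IndexWriter.
                        indexed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            consumers[i].start();
        }

        long start = System.nanoTime();
        for (int d = 0; d < docs; d++) {
            queue.put("doc-" + d); // stand-in for reading from the real source (DB, files, ...)
        }
        for (int i = 0; i < workers; i++) {
            queue.put(POISON); // one end marker per consumer
        }
        for (Thread t : consumers) {
            t.join();
        }
        System.out.printf("%d docs, %d workers, %.1f ms%n",
                indexed.get(), workers, (System.nanoTime() - start) / 1e6);
        return indexed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        run(100_000, 1);
        run(100_000, 2);
    }
}
```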
I would think that if the place names are English, which those in Boston
would be, then they would be reasonable candidates for soundex and
double metaphone. I am considering an approach where I store SOUNDEX,
refined SOUNDEX and double metaphone (and I'll look into n-grams as well), and
search against ea
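For the SOUNDEX part of that plan, here is the classic American Soundex algorithm in plain Java, to make clear what the key encodes: the first letter plus up to three consonant-class digits. This is an illustrative sketch; in practice an existing implementation such as Apache Commons Codec's `Soundex`, `RefinedSoundex` and `DoubleMetaphone` classes is probably the better choice.

```java
import java.util.Locale;

// American Soundex, written out for illustration.
class SoundexSketch {
    // Digit codes for A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
    private static final String CODES = "01230120022455012623010202";

    static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char last = CODES.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            if (c == 'H' || c == 'W') {
                continue; // H and W do not separate letters that share a code
            }
            char code = CODES.charAt(c - 'A');
            if (code != '0' && code != last) {
                out.append(code);
            }
            last = code; // a vowel resets 'last', so a repeated code after a vowel is kept
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"Robert", "Rupert", "Tymczak", "Ashcraft"}) {
            System.out.println(w + " -> " + soundex(w));
        }
    }
}
```

Robert and Rupert both encode to R163: that kind of collision is what helps with misspellings, and also what makes the matches coarse.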
On Wednesday 17 December 2008 22:49:08, 1world1love wrote:
> Just an FYI in case anyone runs into something similar.
>
> Essentially, I had indexes that I had been searching from a Java
> stored procedure in Oracle without issue for a while. All of a sudden,
> I started getting the error I alluded
Hi,
I noticed that the doc IDs are the same in both indexes. So if I use a HitCollector
to collect the doc IDs from both Searchers (one per index) and then
intersect them, it should work. Also, retrieving the docs is fast even
when there is a large number of hits.
Of course, I am using somethin
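Intersecting raw doc IDs only works while the two indexes stay perfectly in step, which deletions and optimizes can break. A safer variant, sketched here in plain Java, collects a stored unique key for each hit (a hypothetical application-level key field) and intersects those keys instead; how the keys are gathered inside the HitCollector is left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Intersects two result lists by an application-level unique key rather than by doc ID.
class KeyIntersectionSketch {

    /** Keys present in both result lists; assumes every document stores a unique key. */
    static Set<String> intersect(List<String> keysFromIndexA, List<String> keysFromIndexB) {
        Set<String> common = new HashSet<>(keysFromIndexA);
        common.retainAll(new HashSet<>(keysFromIndexB));
        return common;
    }

    public static void main(String[] args) {
        System.out.println(intersect(Arrays.asList("a1", "b2", "c3"), Arrays.asList("c3", "a1", "z9")));
    }
}
```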
that makes it much faster (<100ms after the first run). Thanks a lot.
Also, the index will be updated often throughout the day; will keeping the
IndexReader open pick up updates to the index?
Sincerely,
Chris Salem
Development Team
Main Sequence Technologies, Inc.
PCRecruiter.net - PCRecruite
Thanks. Yep, the code is very easy. However, it takes about 3 minutes to
complete the merge.
Looks like I will need to do out-of-band merging of indexes once
they are closed (I'm planning to store about 50 million entries in each index
partition).
However, as the data is being indexed, is there any oth
A lot depends upon what you mean by "search across all fields".
For single-term queries, that's pretty straightforward, but for, say,
(a AND b), what does it mean to "search across all fields"? Should
you get a hit if a appears only in field1 and b appears only in field2?
Or should you only get a
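The two readings of (a AND b) differ on exactly the case described above. A plain-Java sketch over a toy document, modeled as field name to set of already-analyzed terms (not Lucene's data structures), makes the difference concrete:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy document model: field name -> set of (already analyzed) terms.
class AllFieldsSemanticsSketch {

    /** Reading 1: each term occurs in at least one field; different terms may hit different fields. */
    static boolean eachTermInSomeField(Map<String, Set<String>> doc, List<String> terms) {
        for (String term : terms) {
            boolean found = false;
            for (Set<String> fieldTerms : doc.values()) {
                if (fieldTerms.contains(term)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    /** Reading 2: a single field contains every term. */
    static boolean allTermsInOneField(Map<String, Set<String>> doc, List<String> terms) {
        for (Set<String> fieldTerms : doc.values()) {
            if (fieldTerms.containsAll(terms)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("field1", new HashSet<>(Arrays.asList("a")));
        doc.put("field2", new HashSet<>(Arrays.asList("b")));
        List<String> query = Arrays.asList("a", "b"); // (a AND b)
        System.out.println(eachTermInSomeField(doc, query)); // true
        System.out.println(allTermsInOneField(doc, query));  // false
    }
}
```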
I would do query expansion:
- receive the query, parse it the way you want (e.g. use QueryParser)
- then expand your query along the various fields
If you use different analyzers per field (e.g. soundex), you'll have to
adjust things when building each term query.
paul
On 18 Dec 08, at 16:0
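A minimal sketch of the expansion step, assuming the user's query has already been parsed into plain terms (no phrases, no per-field analysis, no characters that need escaping). It renders query-syntax text only to show the shape: every term must match in at least one field. In real code it would probably be more natural to build the equivalent BooleanQuery directly, or to try Lucene's MultiFieldQueryParser. The field names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Expands each query term across several fields and renders QueryParser-style syntax.
class QueryExpansionSketch {

    /** Produces (f1:t1 OR f2:t1) AND (f1:t2 OR f2:t2) ... for the given terms and fields. */
    static String expand(List<String> terms, List<String> fields) {
        StringJoiner all = new StringJoiner(" AND ");
        for (String term : terms) {
            StringJoiner any = new StringJoiner(" OR ", "(", ")");
            for (String field : fields) {
                any.add(field + ":" + term);
            }
            all.add(any.toString());
        }
        return all.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand(Arrays.asList("a", "b"), Arrays.asList("title", "body")));
        // prints (title:a OR body:a) AND (title:b OR body:b)
    }
}
```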
Hi,
I'm a beginner with Lucene. I'm working on a Lucene PoC project at the Generali
France company.
I have 40 fields (at most ten words per field) in my index of about 6
million documents.
I need to search for a word in all fields.
Must I create a "content" field containing all the information from the other
fields?
Thank you, it works very well.
Regards
Ariel
On Thu, Dec 18, 2008 at 8:22 AM, Erick Erickson wrote:
> Use the setSort that takes an array of SortField objects...
>
> On Thu, Dec 18, 2008 at 8:11 AM, Ariel wrote:
>
> > What I am doing is this:
> >
> >Sort sort = new Sort();
> >
Use the setSort that takes an array of SortField objects...
On Thu, Dec 18, 2008 at 8:11 AM, Ariel wrote:
> What I am doing is this:
>
>Sort sort = new Sort();
>sort.setSort("year", true);
>hits = searcher.search(pquery,sort);
>
>
> How must I write my code to sort
Hi,
I think this should do it...
SortField dateSortField = new SortField("year", false); // the second argument reverses the sort direction if set to true
SortField scoreSortField = new SortField(null, SortField.SCORE, false); // value of null for field, since 'score' is not reall
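In Lucene 2.4 terms, I believe the idea is to hand Sort an array of SortField objects, for example `new Sort(new SortField[] { new SortField("year", SortField.INT, true), SortField.FIELD_SCORE })`, which sorts by year descending and breaks ties by relevance (SortField.INT assumes the year is indexed as a parsable integer). The plain-Java comparator below only models that ordering; it is not Lucene code, and `Hit` is a hypothetical type. It shows that the score key only decides order within equal years.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of a two-level sort: year descending, then score descending.
class MultiSortSketch {
    static final class Hit {
        final int doc;
        final int year;
        final float score;

        Hit(int doc, int year, float score) {
            this.doc = doc;
            this.year = year;
            this.score = score;
        }
    }

    static final Comparator<Hit> YEAR_THEN_SCORE =
            Comparator.comparingInt((Hit h) -> h.year).reversed()
                    .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed());

    /** Returns the doc numbers in sorted order, leaving the input list untouched. */
    static List<Integer> order(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(YEAR_THEN_SCORE);
        List<Integer> docs = new ArrayList<>();
        for (Hit h : copy) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 2007, 0.9f), new Hit(1, 2008, 0.2f),
                new Hit(2, 2008, 0.7f), new Hit(3, 2007, 0.95f));
        System.out.println(order(hits)); // prints [2, 1, 3, 0]
    }
}
```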
You will be stunned at how easy it is. The merging code should be
a dozen lines (and that only if you are merging 6 or so indexes)
See IndexWriter.addIndexes or
IndexWriter.addIndexesNoOptimize
Best
Erick
On Thu, Dec 18, 2008 at 5:03 AM, Preetham Kajekar wrote:
> Hi,
> I tried out a single
What I am doing is this:
Sort sort = new Sort();
sort.setSort("year", true);
hits = searcher.search(pquery,sort);
How must I write my code to sort first by date and then by score?
Greetings
Ariel
On Thu, Dec 18, 2008 at 4:48 AM, Ian Lea wrote:
> Lucene let
Somehow I seem to have missed (and can't find) your original mail, but
it seems like you're asking about using double metaphone for place
names. We've done this on our site (http://boston.povo.com) for street
and place names, and I can't say we've been happy with the results.
We're toying with ngr
On Dec 17, 2008, at 11:56 AM, Yonik Seeley wrote:
On Wed, Dec 17, 2008 at 10:32 AM, Patrick Johnstone
wrote:
As I said in the original email, my issue is that I don't
think Lucene is returning the fields in the original order
anymore.
Hmmm, you're right.
http://wiki.apache.org/jakarta-luce
I don't know of any. I'd google for "Persian Lucene" or "Farsi
Lucene". When I did that, I saw some researchers who had done some
experiments with Lucene and Persian.
On Dec 17, 2008, at 8:12 AM, Ian Vink wrote:
I have ported the Java version of the Arabic analyzer recently
committed to
Lu
I am planning to keep indexing and searching in a single process and expose
the search functionality as a service.
In any case, I want the deletion to be done by the reader, so that it could be
reflected immediately in search. If it is done by writer, then i need to
commit the changes, reopen the se
This was an attempt on addIndexesNoOptimize's part to "respect" the
maxMergeDocs (which prevents large segments from being merged) you had
set on IndexWriter.
However, the check was too pedantic, and was removed as of 2.4, under
this issue:
https://issues.apache.org/jira/browse/LUCENE
Hi,
I tried out a single IndexWriter used by two threads to index different
fields. It is slower than using two separate IndexWriters. These are my
findings:
All Fields (9) using 1 IndexWriter 1 Thread - 38,000 objects per sec
5 Fields using 1 IndexWriter 1 Thread - 62,000 objects per sec
A
Well, if the indexing is happening in a separate process then that
will have locked the index and you won't be able to delete by reader
in your search process. I'd suggest passing the deletions to the
indexer process. In my experience everything works more smoothly when all
index modifications happen
Hi
Are all the queries broadly similar or are the later ones more
complex? What happens if you switch the order and run the later
queries first? Any complications like sorting? Has your JVM got
enough memory?
There is no IndexSearcher cache that you can increase.
--
Ian.
On Wed, Dec 17, 20
Lucene lets you sort by multiple fields, including score. See the
javadocs for Sort and SortField, specifically SortField.SCORE.
--
Ian.
On Wed, Dec 17, 2008 at 8:15 PM, Ariel wrote:
> Hi:
> This solution has a problem:
> the results are sorted by the year criterion, but I need that after sort