Hi,
The details are as follows:
Solaris version: Solaris 10 U5 and U6
For the Java setup, I have tried:
Sun JDK 1.5 (32- and 64-bit)
Sun JDK 1.6 (32- and 64-bit)
Heap space: 2G for 32-bit and 4G for 64-bit (the same value set for
both -Xms and -Xmx)
Disk: Tried with ZFS (U6) and UFS (U5)
I reduced the
Hello all,
I want to know what algorithm Lucene uses for indexing documents.
Can I use Lucene in my application with my own indexing algorithm?
regards,
Nitin Gopi
Your solution (b) is better than implementing your own paging.
Do the search for every page, collect the top (pageno * count) results, discard
the first ((pageno - 1) * count), and display the last count results to the user. This is
fast and efficient.
Regards
Ganesh
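The slicing arithmetic Ganesh describes can be sketched in plain Java. This is a minimal illustration with a stand-in result list; the class and method names (PagingDemo, page) are my own, and in real Lucene code the list would be the top (pageNo * count) hits returned by the searcher:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PagingDemo {
    // Return the slice of results for the given 1-based page number.
    // With Lucene you would fetch the top (pageNo * count) hits and keep
    // only the last `count` of them, as described above.
    static <T> List<T> page(List<T> topResults, int pageNo, int count) {
        int from = (pageNo - 1) * count;                     // hits to discard
        int to = Math.min(pageNo * count, topResults.size());
        if (from >= to) return Collections.emptyList();      // page past the end
        return topResults.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> top = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(page(top, 2, 2)); // [c, d]
        System.out.println(page(top, 3, 2)); // [e]
    }
}
```

Re-running the search per page looks wasteful, but collecting the top N hits is cheap compared to caching result sets server-side.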
So, I may well be missing something here, but: I use
IndexSearcher.search(someQuery, null, count, new
Sort());
to get an instance of TopFieldDocs (the "Hits" class is deprecated). So far, all
fine; I get a bunch of documents. Now, what is the Lucene-best-practice for
getting the *next* batch
Or:
// store and index this field to allow original field content retrieval and
search against it
myDocument.add(new Field("contents", theFullDocumentText, Field.Store.COMPRESS,
Field.Index.ANALYZED));
Otis--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
Thanks to Erick, Matthew, and Uwe -- that does help, a lot. E.g., one bit of
code I had (mostly copied) now makes more sense:
// add this field, to allow retrieving the full-text:
myDocument.add(new Field("contents", theFullDocumentText, Field.Store.COMPRESS,
Field.Index.NO));
// add this fiel
Grant Ingersoll-6 wrote:
>
> I presume they are both now slower, right? Otherwise you wouldn't
> mind the speedup on the bigger one. Hits did caching and prefetched
> things, which has its tradeoffs. Can you describe how you were
> measuring the queries? How many results were you get
AlexElba wrote:
>
> Hello,
> I have a project which I am trying to switch from Lucene 2.3.2 to 2.4, and I am
> getting some strange scores
>
> Before my code was:
>
> Hits hits= searcher.search(query);
> Float score = hits.score(1);
>
> and the scores from hits were from 0-1; 1 was a 100% match
>
> I chan
Thanks very much for the help!
-Original Message-
From: Robert Muir [mailto:rcm...@gmail.com]
Sent: Tuesday, February 17, 2009 9:48 PM
To: java-user@lucene.apache.org
Subject: Re: Hebrew and Hindi analyzers
Hey, I've played around with trying to get towards a reasonable GPL Hebrew
analyzer f
On Wed, Feb 18, 2009 at 3:26 AM, wrote:
> Due to a requirement, we need to construct a Lucene document with tens of
> thousands of Fields. Did anyone try this? What's the performance penalty
> compared with one single field to store all tokens for both indexing
> and searching?
It's doable.
Search
Fuzzy search tends to be super heavy on CPU because of the Levenshtein
distance algorithm. We use it for spell correction on a small (60MB) index, and our
QPS suffers as a result.
There was recently a discussion of a new fuzzy algorithm:
https://issues.apache.org/jira/browse/LUCENE-1513?page=com.atlassian
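For reference, the cost being discussed is the classic dynamic-programming edit distance, which is O(m * n) per term pair and, for a plain fuzzy query, is run against every term in the index. A minimal pure-Java sketch (the class name Levenshtein is my own):

```java
public class Levenshtein {
    // Two-row dynamic-programming edit distance: O(m * n) time, O(n) space.
    // This per-term cost is what makes fuzzy search CPU-heavy.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int subst = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                curr[j] = Math.min(subst, Math.min(prev[j] + 1, curr[j - 1] + 1));
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```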
You could always sort by EVENTID, that way at least
you'd have all the events for a particular ID together
in your results. You'd have to post-filter the results to
determine whether all the necessary descriptions were
present. But I don't think this works all that well because,
as you pointed out,
Could you give some configuration details:
- Solaris version
- Java VM version, heap size, and any other flags
- disk setup
You should also consider using huge pages (see
http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html)
I will also be posting performance gains using
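For what it's worth, the huge-pages setup referenced above is OS-specific; the following is a rough Linux sketch (the page count and heap sizes are placeholders, not recommendations), combining the kernel reservation with the HotSpot large-pages flag:

```sh
# Reserve huge pages (Linux; the value is the number of huge pages, e.g. 2MB each)
echo 2048 > /proc/sys/vm/nr_hugepages

# Start the JVM with large-page support enabled
java -XX:+UseLargePages -Xms4g -Xmx4g MyApp
```

On Solaris the JVM can use large pages without this kind of explicit reservation, so mileage varies by platform.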
Hi
I have a requirement in my search project that I do not know if it is possible
to do easily:
If an exact match occurs (using PhraseQuery, for example), I must add to the
boosting of this field the value of another field of the document (for example,
price), but if a partial match occurs I must
You can use the explain() method or you can use the Highlighter, but
both aren't perfect in this regard. You can also look into using
SpanQueries, which give you positional information about where matches
take place. This would require you switching how you generate queries.
There is als
Hi Lucene Users,
For those who don't already know, I will be offering a two day Lucene
Boot Camp training at ApacheCon Europe on March 23 and 24. The two
day class covers a lot of detail on how to use Lucene to build search
applications, including the basics of searching, indexing and
an
Have you tried NGram SpellChecker + query expansion? This is quite similar to
your proposal; you have your priority queue in the SpellChecker
- Original Message
> From: mark harwood
> To: java-user@lucene.apache.org
> Sent: Wednesday, 18 February, 2009 11:54:18
> Subject: Re: Lucene sear
Dear Lucene community,
I am playing around with Lucene right now and have run into a very bad problem.
Given environment:
a signal source gives signals with event IDs and event descriptions,
for example EVENTID=1 and EVENTDESCRIPTION="STARTING EVENT".
Those events can be running very long (e.g. one
I'm still not clear why the built-in phrase query syntax won't work. If I
index the following terms (erick, erickson, thinks, small, thoughts)
in a single field, then searching for "erick erickson" (as a phrase query,
i.e. with double quotes when sent through a query parser or constructing
a Phrase
The method suggested would make things faster, but I doubt whether the gain
would be substantial on processors with slower clock speeds. Keeping in
mind that most processors are going multi-core, it would make sense to
multi-thread the scan.
Any remarks are welcome!
Varun Dhussa
Product Architect
Thank you Erick.
I need first to index phrases, the built-in phrase processing (with double
quotes) comes in the search step.
Is there any difference between:
1) start by indexing phrases and then make a phrase search
2) index terms and then search for phrases
To
I don't understand your question. Metadata about what? The
Fields in the document? The number of terms in a field?
The most frequent word in the index? in the Document?
If you elucidated the problem you're trying to solve you'd
probably get better answers
Best
Erick
On Wed, Feb 18, 2009 at 7
Have you tried the built-in phrase processing with double quotes? e.g.
"this is a phrase"?
See the Term section at
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html
Best
Erick
On Wed, Feb 18, 2009 at 5:57 AM, Nada Mimouni <
mimo...@tk.informatik.tu-darmstadt.de> wrote:
>
>
> Hello ever
Is there a way I can ask lucene what metadata elements it knows of and
is storing in its index?
Thanks - Tod
If not for merging, I believe indexing is simply linear.
Merging adds only a logarithmic (in total index size) cost.
Using as large an IndexWriter RAM buffer as you can will minimize the
amount of merging. (Also increasing mergeFactor, or decreasing
maxMergeMB/Docs, but these will impact s
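As an illustration of the tuning described above, here is a hedged Lucene 2.x fragment (it requires the Lucene jar, so it is not standalone; method names are per the 2.4 API, and the directory/analyzer variables and the specific values are placeholders, not recommendations):

```java
// Assumes `directory` (a Directory) and `analyzer` (an Analyzer) already exist.
IndexWriter writer = new IndexWriter(directory, analyzer,
        IndexWriter.MaxFieldLength.UNLIMITED);
writer.setRAMBufferSizeMB(256); // larger RAM buffer => fewer flushes, less merging
writer.setMergeFactor(20);      // higher merge factor => merges happen less often
```

The usual trade-off applies: both settings speed up indexing at the cost of search-time performance on an unoptimized index.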
Hello everybody,
I use Lucene to index and search into text documents.
At present, I just index and search for single words. I want to extend this to
phrases (or nGrams).
Could anyone please give me details on how to index phrases and then make a
phrase search?
Thank you very much in advanc
I was having some thoughts recently about speeding up fuzzy search.
The current system does edit distance on all terms A-Z, single-threaded. Prefix
length can reduce the search space and there is a "minimum similarity"
threshold but that's roughly where we are. Multithreading this to make use o
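The multithreading idea amounts to partitioning the term list across worker threads and merging their matches. A minimal pure-Java sketch of that pattern (the class ParallelTermScan and the placeholder matches() predicate are my own; real code would run edit distance over Lucene's term enumeration instead):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class ParallelTermScan {
    // Split the term list into nThreads chunks and scan each chunk in parallel.
    static List<String> scan(List<String> terms, String query, int nThreads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        try {
            int chunk = (terms.size() + nThreads - 1) / nThreads;
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < terms.size(); i += chunk) {
                List<String> slice = terms.subList(i, Math.min(i + chunk, terms.size()));
                futures.add(pool.submit(() -> slice.stream()
                        .filter(t -> matches(t, query))
                        .collect(Collectors.toList())));
            }
            List<String> out = new ArrayList<>();
            for (Future<List<String>> f : futures) out.addAll(f.get()); // preserves order
            return out;
        } finally {
            pool.shutdown();
        }
    }

    // Placeholder predicate: shares a 2-char prefix. A real fuzzy scan would
    // apply the edit-distance / minimum-similarity test here.
    static boolean matches(String term, String query) {
        int p = Math.min(2, Math.min(term.length(), query.length()));
        return term.regionMatches(0, query, 0, p);
    }

    public static void main(String[] args) throws Exception {
        List<String> terms = Arrays.asList("erick", "erickson", "eric", "thinks", "small");
        System.out.println(scan(terms, "erickso", 2)); // [erick, erickson, eric]
    }
}
```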
Hi,
I have had a bad experience when migrating my application from Intel
Xeon based servers to Sun UltraSparc T2 T5120 servers. Lucene fuzzy
search just does not perform. A search which took approximately 500 ms
takes more than 6 seconds to execute.
The index has about 100,000,000 records. S
Hello everybody,
In my research work, I use Lucene to index and search into text documents.
At present, I just index and search for single words. I want to extend this to
phrases (or nGrams).
Could anyone please give me more details on how to do it and also point me to
some useful references o
Hi,
Due to a requirement, we need to construct a Lucene document with tens of
thousands of Fields. Did anyone try this? What's the performance penalty
compared with one single field to store all tokens for both indexing
and searching?
Thanks,
Li
Did you try?
The cost of index merging grows as indexes get bigger.
Try limiting the maximum number of documents per segment by calling setMaxMergeDocs on the
IndexWriter.
-Original Message-
From: 治江 王 [mailto:wangzhijiang...@yahoo.com.cn]
Sent: Monday, February 16, 2009 1:49 PM
To: java-us