bhecht wrote:
I want to be able to split tokens by giving a list of substring words.
So I can give a list of subwords like: "strasse", "gasse",
and the token "mainstrasse" or "maingasse" will be split into 2 tokens: "main"
and "strasse" (or "gasse").
IMBEMBA, PASQUALINO: A Splitter for German Compound Words. F
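The splitting described above could be sketched in plain Java as follows. This is only a minimal illustration, not the method from the cited paper and not a Lucene TokenFilter: the class name `CompoundSplitter` and the method `split` are hypothetical, and it assumes the subword always appears as a suffix of the token.

```java
import java.util.ArrayList;
import java.util.List;

public class CompoundSplitter {

    /**
     * Splits a token into [prefix, subword] when it ends with one of the
     * given subwords (compared case-insensitively); otherwise returns the
     * token unchanged as a single-element list.
     */
    public static List<String> split(String token, List<String> subwords) {
        List<String> parts = new ArrayList<>();
        for (String subword : subwords) {
            int cut = token.length() - subword.length();
            // cut > 0 requires a non-empty prefix, so "strasse" alone is not split.
            if (cut > 0 && token.regionMatches(true, cut, subword, 0, subword.length())) {
                parts.add(token.substring(0, cut));
                parts.add(token.substring(cut));
                return parts;
            }
        }
        parts.add(token);
        return parts;
    }

    public static void main(String[] args) {
        List<String> subwords = List.of("strasse", "gasse");
        System.out.println(split("mainstrasse", subwords));
        System.out.println(split("maingasse", subwords));
        System.out.println(split("main", subwords));
    }
}
```

The first matching subword in the list wins, so the order of the list matters when one subword is a suffix of another.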
On Sunday 20 May 2007 02:49, Peter Bloem wrote:
> Ah, now we're getting somewhere. So I run the first query on the
> collection index, get a set of collection id's from that. But how do I
> use them in the second query on the document index? It should be easy
> enough to retrieve all documents i
Greetings All,
We would like to introduce our Java, Lucene-based command-line search tool,
Minalyzer Lite.
Minalyzer Lite ships with an indexing executable, which can index data from
the file system, databases, crawled web sites, or an ARC file (output of
the Heritrix crawler). The end user does
See Paul's e-mail, he's talking about a place I haven't been in Lucene
yet.
Other than that, see below
On 5/19/07, Peter Bloem <[EMAIL PROTECTED]> wrote:
Ah, now we're getting somewhere. So I run the first query on the
collection index, get a set of collection id's from that. But how do I
Thank you for the clarification.
I assume that hits are usually returned in ranking order, which makes sense in
terms of how one usually wants to display the results. In terms of access speed,
this is not the desired order. Although sorting the array is not a big deal,
it might be interesting to think about
Thanks for your reply. This is getting me much deeper into the uncharted
territories of Lucene, especially the area of FieldCaches, but it's also
piqued my curiosity. Most of what I've been able to find are discussions
by people that are already using FieldCache, rather than explanations of
wha
My comments on storing document id's are perhaps based on a misguided
view of Lucene, but it's worth investigating. I figured since there's
only one document per id in the document index, instead of executing one
query with n OR clauses, you could execute n queries with a single docId
to get al
On Sunday 20 May 2007 19:52, Peter Bloem wrote:
> Thanks for your reply. This is getting me much deeper into the uncharted
> territories of Lucene, especially the area of FieldCaches, but it's also
> piqued my curiosity. Most of what I've been able to find are discussions
> by people that are al
Thanks, the link was helpful. I'll let you know if I find anything.
Thanks for all the replies to this.
Steve
Doron Cohen wrote:
Stephen Gray wrote:
Thanks. If the extra memory allocated is native memory I don't think
jconsole includes it in "non-heap" as it doesn't show this as
increasin
I'm constructing a search with some required terms and some optional
terms in the query. According to some earlier posts that looks like
"+(A B) C D E" in query syntax for required terms A and B and optional
terms C D and E. In other words, Lucene considers all documents that
have both A and
I like to think of it like this:
Each doc is going to get a score -- if the score is positive the doc
will be a hit, if the score is 0 the doc will not be a hit.
If a boolean clause is Occur.Must and it is not found, the score will be
dropped to 0 no matter what (if found, the score is obviou
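The mental model above can be written down as a toy sketch in plain Java. This is not Lucene's actual scoring code: `BooleanScoreModel`, `Clause`, and the per-clause weights are hypothetical stand-ins, and only required (Occur.MUST-like) and optional clauses are modelled.

```java
import java.util.List;
import java.util.Set;

public class BooleanScoreModel {

    /** Hypothetical stand-in for a clause: a term, a "required" flag, and a made-up weight. */
    public static final class Clause {
        final String term;
        final boolean must;
        final double weight;

        public Clause(String term, boolean must, double weight) {
            this.term = term;
            this.must = must;
            this.weight = weight;
        }
    }

    /**
     * Returns a toy score for a document, represented as the set of terms it
     * contains. A score of 0 means "not a hit"; a positive score means "hit".
     */
    public static double score(Set<String> docTerms, List<Clause> clauses) {
        double total = 0.0;
        for (Clause clause : clauses) {
            boolean found = docTerms.contains(clause.term);
            if (clause.must && !found) {
                return 0.0; // a missing required clause drops the score to 0
            }
            if (found) {
                total += clause.weight; // matched clauses add to the score
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Clause> query = List.of(
                new Clause("A", true, 1.0),   // required
                new Clause("C", false, 1.0)); // optional
        System.out.println(score(Set.of("A"), query));
        System.out.println(score(Set.of("A", "C"), query));
        System.out.println(score(Set.of("C"), query));
    }
}
```

In this model a document that contains only the optional term is not a hit, while a document that contains the required term ranks higher when it also contains the optional one.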
Have you tried the static Sort.INDEXORDER sort object in Lucene 2.1?
Erick
On 5/20/07, Andreas Guther <[EMAIL PROTECTED]> wrote:
Thank you for the clarification.
I assume that hits are usually returned in ranking order, which makes sense in
terms of how one usually wants to display the results. In term
It does not seem quick.
http://demo1.minalyzer.com/minalyzerlite/search4.php?q=test&offset=0
Results 1 - 15 of 16 for test.(1.586 seconds)
2007/5/20, Saurabh Dani <[EMAIL PROTECTED]>:
Greetings All,
We would like to introduce our Java, Lucene-based command-line search tool,
Minalyzer Lite.