What does your code look like? If you are using Hits, what does hits.length()
give you?
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Hasan Diwan <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Friday, May 16, 2008 1:48:56
On 15/05/2008, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> You can get all matches via Hits if you want, it's just that Lucene will
> need to do some re-querying under the hood. Why don't you use the
> search() method that takes HitCollector to get all docs - I thought
> that's what you were trying to use in the first place.
Hi,
You can get all matches via Hits if you want, it's just that Lucene will need
to do some re-querying under the hood. Why don't you use the search()
method that takes HitCollector to get all docs - I thought that's what you were
trying to use in the first place.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
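(A minimal sketch of the HitCollector approach described above, written against
the Lucene 2.x API of the day; the index path, field name and query term are
placeholders, not taken from this thread.)

import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class CollectAllDocs {
    public static void main(String[] args) throws Exception {
        // Open the index; path, field and term below are illustrative only.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        final List<Integer> docIds = new ArrayList<Integer>();

        // collect() is called once for every matching (non-zero-scoring) doc,
        // so there is no re-querying as there is when iterating far into Hits.
        searcher.search(new TermQuery(new Term("contents", "lucene")),
                        new HitCollector() {
                            public void collect(int doc, float score) {
                                docIds.add(doc);
                            }
                        });

        System.out.println(docIds.size() + " matching documents");
        searcher.close();
    }
}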
Otis,
On 15/05/2008, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> That method should let you have *all* non-zero scoring docs if filter ==
> null.
> If that's not the case then I think that's a bug. If you can come up with a
> unit test that shows the bug, please post it in JIRA.
From the
Hi Hasan,
That method should let you have *all* non-zero scoring docs if filter == null.
If that's not the case then I think that's a bug. If you can come up with a
unit test that shows the bug, please post it in JIRA.
Thanks,
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
pong.
Is that the optimal use of FieldSelector? What happens if you remove it
from that HitCollector.collect method?
It looks like you are creating a new FieldSelector object for each hit found in
each search thread.
If it's not that, is the index optimized?
If not, does optimizing it make a difference?
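(A rough sketch of the reuse being hinted at here: build one FieldSelector up
front and share it across collect() calls instead of allocating one per hit.
The "id" field name is a placeholder; note that loading stored fields inside
collect() is itself expensive, so this only removes the per-hit allocation.)

import java.io.IOException;
import java.util.Collections;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.FieldSelector;
import org.apache.lucene.document.SetBasedFieldSelector;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.HitCollector;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;

public class ReusedSelectorSearch {
    // One shared selector for the whole search, not one per hit.
    private static final FieldSelector ID_ONLY = new SetBasedFieldSelector(
            Collections.singleton("id"), Collections.<String>emptySet());

    static void searchIdsOnly(final IndexSearcher searcher, Query query)
            throws IOException {
        final IndexReader reader = searcher.getIndexReader();
        searcher.search(query, new HitCollector() {
            public void collect(int doc, float score) {
                try {
                    Document d = reader.document(doc, ID_ONLY);
                    // ... use d.get("id") ...
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }
        });
    }
}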
It would appear that to see all results (including low scoring) I need
to pass a different Filter to Searcher.search[1]. If filter is null,
only the highest-scoring results are returned. How do I change the
threshold for hits returned?
--
Cheers,
Hasan Diwan <[EMAIL PROTECTED]>
1.
http://lucene.
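(For reference, a sketch of the overload in question being used with a null
filter; it assumes an open Searcher named searcher, a Query named query and a
HitCollector named collector along the lines of the earlier snippet.)

searcher.search(query, null, collector);  // every non-zero-scoring doc reaches collect()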
Not directly Lucene related, but I'm out of ideas and I'm not a Russian
speaker...
I'm extracting text from RTF to pump into Lucene. I'm using the original
RTFEditorKit() code shown in LIA, p. 252 (actually, it's Nutch's RTFParser).
I have an RTF document which starts with
---
{\rtf1\ansi\ansic
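(For reference, the rough shape of the RTFEditorKit approach mentioned above;
this is a sketch, not the exact LIA/Nutch code. Character-set handling for
non-ANSI code pages, which is presumably where the Russian text comes in, is
the part this kit tends to handle poorly.)

import java.io.FileInputStream;
import java.io.InputStream;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

public class RtfText {
    // Parse the RTF stream and return its plain text, ready to be indexed.
    static String extract(InputStream in) throws Exception {
        RTFEditorKit kit = new RTFEditorKit();
        DefaultStyledDocument doc = new DefaultStyledDocument();
        kit.read(in, doc, 0);
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new FileInputStream(args[0]);
        try {
            System.out.println(extract(in));
        } finally {
            in.close();
        }
    }
}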
Thanks Karl.
My apologies for the duplicate mail sent.
>>Is Lucene your primary data store?
Almost, as most properties of my items can be queried. I would like to
be able to "not" store these fields, though, but the fact that I need to
update my documents (delete + create) forces me to store th
No, I did not, because I'm not performing a search with a leading
wildcard, nor am I intending to allow that behavior. But what I do want
is to be able to search on a word that starts with a *, by escaping it,
because sadly our data contains such things.
Matt
Karl Wettin wrote:
On 15 May 2008, at 18:33, Matthew Hall wrote:
On 15 May 2008, at 18:33, Matthew Hall wrote:
12:23:05,602 INFO [STDOUT]
org.apache.lucene.queryParser.ParseException: Cannot parse
'\*ache*': '*' not allowed as first character in PrefixQuery
12:23:05,602 INFO [STDOUT] Failure in
QS_MarkerSearch.searchMarkerNomen
12:23:05,602 ERROR [STDER
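(A sketch, not from the original mail: when the leading '*' is literal data
rather than a wildcard, one way around the ParseException above is to skip
QueryParser for that term and build the query directly. It assumes an open
Searcher named searcher and that the field, called "name" here only as a
placeholder, was indexed with an analyzer that preserves the '*' characters.)

Query literal = new TermQuery(new Term("name", "*ache*"));  // no parsing, no wildcard semantics
Hits hits = searcher.search(literal);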
Have you tried using Carrot2 with Lucene? They work quite well in tandem!
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
> From: Supheakmungkol SARIN <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Wednesday, May 14, 2008 11:23:45 PM
Hello,
We have been using Lucene for a while, and we are happy with it. Now we want
to optimize some space.
We parse versions of files and want to keep track of their history; to know
which version is the newest, we set a flag on it (field newest=true).
So when a new version comes along:
- we w
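(A rough sketch of one way to flip that flag, using IndexWriter.updateDocument
from Lucene 2.1+. The field names "newest" and "key" and the document shapes
are illustrative, not from the original mail; and as the reply below points
out, rebuilding the previous document like this only works if its fields were
stored or the original source can be re-parsed.)

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class VersionFlagger {
    // "key" uniquely identifies one version of one file (e.g. path#version).
    static void addNewVersion(IndexWriter writer,
                              Document previousNewest, String previousKey,
                              Document incoming, String incomingKey)
            throws Exception {
        // 1. Re-index the old newest version with the flag turned off.
        previousNewest.removeFields("newest");
        previousNewest.add(new Field("newest", "false",
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        writer.updateDocument(new Term("key", previousKey), previousNewest);

        // 2. Add the incoming version flagged as the newest one.
        incoming.add(new Field("newest", "true",
                Field.Store.NO, Field.Index.UN_TOKENIZED));
        incoming.add(new Field("key", incomingKey,
                Field.Store.YES, Field.Index.UN_TOKENIZED));
        writer.addDocument(incoming);
    }
}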
On 15 May 2008, at 19:15, Jean-Claude Antonio wrote:
This works perfectly, but for it we need to have a content field, as in
new Field("content", content, Field.Store.YES, Field.Index.TOKENIZED),
to be able to update the current document, which stores the content.
We would rather not store the content, as the
On 14 May 2008, at 17:30, Erick Erickson wrote:
Another
possibility would be to introduce marker tokens in your field, index
something like "$ member of technical staff $" and then, when
querying for exact matches, *add* the $ tokens to the beginning
and end of the query.
Just a note, I've hit pro
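(A sketch of the marker-token idea quoted above. It assumes an analyzer such
as WhitespaceAnalyzer that keeps the "$" tokens and the stop words, and that
the query-time tokenization below mirrors what was done at index time; the
field name and sentinel character are placeholders.)

import org.apache.lucene.document.Field;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.PhraseQuery;

public class MarkerTokens {
    // Index time: wrap the value in sentinel tokens.
    static Field markedField(String name, String value) {
        return new Field(name, "$ " + value + " $",
                Field.Store.YES, Field.Index.TOKENIZED);
    }

    // Query time: for an exact whole-field match, put the sentinels back.
    static PhraseQuery exactMatch(String name, String value) {
        PhraseQuery q = new PhraseQuery();
        q.add(new Term(name, "$"));
        for (String token : value.split("\\s+")) {
            q.add(new Term(name, token));
        }
        q.add(new Term(name, "$"));
        return q;
    }
}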
Greetings,
I'm searching, using Lucene, against a data set that contains entries
such as the following:
*ache*
*aChe*
and so forth. Sadly, this part of the dataset is imported via an
external client, so we have no real way of controlling how they format it.
Now, to make matters a bit mor
On 15 May 2008, at 09:46, Michael McCandless wrote:
Mark Miller wrote:
It's been months since I've tested this sort of thing, but from what I
remember there is a point where, as you go higher, performance starts to
very slowly drop. The point was lower than I'd expect, and definitely
created what looked like sweet spot settings.
> The problem I am having is that some of them have multiple columns and
> multiple word boxes. Does the xpdf patch extract different columns and word boxes?
It tells you where each word is. Columns you have to do for yourself.
Bill
> > In UpLib, I use xpdf-3.02pl2 with a patch which gives me positi
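(To illustrate "columns you have to do for yourself": a deliberately naive
sketch that groups word boxes into columns by their left x coordinate. The
Word class and the gap threshold are hypothetical, not part of xpdf or of the
patch being discussed.)

import java.util.ArrayList;
import java.util.List;

public class ColumnGrouper {
    // Hypothetical word box: the text plus its left x coordinate on the page.
    static class Word {
        final String text;
        final double x;
        Word(String text, double x) { this.text = text; this.x = x; }
    }

    // Words whose left edges lie within "gap" points of the previous word's
    // left edge are treated as belonging to the same column.
    static List<List<Word>> groupIntoColumns(List<Word> wordsSortedByX, double gap) {
        List<List<Word>> columns = new ArrayList<List<Word>>();
        List<Word> current = new ArrayList<Word>();
        double lastX = Double.NEGATIVE_INFINITY;
        for (Word w : wordsSortedByX) {
            if (!current.isEmpty() && w.x - lastX > gap) {
                columns.add(current);
                current = new ArrayList<Word>();
            }
            current.add(w);
            lastX = w.x;
        }
        if (!current.isEmpty()) {
            columns.add(current);
        }
        return columns;
    }
}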
I haven't participated in TREC for the past 2 years, so I am wondering which
TREC track you were comparing your results against. The last time I checked,
Lucene's score for the Terabyte track wasn't wonderful, but it was still
pretty decent.
Bear in mind that Lucene uses the plain old vanilla TF-IDF
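(For reference, the "plain old vanilla TF-IDF" here is roughly Lucene's
DefaultSimilarity scoring formula, written here from memory, so treat it as a
sketch:

score(q,d) = coord(q,d) \cdot queryNorm(q) \cdot \sum_{t \in q} \big( tf(t,d) \cdot idf(t)^2 \cdot boost(t) \cdot norm(t,d) \big)

where, in DefaultSimilarity, tf(t,d) = \sqrt{freq} and
idf(t) = 1 + \log(numDocs / (docFreq + 1)).)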
Hello Bill,
The problem I am having is that some of them have multiple columns and
multiple word boxes. Does the xpdf patch extract different columns and word boxes?
Best,
-C.B.
On Wed, May 14, 2008 at 6:35 PM, Bill Janssen <[EMAIL PROTECTED]> wrote:
> > > the unix program pdf2text can convert keep
Mark Miller wrote:
It's been months since I've tested this sort of thing, but from what I
remember there is a point where, as you go higher, performance starts to
very slowly drop. The point was lower than I'd expect, and definitely
created what looked like sweet spot settings.
This was my recollection