hi, this could probably split into two threads but for context let's start
it in a single discussion;
Recently I was looking at the search quality of Lucene - Recall and
Precision, focused at [EMAIL PROTECTED],5,10,20 and, mainly, MAP.
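
For readers less familiar with these measures, here is a minimal Java sketch of precision at a cutoff, recall, average precision and MAP. It assumes binary relevance judgments (a set of relevant doc ids per query) and a ranked list with no duplicate ids. The class and method names are hypothetical and are not the API of LUCENE-836 or of any Lucene package.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Lucene API): binary relevance, no duplicate ids in a ranking.
final class RankMetrics {
    private RankMetrics() {}

    /** P@k: fraction of the top k positions that hold a relevant document. */
    static double precisionAt(int k, List<String> ranked, Set<String> relevant) {
        if (k <= 0) throw new IllegalArgumentException("k must be positive");
        int hits = 0;
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) hits++;
        }
        return (double) hits / k; // a ranking shorter than k counts the gap as non-relevant
    }

    /** Recall: fraction of all relevant documents that appear anywhere in the ranking. */
    static double recall(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        for (String doc : ranked) {
            if (relevant.contains(doc)) hits++;
        }
        return (double) hits / relevant.size();
    }

    /** Average precision: sum of P@rank at each relevant hit, divided by the number of relevant docs. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / relevant.size(); // relevant docs never retrieved contribute 0
    }

    /** MAP: arithmetic mean of the per-query average precision values. */
    static double meanAveragePrecision(double[] perQueryAp) {
        if (perQueryAp.length == 0) return 0.0;
        double sum = 0.0;
        for (double ap : perQueryAp) sum += ap;
        return sum / perQueryAp.length;
    }
}
```

For the ranking [d1, d2, d3, d4] with relevant set {d1, d3, d5}, P@2 is 0.5, recall is 2/3, and AP is (1/1 + 2/3) / 3 = 5/9, about 0.556.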
-- Part 1 --
I found out that quality can be enhanced by mo
Just to throw in a few things:
First off, this is great!
As I am sure you are aware: https://issues.apache.org/jira/browse/LUCENE-836
On Jun 25, 2007, at 3:15 AM, Doron Cohen wrote:
hi, this could probably split into two threads but for context let's start
it in a single discussion;
R
TopDocCollector.topDocs throws ArrayIndexOutOfBoundsException when called twice
---
Key: LUCENE-942
URL: https://issues.apache.org/jira/browse/LUCENE-942
Project: Lucene - Jav
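
The failure mode in the title can be modelled in a few lines. This sketch assumes (the snippet above does not show the code) that topDocs() pops every entry out of the hit priority queue into a fresh array and then reads element 0 for the maximum score whenever totalHits > 0. DrainingCollector is a hypothetical stand-in that uses bare float scores instead of ScoreDoc objects; it is not Lucene's TopDocCollector.

```java
import java.util.PriorityQueue;

// Hypothetical model of the reported bug; not Lucene source.
final class DrainingCollector {
    static final class Result {
        final int totalHits;
        final float[] scores;   // best score first
        final float maxScore;
        Result(int totalHits, float[] scores, float maxScore) {
            this.totalHits = totalHits;
            this.scores = scores;
            this.maxScore = maxScore;
        }
    }

    private final PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap of kept scores
    private final int numHits;
    private int totalHits = 0;

    DrainingCollector(int numHits) { this.numHits = numHits; }

    void collect(float score) {
        totalHits++;
        if (queue.size() < numHits) {
            queue.add(score);
        } else if (score > queue.peek()) { // replace the current lowest kept score
            queue.poll();
            queue.add(score);
        }
    }

    /** Drains the queue, so a second call sees an empty queue. */
    Result topScores() {
        float[] scores = new float[queue.size()];
        for (int i = scores.length - 1; i >= 0; i--) {
            scores[i] = queue.poll(); // poll() removes the lowest score; fill from the back
        }
        // On a second call scores.length == 0 but totalHits > 0,
        // so scores[0] throws ArrayIndexOutOfBoundsException.
        float maxScore = (totalHits == 0) ? Float.NEGATIVE_INFINITY : scores[0];
        return new Result(totalHits, scores, maxScore);
    }
}
```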
To whom it may engage...
This is an automated request, but not an unsolicited one. For
more information please visit http://gump.apache.org/nagged.html,
and/or contact the folk at [EMAIL PROTECTED]
Project lucene-java has an issue affecting its community integration.
This issue affects
Doron Cohen wrote:
It is very important that we be able to assess the search quality in
a repeatable manner - so that anyone can repeat the quality tests, and
maybe find ways to improve them. (This would also allow us to verify the
"improvement claims" above...). This capability seems like a
Hey Grant, thanks for your comments!
Grant Ingersoll wrote:
> As I am sure you are aware: https://issues.apache.org/jira/browse/LUCENE-836
I remembered you mentioning setting up our own doc/query judgment system but
forgot it was in LUCENE-836, thanks for the reminder.
> On Jun 25, 2007, at 3:1
On Jun 25, 2007, at 2:19 PM, Doron Cohen wrote:
IANAL and I didn't read the link, but I think people publish their
MAP scores, etc. all the time on TREC data. I think it implies that
you obtained the data through legal means.
So you're saying that if person "X" got the TREC data legally, we
On Jun 25, 2007, at 2:04 PM, Doug Cutting wrote:
Doron Cohen wrote:
It is very important that we be able to assess the search quality in
a repeatable manner - so that anyone can repeat the quality tests, and
maybe find ways to improve them. (This would also allow us to verify the
"impro
On Jun 25, 2007, at 11:56 AM, Grant Ingersoll wrote:
To do this, we could use Reuters or Wikipedia. The hard part is
generating the queries and having people make relevance judgments
for a sufficient sample size.
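
Once judgments exist, they are commonly exchanged as a TREC-style qrels file: one judgment per line, with whitespace-separated topic id, iteration (usually 0 and ignored), document id and relevance level. A minimal reader follows, assuming any relevance above 0 counts as relevant (graded judgments are collapsed to binary). The class name Qrels is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical qrels reader: "topic iteration docId relevance" per line.
final class Qrels {
    private Qrels() {}

    /** Maps each topic id to the doc ids judged relevant; judged-only-irrelevant topics map to an empty set. */
    static Map<String, Set<String>> read(Reader in) throws IOException {
        Map<String, Set<String>> judgments = new HashMap<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) continue;
            String[] f = line.split("\\s+");
            if (f.length != 4) throw new IOException("bad qrels line: " + line);
            String topic = f[0];               // f[1] is the iteration field, ignored here
            String docId = f[2];
            int relevance = Integer.parseInt(f[3]);
            Set<String> relevant = judgments.computeIfAbsent(topic, t -> new HashSet<>());
            if (relevance > 0) relevant.add(docId);
        }
        return judgments;
    }
}
```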
Wikipedia is a moving target. I think the collection would have to
be sta
Yes you are correct, we could use the specific version that we use
for benchmarking. I was assuming that one, just didn't say it! :-)
-Grant
On Jun 25, 2007, at 3:00 PM, Marvin Humphrey wrote:
On Jun 25, 2007, at 11:56 AM, Grant Ingersoll wrote:
To do this, we could use Reuters or Wikipe
Marvin Humphrey wrote:
Wikipedia is a moving target. I think the collection would have to be
static.
In theory, one can evaluate against other search engines' results for
Wikipedia. However, this may violate their EULAs...
Doug
---
Michael McCandless wrote:
> OK, when you say "fair" I think you mean because you already had a
> previous run that used compound file, you had to use compound file in
> the run with the LUCENE-843 patch (etc)?
Yes, that's true.
> The recommendations above should speed up Lucene with or without m
: For the first change, logic is that Lucene's default length normalization
: punishes long documents too much. I found contrib's sweet-spot-similarity
: helpful here, but not enough. I found that a better doc-length
: normalization method is one that considers collection statistics - e.g.
: avera
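
The message is cut off before it names the collection-statistics method, so the sketch below swaps in pivoted length normalization, a well-known way of bringing average document length into the norm; it is an illustration, not necessarily the method the original message describes. The classic norm 1/sqrt(numTerms) is what Lucene's DefaultSimilarity.lengthNorm computed at the time; the slope and average-length values are hypothetical.

```java
// Hypothetical comparison of two length-normalization factors (higher = less penalty).
final class LengthNorms {
    private LengthNorms() {}

    /** Classic norm: 1 / sqrt(number of terms in the field). */
    static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /**
     * Pivoted norm: exactly 1.0 for a document of average length; slope (0..1)
     * controls how quickly the factor falls for longer docs and rises for shorter ones.
     */
    static double pivotedNorm(int numTerms, double avgTerms, double slope) {
        return 1.0 / ((1.0 - slope) + slope * (numTerms / avgTerms));
    }
}
```

With an average length of 200 terms and slope 0.25, going from 200 to 800 terms halves the classic norm (1/sqrt(4) = 0.5), while the pivoted norm only drops to 1/1.75, about 0.57, so long documents are penalized less.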
[ https://issues.apache.org/jira/browse/LUCENE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508018 ]
Hoss Man commented on LUCENE-942:
---
this seems like both a documentation issue, and a bad state checking issue.
the j
[ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen resolved LUCENE-933.
Resolution: Fixed
Lucene Fields: [Patch Available] (was: [New])
committed the backwards-comp
[ https://issues.apache.org/jira/browse/LUCENE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508025 ]
Doron Cohen commented on LUCENE-942:
Perhaps simpler to make the scoreDocs[] array a private data member, which n
[ https://issues.apache.org/jira/browse/LUCENE-942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508039 ]
Hoss Man commented on LUCENE-942:
---
that makes sense ... but there is still a state issue of "don't call topDocs()
un
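
The two ideas in this LUCENE-942 exchange can be combined in a small sketch: cache the result array in a private field on the first call, along the lines of the suggestion to keep scoreDocs[] as a data member, and reject further collect() calls once results have been produced, which is one reading of the state issue raised above (its full text is truncated here). CachingCollector is hypothetical, uses bare float scores instead of ScoreDoc objects, and is not Lucene's TopDocCollector.

```java
import java.util.PriorityQueue;

// Hypothetical collector: repeated topScores() calls are safe; collect() after them is rejected.
final class CachingCollector {
    private final PriorityQueue<Float> queue = new PriorityQueue<>(); // min-heap of kept scores
    private final int numHits;
    private int totalHits = 0;
    private float[] cached; // null until topScores() is first called

    CachingCollector(int numHits) { this.numHits = numHits; }

    void collect(float score) {
        if (cached != null) {
            throw new IllegalStateException("collect() called after topScores()");
        }
        totalHits++;
        if (queue.size() < numHits) {
            queue.add(score);
        } else if (score > queue.peek()) {
            queue.poll();
            queue.add(score);
        }
    }

    /** Drains the queue once, then serves copies of the cached array. */
    float[] topScores() {
        if (cached == null) {
            cached = new float[queue.size()];
            for (int i = cached.length - 1; i >= 0; i--) {
                cached[i] = queue.poll(); // lowest first, filled from the back: best score ends at index 0
            }
        }
        return cached.clone(); // defensive copy so callers cannot mutate the cache
    }

    /** Derived from the array length, not totalHits, so it never indexes an empty array. */
    float maxScore() {
        float[] s = topScores();
        return s.length == 0 ? Float.NEGATIVE_INFINITY : s[0];
    }

    int totalHits() { return totalHits; }
}
```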
[ https://issues.apache.org/jira/browse/LUCENE-940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doron Cohen updated LUCENE-940:
---
Attachment: lucene-940.patch
Attached patch fixing DateFormat for parallel "doc making".
Also fixing
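
java.text.SimpleDateFormat keeps mutable parsing state and is documented as not synchronized, so one instance shared by several doc-making threads can return wrong dates or throw. One common remedy is to give each thread its own instance via ThreadLocal; the patch itself is not shown here and may take a different approach. The class name and date pattern below are hypothetical.

```java
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical per-thread parser: each thread lazily gets its own SimpleDateFormat.
final class PerThreadDateParser {
    private PerThreadDateParser() {}

    private static final ThreadLocal<DateFormat> FORMAT = ThreadLocal.withInitial(() -> {
        SimpleDateFormat f = new SimpleDateFormat("dd-MMM-yyyy HH:mm:ss", Locale.US);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f;
    });

    /** Safe to call from many threads: no DateFormat instance is shared between them. */
    static Date parse(String text) throws ParseException {
        return FORMAT.get().parse(text);
    }
}
```

Compared with synchronizing on a single shared formatter, the per-thread copy avoids lock contention at the cost of one formatter object per thread.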
: I think it makes sense to move to 0.8.
okay ... i removed the sitemap file, regened everything, and then read
through the diff to ensure there was nothing broken/missing -- the diff
seemed to be entirely related to tweaks to the skinning between 0.7 and
0.8.
(but i'm still curious how michael h
[ https://issues.apache.org/jira/browse/LUCENE-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hoss Man resolved LUCENE-936.
---
Resolution: Fixed
Assignee: Hoss Man
thanks for spotting this...
Committed revision 550680.
> Ty
[ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508054 ]
Hoss Man commented on LUCENE-933:
---
woops ... sorry doron, i actually reviewed these patches the other day, but
apare
[ https://issues.apache.org/jira/browse/LUCENE-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508058 ]
Doron Cohen commented on LUCENE-933:
great, thanks Hoss!
> QueryParser can produce empty sub BooleanQueries when