On Mon, May 3, 2010 at 15:11, Adriano Crestani
wrote:
> I actually never liked how QueryNode -> query string is done today, using
> QueryNode.toQueryString(...) method. A QueryNode shouldn't be responsible
> for converting itself back to the string format, because different
> SyntaxParser(s) may c
Grant,
We are currently working on a relevancy improvement project. We took IBM's
paper from TREC 2007 and followed the approaches they described to improve
Lucene's relevance. It also gave us some idea of Lucene’s out-of-the-box
precision performance (MAP). In addition to it we used som
Hello,
Lucene core doesn't seem to use relative word positioning (?) for scoring.
For example, after indexing the phrase "a b c d e f g h i j k l m n o p q r
s t u v w x y z", these queries give the same score (0.19308087):
- 1 : phrase:'e f g'
- 2 : phrase:'o k z'
I'm a bit familiar with lucen
We discovered very soon after going to production that Lucene's scores were
often 'too precise'. For example, a page of 25 results may have several
different score values, and all within 15% of each other, but to the end
user all 25 results were equally relevant. Thus we wanted the secondary sort
f
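One way to make near-identical scores compare as equal, so that a secondary sort key can take over, is to round scores into coarse buckets before sorting. The sketch below is only an illustration of that idea and not necessarily what the poster did: the `Hit` type, the `bucketWidth` parameter, and the use of the title as secondary key are all assumptions, and no Lucene classes are involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical result type: a relevance score plus a field used as secondary sort key.
final class Hit {
    final String title;
    final float score;

    Hit(String title, float score) {
        this.title = title;
        this.score = score;
    }
}

final class BucketedSort {
    // Maps a score to a coarse bucket relative to the top score.
    // With bucketWidth = 0.15f, scores within roughly 15% of the top score share bucket 0.
    static int bucket(float score, float topScore, float bucketWidth) {
        if (topScore <= 0f) {
            return 0;
        }
        return (int) Math.floor((1.0 - score / topScore) / bucketWidth);
    }

    // Sorts by bucket first (best bucket first), then by title within a bucket.
    static List<Hit> sort(List<Hit> hits, float bucketWidth) {
        float top = 0f;
        for (Hit h : hits) {
            top = Math.max(top, h.score);
        }
        final float topScore = top;
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingInt((Hit h) -> bucket(h.score, topScore, bucketWidth))
                .thenComparing(h -> h.title));
        return sorted;
    }
}
```

With a bucket width of 0.15, hits scored 1.0 and 0.9 fall into the same bucket and are ordered by title, while a hit scored 0.5 still sorts after both.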
The quick answer is that the session is probably the wrong place to keep
an IndexReader, since that's per-user. I'd define a new server/servlet that
did my searching and have my webapps use that. Makes it really simple
to re-use index readers.
And reopening the IndexReader for each request will p
Regarding Part3:
Data quality
For our search domain (catalog products) we very often face the problem that
the search data is full of acronyms and abbreviations like:
cable,nym-j,pvc,3x2.5mm²
or
dvd-/cd-/usb-carradio,4x50W,divx,bl
We solved this by a combination of normalization for better data
Hi all,
In a clustered environment I search the index from the web
application. In the web application I am creating an IndexReader on each
request. Is it expensive to do it this way? I read somewhere on the web
that one should reuse the same reader as much as possible. Can I keep the
initially created IndexR
Dear all,
As suggested in the reply below, I would search the index again for each
document and skip indexing if it is found, otherwise index it. Is this not
similar to indexing all PDF documents once again? Is this not overhead? As I am
not going to index the details of the pdf (so if an indexed pdf was
recreated i n
Dear Simon,
Thanks for your reply; I found it very useful.
I have another question. I create the index in a clustered environment (2
physical systems and 2 virtual). A shared system among the nodes is
where this index will be created. The scheduler runs in another remote
system which will create an
Hey there,
you might have to implement some kind of unique identifier using an
indexed Lucene field. When you are indexing, you should fire a query with the
uuid of your document (maybe the path to your PDF document) and check if the
document is in the index already. You could also do a boolean qu
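The check-before-index pattern described in this reply can be sketched as follows. This is not Lucene code: a plain in-memory map stands in for the index, the lookup stands in for the query on the identifier field, and the class and method names are hypothetical. In Lucene itself, `IndexWriter.updateDocument(Term, Document)` may be worth considering as an alternative, since it deletes any document matching the term before adding the new one.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for an index keyed by a unique identifier field.
// The map plays the role of the index; containsKey plays the role of the query.
final class DedupIndexer {
    private final Map<String, String> indexByPath = new HashMap<>();

    // Returns true if the document was added, false if it was already indexed and skipped.
    boolean indexIfAbsent(String pdfPath, String fileName) {
        if (indexByPath.containsKey(pdfPath)) {
            return false; // already in the index: skip
        }
        indexByPath.put(pdfPath, fileName);
        return true;
    }

    int size() {
        return indexByPath.size();
    }
}
```

The lookup costs one keyed query per document, which is usually far cheaper than re-extracting and re-indexing the document's content.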
Dear all,
I am using Lucene 3.0 to index the PDF reports that I generate
dynamically. I index the PDF file name (without extension), file path
and its absolute path as fields. I search with the file name without
extension; it retrieves a list, as usually 2 or more files are present
with the same name
11 matches