Can't you pick any arbitrary "marker" field name (that's not a real field
name) and use that?
Yes, I could. I guess you're saying that the field name doesn't matter,
except that it's used for caching the comparator, right?
... he wants the "bucketing" to happen as part of the scoring so that t
Empirically, when I insert the elements in the FieldSortedHitQueue
they get sorted according to the Sort object. The original query
that gives me a TopDocs applied
no secondary sorting, only relevancy. Since I normalized
all the scores into one of only 5 discrete values, and secondary
sorting was
Are unindexed fields stored separately from the main inverted
index?
If so then, one could implement the field value change as a
delete and
re-add of just that value?
The short answer is that won't work. Field values are stored in a
different data structure than the posting
I don't really understand the difference between using the ramDirectory
and using IndexWriter.
What's the difference between using ramDirectory instead of using
IndexWriter with those properties set to:
setMergeFactor(1000);setMaxMergeDocs(1);setMaxBufferedDocs(1);
On Wednesday 28 February 2007 16:19, WATHELET Thomas wrote:
> I don't really understand the difference between using the ramDirectory
> and using IndexWriter.
>
> What's the difference between using ramDirectory instead of using
> IndexWriter with those properties set to:
> setMergeFactor(1000);se
Hi,
I store the Lucene Index of my web applications in a file system.
Often, I need to add to this index another index also stored as file
system.
I have three questions :
* What is the best way to do this ?
Open an IndexReader on this newcoming index-file system
and use addIndexes(IndexR
I think I expressed myself badly.
In both cases I use the IndexWriter class, but in one case I use it
with a RamDirectory and in the other with FSDirectory (index=new IndexWriter(ram
OR fsdir,analyser,true))
If I use the ramDirectory class, it is to avoid frequent disk access.
But I
I guess it depends upon your goal. If you're asking what the difference
between writing to a RAMDirectory *then* flushing to an FSDirectory,
I don't believe there's much, if any. As I remember (and my memory
isn't always...er...accurate), there's been discussion on this thread
by those who know t
Hey guys,
I want to filter a result set on a particular field. I have code like this:
try
{
PhraseQuery textQuery = new PhraseQuery();
PhraseQuery titleQuery = new PhraseQuery();
PhraseQuery catQuery = new PhraseQuery();
textQuery.setSlop( 20 );
When you have a category, add the pair of clauses as a sub-Boolean query.
Something like...
try
{
PhraseQuery textQuery = new PhraseQuery();
PhraseQuery titleQuery = new PhraseQuery();
PhraseQuery catQuery = new PhraseQuery();
textQuery.setSlop( 20 )
thanks a lot
On 2/28/07, Erick Erickson <[EMAIL PROTECTED]> wrote:
When you have a category, add the pair of clauses as a sub-Boolean query.
Something like...
try
{
PhraseQuery textQuery = new PhraseQuery();
PhraseQuery titleQuery = new PhraseQuery();
Phras
I found the problem!
I did not realize using a HitCollector would return things in an unsorted order.
I was using the HitCollector to try to maximize performance by only returning
the documents that I needed (which page of the results, and how many per page).
-Phillip
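The paging problem described above can be sketched in plain Java, without any Lucene dependency. This is only an illustration under stated assumptions: PagedCollector and ScoredDoc are hypothetical names, and collect(int doc, float score) merely mirrors the shape of the HitCollector callback. The collector keeps the best `limit` hits in a min-heap while documents arrive in arbitrary order, then sorts by descending score before slicing out a page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: not part of Lucene.
class PagedCollector {
    static final class ScoredDoc {
        final int doc;
        final float score;
        ScoredDoc(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int limit; // highest rank any requested page will need
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<ScoredDoc> queue =
            new PriorityQueue<>((a, b) -> Float.compare(a.score, b.score));

    PagedCollector(int limit) { this.limit = limit; }

    // Called once per matching document, in no particular score order.
    void collect(int doc, float score) {
        if (queue.size() < limit) {
            queue.add(new ScoredDoc(doc, score));
        } else if (score > queue.peek().score) {
            queue.poll();                       // evict the weakest hit
            queue.add(new ScoredDoc(doc, score));
        }
    }

    // Sort retained hits by descending score (ties by doc id), then slice.
    List<ScoredDoc> page(int pageIndex, int perPage) {
        List<ScoredDoc> sorted = new ArrayList<>(queue);
        sorted.sort((a, b) -> {
            int c = Float.compare(b.score, a.score);
            return c != 0 ? c : Integer.compare(a.doc, b.doc);
        });
        int from = Math.min(pageIndex * perPage, sorted.size());
        int to = Math.min(from + perPage, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

To serve the second page of two results per page, a caller would construct the collector with a limit of 4, feed it every hit, and then call page(1, 2).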
- Original Message -
Been searching http://www.gossamer-threads.com/lists/lucene/java-user/
as Erick suggested; man, is there a wealth of information in the
Lucene archives.
I have found many examples of how to convert text to dates and back,
how to search Date fields for various ranges, and so forth -- but I
don't t
Walt,
I am no expert, but it sounds like you need to associate many
dates to a single record. Can this be handled as you would a synonym?
Basically add a token at the same offset as the row itself? i.e. you
would have a record that would also have a date field that has 3 offsets
that woul
I just add 1000 to it, but in my rounding, I always make sure that I have 4
decimal places.
Here are some code snippets;
//indexing the lat
double lat = physicalAddress.getLatitude() + 1000.0;
Double latitude = new Double(lat);
document.add(new Field(Indexer.LATITUDE, latitude.toString()
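The offset-and-round idea above can be illustrated with a small standalone sketch. LatitudeKey and encode are hypothetical names, not taken from the post. One assumption goes beyond what the post states: the key is also zero-padded to a fixed width, because shifted values below 1000 (for example 966.1312) would otherwise sort after values above 1000 when compared as strings.

```java
import java.util.Locale;

// Hypothetical helper: not part of Lucene or of the original post.
class LatitudeKey {
    // Shift by 1000 so the value is never negative, keep exactly 4 decimal
    // places, and zero-pad to 9 characters (an added assumption) so that
    // string order matches numeric order.
    static String encode(double latitude) {
        return String.format(Locale.ROOT, "%09.4f", latitude + 1000.0);
    }
}
```

With this encoding, encode(-33.8688) yields "0966.1312" and encode(51.5074) yields "1051.5074", so a lexicographic range query over the field sees them in the right order.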
On Wednesday 28 February 2007 01:01, Russ wrote:
> I will definitely check it out tomorrow.
>
> I also forgot to mention that I am not interested in the hits themselves,
> only whether or not there was a hit. Is there something I can use that's
> optimized for this scenario, or should I look into
Erich,
Yes, this seems to be the simplest way to implement score 'bucketization',
but wouldn't it be more efficient to do this with a custom ScoreComparator?
That way, you'd do the bucketizing and sorting in one 'step' (compare()).
Maybe the savings isn't measurable, though. A comparator might al
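Doing the bucketizing and the sorting in a single compare() can be sketched in plain Java as follows. This is not a Lucene ScoreComparator; BucketedSort, Hit, and the title field used as the secondary key are hypothetical, and the five-bucket normalization is just one possible reading of the thread.

```java
import java.util.Comparator;

// Hypothetical sketch: not part of Lucene.
class BucketedSort {
    static final class Hit {
        final int doc;
        final float score;
        final String title; // assumed secondary sort key
        Hit(int doc, float score, String title) {
            this.doc = doc; this.score = score; this.title = title;
        }
    }

    // Map a raw score onto one of `buckets` discrete levels, 0 .. buckets-1.
    static int bucket(float score, float maxScore, int buckets) {
        if (maxScore <= 0f) return 0;
        int b = (int) (score / maxScore * buckets);
        return Math.min(Math.max(b, 0), buckets - 1);
    }

    // Higher bucket first; within a bucket, ascending by title.
    static Comparator<Hit> byBucketThenTitle(float maxScore, int buckets) {
        return Comparator
                .comparingInt((Hit h) -> -bucket(h.score, maxScore, buckets))
                .thenComparing((Hit h) -> h.title);
    }
}
```

Hits scoring 0.95 and 0.85 against a maximum of 1.0 both land in the top bucket of five, so the order between them is decided by title alone.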
karl wettin wrote:
28 feb 2007 kl. 00.49 skrev Russ:
Thanks, I will try it tomorrow... Is it significantly different from
using a standard index on a ramdir?
A bit different.
You can also try LUCENE-550. It has about the same speed as
contrib/memory but can handle multiple documents and
It may well be, but as I said this is efficient enough for my needs
so I didn't pursue it. One of my pet peeves is spending time making
things "more efficient" when there's no need, and my index isn't
going to grow enough larger to worry about that now ...
Erick
On 2/28/07, Peter Keegan <[EMAIL
Hello,
There are a few ways to solve this but no
Date extraction filter I know of. Adding
a hundred fields for each Lucene doc
seems bloated.
First, get your text out of the various
source documents (.doc,.pdf,.html) using
available tools out there described in the
Lucene in Action book.
It sou
Antony Bowesman <[EMAIL PROTECTED]> wrote on 27/02/2007 17:37:41:
> Doron Cohen wrote:
> > The collect() method is going to be invoked once for each document that
> > matches the query (having nonzero score). If the index is very large, that
> > may turn out to be a very large number of calls. Often,
Hi,
Does anyone know of a written document that describes in some detail
how Lucene's ranking/scoring algorithm works? I'm safely assuming that
a single consistent algorithm is being used to compute the scores of
each matching document (with or without explicit boost factors in the
query) and r
: I have generic material that _contain_ dates: historic time lines,
: certificates, news articles, forms, deeds, testimonies, and wildly
: free form genealogical information. The dates have no specific
: structure, obvious context, nor consistency.
identifying and extracting dates from bulk text
: I want to execute parallel search over several machines. But
: ParallelSearcher doesn't look perfect. It creates threads and spawns many
: requests to the underlying Searchables (over a network) for a single search.
: Is there a decent implementation of the parallel search over remote indexes
:
http://lucene.apache.org/java/docs/scoring.html
(which you can also find by googling "lucene scoring")
-Original Message-
From: Jong Kim [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 28, 2007 2:21 PM
To: java-user@lucene.apache.org
Subject: ranking/scoring algorithm in details
Hi,
Yeah, date finding is a little like entity extraction, since dates can
have many formats, depending on how crazy you want to get ("a week from
tomorrow" is 3/8/2007 if you know that this e-mail was written today).
So much so that I went and looked up lingpipe, but they seem to not be
concerned with
Hello,
I have implemented an IndexResultSet just like java.sql.ResultSet with all its
methods. When I call searcher.search(...) I pass the returned Hits to my
IndexResultSet.
In the IndexResultSet I have getString(String) getString(int) getInt()
next() previous() absolute() and all methods of the j
Hi all,
I have a requirement wherein I create an index file for each XML file. I have
over 100/150 XML files, which are all related.
If I create 100/150 index files and query using these indices, will this affect
the performance of the search operation?
bye
raaj