in the index. Think of
positions in a list: they are not part of the
list itself. Bear in mind that these numbers
may change for documents after
any deletions in the index.
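The list analogy above can be sketched in plain Java; the document names here are made up for illustration and have nothing to do with Lucene's internals:

```java
import java.util.ArrayList;
import java.util.List;

public class DocNumberDemo {
    public static void main(String[] args) {
        // Document numbers behave like positions in a list:
        // they identify a slot, not the document itself.
        List<String> docs = new ArrayList<>(List.of("docA", "docB", "docC"));
        System.out.println(docs.indexOf("docC")); // prints 2

        // After a deletion, the numbers of later documents shift.
        docs.remove("docB");
        System.out.println(docs.indexOf("docC")); // prints 1
    }
}
```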
Regards,
Karsten
--
Dr.-Ing. Karsten Konrad
Head of Artificial Intelligence Lab
Xtramind Technologies GmbH
then a search over the filtered document index and counting the
number of results. Filters are quite
efficient.
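The counting idea can be illustrated without Lucene at all: a filter is essentially a bit set over document numbers, and counting filtered results means counting the bits set in the intersection of the query's hits and the filter. A minimal sketch (document numbers and bit patterns are invented for illustration):

```java
import java.util.BitSet;

public class FilterCountDemo {
    public static void main(String[] args) {
        int numDocs = 8;

        // Bit i set <=> document i matches the query.
        BitSet queryHits = new BitSet(numDocs);
        queryHits.set(1); queryHits.set(3); queryHits.set(5);

        // Bit i set <=> document i passes the filter (e.g., a date range).
        BitSet filter = new BitSet(numDocs);
        filter.set(0, 4); // documents 0..3

        // Count of the filtered search = cardinality of the AND.
        BitSet filtered = (BitSet) queryHits.clone();
        filtered.and(filter);
        System.out.println(filtered.cardinality()); // prints 2 (docs 1 and 3)
    }
}
```

Bitwise AND plus a cardinality count is cheap, which is why filters are efficient.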
Hope this helps,
Karsten
intelligence and similar tasks, but not too many people require such
specialized features.
Brox's price models for this engine may be interesting for those who find other
products too expensive; it also works with all existing search engines, not only
Lucene.
search engines can be done without programming one's
leg off.
I know them because they use my clustering algorithm
when doing meta-searches. See
http://searchdemo.brox.de/
(search for Lucene - the clustering is geared towards German
though!)
Regards,
Hi,
an elegant method is to create an empty directory and merge
the index to be copied into it, using the addIndexes(Directory[])
method of IndexWriter. This way, you do not have to deal with
individual files at all.
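The approach above, sketched against the classic Lucene API of that era; class and method names should be checked against your Lucene version, and the index path is illustrative:

```java
// Sketch only, assuming the classic (1.x-era) Lucene API.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

Directory source = FSDirectory.getDirectory("/path/to/index", false);
Directory copy = new RAMDirectory(); // the "empty directory"

// Opening the writer with create=true starts from an empty index;
// addIndexes merges the source into it.
IndexWriter writer = new IndexWriter(copy, new StandardAnalyzer(), true);
writer.addIndexes(new Directory[] { source });
writer.close();
```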
Regards,
Karsten
-Original Message-
From: Nicolas Maisonneuve [mailto:[EMAIL PROTECTED]
Hi,
I would highly appreciate it if the experts here (especially Karsten or
Chong) look at my idea and tell me if this would be possible.
Sorry, I have no idea about how to use a probabilistic approach with
Lucene, but if anyone does so, I would like to know, too.
I am currently puzzled by
Hi,
Do they produce same ranking results?
No; Lucene's operations on query weights and length normalization are not
equivalent to a vanilla cosine in vector space.
I guess the 2nd approach will be more precise but slow.
Query similarity
will indeed be faster, but its results may actually be no worse.
documents. Not a great deal, really.
If TF/IDF weighting is a problem to you, the Similarity interface implementation
allows you
to remove all references to length normalization and document frequencies.
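In Lucene versions that expose a pluggable Similarity (1.4 and later), neutralizing length normalization and document frequencies could look roughly like this; treat the exact signatures as version-dependent, and the class name is made up:

```java
import org.apache.lucene.search.DefaultSimilarity;

// Sketch: a Similarity that ignores field length and document frequency.
public class FlatSimilarity extends DefaultSimilarity {
    public float lengthNorm(String fieldName, int numTerms) {
        return 1.0f; // no length normalization
    }
    public float idf(int docFreq, int numDocs) {
        return 1.0f; // no document-frequency (IDF) weighting
    }
}

// Usage sketch: searcher.setSimilarity(new FlatSimilarity());
```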
Regards,
With kind regards from Saarbrücken
a number.
The implementation is fast.
Regards,
Karsten
With kind regards from Saarbrücken
Rules of linguistics? Is there such a thing? :)
Yes there are. How can you expect communication (the goal of
the game that natural language is about) to work if the game
has no rules?
Anyway, Herb is right: sentence boundaries do carry meaning, and the
linguistic rule could be phrased as:
not be convinced to work
on such structures and boost the relation of terms more if they appear
within closer RST-structure connections.
Regards,
Karsten
With kind regards from Saarbrücken
Not only is the query slow, but it seems to be slower the more results
it returns.
Any suggestions?
If you have a lot of terms in that range,
you can see that there are obviously some cycles being spent to do the work
needed.
If the number of different date terms causes this effect, why
different as it
requires an extension by position information).
E.g., searching a term means finding all vectors that have
a certain common dimension, and ranking means weighting these
relative to their angle in vector space.
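The vector-space view above can be made concrete with a tiny cosine computation; the term weights below are invented purely for illustration:

```java
public class CosineDemo {
    // Cosine of the angle between two term-weight vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Dimensions = terms; a document "matches" a query term when
        // it has a non-zero weight in that dimension.
        double[] query = {1.0, 1.0, 0.0};
        double[] doc1  = {0.9, 0.8, 0.0}; // shares both query terms
        double[] doc2  = {0.1, 0.0, 0.9}; // shares only one query term
        System.out.println(cosine(query, doc1) > cosine(query, doc2)); // true
    }
}
```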
KK
With kind regards from Saarbrücken
not expect to be.
Regards,
Karsten
With kind regards from Saarbrücken
Hi,
it is very easy to provoke the errors you describe
when you are opening many alternating writers and
readers on Windows.
You can circumvent this problem by using fewer
writer and reader objects, e.g., first delete
all documents to be updated, then write all the
updated documents. Or use a
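The batching advice above, as a rough sketch against the classic (1.x-era) Lucene API; the index path, field name, and document value are illustrative, and later versions rename some of these methods (e.g., deleteDocuments):

```java
// Sketch: batch all deletions first, then all additions, so a reader
// and a writer are each opened only once instead of alternating.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

// Phase 1: one reader deletes every document that will be updated.
IndexReader reader = IndexReader.open("/path/to/index");
reader.delete(new Term("id", "doc-42")); // repeat per updated document
reader.close();

// Phase 2: one writer adds all the updated documents.
IndexWriter writer = new IndexWriter("/path/to/index",
        new StandardAnalyzer(), false); // false = append to existing index
// writer.addDocument(updatedDoc);      // repeat per updated document
writer.close();
```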
Hi,
after indexing 238,000 documents on a Linux box, we get the
following error:
Caused by: java.lang.IllegalStateException: docs out of order
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)
Hi,
1) How can I search untokenized fields? Do I have to pass my query
through a NullAnalyzer?
No, the contents of an untokenized (i.e., keyword) field
are stored as one Lucene token. Hence, you must build such
a token from your query string and wrap it in a
TermQuery to be able to search it.
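That is, the raw query string must match the stored keyword value exactly; a sketch against the Lucene API, with an invented field name and value:

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Sketch: search an untokenized (keyword) field. The raw query string
// must NOT be passed through an analyzer, since the field's value was
// indexed as a single token.
String rawValue = "DOC-2003-0042"; // illustrative keyword value
Query q = new TermQuery(new Term("id", rawValue));
// searcher.search(q) then matches documents whose "id" keyword
// field holds exactly this value.
```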
In
To: [EMAIL PROTECTED]
Subject: Re: RE: Analyzers, Queries: three questions
Karsten Konrad wrote:
2) How can I pass the value of a field through an Analyzer before
storing it?
A text field is automatically analyzed and tokenized by the given
analyzer; you do not have to do it manually.
Well
Thanks,
do you already have some numbers on how it compares to the
file system implementation, i.e., how fast are indexing
and searching?
Regards,
Karsten
-Original Message-
From: Anthony Eden [mailto:[EMAIL PROTECTED]
Sent: Monday, June 2, 2003 22:23
To: Lucene Users List
terms(Term t) throws IOException;
I haven't found a way to stop the enumeration once I am sure that
the input term cannot match any more :)
Regards,
Karsten
-Original Message-
From: Eric Jain [mailto:[EMAIL PROTECTED]
Sent: Monday, June 2, 2003 13:17
To: Karsten Konrad
Hi,
please have a look at the FuzzyTermEnum class in Lucene.
There is an impressive implementation of Levenshtein distance
there that you can use; simply set the minimum similarity higher
than 0.5 (0.75 seems to work fine) and modify the
termCompare method such that the last term produced is
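For reference, the distance underlying FuzzyTermEnum is ordinary Levenshtein edit distance. A self-contained version, with a similarity score in the 0..1 range analogous in spirit to the fuzzy threshold mentioned above (Lucene's exact normalization formula may differ by version):

```java
public class EditDistanceDemo {
    // Classic dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(Math.min(d[i - 1][j] + 1,    // deletion
                                            d[i][j - 1] + 1),   // insertion
                                   d[i - 1][j - 1] + cost);     // substitution
            }
        }
        return d[a.length()][b.length()];
    }

    // Similarity in 0..1: 1.0 means identical strings.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("lucene", "lucine"));       // prints 1
        System.out.println(similarity("lucene", "lucine") > 0.75); // prints true
    }
}
```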