AW: Lucene internal document number?

2004-08-06 Thread Karsten Konrad
in the index. Think of positions in a list - they are not part of the list itself. You have to take into account that these numbers may change for documents after any deletions in the index. Regards, Karsten -- Dr.-Ing. Karsten Konrad Head of Artificial Intelligence Lab Xtramind Technologies GmbH
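The list analogy above can be made concrete with a small sketch. It uses a plain `java.util.List` as a stand-in for an index and is not Lucene code; the names `DocNumberDemo` and `positionOf` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DocNumberDemo {
    // The "document number" here is just the position in the list;
    // it is not stored in the document itself.
    static int positionOf(List<String> index, String doc) {
        return index.indexOf(doc);
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>(Arrays.asList("a.txt", "b.txt", "c.txt"));
        System.out.println(positionOf(index, "c.txt")); // prints 2

        // Deleting an earlier entry shifts the positions of everything after it.
        index.remove("a.txt");
        System.out.println(positionOf(index, "c.txt")); // prints 1
    }
}
```

The practical consequence is the one stated in the message: a number remembered before a deletion may point at a different document afterwards, so it should not be kept as a stable identifier.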

AW: How to access information from a part of the index

2004-07-09 Thread Karsten Konrad
then a search over the filtered document index and counting the number of results. Filters are quite efficient. Hope this helps, Karsten -- Dr.-Ing. Karsten Konrad Head of Artificial Intelligence Lab Xtramind Technologies GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken Phone +49 (681) 3 02-51 13

AW: clustering results

2004-04-11 Thread Karsten Konrad
intelligence and similar tasks, but not too many people require such specialized features. Brox' price models for this engine may be interesting for those who find other products too expensive; it also works with all existing search engines, not only Lucene. -- Dr.-Ing. Karsten Konrad Head

AW: Paid support for Lucene

2004-01-30 Thread Karsten Konrad
search engines can be done without programming one's leg off. I know them because they use my clustering algorithm when doing meta-searches. See http://searchdemo.brox.de/ (search for Lucene - the clustering is geared towards German though!) Regards, -- Dr.-Ing. Karsten Konrad Head of Artificial

AW: Copy Directory to Directory function ( backup)

2004-01-15 Thread Karsten Konrad
Hi, an elegant method is to create an empty directory and merge the index to be copied into it, using the addIndexes() method of IndexWriter. This way, you do not have to deal with files at all. Regards, Karsten -Original Message- From: Nicolas Maisonneuve [mailto:[EMAIL PROTECTED]

AW: Probabilistic Model in Lucene - possible?

2003-12-03 Thread Karsten Konrad
Hi, "I would highly appreciate it if the experts here (especially Karsten or Chong) look at my idea and tell me if this would be possible." Sorry, I have no idea about how to use a probabilistic approach with Lucene, but if anyone does so, I would like to know, too. I am currently puzzled by

AW: Document Similarity

2003-12-03 Thread Karsten Konrad
Hi, "Do they produce same ranking results?" No; Lucene's operations on query weight and length normalization are not equivalent to a vanilla cosine in vector space. "I guess the 2nd approach will be more precise but slow." Query similarity will indeed be faster, but may actually not be worse.

AW: Real Boolean Model in Lucene?

2003-12-01 Thread Karsten Konrad
documents. Not a great deal, really. If TF/IDF weighting is a problem for you, the Similarity interface implementation allows you to remove all references to length normalization and document frequencies. Regards, Kind regards from Saarbrücken -- Dr.-Ing. Karsten Konrad Head

AW: AW: Real Boolean Model in Lucene?

2003-12-01 Thread Karsten Konrad
allows you to remove all references to length normalization and document frequencies. Regards, Kind regards from Saarbrücken -- Dr.-Ing. Karsten Konrad Head of Artificial Intelligence Lab XtraMind Technologies GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken Phone: +49 (681

AW: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-17 Thread Karsten Konrad
a number. The implementation is fast. Regards, Karsten Kind regards from Saarbrücken -- Dr.-Ing. Karsten Konrad Head of Artificial Intelligence Lab XtraMind Technologies GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken Phone: +49 (681) 3025113 Fax: +49 (681) 3025109 [EMAIL PROTECTED

AW: inter-term correlation [was Re: Vector Space Model in Lucene?]

2003-11-15 Thread Karsten Konrad
"Rules of linguistics? Is there such a thing? :)" Yes there are. How can you expect communication (the goal of the game that natural language is about) to work if the game has no rules? Anyway, Herb is right, sentence boundaries do carry a meaning and the linguistic rule could be phrased as:

Sentence dependencies (was: inter-term relation)

2003-11-15 Thread Karsten Konrad
not be convinced to work on such structures and boost the relation of terms more if they appear within closer RST-structure connections. Regards, Karsten Kind regards from Saarbrücken -- Dr.-Ing. Karsten Konrad Head of Artificial Intelligence Lab XtraMind Technologies GmbH Stuhlsatzenhausweg

AW: Slow response time with datefilter

2003-11-15 Thread Karsten Konrad
"Not only is the query slow, but it seems to be slower the more results it returns. Any suggestions?" If you have a lot of terms in that range, you can see that there are obviously some cycles spinning to do the work needed. If the number of different date terms causes this effect, why

AW: Vector Space Model in Lucene?

2003-11-14 Thread Karsten Konrad
different as it requires an extension by position information). E.g., searching a term means finding all vectors that have a certain common dimension and ranking means weighting these relatively to their angle in vector space. KK Kind regards from Saarbrücken -- Dr.-Ing. Karsten
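As a rough illustration of the vector-space view described in this message, the sketch below treats each document as a sparse term-weight vector, finds the documents that share the query term's dimension, and ranks them by the cosine of their angle to the query. This is a plain textbook cosine, not Lucene's scoring; `VectorSpaceDemo`, `cosine`, `norm` and `search` are illustrative names, and the term weights are assumed to be given.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VectorSpaceDemo {
    // Cosine of the angle between two sparse term-weight vectors.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double w = b.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Euclidean length of a sparse vector.
    static double norm(Map<String, Double> v) {
        double sum = 0.0;
        for (double w : v.values()) {
            sum += w * w;
        }
        return Math.sqrt(sum);
    }

    // Returns the positions of all documents that have the term as a
    // dimension, ordered by decreasing cosine to a one-term query vector.
    static List<Integer> search(List<Map<String, Double>> docs, String term) {
        Map<String, Double> query = new HashMap<>();
        query.put(term, 1.0);
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            if (docs.get(i).containsKey(term)) {
                hits.add(i);
            }
        }
        hits.sort((x, y) -> Double.compare(
                cosine(docs.get(y), query), cosine(docs.get(x), query)));
        return hits;
    }
}
```

With a one-term query, a document containing only that term has cosine 1.0 and ranks ahead of a document that spreads its weight over further terms.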

AW: Negative boosting?

2003-09-11 Thread Karsten Konrad
not expect to be. Regards, Karsten Kind regards from Saarbrücken -- Dr.-Ing. Karsten Konrad Head of Artificial Intelligence Lab XtraMind Technologies GmbH Stuhlsatzenhausweg 3 D-66123 Saarbrücken Phone: +49 (681) 3025113 Fax: +49 (681) 3025109 [EMAIL PROTECTED] www.xtramind.com

AW: Exceptions while Updating an Index

2003-08-28 Thread Karsten Konrad
Hi, it is very easy to provoke the errors you describe when you are opening many alternating writers and readers on Windows. You can circumvent this problem by using fewer writer and reader objects, e.g., first delete all documents to update, then write all the updated documents. Or use a

Mysterious bugs...

2003-06-24 Thread Karsten Konrad
Hi, after indexing 238000 Documents on a Linux box, we get the following error: Caused by: java.lang.IllegalStateException: docs out of order at: java.lang.IllegalStateException: docs out of order at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:219)

AW: Analyzers, Queries: three questions

2003-06-11 Thread Karsten Konrad
Hi, "1) How can I search untokenized fields? Do I have to pass my query through a NullAnalyzer?" No, the contents of an untokenized (i.e., keyword) field are stored as one Lucene token. Hence, you must build such a token from your query and build a TermQuery to be able to search it. In

AW: AW: Analyzers, Queries: three questions

2003-06-11 Thread Karsten Konrad
To: [EMAIL PROTECTED] Subject: Re: AW: Analyzers, Queries: three questions Karsten Konrad wrote: 2) How can I pass the value of a field through an Analyzer before storing it? A text field is automatically analyzed and tokenized by the given analyzer, you do not have to do it manually. Well

AW: DBDirectory available for download

2003-06-03 Thread Karsten Konrad
Thanks, do you already have some numbers on how it compares to the file system implementation, i.e., how fast is indexing and searching? Regards, Karsten -Original Message- From: Anthony Eden [mailto:[EMAIL PROTECTED] Sent: Monday, 2 June 2003 22:23 To: Lucene Users List

AW: Search for similar terms

2003-06-02 Thread Karsten Konrad
terms(Term t) throws IOException; I haven't found a way to stop the enumeration once I am sure that the input term cannot match any more :) Regards, Karsten -Original Message- From: Eric Jain [mailto:[EMAIL PROTECTED] Sent: Monday, 2 June 2003 13:17 To: Karsten Konrad

AW: Search for similar terms

2003-05-31 Thread Karsten Konrad
Hi, please have a look at the FuzzyTermEnum class in Lucene. There is an impressive implementation of Levenshtein distance there that you can use; simply set the fuzzy distance higher than 0.5 (0.75 seems to work fine) and modify the termCompare method such that the last term produced is
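For readers unfamiliar with the measure mentioned here, the following is a minimal sketch of Levenshtein distance and a similarity score derived from it. It is not the code in Lucene's FuzzyTermEnum: normalizing by the longer string is an assumption made for this example, and `LevenshteinDemo`, `distance` and `similarity` are illustrative names.

```java
final class LevenshteinDemo {
    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b (two-row dynamic programming).
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Similarity in [0, 1]; 1.0 means identical strings.
    // Assumption: normalized by the length of the longer string.
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        return longer == 0 ? 1.0 : 1.0 - (double) distance(a, b) / longer;
    }
}
```

Under this normalization, a threshold such as the 0.75 suggested in the message would accept "lucent" as a match for "lucene" (one substitution in six characters) but reject "kitten" against "sitting".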