RE: Lock obtain timed out

2003-12-16 Thread MOYSE Gilles (Cetelem)
Hi. I got this exception when more than one thread tried to create an IndexWriter. I solved it by placing the code that uses the IndexWriter in a synchronized method. Hope it helps, Gilles. -Original Message- From: Hohwiller, Joerg [mailto:[EMAIL PROTECTED] Sent: Tuesday
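The synchronized-method fix described above can be sketched as follows. This is only an illustration under assumptions: SingleWriterIndex and its in-memory list are hypothetical stand-ins for the real index, and the comment marks where the IndexWriter code would go. The point it shows is that every thread has to go through the same synchronized method on the same object.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an index that tolerates only one writer at a time.
class SingleWriterIndex {
    private final List<String> docs = new ArrayList<>();

    // synchronized: only one thread at a time can run the write sequence.
    // In real code, this is where the IndexWriter would be opened, used and closed.
    public synchronized void addDocument(String text) {
        docs.add(text);
    }

    public synchronized int size() {
        return docs.size();
    }

    // Starts several threads that all write through the same instance.
    static int runWorkers(int threads, int docsPerThread) throws InterruptedException {
        SingleWriterIndex index = new SingleWriterIndex();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                for (int j = 0; j < docsPerThread; j++) {
                    index.addDocument("doc-" + id + "-" + j);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        return index.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runWorkers(4, 100)); // prints 400
    }
}
```

Note that a synchronized instance method only protects callers that share the same object, so the threads must not each create their own wrapper.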

RE: Tokenizing text custom way

2003-11-26 Thread MOYSE Gilles (Cetelem)
Do you want to define expressions, i.e. a set of terms that must be interpreted as a whole? For instance, when the Analyzer catches "time" followed by "out", it returns "time_out"? -Original Message- From: Dragan Jotanovic [mailto:[EMAIL PROTECTED] Sent: Wednesday, 26 November 2003 12:12 To:

RE: Tokenizing text custom way

2003-11-25 Thread MOYSE Gilles (Cetelem)
Hi. You should define expressions. To do so, you first have to create an expression file. An expression file contains one expression per line. For instance: time_out expert_system ... You can use any character as the link between the words of an expression. Here, I use the
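A rough sketch of how such an expression file could be applied to a token stream is given here. It is not the code from the original message: ExpressionMerger and its methods are hypothetical names, the link string is passed in as a parameter, and the greedy longest-match strategy is an assumption. It works on plain lists of strings and does not use the Lucene Analyzer API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: merges adjacent tokens that form a known expression.
class ExpressionMerger {
    private final Set<String> expressions = new HashSet<>();
    private final String link;
    private int maxWords = 1;

    // lines: the content of the expression file, one expression per line.
    ExpressionMerger(List<String> lines, String link) {
        this.link = link;
        for (String line : lines) {
            String expr = line.trim();
            if (expr.isEmpty()) {
                continue;
            }
            expressions.add(expr);
            int words = expr.split(Pattern.quote(link)).length;
            maxWords = Math.max(maxWords, words);
        }
    }

    // Replaces each run of tokens matching an expression by the joined form,
    // trying the longest candidate first (an assumed strategy).
    List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            String emitted = tokens.get(i);
            int consumed = 1;
            for (int n = Math.min(maxWords, tokens.size() - i); n >= 2; n--) {
                String candidate = String.join(link, tokens.subList(i, i + n));
                if (expressions.contains(candidate)) {
                    emitted = candidate;
                    consumed = n;
                    break;
                }
            }
            out.add(emitted);
            i += consumed;
        }
        return out;
    }
}
```

With the two expressions from the example above and "_" as the link, the tokens the, time, out, of, an, expert, system come back as the, time_out, of, an, expert_system.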

RE: Document ID's and duplicates

2003-11-19 Thread MOYSE Gilles (Cetelem)
Hi. You just have to add a field to your Document object before adding it to the index. The field should be of keyword type. You can use code like this: IndexWriter writer = new IndexWriter(path_to_your_index, your_analyzer_object); Document doc = new Document();

RE: Document ID's and duplicates

2003-11-19 Thread MOYSE Gilles (Cetelem)
-Original Message- From: MOYSE Gilles (Cetelem) [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 19, 2003 6:57 AM To: 'Lucene Users List' Subject: RE: Document ID's and duplicates Hi. You just have to add a field in your document object before adding it to the index

Boost in Query Parser

2003-11-12 Thread MOYSE Gilles (Cetelem)
Hello. I've made a Filter which recognizes special words and returns them in boosted form, using the QueryParser syntax. For instance, when the filter receives special_word, it returns special_word^3, so as to boost it. The problem is that the QueryParser understands the boost syntax when the string

RE: multiple tokens from a single input token

2003-11-10 Thread MOYSE Gilles (Cetelem)
Hi. I ran into the same problem, and I used the following solution (maybe not the best one, but it works, and not too slowly). The problem was to detect synonyms. I used a synonyms file, made up of lines like these: a b c d e f to define a, b, and c as synonyms, and d, e
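Such a synonyms file could be loaded along the following lines. This is only a sketch under assumptions: SynonymTable and expand are hypothetical names, each line is assumed to hold one whitespace-separated group, and returning the whole group for a token is a guess at how the groups are used afterwards.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: maps every word to the synonym group of its line.
class SynonymTable {
    private final Map<String, List<String>> groups = new HashMap<>();

    // lines: the content of the synonyms file, one group per line.
    SynonymTable(List<String> lines) {
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            List<String> group = Arrays.asList(trimmed.split("\\s+"));
            for (String word : group) {
                groups.put(word, group);
            }
        }
    }

    // Returns all the words of the token's group, or the token alone if it has none.
    List<String> expand(String token) {
        return groups.getOrDefault(token, Collections.singletonList(token));
    }
}
```

A token filter could then emit every word returned by expand for each input token, which gives several tokens from a single one.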

Compound expression extraction

2003-10-21 Thread MOYSE Gilles (Cetelem)
Hi. I'm trying to extract expressions from the term position information, i.e., if two words frequently appear side-by-side, then we can consider that they form a single expression. For instance, 'Object' and 'Oriented' appear side-by-side 9 times out of 10. It allows us to define a new
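One simple way to measure "side-by-side 9 times out of 10" is sketched here. It is an assumption rather than the method from the message: the ratio is taken as the number of times a pair occurs divided by the number of occurrences of its first word, and AdjacentPairs is a hypothetical name. It works on a plain token list instead of Lucene term positions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: scores each adjacent pair of words in a token list.
class AdjacentPairs {
    // Returns, for each pair "w1 w2", count(w1 followed by w2) / count(w1).
    // Tokens are assumed to contain no spaces.
    static Map<String, Double> adjacencyRatios(List<String> tokens) {
        Map<String, Integer> wordCount = new HashMap<>();
        Map<String, Integer> pairCount = new HashMap<>();
        for (int i = 0; i < tokens.size(); i++) {
            wordCount.merge(tokens.get(i), 1, Integer::sum);
            if (i + 1 < tokens.size()) {
                pairCount.merge(tokens.get(i) + " " + tokens.get(i + 1), 1, Integer::sum);
            }
        }
        Map<String, Double> ratios = new HashMap<>();
        for (Map.Entry<String, Integer> e : pairCount.entrySet()) {
            String first = e.getKey().substring(0, e.getKey().indexOf(' '));
            ratios.put(e.getKey(), e.getValue() / (double) wordCount.get(first));
        }
        return ratios;
    }
}
```

Pairs whose ratio exceeds some threshold could then be kept as candidate expressions. The ratio alone favours rare words, so a minimum pair count would probably be needed as well.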

Expression Extractions

2003-10-21 Thread MOYSE Gilles (Cetelem)
I've found something about expression extraction (the ability, when two words frequently appear side-by-side, to detect that they form an expression): http://www.miv.t.u-tokyo.ac.jp/papers/matsuoFLAIRS03.pdf Gilles Moyse

RE: Does the Lucene search engine work with PDF's?

2003-10-20 Thread MOYSE Gilles (Cetelem)
You can also use the TextMining.org toolbox, which provides classes to extract text from PDF and DOC files, using the Jakarta POI project. They are all free, under the Apache License. The URL: http://www.textmining.org/modules.php?op=modload&name=News&file=article&sid=6&mode=thread&order=0&thold=0). (URL

RE: Indexing UTF-8 and lexical errors

2003-10-14 Thread MOYSE Gilles (Cetelem)
Hi. You should edit the StandardTokenizer.jj file. It contains all the definitions used to generate the StandardTokenizer.java class, which you are most likely using. At the end of the StandardTokenizer.jj file, you'll find the definition of the LETTER token. There you'll see all the accepted letters, in Unicode. If