handling token created/deleted events in an Index

2008-06-16 Thread Mathieu Lecarme
With LUCENE-1297, the SpellChecker will be able to choose how to estimate the distance between two words. Here are some other enhancements: * The capacity to synchronize the main Index and the SpellChecker Index. Handling token creation is easy; a simple TokenFilter can do the work. But
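
The idea of a selectable word distance mentioned in this entry can be illustrated with a small sketch. It is not the LUCENE-1297 API: the class `DistanceSketch` and its methods are hypothetical names, and Levenshtein edit distance is used only as one common example of a distance that could be plugged in.

```java
import java.util.function.ToIntBiFunction;

// Hypothetical sketch, not the LUCENE-1297 API: a pluggable distance is
// modelled as a plain function from two words to an integer cost.
class DistanceSketch {

    // Classic Levenshtein edit distance, computed with two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // cost of inserting the first j characters of b
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // cost of deleting the first i characters of a
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = prev; // reuse the two rows instead of allocating
            prev = curr;
            curr = swap;
        }
        return prev[b.length()];
    }

    // Returns the candidate with the lowest cost under the supplied distance
    // (the first one wins on ties), or null if there are no candidates.
    static String closest(String word, String[] candidates,
                          ToIntBiFunction<String, String> distance) {
        String best = null;
        int bestCost = Integer.MAX_VALUE;
        for (String candidate : candidates) {
            int cost = distance.applyAsInt(word, candidate);
            if (cost < bestCost) {
                bestCost = cost;
                best = candidate;
            }
        }
        return best;
    }
}
```

Passing the distance as a function argument, for example `DistanceSketch.closest("lucen", words, DistanceSketch::levenshtein)`, keeps the suggestion loop independent of the particular measure.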

Re: WebLuke - include Jetty in Lucene binary distribution?

2008-04-25 Thread Mathieu Lecarme
markharw00d wrote: Any word on getting this committed as a contrib? I have not really changed the code since the message below. I can commit pretty much the contents of the zip file below any time you want. Do folks still feel comfortable with the bloat this adds to the Lucene source distro?

Re: Storing phrases in index

2008-04-10 Thread Mathieu Lecarme
palexv wrote: Thanks! Can you help me get the ShingleFilter class? It is absent in version 2.3.1. How can I get it? It's in the SVN version. You can backport it, or build your own, with a Stack. M.

Re: Optimise Indexing time using lucene..

2008-04-09 Thread Mathieu Lecarme
lucene4varma wrote: Hi all, I am new to Lucene and am using it for text search in my web application, and for that I need to index records in a database. We are using a JDBC directory to store the indexes. Now the problem is that when I start the process of indexing the records for the first time it

Re: Storing phrases in index

2008-04-09 Thread Mathieu Lecarme
palexv wrote: Hello all. I have a question for advanced Lucene users. I have a set of phrases which I need to store in an index. Is there a way of storing phrases as terms in the index? What is the best way of writing such an index? Should this field be tokenized or not tokenized? What is the best

Re: shingles and punctuations

2008-04-08 Thread Mathieu Lecarme
the WikipediaTokenizer is the only one using flags currently in Lucene. On Apr 6, 2008, at 10:43 PM, Mathieu Lecarme wrote: I'll use Token flags to specify the first token in a sentence, but how does it work? How is flag collision avoided? To keep it simple, I'll take 1 as the flag, but what happens

shingles and punctuations

2008-04-06 Thread Mathieu Lecarme
The new ShingleFilter is very helpful for fetching groups of words, but it doesn't handle punctuation or any other separation. If you feed it multiple sentences, you will get shingles that start in one sentence and end in the next. In order to avoid that, you can handle token positions, if there
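
The problem described in this entry, shingles spanning two sentences, can be shown with a minimal sketch in plain Java. It does not use the Lucene ShingleFilter or token positions: the class `SentenceShingles` is a hypothetical name, and splitting on `. ! ? ; :` is an assumed, simplistic notion of a sentence boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds word shingles sentence by sentence, so that no
// shingle starts in one sentence and ends in the next.
class SentenceShingles {

    static List<String> shingles(String text, int size) {
        List<String> result = new ArrayList<>();
        // Assumption: these punctuation marks end a sentence; commas do not.
        for (String sentence : text.split("[.!?;:]+")) {
            String trimmed = sentence.trim();
            if (trimmed.isEmpty()) {
                continue; // nothing between two separators
            }
            String[] words = trimmed.split("\\s+");
            // Slide a window of `size` words inside this sentence only.
            for (int start = 0; start + size <= words.length; start++) {
                String[] window = Arrays.copyOfRange(words, start, start + size);
                result.add(String.join(" ", window));
            }
        }
        return result;
    }
}
```

For the input `"the cat sat. a dog ran"` with size 2, the sketch yields `the cat`, `cat sat`, `a dog` and `dog ran`, but not the cross-sentence pair `sat a`.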

Re: WordNet synonyms overhead

2008-03-18 Thread Mathieu Lecarme
Harald Näger wrote: Hi, I am especially interested in the WordNet synonym expansion that was discussed in the Lucene in Action book. Is there anyone here on the list who has experience with this approach? I'm curious about how much the synonym expansion will increase the size of an

Re: [jira] Created: (LUCENE-1229) NGramTokenFilter optimization in query phase

2008-03-14 Thread Mathieu Lecarme
Hiroaki Kawai (JIRA) wrote: NGramTokenFilter optimization in query phase Key: LUCENE-1229 URL: https://issues.apache.org/jira/browse/LUCENE-1229 Project: Lucene - Java Issue Type:

Re: an API for synonym in Lucene-core

2008-03-13 Thread Mathieu Lecarme
contribution (I looked at it briefly the other day, but haven't had the chance to really understand it). Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Mathieu Lecarme [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Wednesday, March 12

an API for synonym in Lucene-core

2008-03-12 Thread Mathieu Lecarme
Why doesn't Lucene have a clean synonym API? The WordNet contrib is not an answer: it provides an interface for its own needs, and most of the world doesn't speak English. Compass provides a tool, just like Solr. Lucene is the framework for applications like Solr, Nutch or Compass, so why not backport

[jira] Commented: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-03-07 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12576415#action_12576415 ] Mathieu Lecarme commented on LUCENE-1190: A simpler preview of Lexicon features

Re: [jira] Commented: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-03-02 Thread Mathieu Lecarme
Hmm, the quote and the question disappeared. On 2 March 08 at 13:32, Mathieu Lecarme (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574214#action_12574214 ] Mathieu Lecarme commented

[jira] Updated: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-29 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-1190: Attachment: aphone+lexicon.patch a lexicon object for merging spellchecker and synonyms

[jira] Created: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-25 Thread Mathieu Lecarme (JIRA)
Type: New Feature Components: contrib/*, Search Affects Versions: 2.3 Reporter: Mathieu Lecarme Attachments: aphone+lexicon.patch Some Lucene features need a list of reference words. Spellchecking is the basic example, but synonyms are another use. Other tools can

[jira] Updated: (LUCENE-1190) a lexicon object for merging spellchecker and synonyms from stemming

2008-02-25 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-1190: Attachment: aphone+lexicon.patch a lexicon object for merging spellchecker and synonyms

[jira] Updated: (LUCENE-956) phonem conversion from aspell dictionnary

2008-02-21 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-956: Attachment: aphone.patch New version, with more languages (bg, br, da, de, el, en, fo, fr

Re: Need help for ordering results by specific order

2007-07-19 Thread Mathieu Lecarme
it at indexing time)? Mathieu Lecarme wrote: Have a look at the book Lucene in Action, ch. 6.1: using a custom sort method. SortComparatorSource might be your friend. Lucene selects the documents, and you sort them just as you want. M. On 18 July 07 at 10:29, savageboy wrote: Hi, I am new

Re: Need help for ordering results by specific order

2007-07-18 Thread Mathieu Lecarme
Have a look at the book Lucene in Action, ch. 6.1: using a custom sort method. SortComparatorSource might be your friend. Lucene selects the documents, and you sort them just as you want. M. On 18 July 07 at 10:29, savageboy wrote: Hi, I am new to Lucene. I have a project for a search engine

Re: for a better spellchecker

2007-07-13 Thread Mathieu Lecarme
The SpellChecker code mixes indexing functions, n-gram handling, and querying functions. Extending it will not produce clean code. Would it be relevant to first refactor the SpellChecker code to extract the dictionary-reading function and the indexing/searching functions? SpellChecker will get a method to add

[jira] Created: (LUCENE-956) phonem conversion from aspell dictionnary

2007-07-11 Thread Mathieu Lecarme (JIRA)
Affects Versions: 2.2 Reporter: Mathieu Lecarme First step to improve Spellchecker's suggestions: phoneme conversion for different languages. The conversion code is built from aspell description files. The patch contains classes for managing English, French, Walloon and Swedish. If it's

[jira] Updated: (LUCENE-956) phonem conversion from aspell dictionnary

2007-07-11 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-956: Attachment: aphone.patch phonem conversion from aspell dictionnary

build.xml for a contrib wich depend on an other contrib

2007-07-10 Thread Mathieu Lecarme
The first version of the aspell-format phoneme converter in Java is almost finished. The source is buildable with ant, but in the Lucene trunk it fails. The build depends on SpellChecker, which is built afterwards. How can I fix it? A static spellChecker.jar in lib in my contrib? A depends in

Re: [jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-28 Thread Mathieu Lecarme
Any news about the integration of this patch? M. Mathieu Lecarme (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-13 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: Attachment: elision-0.2.patch All suggested corrections are done. Elision filter

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-13 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: Attachment: (was: elision-0.2.patch) Elision filter for simple french analyzing

[jira] Updated: (LUCENE-906) Elision filter for simple french analyzing

2007-06-05 Thread Mathieu Lecarme (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mathieu Lecarme updated LUCENE-906: Attachment: elision.patch Elision filter for simple french analyzing

[jira] Created: (LUCENE-906) Elision filter for simple french analyzing

2007-06-05 Thread Mathieu Lecarme (JIRA)
Reporter: Mathieu Lecarme If you don't want to use stemming, StandardAnalyzer misses some French peculiarities like elision. "l'avion", which means "the plane", must be tokenized as "avion" (plane). This filter could be used with other Latin languages if elision exists.
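
The elision handling described in this entry can be sketched in a few lines of plain Java. This is not the code from the LUCENE-906 patch: the class `ElisionSketch`, the method `stripElision` and the list of elided prefixes are assumptions made for illustration, and only the ASCII apostrophe is handled.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not the LUCENE-906 patch: removes an elided article or
// pronoun from the front of a single token.
class ElisionSketch {

    // Assumed, illustrative list of French prefixes that can be elided.
    private static final Set<String> ELIDED =
            new HashSet<>(Arrays.asList("l", "d", "j", "m", "t", "s", "n", "c", "qu"));

    static String stripElision(String token) {
        int apostrophe = token.indexOf('\'');
        // Require at least one character on each side of the apostrophe.
        if (apostrophe > 0 && apostrophe < token.length() - 1) {
            String prefix = token.substring(0, apostrophe).toLowerCase(Locale.ROOT);
            if (ELIDED.contains(prefix)) {
                return token.substring(apostrophe + 1);
            }
        }
        return token; // no known elision: leave the token unchanged
    }
}
```

With this sketch, `stripElision("l'avion")` gives `avion`, while a word such as `aujourd'hui` is left alone because `aujourd` is not in the assumed prefix list.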

using a french specific analyser without stemming

2007-06-04 Thread Mathieu Lecarme
For a project with a lot of Lucene search (via Compass), I had some trouble with stemming. Stemming is nice for enlarging the search range, but it makes completion behave strangely. So FrenchAnalyzer was not usable. A simpler StandardAnalyzer does the job right, except for some French specialities, like