With LUCENE-1297, the SpellChecker will be able to choose how to
estimate the distance between two words.
Here are some other enhancements:
* The ability to synchronize the main index and the SpellChecker
index. Handling token creation is easy; a simple TokenFilter can do
the work. But
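As a rough sketch of what a pluggable distance measure could look like (this is plain Levenshtein edit distance, not the actual LUCENE-1297 API):

```java
// Toy illustration: one possible way to "estimate distance between two
// words" is the classic Levenshtein edit distance, computed with a
// two-row dynamic programming table. Class and method names are
// hypothetical, not from the patch.
public class EditDistance {
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j; // empty prefix of a
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                // min of: insert, delete, substitute/match
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
    }
}
```

A SpellChecker with a pluggable measure would then rank candidate suggestions by whichever distance is configured.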
markharw00d wrote:
Any word on getting this committed as a contrib?
I haven't really changed the code since the message below. I can commit
pretty much the contents of the zip file below any time you want.
Do folks still feel comfortable with the bloat this adds to the
Lucene source distro?
palexv wrote:
Thanks!
Can you help me get the ShingleFilter class? It is absent in version 2.3.1.
How can I get it?
It's in the SVN version. You can backport it, or build your own
with a Stack.
M.
lucene4varma wrote:
Hi all,
I am new to Lucene and am using it for text search in my web application,
and for that I need to index records in a database.
We are using a JDBC directory to store the indexes. Now the problem is: when I
start the process of indexing the records for the first time, it
palexv wrote:
Hello all.
I have a question for the advanced Lucene users.
I have a set of phrases which I need to store in an index.
Is there a way of storing phrases as terms in the index?
What is the best way of writing such an index? Should this field be tokenized
or not tokenized?
What is the best
The WikipediaTokenizer is the only one currently using flags in
Lucene.
On Apr 6, 2008, at 10:43 PM, Mathieu Lecarme wrote:
I'll use Token flags to specify the first token in a sentence, but how
does it work? How is flag collision avoided? To keep it simple, I'll
take 1 as the flag, but what happens
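One way to read the collision question: token flags are an int bitmask, so each producer of flags must claim a distinct bit. A toy sketch of that convention (the constant name and bit choice are hypothetical, not a Lucene API):

```java
// Toy illustration of bitmask flags: each meaning claims one bit, and
// readers test only their own bit, so independent flags don't collide
// as long as nobody reuses a bit.
public class SentenceFlag {
    public static final int FIRST_IN_SENTENCE = 1 << 0; // this sketch claims bit 0

    public static int mark(int flags) {
        return flags | FIRST_IN_SENTENCE; // set our bit, keep the others
    }

    public static boolean isFirstInSentence(int flags) {
        return (flags & FIRST_IN_SENTENCE) != 0; // test only our bit
    }

    public static void main(String[] args) {
        int flags = mark(0);
        System.out.println(isFirstInSentence(flags));  // true
        System.out.println(isFirstInSentence(1 << 1)); // false: a different bit
    }
}
```

Taking "1 as the flag" works exactly when no other filter in the chain also uses bit 0 for something else; there is no registry, so the convention is per-analysis-chain.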
The new ShingleFilter is very helpful for fetching groups of words, but
it doesn't handle punctuation or any separation.
If you feed it multiple sentences, you will get shingles that
start in one sentence and end in the next.
In order to avoid that, you can handle token positions, if there
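A minimal sketch of the boundary problem: if shingle construction skips pairs that cross a sentence-end marker, no shingle spans two sentences. The "." boundary token and the method name here are assumptions for illustration, not the ShingleFilter implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration: build word bigrams ("shingles" of size 2) that never
// cross a sentence boundary, represented here by a "." token.
public class SentenceShingles {
    public static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            String a = tokens.get(i), b = tokens.get(i + 1);
            if (a.equals(".") || b.equals(".")) continue; // don't cross sentences
            out.add(a + " " + b);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bigrams(List.of("the", "cat", ".", "a", "dog")));
        // [the cat, a dog] -- no shingle spans the "."
    }
}
```

In a real token stream the same effect can be had with a position increment gap at sentence ends, which is what the "handle token positions" suggestion above points at.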
Harald Näger wrote:
Hi,
I am especially interested in the WordNet synonym expansion that was
discussed in the Lucene in Action book. Is there anyone here on the list
who has experience with this approach?
I'm curious about how much the synonym expansion will increase the size of an
Hiroaki Kawai (JIRA) wrote:
NGramTokenFilter optimization in query phase
Key: LUCENE-1229
URL: https://issues.apache.org/jira/browse/LUCENE-1229
Project: Lucene - Java
Issue Type:
contribution (I looked at it
briefly the other day, but haven't had the chance to really understand it).
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
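For context on the LUCENE-1229 issue above, a character n-gram filter expands a term into its overlapping substrings of length n. A self-contained sketch of that expansion (the class and method names are hypothetical, not the NGramTokenFilter code):

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of character n-gram expansion: one gram per start
// offset, each of length n, in order of appearance.
public class NGramSketch {
    public static List<String> ngrams(String term, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            out.add(term.substring(i, i + n));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("lucene", 3)); // [luc, uce, cen, ene]
    }
}
```

The optimization question at query time is essentially how many of these grams a query really needs to match, since indexing emits all of them.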
- Original Message
From: Mathieu Lecarme [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Wednesday, March 12
Why doesn't Lucene have a clean synonym API?
The WordNet contrib is not an answer; it provides an interface for its own
needs, and most of the world doesn't speak English.
Compass provides a tool, just like Solr. Lucene is the framework for
applications like Solr, Nutch or Compass, so why not backport
[
https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12576415#action_12576415
]
Mathieu Lecarme commented on LUCENE-1190:
-
A simpler preview of Lexicon features
Hmm, the quote and the question disappeared.
On March 2, 2008 at 13:32, Mathieu Lecarme (JIRA) wrote:
[ https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12574214#action_12574214 ]
Mathieu Lecarme commented
[
https://issues.apache.org/jira/browse/LUCENE-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-1190:
Attachment: aphone+lexicon.patch
a lexicon object for merging spellchecker and synonyms
Type: New Feature
Components: contrib/*, Search
Affects Versions: 2.3
Reporter: Mathieu Lecarme
Attachments: aphone+lexicon.patch
Some Lucene features need a list of reference words. Spellchecking is the basic
example, but synonyms are another use. Other tools can
[
https://issues.apache.org/jira/browse/LUCENE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-956:
---
Attachment: aphone.patch
New version, with more languages (bg, br, da, de, el, en, fo, fr
it at indexing time)?
Mathieu Lecarme wrote:
Have a look at the book Lucene in Action, ch. 6.1: using a custom
sort method.
SortComparatorSource might be your friend. Lucene selects stuff,
and you sort, just like you want.
M.
On July 18, 2007 at 10:29, savageboy wrote:
Hi,
I am new to Lucene.
I have a project for a search engine
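The SortComparatorSource suggestion above boils down to: Lucene selects the matching hits, and your comparator decides their order. A plain-Java sketch of that division of labor (the Hit type and its fields are hypothetical, not a Lucene class):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy illustration of "Lucene selects, you sort": the hits are a given,
// and a custom comparator imposes the application's own ordering.
public class CustomSortSketch {
    static class Hit {
        final String title;
        final int popularity;
        Hit(String title, int popularity) {
            this.title = title;
            this.popularity = popularity;
        }
    }

    // order the selected hits by popularity, most popular first
    public static List<String> titlesByPopularity(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.popularity).reversed());
        List<String> titles = new ArrayList<>();
        for (Hit h : sorted) titles.add(h.title);
        return titles;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("a", 2), new Hit("b", 9), new Hit("c", 5));
        System.out.println(titlesByPopularity(hits)); // [b, c, a]
    }
}
```

A SortComparatorSource plays the same role inside Lucene's search call, supplying the comparator that ranks the matched documents.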
The SpellChecker code mixes the indexing function, ngram treatment, and
querying functions. Extending it will not produce clean code.
Would it be relevant to first refactor the SpellChecker code to extract the
dictionary-reading function from the indexing/searching functions?
SpellChecker will get a method to add
Affects Versions: 2.2
Reporter: Mathieu Lecarme
First step to improve the SpellChecker's suggestions: phoneme conversion for
different languages.
The conversion code is built from the aspell file description. The patch contains
classes for managing English, French, Walloon and Swedish. If it's
[
https://issues.apache.org/jira/browse/LUCENE-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-956:
---
Attachment: aphone.patch
phoneme conversion from an aspell dictionary
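As a toy illustration of what phoneme folding buys a spellchecker, here are two hand-written rules that make similar-sounding spellings collide on one key; the real patch derives its tables from aspell data, not from rules like these:

```java
// Toy illustration: fold a word to a rough phonetic key so that
// similar-sounding spellings map to the same key. The two rules below
// are illustrative only, not the aspell-derived conversion tables.
public class PhonemeSketch {
    public static String fold(String word) {
        return word.toLowerCase()
                   .replace("ph", "f")   // "ph" sounds like "f"
                   .replace("qu", "k");  // French "qu" sounds like "k"
    }

    public static void main(String[] args) {
        // misspelling and correct word share a key
        System.out.println(fold("Pharmacie").equals(fold("farmacie"))); // true
    }
}
```

Indexing words under their phonetic key lets the spellchecker suggest corrections that sound like the misspelled input, not just ones that share ngrams with it.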
The first version of the aspell-format phoneme converter in Java is almost
finished. The source is buildable with ant but, in the Lucene trunk,
it fails. The build depends on SpellChecker, which is built after it. How
can I fix it? A static spellChecker.jar in lib in my contrib? A
depends in
Any news about the integration of this patch?
M.
Mathieu Lecarme (JIRA) wrote:
[
https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-906
[
https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-906:
---
Attachment: elision-0.2.patch
All suggested corrections are done.
Elision filter
[
https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-906:
---
Attachment: (was: elision-0.2.patch)
Elision filter for simple french analyzing
[
https://issues.apache.org/jira/browse/LUCENE-906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mathieu Lecarme updated LUCENE-906:
---
Attachment: elision.patch
Elision filter for simple french analyzing
Reporter: Mathieu Lecarme
If you don't want to use stemming, StandardAnalyzer misses some French
strangeness like elision.
l'avion, which means the plane, must be tokenized as avion (plane).
This filter could be used with other Latin languages where elision exists.
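A minimal sketch of the elision idea, assuming a small hand-picked list of elided articles (the real filter's list may differ):

```java
import java.util.Set;

// Toy illustration of elision stripping: drop the elided article before
// the apostrophe, so "l'avion" is tokenized as "avion". The article set
// here is a small illustrative subset.
public class ElisionSketch {
    private static final Set<String> ARTICLES =
        Set.of("l", "d", "j", "qu", "c", "n", "m", "s", "t");

    public static String strip(String token) {
        int apos = token.indexOf('\'');
        if (apos > 0 && ARTICLES.contains(token.substring(0, apos).toLowerCase())) {
            return token.substring(apos + 1); // keep only the word after '
        }
        return token; // no recognized elision: leave the token alone
    }

    public static void main(String[] args) {
        System.out.println(strip("l'avion")); // avion
        System.out.println(strip("avion"));   // unchanged
    }
}
```

In an analyzer chain this logic would live in a TokenFilter applied after tokenization, before any further normalization.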
For a project with a lot of Lucene search (via Compass), I had some
trouble with stemming. Stemming is nice for enlarging the search range,
but it makes completion strange.
So FrenchAnalyzer was not usable. A simpler StandardAnalyzer does
the job right, except for some French specialities, like