The current WebLucene project includes a SAX-based XML source indexer:
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/index/
It can parse an XML data source like the following example:
<?xml version="1.0" encoding="GB2312"?>
<Table>
  <Record id="1">
I added the following code:

    for (int i = 0; i < numOfDocs; i++) {
        if (!reader.isDeleted(i)) {
            doc = reader.document(i);
            docs[i] = doc.get(SearchEngineConstants.REPOSITORY_PATH);
        }
    }
    return docs;
Our application is a string similarity searcher: the query is an input string and
we want to find all fuzzy variants of the input string in the DB. The score is
basically Dice's coefficient, 2C/(Q+D), where C is the number of terms (n-grams) in
common, Q is the number of unique query terms
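The Dice score described above can be sketched in plain Java (no Lucene; the class and the `ngrams` helper are illustrative, not from the original post):

```java
import java.util.HashSet;
import java.util.Set;

public class DiceScore {
    // Break a string into its set of overlapping character n-grams.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<String>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Dice's coefficient: 2C / (Q + D), where C is the number of
    // n-grams shared by query and document, and Q and D are the
    // counts of unique n-grams in each.
    static double dice(String query, String doc, int n) {
        Set<String> q = ngrams(query, n);
        Set<String> d = ngrams(doc, n);
        Set<String> common = new HashSet<String>(q);
        common.retainAll(d);
        return 2.0 * common.size() / (q.size() + d.size());
    }

    public static void main(String[] args) {
        // "night" vs "nacht": one shared bigram ("ht") out of 4 + 4.
        System.out.println(dice("night", "nacht", 2));  // 0.25
    }
}
```

Scoring every document this way is a linear scan; an index over the n-grams (as in Lucene) only has to look at documents sharing at least one gram with the query.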
Probably shouldn't have added that last bit. Our app isn't a DNA searcher. But
DASG+Lev does look interesting.
Our app is a linguistic application. We want to search for sentences that have
many n-grams in common and rank them based on the score below. Similar to the
TELLTALE system (do a
I see. Are you looking for this:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html
On the other hand, if n is not fixed, you still have a problem. As far
as I can tell from this list, Lucene reads a dictionary (of terms)
into memory, and it also allocates
Hi,
I am trying to implement special-character search.
If I do a search with the query title:java\-perl then documents with the title
java-perl as well as java+perl come up. While the first result is desirable, the
second one is not.
Can anyone tell me what is going wrong here?
Also, I am using
Hi!
I have lucene-1.3-rc1 and jdk1.3.1.
What should I change in the demonstration example to
search HTML files encoded in Cp1251?
Thanks,
Vladimir.
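One common fix, assuming the demo reads files with the platform default encoding: wrap the input stream in an InputStreamReader that names the charset explicitly. A minimal sketch (class and method names are illustrative, not from the demo):

```java
import java.io.*;

public class Cp1251Demo {
    // Read a file as Cp1251 (windows-1251) text rather than
    // relying on the platform default encoding.
    static String readCp1251(File f) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(f), "Cp1251"));
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = in.read()) != -1) {
            sb.append((char) c);
        }
        in.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a Cyrillic string through a Cp1251-encoded file.
        File f = File.createTempFile("cp1251", ".html");
        Writer out = new OutputStreamWriter(new FileOutputStream(f), "Cp1251");
        out.write("Привет");
        out.close();
        System.out.println(readCp1251(f));  // Привет
        f.delete();
    }
}
```

The same Reader can then be handed to whatever parser/tokenizer the demo uses, so the terms reach the index already decoded correctly.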
hi,
When I run the web demo I get an error that says
ERROR opening the Index - contact sysadmin!
While parsing query: /opt/lucene/index not a directory
I do not have permission to modify /opt, so I have not created an index
directory in it. Thus I do not use the default as given
I have seen some interesting work done on storing DNA sequence as a set of common
patterns with unique sequence between them. If one uses an analyzer to break sequence
into its set of patterns and unique sequence then Lucene could be used to search for
exact pattern matches. I know of only one
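The pattern-breaking idea might be sketched like this, assuming a fixed pattern length k (plain Java rather than a real Lucene Analyzer; names are illustrative): split a sequence into its overlapping k-mers, which an indexer could then treat as terms for exact-match search.

```java
import java.util.ArrayList;
import java.util.List;

public class KmerTokenizer {
    // Split a DNA sequence into overlapping k-length "terms",
    // the way an Analyzer would emit tokens for indexing.
    static List<String> kmers(String seq, int k) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + k <= seq.length(); i++) {
            tokens.add(seq.substring(i, i + k));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(kmers("GATTACA", 3));
        // [GAT, ATT, TTA, TAC, ACA]
    }
}
```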
Exact matches are not ideal for DNA applications, I guess. I am not a
DNA expert, but those guys often need a feature that is termed
``fuzzy'' [*] in Lucene. They need Levenshtein's and Hamming's metrics,
and I think that Lucene has many drawbacks which prevent effective
implementations. On the
Try changing the permissions on the index directory to 777.
= Original Message From Lucene Users List
[EMAIL PROTECTED] =
hi,
When i run the web demo i get an error that says
ERROR opening the Index - contact sysadmin!
While parsing query: /opt/lucene/index not a directory
i do not
The method I mentioned was based on the Lempel-Ziv algorithms used in LZ
compression. It relied only on exact matches of short stretches of DNA
separated by non-matching sequence. The idea was to find stretches of
sequence that had patterns in common,
Hello,
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
The default lists aren't very complete; for example, the English list
doesn't contain words like every, because or until, and the German
list misses dem and des (definite articles).
Ulrich Mayring wrote:
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
The Snowball project has good stop lists.
See:
http://snowball.tartarus.org/
http://snowball.tartarus.org/english/stop.txt
There is a much more complete list of English stop words included in
the introductory Lucene article on ONJava.com.
I can't help you with German stop words.
Otis
--- Ulrich Mayring [EMAIL PROTECTED] wrote:
Hello,
does anyone know of good stopword lists for use with Lucene? I'm
Doug Cutting wrote:
Snowball stemmers are pre-packaged for use with Lucene at:
http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/
These look interesting. Am I right in assuming that in order to use
these stemmers, I have to write an Analyzer and in its tokenStream
method I return
I found some handy tools in the org.apache.lucene.analysis.de package.
Using the WordlistLoader class you can load up your stop words in a variety
of ways, including from a line-delimited text file, thanks to Gerhard Schwarz.
Bryan LaPlante
- Original Message -
From: Ulrich Mayring [EMAIL
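The line-delimited loading idea can be sketched without a Lucene dependency (class and method names here are illustrative):

```java
import java.io.*;
import java.util.HashSet;
import java.util.Set;

public class StopWordFile {
    // Load a line-delimited stop word file into a set, one word
    // per line; blank lines are skipped, words are lowercased.
    static Set<String> load(Reader reader) throws IOException {
        Set<String> stopWords = new HashSet<String>();
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.length() > 0) {
                stopWords.add(line.toLowerCase());
            }
        }
        return stopWords;
    }

    public static void main(String[] args) throws IOException {
        Set<String> stops = load(new StringReader("every\nbecause\n\nuntil\n"));
        System.out.println(stops.contains("because"));  // true
    }
}
```

The resulting set is the shape StopFilter-style analyzers expect: membership tests per token while the token stream is consumed.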
There is already an analyzer available in the sandbox. Take a look
here: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/
Sincerely,
Anthony Eden
Ulrich Mayring wrote:
Doug Cutting wrote:
Snowball stemmers are pre-packaged for use with Lucene at:
Ulrich Mayring wrote:
Hello,
does anyone know of good stopword lists for use with Lucene? I'm
interested in English and German lists.
What does ``good'' mean? It depends on your corpus, IMHO. The best way
to get a ``good'' stop list is an analysis based on
idf. Thus, index
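The idf-based analysis suggested above might be sketched as follows (plain Java, hypothetical helper): terms that occur in nearly every document have low idf, which makes them stop word candidates for this particular corpus.

```java
import java.util.*;

public class IdfStopList {
    // Terms whose document frequency reaches the threshold fraction
    // of the corpus have low idf and are stop word candidates.
    static Set<String> candidates(List<Set<String>> docs, double threshold) {
        Map<String, Integer> df = new HashMap<String, Integer>();
        for (Set<String> doc : docs) {
            for (String term : doc) {
                Integer n = df.get(term);
                df.put(term, n == null ? 1 : n + 1);
            }
        }
        Set<String> stops = new HashSet<String>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            if ((double) e.getValue() / docs.size() >= threshold) {
                stops.add(e.getKey());
            }
        }
        return stops;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = new ArrayList<Set<String>>();
        docs.add(new HashSet<String>(Arrays.asList("the", "cat", "sat")));
        docs.add(new HashSet<String>(Arrays.asList("the", "dog", "ran")));
        docs.add(new HashSet<String>(Arrays.asList("the", "end")));
        System.out.println(candidates(docs, 0.9));  // [the]
    }
}
```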
On Thursday 05 June 2003 14:12, Jim Hargrave wrote:
Our application is a string similarity searcher where the query is an input
string and we want to find all fuzzy variants of the input string in the
DB. The score is basically Dice's coefficient, 2C/(Q+D), where C is the
number of terms