date:20030606

[PLAN]: SAXIndexer, indexing database via XML gateway

2003-06-06 Thread Che Dong

The current weblucene project includes a SAX-based XML source indexer (http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/weblucene/weblucene/webapp/WEB-INF/src/com/chedong/weblucene/index/). It can parse an XML data source like the following example: <?xml version="1.0" encoding="GB2312"?> <Table> <Record id="1">
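For readers new to SAX-based indexing, here is a minimal Java sketch of the general idea. It collects each Record element's children into a field map that an indexer could then turn into a Lucene Document. The element names Table and Record come from the truncated example in the post. The class name RecordCollector and the assumption that each child element of Record is one field are hypothetical, and this is not the weblucene code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler (not weblucene's code): assumes
// <Table><Record id="..."><FieldName>text</FieldName>...</Record>...</Table>
// and turns each Record into a field-name -> value map.
public class RecordCollector extends DefaultHandler {
    private final List<Map<String, String>> records = new ArrayList<>();
    private Map<String, String> current;                  // record being built, or null
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("Record")) {
            current = new LinkedHashMap<>();
            String id = atts.getValue("id");
            if (id != null) current.put("id", id);        // keep the record id as a field
        }
        text.setLength(0);                                 // collect text for this element only
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);                    // may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("Record")) {
            records.add(current);                          // one record = one future Lucene Document
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());    // child element name becomes the field name
        }
        text.setLength(0);
    }

    static List<Map<String, String>> parse(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            RecordCollector handler = new RecordCollector();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.records;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Table><Record id=\"1\"><Title>Lucene</Title><Author>Che Dong</Author></Record></Table>";
        System.out.println(parse(xml));   // [{id=1, Title=Lucene, Author=Che Dong}]
    }
}
```

In real use, parsing from an InputStream rather than a StringReader would let the parser honor the encoding="GB2312" declaration.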

RE: java.lang.IllegalArgumentException: attempt to access a deleted document

2003-06-06 Thread Rob Outar

I added the following code:

    for (int i = 0; i < numOfDocs; i++) {
        // skip deleted documents: reader.document(i) throws
        // IllegalArgumentException for them
        if (!reader.isDeleted(i)) {
            doc = reader.document(i);
            docs[i] = doc.get(SearchEngineConstants.REPOSITORY_PATH);
        }
    }
    return docs;

String similarity search vs. typical IR application...

2003-06-06 Thread Jim Hargrave

Our application is a string similarity searcher: the query is an input string, and we want to find all fuzzy variants of that string in the DB. The score is basically Dice's coefficient, 2C/(Q+D), where C is the number of terms (n-grams) in common, Q is the number of unique query terms
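Dice's coefficient as described, read as 2C/(Q+D), can be sketched in a few lines of Java. This assumes character n-grams (n = 3 in the example) and counts unique n-grams on each side, so C, Q and D are set sizes. The post does not say whether the n-grams are characters or words, and the class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: Dice's coefficient 2C/(Q+D) over sets of character n-grams.
public class DiceScore {
    // Unique character n-grams of s; a non-empty string shorter than n yields itself.
    static Set<String> ngrams(String s, int n) {
        Set<String> grams = new HashSet<>();
        if (s.length() < n) {
            if (!s.isEmpty()) grams.add(s);
            return grams;
        }
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // C = shared n-grams, Q = unique query n-grams, D = unique document n-grams.
    static double dice(String query, String doc, int n) {
        Set<String> q = ngrams(query, n);
        Set<String> d = ngrams(doc, n);
        if (q.isEmpty() && d.isEmpty()) return 0.0;
        Set<String> common = new HashSet<>(q);
        common.retainAll(d);
        return 2.0 * common.size() / (q.size() + d.size());
    }

    public static void main(String[] args) {
        // "search" has 4 trigrams, "searches" has 6, and they share 4: 2*4/(4+6)
        System.out.println(dice("search", "searches", 3));   // 0.8
    }
}
```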

Re: String similarity search vs. typical IR application...

2003-06-06 Thread Jim Hargrave

Probably shouldn't have added that last bit. Our app isn't a DNA searcher, but DASG+Lev does look interesting. Our app is a linguistic application. We want to search for sentences that have many n-grams in common and rank them based on the score below, similar to the TELLTALE system (do a

Re: String similarity search vs. typical IR application...

2003-06-06 Thread Leo Galambos

I see. Are you looking for this? http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html On the other hand, if n is not fixed, you still have a problem. As far as I can tell from this list, Lucene reads a dictionary (of terms) into memory, and it also allocates

Special Character Search

2003-06-06 Thread Ramrakhiani, Vikas

Hi, I am trying to implement special character search. If I search with the query title:java\-perl, documents titled java-perl as well as java+perl come up. While the first result is desirable, the second is not. What is going wrong here? Also, I am using
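One likely explanation, not confirmed in this thread, is that the title field was analyzed with an analyzer that splits on punctuation. For example, SimpleAnalyzer's LetterTokenizer breaks on every non-letter. Under that assumption, java-perl and java+perl both become the tokens java and perl, and the escaped query is analyzed the same way, so it matches both. The toy tokenizer below only mimics that splitting in plain Java. It is not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a letter-only, lowercasing tokenizer (similar in spirit to
// Lucene's LetterTokenizer + LowerCaseFilter); NOT Lucene's implementation.
public class LetterTokens {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));   // letters extend the current token
            } else if (cur.length() > 0) {
                tokens.add(cur.toString());             // '-', '+', etc. end the token
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) tokens.add(cur.toString());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("java-perl"));   // [java, perl]
        System.out.println(tokenize("java+perl"));   // [java, perl]
    }
}
```

If that is the cause, keeping the hyphen significant would mean indexing title with something that does not split on it, for example an untokenized Keyword field when only exact matches are needed.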

problems with search on Russian content

2003-06-06 Thread Vladimir

Hi! I have lucene-1.3-rc1 and jdk1.3.1. What should I change in the demo to search HTML files encoded in Cp1251? Thanks, Vladimir.
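A plausible direction, not confirmed in this thread, is to read the HTML files with an explicit charset instead of the platform default. The sketch below shows only the Java side: it decodes Cp1251 bytes through an InputStreamReader, whose (InputStream, String) constructor also exists on old JDKs. Where exactly the demo opens its files is not shown here, and the class name is hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;

// Hypothetical helper: read text stored as Cp1251 (windows-1251) bytes.
public class Cp1251Reader {
    // Read a whole stream, decoding bytes with the named charset (e.g. "Cp1251").
    static String readAll(InputStream in, String charsetName) throws IOException {
        Reader reader = new BufferedReader(new InputStreamReader(in, charsetName));
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        reader.close();
        return sb.toString();
    }

    // Convenience wrapper for in-memory bytes; rethrows I/O problems unchecked.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return readAll(new ByteArrayInputStream(bytes), charsetName);
        } catch (IOException e) {
            throw new IllegalArgumentException("cannot decode with " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // For a real file: readAll(new FileInputStream(path), "Cp1251")
        byte[] privet = {(byte) 0xCF, (byte) 0xF0, (byte) 0xE8, (byte) 0xE2, (byte) 0xE5, (byte) 0xF2};
        // "Privet" in Cyrillic, written with Unicode escapes to avoid source-encoding issues
        System.out.println(decode(privet, "Cp1251").equals("\u041F\u0440\u0438\u0432\u0435\u0442"));   // true
    }
}
```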

Trouble running web demo

2003-06-06 Thread psethi

Hi, when I run the web demo I get an error that says "ERROR opening the Index - contact sysadmin! While parsing query: /opt/lucene/index not a directory". I do not have permission to modify /opt, so I have not created an index directory in it. Thus I do not use the default as given
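Before pointing the demo somewhere else, it may help to confirm that the configured index path exists, is a directory, and is readable by the user the servlet container runs as. This is a small diagnostic sketch. The class and method names are hypothetical, and the default path simply repeats the one from the error message.

```java
import java.io.File;

// Hypothetical diagnostic for an index location such as /opt/lucene/index.
public class IndexDirCheck {
    static String describe(String path) {
        File dir = new File(path);
        if (!dir.exists()) return "missing: " + path;
        if (!dir.isDirectory()) return "not a directory: " + path;
        if (!dir.canRead()) return "not readable: " + path;   // as seen by the current user
        return "ok: " + path;
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "/opt/lucene/index";
        System.out.println(describe(path));
    }
}
```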

RE: String similarity search vs. typical IR application...

2003-06-06 Thread Frank Burough

I have seen some interesting work on storing DNA sequences as a set of common patterns with unique sequence between them. If one uses an analyzer to break a sequence into its patterns and unique stretches, then Lucene could be used to search for exact pattern matches. I know of only one
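The analyzer idea can be sketched as a greedy segmenter. Given a set of known patterns, it emits the longest pattern matching at each position and groups everything else into "unique" stretches. Each piece could then become one indexed term. The pattern set, the greedy longest-match rule and the class name are assumptions for illustration; the post does not describe how the patterns were chosen.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical segmenter: splits a sequence into known patterns and the
// unique stretches between them, using greedy longest match.
public class PatternSegmenter {
    static List<String> segment(String seq, Set<String> patterns) {
        int maxLen = 0;
        for (String p : patterns) maxLen = Math.max(maxLen, p.length());
        List<String> out = new ArrayList<>();
        StringBuilder unique = new StringBuilder();
        int i = 0;
        while (i < seq.length()) {
            String match = null;
            // try the longest candidate first
            for (int len = Math.min(maxLen, seq.length() - i); len > 0; len--) {
                String cand = seq.substring(i, i + len);
                if (patterns.contains(cand)) { match = cand; break; }
            }
            if (match == null) {
                unique.append(seq.charAt(i));        // not part of any pattern: extend unique run
                i++;
            } else {
                if (unique.length() > 0) { out.add(unique.toString()); unique.setLength(0); }
                out.add(match);
                i += match.length();
            }
        }
        if (unique.length() > 0) out.add(unique.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(segment("GATTACAGGCTTTATA", Set.of("TATA", "GATTACA")));   // [GATTACA, GGCTT, TATA]
    }
}
```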

Re: String similarity search vs. typical IR application...

2003-06-06 Thread Leo Galambos

Exact matches are not ideal for DNA applications, I guess. I am not a DNA expert, but those guys often need a feature that is termed ``fuzzy''[*] in Lucene. They need Levenshtein and Hamming metrics, and I think Lucene has many drawbacks that prevent effective implementations. On the
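For reference, the two metrics mentioned can be computed as follows. Levenshtein distance counts the minimum number of insertions, deletions and substitutions. Hamming distance counts differing positions and is only defined for equal-length strings. Lucene's fuzzy matching is built on edit distance, but this standalone sketch is not Lucene's implementation, and the class name is hypothetical.

```java
// Hypothetical helper class with the two classic string distances.
public class EditDistances {
    // Levenshtein distance via dynamic programming, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;   // distance from empty prefix of a
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1,    // insertion
                                           prev[j] + 1),      // deletion
                                  prev[j - 1] + cost);        // substitution or match
            }
            int[] tmp = prev; prev = cur; cur = tmp;          // reuse the arrays
        }
        return prev[b.length()];
    }

    // Hamming distance: number of positions where the strings differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) throw new IllegalArgumentException("lengths differ");
        int d = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) d++;
        }
        return d;
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting"));   // 3
        System.out.println(hamming("karolin", "kathrin"));      // 3
    }
}
```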

RE: Trouble running web demo

2003-06-06 Thread xx28

Try changing the permissions on the index directory to 777.

RE: String similarity search vs. typical IR application...

2003-06-06 Thread Frank Burough

The method I mentioned was based on the Lempel-Ziv (I expect my spelling is way off on this) algorithms used in LZ compression. It relied only on exact matches of short stretches of DNA separated by non-matching sequence. The idea was to find stretches of sequence that had patterns in common,
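The Lempel-Ziv idea can be illustrated with LZ78-style parsing. It splits a sequence into phrases, each being the longest previously seen phrase extended by one more symbol, so repeated stretches turn into progressively longer phrases. This is a generic LZ78 sketch, not the specific method described above, whose details are not given. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical LZ78 phrase parser for a DNA (or any) string.
public class Lz78Phrases {
    static List<String> parse(String seq) {
        Set<String> dictionary = new HashSet<>();
        List<String> phrases = new ArrayList<>();
        int i = 0;
        while (i < seq.length()) {
            int end = i + 1;
            // extend while the current phrase is already known (and input remains)
            while (end < seq.length() && dictionary.contains(seq.substring(i, end))) {
                end++;
            }
            String phrase = seq.substring(i, end);   // known prefix + one new symbol
            dictionary.add(phrase);
            phrases.add(phrase);
            i = end;
        }
        return phrases;
    }

    public static void main(String[] args) {
        System.out.println(parse("ACACACA"));   // [A, C, AC, ACA]
    }
}
```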

Where to get stopword lists?

2003-06-06 Thread Ulrich Mayring

Hello, does anyone know of good stopword lists for use with Lucene? I'm interested in English and German lists. The default lists aren't very complete, for example the English list doesn't contain words like every, because or until and the German list misses dem and des (definite articles).

Re: Where to get stopword lists?

2003-06-06 Thread Doug Cutting

The Snowball project has good stop lists. See: http://snowball.tartarus.org/ http://snowball.tartarus.org/english/stop.txt
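A minimal Java sketch for loading such a list into a set of words follows. It assumes the Snowball layout: words separated by whitespace or newlines, with a vertical bar starting a comment that runs to the end of the line. The class and method names are hypothetical. The resulting set would still need converting to whatever form your Lucene version's analyzer or StopFilter expects, for example a String[].

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical loader for a Snowball-style stop list: words separated by
// whitespace or newlines, '|' starting a comment that runs to end of line.
public class StopListLoader {
    static Set<String> load(Reader in) throws IOException {
        Set<String> words = new LinkedHashSet<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        while ((line = reader.readLine()) != null) {
            int bar = line.indexOf('|');
            if (bar >= 0) line = line.substring(0, bar);          // drop the comment part
            for (String w : line.trim().split("\\s+")) {
                if (!w.isEmpty()) words.add(w.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Convenience wrapper for in-memory text.
    static Set<String> loadString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // For a file, wrap a FileInputStream in an InputStreamReader using the file's actual encoding.
        System.out.println(loadString("every   | jeder\nbecause\nuntil\n"));   // [every, because, until]
    }
}
```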

Re: Where to get stopword lists?

2003-06-06 Thread Otis Gospodnetic

There is a much more complete list of English stop words in the introductory Lucene article on ONJava.com. I can't help you with German stop words. Otis

Re: Where to get stopword lists?

2003-06-06 Thread Ulrich Mayring

Doug Cutting wrote: Snowball stemmers are pre-packaged for use with Lucene at: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/ These look interesting. Am I right in assuming that in order to use these stemmers, I have to write an Analyzer and in its tokenStream method I return

Re: Where to get stopword lists?

2003-06-06 Thread Bryan LaPlante

I found some handy tools in the org.apache.lucene.analysis.de package: using the WordListLoader class, you can load your stop words in a variety of ways, including from a line-delimited text file, thanks to Gerhard Schwarz. Bryan LaPlante

Re: Where to get stopword lists?

2003-06-06 Thread Anthony Eden

There is already an analyzer available in the sandbox. Take a look here: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/ Sincerely, Anthony Eden

Re: Where to get stopword lists?

2003-06-06 Thread Leo Galambos

What does ``good'' mean? It depends on your corpus, IMHO. The best way to get a ``good'' stop list is an analysis based on idf. Thus, index
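The idf-based approach can be sketched without Lucene. Count the document frequency of every term in a sample of the corpus, then treat the terms with the lowest idf (those occurring in the largest fraction of documents) as stop-word candidates. Several choices here are assumptions: the tokenization into letter/digit runs, the formula idf = log(N/df) and the cutoff. Within Lucene itself, the per-term document frequency is available from the index via IndexReader.docFreq. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: stop-word candidates = terms with idf = log(N / df) <= maxIdf.
public class IdfStopCandidates {
    static List<String> candidates(List<String> docs, double maxIdf) {
        Map<String, Integer> df = new HashMap<>();
        for (String doc : docs) {
            Set<String> seen = new HashSet<>();            // count each term once per document
            for (String t : doc.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!t.isEmpty() && seen.add(t)) df.merge(t, 1, Integer::sum);
            }
        }
        int n = docs.size();
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            double idf = Math.log((double) n / e.getValue());
            if (idf <= maxIdf) result.add(e.getKey());
        }
        result.sort((a, b) -> {
            int byDf = Integer.compare(df.get(b), df.get(a)); // most widespread terms first
            return byDf != 0 ? byDf : a.compareTo(b);         // ties broken alphabetically
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("the cat sat", "the dog ran", "a cat and the dog");
        System.out.println(candidates(docs, 0.5));   // [the, cat, dog]
    }
}
```

On a real corpus the cutoff would be tuned by inspecting the ranked list, since some frequent terms may still be meaningful for a given collection.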

Re: String similarity search vs. typical IR application...

2003-06-06 Thread Ype Kingma

On Thursday 05 June 2003 14:12, Jim Hargrave wrote: Our application is a string similarity searcher where the query is an input string and we want to find all fuzzy variants of the input string in the DB. The score is basically Dice's coefficient: 2C/(Q+D), where C is the number of terms
