Re: EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-28 Thread Hasan Diwan
Michael: On 7/28/06, Michael J. Prichard <[EMAIL PROTECTED]> wrote: Howdy... not sure if anyone else wants this, but here is my first attempt at writing an analyzer for an email address... modifications, updates, fixes welcome. Why reinvent the wheel? See http://java.sun.com/products/javamail/ja

Re: Scoring a document (count?)

2006-07-28 Thread Doron Cohen
Doron Cohen/Haifa/[EMAIL PROTECTED] wrote on 28/07/2006 00:18:47: > For the scoring approach - I don't see an easy way to get the > counts from the score of the results, although the TF (term > frequency in candidate docs) is known+used during document > scoring, and although it seems that the appl

About search performance

2006-07-28 Thread zhongyi yuan
Hi, how should I implement a multi-key search with Lucene? For example, a boolean search with more than 1000 clauses hurts performance greatly. If I use a filter or a custom sorter to select the results, performance is also poor because the result set is extremely large. Please give me some advic

Re: Filter updating

2006-07-28 Thread Erick Erickson
Oh, yeah, you're hearing a doubtful opinion because if this kind of thing isn't done exactly correctly, it'd be particularly hard to debug. Keeping things coordinated is hard... Given that you add/remove docs, you really don't want to just modify the filter. Here's why: All a filter is a bit
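
The message above is cut off in this listing, so the following is only a guess at the concern: a filter is essentially a bit set indexed by document number, and document numbers can shift once deleted documents are merged away. A minimal plain-Java sketch, assuming a toy "index" where a document's number is its position in a list (`StaleFilterDemo` and its methods are hypothetical names, not Lucene API):

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Toy model, not Lucene code: a document's number is its position in the list.
final class StaleFilterDemo {
    private StaleFilterDemo() {}

    // Builds one bit per document; bit i is set when document i has the wanted category.
    static BitSet buildFilter(List<String> categories, String wanted) {
        BitSet bits = new BitSet(categories.size());
        for (int doc = 0; doc < categories.size(); doc++) {
            if (categories.get(doc).equals(wanted)) {
                bits.set(doc);
            }
        }
        return bits;
    }

    // Returns the categories of the documents the filter lets through.
    static List<String> apply(BitSet filter, List<String> categories) {
        List<String> hits = new ArrayList<>();
        for (int doc = filter.nextSetBit(0);
             doc >= 0 && doc < categories.size();
             doc = filter.nextSetBit(doc + 1)) {
            hits.add(categories.get(doc));
        }
        return hits;
    }
}
```

In this toy model, removing one entry from the list renumbers everything after it, so a bit set built earlier starts letting the wrong documents through. Rebuilding the filter from the current state is the simple way to stay correct.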

Re: Filter updating

2006-07-28 Thread Paul Waite
Erick wrote: > Well, I *suppose* you could get the bitset from the pre-existing filter, > copy it to the bitset for your new filter, and play with the bits at the > end. I'm not sure how you get rid of your original filter if you use > CachingWrapperFilter though. Ok, I'm hearing it's a d

Re: luceneweb example returning null hrefs

2006-07-28 Thread SEAN MCELROY
Thank you. The example application is now working as expected. Sean Chen Wu <[EMAIL PROTECTED]> wrote: Hi, Please change the "url" to "path" in the result JSP file, because the field name that is indexed is called "path" rather than "url". Cheers, Chen >>> [EMAIL PROTECTED] 7/28/2006 5:49 P

EMAIL ADDRESS: Tokenize (i.e. an EmailAnalyzer)

2006-07-28 Thread Michael J. Prichard
Howdy... not sure if anyone else wants this, but here is my first attempt at writing an analyzer for an email address... modifications, updates, fixes welcome. -- EmailAnalyzer import java.io.Reader; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.analysis.Lower
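
The analyzer code in the post is cut off in this listing, so it is not reproduced here. For readers who only want the idea, below is a minimal plain-Java sketch of one way to break an address into searchable tokens. It does not use Lucene, and it is not the poster's EmailAnalyzer; the class `EmailTokens` and its splitting rules are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: not part of Lucene and not the EmailAnalyzer from the post.
final class EmailTokens {
    private EmailTokens() {}

    // Returns the lower-cased address, its local part, its domain, and the
    // dot-separated pieces of both, in that order, skipping duplicates.
    static List<String> tokenize(String address) {
        List<String> tokens = new ArrayList<>();
        String lower = address.trim().toLowerCase(Locale.ROOT);
        int at = lower.lastIndexOf('@');
        if (at <= 0 || at == lower.length() - 1) {
            // No usable '@': treat the whole input as a single token.
            add(tokens, lower);
            return tokens;
        }
        String local = lower.substring(0, at);
        String domain = lower.substring(at + 1);
        add(tokens, lower);
        add(tokens, local);
        add(tokens, domain);
        for (String part : local.split("\\.")) {
            add(tokens, part);
        }
        for (String part : domain.split("\\.")) {
            add(tokens, part);
        }
        return tokens;
    }

    // Appends a token unless it is empty or already present.
    private static void add(List<String> tokens, String token) {
        if (!token.isEmpty() && !tokens.contains(token)) {
            tokens.add(token);
        }
    }
}
```

Emitting the whole address alongside its pieces lets a search match either the exact address or just a name or domain. A real analyzer would emit these as tokens from a token stream rather than as a list.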

Re: Search Numerical Field

2006-07-28 Thread Erick Erickson
I'd really advise getting a copy of Luke to inspect your index as a first step. I've been surprised a number of times by what really got in my index. You might also try using a WhitespaceAnalyzer instead of StandardAnalyzer; that's the most basic analyzer available. I'm not sure whether individua

Re: Indexing large sets of documents?

2006-07-28 Thread Rafael Rossini
Ok, there is a patch (http://issues.apache.org/jira/browse/LUCENE-532). This is what I saw. But I still have a question. I guess it's better to ask this on the Hadoop mailing list; anyway, the Hadoop project implements a DFS and the whole MapReduce paradigm. Is it possible to do the indexing and

Re: Search Numerical Field

2006-07-28 Thread Doron Cohen
John john <[EMAIL PROTECTED]> wrote on 28/07/2006 06:36:19: > Hello, > > I tried to add a field like that > field = new Field("number", "1", Field.Store.YES, Field.Index.UN_TOKENIZED); > > so it should be indexed and not analyzed? my writer is > writer = new IndexWriter(INDEX_DIR, new StandardA

Re: Leading wildcard query

2006-07-28 Thread Erick Erickson
You could form a filter, using the WildcardTermEnum or RegexTermEnum, and then use the filter with a ConstantScoreQuery. You lose relevancy, but relevancy is an ambiguous concept with wildcards anyway. Using the query parser with a leading wildcard, even if enabled, is almost sure to give you a "T

Re: Consult some information about adding index while searching

2006-07-28 Thread Doron Cohen
"hu andy" <[EMAIL PROTECTED]> wrote on 28/07/2006 01:28:14: > This code is written in C#. There is a C# version of Lucene 1.9, which I am not a C#'er, so I might have misunderstood this code; still, here is my take. One general comment: the program sent is not self-contained, so it's hard to "

Search Numerical Field

2006-07-28 Thread John john
Hello, I tried to add a field like that field = new Field("number", "1", Field.Store.YES, Field.Index.UN_TOKENIZED); so it should be indexed and not analyzed? my writer is writer = new IndexWriter(INDEX_DIR, new StandardAnalyzer(), true); but according to the javadoc it should be alright

Re: luceneweb example returning null hrefs

2006-07-28 Thread Chen Wu
Hi, Please change the "url" to "path" in the result JSP file, because the field name that is indexed is called "path" rather than "url". Cheers, Chen >>> [EMAIL PROTECTED] 7/28/2006 5:49 PM >>> Hello, I am trying to use the luceneweb application that is shipped with the lucene installation. I

Re: Leading wildcard query

2006-07-28 Thread Pravin Shinde
Thanks for the reply, Miles. So avoiding leading wildcard queries was a design decision for the sake of efficiency. Thanks for the information. On 7/28/06, Miles Barr <[EMAIL PROTECTED]> wrote: Pravin Shinde wrote: > I am trying to use Leading wildcard query, but I am not able to do it. > Any query with leading w
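
To illustrate the efficiency point, here is a minimal plain-Java sketch that uses a sorted set as a stand-in for a sorted term dictionary. It is not Lucene code, and the class `WildcardScanDemo` and its methods are hypothetical. The assumption is that terms are kept in sorted order, so a trailing wildcard can jump straight to the matching range, while a leading wildcard has to test every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy stand-in for a sorted term dictionary; not Lucene code.
final class WildcardScanDemo {
    private WildcardScanDemo() {}

    // Trailing wildcard ("hi*"): start at the first term >= prefix and
    // stop at the first term that no longer starts with it.
    static List<String> prefixMatches(SortedSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Leading wildcard ("*hi"): sort order gives no shortcut, so every term is tested.
    static List<String> suffixMatches(SortedSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.endsWith(suffix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    // Convenience factory so callers can build a sorted term set from literals.
    static SortedSet<String> termsOf(String... terms) {
        SortedSet<String> sorted = new TreeSet<>();
        for (String term : terms) {
            sorted.add(term);
        }
        return sorted;
    }
}
```

With a handful of terms the difference is invisible, but on an index with millions of distinct terms the full scan in `suffixMatches` is the cost that the thread describes.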

Re: Leading wildcard query

2006-07-28 Thread Miles Barr
Pravin Shinde wrote: I am trying to use Leading wildcard query, but I am not able to do it. Any query with leading wildcard is failing with lexical error. query = parser.parse( "*hi" ) JavaError: org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 1. Encountered: "*"

luceneweb example returning null hrefs

2006-07-28 Thread SEAN MCELROY
Hello, I am trying to use the luceneweb application that is shipped with the lucene installation. I have followed the installation instructions and the luceneweb application has been successfully deployed using Tomcat 5.5.9. However all the results returned point to http://localhost:8080/l

Leading wildcard query

2006-07-28 Thread Pravin Shinde
Hi, I am trying to use Leading wildcard query, but I am not able to do it. Any query with leading wildcard is failing with lexical error. query = parser.parse( "*hi" ) JavaError: org.apache.lucene.queryParser.ParseException: Lexical error at line 1, column 1. Encountered: "*" (42), after : ""

Re: Consult some information about adding index while searching

2006-07-28 Thread hu andy
This code is written in C#. There is a C# version of Lucene 1.9, which can be downloaded from http://www.dotlucene.net This implements the indexing. public void CreateIndex() { try { AddDirectory(directory); writer.Optimize();

Re: Consult some information about adding index while searching

2006-07-28 Thread Doron Cohen
> Yes, I have closed IndexWriter. But it doesn't work. This is strange... Can you post a small version of your code that can be executed to show the problem? - Doron

Re: Scoring a document (count?)

2006-07-28 Thread Doron Cohen
This task reminds me more of a count(*) SQL query than a text search query. Assuming that using a text search engine is a prerequisite, I can think of two approaches: one based on Lucene scoring, as suggested in the question, or a simpler approach (below). For the scoring approach - I don't see