date:20040914

Re: PorterStemfilter

2004-09-14 Thread Honey George

--- Tea Yu <[EMAIL PROTECTED]> wrote: > David, > > For me I don't want a search for "in print" gives > results from "in printer"? > I'll consider that over-stemmed elsecase. Here the "in" won't be considered as it is a stopword in most of the analyzers. I know it is in StandardAnalyzer. So searc

Re: PorterStemfilter

2004-09-14 Thread Tea Yu

David, Personally, I wouldn't want a search for "in print" to return results for "in printer"; I'd consider that a case of over-stemming. I also wasn't very satisfied recently when Snowball stemmed "effective" to "effect". Cheers Tea > Hi David > > I like KStem more than Porter / Snowball - but sti

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Doug Cutting wrote: David Spencer wrote: [1] The user enters a query like: recursize descent parser [2] The search code parses this and sees that the 1st word is not a term in the index, but the next 2 are. So it ignores the last 2 terms ("descent" and "parser") and suggests alternatives t

Re: Similarity score computation documentation

2004-09-14 Thread Doug Cutting

Your analysis sounds correct. At base, a weight is a normalized tf*idf. So a document weight is: docTf * idf * docNorm and a query weight is: queryTf * idf * queryNorm where queryTf is always one. So the product of these is (docTf * idf * docNorm) * (idf * queryNorm), which indeed contains id
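Doug's decomposition can be checked with a few lines of arithmetic. Below is a minimal Java sketch that uses made-up inputs rather than values from a real index. It fixes queryTf at 1, as he describes, and compares the product of the two weights with the expanded form in which idf appears squared.

```java
// Sketch: single-term score as (document weight) * (query weight), with queryTf = 1.
class TermScoreSketch {
    // Document weight: docTf * idf * docNorm
    static double docWeight(double docTf, double idf, double docNorm) {
        return docTf * idf * docNorm;
    }

    // Query weight: queryTf * idf * queryNorm, where queryTf is always 1
    static double queryWeight(double idf, double queryNorm) {
        final double queryTf = 1.0;
        return queryTf * idf * queryNorm;
    }

    // Product of the two weights; idf appears once in each factor
    static double score(double docTf, double idf, double docNorm, double queryNorm) {
        return docWeight(docTf, idf, docNorm) * queryWeight(idf, queryNorm);
    }

    public static void main(String[] args) {
        // Illustrative inputs only, not taken from a real index
        double docTf = 2.0, idf = 3.0, docNorm = 0.5, queryNorm = 0.25;
        double viaWeights = score(docTf, idf, docNorm, queryNorm);
        double viaIdfSquared = docTf * idf * idf * docNorm * queryNorm;
        System.out.println(viaWeights + " " + viaIdfSquared); // prints "2.25 2.25"
    }
}
```

Both expressions agree, which is why idf ends up squared in the final score.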

Hits.doc(x) and range queries

2004-09-14 Thread roy-lucene-user

Hi guys! I've posted previously that Hits.doc(x) was taking a long time. Turns out it has to do with a date range in our query. We usually do date ranges like this: Date:[(lucene date field) - (lucene date field)] Sometimes the begin date is "0" which is what we get from DateField.dateT
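The rest of this message is cut off, so the following is an assumption and not the poster's diagnosis. In Lucene 1.x a RangeQuery is rewritten into one clause per indexed term between the bounds, so an open start such as "0" pulls in every date term in the field. The stdlib-only sketch below does not call Lucene. It only counts how many terms such a range covers over a sorted term set, and the date strings in it are hypothetical.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

// Sketch: a range over a sorted term dictionary touches every term between the bounds.
class RangeExpansionSketch {
    // Number of terms t with lower <= t <= upper (plain string order, like Lucene terms)
    static int termsInRange(SortedSet<String> terms, String lower, String upper) {
        // subSet's upper bound is exclusive; appending '\0' makes "upper" itself included
        return terms.subSet(lower, upper + "\0").size();
    }

    public static void main(String[] args) {
        // Hypothetical yyyyMMdd terms; real DateField terms use a different encoding
        SortedSet<String> terms = new TreeSet<>(Arrays.asList(
                "20040101", "20040315", "20040601", "20040914"));
        System.out.println(termsInRange(terms, "0", "20040914"));        // 4: a "0" start matches every term
        System.out.println(termsInRange(terms, "20040601", "20040914")); // 2
    }
}
```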

Re: frequent terms - Re: combining open office spellchecker with Lucene

2004-09-14 Thread Doug Cutting

David Spencer wrote: [1] The user enters a query like: recursize descent parser [2] The search code parses this and sees that the 1st word is not a term in the index, but the next 2 are. So it ignores the last 2 terms ("descent" and "parser") and suggests alternatives to "recursize"...thu
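Step [2] amounts to a membership test for each query word. In the sketch below, a plain HashSet stands in for the index's term dictionary; this is a hypothetical simplification, since against a real index one would check the term's document frequency instead. Only the words it returns would need spelling suggestions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of step [2]: find the query words that are not terms in the index.
class UnknownTermsSketch {
    // Returns the words of the query that are missing from the vocabulary, in query order
    static List<String> unknownWords(String query, Set<String> vocabulary) {
        List<String> unknown = new ArrayList<>();
        for (String word : query.toLowerCase().split("\\s+")) {
            if (!word.isEmpty() && !vocabulary.contains(word)) {
                unknown.add(word);
            }
        }
        return unknown;
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary standing in for the index's term list
        Set<String> vocabulary = new HashSet<>(Arrays.asList("recursive", "descent", "parser"));
        System.out.println(unknownWords("recursize descent parser", vocabulary)); // [recursize]
    }
}
```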

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Doug Cutting

Andrzej Bialecki wrote: I was wondering about the way you build the n-gram queries. You basically don't care about their position in the input term. Originally I thought about using PhraseQuery with a slop - however, after checking the source of PhraseQuery I realized that this probably wouldn't

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Andrzej Bialecki wrote: David Spencer wrote: ...or prepare in advance a fast lookup index - split all existing terms to bi- or trigrams, create a separate lookup index, and then simply for each term ask a phrase query (phrase = all n-grams from an input term), with a slop > 0, to get similar existi

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Andrzej Bialecki

David Spencer wrote: ...or prepare in advance a fast lookup index - split all existing terms to bi- or trigrams, create a separate lookup index, and then simply for each term ask a phrase query (phrase = all n-grams from an input term), with a slop > 0, to get similar existing terms. This should be
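The first step of this proposal is to split every existing term into bi- or trigrams. The rough sketch below covers that step and adds a simple overlap ranking. It is not the setup described in the thread, which uses a separate Lucene lookup index queried with a sloppy phrase query. It is only an in-memory stand-in that shows why "recursize" and "recursive" land close together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: split terms into character n-grams and rank candidates by n-gram overlap.
class NGramSketch {
    // Overlapping n-grams of a term, e.g. ngrams("java", 2) -> [ja, av, va]
    static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    // Number of distinct n-grams the two terms have in common
    static int sharedGrams(String a, String b, int n) {
        Set<String> common = new HashSet<>(ngrams(a, n));
        common.retainAll(new HashSet<>(ngrams(b, n)));
        return common.size();
    }

    // Candidates sorted by shared n-grams (most first, ties alphabetical), truncated to max
    static List<String> suggest(String input, Collection<String> terms, int n, int max) {
        List<String> ranked = new ArrayList<>(terms);
        ranked.sort((x, y) -> {
            int byOverlap = sharedGrams(input, y, n) - sharedGrams(input, x, n);
            return byOverlap != 0 ? byOverlap : x.compareTo(y);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(max, ranked.size())));
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("parser", "recursive", "descent");
        System.out.println(ngrams("java", 2));                        // [ja, av, va]
        System.out.println(sharedGrams("recursize", "recursive", 3)); // 5
        System.out.println(suggest("recursize", terms, 3, 1));        // [recursive]
    }
}
```

The two misspelled forms share the trigrams rec, ecu, cur, urs and rsi. Only the trigrams around the typo differ.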

Re: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Tate Avery wrote: I get a NullPointerException shown (via Apache) when I try to access http://www.searchmorph.com/kat/spell.jsp How embarrassing! Sorry! Fixed! T -Original Message- From: David Spencer [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 3:23 PM To: Lucene Users List

RE: NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread Tate Avery

I get a NullPointerException shown (via Apache) when I try to access http://www.searchmorph.com/kat/spell.jsp T -Original Message- From: David Spencer [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 3:23 PM To: Lucene Users List Subject: NGramSpeller contribution -- Re: com

NGramSpeller contribution -- Re: combining open office spellchecker with Lucene

2004-09-14 Thread David Spencer

Andrzej Bialecki wrote: David Spencer wrote: I can/should send the code out. The logic is that for any terms in a query that have zero matches, go thru all the terms(!) and calculate the Levenshtein string distance, and return the best matches. A more intelligent way of doing this is to instead
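The brute-force logic described here compares the misspelled word with every term in the index and keeps the closest ones by Levenshtein distance. Below is a minimal stdlib sketch of that logic. A plain list of strings stands in for the index's term enumeration, which is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the brute-force approach: compare the input with every term by edit distance.
class LevenshteinSketch {
    // Classic dynamic-programming edit distance (insert, delete, substitute each cost 1)
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    // Scans all terms and keeps those at the smallest distance from the input
    static List<String> bestMatches(String input, Iterable<String> allTerms) {
        int best = Integer.MAX_VALUE;
        List<String> matches = new ArrayList<>();
        for (String term : allTerms) {
            int d = distance(input, term);
            if (d < best) {
                best = d;
                matches.clear();
                matches.add(term);
            } else if (d == best) {
                matches.add(term);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // 3
        System.out.println(bestMatches("recursize", Arrays.asList("parser", "recursive", "descent"))); // [recursive]
    }
}
```

The cost grows with the number of terms in the index, because every lookup scans all of them. This is the reason the thread goes on to discuss a precomputed n-gram index.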

Re: PorterStemfilter

2004-09-14 Thread Pete Lewis

Hi David I like KStem more than Porter / Snowball. It still has limitations, but it performs better because it has a dictionary to augment the rules. Note that KStem will also treat "print" and "printer" as two distinct terms, probably treating them as verb and noun respectively. Cheers Pete Lewis

Re: PorterStemfilter

2004-09-14 Thread Pete Lewis

Hi George There are lots of problems with Porter stemmers: they are not great for English, and they get worse for other languages. If you look at: http://snowball.tartarus.org/demo.php you'll see the Snowball demo - this is basically another instance of Porter. If you enter "print" and "printer" and submit the

Re: PorterStemfilter

2004-09-14 Thread David Spencer

Honey George wrote: Hi, This might be more of a question related to the PorterStemmer algorithm than to Lucene, but if anyone has the knowledge please share. You might want to also try the Snowball stemmer: http://jakarta.apache.org/lucene/docs/lucene-sandbox/snowball/ And KStem: http://c

RE: Help for text based indexing

2004-09-14 Thread Honey George

You could receive the group name as an input from the user and construct a BooleanQuery internally that will query only the group field based on that input. That way the user need not append the group name to the search string. Thanks, George --- mahaveer jain <[EMAIL PROTECTED]> wrote: > If
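George's suggestion keeps the group name out of the search box. The minimal sketch below assumes that each document was indexed with a hypothetical keyword field named "group" holding a lower-case value such as "group1". The application wraps the user's text so that both the group clause and the user's query are required. It builds a QueryParser-syntax string instead of a BooleanQuery object so that it runs without Lucene on the classpath. The BooleanQuery route would combine the same two required clauses programmatically.

```java
// Sketch: restrict a user's query to one group by requiring a match on a group field.
class GroupQuerySketch {
    // Builds QueryParser syntax in which both the group clause and the user's query are required (+)
    static String restrictToGroup(String group, String userQuery) {
        // Only allow simple names so the group value cannot inject query syntax
        if (!group.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("unexpected group name: " + group);
        }
        // "group" is a hypothetical field name; values are assumed to be indexed in lower case
        return "+group:" + group.toLowerCase() + " +(" + userQuery + ")";
    }

    public static void main(String[] args) {
        System.out.println(restrictToGroup("Group1", "hello")); // +group:group1 +(hello)
    }
}
```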

PorterStemfilter

2004-09-14 Thread Honey George

Hi, This might be more of a question related to the PorterStemmer algorithm than to Lucene, but if anyone has the knowledge please share. I am using the PorterStemFilter that comes with Lucene and it turns out that searching for the word 'printer' does not return a document containing the

RE: Help for text based indexing

2004-09-14 Thread mahaveer jain

If I have understood correctly, you mean that the search query has to be "Group1" AND "Hello" (if hello is what I want to search for)? Cocula Remi <[EMAIL PROTECTED]> wrote: A keyword is not tokenized, that's why you won't be able to search over a part of it. You'd rather use a Text fie

RE: Help for text based indexing

2004-09-14 Thread Cocula Remi

A keyword is not tokenized, that's why you won't be able to search over a part of it. You'd rather use a Text field. About creating a special field: IndexWriter Ir = File f = Document doc = new Document(); if (f.toString().startsWith("C:\\tomcat\\webapps\\Root\\Group1")) {
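Instead of writing one startsWith test per group, the group name can be read off the path. The sketch below uses java.nio.file, which is Java 7+ and so newer than the APIs in this thread. The resulting lower-case string would be stored in a keyword field. The class, the method and any field name are hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: derive a document's group from the first directory below the Root folder.
class GroupFromPathSketch {
    // Returns e.g. "group1" for Root/Group1/doc.txt, or null when the file is not inside a group folder
    static String groupOf(Path root, Path file) {
        Path rel = root.toAbsolutePath().normalize()
                       .relativize(file.toAbsolutePath().normalize());
        if (rel.startsWith("..") || rel.getNameCount() < 2) {
            return null; // outside root, or directly in root with no group folder
        }
        return rel.getName(0).toString().toLowerCase();
    }

    public static void main(String[] args) {
        Path root = Paths.get("Root");
        System.out.println(groupOf(root, Paths.get("Root", "Group1", "doc.txt"))); // group1
        System.out.println(groupOf(root, Paths.get("Root", "index.html")));        // null
    }
}
```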

RE: ANT +BUILD + LUCENE

2004-09-14 Thread Gerard Sychay

Hi, I've used the following Ant targets for build scripts that required platform dependent work. In the example here, the property "catalina.home" is set according to what platform we're running on. You can adapt as needed.

RE: Help for text based indexing

2004-09-14 Thread mahaveer jain

Well, in my case the path is a Keyword field. I had tried that earlier and it does not seem to work in a single index file. Can you explain a bit more about adding group1 and group2? Cocula Remi <[EMAIL PROTECTED]> wrote: Well you could add a field to each of your Documents whose value would be eith

RE: Help for text based indexing

2004-09-14 Thread Cocula Remi

Well you could add a field to each of your Documents whose value would be either "group1" or "group2". Or you could use the path to your files ... -Original Message- From: mahaveer jain [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 17:49 To: [EMAIL PROTECTED] Subject: RE: He

RE: Help for text based indexing

2004-09-14 Thread mahaveer jain

I understand how to loop recursively to index all the files under the Root folder. The problem is when I want to search only in group1 or group2. Is it possible to search only in one of the group folders? Cocula Remi <[EMAIL PROTECTED]> wrote: You just have to loop recursively over the C:\tomcat\w

RE: Help for text based indexing

2004-09-14 Thread Cocula Remi

You just have to loop recursively over the C:\tomcat\webapps\Root tree to create your index. Yes, you can index databases; you will just have to write a mechanism that is able to create org.apache.lucene.document.Document objects from the database. For instance: - connect JDBC - run a query for obtaining a
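Remi's first step is the recursive walk over the tree. The minimal sketch below uses java.nio.file, which is Java 8+ and so newer than the APIs in this thread. It lists the files that would each become an org.apache.lucene.document.Document. The indexing call itself and the JDBC variant are not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: collect every regular file below a root folder, recursing into all subfolders.
class TreeWalkSketch {
    static List<Path> filesToIndex(Path root) throws IOException {
        // Files.walk visits root and all its descendants depth-first
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path file : filesToIndex(root)) {
            // In a real indexer each file would become one Document added to the IndexWriter
            System.out.println(file);
        }
    }
}
```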

Help for text based indexing

2004-09-14 Thread mahaveer jain

Hi, I have implemented text-based search using Lucene. It was wonderful playing around with it. Now I want to enhance the application. I have a Root folder, and under it I have many other folders that are group-specific (group1, group2, and so on). The Root folder is in C:\tomcat\webapps\Roo

Re: ANT +BUILD + LUCENE

2004-09-14 Thread Erik Hatcher

Karthik, You are still being a bit cryptic and making it hard for me to comprehend what the problem is, but here are some general pieces of advice with Ant related to what I think you are doing: * There is no need to use conditional logic to have a different set of properties for different oper

Re: Search PharseQuery

2004-09-14 Thread sergiu gordea

Natarajan.T wrote: Ok you are correct ... Suppose if I type "what java" then how can I handle... You don't have to handle it, lucene does it. If you don't like how lucene handles it then you may extend the functionality. If you use the same analyzer for indexing and searching then you will fi

RE: ANT +BUILD + LUCENE

2004-09-14 Thread Karthik N S

Hi Erik 1) Using Ant and Build.xml, I want to run org.apache.lucene.demo.IndexFiles to create an index folder. 2) The problem is that the same Build.xml has to be used across operating systems to create the index. 3) Lucene1-4-final.jar is in a different directory on each O/S...

RE: Search PharseQuery

2004-09-14 Thread Natarajan.T

Ok you are correct ... Suppose if I type "what java" then how can I handle... Regards, Natarajan. -Original Message- From: sergiu gordea [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 7:38 PM To: Lucene Users List Subject: Re: Search PharseQuery Natarajan.T wrote: >Hi, >

Re: Addition to contributions page

2004-09-14 Thread Erik Hatcher

Perhaps we should @deprecate the contributions page like we did with the Powered By page, and migrate it to the wiki? Erik On Sep 13, 2004, at 6:50 PM, Daniel Naber wrote: On Friday 10 September 2004 15:48, Chas Emerick wrote: PDFTextStream should be added to the 'Document Converters' sec

Re: ANT +BUILD + LUCENE

2004-09-14 Thread Erik Hatcher

I'm not following what you want very clearly, but there is an task in Lucene's Sandbox. Please post what you are trying, and I'd be happy to help once I see the details. Erik On Sep 12, 2004, at 4:44 PM, Karthik N S wrote: Hi Guys Apologies.. The Task for me is to build the Ind

Re: Search PharseQuery

2004-09-14 Thread sergiu gordea

Natarajan.T wrote: Hi, Thanks for your response. For example search keyword is like below... Language "what is java" Token 1: language Token 2: what is java(like google) Regards, Natarajan. Lucene works exactly as you describe above with a simple correction ... The analyzer has a list of s
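The minimal sketch below shows what an analyzer's stop filter does to two phrases from these threads, "what is java" and "in print". The stop list is a small illustrative subset, not Lucene's actual list, and the tokenizer is a simple regex split rather than StandardAnalyzer. The stop word simply disappears from the token stream, which is why the same analyzer should be used for indexing and searching.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of analyzer-style stop word removal.
class StopWordSketch {
    // Small illustrative stop list; Lucene's StopAnalyzer.ENGLISH_STOP_WORDS is longer
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "is", "of", "the"));

    // Lower-cases, splits on anything that is not a letter or digit, and drops stop words
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("what is java")); // [what, java]
        System.out.println(analyze("in print"));     // [print]
    }
}
```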

Indexing object graphs

2004-09-14 Thread Erik Hatcher

Interesting! http://kasparov.skife.org/blog/2004/09/13#lucene-graphs

RE: Search PharseQuery

2004-09-14 Thread Natarajan.T

Hi, Thanks for your response. For example search keyword is like below... Language "what is java" Token 1: language Token 2: what is java(like google) Regards, Natarajan. -Original Message- From: Aad Nales [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 5:19 PM

Document Relevance

2004-09-14 Thread ebrahim . faisal

Hi, I am new to Lucene. Could anyone tell me how to set the RELEVANCE order in which the search results are displayed? Are any online examples available on this topic? I welcome your suggestions. Thanks & Regards, E.Faisal

RE: Search PharseQuery

2004-09-14 Thread Honey George

--- "Natarajan.T" <[EMAIL PROTECTED]> wrote: > I am trying to extend the current behavior. You might have already seen a mail from Cocula Remi on this. Please provide more details of the problem for specific comments - basically the problem you are facing and/or what behavior you are trying to ex

RE: Search PharseQuery

2004-09-14 Thread Natarajan.T

I am trying to extend the current behavior. Regards, Natarajan. -Original Message- From: Honey George [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 5:09 PM To: Lucene Users List Subject: Re: Search PharseQuery --- "Natarajan.T" <[EMAIL PROTECTED]> wrote: > Hi All, > >

RE: Search PharseQuery

2004-09-14 Thread Aad Nales

Hi, Not sure if this is what you need, but I created a last-name filter which, for Dutch, handles potential multi-part last names like "van der Vaart". In order to process these I created a finite state machine that queried these last names. Since I only needed the filter at index time and I never use it f
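Aad's approach is a small state machine that runs over the token stream. The sketch below applies the idea to a plain list of tokens, so it is not an actual Lucene TokenFilter, and its particle list is hypothetical. Particles such as "van" and "der" are buffered until a surname arrives, and then the whole run is emitted as one token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch: merge Dutch surname particles with the following surname into a single token.
class NamePrefixMergerSketch {
    // Hypothetical particle list; a real filter would need a complete, curated list
    static final Set<String> PARTICLES =
            new HashSet<>(Arrays.asList("van", "der", "den", "de", "ten", "ter"));

    // Two states: idle (pending is empty) or collecting particles (pending is non-empty)
    static List<String> merge(List<String> tokens) {
        List<String> out = new ArrayList<>();
        List<String> pending = new ArrayList<>();
        for (String token : tokens) {
            if (PARTICLES.contains(token.toLowerCase())) {
                pending.add(token);                  // enter or stay in the collecting state
            } else if (!pending.isEmpty()) {
                pending.add(token);                  // surname closes the particle run
                out.add(String.join(" ", pending));  // emit e.g. "van der Vaart" as one token
                pending.clear();                     // back to idle
            } else {
                out.add(token);                      // ordinary token while idle
            }
        }
        out.addAll(pending); // particles with no following surname stay separate tokens
        return out;
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList("Jan", "van", "der", "Vaart", "plays")));
        // [Jan, van der Vaart, plays]
    }
}
```

Because the filter runs only at index time, the merged token has to be matched as a whole at query time. For example, the query side could use a keyword-style lookup on the full name.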

Re: Search PharseQuery

2004-09-14 Thread Honey George

--- "Natarajan.T" <[EMAIL PROTECTED]> wrote: > Hi All, > > > > How do I implement PharseQuery API? What exactly do you mean by implement? Are you trying to extend the current behavior or only trying to find out the usage? Thanks, George

RE: Search PharseQuery

2004-09-14 Thread Natarajan.T

Hi Sergiu, String queryString = "\"what is java\""; Query q = QueryParser.parse(queryString, "field", new StandardAnalyzer()); System.out.println(q.toString()); This is enough to get started; consult the Lucene API for more information. Have you tested the above query? This search keyword is not

Re: Search PharseQuery

2004-09-14 Thread sergiu gordea

String queryString = "\"what is java\""; Query q = QueryParser.parse(queryString, "field", new StandardAnalyzer()); System.out.println(q.toString()); This is enough to get started; consult the Lucene API for more information. Sergiu Natarajan.T wrote: Hi, Thanks for your mail, that link says only th

RE: Search PharseQuery

2004-09-14 Thread Natarajan.T

Hi, Thanks for your mail, but that link only covers the theory; I need some sample code. Regards, Natarajan. -Original Message- From: Cocula Remi [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 2:58 PM To: Lucene Users List Subject: RE: Search PharseQuery Use QueryParser.

RE: Search PharseQuery

2004-09-14 Thread Cocula Remi

Use QueryParser. Please take a look at http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html It's pretty clear. -Original Message- From: Natarajan.T [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 14, 2004 11:26 To: 'Lucene Users List' Subject: Search PharseQuery Hi All,

Search PharseQuery

2004-09-14 Thread Natarajan.T

Hi All, How do I implement the PhraseQuery API? Please send me some sample code. (How can I handle "java is platform" as a single word?) Regards, Natarajan.

Re: OutOfMemory example

2004-09-14 Thread Daniel Naber

On Tuesday 14 September 2004 08:32, Jiří Kuhn wrote: > The error is thrown in exactly the same point as before. This morning I > downloaded Lucene from CVS, now the jar is lucene-1.5-rc1-dev.jar, JVM > is 1.4.2_05-b04, both Linux and Windows. Now I can reproduce the problem. I first tried running
