Re: When does QueryParser create PhraseQueries

2008-02-29 Thread duiduder
Thanks a lot for your help, Daniel, I have found a solution :) The 'token' field is public inside QueryParser, and 'token.image' holds the original string, including the apostrophe. Thus, I can distinguish between the two situations and simply return
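The workaround above reads the raw query text from token.image to tell a quoted phrase from unquoted input. The check itself can be sketched in plain Java. This is not Lucene code: the class and method names are hypothetical, and the sketch assumes "apostrophe" refers to the double quote that marks a phrase in Lucene's query syntax.

```java
import java.util.Objects;

/**
 * Hypothetical helper (not part of Lucene): given the raw text of a token,
 * such as QueryParser's token.image, decide whether the user typed a quoted phrase.
 */
final class QuotedTokenCheck {
    private QuotedTokenCheck() {}

    /** True when the raw text is wrapped in double quotes, e.g. "\"new york\"". */
    static boolean isQuotedPhrase(String image) {
        Objects.requireNonNull(image, "image");
        return image.length() >= 2 && image.startsWith("\"") && image.endsWith("\"");
    }

    /** Strips the surrounding quotes, if present, leaving the inner phrase text. */
    static String unquote(String image) {
        return isQuotedPhrase(image) ? image.substring(1, image.length() - 1) : image;
    }
}
```

Inside a QueryParser subclass, a check like this would let the parser build a PhraseQuery only when the user actually quoted the text.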

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Mathieu Lecarme
Petite Abeille wrote: A proposal for a Lua entry for the "Google Summer of Code" '08: A Lua implementation of Lucene. For me, Lua is just glue between C-coded objects, a super config file, as used in lighttpd or WoW. Will Lulu work on top of Lucy? Did I miss something? M.

RE: how to all documents from

2008-02-29 Thread Shailendra Sharma
Create a match-all-docs query like the following: MatchAllDocsQuery matchAllDocsQuery = new MatchAllDocsQuery(); and then search as you would for any other query, searcher.search(matchAllDocsQuery); it returns Hits. Thanks, Shailendra
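For anyone wondering what "all documents" means here: in Lucene 2.x a match-all query walks every document number below maxDoc() and skips the ones marked deleted. The plain-Java model below only mimics that loop over an in-memory BitSet. MatchAllModel and matchAll are hypothetical names, not Lucene classes; with a real index, IndexReader's maxDoc() and isDeleted(int) would play the roles of the two parameters.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/**
 * Toy model (not Lucene code) of the documents a match-all query visits:
 * every document number from 0 to maxDoc - 1 that is not marked deleted.
 */
final class MatchAllModel {
    private MatchAllModel() {}

    static List<Integer> matchAll(int maxDoc, BitSet deleted) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            // Deleted documents keep their number until segments are merged, so skip them.
            if (!deleted.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```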

how to all documents from

2008-02-29 Thread sandyg
How do I retrieve all the documents from the index directory? Please, someone help me.

Re: How do i get a text summary

2008-02-29 Thread Karl Wettin
h t wrote: Where can I find an introduction to the algorithm below? Thanks. I can't recall where I picked it up, but something like this: Score terms by count and distribution. A term occurring 20 times in the same paragraph is not as important as a term occurring 20 times over 10 paragraphs. Similar ter
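The count-and-distribution idea can be made concrete with one possible rule, assumed here for illustration and not taken from the post: score(term) = total occurrences × number of paragraphs containing the term. Under that rule, 20 occurrences in one paragraph score 20, while 20 occurrences spread over 10 paragraphs score 200. The class name and the simple lower-case, split-on-non-word tokenization are hypothetical choices.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical scoring rule: score(term) = totalCount * paragraphsContaining(term).
 * A term spread over many paragraphs outranks one bunched into a single paragraph.
 */
final class TermSpreadScorer {
    private TermSpreadScorer() {}

    static Map<String, Integer> score(String[] paragraphs) {
        Map<String, Integer> counts = new HashMap<>();        // term -> total occurrences
        Map<String, Set<Integer>> spread = new HashMap<>();   // term -> paragraphs it occurs in
        for (int p = 0; p < paragraphs.length; p++) {
            for (String term : paragraphs[p].toLowerCase(Locale.ROOT).split("\\W+")) {
                if (term.isEmpty()) continue; // split can yield a leading empty string
                counts.merge(term, 1, Integer::sum);
                spread.computeIfAbsent(term, k -> new HashSet<>()).add(p);
            }
        }
        Map<String, Integer> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            scores.put(e.getKey(), e.getValue() * spread.get(e.getKey()).size());
        }
        return scores;
    }
}
```

The highest-scoring terms could then be used to pick the sentences that go into the summary.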

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Grant Ingersoll
On Feb 29, 2008, at 5:39 AM, Mathieu Lecarme wrote: Petite Abeille wrote: A proposal for a Lua entry for the "Google Summer of Code" '08: A Lua implementation of Lucene. For me, Lua is just glue between C-coded objects, a super config file, as used in lighttpd or WoW. Lulu will work o

RE: how to all documents from

2008-02-29 Thread sandyg
Thanks for the immediate response, Shailendra Sharma; you saved me a lot of time. Once again, thanks for your reply. Shailendra Sharma wrote: > > Create a match all docs query like following: > MatchAllDocsQuery matchAllDocsQuery = new MatchAllDocsQuery(); > And then search as you search for

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Mathieu Lecarme
Grant Ingersoll wrote: On Feb 29, 2008, at 5:39 AM, Mathieu Lecarme wrote: Petite Abeille wrote: A proposal for a Lua entry for the "Google Summer of Code" '08: A Lua implementation of Lucene. For me, Lua is just glue between C-coded objects, a super config file, as used in lighttpd

MultiFieldQueryParser - BooleanClause.Occur

2008-02-29 Thread JensBurkhardt
Hey everybody, I read that it's possible to generate a query like: (title:term1 OR author:term1) AND (title:term2 OR author:term2) and so on. I also read that BooleanClause.Occur should help me handle this problem. But I have to admit that I don't understand at all how to use it. If someone ca

Re: MultiFieldQueryParser - BooleanClause.Occur

2008-02-29 Thread Donna L Gresh
I believe something like the following will do what you want: QueryParser parserTitle = new QueryParser("title", analyzer); QueryParser parserAuthor = new QueryParser("author", analyzer); BooleanQuery overallquery = new BooleanQuery(); BooleanQuery firstQuery = new BooleanQuery(); Query q1= pars
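Besides building the BooleanQuery by hand, the same structure can be written as a query string and handed to QueryParser. In Lucene's query syntax, +(title:term1 author:term1) +(title:term2 author:term2) is a required group per term (BooleanClause.Occur.MUST), and each group holds optional per-field clauses (BooleanClause.Occur.SHOULD, under the default OR operator). The helper below only builds that string. PerTermFieldQuery is a hypothetical name, and the sketch does no escaping of query-syntax special characters, so it assumes plain terms.

```java
import java.util.List;
import java.util.StringJoiner;

/**
 * Hypothetical helper: for each term, emit a required group "+(f1:term f2:term ...)",
 * i.e. the term must match in at least one of the listed fields.
 */
final class PerTermFieldQuery {
    private PerTermFieldQuery() {}

    static String build(List<String> terms, List<String> fields) {
        StringJoiner query = new StringJoiner(" ");
        for (String term : terms) {
            // "+(" ... ")" makes the whole group required; clauses inside it are optional.
            StringJoiner group = new StringJoiner(" ", "+(", ")");
            for (String field : fields) {
                group.add(field + ":" + term);
            }
            query.add(group.toString());
        }
        return query.toString();
    }
}
```

For terms [term1, term2] and fields [title, author], this produces +(title:term1 author:term1) +(title:term2 author:term2).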

Re: Indexing source code files

2008-02-29 Thread Bill Au
There is an open-source project, OpenGrok, that uses Lucene for indexing and searching source code: http://opensolaris.org/os/project/opengrok/ It has Analyzers for different types of source files. It does link source code to requirements but you can take a look at the source code to see how it do
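OpenGrok's actual analyzers are not reproduced here. As a rough idea of what a source-code analyzer might do differently from a prose analyzer, the sketch below extracts identifiers and also emits their camelCase and snake_case parts, so a search for "count" could find maxDocCount. It is a hypothetical plain-Java class, not a Lucene Analyzer or TokenStream, and the splitting rules are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical identifier tokenizer for source files (not OpenGrok's analyzer):
 * emits each identifier lower-cased, followed by its camelCase / snake_case parts.
 */
final class IdentifierTokenizer {
    private static final Pattern IDENT = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");
    // Split between a lower-case letter or digit and a following upper-case letter, or on underscores.
    private static final Pattern PARTS = Pattern.compile("(?<=[a-z0-9])(?=[A-Z])|_+");

    private IdentifierTokenizer() {}

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = IDENT.matcher(source);
        while (m.find()) {
            String ident = m.group();
            tokens.add(ident.toLowerCase(Locale.ROOT));
            String[] parts = PARTS.split(ident);
            if (parts.length > 1) { // only compound identifiers get extra part tokens
                for (String part : parts) {
                    if (!part.isEmpty()) tokens.add(part.toLowerCase(Locale.ROOT));
                }
            }
        }
        return tokens;
    }
}
```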

Re: MultiFieldQueryParser - BooleanClause.Occur

2008-02-29 Thread Paul Elschot
On Friday 29 February 2008 18:04:47, Donna L Gresh wrote: > I believe something like the following will do what you want: > > QueryParser parserTitle = new QueryParser("title", analyzer); > QueryParser parserAuthor = new QueryParser("author", analyzer); > > BooleanQuery overallquery = new BooleanQ

Re: Alternate spelling suggestion (was [Resent] Document boosting based on .. semantics? )

2008-02-29 Thread Markus Fischer
Hi. Mathieu Lecarme wrote: On a related topic, I'm also looking for a way to suggest alternate spellings of words to the user when we find a word that is used very infrequently in the index, or not at all. I'm based in Austria; when I, e.g., search for "retthich" (misspelled "re

Re: Alternate spelling suggestion (was [Resent] Document boosting based on .. semantics? )

2008-02-29 Thread Mathieu Lecarme
Hi Mathieu Lecarme wrote: On a related topic, I'm also looking for a way to suggest alternate spellings of words to the user when we find a word that is used very infrequently in the index, or not at all. I'm based in Austria; when I, e.g., search for "retthich" (wrong spel
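The suggestion rule described in the question (offer an alternative when a term is rare or missing from the index) can be sketched with plain Levenshtein edit distance over an in-memory term/frequency map. This brute-force version is not a Lucene class. The names and the minFreq threshold are hypothetical, and a real setup would read terms and document frequencies from the index instead of a Map.

```java
import java.util.Map;

/** Hypothetical brute-force spelling suggester based on Levenshtein edit distance. */
final class SpellingSuggester {
    private SpellingSuggester() {}

    /** Dynamic-programming Levenshtein distance (insert, delete, substitute each cost 1). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // the row just filled becomes prev
        }
        return prev[b.length()];
    }

    /**
     * Returns null if the term already occurs at least minFreq times; otherwise the closest
     * term with at least minFreq occurrences, ties broken by the higher frequency.
     */
    static String suggest(String term, Map<String, Integer> termFreqs, int minFreq) {
        if (termFreqs.getOrDefault(term, 0) >= minFreq) return null;
        String best = null;
        int bestDist = Integer.MAX_VALUE;
        int bestFreq = -1;
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            if (e.getValue() < minFreq) continue; // never suggest another rare term
            int d = editDistance(term, e.getKey());
            if (d < bestDist || (d == bestDist && e.getValue() > bestFreq)) {
                best = e.getKey();
                bestDist = d;
                bestFreq = e.getValue();
            }
        }
        return best;
    }
}
```

With "rettich" in the index, the misspelled query "retthich" is one deletion away and would be suggested. Checking every indexed term scales poorly, which is why an n-gram pre-filter is a common refinement.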

Re: MultiFieldQueryParser - BooleanClause.Occur

2008-02-29 Thread Donna L Gresh
Paul, thanks (that was one of my ulterior motives for answering the question; I figured if there was something inefficient or unnecessary about my approach, I'd hear about it :) ) Donna Gresh Paul Elschot wrote on 02/29/2008 01:42:57 PM: > On Friday 29 February 2008 18:04:

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 11:39 AM, Mathieu Lecarme wrote: For me, Lua is just glue between C-coded objects, a super config file, as used in lighttpd or WoW. This is a rather narrow view of what one could do with Lua. For example, this wiki engine is written exclusively in Lua: http://al

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 1:09 PM, Grant Ingersoll wrote: That implies the Lucy actually is under development... Perhaps they will take up work on Lucy... Lulu has nothing to do with Lucy... goes back to something at high school or something...

Most efficient way to find related terms

2008-02-29 Thread Martin Bayly
I'm wondering what the most efficient approach is to finding all terms that are related to a specific term X. By related I mean terms that appear in a specific document field that also contains the target term X. E.g., a document has a keyword field, field1, that can contain multiple keywords
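The "related terms" definition above (keywords that share a field1 value list with X) comes down to a co-occurrence count over the documents that contain X. The sketch below shows only that counting step, over in-memory keyword lists. In a real index, the keyword lists for documents matching field1:X might come from stored fields or term vectors; that retrieval step is assumed and not shown, and RelatedTerms is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical co-occurrence counter for a multi-valued keyword field. */
final class RelatedTerms {
    private RelatedTerms() {}

    /**
     * For every document whose keyword field contains target, counts each other keyword
     * once per document. The result maps keyword -> number of documents it shares with target.
     */
    static Map<String, Integer> related(List<List<String>> keywordFields, String target) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> field : keywordFields) {
            Set<String> unique = new LinkedHashSet<>(field); // count a document once per keyword
            if (!unique.contains(target)) continue;
            for (String keyword : unique) {
                if (!keyword.equals(target)) counts.merge(keyword, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```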

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 3:42 PM, Mathieu Lecarme wrote: On the other hand, a Lucy with C for persistence and parsing, and Lua for filters and other fine-grained configuration, could be great. Who is that Lucy you keep talking about? :P Cheers, PA.

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 8:37 PM, Simon Willnauer wrote: or go to http://lucene.apache.org/lucy/ Looks rather, hmmm, inactive: http://svn.apache.org/viewvc/lucene/lucy/ Is there any working code anywhere?

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Simon Willnauer
Go to www.google.com, type "Lucy lucene", and click "I'm Feeling Lucky", or go to http://lucene.apache.org/lucy/ simon On Fri, Feb 29, 2008 at 8:24 PM, Petite Abeille wrote: > > On Feb 29, 2008, at 3:42 PM, Mathieu Lecarme wrote: > > > On the other hand, a Lucy with C for persistence and parsing, a

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Petite Abeille
On Feb 29, 2008, at 11:39 AM, Mathieu Lecarme wrote: For me, Lua is just glue between C-coded objects, a super config file, as used in lighttpd or WoW. Here is an online demo of a wiki engine implemented purely in Lua: http://svr225.stepx.com:3388/a http://svr225.stepx.com:3388/nanoki

Lucene for Sentiment Analysis

2008-02-29 Thread Aaron Schon
Hello, I am interested in learning about using Lucene for text analytics work such as sentiment analysis. Has anyone done work along these lines? If so, could you share your approach, experiences, accuracy levels obtained, etc.? Thanks, AS

Corrupted Indexes Under Lucene 2.3 (and 2.3.1)

2008-02-29 Thread Tyler V
After upgrading to Lucene 2.3 (and subsequently 2.3.1), our application has experienced sporadic index corruptions on our larger (and more frequently updated) indexes. These indexes experienced no corruptions under any prior version of Lucene (which we have been using for several years). The patte

Re: Corrupted Indexes Under Lucene 2.3 (and 2.3.1)

2008-02-29 Thread Michael McCandless
Not good! (I'm sorry). That first exception is worrisome. It's the root cause here. Can you describe your documents? That exception, if I'm reading it right, seems to imply that you have documents with 4762 fields. Is that right? Are you using multiple threads? Is it possible that yo

Re: How do i get a text summary

2008-02-29 Thread Bob Carpenter
Mathieu Lecarme wrote: [EMAIL PROTECTED] wrote: And how could one automatically create such a summary? Here's a site with some pointers to the literature and some systems out there to do summarization: http://www.summarization.com/ This is actually whole-document or even multiple-docume

Re: Corrupted Indexes Under Lucene 2.3 (and 2.3.1)

2008-02-29 Thread Tyler V
Mike -- Thanks so much for the prompt reply. You are right, we are accessing these documents with multiple threads (and have always been). However, I am wondering if the increased indexing speed in 2.3 has revealed a hidden concurrency issue. I am going to add in some additional concurrency check

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Marvin Humphrey
On Feb 29, 2008, at 4:09 AM, Grant Ingersoll wrote: That implies the Lucy actually is under development... Perhaps they will take up work on Lucy... :) I haven't been feeding back my KinoSearch commits into the Lucy repository, true. Not much has changed status-wise since this:

RE: Lucene Search Performance

2008-02-29 Thread Andreas Guther
Just a comment, and I understand that you cannot change your index: what we did was organize our index by the creation date of entries. We limit our search to a given number of years, counting back from the current year. Organizing the index this way also lets us drop outdated information.
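The date-based layout described above could look like this, assuming one index per creation year named prefix + year (an assumed naming scheme, not stated in the post). The helper only picks which yearly indexes to search. Opening them and searching them together, for example with Lucene's MultiSearcher or MultiReader, is not shown. YearPartitions is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical layout: one index per creation year, named prefix + year.
 * Selects the current year and the (years - 1) before it, newest first;
 * older partitions are never opened, and can be deleted outright to drop outdated data.
 */
final class YearPartitions {
    private YearPartitions() {}

    static List<String> partitionsToSearch(String prefix, int currentYear, int years) {
        if (years < 1) throw new IllegalArgumentException("years must be >= 1");
        List<String> names = new ArrayList<>();
        for (int y = currentYear; y > currentYear - years; y--) {
            names.add(prefix + y);
        }
        return names;
    }
}
```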

Re: SOC: Lulu, a Lua implementation of Lucene

2008-02-29 Thread Marvin Humphrey
On Feb 29, 2008, at 11:22 AM, Petite Abeille wrote: Lulu is meant to be a pure Lua implementation of Lucene. How fast is Lua's method dispatch, compared to Java's? That has a huge impact on performance, since *everything* is a method in Lucene -- down to writeByte(). There have been

Re: Corrupted Indexes Under Lucene 2.3 (and 2.3.1)

2008-02-29 Thread Yonik Seeley
On Fri, Feb 29, 2008 at 7:05 PM, Tyler V wrote: > Mike -- Thanks so much for the prompt reply. > > You are right, we are accessing these documents with multiple threads > (and have always been). However, I am wondering if the increased > indexing speed in 2.3 has revealed a h

More IndexDeletionPolicy questions

2008-02-29 Thread Tim Brennan
Is there a direct way to ask an IndexReader what segment it is pointing at? That would make implementing custom deletion policies a LOT easier. It seems like it should be pretty simple -- keep a list of open IndexReaders, track what Segment files they're pointing to, and in onCommit don't delet
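The bookkeeping proposed above can be modelled without Lucene. In this toy version, readers register the segments file they opened. When a new commit arrives, every older commit that no open reader references is reported as deletable, and the newest commit is always kept. The sketch does not use Lucene's IndexDeletionPolicy interface; in a real policy, onCommit would mark the corresponding commit points for deletion instead of returning names. The class name and the oldest-first ordering of the commit list are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Lucene's API): open readers pin the segments file they were opened on;
 * older commits that no reader pins may be deleted, and the newest commit is always kept.
 */
final class ReaderAwareRetention {
    private final Map<String, Integer> openReaders = new HashMap<>(); // segments file -> open reader count

    synchronized void readerOpened(String segmentsFile) {
        openReaders.merge(segmentsFile, 1, Integer::sum);
    }

    synchronized void readerClosed(String segmentsFile) {
        // Decrement the count; returning null removes the entry once the last reader closes.
        openReaders.computeIfPresent(segmentsFile, (k, n) -> n > 1 ? n - 1 : null);
    }

    /** Given commits ordered oldest first, returns those that may be deleted. */
    synchronized List<String> deletable(List<String> commitsOldestFirst) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < commitsOldestFirst.size() - 1; i++) { // skip the newest (last) commit
            String commit = commitsOldestFirst.get(i);
            if (!openReaders.containsKey(commit)) result.add(commit);
        }
        return result;
    }
}
```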

Re: Lucene for Sentiment Analysis

2008-02-29 Thread Srikant Jakilinki
Hi, I remember trying it once, a long time ago (Lucene 1.9), but I could not get very far, since term vectors and such were not easily accessible then. However, if text analytics is what you are after, perhaps you already know about LingPipe. I used it for named entity extraction (enhanced with some