Re: product based term combination for BooleanQuery?

2007-07-03 Thread Chris Hostetter
(side note: if you are going to try and obfuscate your field names when sending explain output so we don't know you are using Wikipedia data (not that we care), please at least be consistent about it so the final explanations actually make sense -- it will save everyone a lot of confusion and help u

Re: Too Many Open files Exception

2007-07-03 Thread Chris Hostetter
: I am getting a "Too Many Open Files" Exception. I've read the FAQ about : lowering the merge factor (currently set to 25), issuing a ulimit -n : , etc... but I am still getting the "Too Many Open Files" : Exception (yes... I'm making sure I close all writer/searchers/reader : and I only have one

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Tim Sturge
Here's the explain output I currently get for "George Bush" "George W Bush", "John Kerry" "John Denver" and "John Bush". (there are others in between, but they follow very much the same pattern; an enormous score for one of "John" or "Bush" and a very small score for the other being better than

Too Many Open files Exception

2007-07-03 Thread Van Nguyen
I am getting a "Too Many Open Files" Exception. I've read the FAQ about lowering the merge factor (currently set to 25), issuing a ulimit -n , etc... but I am still getting the "Too Many Open Files" Exception (yes... I'm making sure I close all writer/searchers/reader and I only have one open at a

Re: Modify search results

2007-07-03 Thread Chris Hostetter
: Question: how do I go about manipulating the search results? Is it possible : to "intercept" the listing of HTML pages returned by the Lucene search : function and modify the report it sends to the screen. : : Can this be as simple as adding a line to the Lucene Java code so that : instead of r

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Chris Hostetter
: "Lucene Download" as a query. I want something that strongly references : "Lucene" (in the title) and strongly references "Download" but "Download : Lucene" or "Lucene Project Download" are better than some page that : happens to contain the exact phrase. : : Other examples are "camera review" o

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
Hi Chris, That did it! Thanks for the help. I should have read the javadocs for Field.Index more closely! Thanks to everyone else for their input too. -- Joe Attardi [EMAIL PROTECTED] http://thinksincode.blogspot.com/ On 7/3/07, Chris Hostetter <[EMAIL PROTECTED]> wrote: It sounds like your

Auto Slop

2007-07-03 Thread Walt Stoneburner
I've solved the problem, thanks to tips from Mark Miller and Ard Schrijvers, and am simply recording it so that someone else walking through the archives might get some benefit. A while ago I had been working on a case-sensitive version of Lucene, where with a prefix symbol, it was possible to in

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Chris Hostetter
It sounds like your problem is that your id field is analyzed and as a result contains more than one token per document ... both the deleteDocument and updateDocument methods that take in a Term only remove documents that have that exact Term in them. You need to add your documents with the "id"
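The exact-term requirement described above can be illustrated without Lucene. The sketch below is a toy inverted index in plain Java. The `ToyIndex` class, its `analyze` method (lowercase, then split on non-alphanumerics), and the sample id are all hypothetical, and the tokenization is not meant to reproduce what StandardAnalyzer actually does. It only shows why deleting by a single exact term finds nothing once the id has been broken into several tokens.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy inverted index (not Lucene) showing why delete-by-term needs an un-analyzed id.
final class ToyIndex {
    // term -> ids of documents that contain exactly that term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final Set<Integer> liveDocs = new HashSet<>();
    private int nextDocId = 0;

    // Hypothetical analyzer: lowercase, then split on anything that is not a letter or digit.
    static List<String> analyze(String value) {
        List<String> tokens = new ArrayList<>();
        for (String t : value.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Adds a document whose id is either analyzed into tokens or kept as one exact term.
    int add(String id, boolean analyzed) {
        int doc = nextDocId++;
        liveDocs.add(doc);
        List<String> terms = analyzed ? analyze(id) : List.of(id);
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new HashSet<>()).add(doc);
        }
        return doc;
    }

    // Deletes every live document containing exactly this term; returns how many were removed.
    int deleteByTerm(String term) {
        int removed = 0;
        for (int doc : postings.getOrDefault(term, Set.of())) {
            if (liveDocs.remove(doc)) {
                removed++;
            }
        }
        return removed;
    }

    int liveCount() {
        return liveDocs.size();
    }

    public static void main(String[] args) {
        String id = "com.example.Widget:42"; // made-up id for illustration

        ToyIndex analyzedIndex = new ToyIndex();
        analyzedIndex.add(id, true);
        System.out.println("analyzed id, removed: " + analyzedIndex.deleteByTerm(id));

        ToyIndex exactIndex = new ToyIndex();
        exactIndex.add(id, false);
        System.out.println("un-analyzed id, removed: " + exactIndex.deleteByTerm(id));
    }
}
```

In this toy model the first index never contains the full id as a term, so the delete matches nothing and the old document stays live; the second index plays the role of an id field that is indexed as a single un-analyzed term.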

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
Update on my problem.. it looks like I am having the same problem with deleteDocuments that I am with updateDocument. Not sure why it's not working, though. I'm using the StandardAnalyzer, as I mentioned. Are there any other things that I might want to check that would keep this from working? Tha

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Grant Ingersoll
When you do an explain on these results, what are all the factors that contribute to the score? Could you increase the coord() factor in a custom Similarity implementation, to give a bigger boost to documents that have more matching terms? The point of coord is to give a little bump to tho

Re: Lucene Wiki Editing Guidelines

2007-07-03 Thread Grant Ingersoll
Sounds like a welcome addition! I don't know of any guidelines other than the general community ones about behaving nicely, don't spam, etc. :-) On Jul 3, 2007, at 2:24 PM, Renaud Waldura wrote: Regarding the Lucene Wiki, is there an editing policy or should I feel free to change stuff a

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Tim Sturge
That's true, but it's not clear that I want phrase matches. Consider for example: "Lucene Download" as a query. I want something that strongly references "Lucene" (in the title) and strongly references "Download" but "Download Lucene" or "Lucene Project Download" are better than some page that

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Jason Pump
You're not using any type of phrase search. Try -> ( (title:"John Bush"^4.0) OR (body:"John Bush") ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush) ) or maybe ( (title:"John Bush"~4^4.0) OR (body:"John Bush"~4) ) AND ( (title:John^4.0 body:John) AND (title:Bush^4.0 body:Bush

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
Hi Erick, I'm guessing that your problem is what gets indexed. What analyzer are you using when indexing? One that breaks words apart on, say, periods? I am using the StandardAnalyzer. When I do a test query using Luke, it returns the object I'm looking for. The query I use is: id:"com.mycomp

Re: IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Erick Erickson
I'm guessing that your problem is what gets indexed. What analyzer are you using when indexing? One that breaks words apart on, say, periods? The way to check this would be to get a copy of Luke and examine your index (or part thereof). Google (lucene luke). It'll help greatly. What is your evid

Lucene Wiki Editing Guidelines

2007-07-03 Thread Renaud Waldura
Regarding the Lucene Wiki, is there an editing policy or should I feel free to change stuff as I see fit? E.g. I've added a page LuceneCaveats, and now I want to edit http://wiki.apache.org/lucene-java/ConceptsAndDefinitions and add a "Core Classes" section, and refactor that page. --Renaud

Re: product based term combination for BooleanQuery?

2007-07-03 Thread Mike Klaas
Try out: http://issues.apache.org/jira/browse/LUCENE-850 If this is useful to you, be sure to add a comment to the issue. -Mike On 3-Jul-07, at 10:51 AM, Tim Sturge wrote: I'm following myself up here to ask if anyone has experience or code with a BooleanQuery that weights the terms it encou

IndexWriter.updateDocument(Term, Document) not removing old Document?

2007-07-03 Thread Joe Attardi
Hi everybody, First-time poster here. I've got a search index that I am using to index live Java objects. I create a Document with the appropriate fields and index them. No problem. I am indexing objects of different types, so I have an "id" field in each Document which consists of the object's c

Retrieve nearest token based off location in original Text

2007-07-03 Thread John Paul Sondag
Hi, I was wondering if it's possible to get the token offset based off the position in the original text. My problem is I'm working on my own "Snippet Generator" and I'm giving a token index (call it t) as input and need to make a snippet of the original text. I want the Snippet to be some numbe
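One possible way to map a character position in the original text to the nearest token is a binary search over token start offsets. The sketch below assumes the start offsets were already collected into a sorted array while the text was analyzed. The `NearestToken` class and the sample offsets are hypothetical, and this is not an existing Lucene API.

```java
import java.util.Arrays;

// Sketch: find the token closest to a character position, given sorted token start offsets.
final class NearestToken {

    // Returns the index of the token whose start offset is closest to charPos.
    // startOffsets must be non-empty and sorted ascending. Ties go to the earlier token.
    static int nearestTokenIndex(int[] startOffsets, int charPos) {
        int found = Arrays.binarySearch(startOffsets, charPos);
        if (found >= 0) {
            return found; // charPos is exactly the start of a token
        }
        int insertion = -found - 1; // index of the first offset greater than charPos
        if (insertion == 0) {
            return 0;
        }
        if (insertion == startOffsets.length) {
            return startOffsets.length - 1;
        }
        int distBefore = charPos - startOffsets[insertion - 1];
        int distAfter = startOffsets[insertion] - charPos;
        return distBefore <= distAfter ? insertion - 1 : insertion;
    }

    public static void main(String[] args) {
        // Start offsets of the four tokens in "the quick brown fox" (made-up sample text).
        int[] starts = {0, 4, 10, 16};
        System.out.println("nearest token to position 12: " + nearestTokenIndex(starts, 12));
    }
}
```

Once the nearest token index is known, a snippet window can be taken as a fixed number of tokens on either side of it, clamped to the bounds of the array.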

product based term combination for BooleanQuery?

2007-07-03 Thread Tim Sturge
I'm following myself up here to ask if anyone has experience or code with a BooleanQuery that weights the terms it encounters on a product basis rather than a sum basis. This would effectively compute the geometric mean of the term score (rather than the arithmetic mean) and would give me more
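To make the sum-versus-product distinction concrete, here is a minimal Java sketch. It does not use Lucene: the per-term scores are invented numbers and the `TermCombination` class is hypothetical. It only illustrates why a geometric mean favors documents where every term scores moderately over documents where one term scores enormously and another barely registers.

```java
import java.util.Arrays;

// Toy comparison of sum-based and product-based score combination.
// Not Lucene code: the scores below are invented for illustration.
final class TermCombination {

    // Arithmetic mean: proportional to what a sum-based combination produces.
    static double arithmeticMean(double[] scores) {
        return Arrays.stream(scores).sum() / scores.length;
    }

    // Geometric mean: n-th root of the product, computed via logarithms.
    // Assumes every score is strictly positive.
    static double geometricMean(double[] scores) {
        double logSum = 0.0;
        for (double s : scores) {
            logSum += Math.log(s);
        }
        return Math.exp(logSum / scores.length);
    }

    public static void main(String[] args) {
        double[] lopsided = {9.0, 0.01}; // huge score for one term, tiny for the other
        double[] balanced = {2.0, 2.0};  // moderate score for both terms

        System.out.printf("arithmetic: lopsided=%.3f balanced=%.3f%n",
                arithmeticMean(lopsided), arithmeticMean(balanced));
        System.out.printf("geometric:  lopsided=%.3f balanced=%.3f%n",
                geometricMean(lopsided), geometricMean(balanced));
    }
}
```

With these made-up numbers the arithmetic mean ranks the lopsided document first, while the geometric mean ranks the balanced document first, which is the behavior the question is after.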

Re: Pagination

2007-07-03 Thread mark harwood
>>and "n" searches to get the Documents, ??? Where does the "n" come in? searcher.doc(id) is not a search. It is a call to IndexReader.document() to retrieve a specific document. Try run it. It shouldn't be slow. - Original Message From: Alixandre Santana <[EMAIL PROTECTED]> To: java-

Re: Pagination

2007-07-03 Thread Alixandre Santana
Mark, Thanks for the code. Well... I'm doing the same thing you are: retrieve some doc IDs and then use the code - Document doc=searcher.doc(sd[i].doc) - to get the Document itself. But in this case, we are doing a search to get the IDs, and "n" searches to get the Documents, which is not a good

Re: Pagination

2007-07-03 Thread mcmoisei
It looks like we may have different cases. What I do is index my items prior to inserting them into the database. When I do a search I get the ids that have the best match and then look up the items from the database. So far it has worked just fine. I have 5000 rows of items and I think it will still work fi

Re: Pagination

2007-07-03 Thread mark harwood
>>I get the ids then I do look the items in the database using select item.* >>from item where item.id in ( ids ) Hmm. That's likely to confuse the already confused :) The ids referred to so far are Lucene internal document ids and are typically only meaningful to Lucene during a single IndexRea

Re: Lucene index in memcache

2007-07-03 Thread Chris Hostetter
: I have done some profiling, and it seems the response is slow when there : are long queries (more than 5-6 words per query). : The way I have implemented it is: I pass in the search query and Lucene : returns the total number of hits, along with ids. I then fetch objects : for only those ids, as

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Michael McCandless
"Patrick Kimber" <[EMAIL PROTECTED]> wrote: > I have been running the test for over an hour without any problem. > The index writer log file is getting rather large so I cannot leave > the test running overnight. I will run the test again tomorrow > morning and let you know how it goes. Ahhh, th

RE: Pagination

2007-07-03 Thread mcmoisei
I get the ids then I do look the items in the database using select item.* from item where item.id in ( ids ) -- Original message -- From: "Lee Li Bin" <[EMAIL PROTECTED]> > Hi, > > Thanks Mark! > > I do have the same question as Alixandre. How do I get the con

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber
Hi Michael I have been running the test for over an hour without any problem. The index writer log file is getting rather large so I cannot leave the test running overnight. I will run the test again tomorrow morning and let you know how it goes. Thanks again... Patrick On 03/07/07, Patrick K

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber
Hi Michael I am setting up the test with the "take2" jar and will let you know the results as soon as I have them. Thanks for your help Patrick On 03/07/07, Michael McCandless <[EMAIL PROTECTED]> wrote: OK I opened issue LUCENE-948, and attached a patch & new 2.2.0 JAR. Please make sure you u

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Michael McCandless
OK I opened issue LUCENE-948, and attached a patch & new 2.2.0 JAR. Please make sure you use the "take2" versions (they have added instrumentation to help us debug): https://issues.apache.org/jira/browse/LUCENE-948 Patrick, could you please test the above "take2" JAR? Could you also call Ind

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber
Hi Michael I am really pleased we have a potential fix. I will look out for the patch. Thanks for your help. Patrick On 03/07/07, Michael McCandless <[EMAIL PROTECTED]> wrote: "Patrick Kimber" <[EMAIL PROTECTED]> wrote: > I am using the NativeFSLockFactory. I was hoping this would have >

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Michael McCandless
"Patrick Kimber" <[EMAIL PROTECTED]> wrote: > I am using the NativeFSLockFactory. I was hoping this would have > stopped these errors. I believe this is not a locking issue and NativeFSLockFactory should be working correctly over NFS. > Here is the whole of the stack trace: > > Caused by: java

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Neeraj Gupta
I think you should get the "NFS, Lock obtain timed out" exception (that you mentioned in the subject line) instead of "java.io.FileNotFoundException:". Because if one server is holding the lock on the directory, then the other server will wait until the default lock timeout and will throw a timeout exception aft

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber
Hi I am using the NativeFSLockFactory. I was hoping this would have stopped these errors. Patrick On 03/07/07, Neeraj Gupta <[EMAIL PROTECTED]> wrote: Hi, this is the case where an index created by one server is updated by another server, which results in index corruption. This exception occurs while

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Neeraj Gupta
Hi, this is the case where an index created by one server is updated by another server, which results in index corruption. This exception occurs while creating an instance of the index writer, because at the time of index writer instance creation it checks whether the index exists or not; if you are not creating a new

Re: Lucene 2.2, NFS, Lock obtain timed out

2007-07-03 Thread Patrick Kimber
Hi I have added more logging to my test application. I have two servers writing to a shared Lucene index on an NFS partition... Here is the logging from one server... [10:49:18] [DEBUG] LuceneIndexAccessor closing cached writer [10:49:18] [DEBUG] ExpirationTimeDeletionPolicy onCommit() delete

Re: highlighting phrase query

2007-07-03 Thread Mark Miller
Has anyone used Lucene-794? How stable is it? Is it widely used in industry? I have used it extensively and I would say it is extremely stable. As I said, much of the code from it is literally the same compiled code from Contrib Highlighter (It is really just a new Scorer class for the