Re: Newbie: Luke and fields

2009-09-04 Thread Ian Vink
Case? Hmm. I thought it was case insensitive. I will re-index and revert all to lower case and see. The language is stored as "English", not "english". Ian P.S. Here's what I built with a basic understanding of Lucene: http://BahaiResearch.com. It's open source, ad-free. It allows people in 20 langu

Re: Newbie: Luke and fields

2009-09-04 Thread Erick Erickson
Hm. Let's see the queries; query.toString() may give you some clues. I *suspect* that you really didn't index language. Did you, perhaps, not re-index all your docs? Or use an analyzer that didn't fold case when indexing but did when searching (or vice-versa)? It's *possible* that you've
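A minimal sketch of that kind of case mismatch, assuming the language field was indexed un-tokenized and queried through a lower-casing analyzer (field name, analyzer choice and the Lucene 2.9-era API are assumptions, not taken from the thread):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.RAMDirectory;

    public class CaseMismatchSketch {
        public static void main(String[] args) throws Exception {
            RAMDirectory dir = new RAMDirectory();
            IndexWriter w = new IndexWriter(dir, new StandardAnalyzer(), true,
                    IndexWriter.MaxFieldLength.UNLIMITED);
            Document doc = new Document();
            // Indexed but not tokenized: stored as the single term "English", case preserved.
            doc.add(new Field("language", "English", Field.Store.YES, Field.Index.NOT_ANALYZED));
            w.addDocument(doc);
            w.close();

            IndexSearcher searcher = new IndexSearcher(dir);
            // An exact-case TermQuery matches ...
            TopDocs hit = searcher.search(new TermQuery(new Term("language", "English")), 10);
            // ... but the lower-cased term, which is what an analyzing QueryParser
            // would produce from "language:English", does not.
            TopDocs miss = searcher.search(new TermQuery(new Term("language", "english")), 10);
            System.out.println(hit.totalHits + " vs " + miss.totalHits); // prints "1 vs 0"
            searcher.close();
        }
    }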

Re: Extending Sort/FieldCache

2009-09-04 Thread Erick Erickson
No, not sloth. Making use of the fine work that others have done in order to help get your product out the door faster/cheaper. As in "There's no virtue in re-inventing the wheel, no matter how productive it feels". Best Erick On Fri, Sep 4, 2009 at 12:19 PM, Shai Erera wrote: > Thanks Mi

Re: how can I merge indexes without deleting the original index?

2009-09-04 Thread Erick Erickson
OK, that makes sense. Note that Shashi's solution and mine are essentially identical. Shashi's code snippet, I think, pre-supposes that source and dest index directories are different. So copying your A/D index off some place else is functionally equivalent. I'm not sure what would happen if you h

Newbie: Luke and fields

2009-09-04 Thread Ian Vink
I have created an index and each document has a contents field and a language field. contents has the flags: Indexed, Tokenized, Stored, Vector. language has the flags: Indexed, Stored. In Luke I can search contents fine, but when I try to search the language field, I never ever get results. Every doc

Re: Best way to create own version of StandardTokenizer ?

2009-09-04 Thread Robert Muir
Paul, no problem. It is not fully functional right now (incomplete, bugs, etc.); the patch is kinda for reading only :) But if you have other similar issues on your project, feel free to post links to them on that JIRA ticket. This way we can look at what problems you have and, if appropriate, maybe they

Re: Best way to create own version of StandardTokenizer ?

2009-09-04 Thread Paul Taylor
Robert Muir wrote: Paul, thanks for the examples. In my opinion, only one of these is a tokenizer problem :) None of these will be affected by a Unicode upgrade. Thanks for taking the time to write that response; it will take me a bit of time to understand all this because I've only ever used Lucene

Re: Best way to create own version of StandardTokenizer ?

2009-09-04 Thread Robert Muir
Paul, thanks for the examples. In my opinion, only one of these is a tokenizer problem :) None of these will be affected by a Unicode upgrade. > Things like: > > http://bugs.musicbrainz.org/ticket/1006 In this case, it appears you want to do script conversion, and it appears from the ticket you a

Re: Best way to create own version of StandardTokenizer ?

2009-09-04 Thread Paul Taylor
Robert Muir wrote: On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor wrote: I submitted this https://issues.apache.org/jira/browse/LUCENE-1787 patch to StandardTokenizerImpl, understandably it hasn't been incorporated into Lucene (yet) but I need it for the project I'm working on. So would you reco

Re: Extending Sort/FieldCache

2009-09-04 Thread Shai Erera
Thanks Mike. I did not phrase my understanding of Cache reload well. I didn't mean literally as part of the reopen, but *because* of the reopen. Because FieldCache is tied to an IndexReader instance, after a reopen it gets refreshed. If I keep my own Cache, I'll need to code that logic, and I prefer

Re: Best way to create own version of StandardTokenizer ?

2009-09-04 Thread Robert Muir
On Fri, Sep 4, 2009 at 11:18 AM, Paul Taylor wrote: > I submitted this https://issues.apache.org/jira/browse/LUCENE-1787 patch to > StandardTokenizerImpl, understandably it hasn't been incorporated into > Lucene (yet) but I need it for the project I'm working on. So would you > recommend keeping the

Re: how can I merge indexes without deleting the original index?

2009-09-04 Thread Francisco Borges
Hello, Many thanks for the sample. I've already written a proof of concept with it. Cheers, Francisco On Sep 4, 2009 3:53 PM, "Shashi Kant" wrote: Here is some code to help you along. This should leave the source indices intact and merge them into a destination. //the index t

Best way to create own version of StandardTokenizer ?

2009-09-04 Thread Paul Taylor
I submitted this https://issues.apache.org/jira/browse/LUCENE-1787 patch to StandardTokenizerImpl; understandably it hasn't been incorporated into Lucene (yet), but I need it for the project I'm working on. So would you recommend keeping the same class name, and just putting in the classpath befo
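One alternative to shadowing the stock class on the classpath is to keep the patched copy under your own package and wire it into your own Analyzer. A minimal sketch, assuming a hypothetical MyStandardTokenizer holding the patched grammar from LUCENE-1787 (the class, package and filter choices are illustrative, not taken from the thread):

    package com.example.analysis; // hypothetical package for the patched copy

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;

    public final class PatchedStandardAnalyzer extends Analyzer {
        @Override
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // MyStandardTokenizer is the locally maintained copy of the patched
            // StandardTokenizer/StandardTokenizerImpl, renamed so it cannot clash
            // with the class shipped inside the Lucene jar.
            TokenStream stream = new MyStandardTokenizer(reader);
            stream = new LowerCaseFilter(stream);
            return stream;
        }
    }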

Re: how can I merge indexes without deleting the original index?

2009-09-04 Thread Francisco Borges
Hello Erick, On Fri, Sep 4, 2009 at 3:26 PM, Erick Erickson wrote: > Sure, copy them first to some other directory > We might have something more helpful if you'd tell us *why* you want to do > this? What problem are you trying to solve? Because having two copies of > your index in the same d

Re: how can I merge indexes without deleting the original index?

2009-09-04 Thread Shashi Kant
Here is some code to help you along. This should leave the source indices intact and merge them into a destination. //the index to hold our merged index IndexWriter iw = new IndexWriter(dest, new StandardAnalyzer(), true); String[] sourceIndices;
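A fuller sketch along the same lines, assuming a Lucene 2.9-era API; the directory paths and the addIndexesNoOptimize call are illustrative, since the original snippet is truncated:

    import java.io.File;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class MergeIndexes {
        public static void main(String[] args) throws Exception {
            // The destination index; the source directories are only read, never modified.
            Directory dest = FSDirectory.open(new File("/path/to/merged"));
            IndexWriter iw = new IndexWriter(dest, new StandardAnalyzer(), true,
                    IndexWriter.MaxFieldLength.UNLIMITED);

            Directory[] sources = new Directory[] {
                    FSDirectory.open(new File("/path/to/indexA")),
                    FSDirectory.open(new File("/path/to/indexB")),
            };

            // Copies the segments of the source indexes into the destination writer.
            iw.addIndexesNoOptimize(sources);
            iw.optimize();
            iw.close();
        }
    }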

Re: how can I merge indexes without deleting the original index?

2009-09-04 Thread Erick Erickson
Sure, copy them first to some other directory. We might have something more helpful if you'd tell us *why* you want to do this. What problem are you trying to solve? Because having two copies of your index in the same directory doesn't sound very safe. Best Erick On Fri, Sep 4, 2009 at 5:53 A

Re: Filtering question/advice

2009-09-04 Thread Amin Mohammed-Coleman
Hi, Apologies for resending this email but just wondering if I could get some input on the below. I am in the final stages of getting a proof of concept together and this is the final piece of the puzzle. Sorry again for sending this! Cheers Amin On Fri, Sep 4, 2009 at 10:38 AM, Amin Mohammed-

Re: too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-04 Thread Ganesh
I am closing the readers when not in use. I tried a test where I explicitly did not close the reader and found that the file is not actually deleted and remains on disk. I remember reading this information. In my case, readers and searchers are closed and the files no longer exist on disk, but /p

RE: New "Stream closed" exception with Java 6

2009-09-04 Thread Chris Bamford
Hi Grant > Seems like something is closing your InputStreamReader out from under you. > Is there a concurrency issue, perhaps? I'm coming to the same conclusion - there must be >1 threads accessing this index at the same time. Better go figure it out ... :-) Thanks again - Chris Chris Bam

how can I merge indexes without deleting the original index?

2009-09-04 Thread Francisco Borges
Hello everyone, As I understood it, merging indexes will lead to the deletion of the original indexes. Is there a way to merge indexes while keeping the original indexes intact? Kind regards, -- Francisco

RE: too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-04 Thread Uwe Schindler
Yes, you are doing the reopen right, but my question was: in your code "// reOpen", which is not visible, do you close the old reader after reopen? If you do not do this, it stays open forever. This is what I suggested in my example. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://w

Re: too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-04 Thread Ganesh
I am doing it the following way: if (!reader.isCurrent()) { // reOpen } I tried debugging, and my log shows the correct reference count. Any other ideas? Regards Ganesh - Original Message - From: "Uwe Schindler" To: Sent: Friday, September 04, 2009 1:12 PM Subject: RE: too many file descript

Re: Filtering question/advice

2009-09-04 Thread Amin Mohammed-Coleman
Hi, I have included a test case to show what I am trying to do. Test case number 3 fails. Thanks Amin On Fri, Sep 4, 2009 at 10:17 AM, Amin Mohammed-Coleman wrote: > Hi, > > I am looking at applying a security filter for our Lucene documents and I > was wondering if I could get feedback on whether the so

Re: Extending Sort/FieldCache

2009-09-04 Thread Michael McCandless
On Fri, Sep 4, 2009 at 12:33 AM, Shai Erera wrote: > 1) Refactor the FieldCache API (and TopFieldCollector) such that one can > provide its own Cache of native values. I'd hate to rewrite the > FieldComparators logic just because the current API is not extendable. That > I agree should be quite st

Filtering question/advice

2009-09-04 Thread Amin Mohammed-Coleman
Hi, I am looking at applying a security filter for our Lucene documents and I was wondering if I could get feedback on the solution I have come up with. First I will explain the scenario, followed by the proposed solution: We have a concept of a Layer, which is a project whereby a br
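One common way to express this kind of restriction at search time is a cached filter over the values the user is allowed to see. A minimal sketch against the Lucene 2.9-era API, assuming a hypothetical "layer" field storing the layer id each document belongs to (the field name and the list of allowed layers are assumptions, not taken from the message):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.CachingWrapperFilter;
    import org.apache.lucene.search.Filter;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.QueryWrapperFilter;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;

    public class SecurityFilterSketch {
        /** Builds a filter that only lets through documents in one of the user's layers. */
        static Filter buildLayerFilter(String[] allowedLayers) {
            BooleanQuery allowed = new BooleanQuery();
            for (String layer : allowedLayers) {
                allowed.add(new TermQuery(new Term("layer", layer)), BooleanClause.Occur.SHOULD);
            }
            // Caching avoids recomputing the allowed doc set for every search.
            return new CachingWrapperFilter(new QueryWrapperFilter(allowed));
        }

        static TopDocs search(IndexSearcher searcher, Query userQuery, Filter securityFilter)
                throws java.io.IOException {
            // The filter constrains the result set without changing how the query is scored.
            return searcher.search(userQuery, securityFilter, 10);
        }
    }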

Re: Use of tika for parsing, offsets questions

2009-09-04 Thread David Causse
On Thu, Sep 03, 2009 at 03:07:18PM +0200, Jukka Zitting wrote: > Hi, > > On Wed, Sep 2, 2009 at 2:40 PM, David Causse wrote: > > If I use Tika for parsing HTML code and inject the parsed String into a Lucene > > analyzer, what about the offset information for KWIC and return to text > > (like the Google

Doubt about FieldCache.DEFAULT.getStrings[] and FieldCache.DEFAULT.getStringIndex

2009-09-04 Thread Marc Sturlese
Hey there, I am iterating over a DocSet and for every id I need to get the value of a field which is analyzed with KeywordAnalyzer and is not stored. I have noticed two ways of doing it using FieldCache. Can someone please explain the pros and cons of using one or the other? Using StringIndex:
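For reference, the two lookups being compared, in a minimal sketch against the Lucene 2.x/3.x FieldCache API (the field name "myField" is illustrative):

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.search.FieldCache;

    public class FieldCacheLookups {
        static void show(IndexReader reader, int docId) throws java.io.IOException {
            // Option 1: one String per document (null where the field is missing).
            String[] byDoc = FieldCache.DEFAULT.getStrings(reader, "myField");
            String v1 = byDoc[docId];

            // Option 2: StringIndex keeps a single sorted lookup table of distinct
            // values plus a per-document ordinal, so repeated values are not duplicated.
            FieldCache.StringIndex idx = FieldCache.DEFAULT.getStringIndex(reader, "myField");
            String v2 = idx.lookup[idx.order[docId]];

            System.out.println(v1 + " / " + v2); // both resolve to the same value
        }
    }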

RE: too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-04 Thread Uwe Schindler
One general trap with reopen(): reopen() returns a *new* IndexReader. If this new IndexReader is different from the current one, you have to close the old reader when you are finished working with it. If you only have one thread working on this IndexReader that is reopened, you can close the old reade
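The idiom being described, in a minimal single-threaded sketch (the helper name is made up; the reopen/close pattern itself follows the IndexReader.reopen() contract):

    import java.io.IOException;
    import org.apache.lucene.index.IndexReader;

    public class ReopenExample {
        /** Returns an up-to-date reader, closing the old one if reopen() replaced it. */
        static IndexReader refresh(IndexReader reader) throws IOException {
            IndexReader newReader = reader.reopen();
            if (newReader != reader) {
                // reopen() returned a different instance: close the old reader so its
                // file handles are released (safe only if no other thread still uses it).
                reader.close();
                reader = newReader;
            }
            return reader;
        }
    }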

Re: too many file descriptors opened by Lucene shows (deleted) in /proc

2009-09-04 Thread Ganesh
I have only one process using the Lucene index. The same process writes and reads. I do re-open the IndexReader. I am maintaining a ref count for each reader/searcher and closing it when it is no longer used. I am not able to understand why the file descriptors are showing as (deleted). I'm guessing some issue? Co

Re: First result in the group

2009-09-04 Thread Mark Harwood
>>It removes the duplicates at query time and not in the results. Not sure I understand that statement. Do you mean you want index-time rejection of potentially duplicate inserts? On 4 Sep 2009, at 07:01, Ganesh wrote: It removes the duplicates at query time and not in the results. --