Filtering a SpanQuery

2008-05-06 Thread Eran Sevi
Hi, I am looking for a way to filter a SpanQuery according to some other query (on another field from the one used for the SpanQuery). I need to get access to the spans themselves of course. I don't care about the scoring of the filter results and just need the positions of hits found in the

Re: index corruption with latest lucene

2008-05-06 Thread Gopikrishnan Subramani
[ Sorry if I'm hijacking this thread, if you feel this error is unrelated to this thread, I'll move this to a separate thread. ] Even after upgrading to 2.3.1 I'm running into index corruption problems. I'm posting below the exception that is generated while searching. The stack trace looks like,

Re: index corruption with latest lucene

2008-05-06 Thread Michael McCandless
Are you using JRE 1.6.0_04 or 1.6.0_05? This sounds exactly the same as this: http://www.gossamer-threads.com/lists/lucene/java-user/59650 If it is the same issue, which seems to be a bug in the hotspot compiler, downgrading to JRE 1.6.0_03, or running Java with -Xbatch (forces

Re: index corruption with latest lucene

2008-05-06 Thread Michael McCandless
Could you provide more detail on how you hit these two exceptions? Are they reproducible from scratch (creating a new index)? Are you using multiple threads against IndexWriter? Is autoCommit true or false? Any prior exceptions hit? Do your documents have varying number/configuration

Re: index corruption with latest lucene

2008-05-06 Thread Gopikrishnan Subramani
Thanks Mike. Sorry, I should have mentioned that I'm using 1.6.0_04. I happened to look at the thread a while ago and used -Xbatch, but that didn't help, which made me think maybe it's a different issue. I'll try with -Xint before downgrading to 1.6.0_03 to be doubly sure. -Gopi On 5/6/08,

RE: lucene farsi problem

2008-05-06 Thread esra
Hi Steven, I tried the class and it works fine with the locale parameter ar. Actually we are using fa for Farsi and ar for Arabic. I have added a little control for the locale parameter in my code and now I can see the correct results. Thank you very much for your help. Esra.

Re: Filtering a SpanQuery

2008-05-06 Thread Paul Elschot
Eran, On Tuesday 06 May 2008 10:15:10, Eran Sevi wrote: Hi, I am looking for a way to filter a SpanQuery according to some other query (on another field from the one used for the SpanQuery). I need to get access to the spans themselves of course. I don't care about the scoring of the

How to make a query that associates 2 index files

2008-05-06 Thread Michael Siu
Hi, I am a newbie to Lucene. I have a question about making a query that associates 2 index files: - One index has the content index for a list of documents and a key to the document. That means the Lucene document of this index contains 2 fields: the 'content' and the 'key'. - another

Postcode/zipcode search

2008-05-06 Thread Chris Mannion
Hi all, I've got a bit of a niggling problem with how one of my searches is working as opposed to how my users would like it to work. We're indexing on UK postcodes, which are in the format of a 3 or 4 character area code followed by a 3 or 4 character street specific code, e.g. NW10 7NY or M11

Re: Multiple Field search

2008-05-06 Thread Erick Erickson
One of my favorite quotes from Roger Zelazny... postulating infinity, the rest is easy. In this case, infinity is how you break up your query. The easy part is making your search return what you want. Assuming you know that you want greatest and hits to go against the title field and beatles to

Re: How to make a query that associates 2 index files

2008-05-06 Thread Erick Erickson
You don't. You really have to roll your own solution here, there's no inter-index awareness that I know of in Lucene. Typically, people either do a half-half solution (that is, put the text search in Lucene and leave the DB parts in the DB) or de-normalize the data in a Lucene index so you don't

Re: Multiple Field search

2008-05-06 Thread Kelvin Foo Chuan Lyi
Thanks... that's what I thought of ... but was wondering if that was the best method to do so... i guess it is then... :) On Wed, May 7, 2008 at 12:32 AM, Erick Erickson [EMAIL PROTECTED] wrote: One of my favorite quotes from Roger Zelazny... postulating infinity, the rest is easy. In

Re: Postcode/zipcode search

2008-05-06 Thread Grant Ingersoll
You might have a look at using a phrase query when you have more than one term in the query, in addition to your term query, but giving the phrase query more weight (i.e. give an exact match more weight) and keep your original tokenization process. Something like: "NW10 7NY"^5 OR NW10 OR 7NY
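A minimal sketch of how the boosted query string suggested above could be assembled from user input. It only builds text in Lucene query-parser syntax and makes no Lucene calls; the class name PostcodeBoostQuery, the method build, and the boost parameter are hypothetical names introduced for illustration, and the sketch assumes the input contains no query-parser special characters that would need escaping.

```java
// Hypothetical helper, not part of Lucene: builds a query string of the form
//   "NW10 7NY"^5 OR NW10 OR 7NY
// so that an exact phrase match outweighs matches on the individual terms.
final class PostcodeBoostQuery {

    private PostcodeBoostQuery() {}

    static String build(String userInput, int phraseBoost) {
        String trimmed = userInput.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        String[] terms = trimmed.split("\\s+");
        if (terms.length == 1) {
            // A single term needs no phrase clause.
            return terms[0];
        }
        // Quoted phrase with a boost, followed by each term as an optional clause.
        String phrase = "\"" + String.join(" ", terms) + "\"^" + phraseBoost;
        return phrase + " OR " + String.join(" OR ", terms);
    }
}
```

The resulting string would then be handed to whatever query parser the application already uses, with the original tokenization left unchanged.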

Re: How to make a query that associates 2 index files

2008-05-06 Thread Chris Lu
No easy way unless you merge your 2 indexes into:
Index:
[who]    [accessed]  [key]  [content]
David    1/1/2007    Abc    blah blah 123 ...
Someone  1/2/2005    Abc    blah blah 123 ...
Guess    12/1/2000   Xyz

Re: Postcode/zipcode search

2008-05-06 Thread Erick Erickson
Have you looked at PrefixQuery? If that doesn't work for you, could you give a few more examples of expected inputs and outputs? Best Erick On Tue, May 6, 2008 at 12:28 PM, Chris Mannion [EMAIL PROTECTED] wrote: Hi all I've got a bit of a niggling problem with how one of my searches is

Re: Multiple Field search

2008-05-06 Thread Erick Erickson
Well, it's the one I'd use. Whether it's the best or not is...er...not so certain G. Erick On Tue, May 6, 2008 at 12:37 PM, Kelvin Foo Chuan Lyi [EMAIL PROTECTED] wrote: Thanks... that's what I thought of ... but was wondering if that was the best method to do so... i guess it is then... :)

RE: Postcode/zipcode search

2008-05-06 Thread Will Johnson
You could split up the field into 2 separate fields: Postcode:NW10 7NY -> post1:NW10 post2:7NY Then rewrite users' queries using the same logic: i.e. if they enter 1 term 'NW10' it gets rewritten to post1:NW10, if they enter 2 terms post1:NW10 AND post2:7NY. It also lets you do fuzzy search ie
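The rewriting rule described above can be sketched as a small helper. The field names post1 and post2 come from the message; the class name PostcodeQueryRewriter, the method rewrite, and the decision to reject input with more than two terms are assumptions made for illustration. The helper only produces a query string and does not touch any Lucene API.

```java
// Hypothetical helper, not part of Lucene: rewrites raw postcode input into a
// fielded query string, e.g. "NW10" -> post1:NW10 and
// "NW10 7NY" -> post1:NW10 AND post2:7NY.
final class PostcodeQueryRewriter {

    private PostcodeQueryRewriter() {}

    static String rewrite(String userInput) {
        String trimmed = userInput.trim();
        if (trimmed.isEmpty()) {
            throw new IllegalArgumentException("empty postcode");
        }
        String[] terms = trimmed.split("\\s+");
        switch (terms.length) {
            case 1:
                // Area code only: search the first field.
                return "post1:" + terms[0];
            case 2:
                // Full postcode: both parts must match.
                return "post1:" + terms[0] + " AND post2:" + terms[1];
            default:
                // Assumption: anything longer is not a postcode.
                throw new IllegalArgumentException("expected 1 or 2 terms: " + userInput);
        }
    }
}
```

The same split at the whitespace would be applied at indexing time, so that each document carries matching post1 and post2 values.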

Re: Postcode/zipcode search

2008-05-06 Thread AJ Weber
Maybe I'm oversimplifying it, and maybe this isn't what you desire, but... What about breaking the postcode into two (or three) different fields? Seems easy to parse on the ingestion-side, as you just break the string at the middle space. Then store postal_area, postal_street, and optionally

Re: Postcode/zipcode search

2008-05-06 Thread mark harwood
Can you not convert all postcodes to coordinates and do actual distance-based matching? You will have to pay Royal Mail or 3rd party suppliers to get hold of the PAF data required for this geocoding (despite having funded this already as a UK tax payer- g) Cheers Mark - Original

Are those runtime errors about the jdk, or lucene's jar, or my code?

2008-05-06 Thread crspan
-- OS: Linux lg99 2.6.5-7.276-smp #1 SMP Fri Sep 28 20:33:22 AKDT 2007 x86_64 x86_64 x86_64 GNU/Linux -- Lucene: 2.3.2 (tried 2.2.0 as well, since the index was built around 2.2.0, jdk1.6.0_01 ) -- JDK: Sun jdk1.6.0_06 ( from jdk-6u6-linux-x64.bin ) Sun jdk1.5.0_15 ( from

Re: Are those runtime errors about the jdk, or lucene's jar, or my code?

2008-05-06 Thread Michael McCandless
Hi, Could you run org.apache.lucene.index.CheckIndex on your index and post the result? Are these exceptions easily reproduced starting from scratch (new index)? More responses/questions below: crspan wrote: -- OS: Linux lg99 2.6.5-7.276-smp #1 SMP Fri Sep 28 20:33:22 AKDT 2007

RE: How to make a query that associates 2 index files

2008-05-06 Thread Michael Siu
My problem is: the [content] value can be huge. Duplicating it in more than one index document wastes disk space (and search time?). In addition, when new documents are added to the second index, it will be faster to just index the linked [content] once (in first index file) and any subsequent

Re: How to make a query that associates 2 index files

2008-05-06 Thread Erick Erickson
Sure, just include different fields in different docs in your index. Then, when you search, since each term is on a field, docs without that field are excluded from the search. But this is really not very different in terms of a solution from your earlier one. You still have the issue of searching

Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1

2008-05-06 Thread Marcelo Ochoa
Hi Lucene experts: I am working on upgrading the Lucene-Oracle integration project to the latest Lucene 2.3.1 code. After correcting a minor issue in the OJVMDirectory file implementation I have the integration running with the latest 2.3.1 code. But it only works with small indexes, I think index which are

Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1

2008-05-06 Thread Michael McCandless
Hi Marcelo, Hmmm something is not right. Somehow the byte slices, which DocumentsWriter uses to hold the postings in RAM, became corrupt. Is this easily reproduced? Mike Marcelo Ochoa wrote: Hi Lucene experts: I am working upgrading Lucene-Oracle integration project to latest Lucene

Re: Help to solve an issue when upgrading Lucene-Oracle integration to lucene 2.3.1

2008-05-06 Thread Marcelo Ochoa
Hi Mike: Well, the problem is consistent, but to test the code and the project an Oracle 11g database is necessary :( I don't know why the computation of the bufferUpto variable is wrong in the last step; during all other calls pool.buffers.length is consistently 366, so I assume that it's OK. And

RE: How to make a query that associates 2 index files

2008-05-06 Thread Michael Siu
Yes, there is a many-to-one mapping to the content index. And the size of the content data varies, say from 1K to multiple Gs. That is why it is not wise to repeat the same content in an index document. Thanks for telling me that the doc IDs are not constant. Yes, the keys to content are generated on the

Re: Are those runtime errors about the jdk, or lucene's jar, or my code?

2008-05-06 Thread crspan
Thanks so much, Mike. Those runtime errors were caused by one corrupted index, somehow corrupted during scp. It has nothing to do with Lucene 2.3.2. For those who come by this thread: please run CheckIndex. That would have saved me many hours of fruitless debugging. Cheers, Charlie Michael