Multiple Languages with Lucene (Arabic English)

2007-07-24 Thread Elie Choueiri
Hi I'm new to searching and am trying to use Lucene to search English Arabic documents. I've got a bunch of questions (hopefully you'll find some interesting!) and am hoping someone's gone through some of them and has some answers for me! First, do I have to worry about the Arabic

Re: Search for null

2007-07-24 Thread daniel rosher
Perhaps you can use a filter in the following way. -Create a filter (via QueryFilter) that would contain all documents that do not have null values for the field -flip the bits of the filter so that it now contains documents that have null values for the field -Use the filter in conjunction with
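
The bit-flipping step above can be sketched with a plain java.util.BitSet, which is the type that Filter.bits(IndexReader) returned in Lucene 2.x as far as I recall. This is only an illustration under assumptions: the class name, the doc IDs and maxDoc are made up, and no Lucene classes are used. Note that a flipped bit set would also mark slots of deleted documents, which a real filter might need to mask out.

```java
import java.util.BitSet;

// Sketch: find documents with no value for a field by inverting a bit set.
// Doc IDs and maxDoc are made-up illustration data, not taken from the thread.
class NullFieldFilterSketch {

    // Returns the doc IDs in [0, maxDoc) that are NOT set in hasValue.
    static BitSet docsMissingField(BitSet hasValue, int maxDoc) {
        BitSet missing = (BitSet) hasValue.clone(); // leave the input untouched
        missing.flip(0, maxDoc);                    // invert only valid doc IDs
        return missing;
    }

    public static void main(String[] args) {
        int maxDoc = 6;
        BitSet hasValue = new BitSet(maxDoc);
        hasValue.set(0);
        hasValue.set(2);
        hasValue.set(3);
        System.out.println(docsMissingField(hasValue, maxDoc)); // {1, 4, 5}
    }
}
```

Cloning before flipping keeps the original "has a value" bits reusable, which matters if the same filter is cached and shared between searches.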

Re: Multiple Languages with Lucene (Arabic English)

2007-07-24 Thread Grant Ingersoll
On Jul 24, 2007, at 3:21 AM, Elie Choueiri wrote: Hi I'm new to searching and am trying to use Lucene to search English Arabic documents. I've got a bunch of questions (hopefully you'll find some interesting!) and am hoping someone's gone through some of them and has some answers

Re: Search for null

2007-07-24 Thread testn
Would it be more efficient to create an additional inverted field where I assign a value to that field only when the field I would like to search is NULL? daniel rosher wrote: Perhaps you can use a filter in the following way. -Create a filter (via QueryFilter) that would contain all
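
The alternative raised here, writing a marker into an extra field whenever the field of interest is empty, might look roughly like the toy inverted index below. Everything in it is hypothetical (the class, the "_missing:true" marker convention, the sample fields); it only shows the idea that a null check becomes an ordinary term lookup, and says nothing about which approach is faster in a real index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index (not Lucene): a marker term is posted at index time
// for every document that has no value in the watched field.
class NullMarkerIndexSketch {
    // term ("field:value") -> IDs of the documents containing it
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int nextDocId = 0;

    int addDocument(Map<String, String> fields, String watchedField) {
        int docId = nextDocId++;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (e.getValue() != null) {
                post(e.getKey() + ":" + e.getValue(), docId);
            }
        }
        // Hypothetical convention: "<field>_missing:true" marks an empty field.
        if (fields.get(watchedField) == null) {
            post(watchedField + "_missing:true", docId);
        }
        return docId;
    }

    private void post(String term, int docId) {
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docId);
    }

    List<Integer> search(String term) {
        return postings.getOrDefault(term, new ArrayList<>());
    }

    public static void main(String[] args) {
        NullMarkerIndexSketch index = new NullMarkerIndexSketch();
        Map<String, String> withAuthor = new HashMap<>();
        withAuthor.put("title", "a");
        withAuthor.put("author", "smith");
        Map<String, String> withoutAuthor = new HashMap<>();
        withoutAuthor.put("title", "b");
        index.addDocument(withAuthor, "author");
        index.addDocument(withoutAuthor, "author");
        System.out.println(index.search("author_missing:true")); // [1]
    }
}
```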

Re: Search for null

2007-07-24 Thread Yonik Seeley
On 7/24/07, daniel rosher [EMAIL PROTECTED] wrote: Perhaps you can use a filter in the following way. -Create a filter (via QueryFilter) that would contain all documents that do not have null values for the field -flip the bits of the filter so that it now contains documents that have null

ArrayIndexOutOfBoundsException on TermScorer

2007-07-24 Thread Rafael Rossini
Hello all, I'm using Solr in an app, but I'm getting an error that might be a Lucene problem. When I perform a simple query like q = brasil I'm getting this exception: java.lang.ArrayIndexOutOfBoundsException: 1226511 at org.apache.lucene.search.TermScorer.score(TermScorer.java:74) at

Re: Multiple Languages with Lucene (Arabic English)

2007-07-24 Thread Erick Erickson
You'll also find lots of discussion about indexing multiple languages if you search the mail archive for things like multiple language. I think one thing you're missing is that Lucene indexes data however you tell it to. You have both total control over and total responsibility for how things

Re: Search for null

2007-07-24 Thread Erick Erickson
Nobody can answer that question; you have to test in your particular situation. Filters are very efficient to use once created, can be created once and used often, etc. Adding a special value to stand for an empty field is conceptually simple, and queries are straightforward. Unless you can

Re: ArrayIndexOutOfBoundsException on TermScorer

2007-07-24 Thread Rafael Rossini
I don't know the exact date of the build, but it is certainly before July 4, and before the LUCENE-843 patch was committed. My index has 1.119.934 docs in it and is about 8.2G. I really don't know how to reproduce this; the only query that gives me this error, so far, is brasil... and I don't know

Re: Search for null

2007-07-24 Thread Jay Yu
daniel rosher wrote: Perhaps you can use a filter in the following way. -Create a filter (via QueryFilter) that would contain all documents that do not have null values for the field Interesting: what does the QueryFilter look like? Isn't it just as hard as finding out what docs have the null

Re: Lucene 2.2 + Not Merging Segments

2007-07-24 Thread Harini Raghavan
I figured out the problem. The issue had nothing to do with Lucene 2.2. I had accidentally reset the default mergeFactor to 1000. This was the reason it was not merging the segments. With the default mergeFactor, the indexing is working perfectly fine. Thanks, Harini On 7/24/07, Michael

Re: ArrayIndexOutOfBoundsException on TermScorer

2007-07-24 Thread Rafael Rossini
I did a little debugging and found that in the TermScorer, the byte[] norms has size = 1.119.933, which is the number of docs in my index, and there is a docID = 1226511, that is, if the doc variable in the method is the docID. I tried to access this document with reader.document() and got a *

Re: ArrayIndexOutOfBoundsException on TermScorer

2007-07-24 Thread Yonik Seeley
On 7/24/07, Rafael Rossini [EMAIL PROTECTED] wrote: I did a little debugging and found that in the TermScorer, the byte[] norms has size = 1.119.933, which is the number of docs in my index, and there is a docID = 1226511, that is, if the doc variable in the method is the docID. I tried to access this

Re: ArrayIndexOutOfBoundsException on TermScorer

2007-07-24 Thread Rafael Rossini
Got it. I don't have a clue if this corruption was caused by hardware failure, but that is possible because we suffer a lot of power failures from time to time. But the thing is that I've been using Lucene for a long time and I never got this kind of exception. The thing is that I'd

Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Hey Guys, I just finished up using Lucene in my application. I have data in a database, so while indexing I extract this data from the database and pump it into the index. Specifically, I have the following data in the index: itemID tags title summary contents where itemID is just a number

FieldCache for Search

2007-07-24 Thread Askar Zaidi
Hey Guys, From what I understand, FieldCache is used to store only the field required for search. I am using a Document object and then using doc.get(item). One of my fields is HUGE, so using Document will slow things down. How can I use FieldCache? An example? Thanks, AZ

Lucene and Eastern languages (Japanese, Korean and Chinese)

2007-07-24 Thread Shaw, James
Hi, guys, I found Analyzers for Japanese, Korean and Chinese, but not stemmers; the Snowball stemmers only include European languages. Does stemming not make sense for ideograph-based languages (i.e., no stemming is needed for Japanese, Korean and Chinese)? Also for spell checking, does the

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Grant Ingersoll
Where are you getting your numbers from? That is, where are your timers? Are you timing the rs.next() loop, or the individual calls to Lucene? What do the getX methods look like? How big are your queries? How big is your index? Essentially, we need more info to really help you.

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Thanks for the reply. I am timing the entire search process with a stop watch, a bit ghetto style. My getXXX methods are: Document doc = hits.doc(i); String str = doc.get(item); So you can see that I am retrieving the entire document in a search query. Ideally, I'd like to just retrieve the

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Can someone please tell me how to cache results in Lucene? I know the classes, but I don't know how to go about it. Thanks, Askar On 7/24/07, Askar Zaidi [EMAIL PROTECTED] wrote: Thanks for the reply. I am timing the entire search process with a stop watch, a bit ghetto style. My getXXX

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Grant Ingersoll
Sorry, I mistyped. I don't mean the get methods, I mean the doTagSearch, doTitleSearch, etc. As for the stop watch, not really sure what to make of that... Try System.currentTimeMillis()... You can get just the fields you want when loading a Document by using the FieldSelector API
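
A rough sketch of the System.currentTimeMillis() suggestion: wrap each search stage in a small timing helper so the slow call can be identified. The helper, the stage names and the simulated work are all placeholders invented for illustration; in the real application each lambda would wrap a call such as doTagSearch or doBodySearch.

```java
// Sketch: time individual stages instead of the whole request.
class StageTimerSketch {

    // Runs the stage once and returns the elapsed wall-clock milliseconds.
    static long timeMillis(Runnable stage) {
        long start = System.currentTimeMillis();
        stage.run();
        return System.currentTimeMillis() - start;
    }

    // Stand-in for a real search call; it only sleeps.
    static void simulateWork(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        long tagMs = timeMillis(() -> simulateWork(20));
        long bodyMs = timeMillis(() -> simulateWork(60));
        System.out.println("tag search:  " + tagMs + " ms");
        System.out.println("body search: " + bodyMs + " ms");
    }
}
```

Timing each stage separately, rather than the whole loop, is what makes it possible to tell database time apart from Lucene time.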

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
I ran some tests and it seems that the slowness is from Lucene calls when I do doBodySearch. If I remove that call, Lucene gives me results in 5 seconds; otherwise it takes about 50 seconds. But I need to do Body search and that field contains lots of text. The field is contents. How can I

Re: Fine Tuning Lucene implementation

2007-07-24 Thread N. Hira
Could you show us the relevant source from doBodySearch()? -h On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote: I ran some tests and it seems that the slowness is from Lucene calls when I do doBodySearch. If I remove that call, Lucene gives me results in 5 seconds; otherwise it takes

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Sure. public float doBodySearch(Searcher searcher,String query, int id){ try{ score = search(searcher, query,id); } catch(IOException io){} catch(ParseException pe){}

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Mark Miller
Are you sure you are using the same Searcher for every search? Don't open a new one unless you have modified the index. You are iterating over every hit with the Hits class. You don't ever want to do this. Use a HitCollector if you want to iterate over more than a hundred or so hits. You will

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Grant Ingersoll
Inline below On Jul 24, 2007, at 8:14 PM, Askar Zaidi wrote: Sure. public float doBodySearch(Searcher searcher,String query, int id){ try{ score = search(searcher, query,id); } catch(IOException

Re: Fine Tuning Lucene implementation

2007-07-24 Thread N. Hira
I'm no expert on this (so please accept the comments in that context) but 2 things seem weird to me: 1. Iterating over each hit is an expensive proposition. I've often seen people recommending a HitCollector. 2. It seems that doBodySearch() is essentially saying, do this search and return the

What replaced org.apache.lucene.document.Field.Text?

2007-07-24 Thread Lindsey Hess
I'm trying to get some relatively old Lucene code to compile (please see below), and it appears that Field.Text has been deprecated. Can someone please suggest what I should use in its place? Thank you. Lindsey public static void main(String args[]) throws Exception

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Hey Hira, Thanks so much for the reply. Much appreciated. Quote: Would it be possible to just include a query clause? - i.e., instead of just contents:userQuery, also add +id:idWeCareAbout How can I do that? I see my query as: +contents:harvard +contents:business +contents:review
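
One possible way to add such a clause is at the query-string level, before the text reaches the query parser. The sketch below only builds the string; the helper name, the field name "id" and the value 42 are assumptions for illustration, and it presumes the ID field is indexed so that a term like id:42 can match.

```java
// Sketch: restrict a user query to one item by adding a required clause.
class RestrictedQuerySketch {

    // Wraps the user query in a required group and appends +field:value.
    static String restrictToItem(String userQuery, String idField, int itemId) {
        return "+(" + userQuery.trim() + ") +" + idField + ":" + itemId;
    }

    public static void main(String[] args) {
        String userQuery = "+contents:harvard +contents:business +contents:review";
        System.out.println(restrictToItem(userQuery, "id", 42));
        // +(+contents:harvard +contents:business +contents:review) +id:42
    }
}
```

Grouping the original query in parentheses keeps its own operators intact, so the added clause narrows the result set without changing how the content terms combine.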

RE: What replaced org.apache.lucene.document.Field.Text?

2007-07-24 Thread Liu_Andy2
Please refer to "How do I get code written for Lucene 1.4.x to work with Lucene 2.x?" http://wiki.apache.org/lucene-java/LuceneFAQ#head-86d479476c63a2579e867b75d4faa9664ef6cf4d Andy -Original Message- From: Lindsey Hess [mailto:[EMAIL PROTECTED] Sent: Wednesday, July 25, 2007 12:31 PM