Lucene shows parts of search query as a HIT

2007-07-18 Thread Askar Zaidi
Hey folks, I am a new Lucene user. I used the following after indexing: search(searcher, "W. Chan Kim"); Lucene showed me hits for documents where the word "channel" existed. Notice that "Chan" is a part of "Channel". How do I stop this? I am keen to find the exact word. I used the following, b

Re: Lucene shows parts of search query as a HIT

2007-07-18 Thread Askar Zaidi
how queries parse using various analyzers. It's an invaluable tool... Best Erick On 7/18/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > Hey folks, > > I am a new Lucene user , I used the following after indexing: > > search(searcher, "W. Chan Kim"); > >

Re: Lucene shows parts of search query as a HIT

2007-07-19 Thread Askar Zaidi
> I started using Lucene yesterday, so I am fairly new! > > > > thanks > > AZ > > > > On 7/18/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > > > > > Are you sure that the hit wasn't on "w" or "kim"? The

Re: Lucene shows parts of search query as a HIT

2007-07-19 Thread Askar Zaidi
> > > > > Are you sure that the hit wasn't on "w" or "kim"? The > > > default for searching is OR... > > > > > > I recommend that you get a copy of Luke (google lucene luke) > > > which allows you to examine your index as well
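The point about OR being the default can be illustrated with a small stand-alone sketch. This is a toy model of whole-term matching, not Lucene code: the class `TermMatchSketch` and its methods are hypothetical names, and the tokenizer is only a rough approximation of what a simple analyzer does. It shows that "chan" never matches "channel" as a term, yet the document is still a hit under OR semantics because "kim" matches.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Toy model of term matching; this is NOT Lucene code.
class TermMatchSketch {
    // Split text into lowercase word tokens (a rough stand-in for an analyzer).
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // OR semantics: at least one query term appears as a whole token.
    static boolean matchesAny(String query, String doc) {
        Set<String> docTerms = tokens(doc);
        for (String q : tokens(query)) {
            if (docTerms.contains(q)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every query term appears as a whole token.
    static boolean matchesAll(String query, String doc) {
        return tokens(doc).containsAll(tokens(query));
    }

    public static void main(String[] args) {
        String query = "W. Chan Kim";
        String doc = "Kim changed the channel";
        System.out.println(matchesAny(query, doc)); // true: "kim" matches
        System.out.println(matchesAll(query, doc)); // false: "w" and "chan" are absent
    }
}
```

In Lucene itself, the switch between the two behaviours is the QueryParser.setDefaultOperator call mentioned later in this thread.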

Re: Lucene shows parts of search query as a HIT

2007-07-20 Thread Askar Zaidi
} This shows me the item value. Now I wanna see the score related to this item, how do I get that? thanks, AZ On 7/19/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > QueryParser.setDefaultOperator > > On 7/19/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > >

Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Hey Guys, I just finished up using Lucene in my application. I have data in a database , so while indexing I extract this data from the database and pump it into the index. Specifically , I have the following data in the index: where itemID is just a number (primary key in the DB) tags : te

FieldCache for Search

2007-07-24 Thread Askar Zaidi
Hey Guys, From what I understand, FieldCache is used to store only the field required for search. I am using a Document object and then using doc.get("item"). One of my fields is HUGE, so using Document will slow things down. How can I use FieldCache? An example? thanks, AZ
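The idea behind the question can be sketched without Lucene: keep one small array of field values indexed by document number, so a lookup never touches the huge field. The class `FieldCacheSketch` and the `StoredDoc` type below are hypothetical and only model the concept. In Lucene 2.x the comparable call is, as far as I recall, FieldCache.DEFAULT.getStrings(reader, "item"), which expects an indexed, untokenized field; check the javadoc for your version before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a per-document field cache; this is NOT Lucene's FieldCache.
class FieldCacheSketch {
    // Hypothetical stored document: one small field and one huge field.
    static final class StoredDoc {
        final String item;
        final String body;

        StoredDoc(String item, String body) {
            this.item = item;
            this.body = body;
        }
    }

    // One entry per document number, holding only the small field.
    private final String[] itemByDocId;

    // Built once (for example when the index is opened), then reused for every search.
    FieldCacheSketch(List<StoredDoc> docs) {
        itemByDocId = new String[docs.size()];
        for (int i = 0; i < docs.size(); i++) {
            itemByDocId[i] = docs.get(i).item;
        }
    }

    // Array lookup by document number; the huge body field is never loaded.
    String item(int docId) {
        return itemByDocId[docId];
    }

    public static void main(String[] args) {
        List<StoredDoc> docs = new ArrayList<>();
        docs.add(new StoredDoc("101", "...very large text..."));
        docs.add(new StoredDoc("102", "...very large text..."));
        FieldCacheSketch cache = new FieldCacheSketch(docs);
        System.out.println(cache.item(1)); // 102
    }
}
```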

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
class machine ? I have also done some of the optimizations that are mentioned on the Lucene website. thanks, AZ On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > Hey Guys, > > I just finished up using Lucene in my application. I have data in a > database , so while indexin

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
kly, I'm surprised your slowdown is only linear. > > On Jul 24, 2007, at 4:31 PM, Askar Zaidi wrote: > > > I have 512MB RAM allocated to JVM Heap. If I double my system RAM > > from 768MB > > to say 2GB or so, and give JVM 1.5GB Heap space, will I get quicker > >

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Can someone please tell me how to cache results in Lucene ? I know the classes, but I don't know how to go about it. thanks, Askar On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > Thanks for the reply. > > I am timing the entire search process with a stop watch, a
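One generic way to cache results, independent of any Lucene class, is a small least-recently-used map keyed by the query string. The sketch below is an assumption about what "caching results" could mean here, not the approach the list recommended; `QueryResultCacheSketch` and the `searcher` function are hypothetical names, and the cached value is simply a list of item IDs.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Minimal LRU cache of query string -> result list; not a Lucene class.
class QueryResultCacheSketch {
    private final Map<String, List<String>> cache;
    private int misses = 0;

    QueryResultCacheSketch(final int maxEntries) {
        // accessOrder=true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Return cached results, or run the (hypothetical) searcher once and remember its answer.
    synchronized List<String> search(String query, Function<String, List<String>> searcher) {
        List<String> results = cache.get(query);
        if (results == null) {
            misses++;
            results = searcher.apply(query);
            cache.put(query, results);
        }
        return results;
    }

    // Number of times the underlying searcher was actually called.
    synchronized int misses() {
        return misses;
    }

    public static void main(String[] args) {
        QueryResultCacheSketch cache = new QueryResultCacheSketch(100);
        Function<String, List<String>> slowSearch = q -> Arrays.asList("item-for-" + q);
        cache.search("harvard business review", slowSearch);
        cache.search("harvard business review", slowSearch); // served from the cache
        System.out.println(cache.misses()); // 1
    }
}
```

A cache like this has to be cleared whenever the index changes, otherwise it will keep returning stale hits.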

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
where the slowness is. Please try to isolate the Lucene calls from > the DB calls and look at the timings for both. > > On Jul 24, 2007, at 5:28 PM, Askar Zaidi wrote: > > > Thanks for the reply. > > > > I am timing the entire search process with a stop watch, a bit >

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
Shall I setMergeFactor = 2 ? Slow indexing is not a bother. On 7/24/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > I ran some tests and it seems that the slowness is from Lucene calls when > I do "doBodySearch", if I remove that call, Lucene gives me results in 5 >

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
t hitCount = hits.length(); for(int i=0;i wrote: > > Could you show us the relevant source from doBodySearch()? > > -h > > On Tue, 2007-07-24 at 19:58 -0400, Askar Zaidi wrote: > > I ran some tests and it seems that the slowness is from Lucene calls > when I > >

Re: Fine Tuning Lucene implementation

2007-07-24 Thread Askar Zaidi
re about". > > What would break if you: > 1. Included "creator" in the Lucene index (or, filtered out the Hits > using a BitSet or something like it) > 2. Executed 1 search > 3. Collected the results of the first N Hits (where N is some > reasonable limit, like

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
a lot, Askar On 7/25/07, Dmitry <[EMAIL PROTECTED]> wrote: > > Askar, > why do you need to add +id:? > thanks, > dt, > www.ejinz.com > search engine news forms > ----- Original Message - > From: "Askar Zaidi" <[EMAIL PROTECTED]> > To: ; <

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
> database? Or is it that you want to score the item based on some > terms as well. If that is the case, there are other ways of doing > this and we can discuss them. > > -Grant > > On Jul 25, 2007, at 10:10 AM, Askar Zaidi wrote: > > > Hey Guys, > > > > I n

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
ine the score for > you > } > > MemoryIndex info can be found at http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/org/apache/lucene/index/memory/package-summary.html > > -Grant > > On Jul 25, 2007, at 11:45 AM, Askar Zaidi wrote: > > >

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
intln(query); I get: +contents:Harvard +contents:Business + contents: Review Can I just add: +contents:Harvard +contents:Business + contents: Review +itemID=id ?? That query would just return one document. On 7/25/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > Instead of refact

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
k the one document I need from the Index and give me the score. I don't have to iterate over Hits. Any clues ? I can't find any examples on query building . thanks ! Askar On 7/25/07, Grant Ingersoll <[EMAIL PROTECTED]> wrote: > > Yes, you can do that. > > > On Jul 25

Re: Assembling a query from multiple fields

2007-07-26 Thread Askar Zaidi
I did this yesterday. Manually appended an extra field to the query. It works fine. On 7/26/07, Erik Hatcher <[EMAIL PROTECTED]> wrote: > > > On Jul 25, 2007, at 5:05 PM, Joe Attardi wrote: > > As far as I can tell, I basically have two options: > > (1) Manually prepend the field identifier to the

Re: Fine Tuning Lucene implementation

2007-07-25 Thread Askar Zaidi
ll be active on this list from now on and try and answer questions to which I was seeking answers. later, Askar On 7/25/07, Doron Cohen <[EMAIL PROTECTED]> wrote: > > "Askar Zaidi" wrote: > > > ... Heres what I am trying to accomplish: > > > > 1. Iterate ov

Lucene Field score value

2007-07-31 Thread Askar Zaidi
Hey guys, I was wondering if there is a way to retrieve the score of a field in a document? If my document looks like this: {itemID},{field 1},{field 2} I'd like to get the score of individual fields 1 and 2 rather than the score of the entire document. Is it possible? thanks, AZ

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
Hi, Does anyone know how to retrieve the score of an individual field instead of doing: hits = score(i); This will get me the entire score of the document. I'd like to get the score of a single field by specifying the field name. thanks, AZ On 7/31/07, Askar Zaidi <[EMAIL PROTECTED

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
To be more specific: I want to retrieve the scores of individual fields inside a document so that I can manipulate the score of one field. This is the requirement of my application. After the manipulation I can add these scores and then show the total. thanks, AZ On 7/31/07, Askar Zaidi
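The arithmetic described here (scale one field's score, then add the field scores up) can be sketched as follows. The per-field scores are assumed to have been obtained already, for example by running one query per field; the numbers, the field names and the class `FieldScoreSketch` are made up for illustration and do not come from Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Combines per-field scores with per-field weights; plain arithmetic, not a Lucene API.
class FieldScoreSketch {
    // Sum of weight * score over all fields; fields without a weight count as 1.0.
    static double combine(Map<String, Double> fieldScores, Map<String, Double> weights) {
        double total = 0.0;
        for (Map.Entry<String, Double> e : fieldScores.entrySet()) {
            double weight = weights.getOrDefault(e.getKey(), 1.0);
            total += weight * e.getValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical scores for one document, one entry per field.
        Map<String, Double> scores = new LinkedHashMap<>();
        scores.put("title", 2.0);
        scores.put("tags", 1.0);
        scores.put("contents", 4.0);

        // Bring down the influence of the contents field by half.
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("contents", 0.5);

        System.out.println(combine(scores, weights)); // 5.0
    }
}
```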

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
other 3 fields ? Will that help ? Would there be a way to bring down the score of the contents field ? thanks, AZ On 7/31/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > Wouldn't boosting handle this for you? > > On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
ied with the current scoring . > > All I can say is try it and find out. You might consider using Luke > to try various boosts without having to mess with too much code. > > Erick > > On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > > > Boosting

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
field ? I came across Boosting of a term in the query so that would mean, "apache^4 jakarta" This means I am more keen to find apache than jakarta. I am keen to boost the score of a field, how can that be done ? On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > Using

Re: Lucene Field score value

2007-07-31 Thread Askar Zaidi
Guys, Here's someone who did this hack: http://blog.mindbridge.com/?p=55 Cheers, AZ On 7/31/07, Askar Zaidi <[EMAIL PROTECTED]> wrote: > > I'll have to use StringBuffer and get the Explanation in it as a String. > Then parse StringBuffer to get the scores of each field, then

Do AND + OR Search in Lucene

2007-08-02 Thread Askar Zaidi
Hey Guys, Quick question: I do this in my code for searching: queryParser.setDefaultOperator(QueryParser.Operator.AND); Lucene is OR by default so I change it to AND for my requirements. Now, I have a requirement to do OR as well. I mean while doing AND I'd like to include results from OR too.
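One way to read this requirement is "list the documents that match all terms first, then the documents that match at least one term". The stand-alone sketch below models that ordering on plain strings; it is not Lucene code, `AndThenOrSketch` is a hypothetical name, and the tokenizer is a crude stand-in for an analyzer. With Lucene the same effect could presumably be had by running the AND query and the OR query separately and appending the OR-only hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model: AND matches first, then OR-only matches; this is NOT Lucene code.
class AndThenOrSketch {
    // Split text into lowercase word tokens.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Returns document positions: full (AND) matches first, partial (OR) matches after.
    static List<Integer> search(String query, List<String> docs) {
        Set<String> queryTerms = tokens(query);
        List<Integer> allTerms = new ArrayList<>();
        List<Integer> someTerms = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            Set<String> docTerms = tokens(docs.get(i));
            if (docTerms.containsAll(queryTerms)) {
                allTerms.add(i);
            } else if (!Collections.disjoint(docTerms, queryTerms)) {
                someTerms.add(i);
            }
        }
        allTerms.addAll(someTerms);
        return allTerms;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "Harvard Law",
                "Harvard Business Review",
                "Cooking",
                "Business Review Weekly");
        System.out.println(search("Harvard Business Review", docs)); // [1, 0, 3]
    }
}
```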

Re: 答复: 答复: Lucene in large database contexts

2007-08-10 Thread Askar Zaidi
Hey Guys, I am trying to do something similar: make the content searchable as soon as it is added to the website. The way it can work in my scenario is that I create the index for every new user account created. Then, whenever a new document is uploaded, its contents are added to the user's I

Re: Indexing

2007-08-22 Thread Askar Zaidi
That's exactly what I do. The moment something is added to the database, I add it to the Lucene index of the user. Upon new account creation, open a new Lucene index for this new user. Whenever something is uploaded, just add it to the index. - Askar On 8/22/07, Ard Schrijvers <[EMAIL PROTECTED]>

Re: OutOfMemoryError tokenizing a boring text file

2007-09-01 Thread Askar Zaidi
I have indexed around 100 M of data with 512M for the JVM heap. So that gives you an idea. If every token is the same word in one file, shouldn't the tokenizer recognize that? Try using Luke. That helps solve lots of issues. - AZ On 9/1/07, Erick Erickson <[EMAIL PROTECTED]> wrote: > > I can't

Re: JdbcDirectory

2007-09-03 Thread Askar Zaidi
1) I don't understand why the index would get corrupted. We store huge data and meta-data using Lucene. 2) For this, I synced Lucene with the DB operations. If you use Hibernate, there's an API for that. Or, you could just write your own factory methods to add/delete/edit index documents when a DB o

Re: JdbcDirectory

2007-09-03 Thread Askar Zaidi
Yes. Every time a user updates a piece of information, you do the update in the DB as well as the index. If you are using Hibernate, they have an API that does this mapping. I am not sure why you plan to store data in the index? Storing data is the DB's job, searching is the index's job. I would sug