Increase the performance of Indexing in Lucene

2007-07-18 Thread miztaken
Hi, please help me. I have been trying Lucene for a month now. My requirements are huge: I have to index and search terabytes of data. I have questions regarding three topics: 1. Problem in indexing: as I need to index terabytes of data, after googling and visiting different forums I deployed the following fash

Re: StandardTokenizer is slowing down highlighting a lot

2007-07-18 Thread Michael Stoppelman
Might be nice to add a line of documentation to the highlighter on the possible performance hit if one uses StandardAnalyzer, which is probably a common case. Thanks for the speedy response. -M On 7/18/07, Mark Miller <[EMAIL PROTECTED]> wrote: Unfortunately, StandardAnalyzer is slow. StandardA

Re: StandardTokenizer is slowing down highlighting a lot

2007-07-18 Thread Mark Miller
Unfortunately, StandardAnalyzer is slow. StandardAnalyzer is really limited by JavaCC speed. You cannot shave much more performance out of the grammar as it is already about as simple as it gets. You should first see if you can get away without it and use a different Analyzer, or if you can re-
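As a rough illustration of the lighter-weight tokenization Mark suggests trying instead of StandardAnalyzer, here is a plain-Java sketch that splits on any character that is not a letter or digit and lowercases in the same pass, similar in spirit to Lucene's simpler character-based tokenizers. It is not Lucene code, `SimpleTokenizerSketch` is a hypothetical name, and it deliberately skips StandardTokenizer features such as e-mail, acronym and company-name handling.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-pass letter/digit tokenizer; not part of Lucene's API. */
final class SimpleTokenizerSketch {
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                // Token characters are lowercased as they are collected.
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Any other character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("StandardTokenizer is slowing down highlighting, a lot!"));
    }
}
```

The trade-off is the one Mark describes: a simple loop like this avoids the generated grammar entirely, at the cost of treating inputs like `AT&T` or e-mail addresses as several plain tokens.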

StandardTokenizer is slowing down highlighting a lot

2007-07-18 Thread Michael Stoppelman
Hi all, I was tracking down slowness in the contrib highlighter code and it seems the seemingly simple tokenStream.next() is the culprit. I've seen multiple posts about this being a possible cause. Has anyone looked into how to speed up StandardTokenizer? For my documents it's taking about 70ms p

Re: MoreLikeThis

2007-07-18 Thread Akanksha Baid
Right, I was making a silly mistake there. I have it working now. Thanks for the reply. yu wrote: You can put lucene-queries-2.2.0.jar on your class path or your Eclipse project build path. That's all you need. Jay Akanksha Baid wrote: I am using Lucene 2.1.0 and want to use MoreLikeThis f

Re: MoreLikeThis

2007-07-18 Thread yu
You can put lucene-queries-2.2.0.jar on your class path or your Eclipse project build path. That's all you need. Jay Akanksha Baid wrote: I am using Lucene 2.1.0 and want to use MoreLikeThis for querying documents. I understand that the jar file for the same is in contrib. I have the contrib

MoreLikeThis

2007-07-18 Thread Akanksha Baid
I am using Lucene 2.1.0 and want to use MoreLikeThis for querying documents. I understand that the jar file for it is in contrib. I have the contrib folder extracted, but am not sure what to do from this point on. What jar file am I looking for, and where should I put it? I am using Eclipse

TermEnum - previous() method ?

2007-07-18 Thread muraalee
Hi All, I searched this forum for anyone else who needed a previous() method in TermEnum. I found only this link http://www.nabble.com/How-to-navigate-through-indexed-terms-tf28148.html#a189225 Would it be possible to implement a previous() method? I know I am asking for a quick solution here
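TermEnum only moves forward through the term dictionary, so one workaround is to copy the terms you need into a sorted in-memory set once and walk that set in both directions. A minimal sketch, assuming the terms of a single field have already been collected (for example by iterating the TermEnum one time); `TermCursor` is a hypothetical class, not a Lucene API, and memory use grows with the number of terms copied.

```java
import java.util.Collection;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical bidirectional cursor over terms copied out of a forward-only enumeration. */
final class TermCursor {
    private final NavigableSet<String> terms;
    private String current;

    TermCursor(Collection<String> collectedTerms, String start) {
        this.terms = new TreeSet<>(collectedTerms);
        // Position on the first term >= start, or null if every term sorts before it.
        this.current = terms.ceiling(start);
    }

    /** The term the cursor is on, or null if it is positioned past the end. */
    String term() {
        return current;
    }

    /** Moves forward one term; returns false (and stays put) at the last term. */
    boolean next() {
        String n = (current == null) ? null : terms.higher(current);
        if (n == null) return false;
        current = n;
        return true;
    }

    /** Moves back one term, the operation TermEnum itself does not offer. */
    boolean previous() {
        String p = (current == null)
                ? (terms.isEmpty() ? null : terms.last()) // past the end: step back onto the last term
                : terms.lower(current);
        if (p == null) return false;
        current = p;
        return true;
    }
}
```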

Dictionary Type Lookup

2007-07-18 Thread muraalee
Hi, I am trying to model a Dictionary Type Search in Lucene. My approach was this - Load the dictionary file ( words & their meanings ) and index each dictionary term and associated meaning as a Lucene Document. - Use IndexReader's term method to peek at the index and get the TermEnum. TermEnum'
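The seek-then-scan approach described here (position at the first term greater than or equal to the typed text, then read forward while terms still share the prefix) can be sketched with a plain `TreeMap` standing in for the index. This only models the idea; `DictionaryLookup` and its methods are hypothetical, and in Lucene itself the seek would come from the reader's term enumeration rather than a map.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical in-memory model of a dictionary-style prefix lookup. */
final class DictionaryLookup {
    private final NavigableMap<String, String> entries = new TreeMap<>();

    void add(String word, String meaning) {
        entries.put(word, meaning);
    }

    /** Returns up to {@code limit} words that start with {@code prefix}, in sorted order. */
    List<String> suggest(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        // tailMap(prefix, true) starts at the first key >= prefix, like seeking a term enumeration.
        for (String word : entries.tailMap(prefix, true).keySet()) {
            // Words sharing a prefix are contiguous in sorted order, so stop at the first miss.
            if (!word.startsWith(prefix) || out.size() == limit) break;
            out.add(word);
        }
        return out;
    }

    String meaning(String word) {
        return entries.get(word);
    }
}
```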

Re: Lucene shows parts of search query as a HIT

2007-07-18 Thread Askar Zaidi
Hey Guys, I just checked my Lucene results. It shows a document containing the word "change" as a hit when I am searching for "Chan". Is there a way to stop this and show only exact word matches? I started using Lucene yesterday, so I am fairly new! thanks AZ On 7/18/0

Re: lucene version?

2007-07-18 Thread Michael McCandless
I don't think this is stored in the index. I think the closest you can get is the "format" of the segments_N file, which changes every time the index file format changes. That at least lets you narrow it down, possibly to a single release if the file format is changing frequently (e.g. it has in the
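Following Michael's suggestion, here is a rough sketch that reads the leading format value from a `segments_N` file. It assumes the file begins with Lucene's 4-byte big-endian Int32 "Format" field as described in the index file format documentation; mapping the number to a release is left to the file-format docs for the versions involved. `SegmentsFormatSketch` is a hypothetical name.

```java
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

/** Hypothetical helper: reads the (assumed) format field at the start of a segments file. */
final class SegmentsFormatSketch {
    static int readFormat(String segmentsPath) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(segmentsPath))) {
            // DataInputStream.readInt reads 4 bytes high-order first (big-endian).
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SegmentsFormatSketch /path/to/index/segments_N
        System.out.println("format = " + readFormat(args[0]));
    }
}
```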

Re: Lucene shows parts of search query as a HIT

2007-07-18 Thread Erick Erickson
Are you sure that the hit wasn't on "w" or "kim"? The default for searching is OR... I recommend that you get a copy of Luke (google lucene luke) which allows you to examine your index as well as see how queries parse using various analyzers. It's an invaluable tool... Best Erick On 7/18/07, As
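To see why Erick asks about OR: with the query parser's default OR operator, a query like `W. Chan Kim` matches a document containing any one of its terms, and terms are compared as whole tokens rather than substrings. The sketch below models that on plain token sets; it ignores scoring and real analyzer behaviour, and `OrQuerySketch` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical model of OR vs AND matching over whole tokens (no scoring, no real analysis). */
final class OrQuerySketch {
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        // Split on runs of anything that is not a letter or digit, after lowercasing.
        for (String t : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    /** Default-OR semantics: a document matches if it contains ANY query token. */
    static boolean matchesAny(String query, String doc) {
        Set<String> docTokens = tokens(doc);
        for (String q : tokens(query)) {
            if (docTokens.contains(q)) return true;
        }
        return false;
    }

    /** AND semantics: ALL query tokens must be present in the document. */
    static boolean matchesAll(String query, String doc) {
        return tokens(doc).containsAll(tokens(query));
    }
}
```

With this model, a document reading "The channel was changed by W. Smith" matches `W. Chan Kim` under OR only because of the token `w`; `chan` never matches `channel`.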

lucene version?

2007-07-18 Thread Akanksha Baid
Is there a way to test as to which version of Lucene was used to build an index? -Akanksha

Lucene shows parts of search query as a HIT

2007-07-18 Thread Askar Zaidi
Hey folks, I am a new Lucene user. I used the following after indexing: search(searcher, "W. Chan Kim"); Lucene showed me hits for documents where the word "channel" occurred. Notice that "Chan" is a part of "Channel". How do I stop this? I am keen to find the exact word. I used the following, b

Re: WildcardQuery and SpanQuery

2007-07-18 Thread Paul Elschot
On Wednesday 18 July 2007 12:30, Cedric Ho wrote: > Thanks for the quick response Paul =) > > However I am lost while looking at the surround package. That is not really surprising, the code is factored to the bone, and it is hardly documented. You could have a look at the test code to start. Al

Re: Does Index have a Tokenizer Built into it

2007-07-18 Thread John Paul Sondag
Is there a way to know how big to make the array beforehand (how many terms are in the topic total?). I'm worried about the efficiency of this, since I'd have to rebuild every document that is a "hit" on the fly to make a snippet for each "hit" on the page (say 10 a page). Now I have to wonder
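If a field's text is being rebuilt from per-term position lists (for example from term vectors stored with positions), the array can be sized with a first pass for the largest position. A minimal sketch, assuming the positions are already available as a map from term to `int[]`; `PositionRebuildSketch` is hypothetical, gaps (such as removed stop words) stay `null`, and if two terms share a position the last one written wins.

```java
import java.util.Map;

/** Hypothetical helper: rebuilds a token sequence from term -> positions data. */
final class PositionRebuildSketch {
    static String[] rebuild(Map<String, int[]> positionsByTerm) {
        // First pass: the largest position determines the array length.
        int maxPos = -1;
        for (int[] positions : positionsByTerm.values()) {
            for (int p : positions) {
                maxPos = Math.max(maxPos, p);
            }
        }
        String[] slots = new String[maxPos + 1];
        // Second pass: place each term at every position where it occurs.
        for (Map.Entry<String, int[]> e : positionsByTerm.entrySet()) {
            for (int p : e.getValue()) {
                slots[p] = e.getKey();
            }
        }
        return slots;
    }
}
```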

Re: Query in lucene

2007-07-18 Thread Erick Erickson
When in doubt, WhitespaceAnalyzer is the most predictable. Note that it doesn't lower-case the tokens though. Depending upon your requirements, you can always pre-process your query and indexing streams and do your own lowercasing and/or character stripping. You can always create your own analyze
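Erick's suggestion of doing your own lowercasing and character stripping on top of whitespace splitting can be sketched as a small pre-processing step, applied identically to text before indexing and to query strings. This is a plain-Java approximation (Java's `\s` covers a narrower set of characters than `Character.isWhitespace`), not WhitespaceAnalyzer itself, and `PreprocessSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical pre-processing: whitespace split, then lowercase and strip chosen characters. */
final class PreprocessSketch {
    static List<String> preprocess(String text, String charsToStrip) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            StringBuilder sb = new StringBuilder();
            // Locale.ROOT avoids locale-specific case rules (e.g. Turkish dotless i).
            for (char c : raw.toLowerCase(Locale.ROOT).toCharArray()) {
                if (charsToStrip.indexOf(c) < 0) sb.append(c);
            }
            if (sb.length() > 0) out.add(sb.toString()); // drop tokens that were stripped to nothing
        }
        return out;
    }
}
```

Because nothing beyond the listed characters is removed, tokens such as `c++` or `foo-bar` survive intact, which is the predictability Erick is pointing at.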

Query in lucene

2007-07-18 Thread WATHELET Thomas
Which analyzer do I have to use to find text like this ''?

Re: WildcardQuery and SpanQuery

2007-07-18 Thread Mark Miller
You could give this a shot (From my Qsol query parser): package com.mhs.qsol.spans; /** * Copyright 2006 Mark Miller ([EMAIL PROTECTED]) * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy
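One common way to combine wildcards with span queries (assumed here; the Qsol code above is truncated, so this is not necessarily what it does) is to expand the wildcard against the field's terms and OR the matches together, e.g. as SpanTermQuery clauses inside a SpanOrQuery. The sketch covers only the expansion step, with `*` matching any run of characters and `?` exactly one; `WildcardExpansionSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical wildcard expansion over a list of terms. */
final class WildcardExpansionSketch {
    /** Converts a wildcard pattern (* and ?) into a regex; other characters are quoted literally. */
    static Pattern toRegex(String wildcard) {
        StringBuilder re = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                re.append(".*");
            } else if (c == '?') {
                re.append('.');
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    /** Returns the terms that the whole wildcard matches (matches() anchors both ends). */
    static List<String> expand(String wildcard, List<String> terms) {
        Pattern p = toRegex(wildcard);
        List<String> out = new ArrayList<>();
        for (String t : terms) {
            if (p.matcher(t).matches()) out.add(t);
        }
        return out;
    }
}
```

Each expanded term would then become one clause of the OR; with a very broad pattern the clause count can grow large, so a cap on expansions is worth considering.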

Re: WildcardQuery and SpanQuery

2007-07-18 Thread Cedric Ho
Thanks for the quick response Paul =) However I am lost while looking at the surround package. Are you suggesting I can solve my problem at hand using the surround package? On 7/18/07, Paul Elschot <[EMAIL PROTECTED]> wrote: On Wednesday 18 July 2007 05:58, Cedric Ho wrote: > Hi everybody, > >