What is the best practice for using synonyms?

2010-03-22 Thread Jeff Zhang
Hi all, I'd like to use synonyms in my project, and I think there are two candidate solutions: 1. use the synonyms at the indexing stage, enriching the index with them; 2. use the synonyms at the search stage, expanding the search query with them. I'd like to know which one is better
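
The two options in the question can be pictured with a small plain-Java sketch. It does not use Lucene at all; the class name, the method names, and the tiny synonym table are all hypothetical and exist only for illustration. Roughly speaking, index-time expansion makes the index larger and needs a reindex when the synonym list changes, while search-time expansion leaves the index alone but makes each query larger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SynonymSketch {
    // Hypothetical synonym table; a real project would load one from a thesaurus.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "fast", List.of("quick", "rapid"),
        "car", List.of("automobile"));

    // Option 1 (index time): emit every token followed by its synonyms,
    // so the extra terms end up stored in the index.
    static List<String> expandForIndex(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token);
            out.addAll(SYNONYMS.getOrDefault(token, List.of()));
        }
        return out;
    }

    // Option 2 (search time): turn every query term into a group of
    // alternatives; a document matches a group if it contains any member.
    static List<Set<String>> expandForQuery(List<String> queryTerms) {
        List<Set<String>> groups = new ArrayList<>();
        for (String term : queryTerms) {
            Set<String> group = new LinkedHashSet<>();
            group.add(term);
            group.addAll(SYNONYMS.getOrDefault(term, List.of()));
            groups.add(group);
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(expandForIndex(Arrays.asList("fast", "red", "car")));
        System.out.println(expandForQuery(Arrays.asList("fast", "car")));
    }
}
```

Which option fits better depends on how often the synonym list changes and how large the index is allowed to grow; the sketch only shows where the expansion happens in each case.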

BooleanQuery and SpanQuery : how to get « combined » spans?

2010-03-22 Thread Benoit Mercier
Hi, I would like to write a query composed of a BooleanQuery (several clauses) and a SpanQuery (SpanNearQuery), where both are mandatory. Sounds simple but I have to work on spans returned by this query. I know that I could use a Filter, but my goal is to get the spans from the « combined

Re: Corrupt index? Can I recover it?

2010-03-22 Thread Andrew Bruno
Also, I am trying to do an optimize, and I am getting java.lang.IllegalStateException: docs out of order (-82 < 0 ) On Tue, Mar 23, 2010 at 3:04 PM, Andrew Bruno wrote: > Thanks for this. > > Does anyone know how I can do this with version 2.0 > > http://lucene.apache.org/java/2_0_0/api/index.

Re: Corrupt index? Can I recover it?

2010-03-22 Thread Andrew Bruno
Thanks for this. Does anyone know how I can do this with version 2.0 http://lucene.apache.org/java/2_0_0/api/index.html like http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/index/CheckIndex.html On Fri, Mar 19, 2010 at 8:42 PM, Michael McCandless wrote: > It sounds like you should

Re: another question about phrasequery?(thanks again)

2010-03-22 Thread luocan19826164
Sorry, let me describe it again. I mean that a query for "little boy" will match both documents. A document that only has the terms "boy" and "little" will match the query. Document one gets some added score because it exactly matches the query (term positions match completely). Do I describe it clearly? For example: Documen

Re: another question about phrasequery?(thanks again)

2010-03-22 Thread luocan19826164
Sorry, let me describe it again. Document 1: little boy is running Document 2: boy is little I mean that a query for "little boy" will match both documents. A document that only has the terms "boy" and "little" will match the query. Document 1 gets some added score because it exactly matches the query (term posit

UNC speed vs DOS path speed

2010-03-22 Thread Woolf, Ross
On a Windows machine I have noticed that using a UNC path instead of a DOS path when instantiating an index writer causes the performance to slow considerably, even when the UNC is to the same location as DOS path. Is anyone aware of this and know why? Is there anything that can be done to im

Re: how to filter numeric values?

2010-03-22 Thread Erick Erickson
Why not just use SimpleAnalyzer? From the javadocs: An Analyzer that filters LetterTokenizer with LowerCaseFilter
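
To make the quoted javadoc sentence concrete, here is a plain-Java approximation of what "letter tokenizing plus lowercasing" does to text that contains numbers. This is only a sketch of the idea, not Lucene's SimpleAnalyzer; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class LetterTokenSketch {
    // Split the text at every non-letter character and lowercase the letters.
    // Digits are never part of a token, so purely numeric tokens disappear.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the token collected so far.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Lucene 3.0 released in 2009"));
    }
}
```

One consequence of splitting at non-letters is that a mixed token such as "version2" is cut down to "version" rather than dropped, which may or may not be what is wanted.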

Re: how to filter numeric values?

2010-03-22 Thread juniol
Hello, thanks for the reply. I found another solution: StopAnalyzer std1 = new StopAnalyzer(Version.LUCENE_CURRENT); PorterStemFilter std = new PorterStemFilter(std1.tokenStream("field", reader)); juniol wrote: > > hello; > > i want to filter my tokens and keep only string tokens ( rem

Re: how to filter numeric values?

2010-03-22 Thread Ahmet Arslan
> hello; > > i want to filter my tokens and keep only string tokens ( > remove numbers > etc). > i use this : > > public TokenStream tokenStream(String fieldName, Reader > reader) { >     return new PorterStemFilter( >       new StopFilter( >         new LowerCaseFilter( >           new Standar

how to filter numeric values?

2010-03-22 Thread juniol
Hello, I want to filter my tokens and keep only string tokens (remove numbers etc.). I use this: public TokenStream tokenStream(String fieldName, Reader reader) { return new PorterStemFilter( new StopFilter( new LowerCaseFilter( new StandardFilter( new St

Re: Field method problems in first lucene program

2010-03-22 Thread rohit dholakia
Actually, I have been trying to get the new edition as an ebook :). Will try to get that ASAP. That did help. Thanks a lot :). To tell everyone, I am using open-source software for the first time and this mailing list system is amazing. Just awesome :). Pity though that I am not able to follow t

Re: Field method problems in first lucene program

2010-03-22 Thread Erick Erickson
You're missing that the original LIA was written against 1.4, and the current version is 3.x, so lots of stuff has been deprecated. You can get the second edition from Manning as an e-book, which should have more current examples. HTH Erick On Mon, Mar 22, 2010 at 11:37 AM, rohit dholakia wrote:

Re: Is Lucene good to maintain metadata in a hierarchical manner?

2010-03-22 Thread Erick Erickson
I don't see any problems from your brief description. There are some considerations if you expect to update the information really, really, really frequently, but those are discussed in many threads in the archive. But without more details on how you want to use the data, it's pretty hard to an

Re: access payload from HitCollector.collect()

2010-03-22 Thread Grant Ingersoll
On Mar 22, 2010, at 8:56 AM, prasenjit mukherjee wrote: > I am trying to implement oracle's aggregation like SQL's ( e.g. > SUM(col3) where col1='foo' and col2='bar' ) using lucene's payload > feature. > > I can add the integer_value ( of col3 ) as a payload to my searchable > fields ( col1 and

Field method problems in first lucene program

2010-03-22 Thread rohit dholakia
Hi, I am writing my first Lucene program, following the 1st edition of the Lucene in Action book and the blog article by Grant on the Lucid Imagination blog. Now, if I use the doc.add(field.text()) method with arguments, it says, can't resolve .. If I follow the blog, it is asking for an e

Re: PhraseQuery Performance Issues [Lucene 2.9.0]

2010-03-22 Thread Daniel Shane
Indeed! I found a very good article on this as well at : http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-1 It really sums up what you are saying. Thanks for the help! Daniel Shane - Original Message - From: "Michael McCandless" To: java-user@lucen

Re: another question about phrasequery?(thanks again)

2010-03-22 Thread Ian Lea
I'm not clear what exactly you are asking. With your examples: Document 1: little boy is running Document 2: boy is little a phrase query for "little boy" will match the first and not the second. Is that what you want? a phrase query for "litter boy" won't match either, but a general query migh
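
The difference described in this reply, between matching a phrase and merely containing the terms, can be sketched in plain Java using the two example documents from the thread. This is an illustration of the concept only, not Lucene's PhraseQuery; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class PhraseMatchSketch {
    // Exact phrase match: the phrase terms must appear at consecutive
    // positions in the document, in the same order.
    static boolean matchesPhrase(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    // Term-only match: every term occurs somewhere, positions are ignored.
    static boolean containsAllTerms(List<String> doc, List<String> terms) {
        return doc.containsAll(terms);
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("little", "boy", "is", "running");
        List<String> doc2 = Arrays.asList("boy", "is", "little");
        List<String> query = Arrays.asList("little", "boy");

        System.out.println(matchesPhrase(doc1, query));    // true
        System.out.println(matchesPhrase(doc2, query));    // false
        System.out.println(containsAllTerms(doc1, query)); // true
        System.out.println(containsAllTerms(doc2, query)); // true
    }
}
```

The behaviour asked for elsewhere in the thread (all documents with both terms match, exact phrase matches score higher) corresponds to combining the two checks: the term-only check decides whether a document matches, and the phrase check decides whether it gets the extra score.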

Re: Is Lucene good to maintain metadata in a hierarchical manner?

2010-03-22 Thread Jürgen Jakobitsch
Hi, I'd go for RDF. Check out LuceneSail (http://dev.nepomuk.semanticdesktop.org/wiki/LuceneSail) or our alpha @ turnguard.com/tuqs, which is a quad store based on Lucene, with high-speed (Lucene) data retrieval capabilities. You would have your metadata in RDF ( :) where it belongs) and have a

another question about phrasequery?(thanks again)

2010-03-22 Thread luocanrao
I don't think the current PhraseQuery can meet my requirement. Can someone help me implement such a phrase query? Exact-match documents get some added score. All other matching documents get 0 added score (no matter how big the slop is). For example: Document 1: little boy is running Document 2: boy is little

Re: [Fwd: Re: Lucene 3.0 Search Performance Stats]

2010-03-22 Thread Jamie
Hi Suman
Here are some of the things we did:
- cache searcher/s
- cache indexreader/s
- all users use the same searchers
- perform a background search when the app starts, to warm up the search engine
- use numerics where necessary
- use shorter dates (i.e. do you really need a granularity of up to the

Is Lucene good to maintain metadata in a hierarchical manner?

2010-03-22 Thread Disc Magnet
Hi, I want to know whether Lucene is good for a situation like this: We need to store metadata about various users of our application in this format. 1. Name 2. Time of registration 3. Other details The users are divided into various classes, e.g. prospective customer, customer, employee, etc.

Re: about lucene doc id recycle

2010-03-22 Thread luocanrao
Thanks. Do you mean that in any situation the maximum doc ID may be somewhere between the current doc count and about 3 times the doc count? No more than 4 times the doc count? -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent: 2010-03-22 21:11 To: java-user@lucene.apache.org Subject: RE: ab

[Fwd: Re: Lucene 3.0 Search Performance Stats]

2010-03-22 Thread suman . holani
Hi, I am also using range-based searches for dates. I convert the time to a UTC-based seconds format, store it in the index, and then run range queries. Is there anything needed to make this more efficient? Thanks, Suman > Very nice! Thanks for sharing :) > > Mike > > On Fri, Ma

RE: about lucene doc id recycle

2010-03-22 Thread Uwe Schindler
And if you do not optimize, as soon as two segments are merged, the docids are also reassigned. It just takes some time. Normally the maximum docid may be somewhere between the current doc count and about 3 times the doc count. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thet

Re: about lucene doc id recycle

2010-03-22 Thread Erick Erickson
Yes, when you call optimize, one side effect is that all the doc IDs are reassigned so they're contiguous. HTH Erick On Mon, Mar 22, 2010 at 8:22 AM, luocanrao wrote: > Total document number is not very big, but update is very frequency. > > So I wonder whether the doc id is growing bigger

access payload from HitCollector.collect()

2010-03-22 Thread prasenjit mukherjee
I am trying to implement Oracle-style SQL aggregation (e.g. SUM(col3) where col1='foo' and col2='bar') using Lucene's payload feature. I can add the integer_value (of col3) as a payload to my searchable fields (col1 and col2). I can probably extend the DefaultSimilarity's scorePayload()

Re: Lucene 3.0 Search Performance Stats

2010-03-22 Thread Michael McCandless
Looks like the bulk of your RAM usage is from the 370K index terms in your terms dict... The flex branch (once it lands) should substantially reduce that... Mike On Mon, Mar 22, 2010 at 8:35 AM, Jamie wrote: > Hi Everyone > > The stats I sent through earlier were erroneous due to fact the date

Re: Lucene 3.0 Search Performance Stats

2010-03-22 Thread Jamie
Hi Everyone The stats I sent through earlier were erroneous due to the fact that the date range query selected fewer records than stated. The correct stats are: Lucene 3.0 Stats: Search conducted using Lucene's Realtime search feature (writer.getReader() for each search) Analyzer: Russian Analyzer

about lucene doc id recycle

2010-03-22 Thread luocanrao
The total number of documents is not very big, but updates are very frequent. So I wonder whether the doc ID keeps growing bigger and bigger and never gets smaller. Does Lucene have some technique for recycling doc IDs? PS: I never call the optimize method.
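
The replies in this thread say that doc IDs are reassigned when segments are merged (and made contiguous by optimize). A toy plain-Java model of that idea follows. It is an assumption-laden sketch, not how Lucene stores segments: a list index stands in for the doc ID, `null` stands in for a deleted document, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class DocIdCompactionSketch {
    // Toy "merge": copy only the live documents into a new list.
    // The new list indexes play the role of the reassigned doc IDs.
    static List<String> merge(List<String> slots) {
        List<String> merged = new ArrayList<>();
        for (String doc : slots) {
            if (doc != null) {
                merged.add(doc);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> slots = new ArrayList<>();
        slots.add("a-v1"); // doc ID 0
        slots.add("b-v1"); // doc ID 1

        // An update is modeled as delete plus re-add: the old ID stays
        // occupied by a deleted slot and the new version gets a fresh ID.
        slots.set(0, null);
        slots.add("a-v2"); // doc ID 2

        System.out.println("before merge: " + slots.size() + " IDs in use");
        List<String> merged = merge(slots);
        System.out.println("after merge: " + merged.size() + " IDs in use, " + merged);
    }
}
```

In this model the highest ID grows with every update until a merge drops the deleted slots, which is consistent with the replies that the maximum doc ID stays within a small multiple of the live document count.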