date:20060613

Re: Getting count on distinct values of a field.

2006-06-13 Thread heritrix . lucene

I am sorry for my stupid question. Thanks. :-) Regards, On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : But what if that word is present in other fields also. : does "docFreq " only look into that particular field ?? docFreq tells you the frequency of a term, a term is a field a

Re: Use one or more indexes?

2006-06-13 Thread wu fox

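Hi friend: it depends on how you plan to organize your index. With multiple indexes you have to think about merging. For example, if you want to search both full text and titles, the search results come from two indexes, so what do you merge them by? Also, merging the results yourself is a bad idea; you must let Lucene merge them for you, because the speed of the algorithm demands it. This is the main problem with multiple indexes: how to merge the results from each partition. With a single partition you can of course put everything related into one document, and searching is no problem; the difficulty is "updating". Lucene has no update operation: it first deletes the doc and then re-adds it. If the doc is fairly complex you have to rebuild the index for that doc, and if extracting full text is also involved, the process can take a very long time. For example, using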

Re: Detecting index existance

2006-06-13 Thread Eduardo S. Cordeiro

Hi, Kent's suggestion worked (in fact, I had looked for such a method in other classes of the API -- forgot to look in IndexReader). It works just as expected :) Thanks again On 6/13/06, Erick Erickson <[EMAIL PROTECTED]> wrote: Well, I just tried it (opening an IndexSearcher) and got this ex

Use one or more indexes?

2006-06-13 Thread Liao Xuefeng

Hi, I'm new to Lucene. I want to add full-text search to my website to search articles, images and BBS topics. I'm not sure whether to use a single index for all of these types or to create three indexes, one per type. If I use only one index, do I have to add a 'type' field to identify document

Re: Detecting index existance

2006-06-13 Thread Erick Erickson

Well, I just tried it (opening an IndexSearcher) and got this exception... java.io.FileNotFoundException: C:\blank\segments (The system cannot find the file specified) The directory c:\blank exists, but is empty. So, it seems you can just catch the exception and infer that your admin users aren'
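The exception above names a `segments` file inside the index directory. As a rough illustration only, the check can be sketched with the standard library alone. The class and method names below are hypothetical, and the assumption that a `segments` file marks an index comes solely from that error message; Kent's reply in this thread points to `IndexReader.indexExists`, which is the supported way to ask Lucene itself.

```java
import java.io.File;

// Hypothetical helper, not part of Lucene.
class IndexProbe {

    // Returns true if the directory contains a regular file named "segments",
    // the file the FileNotFoundException in this thread says is missing.
    // This is an assumption drawn from that message, not a Lucene contract.
    static boolean looksLikeIndex(File directory) {
        return new File(directory, "segments").isFile();
    }

    public static void main(String[] args) {
        // Directory to probe; defaults to the current directory.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println(dir + " looks like an index: " + looksLikeIndex(dir));
    }
}
```

A probe like this avoids using the exception for control flow, but it ties the application to a file-layout detail, so the library call is the safer choice when it is available.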

Re: Detecting index existance

2006-06-13 Thread kent.fitch

Try IndexReader static method indexExists: http://lucene.apache.org/java/docs/api/org/apache/lucene/index/IndexReader.html#indexExists(java.lang.String) Kent Fitch

Detecting index existance

2006-06-13 Thread Eduardo S. Cordeiro

Hi there, I'm just starting up with Lucene after reading bits and pieces from Gospodnetic and Hatcher's "Lucene in Action" (and noticing the API has changed for 2.0.0). My question is this: is there a way to detect whether or not the index exists? I'm currently developing a web application that

Re: JVM Crash

2006-06-13 Thread kieran

It may well be to do with this Hotspot bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6407471 Note, the bug only appears when you invoke java with the "-server" command line option. Kieran Dan Armbrust wrote: Ross Rankin wrote: We keep getting JVM crashes on 1.4.3. I found in t

Re: JVM Crash

2006-06-13 Thread Dan Armbrust

Ross Rankin wrote: We keep getting JVM crashes on 1.4.3. I found in the archive that setting a JVM parameter solved the problem for a few users. We've tried that and it has not worked. Here's our JVM parameters: Why not try a new JVM? Either a newer sun... or a JDK, or a blackdown... In o

Re: Count occurrences of worths within a corpus.

2006-06-13 Thread Grant Ingersoll

Hi Sergi, Take a look at TermEnum and TermDocs in the API. You will have to iterate over these, summing as you go. You could also, during indexing, store these counts external to Lucene as you come across the term during the Analysis phase. Sergi Fernandez wrote: Hi there, I'm new in Luc

Lucene usage

2006-06-13 Thread Leandro Saad

Hi all. I'm writing a wrapper component around Lucene (using Avalon) and I'd like to know the common API usage. How should I bootstrap the index? Should I create the IndexSearcher when I initialize the component? How long should I keep the IndexWriter open? For one document: should I create

Count occurrences of worths within a corpus.

2006-06-13 Thread Sergi Fernandez

Hi there, I'm new to Lucene, and so far I only know how to index a corpus and run a query. I thought I could count the number of times a word appears in the whole corpus with a simple query, but it seems not to be so easy. Does anybody know how to do it? Many thanks! Sergi Fernandez.

Re: JVM Crash

2006-06-13 Thread Bob Carpenter

Java apps shouldn't throw these kinds of seg faults. Sounds like a problem with memory. Especially if you can't reproduce the error in the same location. Double especially if you have the same problems elsewhere under heavy memory load. I had all kinds of problems with seg faults in the JVM unt

Re: question with spellchecker

2006-06-13 Thread Bob Carpenter

Very nice idea. This is the basis of most of the work on word-sense-disambiguation (e.g. is it "run" as in baseball, "run" as in stock, or "run" as in stocking? or is "John Smith" CEO of GM or "John Smith" lover of Pocahontas?). TF/IDF's not a bad way to compute this, either, though there are d

Re: JVM Crash

2006-06-13 Thread N Hira

We had a similar problem. We discovered that it was basically that eden/from was out of memory and made two changes and that seems to have helped: 1. Reduce [Max]PermSize to 128M 2. Use the concurrent garbage collector Good luck. -h --- Ross Rankin <[EMAIL PROTECTED]> wrote: > We keep gettin

JVM Crash

2006-06-13 Thread Ross Rankin

We keep getting JVM crashes on 1.4.3. I found in the archive that setting a JVM parameter solved the problem for a few users. We've tried that and it has not worked. Here's our JVM parameters: -Xms512m -Xmx1024m -XX:PermSize=256m We're running Tomcat 5.5.16. Any Idea? If it's an

Re: Document design and analyzer questions?

2006-06-13 Thread Chris Hostetter

: I will have millions of entries in my index. Would storing them cause : any performance issues? Only testing will tell ... but generally speaking I don't think stored fields affect query performance very much -- just disk usage. : >another important thing you should consider is field norms: they don

Re: Document design and analyzer questions?

2006-06-13 Thread Michael J. Prichard

Hey Chris, Thanks for the response. Chris Hostetter wrote: : Question is two fold. One, here is the layout I was thinking: my rule of thumb: if a field is going to contain less than a few dozen bytes (i.e. a date, an email address, etc.) you might as well store it ... it will make your life ea

RE: Using more than one index

2006-06-13 Thread Chris Hostetter

: A document (in our case an xml that has many metadata) can have more : than one date, each date with 2 attributes: : 00-00-1886 : : In the date index I have for every in the input xml a document : with fields: type (document |other), date, art (birthday | deportation | : death...). For example

Re: Getting count on distinct values of a field.

2006-06-13 Thread Chris Hostetter

: But what if that word is present in other fields also. : does "docFreq " only look into that particular field ?? docFreq tells you the frequency of a term, a term is a field and a value -- if you want the counts of a value across multiple fields, you'll have to add them up yourself. (or make a
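The point that a term is a (field, value) pair can be made concrete with a toy model. The sketch below is not Lucene code: the class `FieldTermCounts` and its methods are hypothetical stand-ins that keep a document frequency per field and per value, and show what "adding them up yourself" looks like.

```java
import java.util.HashMap;
import java.util.Map;

// Toy stand-in for an index; hypothetical, not a Lucene class.
class FieldTermCounts {

    // Document frequency keyed first by field name, then by value.
    private final Map<String, Map<String, Integer>> docFreqByField = new HashMap<>();

    // Records how many documents contain the value in the given field.
    void put(String field, String value, int docFreq) {
        docFreqByField.computeIfAbsent(field, f -> new HashMap<>()).put(value, docFreq);
    }

    // Frequency of a single term, i.e. one (field, value) pair; 0 if unknown.
    int docFreq(String field, String value) {
        Map<String, Integer> perField = docFreqByField.get(field);
        if (perField == null) {
            return 0;
        }
        Integer count = perField.get(value);
        return count == null ? 0 : count;
    }

    // Adds up the per-field frequencies of the same value.
    int sumAcrossFields(String value, String... fields) {
        int total = 0;
        for (String field : fields) {
            total += docFreq(field, value);
        }
        return total;
    }
}
```

Note that the sum adds per-field document counts, so a document that contains the value in two fields is counted twice; it is not the number of distinct documents.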

How to use Query and TermQuery in a single file

2006-06-13 Thread Ramesh Salla

Hi, I am new to Lucene but feel quite comfortable using the API. I retrieve the Meta tags and the body from HTML files and their respective Title and Description from the database and then index documents. I use Query class to parse the search query. I get the results and I display the Title an

RE: Using more than one index

2006-06-13 Thread Mile Rosu

Hi Hoss, Thanks for your quick answer. One of the problems left with the date is this: A document (in our case an xml that has many metadata) can have more than one date, each date with 2 attributes: Eg: 00-00-1886 In the date index I have for every in the input xml a document with fields: t

RE: about PrefixQuery Matching

2006-06-13 Thread Mordo, Aviran (EXP N-NANNATEK)

The query should be test* The brackets will be eliminated by the analyzer Aviran http://www.aviransplace.com -Original Message- From: Flik Shen [mailto:[EMAIL PROTECTED] Sent: Tuesday, June 13, 2006 6:07 AM To: java-user@lucene.apache.org Subject: about PrefixQuery Matching When I s

RE: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-13 Thread Ramana Jelda

Thanks for your replies. > -Original Message- > From: Chris Hostetter [mailto:[EMAIL PROTECTED] > Sent: Tuesday, June 13, 2006 9:13 AM > To: java-user@lucene.apache.org > Subject: Re: How can I tell Lucene to also use analyzer for > Keyword fields > > > : It seems analyzers are never

Re: question with spellchecker

2006-06-13 Thread mark harwood

For those with the luxury of a large store of historical queries it's interesting to note Google's approach to this. Not some fancy spell checker - just mining searcher behaviour patterns. Google's Bosworth describes this approach approx 13 minutes into this podcast: http://www.itconversations.c

about PrefixQuery Matching

2006-06-13 Thread Flik Shen

While studying PrefixQuery, I found a problem. For example, with the search string test(*): this matches testX and testX...X, but does not match test alone. Is this a real problem?

Index (speed) optimization

2006-06-13 Thread Trieschnigg, R.B. \(Dolf\)

Hi, I just looked at the log of my indexing program and saw that after adding 4.5 million documents (16 GB of text) to a newly created index, it took 7 hours (!) to carry out the optimization (indexWriter.optimize()). I am running the indexing program on a (3.2 GHz, 1 GB RAM) desktop computer wit

Re: Getting count on distinct values of a field.

2006-06-13 Thread heritrix . lucene

But what if that word is present in other fields also. does "docFreq " only look into that particular field ?? On 6/13/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: Look at the TermEnum class... iterate over the terms in your field, and docFreq is the number of docs with that term. : Date:

Re: How can I tell Lucene to also use analyzer for Keyword fields

2006-06-13 Thread Chris Hostetter

: It seems analyzers never get called for UnTokenized fields (seems no luck : either using PerFieldAnalyzer). The label "tokenized" is somewhat misleading ... it assumes that your analyzer will do some tokenizing (which it doesn't have to do in the case of the KeywordAnalyzer). The best thing

29 matches
