Re: Field Question

2008-08-22 Thread Michael McCandless
Actually, Field.NO_NORMS means Field.UN_TOKENIZED plus Field.setOmitNorms(true). Mike. John Griffin wrote: Dimitri, Field.TOKENIZED and Field.NO_NORMS send their field's contents through a tokenizer and make their contents indexed and therefore searchable. Field.UN_TOKENIZED does not …
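
For reference, a minimal sketch of the four indexing modes discussed here (Lucene 2.3 constants; the field names and values below are made up):

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class FieldIndexModes {
        public static void main(String[] args) {
            Document doc = new Document();
            // TOKENIZED: run through the Analyzer, searchable term by term.
            doc.add(new Field("body", "some free text", Field.Store.NO, Field.Index.TOKENIZED));
            // UN_TOKENIZED: indexed as one exact term, norms still stored.
            doc.add(new Field("sku", "ABC-123", Field.Store.YES, Field.Index.UN_TOKENIZED));
            // NO_NORMS: one exact term plus omitNorms(true), i.e. UN_TOKENIZED without norms.
            doc.add(new Field("category", "books", Field.Store.YES, Field.Index.NO_NORMS));
            // NO: stored only, never indexed, so not searchable.
            doc.add(new Field("raw", "<xml>...</xml>", Field.Store.YES, Field.Index.NO));
        }
    }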

Clarification about segments

2008-08-22 Thread David Lee
So from what I understand, is it true that if mergeFactor is 10, then when I index my first 9 documents, I have 9 separate segments, each containing 1 document? And when searching, it will search through every segment? Thanks! David
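
As a rough sketch of the settings involved (Lucene 2.3 API, hypothetical index path): mergeFactor controls how many segments of a given size are allowed to accumulate before they are merged into a larger one, while the RAM buffering settings control when buffered documents are flushed to a new on-disk segment, so adding 9 documents does not necessarily produce 9 segments.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;

    public class MergeSettings {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/test-index", new StandardAnalyzer(), true);
            // Merge once 10 segments of the same level have accumulated.
            writer.setMergeFactor(10);
            // Documents are buffered in RAM; each flush writes one new segment.
            writer.setMaxBufferedDocs(10);
            // ... writer.addDocument(...) calls ...
            writer.close();
        }
    }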

RE: Field Question

2008-08-22 Thread John Griffin
Dimitri, Field.TOKENIZED and Field.NO_NORMS send their field's contents through a tokenizer and make their contents indexed and therefore searchable. Field.UN_TOKENIZED does not send its field's contents through a tokenizer, but it still indexes its contents. Only Field.NO does not index it …

Re: How do TeeTokenizer and SinkTokenizer work?

2008-08-22 Thread Grant Ingersoll
On Aug 22, 2008, at 3:47 PM, Teruhiko Kurosaka wrote: Hello, I'm interested in knowing how these tokenizers work together. The API doc for TeeTokenizer http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/analysis/TeeTokenFilter.html has this sample code: SinkTokenizer sink1 = new SinkTok…

RE: Lucene Indexing DB records?

2008-08-22 Thread John Griffin
Try Hibernate Search - http://www.hibernate.org/410.html John G. -Original Message- From: ??? [mailto:[EMAIL PROTECTED] Sent: Friday, August 22, 2008 3:27 AM To: java-user@lucene.apache.org Subject: Lucene Indexing DB records? Guess I don't quite understand why there are so few posts ab…

How do TeeTokenizer and SinkTokenizer work?

2008-08-22 Thread Teruhiko Kurosaka
Hello, I'm interested in knowing how these tokenizers work together. The API doc for TeeTokenizer http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/analysis/TeeTokenFilter.html has this sample code: SinkTokenizer sink1 = new SinkTokenizer(null); SinkTokenizer sink2 = new SinkTokenizer(null…
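
A condensed sketch of the idea behind that javadoc sample (class names as in Lucene 2.3's org.apache.lucene.analysis package; the field names are made up): a TeeTokenFilter sits in an ordinary analysis chain and copies every token it passes along into a SinkTokenizer, which can then serve as the token source for another field without re-analyzing the text.

    import java.io.Reader;
    import java.io.StringReader;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.SinkTokenizer;
    import org.apache.lucene.analysis.TeeTokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class TeeSinkSketch {
        public static void main(String[] args) {
            Reader reader = new StringReader("Some Text To Analyze Once");
            // Collects a copy of every token that flows through the tee.
            SinkTokenizer sink = new SinkTokenizer(null);
            // Tokens pass through the tee unchanged and are also added to the sink.
            TokenStream source = new LowerCaseFilter(
                    new TeeTokenFilter(new WhitespaceTokenizer(reader), sink));
            Document doc = new Document();
            doc.add(new Field("contents", source));      // consumed first
            doc.add(new Field("contents-copy", sink));   // replays the sinked tokens
        }
    }

The javadoc also cautions that the tee'd fields must be consumed before the sink fields at indexing time, so the ordering above matters.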

Re: Score Boosting

2008-08-22 Thread Grant Ingersoll
Normalization is done on a field by field basis, as is most scoring. It doesn't factor all fields in, b/c someone might not be querying all fields. The field it does use is based on the query. On Aug 18, 2008, at 10:44 PM, blazingwolf7 wrote: Hi, I am currently working on the calculatio…
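
A small, hypothetical illustration of the per-field nature of this: a boost set on one field is folded into that field's norm and only affects queries that actually hit that field.

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    public class FieldBoost {
        public static void main(String[] args) {
            Document doc = new Document();
            Field title = new Field("title", "Lucene in Action",
                    Field.Store.YES, Field.Index.TOKENIZED);
            // Combined with length normalization into the "title" norm;
            // a query against "body" never sees this boost.
            title.setBoost(2.0f);
            doc.add(title);
            doc.add(new Field("body", "longer free text ...",
                    Field.Store.NO, Field.Index.TOKENIZED));
        }
    }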

Re: Lucene Indexing DB records?

2008-08-22 Thread Marcelo Ochoa
> Actually there are many projects for Lucene + Database. Here is a list I > know: > > * Hibernate Search > * Compass, (also Hibernate + Lucene) > * Solr + DataImportHandler (Searching + Crawler) > * DBSight, (Specific for database, closed source, but very customizable, > easy to setup) > * Browse…

Re: SnowballAnalyzer question

2008-08-22 Thread Chris Hostetter
: I am using the SnowballAnalyzer because of its multi-language stemming : capabilities - and am very happy with that. : There is one small glitch which I'm hoping to overcome - can I get it to split : up internet domain names in the same way that StopAnalyzer does? 90% of the Lucene Analyzers…
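
One way to get both behaviors, sketched here on the assumption that composing your own analyzer (rather than using SnowballAnalyzer as-is) is acceptable: use a letter-only tokenizer like the one StopAnalyzer builds on, which splits "www.example.com" at the dots, then apply the Snowball stemmer from the contrib snowball jar. The stop list and stemmer name below are English-only and would need to match the languages involved.

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.LowerCaseTokenizer;
    import org.apache.lucene.analysis.StopAnalyzer;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.snowball.SnowballFilter;

    public class SplittingSnowballAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // LowerCaseTokenizer splits on non-letters, so domain names break up;
            // StandardTokenizer (used by SnowballAnalyzer) keeps them whole.
            TokenStream stream = new LowerCaseTokenizer(reader);
            stream = new StopFilter(stream, StopAnalyzer.ENGLISH_STOP_WORDS);
            stream = new SnowballFilter(stream, "English");
            return stream;
        }
    }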

Re: Lucene Indexing DB records?

2008-08-22 Thread Chris Lu
Actually there are many projects for Lucene + Database. Here is a list I know: * Hibernate Search * Compass, (also Hibernate + Lucene) * Solr + DataImportHandler (Searching + Crawler) * DBSight, (Specific for database, closed source, but very customizable, easy to set up) * Browse Engine -- Chris

Field Question

2008-08-22 Thread DimitriD
I am new to Lucene. Here is my question. The document has fields. When I add a field to the document I can specify that the field is Indexed, Tokenized, etc. So the same field can be Tokenized in one document and not tokenized in another document. However, there is a method IndexReader.getFieldNames(…
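
A small sketch of the method in question (Lucene 2.3, hypothetical index path): the Indexed/Tokenized settings are properties of each Field instance, i.e. per document, whereas getFieldNames() reports index-wide which field names have ever been used with a given option, so the same name can show up under more than one option.

    import java.util.Collection;
    import org.apache.lucene.index.IndexReader;

    public class FieldNamesSketch {
        public static void main(String[] args) throws Exception {
            IndexReader reader = IndexReader.open("/tmp/test-index");
            // Names of fields that were indexed in at least one document.
            Collection indexed = reader.getFieldNames(IndexReader.FieldOption.INDEXED);
            // Every field name that occurs anywhere in the index.
            Collection all = reader.getFieldNames(IndexReader.FieldOption.ALL);
            System.out.println("indexed: " + indexed + ", all: " + all);
            reader.close();
        }
    }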

Re: question about modifying indexed Documents

2008-08-22 Thread Erick Erickson
Not that I know of. But if you're storing Lucene doc IDs as part of existing search results, you're playing with fire anyway. Unless there's a compelling reason to avoid it, you're usually better off storing your own unique doc ID in a different field and using that because you can guarantee that i…
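
A sketch of that approach (field and index names are made up): keep your own stable ID in an untokenized field and let IndexWriter.updateDocument() do the delete-and-re-add against that term.

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;

    public class UpdateByOwnId {
        public static void main(String[] args) throws Exception {
            IndexWriter writer = new IndexWriter("/tmp/test-index", new StandardAnalyzer(), false);
            Document doc = new Document();
            // Your own stable identifier, indexed as one exact term;
            // Lucene's internal doc numbers can change when segments merge.
            doc.add(new Field("uid", "article-42", Field.Store.YES, Field.Index.UN_TOKENIZED));
            doc.add(new Field("body", "the revised text", Field.Store.NO, Field.Index.TOKENIZED));
            // Deletes any document matching the term, then adds the new one.
            writer.updateDocument(new Term("uid", "article-42"), doc);
            writer.close();
        }
    }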

Re: question about modifying indexed Documents

2008-08-22 Thread tom
AUTOMATIC REPLY: Tom Roberts is out of the office till 2nd September 2008. LUX reopens on 1st September 2008.

question about modifying indexed Documents

2008-08-22 Thread Carlos del Cacho
Hello, I'd like to modify a Field in an already indexed Document. The only way so far that I have found is to delete the document through an IndexReader and add a new one with an IndexWriter. This has the undesirable property that it alters existing search results for a given keyword. Is there a better …

RE: Case Sensitivity

2008-08-22 Thread Dino Korah
That is very clever. With that, the text we index will go through the analyser but will not get tokenized, and it will hit the analyser the same way when we search, again untokenized. Brilliant!! -Original Message- From: Andre Rubin [mailto:[EMAIL PROTECTED] Sent: 21 August 2008 08:21 To: java-user@lucene.apache.org …
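
The trick being praised here, sketched as a small custom analyzer (the class name is made up): emit the whole field value as a single token and lowercase it, so both indexing and querying normalize case without splitting the value into words.

    import java.io.Reader;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.KeywordTokenizer;
    import org.apache.lucene.analysis.LowerCaseFilter;
    import org.apache.lucene.analysis.TokenStream;

    public class LowerCaseKeywordAnalyzer extends Analyzer {
        public TokenStream tokenStream(String fieldName, Reader reader) {
            // One token per field value, lowercased; use the same analyzer at
            // index and query time so case-insensitive exact matches line up.
            return new LowerCaseFilter(new KeywordTokenizer(reader));
        }
    }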

Re: Lucene Indexing DB records?

2008-08-22 Thread Shalin Shekhar Mangar
You might also want to look at Solr and DataImportHandler. http://lucene.apache.org/solr http://wiki.apache.org/solr/DataImportHandler On Fri, Aug 22, 2008 at 2:56 PM, ??? <[EMAIL PROTECTED]> wrote: > Guess I don't quite understand why there are so few posts about Lucene > indexing DB records. S…

Lucene Indexing DB records?

2008-08-22 Thread ???
Guess I don't quite understand why there are so few posts about Lucene indexing DB records. Searched MarkMail, but most of the Lucene+DB posts have to do with Lucene index management. The only thing I found so far is the following, if you have a minute or two: http://kalanir.blogspot.com/2008/06 …
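
For completeness, a bare-bones sketch of the do-it-yourself route that the replies contrast with Hibernate Search and Solr's DataImportHandler (the JDBC driver, URL, table and column names are placeholders): read rows, turn each into a Document, and add it to an IndexWriter.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;

    public class DbIndexer {
        public static void main(String[] args) throws Exception {
            // Driver class and connection details are placeholders.
            Class.forName("com.mysql.jdbc.Driver");
            Connection conn = DriverManager.getConnection(
                    "jdbc:mysql://localhost/mydb", "user", "pass");
            IndexWriter writer = new IndexWriter("/tmp/db-index", new StandardAnalyzer(), true);
            Statement stmt = conn.createStatement();
            ResultSet rs = stmt.executeQuery("SELECT id, title, body FROM articles");
            while (rs.next()) {
                Document doc = new Document();
                // Primary key as one exact term so rows can be updated or deleted later.
                doc.add(new Field("id", rs.getString("id"), Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("title", rs.getString("title"), Field.Store.YES, Field.Index.TOKENIZED));
                doc.add(new Field("body", rs.getString("body"), Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
            rs.close();
            stmt.close();
            conn.close();
            writer.optimize();
            writer.close();
        }
    }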