Standard Analyzer

2008-08-25 Thread Kalani Ruwanpathirana
Hi, I am using StandardAnalyzer when creating the Lucene index. It indexes the word work as it is, but does not index the word wo*rk in that manner. Can I index such words (including * and ?) as they are? Otherwise I have no way to index and search for words like wo*rk, you?, etc. Thanks -- Kalani

Re: Standard Analyzer

2008-08-25 Thread Karl Wettin
On 25 Aug 2008, at 09:19, Kalani Ruwanpathirana wrote: Hi, I am using StandardAnalyzer when creating the Lucene index. It indexes the word work as it is, but does not index the word wo*rk in that manner. Can I index such words (including * and ?) as they are? Otherwise I have no way to index

Re: Standard Analyzer

2008-08-25 Thread Kalani Ruwanpathirana
Hi, thanks. I tried WhitespaceAnalyzer too, but it seems to be case-sensitive. If I need to search for words like correct?, html (it escapes , and a few other characters too), I need to index those kinds of words. On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin [EMAIL PROTECTED] wrote: On 25 Aug 2008, at

Re: Field Question

2008-08-25 Thread Michael McCandless
I think you meant Field.Index.NO and Field.Index.TOKENIZED, for those two docs. The answer is yes -- Lucene considers the field indexed if ever any doc, even a single doc, had set Index.TOKENIZED or Index.UN_TOKENIZED for that field. However, your document A still will not have been

Re: Standard Analyzer

2008-08-25 Thread Karl Wettin
On 25 Aug 2008, at 11:14, Kalani Ruwanpathirana wrote: Hi, thanks. I tried WhitespaceAnalyzer too, but it seems to be case-sensitive. Then you simply add a LowerCaseFilter to the chain in the Analyzer: public final class WhitespaceAnalyzer extends Analyzer { public TokenStream tokenStream(String
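The chain Karl describes (split on whitespace only, then lowercase each token) can be mimicked in plain Java to see why wo*rk and you? survive intact while case differences disappear. This is a standalone sketch, not Lucene code: the class and method names are hypothetical. In Lucene 2.x the equivalent is wrapping a WhitespaceTokenizer in a LowerCaseFilter inside the analyzer's tokenStream method, and the same analyzer should be used at index and query time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical plain-Java model of "WhitespaceTokenizer + LowerCaseFilter".
 * Nothing here strips punctuation, so '*' and '?' stay inside the tokens.
 */
final class WhitespaceLowercaseSketch {
    private WhitespaceLowercaseSketch() {}

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Step 1: split on runs of whitespace (the whitespace tokenizer's job).
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // only happens when the whole input was blank
            }
            // Step 2: lowercase every token (the lowercase filter's job).
            tokens.add(raw.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }
}
```

For example, analyze("Wo*rk is Correct? You?") yields the tokens wo*rk, is, correct? and you?.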

How to search

2008-08-25 Thread Venkata Subbarayudu
Hi All, I am new to Lucene, and I am using it for indexing and searching. Is it possible to search for substrings with it? For example, if a field holds the value LuceneIndex and I give the query Index, I want to get this field as well. Is there any way to do this? Thanks in advance,

Re: How to search

2008-08-25 Thread Anshum
Hi, you could use wildcard queries in that case (if I understood you correctly). However, because of the way indexed terms are stored, a *word query would not be advisable, whereas a word* query would be workable in a real-world environment. Hope this answers your question. -- Anshum Gupta

Re: How to search

2008-08-25 Thread Venkata Subbarayudu
Hi Anshum Gupta, Thanks for your reply, but when I went through the query syntax document for Lucene, I read that Lucene does not allow queries like *findthis, i.e. I think it does not allow wildcards at the beginning of the query. Is that right? Thanks, Venkata Subbarayudu. Anshum-2 wrote: Hi,

Re: How to search

2008-08-25 Thread Karl Wettin
On 25 Aug 2008, at 13:54, Venkata Subbarayudu wrote: Hi All, I am new to Lucene, and I am using it for indexing and searching. Is it possible to search for substrings with it? For example, if a field holds the value LuceneIndex and I give the query Index, I want to get this

Re: How to search

2008-08-25 Thread Anshum
Yes, and that is why I said a *word-style query would not be advisable, whereas a word*-style query would be doable. *word: a leading wildcard, which can be done, but it is not all that straightforward, and I would still strongly advise against it. word*: doable and OK. Else if
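Anshum's point about "the way the indexed terms are stored" can be illustrated with a simplified model in which the term dictionary is just a sorted set of strings (Lucene's real dictionary is a more elaborate on-disk structure, so treat this as an analogy). A prefix pattern (word*) can seek to the first candidate and stop at the first non-match; a leading wildcard (*word) gets no help from the sort order and must examine every term. Class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical model: why word* is cheap and *word is expensive on a sorted term list. */
final class WildcardCostSketch {
    private WildcardCostSketch() {}

    /** Matching terms plus how many dictionary entries were examined to find them. */
    static final class Result {
        final List<String> matches = new ArrayList<>();
        int termsExamined;
    }

    /** word* : start at the first term >= prefix and stop at the first term that no longer matches. */
    static Result prefixMatches(NavigableSet<String> sortedTerms, String prefix) {
        Result r = new Result();
        for (String term : sortedTerms.tailSet(prefix, true)) {
            r.termsExamined++;
            if (!term.startsWith(prefix)) {
                break; // sorted order: terms sharing the prefix are contiguous, so we are done
            }
            r.matches.add(term);
        }
        return r;
    }

    /** *word : the sort order gives no starting point, so every term is examined. */
    static Result suffixMatches(NavigableSet<String> sortedTerms, String suffix) {
        Result r = new Result();
        for (String term : sortedTerms) {
            r.termsExamined++;
            if (term.endsWith(suffix)) {
                r.matches.add(term);
            }
        }
        return r;
    }
}
```

With the six terms index, indexer, indexing, lucene, luceneindex and search, the prefix "index" is resolved after examining 4 terms, while the suffix "index" has to look at all 6. On a real index with millions of distinct terms, that difference is what makes leading wildcards costly.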

Re: How to search

2008-08-25 Thread Shalin Shekhar Mangar
On Mon, Aug 25, 2008 at 5:37 PM, Karl Wettin [EMAIL PROTECTED] wrote: Is this the specific use case, that you want to handle composite words as in javaFieldAndClassNames? There is no native support for that in Lucene to my knowledge, but it should not be too hard to implement a TokenStream
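The custom TokenStream mentioned above would need a rule for where to split composite words such as javaFieldAndClassNames or LuceneIndex. Below is a plain-Java sketch of only that string logic, assuming a boundary is a lowercase letter followed by an uppercase letter; the class name is hypothetical, and wiring it into a Lucene TokenFilter (emitting the extra tokens with suitable position increments) is not shown. Emitting the whole lowercased word first is a design choice, so exact matches on the full word keep working.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: splits camelCase composite words into lowercased parts. */
final class CamelCaseSplitSketch {
    private CamelCaseSplitSketch() {}

    static List<String> split(String word) {
        List<String> parts = new ArrayList<>();
        if (word.isEmpty()) {
            return parts;
        }
        int start = 0;
        for (int i = 1; i < word.length(); i++) {
            // Boundary: a lowercase letter immediately followed by an uppercase letter.
            boolean boundary = Character.isLowerCase(word.charAt(i - 1))
                    && Character.isUpperCase(word.charAt(i));
            if (boundary) {
                parts.add(word.substring(start, i).toLowerCase(Locale.ROOT));
                start = i;
            }
        }
        parts.add(word.substring(start).toLowerCase(Locale.ROOT));
        // If the word really was composite, also keep the whole word (lowercased) up front.
        if (parts.size() > 1) {
            parts.add(0, word.toLowerCase(Locale.ROOT));
        }
        return parts;
    }
}
```

Under this rule LuceneIndex produces luceneindex, lucene and index, so a plain query for index would match it without any wildcard.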

FilteredQuery

2008-08-25 Thread Heiko
Hi All, I would like to use FilteredQuery to filter my search results by the presence or absence of certain ids. Example A: query - text:albert einstein filterQuery - doctype:letter That's OK; I get the expected results. But I get no results if I filter on the absence of an

MultiPhrase search

2008-08-25 Thread Andre Rubin
Hi all, Let's say that I have in my index the value One Two Three for field 'A'. I'm using a custom analyzer that is described in the forwarded message. My Search query is built like this: QueryParser parser = new QueryParser(LABEL_FIELD, ANALYZER); Query query =

SpanQuery and FilteredQuery

2008-08-25 Thread Christopher M Collins
Hello, Can anyone tell me if it's possible to apply a filter to a SpanQuery and still use query.getSpans(indexReader)? I'm using getSpans to get back the original positions in the text but I would like to filter the results returned by getSpans. I have a Filter I can apply if I just search

Re: FilteredQuery

2008-08-25 Thread Otis Gospodnetic
Heiko, It's most likely because your case B has a purely negative query. Perhaps you can combine it with a MatchAllDocsQuery? Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Heiko [EMAIL PROTECTED] To: java-user@lucene.apache.org Sent:
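Otis's diagnosis can be seen in a simplified set model of Boolean retrieval (an analogy, not Lucene's scoring code): the result is the set of documents matched by the required clauses, minus those matched by the prohibited clauses. With no required clause there is nothing to subtract from, so "NOT doctype:letter" on its own returns nothing; adding a match-all clause supplies every document as the starting set. In Lucene 2.x terms, that means a BooleanQuery with a MatchAllDocsQuery clause as MUST and the excluded term as MUST_NOT. Names below are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical set model of a Boolean query with required and prohibited parts. */
final class NegativeQuerySketch {
    private NegativeQuerySketch() {}

    /** Docs matched by the required part, minus docs matched by the prohibited part. */
    static Set<Integer> evaluate(Set<Integer> required, Set<Integer> prohibited) {
        Set<Integer> result = new TreeSet<>(required);
        result.removeAll(prohibited);
        return result;
    }

    /** Stand-in for a match-all query: every doc id from 0 to maxDoc - 1. */
    static Set<Integer> allDocs(int maxDoc) {
        Set<Integer> all = new TreeSet<>();
        for (int doc = 0; doc < maxDoc; doc++) {
            all.add(doc);
        }
        return all;
    }
}
```

If docs 1 and 3 are letters in a five-document index, evaluate with an empty required set returns nothing, while evaluate(allDocs(5), ...) returns docs 0, 2 and 4.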

Re: Clarification about segments

2008-08-25 Thread David Lee
ok, thanks. I knew that the documents were buffered in memory until they were flushed, but I thought that in memory, they were still separate documents/segments until they were merged together at the appropriate time (dependent on the mergeFactor). Do you mean that when the IndexWriter flushes

Re: MultiPhrase search

2008-08-25 Thread Daniel Naber
On Monday, 25 August 2008, Andre Rubin wrote: I tried it out but with no luck (I think I did it wrong). In any case, is MultiPhraseQuery what I'm looking for? If it is, how should I use the MultiPhraseQuery class? No, you won't need it. If you know that the field is not really tokenized

Question: Lucene MoreLikeThis score values all the same:

2008-08-25 Thread vinay b
As a test, I tried to compare a few documents on various topics (a few on linux, and another on the U.S. constitution) to a source document on linux using a query formed by MoreLikeThis. 1. Looking at the hits, they have the same score. I'd expect them to be different, based on their relevance to

Re: FilteredQuery

2008-08-25 Thread German Kondolf
Exactly as Otis says, you should use MatchAllDocsQuery as the query, but it has a performance drawback: it checks the deletion state of every single document. I've solved the issue by writing my own EnhancedMatchAllDocs query, which is optimized not to check the deletion state. Perhaps the SegmentReader

Re: FilteredQuery

2008-08-25 Thread Otis Gospodnetic
Mike recently committed a read-only IndexReader. If you pull Lucene from the svn trunk, you'll be able to make use of it. The read-only IndexReader doesn't have a synchronized isDeleted, I believe. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message

RE: How do TeeTokenizer and SinkTokenizer work?

2008-08-25 Thread Teruhiko Kurosaka
Thank you, Grant and (Koji) Sekiguchi-san. But I don't understand how the input from reader1 and reader2 is mixed together. Will sink1 first return the reader1 text, and then the reader2 text? It depends on the order the fields are added. If source1 is used first, then reader1 will be first.
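The quoted answer (the sink's order follows the order in which the source fields are consumed) can be modeled with a toy tee: each source passes its tokens through to its consumer and, as a side effect, appends a copy to a shared sink. This is a plain-Java illustration of the idea, not Lucene's actual tee/sink classes; all names are hypothetical.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical toy model of tee/sink token streams sharing one sink. */
final class TeeSinkSketch {
    private TeeSinkSketch() {}

    /** Passes tokens through to its consumer while copying each one into a shared sink. */
    static final class Tee implements Iterator<String> {
        private final Iterator<String> source;
        private final List<String> sink;

        Tee(List<String> tokens, List<String> sink) {
            this.source = tokens.iterator();
            this.sink = sink;
        }

        @Override
        public boolean hasNext() {
            return source.hasNext();
        }

        @Override
        public String next() {
            String token = source.next();
            sink.add(token); // the copy happens only when the consumer pulls the token
            return token;
        }
    }

    /** Drains a stream completely, roughly as indexing a field would. */
    static void consume(Iterator<String> stream) {
        while (stream.hasNext()) {
            stream.next();
        }
    }
}
```

If the source fed by reader2 is consumed before the one fed by reader1, the sink holds the reader2 tokens first; the order is decided by consumption, not by construction.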

Re: MultiPhrase search

2008-08-25 Thread Andre Rubin
For some reason, the TermQuery is not returning any results, even when querying for a single word (like on*). query = new TermQuery(new Term(LABEL_FIELD, searchString)); On 8/25/08, Daniel Naber [EMAIL PROTECTED] wrote: On Monday, 25 August 2008, Andre Rubin wrote: I tried it out but with
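A TermQuery compares the term text literally: the asterisk in on* is just another character, so it only matches a stored term that really contains "on*". Wildcard handling comes from other query types (in Lucene, PrefixQuery or WildcardQuery, or QueryParser producing one of them). The sketch below models that choice in plain Java over a sorted term set. It assumes, purely for illustration, that the field value is stored as the single lowercased term "one two three"; the thread does not show what the custom analyzer actually emits. The helper name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

/** Hypothetical helper: exact-term lookup vs. prefix lookup, chosen by a trailing '*'. */
final class TrailingWildcardSketch {
    private TrailingWildcardSketch() {}

    static List<String> lookup(NavigableSet<String> sortedTerms, String searchString) {
        List<String> hits = new ArrayList<>();
        if (searchString.endsWith("*")) {
            // Prefix lookup: strip the '*' and collect the contiguous run of terms sharing the prefix.
            String prefix = searchString.substring(0, searchString.length() - 1);
            for (String term : sortedTerms.tailSet(prefix, true)) {
                if (!term.startsWith(prefix)) {
                    break;
                }
                hits.add(term);
            }
        } else if (sortedTerms.contains(searchString)) {
            // Exact lookup (TermQuery-like): the whole string must equal a stored term.
            hits.add(searchString);
        }
        return hits;
    }
}
```

Under that assumption, on* finds "one two three" through the prefix path, whereas an exact lookup for one finds nothing, because an untokenized field only stores the complete value.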

How to get all terms with a special field and document after indexed

2008-08-25 Thread Beijing2008
When a document is added to the index, its field data is split into many terms and saved into the index. Now, how can I get these terms for a specific field and a specific document from the index?

Re: How to search

2008-08-25 Thread Daniel Noll
Venkata Subbarayudu wrote: Hi Anshum Gupta, Thanks for your reply, but when I went through the query syntax document for Lucene, I read that Lucene does not allow queries like *findthis, i.e. I think it does not allow wildcards at the beginning of the query. It has supported this for some time

Re: How to get all terms with a special field and document after indexed

2008-08-25 Thread Jarvis . Guo
I like your nickname. As for the question, I think you must iterate over all the terms in the index with TermEnum and check whether each term meets your criteria. Best, 2008/8/26 Beijing2008 [EMAIL PROTECTED] When a document is added to the index, its field data is split into many terms and saved into the index.
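Jarvis's suggestion amounts to reading the inverted index backwards: walk every term of the field (what a TermEnum walk does) and keep those whose posting list contains the document in question. The sketch models one field's postings as a sorted map from term to document ids; it is a plain-Java analogy, not Lucene's API, and the names are hypothetical. Note the cost: every term in the field is visited for each document you reconstruct.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

/** Hypothetical model: recover one document's terms by scanning a field's term dictionary. */
final class DocTermsSketch {
    private DocTermsSketch() {}

    /** fieldPostings maps each term of one field to the ids of the documents containing it. */
    static List<String> termsForDoc(SortedMap<String, Set<Integer>> fieldPostings, int docId) {
        List<String> terms = new ArrayList<>();
        // Visit every term in sorted order, as a term enumeration would ...
        for (Map.Entry<String, Set<Integer>> entry : fieldPostings.entrySet()) {
            // ... and keep it only if its posting list contains the document.
            if (entry.getValue().contains(docId)) {
                terms.add(entry.getKey());
            }
        }
        return terms;
    }
}
```

With postings beijing → {0}, games → {0, 1} and olympic → {1}, document 0 yields beijing and games, and document 1 yields games and olympic. The result comes back in term order rather than in the original token order, since only the term dictionary is consulted.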

Re: How to get all terms with a special field and document after indexed

2008-08-25 Thread Beijing2008
Many thanks, but I'm sorry, I can't quite follow what you mean. A sentence passed through the Analyzer.tokenStream method produces a TokenStream, and that TokenStream is saved into the index in some way. Now I just want to get all the tokens for this input sentence back from the index. My English is very poor, maybe