Re: Does Index have a Tokenizer Built into it

2007-07-13 Thread John Paul Sondag
Ard, I do have access to the URLs of the documents, but because I will be making short snippets for many pages (suppose there are about 20 hits per page and I need to make a snippet for each of them), I was worried it would be inefficient to open each "hit", tokenize it, and then make the snippet, of c

Re: Standard Analyzer Escapes

2007-07-13 Thread Mark Miller
This is certainly the case. StandardAnalyzer has a regex matcher that looks for a possible company name involving an & or an @. The QueryParser is escaping the '&' -- all of the effects described are standard results of using the StandardAnalyzer. Any double '&&' will break text, but 'sdfdf&dfs

Re: User Defined Matcher

2007-07-13 Thread Mathieu Lecarme
You create your own class which extends SynonymTokenFilter. You pipe it into your Analyzer. M. On 13 Jul 07 at 19:09, Mohsen Saboorian wrote: Thanks for quick replies. Mathieu Lecarme wrote: Uses synonyms in the query parser Sorry, I didn't get the point of synonym. Can you explain
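
A rough sketch of the synonym idea, written in plain Python rather than against the Lucene API: the thread's example of matching "foo" with "oof" is read here as "also match the reversed term", which is an assumption. The function name `expand_with_reversal` is hypothetical. It emits each term followed by its reversed form at the same position, the way a synonym filter would stack alternatives on one token position.

```python
from typing import Iterable, Iterator, Tuple


def expand_with_reversal(tokens: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Yield (term, position_increment) pairs.

    Each original term advances the position by 1. Its reversed form, when it
    differs from the original, is emitted with increment 0 so that it shares
    the position of the original term, like a synonym.
    """
    for term in tokens:
        yield term, 1
        reversed_term = term[::-1]
        if reversed_term != term:  # skip palindromes to avoid duplicate terms
            yield reversed_term, 0


if __name__ == "__main__":
    for term, increment in expand_with_reversal(["foo", "bar"]):
        print(term, increment)
```

Applied at index time, a filter like this would let a query for "oof" hit a document containing "foo" without changing the searcher itself.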

Re: User Defined Matcher

2007-07-13 Thread Mohsen Saboorian
Thanks for quick replies. Mathieu Lecarme wrote: > > Uses synonyms in the query parser > Sorry, I didn't get the point of synonym. Can you explain more? -- View this message in context: http://www.nabble.com/User-Defined-Matcher-tf4075337.html#a11583698 Sent from the Lucene - Java Users mai

Token offset values for custom Tokenizer

2007-07-13 Thread Shahan Khatchadourian
Hi, I am storing custom values in the Tokens provided by a Tokenizer but when retrieving them from the index the values don't match. I've looked in the LIA book but it's not current since it mentioned term vectors aren't stored. I'm using Lucene Nightly 146 but the same thing has happened with

Re: User Defined Matcher

2007-07-13 Thread Mathieu Lecarme
Mohsen Saboorian wrote: > Is it possible that I inject my own matching mechanism into Lucene > IndexSearcher? In other words, is this possible that my own method be called > in order to collect results (hits)? Suppose the case that I want to match - > for example - "foo" with both "foo" and "oof

Re: User Defined Matcher

2007-07-13 Thread Yonik Seeley
On 7/13/07, Mohsen Saboorian <[EMAIL PROTECTED]> wrote: Is it possible that I inject my own matching mechanism into Lucene IndexSearcher? In other words, is this possible that my own method be called in order to collect results (hits)? Suppose the case that I want to match - for example - "foo" w

User Defined Matcher

2007-07-13 Thread Mohsen Saboorian
Is it possible that I inject my own matching mechanism into Lucene IndexSearcher? In other words, is this possible that my own method be called in order to collect results (hits)? Suppose the case that I want to match - for example - "foo" with both "foo" and "oof". Thanks in advance, Mohsen. --

Re: Standard Analyzer Escapes

2007-07-13 Thread Yonik Seeley
I just tried some things fast via the Solr admin interface, and everything seems fine. I think you are probably confusing what the parser does vs what the analyzer does. Try your tests with an un-tokenized field to remove that effect. -Yonik On 7/13/07, Walt Stoneburner <[EMAIL PROTECTED]> wrote

Re: Customizing Stop Word List?

2007-07-13 Thread Michael Barbarelli
Please disregard previous request for assistance. I've fixed the bug I was struggling with and it actually had nothing to do with the analyzer in question. Thanks very much. On 7/13/07, Michael Barbarelli <[EMAIL PROTECTED]> wrote: Here's the sample code. Incidentally, this is in C#. I am us

Standard Analyzer Escapes

2007-07-13 Thread Walt Stoneburner
In reading the documentation for escape characters, I'm having a little trouble understanding what it wants me to do for certain special cases. http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters says: "Lucene supports escaping special characters that are par
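
To make the escaping rule concrete, here is a minimal sketch in plain Python, not the Lucene API. It assumes the special characters are those listed on the query parser syntax page (+ - && || ! ( ) { } [ ] ^ " ~ * ? : \) and that each one is escaped with a preceding backslash. The name `escape_query_text` is hypothetical.

```python
# Assumed list of query parser special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \
# '&' and '|' are escaped individually, which also covers '&&' and '||'.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_query_text(text: str) -> str:
    """Return text with a backslash placed before every special character."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


if __name__ == "__main__":
    print(escape_query_text("(1+1):2"))
```

Escaping only controls what the query parser sees. Whatever the analyzer then does with a character such as '&' is a separate step, which is the distinction the replies in this thread draw.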

Re: Customizing Stop Word List?

2007-07-13 Thread Michael Barbarelli
Here's the sample code. Incidentally, this is in C#. I am using Lucene.NET, but I am assuming this problem could be universal to all versions and that this is a question that is best exposed to the collective wisdom of the Java user group. default list of ISO country codes. * public string[] DEF

Re: How to reflect index changes to search automatically

2007-07-13 Thread Sonu SR
Thanks Ard. I think option 2 is good. I will try this. On 7/13/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote: The SearchClient is obviously not aware of a changing index, so doesn't know when it has to be reopened. You can at least do the following: 1) you periodically check for the index fold

RE: Does Index have a Tokenizer Built into it

2007-07-13 Thread Ard Schrijvers
Hello, > I'm wondering if after > opening the > index I can retrieve the Tokens (not the terms) of a > document, something > akin to IndexReader.Document(n).getTokenizer(). It is obviously not possible to get the original tokens of the document back when you haven't stored the document, becaus

RE: How to reflect index changes to search automatically

2007-07-13 Thread Ard Schrijvers
The SearchClient is obviously not aware of a changing index, so doesn't know when it has to be reopened. You can at least do the following: 1) you periodically check the index folder to see whether its timestamp has changed (or, if that stays the same, check the files in it) --> if changed, reo
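
The periodic check described in option 1 could be sketched as follows. This is plain Python using only the standard library, not Lucene code. The names `snapshot_index_dir` and `index_changed` are hypothetical, and comparing per-file modification time and size is an assumption about what "timestamp did change" should cover.

```python
import os
from typing import Dict, Tuple

# Maps file name -> (modification time in nanoseconds, size in bytes).
Snapshot = Dict[str, Tuple[int, int]]


def snapshot_index_dir(path: str) -> Snapshot:
    """Record modification time and size for every file directly inside path."""
    snapshot: Snapshot = {}
    for entry in os.scandir(path):
        if entry.is_file():
            info = entry.stat()
            snapshot[entry.name] = (info.st_mtime_ns, info.st_size)
    return snapshot


def index_changed(path: str, previous: Snapshot) -> bool:
    """Return True if files were added, removed, or modified since previous."""
    return snapshot_index_dir(path) != previous
```

A search client would keep the last snapshot, call `index_changed` on a timer, and reopen its searcher (then take a fresh snapshot) whenever the call returns True.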