Hello,
I have some questions about what behavior is expected when passing
Version.LUCENE_24/29/30 to QueryParser and StandardAnalyzer when parsing
a query. I know that passing the Version to the constructors makes Lucene
act like that version, with all features and bugs intact. The
You could look into modifying the standard tokenizer lexer code to
handle punctuation (there is a patch in the issue tracker for the old
JavaCC grammar to handle punctuation), and there is also the GATE NLP
project, which has a fairly nice sentence splitter you might find
useful. Add a
I would like my searches to match John Smith when John Smith is in a
document, but not when the names are separated by punctuation. For example,
when I was using StandardAnalyzer, John. Smith was matching, which is wrong
for me. Right now I am using WhitespaceAnalyzer but instead searching for John Smith
John
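Since both analyzers are just token streams, the mismatch above can be sketched without Lucene at all. In the plain-Java stand-in below, the regexes are my approximation, not StandardAnalyzer's real grammar; it shows why John. Smith and John Smith look identical after punctuation-based splitting, while whitespace splitting keeps them apart:

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Stand-in for StandardAnalyzer-style behavior: break on anything
    // that is not a letter or digit, then lowercase.
    static List<String> punctSplit(String s) {
        return Arrays.asList(s.toLowerCase().split("[^\\p{L}\\p{N}]+"));
    }

    // Stand-in for WhitespaceAnalyzer: break on whitespace only.
    static List<String> wsSplit(String s) {
        return Arrays.asList(s.split("\\s+"));
    }

    public static void main(String[] args) {
        // Both inputs yield [john, smith] under punctuation splitting,
        // so a phrase query for "John Smith" also matches "John. Smith".
        System.out.println(punctSplit("John. Smith"));
        System.out.println(punctSplit("John Smith"));
        // Whitespace splitting keeps "John." as its own token.
        System.out.println(wsSplit("John. Smith"));
    }
}
```

This is why switching to WhitespaceAnalyzer stops the false match: "John." never equals the query term "John".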
Phil Whelan phil...@gmail.com wrote:
On Thu, Jul 30, 2009 at 7:12 PM, oh...@cox.net wrote:
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by special is characters that the analyzer considers break
characters.
For example, if I
I guess the obvious question is: which characters are
considered 'punctuation characters'?
Punctuation = (_|-|/|.|,)
In particular, does the analyzer consider = (equal) and
: (colon) to be punctuation characters?
: is a special character in QueryParser (if you are using it). If you
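To answer the '=' and ':' question concretely: both act as token breaks in StandardAnalyzer (':' is additionally query syntax for field selection in QueryParser, as noted above). A rough stdlib-only approximation — it deliberately ignores StandardAnalyzer's special cases for numbers, acronyms, e-mail addresses, and company names — behaves like this:

```java
import java.util.Arrays;
import java.util.List;

class BreakChars {
    // Rough stand-in for StandardAnalyzer's word breaking: anything
    // that is not a letter or digit ends the current token.
    static List<String> tokens(String s) {
        return Arrays.asList(s.toLowerCase().split("[^\\p{L}\\p{N}]+"));
    }

    public static void main(String[] args) {
        System.out.println(tokens("foo=something")); // [foo, something]
        System.out.println(tokens("field:value"));   // [field, value]
    }
}
```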
Hi Ahmet,
Thanks for the clarification and information! That was exactly what I was
looking for.
Jim
AHMET ARSLAN iori...@yahoo.com wrote:
I guess the obvious question is: which characters are
considered 'punctuation characters'?
Punctuation = (_|-|/|.|,)
In
Hi,
I was wondering if there is a list of special characters for the standard
analyzer?
What I mean by special is characters that the analyzer considers break
characters. For example, if I have something like foo=something, apparently
the analyzer considers this as two terms, foo
Hi,
I am using StandardAnalyzer when creating the Lucene index. It indexes the
word work as it is, but does not index the word wo*rk in that manner.
Can I index such words (including * and ?) as they are? Otherwise I have no way
to index and search for words like wo*rk, you?, etc.
Thanks
--
Kalani
AUTOMATIC REPLY
Tom Roberts is out of the office till 2nd September 2008.
LUX reopens on 1st September 2008
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
25 aug 2008 kl. 09.19 skrev Kalani Ruwanpathirana:
Hi,
I am using StandardAnalyzer when creating the Lucene index. It indexes the
word work as it is but does not index the word wo*rk in that manner.
Can I index such words (including * and ?) as it is? Otherwise I have no way
to index
Hi,
Thanks, I tried WhitespaceAnalyzer too, but it seems case-sensitive.
If I need to search for words like correct?, html (it escapes , and
a few other characters too) I need to index those kinds of words.
On Mon, Aug 25, 2008 at 1:15 PM, Karl Wettin [EMAIL PROTECTED] wrote:
25 aug 2008 kl.
25 aug 2008 kl. 11.14 skrev Kalani Ruwanpathirana:
Hi,
Thanks, I tried WhitespaceAnalyzer too, but it seems case sensitive.
Then you simply add a LowerCaseFilter to the chain in the Analyzer:
public final class WhitespaceAnalyzer extends Analyzer {
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(new WhitespaceTokenizer(reader));
  }
}
Hi,
We have been observing the following problem while tokenizing using
Lucene's StandardAnalyzer. The tokens we get are different on different
machines. I suspect it has something to do with the Locale settings on
the individual machines?
For example
the word 'César' is split as
Hi Prashant,
On 04/22/2008 at 2:23 PM, Prashant Malik wrote:
We have been observing the following problem while
tokenizing using Lucene's StandardAnalyzer. The tokens we get are
different on different machines. I am suspecting it has something to do
with the Locale settings on individual
Yes, the versions of Lucene and Java are exactly the same on the different
machines.
In fact, we un-jarred Lucene, jarred it with our jar, and are running from
the same NFS mounts on both machines.
Also, we have tried with Lucene 2.2.0 and 2.3.1, with the same result.
Also, about the actual string u
Hi Prashant,
What are the Unicode code points associated with the 3rd, 4th, and 5th characters?
Steve
On 04/22/2008 at 4:45 PM, Prashant Malik wrote:
Yes the version of lucene and java are exactly the same on the different
machines. Infact we unjared lucene and jared it with our jar and are
running from
: Yes the version of lucene and java are exactly the same on the different
: machines.
: Infact we unjared lucene and jared it with our jar and are running from the
: same nfs mounts on both the machines
I didn't do an in-depth code read, but a quick skim of
StandardTokenizerImpl didn't turn up
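One common cause of machine-dependent tokens (an assumption on my part, not a confirmed diagnosis of this report) is that the text is decoded with the platform's default charset before it ever reaches the analyzer, so the Locale/OS setting changes the characters, not the tokenizer. A stdlib-only sketch of how 'César' turns into mojibake:

```java
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Decode UTF-8 bytes with the wrong charset, as happens when a
    // Reader is built without specifying an encoding on a machine
    // whose default charset is Latin-1.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(misdecode("César")); // prints CÃ©sar
        // Fix: always pass the charset explicitly, e.g.
        // new InputStreamReader(in, StandardCharsets.UTF_8)
    }
}
```

If the bytes differ per machine before tokenization, StandardAnalyzer will of course split them differently; pinning the charset makes the output reproducible.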
In reading the documentation for escape characters, I'm having a
little trouble understanding what it wants me to do for certain
special cases.
http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping%20Special%20Characters
says: Lucene supports escaping special characters that are
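For completeness, QueryParser also ships a static escape(String) helper that applies the backslash-escaping that page describes. The sketch below reimplements the idea in plain Java; the character set is my approximation, and Lucene's own QueryParser.escape() should be treated as authoritative:

```java
class EscapeDemo {
    // Backslash-escape characters that QueryParser treats specially.
    // (Approximate set; see Lucene's QueryParser.escape() for the
    // real, version-specific list.)
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            if ("\\+-!():^[]\"{}~*?|&".indexOf(c) >= 0) {
                out.append('\\');
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("wo*rk"));      // wo\*rk
        System.out.println(escape("name:value")); // name\:value
    }
}
```

Note that escaping only protects the query from the *parser*; whatever analyzer runs afterwards may still strip or split the escaped characters.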
I just tried some things fast via the Solr admin interface, and
everything seems fine.
I think you are probably confusing what the parser does vs what the
analyzer does.
Try your tests with an un-tokenized field to remove that effect.
-Yonik
On 7/13/07, Walt Stoneburner [EMAIL PROTECTED] wrote:
This is certainly the case. StandardAnalyzer has a regex matcher that
looks for a possible company name involving an & or an @. The
QueryParser is escaping the '&' -- all of the effects described are
standard results of using the StandardAnalyzer. Any double '&&' will
break text, but 'sdfdfdfsdf'
: Apologies and thanks all at the same time, everyone.
No apologies necessary; you're not the first person to be confused by
this, which is why I asked if you had any ideas on how we can improve the
docs -- people who know the APIs inside and out aren't in the best
position to understand how to
I know this is a relatively fundamental thing to arrange, but I'm having
trouble.
Can I instantiate a standard analyzer with an argument containing my own
stop words? If so, how? Will they be appended to or override the built-in
stop words?
Or, do I have to modify the analyzer class itself
Michael Barbarelli wrote:
Can I instantiate a standard analyzer with an argument containing my own
stop words? If so, how? Will they be appended to or override the built-in
stop words?
You can do it with one of the alternate constructors, and they'll
override the built-in list
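As a sketch of the append-vs-override distinction: the alternate StandardAnalyzer constructors (e.g. one taking a stop-word array or Set in the 2.x line) replace the default list, so "appending" means merging the two sets yourself before passing them in. In plain Java terms (the abbreviated built-in list here is illustrative, not Lucene's actual STOP_WORDS):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class StopWords {
    // Stand-in for the built-in English stop-word list (abbreviated).
    static final Set<String> BUILT_IN =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of"));

    // Passing custom stop words to the alternate constructor REPLACES
    // the built-in list; to extend it instead, merge the sets first
    // and pass the merged set to the constructor.
    static Set<String> appended(Set<String> custom) {
        Set<String> merged = new HashSet<>(BUILT_IN);
        merged.addAll(custom);
        return merged;
    }

    public static void main(String[] args) {
        Set<String> custom = new HashSet<>(Arrays.asList("lucene"));
        System.out.println(appended(custom).contains("the"));    // true
        System.out.println(appended(custom).contains("lucene")); // true
    }
}
```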
: Michael Barbarelli wrote:
: Can I instantiate a standard analyzer with an argument containing my own
: stop words? If so, how? Will they be appended to or override the built-in
I'm really surprised how often this question gets asked ... Michael (or
anyone else for that matter), do you have
: But ParseException extends IOException, so I don't see a problem there.
: I wish my compiler agreed with you:) Which it seems to do until you
: rebuild the files with javacc. I saw at least two other posts about this
: problem on the web with no answer given...
: This guy also found the same
Thank you so much. I apologize for my ignorance.
Mark
On 7/7/06, Chris Hostetter [EMAIL PROTECTED] wrote:
: But ParseException extends IOException, so I don't see a problem
there.
: I wish my compiler agreed with you:) Which it seems to do until you
: rebuild the files with javacc. I saw
, September 26, 2005 5:46 AM
To: java-user@lucene.apache.org
Subject: Problems in standard Analyzer
Hi Mark and other Gurus,
I am indexing one value as a key field (RTF/txt indexing); the value is like
12345 or 123-09-34, or it can be like MN12345.
The problem is if the value is like 12345 or 123-23-98
I thought of not using any Analyzer, but the problem is I have other queries
that I am appending to this value with either OR or AND, so for that part of
the query I need StandardAnalyzer.
I think I should index that value like normal text; then maybe it will work.
-Original Message
@lucene.apache.org
Subject: RE: Problems in standard Analyzer
I thought of not using any Analyzer, but the problem is I have other queries
that I am appending to this value with either OR or AND, so for that part of
the query I need StandardAnalyzer.
I think I should index that value like normal text
To: java-user@lucene.apache.org
Subject: RE: Problems in standard Analyzer
It should be possible to combine queries using different types of analyzers.
The only problem I can see is if you're using one single line for the whole
query.
Frank
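Lucene's PerFieldAnalyzerWrapper exists for exactly this situation: it routes each field to its own analyzer, with a default for everything else. Conceptually it is just a map from field name to tokenizer, which can be sketched with the standard library alone (the field names and toy tokenizers here are hypothetical):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PerFieldDemo {
    // Keyword-style: keep the whole value as a single, untouched token.
    static final Function<String, List<String>> KEYWORD =
        s -> Arrays.asList(s);
    // Standard-style: lowercase and break on non-alphanumerics.
    static final Function<String, List<String>> STANDARD =
        s -> Arrays.asList(s.toLowerCase().split("[^\\p{L}\\p{N}]+"));

    static final Map<String, Function<String, List<String>>> PER_FIELD =
        new HashMap<>();
    static {
        PER_FIELD.put("key", KEYWORD); // key field: exact values like MN12345
        // every other field falls back to STANDARD
    }

    static List<String> tokenize(String field, String text) {
        return PER_FIELD.getOrDefault(field, STANDARD).apply(text);
    }

    public static void main(String[] args) {
        System.out.println(tokenize("key", "MN12345"));  // [MN12345]
        System.out.println(tokenize("body", "MN12345")); // [mn12345]
    }
}
```

The same wrapper can then be handed to both the IndexWriter and the QueryParser, so key fields keep their exact values while free-text fields still get standard analysis.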
-Original Message-
From: Månish
Sent: Monday, September 26, 2005 3:07 PM
To: java-user@lucene.apache.org
Subject: RE: Problems in standard Analyzer
The problem is that in limo you can only use standard analyzers for your
queries. As you've already seen, some of them will change the key value to
something else or even remove it
Hi Mark and other Gurus,
I am indexing one value as a key field (RTF/txt indexing); the value is like
12345 or 123-09-34, or it can be like MN12345.
The problem is that if the value is like 12345 or 123-23-98, StandardAnalyzer
is able to search it, but if the value is like MN12345 the search will not return
35 matches