Hi Sergey,
Here is the table of tags from http://www.nltk.org/book/ch05.html
Tag   Meaning      English Examples
ADJ   adjective    new, good, high, special, big, local
ADP   adposition   on, of, at, with, by, into, under
ADV   adverb       really, already, still, early, now
CONJ  conjunction  and, or, but, if, while,
Hello. My name is Sergeiy, and I'm working on extending Lucene's functionality.
As I read in the JavaDoc for the "org.apache.lucene.analysis" package, it is
preferable to ask at this address before extending, because some features
may already be done.
So I would like to be able to search by parts of
It is (b).
D.
On Fri, Aug 7, 2015 at 3:05 AM, Trejkaz trej...@trypticon.org wrote:
I have recently done updates from Lucene 3.6 to 4.x and 4.x to 5.2.
During this process, I noticed that the FST used by the Japanese
analyser (AKA Kuromoji) was changing between releases. As I fear
breakages in backwards compatibility, I worried that the dictionary
had changed, so I wrote
Hi,
This is what I've tried:
https://gist.github.com/anonymous/7383104
So far so good except that something is definitely wrong in my code as the
synonym is not emitted as a valid token it seems. This is how my indexing
analyzer is built:
private static final class MyIndexAnalyzer extends
Replying to self: silly me. I am obviously creating the array with the
wrong length.
final String term = new String(buffer, 1, length);
should be replaced by
final String term = new String(buffer, 1, length - 1);
and the silly trim can go away. I guess I need more coffee.
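For anyone hitting the same thing, here is a minimal sketch of the off-by-one. The third argument of String(char[], int, int) is a count of chars to copy, not an end index, so skipping the first char means copying one char fewer. The class name, the helper name and the sample buffer are assumptions made for illustration; they are not taken from the gist.

```java
public class TermSlice {
    // Hypothetical helper: returns the first `length` chars of `buffer`
    // without the leading char (e.g. a '#' or '@' marker).
    public static String stripFirstChar(char[] buffer, int length) {
        // String(char[] value, int offset, int count): `count` is the number
        // of chars to copy starting at `offset`, so it must be length - 1.
        return new String(buffer, 1, length - 1);
    }

    public static void main(String[] args) {
        // Assumed sample: a reused buffer where only the first 4 chars are valid.
        char[] buffer = {'#', 'f', 'o', 'o', '\0', '\0'};
        int length = 4;
        System.out.println(stripFirstChar(buffer, length)); // prints "foo"
    }
}
```

With a count of length instead of length - 1, the copy would run one char past the valid data, which is the kind of stray trailing char a trim() call can end up masking.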
S.
On Sat, Nov 9,
This is a parts-of-speech analyzer for tweets. It would make your index
far more useful.
http://www.ark.cs.cmu.edu/TweetNLP/
On 11/04/2013 11:40 PM, Stéphane Nicoll wrote:
Hi,
I am building an application that indexes tweets and offers some basic
search facilities on them.
I am trying to find
If your universe of items you want to match this way is small,
consider something akin to synonyms. Your indexing process
emits two tokens, with and without the @ or # which should
cover your situation.
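A rough sketch of the two-token idea, in plain Java rather than as a real Lucene TokenFilter. The class and method names are hypothetical, and splitting on whitespace is an assumption; in an actual analysis chain the extra token would normally be emitted at the same position as the original, the way synonyms are.

```java
import java.util.ArrayList;
import java.util.List;

public class TweetTokenExpander {
    // Hypothetical helper: tokens to index for one whitespace-delimited word.
    // A word starting with '@' or '#' yields itself plus the bare word.
    public static List<String> expand(String word) {
        List<String> tokens = new ArrayList<>();
        tokens.add(word);
        if (word.length() > 1 && (word.charAt(0) == '@' || word.charAt(0) == '#')) {
            tokens.add(word.substring(1));
        }
        return tokens;
    }

    // Splits on whitespace (an assumption) and expands every word.
    public static List<String> analyze(String tweet) {
        List<String> out = new ArrayList<>();
        for (String word : tweet.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.addAll(expand(word));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("hello @foo and #foo"));
        // prints [hello, @foo, foo, and, #foo, foo]
    }
}
```

With the index built this way, a query for foo hits plain words, mentions and hashtags, while a query for @foo or #foo only hits the marked form, provided the query side does not expand or strip the prefix.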
FWIW,
Erick
On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
stephane.nic...@gmail.com wrote:
Hi,
Thanks for the reply. It's an index with tweets so any word really is a
target for this. This would mean a significant increase of the index. My
volumes are really small so that shouldn't be a problem (but
performance/scalability is a concern).
I have control over the query. Another
You have to get the values _into_ the index with the special characters;
that's where the issue is. Depending on your analysis chain, special
characters may not even be in your index to search in the first
place.
So it's not how many different words are after the special characters as
much
protWords)
See:
http://lucene.apache.org/core/4_5_0/analyzers-common/org/apache/lucene/analysis/miscellaneous/WordDelimiterFilter.html
-- Jack Krupansky
-Original Message-
From: Stéphane Nicoll
Sent: Tuesday, November 05, 2013 2:40 AM
To: java-user@lucene.apache.org
Subject: Twitter analyser
Hi,
I am building an application that indexes tweets and offers some basic
search facilities on them.
I am trying to find a combination where the following would work:
* foo matches the foo word, a mention (@foo) or the hashtag (#foo)
* @foo only matches the mention
* #foo matches only the
(Version.LUCENE_34, tokenStream);
return tokenStream;
}
}
Best,
Anna
-Original Message-
From: Jamir Shaikh [mailto:shaikhja...@gmail.com]
Sent: Saturday, October 15, 2011 02:22
To: java-user@lucene.apache.org
Subject: Case insensitive Keyword Analyser
Hi Guys,
Use Case: Field: Name
Data: Jose ,
Jose Sam
Grant Ingersoll wrote:
On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:
Grant Ingersoll wrote:
What's your current chain of TokenFilters? How many exceptions do you expect?
That is, could you enumerate them?
Very few, yes I could enumerate them, but not sure what exactly
What's your current chain of TokenFilters? How many exceptions do you expect?
That is, could you enumerate them?
On Mar 12, 2010, at 5:27 AM, Paul Taylor wrote:
Hi, I'm using a custom analyser based on StandardAnalyzer with good results
to search artists (e.g. rolling stones/beatles
Grant Ingersoll wrote:
What's your current chain of TokenFilters? How many exceptions do you expect?
That is, could you enumerate them?
Very few, yes I could enumerate them, but not sure what exactly you are
suggesting, what I was going to do would be add to the charConvertMap
(when I
On Mar 17, 2010, at 11:34 AM, Paul Taylor wrote:
Grant Ingersoll wrote:
What's your current chain of TokenFilters? How many exceptions do you
expect? That is, could you enumerate them?
Very few, yes I could enumerate them, but not sure what exactly you are
suggesting, what I was
Hi, I'm using a custom analyser based on StandardAnalyzer with good
results to search artists (e.g. rolling stones/beatles), but it fails to
match some weird artist names such as '!!!'. This is not surprising,
because the analyser ignores punctuation, which is what I normally want
it to do. I just
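One way to picture the "enumerate the exceptions" idea raised in this thread, as a plain-Java sketch rather than a Lucene analyser: names on a short exception list are kept whole, and everything else has its punctuation dropped as usual. The class name, the exception set and the splitting rule are all assumptions made for illustration, not the poster's actual setup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ArtistNameTokenizer {
    // Hypothetical exception list: names that must survive as-is.
    private static final Set<String> KEEP_AS_IS = Set.of("!!!");

    public static List<String> tokenize(String name) {
        String trimmed = name.trim();
        List<String> tokens = new ArrayList<>();
        // Enumerated exceptions bypass the punctuation-stripping step.
        if (KEEP_AS_IS.contains(trimmed)) {
            tokens.add(trimmed);
            return tokens;
        }
        // Default path (assumed): lowercase, then split on anything that is
        // not a letter or a digit, which discards punctuation.
        for (String part : trimmed.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Rolling Stones")); // prints [rolling, stones]
        System.out.println(tokenize("!!!"));            // prints [!!!]
    }
}
```

This only works when the exceptions are few and known in advance, which is what the question about enumerating them is getting at.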
Sorry, I meant Farsi analyser instead of Farsi parser.
--
View this message in context:
http://www.nabble.com/farsi-analyser-tf2472949.html#a6895440
Sent from the Lucene - Java Users mailing list archive at Nabble.com
Raghavendra Prabhu wrote:
While indexing, I use a different Analyzer.
While searching, I use a simple standard Analyzer.
Will this prevent me from getting the same best fragments when I do a search
for two terms, say term1 and term2?
It depends on the differences, but in general you will always