(Note: cross posted announcement, please confine any replies to solr-user)
Hey folks,
On Wednesday, I'll be doing a Stump The Chump session at Lucene
Revolution EU in Dublin Ireland.
http://lucenerevolution.org/stump-the-chump
If you aren't familiar with Stump The Chump, it is a Q&A-style session
where attendees try to stump the speaker with tough questions.
If the universe of items you want to match this way is small,
consider something akin to synonyms: have your indexing process
emit two tokens, with and without the @ or #, which should
cover your situation.
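As a sketch of that two-token approach in Solr (field and type names here are illustrative, not from the thread): a whitespace tokenizer keeps the @/# characters that StandardTokenizer would strip, and the word delimiter filter then emits both the original token and the bare word.

```xml
<!-- Sketch only: names are illustrative. For "@foo", preserveOriginal keeps
     the token "@foo" while generateWordParts also emits the bare "foo". -->
<fieldType name="text_twitter" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, a query for either "foo" or "@foo" matches the indexed tweet, at the cost of roughly one extra token per decorated word.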
FWIW,
Erick
On Tue, Nov 5, 2013 at 2:40 AM, Stéphane Nicoll
<stephane.nic...@gmail.com> wrote:
Hi,
Thanks for the reply. It's an index of tweets, so any word really is a
target for this. This would mean a significant increase in index size. My
volumes are really small, so that shouldn't be a problem (but
performance/scalability is a concern).
I have control over the query. Another
You have to get the values _into_ the index with the special characters;
that's where the issue is. Depending on your analysis chain, the special
characters may not even be in your index to search in the first
place.
So it's not how many different words come after the special characters so
much as whether the special characters survive your analysis chain at all.
You can specify custom character types with the word delimiter filter, so
you could define @ and # as DIGIT and set SPLIT_ON_NUMERICS. This
would cause @foo to tokenize as two adjacent terms, and ditto for #foo.
Unfortunately, a user name or tag that starts with a digit would not
tokenize as expected.
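As a sketch of that configuration (the types file name and field type are illustrative; double-check the escaping rules for your Solr version, since # normally starts a comment in the types file):

```xml
<!-- wdftypes.txt (one mapping per line; '\#' escapes the comment character):
       @ => DIGIT
       \# => DIGIT
-->
<fieldType name="text_tags" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- with @ typed as DIGIT, @foo becomes the adjacent terms "@" and "foo";
         same for #foo -->
    <filter class="solr.WordDelimiterFilterFactory"
            types="wdftypes.txt"
            splitOnNumerics="1"
            generateWordParts="1"
            generateNumberParts="1"/>
  </analyzer>
</fieldType>
```

The caveat from the thread still applies: a name or tag that genuinely starts with a digit (e.g. "3foo") will also be split, which may not be what you want.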
Hello,
I got an index corruption in production, and was wondering if it might be a
known bug (still on Lucene 3.1), or whether my code is doing something wrong.
It's a local-disk index, with no known machine power loss. This isn't
supposed to even happen, right?
The index that got corrupted is updated every 30 seconds;
Currently I'm using StandardTokenizerFactory, which tokenizes words
based on spaces. For Toy Story it will create the tokens toy and story.
Ideally, I would want to extend the functionality of StandardTokenizerFactory to
create the tokens toy, story, and toy story. How do I do that?
How would you expect to recognize that 'Toy Story' is a thing?
On Tue, Nov 5, 2013 at 6:32 PM, Kevin <glidekensing...@gmail.com> wrote:
Currently I'm using StandardTokenizerFactory, which tokenizes words
based on spaces. For Toy Story it will create the tokens toy and story.
Ideally, I would
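For what it's worth, one common way to get toy, story, and the adjacent pair "toy story" as tokens, without extending the tokenizer, is a shingle filter. A sketch (field type name is illustrative):

```xml
<fieldType name="text_shingle" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- outputUnigrams keeps "toy" and "story"; the 2-shingle adds "toy story" -->
    <filter class="solr.ShingleFilterFactory"
            minShingleSize="2" maxShingleSize="2"
            outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

Note this emits every adjacent word pair, not just known titles, which is exactly the point of the question above about how 'Toy Story' would be recognized as a thing; it also grows the index noticeably.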