Spell checking and keyword tokenizer

2010-09-14 Thread Glen Stampoultzis
Hi, I'm trying to spell check a whole field using a lowercasing keyword tokenizer [1]. For example, if I query for "furntree gully" I'm hoping to get back "ferntree gully" as a suggestion. Unfortunately, the spell checker seems to be recognizing this as two tokens and returning suggestions for both.

Re: Spell checking and keyword tokenizer

2010-09-14 Thread Glen Stampoultzis
Never mind this one... With a bit more research I discovered I can use spellcheck.q to provide the correct suggestion. On 14 September 2010 16:02, Glen Stampoultzis gst...@gmail.com wrote: Hi, I'm trying to spell check a whole field using a lowercasing keyword tokenizer [1]. for example if I
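A minimal sketch of what such a request could look like, assuming a Solr instance at localhost:8983 with the spellcheck component enabled on the /select handler (host, handler path and class name are assumptions, not taken from the thread). The only parameters used are q, spellcheck and spellcheck.q; the idea is that the whole phrase is handed to the spell checker through spellcheck.q instead of being derived from q.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a select URL that passes the whole phrase
// to the spell checker via spellcheck.q.
class SpellcheckQueryUrl {

    // URL-encodes a single parameter value as UTF-8 (Java 10+ overload).
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // baseUrl is assumed to point at a handler with spellchecking configured.
    static String build(String baseUrl, String query, String spellcheckQuery) {
        return baseUrl
                + "?q=" + enc(query)
                + "&spellcheck=true"
                + "&spellcheck.q=" + enc(spellcheckQuery);
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983/solr/select",
                "furntree gully", "furntree gully"));
    }
}
```

Whether the suggestion comes back as a single phrase still depends on the analyzer of the field the spell checker is built from, so treat this as a starting point only.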

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-14 Thread Peter Karich
Hi Peter, this scenario would be really great for us. I didn't know that this is possible and works, so: thanks! At the moment we are doing something similar by replicating to the read-only instance, but the replication is somewhat lengthy and resource-intensive at this data volume ;-) Regards, Peter.

A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted the relevant part of schema.xml at the bottom of this email). *1. Steps to reproduce:* 1.1 The indexed sample document contains only one sentence: This is a TechNote. 1.2 Query is:

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread Erick Erickson
Really well done problem statement by the way On Tue, Sep 14, 2010 at 5:40 AM, yandong yao yydz...@gmail.com wrote: Hi Guys, I encountered a problem when enabling WordDelimiterFilterFactory for both index and query (pasted relative part of schema.xml at the bottom of email). *1. Steps

order of analyzers, tokenizers and filters

2010-09-14 Thread Markus.Rietzler
Hi, this is the second time I have stumbled across some strange behaviour. In my schema.xml I have defined: <fieldType name="textspell" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <!-- sg324 incl. HTMLStrip... --> <charFilter

Re: order of analyzers, tokenizers and filters

2010-09-14 Thread Rafał Kuć
Hello! The tokenizer is executed before the filters, because the tokenizer generates the tokens and then the filters operate on them. Hi, this is the second time I have stumbled across some strange behaviour. In my schema.xml I have defined: <fieldType name="textspell" class="solr.TextField"

How to install DuplicatesDetectorService

2010-09-14 Thread hellboy
I found http://www.jarvana.com/jarvana/browse/org/ow2/weblab/service/solr-duplicates-detector/2.0/ Does anybody know how to install and use this lib on an existing Solr instance?

Re: How to install DuplicatesDetectorService

2010-09-14 Thread Erick Erickson
Why do you want to? Perhaps there's a better solution for your underlying problem if you'd explain what it is... Best Erick On Tue, Sep 14, 2010 at 8:05 AM, hellboy pbon...@googlemail.com wrote: I found

RE: order of analyzers, tokenizers and filters

2010-09-14 Thread Jonathan Rochkind
CharFilters go before Tokenizers which go before (token) Filters. Token filters (called just filter in the config) operate on tokens, so need to go after the tokenizer. WhitespaceTokenizer is a tokenizer. PatternReplaceFilterFactory is a token filter. What you probably want instead is
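The ordering described above (char filters, then the tokenizer, then token filters) can be illustrated with a plain-Java imitation. This is a sketch only: it does not use the Lucene or Solr analysis API, and the class name, the tag-stripping regex and the lowercasing stage are stand-ins chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java imitation of an analysis chain; not the Lucene API.
class AnalysisOrderSketch {

    // Stage 1, char filter: rewrites raw characters before any token exists.
    // Here: replace anything that looks like a markup tag with a space.
    static String charFilter(String raw) {
        return raw.replaceAll("<[^>]*>", " ");
    }

    // Stage 2, tokenizer: turns the character stream into tokens.
    // Here: split on runs of whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stage 3, token filter: sees one token at a time, never the raw text.
    // Here: lowercase each token.
    static List<String> tokenFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    static List<String> analyze(String raw) {
        return tokenFilter(tokenize(charFilter(raw)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<b>Ferntree</b> Gully"));
    }
}
```

Because a token filter only ever receives individual tokens, a pattern that has to match across whitespace needs to run in the char filter stage, before tokenization.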

possible bug in zookeeper / solrCloud ?

2010-09-14 Thread Yatir Ben Shlomo
Hi, I am using SolrCloud, which uses an ensemble of 3 ZooKeeper instances. I am performing survivability tests: after taking one of the ZooKeeper instances down, I would expect the client to use a different ZooKeeper server instance. But as you can see in the attached logs below, depending on which

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
Hi Robert, I am using solr 1.4, will try with 1.4.1 tomorrow. Thanks very much! Regards, Yandong Yao 2010/9/14 Robert Muir rcm...@gmail.com did you index with solr 1.4 (or are you using solr 1.4) ? at a quick glance, it looks like it might be this:

Re: Swapping cores with SolrJ

2010-09-14 Thread MitchK
Hi Shaun, I think it would be easier to fix this problem if we had more information about what is going on in your application. Could you please provide the CoreAdminResponse returned by car.process() for us? Kind regards, - Mitch

Re: Swapping cores with SolrJ

2010-09-14 Thread Shaun Campbell
Hi Mitch, thanks for responding. I'm not actually sure what you wanted from CoreAdminResponse, but I put the following in: CoreAdminRequest car = new CoreAdminRequest(); car.setCoreName("live"); car.setOtherCoreName("rebuild");

Returning max value of fields within documents

2010-09-14 Thread Kura
Hey guys, is there a way of doing the following: we want to get the highest value from a list of multiple fields within a document. Example below: max(field1,field2,field3,field4) The values are as follows: field1 = 100, field2 = 300, field3 = 250, field4 = not indexed in document (null). The

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Jonathan Rochkind
Shawn Heisey wrote: The one called PatternReplaceFilterFactory (no Char) has been around forever. It is not mentioned on the Wiki page about analyzers. The one called PatternReplaceCharFilterFactory is only available from svn. This seems to be true, which I hadn't realized either. The

Re: Returning max value of fields within documents

2010-09-14 Thread Jonathan Rochkind
The stats component will give you the maximum value within one field: http://wiki.apache.org/solr/StatsComponent You're going to have to compute the max amongst several fields client-side, having StatsComponent return the max for each field, and then just max-ing them client side. Not hard.

Re: Returning max value of fields within documents

2010-09-14 Thread Jonathan Rochkind
Oh wait, I misunderstood, you want just the highest value _for one document_, from stored fields, given for each document? StatsComponent won't help you there. Either do it client side, or do it at index time in a single stored field, that's it. Maybe there's some confusing way to use a
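The client-side option can be sketched as follows, assuming each returned document has already been read into a map from field name to integer value (the map representation and the class name are assumptions; a real client would read the values from its Solr response object). Fields that are absent from a document are skipped, which matches the field4 case in the question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical client-side helper: highest value among several fields
// of a single document.
class MaxFieldValue {

    // Returns the largest value among the named fields, ignoring fields
    // the document does not contain; empty if none of them is present.
    static OptionalInt maxOf(Map<String, Integer> doc, List<String> fields) {
        return fields.stream()
                .map(doc::get)
                .filter(v -> v != null)
                .mapToInt(Integer::intValue)
                .max();
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = new HashMap<>();
        doc.put("field1", 100);
        doc.put("field2", 300);
        doc.put("field3", 250);
        // field4 is deliberately missing, as in the question.
        System.out.println(
                maxOf(doc, Arrays.asList("field1", "field2", "field3", "field4")));
    }
}
```

The same function could be run at index time to fill a single stored field, which is the other option mentioned above.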

Solr 1.4.1 and field collapsing

2010-09-14 Thread Moazzam Khan
Hey guys, Has anyone successfully compiled and used Field Collapsing patch (236) with Solr 1.4.1? I keep getting this exception when I search: null java.lang.NullPointerException at

LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
Hi, I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create tokens, based solely on lower-casing characters. Is there a way to tell it NOT to drop non-characters? It's amazingly frustrating that the TokenizerFactory and the FilterFactory have two entirely different modes of

Solr Rolling Log Files

2010-09-14 Thread Vladimir Sutskever
Can SOLR be configured out of the box to handle rolling log files? Kind regards, Vladimir Sutskever

Geographic clustering

2010-09-14 Thread Charlie DeTar
Hi, I'm interested in using geographic clustering of records in a Solr search index. Specifically, I want to be able to efficiently produce a map with clustered bubbles that represent the number of documents that are indexed with points in that general area. I'd like to combine this with other

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Robert Muir
On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea sc...@aitrus.org wrote: Hi, I'm tweaking my schema and the LowerCaseTokenizerFactory doesn't create tokens, based solely on lower-casing characters. Is there a way to tell it NOT to drop non-characters? It's amazingly frustrating that the

Facet Field Value truncation

2010-09-14 Thread Niall O'Connor
Hi, Has anyone come across a situation where they have seen their facet field values wrap into a new facet entry when the value exceeds 256 characters? For example: lst name=facet_fields lst name=pub_articletitle int name=12302/int int name=hiv1403/int int name=type1382/int /lst lst

Re: Facet Field Value truncation

2010-09-14 Thread Jonathan Rochkind
Faceting on a multi-value field? I wonder if your positionIncrementGap for your field definition in your schema is 256. I am not sure what it defaults to. But it seems possible if it's 256 it could lead to what you observed. Try explicitly defining it to be really really big maybe? I'm not

Re: Facet Field Value truncation

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor ocon...@jimmy.harvard.edu wrote: Has anyone come across a situation where they have seen their facet field values wrap into a new facet entry when the value exceeds 256 characters? Yes, for indexed string fields, there currently is a limit of 256

RE: Field names

2010-09-14 Thread Peter A. Kirk
From: Simon Willnauer [simon.willna...@googlemail.com] Sent: Tuesday, 14 September 2010 17:47 To: solr-user@lucene.apache.org Subject: Re: Field names On Tue, Sep 14, 2010 at 1:39 AM, Peter A. Kirk p...@alpha-solutions.dk wrote: <result name="response" numFound="9" start="0"> <doc> So it only finds 9?

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Erick Erickson
Hmmm, were you logged in on the Wiki? If not, you can create a login pretty easily... Or someone might pick it up.. Erick On Tue, Sep 14, 2010 at 12:18 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Shawn Heisey wrote: The one called PatternReplaceFilterFactory (no Char) has been around

Re: Solr Rolling Log Files

2010-09-14 Thread Erick Erickson
What does "handle" mean? Create them or index them? Erick On Tue, Sep 14, 2010 at 2:02 PM, Vladimir Sutskever vladimir.sutske...@jpmorgan.com wrote: Can SOLR be configured out of the box to handle rolling log files?

Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Yonik Seeley
On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com h00kpub...@googlemail.com wrote: SEVERE: org.apache.solr.common.SolrException: Error while creating field 'metadata_last_modified{type=date,properties=indexed,stored,omitNorms}' from value '2010-09-14T22:29:24+0200' Different timezones are

Re: Facet Field Value truncation

2010-09-14 Thread Niall O'Connor
I opened a bug for this issue: https://issues.apache.org/jira/browse/SOLR-2120 On 09/14/2010 03:51 PM, Yonik Seeley wrote: On Tue, Sep 14, 2010 at 3:35 PM, Niall O'Connor ocon...@jimmy.harvard.edu wrote: Has anyone come across a situation where they have seen their facet field values wrap

RE: Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Markus Jelsma
It would be a nice feature if Solr supported time-zone-aware queries on an index where all times are UTC. There is some chatter about this in SOLR-750, but I haven't found an issue that would add support for time zone queries. Did I do a lousy search, or is the issue still missing?

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
I went for a different route: https://issues.apache.org/jira/browse/LUCENE-2644 Scott On Tue, Sep 14, 2010 at 11:18 AM, Robert Muir rcm...@gmail.com wrote: On Tue, Sep 14, 2010 at 1:54 PM, Scott Gonyea sc...@aitrus.org wrote: Hi, I'm tweaking my schema and the LowerCaseTokenizerFactory

Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field

2010-09-14 Thread Erick Erickson
If you're using Java's SimpleDateFormat, try enclosing your Z in the format string with single quotes, like: SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'"); HTH Erick On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com h00kpub...@googlemail.com wrote: hi... i am using
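A minimal sketch of that suggestion, with one addition that is not in the message: a quoted 'Z' is written out literally and does not convert anything, so the formatter is also switched to UTC. Otherwise a local time would be emitted with a Z suffix it does not deserve. The class and method names are hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: formats a java.util.Date as a UTC timestamp
// ending in a literal Z, e.g. 2010-09-14T20:29:24Z.
class SolrDateFormatSketch {

    static String toSolrDate(Date date) {
        // 'T' and 'Z' are quoted, so they are copied verbatim to the output.
        SimpleDateFormat sdf =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // Without this line the JVM's default time zone would be used.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(date);
    }

    public static void main(String[] args) {
        System.out.println(toSolrDate(new Date()));
    }
}
```

SimpleDateFormat is not thread-safe, which is why the sketch creates a new instance per call instead of sharing a static one.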

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Jonathan Rochkind
Why would you want to do that, instead of just using another tokenizer and a lowercasefilter? It's more confusing less DRY code to leave them separate -- the LowerCaseTokenizerFactory combines anyway because someone decided it was such a common use case that it was worth it for the

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Robert Muir
Jonathan, you bring up an excellent point. I think it's worth our time to actually benchmark this LowerCaseTokenizer versus LetterTokenizer + LowerCaseFilter. This tokenizer is quite old, and although I can understand there is no doubt its technically faster than LetterTokenizer + LowerCaseFilter

Shingle filter factory and the min shingles

2010-09-14 Thread Jason Rutherglen
<fieldType name="text_shingle4" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer class="solr.HTMLStripWhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/> <filter class="solr.ShingleFilterFactory"

wildcard searches not consistent

2010-09-14 Thread Rico Lelina
Hi, Still working on extending my proof of concept by working off the example configuration and modifying the schema.xml. Having trouble with wildcard searches: factory OR faction -- 40 results (ok) factory -- 1 result (ok) faction -- 39 results (ok) facti?n -- 39 results (ok) fact* -- 40

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
I'd agree with your point entirely. My attacking LowerCaseTokenizer was a result of not wanting to create yet more Classes. That said, rightfully dumping LowerCaseTokenizer would probably have me creating my own Tokenizer. I could very well be thinking about this wrong... But what if I wanted

Re: Null pointer exception when mixing highlighter shards q.alt

2010-09-14 Thread Chris Hostetter
I didn't see any open Jira issues for this, so i created one... https://issues.apache.org/jira/browse/SOLR-2121 : Date: Tue, 7 Sep 2010 01:35:39 -0700 (PDT) : From: Marc Sturlese marc.sturl...@gmail.com : Reply-To: solr-user@lucene.apache.org : To: solr-user@lucene.apache.org : Subject: Re:

Re: PatternReplaceCharFilterFactory?

2010-09-14 Thread Erick Erickson
K, just making sure. Erick On Tue, Sep 14, 2010 at 5:20 PM, Jonathan Rochkind rochk...@jhu.edu wrote: Erick Erickson wrote: Hmmm, were you logged in on the Wiki? If not, you can create a login pretty easily... Or someone might pick it up.. I was logged in, created an account just

Re: wildcard searches not consistent

2010-09-14 Thread Rico Lelina
That was it! Thank you very much. - Original Message From: Robert Muir rcm...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, September 14, 2010 5:58:03 PM Subject: Re: wildcard searches not consistent but facto?y -- 0 (expecting 1) you have stemming enabled for the field?

Re: Geographic clustering

2010-09-14 Thread Dennis Gearon
You are probably not talking about clusters in the physical structure of data on this disk, right? What do YOU mean by clusters if not? Dennis Gearon

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Scott Gonyea
There's a lot of reasons, with the performance hit being notable--but also because I feel that using a regex on something this basic amounts to a lazy hack. I'm typically against regular expressions in XML. I'm vehemently opposed to them in cases where not using them should otherwise be quite

Re: LowerCaseTokenizerFactory - Tokenizer Options? Why does it behave this way?

2010-09-14 Thread Jonathan Rochkind
Because (just IMO, I'm not an expert here either) the basic framework in Solr is that tokenizers tokenize, but they don't generally change bytes inside values. What changes bytes (or adds or removes tokens to the token stream initially created by a tokenizer, etc) is filters. And there's

Re: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out : SingleInstanceLock: write.lock

2010-09-14 Thread Bharat Jain
Thanks Mark for taking the time to reply. What else could cause this issue to happen so frequently? We have a master/slave configuration and only one update server that writes to the index. We have plenty of disk space available. Thanks Bharat Jain On Fri, Sep 10, 2010 at 8:19 AM, Mark Miller

about SolrCloud

2010-09-14 Thread 郭芸
Dear All: I am studying SolrCloud now. I downloaded it from: https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/ but I found that there is no webapps directory: https://svn.apache.org/repos/asf/lucene/solr/branches/cloud/example/webapps/ but we need

Re: A question on WordDelimiterFilterFactory

2010-09-14 Thread yandong yao
After upgrading to 1.4.1, it is fixed. Thanks very much for your help! Regards, Yandong Yao 2010/9/14 yandong yao yydz...@gmail.com Hi Robert, I am using solr 1.4, will try with 1.4.1 tomorrow. Thanks very much! Regards, Yandong Yao 2010/9/14 Robert Muir rcm...@gmail.com did you

LukeRequestHandler numTerms

2010-09-14 Thread Peter A. Kirk
Hi, when using LukeRequestHandler, I can for example call: http://localhost:8983/solr/admin/luke?fl=name&fl=cat which will return data including the frequency of the top 10 search terms in the specified fields. I can also add a numTerms parameter to obtain more than the top 10. But how do I

Re: Geographic clustering

2010-09-14 Thread Dennis Gearon
So, basically, faceting geographically? Within 100 meters, within 300 meters, within 1km, within 3km, within 10km, within 100km: this type of result? Dennis Gearon