Re: Spell checking and keyword tokenizer

2010-09-13 Thread Glen Stampoultzis
Never mind this one... With a bit more research I discovered I can use spellcheck.q to provide the correct suggestion. On 14 September 2010 16:02, Glen Stampoultzis wrote: > Hi, > > I'm trying to spell check a whole field using a lowercasing keyword > tokenizer [1]. > > For example, if I query for
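
The fix described in this reply, passing the phrase through spellcheck.q, can be sketched as a request URL. This is only an illustration: the host, the /solr/select handler path, and the helper name build_spellcheck_url are assumptions, not taken from the thread, and the handler is assumed to have the spellcheck component enabled. The phrase "furntree gully" is the example from the original question.

```python
from urllib.parse import urlencode


def build_spellcheck_url(base_url, phrase):
    """Build a Solr request URL that sends the phrase via spellcheck.q.

    base_url is a hypothetical handler URL; adjust it to the real deployment.
    """
    params = {
        "q": phrase,             # the normal search query
        "spellcheck": "true",    # turn the spellcheck component on
        "spellcheck.q": phrase,  # text handed to the spellchecker instead of q
    }
    return base_url + "?" + urlencode(params)


url = build_spellcheck_url("http://localhost:8983/solr/select", "furntree gully")
print(url)
```

How the phrase is tokenized on the spellchecker side still depends on the analyzer configured for that spellchecker, so this only shows where the parameter goes.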

Spell checking and keyword tokenizer

2010-09-13 Thread Glen Stampoultzis
Hi, I'm trying to spell check a whole field using a lowercasing keyword tokenizer [1]. For example, if I query for "furntree gully" I'm hoping to get back "ferntree gully" as a suggestion. Unfortunately the spell checker seems to be recognizing this as two tokens and returning suggestions for bot

Re: Field names

2010-09-13 Thread Simon Willnauer
On Tue, Sep 14, 2010 at 1:39 AM, Peter A. Kirk wrote: > Fantastic - that is exactly what I was looking for! > > But here is one thing I don't understand: > > If I call the url: > http://localhost:8983/solr/admin/luke?numTerms=10&fl=name > > Some of the result looks like: > > >   >     >      18

Our SOLR instance seems to be single-threading and therefore not taking advantage of its multi-proc host

2010-09-13 Thread David Crane
We are running SOLR 1.4.1 (Lucene 2.9.3) on a 2-CPU Linux host, but it seems that only 1 CPU is ever being used. It almost seems like something is single-threading inside the SOLR application. The CPU utilization is very seldom over 0.9 even under load. We are running on virtual Linux hosts and o

geographic sharding . . . or not

2010-09-13 Thread Dennis Gearon
Think about THE big one - google. (First, China for this example is avoided because much Chinese data is ILLEGAL to be provided for search outside of China) If there is data generated by people in Europe, in various languages: 1/ Is it stored close to where it is generated? 2/ Are sharding an

Re: Solr and jvm Garbage Collection tuning

2010-09-13 Thread Stephen Green
On Mon, Sep 13, 2010 at 6:45 PM, Burton-West, Tom wrote: > Thanks Kent for your info. > > We are not doing any faceting, sorting, or much else.  My guess is that most > of the memory increase is just the data structures created when parts of the > frq and prx files get read into memory.  Our frq

Re: Distance sorting with spatial filtering

2010-09-13 Thread Scott K
I tracked down the problem and found a workaround. If there is a wildcard entry in schema.xml such as the following. then sort by function fails and returns Error 400 can not sort on unindexed field: Removing the name="*" entry from schema.xml is a workaround. I noted this in the Solr-1

Re: Solr memory use, jmap and TermInfos/tii

2010-09-13 Thread Michael McCandless
On Mon, Sep 13, 2010 at 6:29 PM, Burton-West, Tom wrote: > Thanks Robert and everyone! > > I'm working on changing our JVM settings today, since putting Solr 1.4.1 into > production will take a bit more work and testing.  Hopefully, I'll be able to > test the setTermIndexDivisor on our test serv

RE: Field names

2010-09-13 Thread Peter A. Kirk
Fantastic - that is exactly what I was looking for! But here is one thing I don't understand: If I call the url: http://localhost:8983/solr/admin/luke?numTerms=10&fl=name Some of the result looks like: 18 Does this mean that the term "gb" occurs 18 times in the name field? Be

Re: Field names

2010-09-13 Thread Ryan McKinley
check: http://wiki.apache.org/solr/LukeRequestHandler On Mon, Sep 13, 2010 at 7:00 PM, Peter A. Kirk wrote: > Hi > > is it possible to issue a query to solr, to get a list which contains all the > field names in the index? > > What about to get a list of the frequency of individual words in eac

Re: Need Advice for Finding Freelance Solr Expert

2010-09-13 Thread Chris Hostetter
: References: : <4c881061.60...@jhu.edu> : : In-Reply-To: : : Subject: Need Advice for Finding Freelance Solr Expert http://people.apache.org/~hossman/#threadhijack Thread Hijacking on Mailing Lists When starting a new discussion on a mailing list, please do not reply to an exis

Re: How to extend IndexSchema and SchemaField

2010-09-13 Thread Chris Hostetter
: Yes, I have thought of that, or even extending field type. But this does not : work for my use case, since I can have multiple fields of a same type : (therefore with the same field type, and same analyzer), but each one of them : needs specific information. Therefore, I think the only "nice" wa

Field names

2010-09-13 Thread Peter A. Kirk
Hi, is it possible to issue a query to Solr to get a list which contains all the field names in the index? What about getting a list of the frequency of individual words in each field? thanks, Peter

RE: Solr and jvm Garbage Collection tuning

2010-09-13 Thread Burton-West, Tom
Thanks Kent for your info. We are not doing any faceting, sorting, or much else. My guess is that most of the memory increase is just the data structures created when parts of the frq and prx files get read into memory. Our frq files are about 77GB and the prx files are about 260GB per sha

RE: Solr memory use, jmap and TermInfos/tii

2010-09-13 Thread Burton-West, Tom
Thanks Robert and everyone! I'm working on changing our JVM settings today, since putting Solr 1.4.1 into production will take a bit more work and testing. Hopefully, I'll be able to test the setTermIndexDivisor on our test server tomorrow. Mike, I've started the process to see if we can provi

Re: How to Update Value of One Field of a Document in Index?

2010-09-13 Thread Zachary Chang
Hi Savannah, if you *only want to boost* documents based on the information you calculate from the MoreLikeThis results (i.e. numeric measure), you might want to take a look at the ExternalFileField type. This field type reads its contents from a file which contains key-value pairs, e.g. the

Re: mm=0?

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 8:07 PM, Lance Norskog wrote: > "Java Swing" no longer gives ads for "swinger's clubs". Damned, now I have to explicitly enter it?! - argh! :) simon > > On Mon, Sep 13, 2010 at 9:37 AM, Dennis Gearon wrote: >> I just tried several searches again on google. >> >> I think th

Re: mm=0?

2010-09-13 Thread Lance Norskog
"Java Swing" no longer gives ads for "swinger's clubs". On Mon, Sep 13, 2010 at 9:37 AM, Dennis Gearon wrote: > I just tried several searches again on google. > > I think they've refined the ads placements so that certain kinds of searches > return no ads, the kinds that I've been doing relative

Re: what differents between SolrCloud and Solr+Hadoop

2010-09-13 Thread Lance Norskog
You do not need either addition if you just want to have multiple Solr instances on different machines, and query them all at once. Look at this for the simplest way: http://wiki.apache.org/solr/DistributedSearch On Mon, Sep 13, 2010 at 12:52 AM, Marc Sturlese wrote: > > Well these are pretty di

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Dennis Gearon
Thanks guys for the explanation. Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php --- On Mon, 9/13/10, Simon Willnauer wrote: > From: Simon Willnauer > Subject: Re: Tuning

Re: mm=0?

2010-09-13 Thread Dennis Gearon
I just tried several searches again on Google. I think they've refined the ad placements so that certain kinds of searches return no ads, the kinds that I've been doing relative to programming being one of them. If OTOH I do some product related search, THEN lots of ads show up, but fairly acc

Re: mm=0?

2010-09-13 Thread Dennis Gearon
This issue is one I hope to head off in my application / on my site. Instead of an ad feed, I HOPE to be able to have an ad QUEUE on my site. If necessary, I'll convert the feed TO a queue. The queue will get a first pass done on it by either an employee or a compensated user. Either one genera

Re: Multiple sorting on text fields

2010-09-13 Thread Dennis Gearon
I thought I saw 'custom analyzer', but you wrote 'custom field'. My mistake. Dennis Gearon --- On Mon, 9/13/10, Stanislaw wrote: > From:

Re: Sorting not working on a string field

2010-09-13 Thread noel
You're right, it would be better to just give it a sortable numerical value. For now I gave time_code an sdouble type to see if it sorted, and it did. However, all the 0's are trimmed, but that shouldn't be a problem unless it were to truncate any values past the hundreds column. Thanks. - Noel

Re: mm=0?

2010-09-13 Thread Satish Kumar
Hi Erik, I completely agree with you that showing a random document for a user's query would be a very poor experience. I have raised this in our product review meetings before. I was told that because of a contractual agreement some sponsored content needs to be returned even if it meant no match. And

Re: stopwords in AND clauses

2010-09-13 Thread Xavier Noria
On Mon, Sep 13, 2010 at 4:29 PM, Simon Willnauer wrote: > On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria wrote: >> Let's suppose we have a regular search field body_t, and an internal >> boolean flag flag_t not exposed to the user. >> >> I'd like >> >>    body_t:foo AND flag_t:true > > this is so

Re: stopwords in AND clauses

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 3:27 PM, Xavier Noria wrote: > Let's suppose we have a regular search field body_t, and an internal > boolean flag flag_t not exposed to the user. > > I'd like > >    body_t:foo AND flag_t:true This is Solr, right? Why don't you use a filter query for your unexposed flag_t fiel
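
The filter-query suggestion in this reply can be sketched as follows: the user-facing text stays in q, and the internal flag moves to Solr's fq parameter so it is applied as a separate restriction. The host, handler path, and the helper name build_filtered_query_url are assumptions made for illustration; only the field names body_t and flag_t come from the thread.

```python
from urllib.parse import urlencode


def build_filtered_query_url(base_url, user_query, hidden_filter):
    """Put the user query in q and the internal restriction in fq.

    base_url is a hypothetical handler URL, not one named in the thread.
    """
    params = [
        ("q", user_query),      # what the user typed, e.g. body_t:foo
        ("fq", hidden_filter),  # internal restriction, applied independently of q
    ]
    return base_url + "?" + urlencode(params)


url = build_filtered_query_url(
    "http://localhost:8983/solr/select", "body_t:foo", "flag_t:true"
)
print(url)
```

Whether a q that consists only of stopwords then matches nothing or everything depends on the query parser in use, so this sketch only shows how to keep the flag restriction separate from the user text.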

stopwords in AND clauses

2010-09-13 Thread Xavier Noria
Let's suppose we have a regular search field body_t, and an internal boolean flag flag_t not exposed to the user. I'd like body_t:foo AND flag_t:true to be an intersection, but if "foo" is a stopword I get all documents for which flag_t is true, as if the first clause was dropped, or if techn

Re: Multiple sorting on text fields

2010-09-13 Thread Erick Erickson
A couple of things come to mind: 1> what happens if you remove the sort clauses? Because I suspect they're irrelevant and your duplicate issue is something different. 2> SOLR admin should let you determine this. 3> Please show us the configurations that make you sure that the documen

Re: Solr CoreAdmin create ignores dataDir Parameter

2010-09-13 Thread Frank Wesemann
MitchK wrote: Frank, have a look at SOLR-646. Do you think a workaround for the data-dir-tag in the solrconfig.xml can help? I think about something like ${solr./data/corename} for illustration. Unfortunately I am not very skilled in working with solr's variables and therefore I do not know

Re: Multiple sorting on text fields

2010-09-13 Thread Stanislaw
Hi Dennis, thanks for the reply. Please explain which filter you mean. I'm searching only on one field with names: query.setQuery(suchstring); then I'm adding two sorts on other fields: query.addSortField("type", SolrQuery.ORDER.asc); query.addSortField("sortName", SolrQuery.ORDER.asc); th

Re: mm=0?

2010-09-13 Thread Jan Høydahl / Cominvent
As Erick points out, you don't want a random doc as response! What you're looking at is how to avoid the "0 hits" problem. You could look into one of these: * Introduce autosuggest to avoid many 0-hits cases * Introduce spellchecking * Re-run the failed query with fuzzy turned on (e.g. alpha~) * Re

Re: Sorting not working on a string field

2010-09-13 Thread Jan Høydahl / Cominvent
Hi, Could you show us what result you actually get? Wouldn't it make more sense to choose a numeric fieldtype? To get proper sort order of numbers in a string field, all numbers need to be exactly the same length since the order will be lexicographical, i.e. "10" will come before "2", but after "02". -- J
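
The ordering described in this reply is easy to check outside Solr. The snippet below is plain Python, not Solr code, and the sample values are made up for illustration; it only shows why string comparison puts "10" before "2" and how zero-padding to a fixed width restores numeric order.

```python
# Plain string sort compares character by character, so "10" < "2".
as_strings = sorted(["10", "2", "02"])
print(as_strings)  # ['02', '10', '2']

# Sorting by numeric value gives the order a numeric field type would give.
numbers = ["10", "2", "7"]
as_numbers = sorted(numbers, key=int)
print(as_numbers)  # ['2', '7', '10']

# Zero-padding every value to the same width makes string order
# agree with numeric order.
padded = sorted(value.zfill(3) for value in numbers)
print(padded)  # ['002', '007', '010']
```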

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Simon Willnauer
On Mon, Sep 13, 2010 at 8:02 AM, Dennis Gearon wrote: > BTW, what is a segment? On the Lucene level an index is composed of one or more index segments. Each segment is an index by itself and consists of several files like doc stores, proximity data, term dictionaries etc. During indexing Lucene /

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Peter Sturge
Hi Dennis, These are the Lucene file segments that hold the index data on the file system. Have a look at: http://wiki.apache.org/solr/SolrPerformanceFactors Peter On Mon, Sep 13, 2010 at 7:02 AM, Dennis Gearon wrote: > BTW, what is a segment? > > I've only heard about them in the last 2 weeks

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Peter Sturge
Hi Erik, I thought this would be good for the wiki, but I've not submitted to the wiki before, so I thought I'd put this info out there first, then add it if it was deemed useful. If you could let me know the procedure for submitting, it probably would be worth getting it into the wiki (couldn't d

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Peter Sturge
1. You can run multiple Solr instances in separate JVMs, with both having their solr.xml configured to use the same index folder. You need to be careful that one and only one of these instances will ever update the index at a time. The best way to ensure this is to use one for writing only, and the

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-13 Thread Peter Sturge
The balanced segment merging is a really cool idea. I'll definitely have a look at this, thanks! One thing I forgot to mention in the original post is we use a mergeFactor of 25. Somewhat on the high side, so that incoming commits aren't trying to merge new data into large segments. 25 is a good b

Re: Multiple sorting on text fields

2010-09-13 Thread Dennis Gearon
My guess is two things are happening: 1/ Your combination of filters is in parallel, or an OR expression. I think this is likely; see the next point. 2/ To get 3 duplicate results, your custom filter AND the OR expression above have to be working together, or it's possible that your customer f

Re: what differents between SolrCloud and Solr+Hadoop

2010-09-13 Thread Marc Sturlese
Well, these are pretty different things. SolrCloud is meant to handle distributed search in an easier way than "raw" Solr distributed search. You have to build the shards in your own way. Solr+Hadoop is a way to build these shards/indexes in parallel. -- View this message in context: http://luc

Multiple sorting on text fields

2010-09-13 Thread Stanislaw
Hi all! I found some strange behavior in Solr. If I sort by 2 text fields in a chain, I receive some results doubled. Both text fields are not multivalued; one of them is a string, the other a custom type based on a text field and keyword analyzer. I do this: *CommonsHttpSolrServer