Re: Multiple servers support

2011-09-26 Thread Raja Ghulam Rasool
Erick, Many thanks for your suggestions and pointers. I am proceeding with my study and looking forward to doing a POC with Solr. Thanks again. On Sun, Sep 25, 2011 at 7:40 PM, Erick Erickson erickerick...@gmail.com wrote: Well, this is not a neutral forum <G>... A common use-case for Solr is

Unique Key error on trunk

2011-09-26 Thread Viswa S
Hello, We use solr.UUIDField to generate unique ids. Using the latest trunk (change list 1163767) seems to throw the error "Document is missing mandatory uniqueKey field: id". The schema is set up to generate an id field on updates: <field name="id" type="uuid" indexed="true" stored="true" default="NEW" />

error while replication

2011-09-26 Thread shinkanze
Hi, I am replicating Solr and getting this error. I am unable to make out the cause, so please kindly help. 26 Sep, 2011 8:00:14 AM org.slf4j.impl.JDK14LoggerAdapter fillCallerData SEVERE: Error during auto-warming of

multiple dateranges/timeslots per doc: modeling openinghours.

2011-09-26 Thread britske
Sorry for the somewhat lengthy post. I would like to make clear that I covered my bases here and am looking for an alternative solution, because the more trivial solutions don't seem to work for my use-case. Consider bars, museums, etc. These places have multiple opening hours that can depend on:

Solr stopword problem in Query

2011-09-26 Thread Isan Fulia
Hi all, I have a text field named *textForQuery*. The following content has been indexed into Solr in field textForQuery: *Coke Studio at MTV*. When I fired the query *textForQuery:(coke studio at mtv)*, the results showed 0 documents. After running the same query in debugMode I got the following

AW: How to map database table for facted search?

2011-09-26 Thread Chorherr Nikolaus
Thx for your response, we will try dynamic fields for this. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, 24 September 2011 21:33 To: solr-user@lucene.apache.org Subject: Re: How to map database table for facted search? In general, you

Re: Seek your wisdom for implementing 12 million docs..

2011-09-26 Thread Toke Eskildsen
On Sun, 2011-09-25 at 22:00 +0200, Ikhsvaku S wrote: Documents: We have close to ~12 million XML docs of varying sizes, average size 20 KB. These documents have 150 fields, which should be searchable and indexed. [...] Approximately ~6000 such documents are updated and 400-800 new ones are added

SOLR Index Speed

2011-09-26 Thread Lord Khan Han
Hi, We have 500K web documents and are using Solr (trunk) to index them. We have a special analyzer which is a little bit heavy on CPU. Our machine config: 32 x CPU, 32 GB RAM, SAS HD. We are sending documents with 16 reduce clients (from Hadoop) to the standalone Solr server. The problem is we couldn't get

Re: NRT and commit behavior

2011-09-26 Thread Vadim Kisselmann
Tirthankar, are you indexing 1. smaller docs or 2. books? If 1, your caches are too big for your memory, as Erick already said. Try to allocate 10GB for the JVM, leave 14GB for your HDD cache and make your caches smaller. If 2, read the blog posts on hathitrust.com.

RE: Best Solr escaping?

2011-09-26 Thread Bob Sandiford
I won't guarantee this is the 'best algorithm', but here's what we use. (This is in a final class with only static helper methods): // Set of characters / Strings SOLR treats as having special meaning in a query, and the corresponding Escaped versions. // Note that the actual operators
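The snippet above is cut off before the helper itself, so the following is only a minimal Python sketch of the general technique (backslash-escaping characters the query parser treats as special), not the poster's Java class. The character list is an assumption based on the classic Lucene query syntax, and the name escape_solr_query is hypothetical. Escaping every & and | individually is a simplification of handling the two-character operators && and ||.

```python
# Characters assumed to be special in the classic Lucene/Solr query syntax.
# This list is an assumption for illustration; check it against the parser
# and version actually in use.
SPECIAL_CHARS = set('+-&|!(){}[]^"~*?:\\')


def escape_solr_query(text: str) -> str:
    """Prefix each special character in a user-supplied term with a backslash."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in text)


if __name__ == "__main__":
    print(escape_solr_query("title:(coke + mtv)"))
```

Note that this escapes operators the user may have typed on purpose, so it suits raw user input that should be matched literally rather than hand-written queries.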

Re: email - DIH

2011-09-26 Thread jb
Hi Alonso, Gora, I ran into the same problem with the MailEntityProcessor. I have an email folder called Test. Inside there are only two messages. When I run the DIH everything looks fine, except that the two emails don't get indexed. Is there any additional information on this problem? I'm

Re: mlt content stream help

2011-09-26 Thread dan whelan
On 9/24/11 12:17 PM, Erick Erickson wrote: What version of Solr? I am using solr 3.2 When you copied the default, did you set up default values for MLT? This is what I need help with. How should the request handler / solrconfig be setup? Showing us the request you used The request is

Re: Update ingest rate drops suddenly

2011-09-26 Thread eks dev
Just to bring closure on this one, we were slurping data from the wrong DB (hardly desktop class machine)... Solr did not cough on 41Mio records @34k updates / sec., single threaded. Great! On Sat, Sep 24, 2011 at 9:18 PM, eks dev eks...@yahoo.co.uk wrote: just looking for hints where to

Re: Solr stopword problem in Query

2011-09-26 Thread Bill Bell
This is a pretty serious issue. Bill Bell Sent from mobile On Sep 26, 2011, at 4:09 AM, Isan Fulia isan.fu...@germinait.com wrote: Hi all, I have a text field named* textForQuery* . Following content has been indexed into solr in field textForQuery *Coke Studio at MTV* when i fired the

Re: mlt content stream help

2011-09-26 Thread Erick Erickson
Please don't say it's just like the example. If it was, then it would most likely be working. If you don't take the time to show us what you've tried, and the results you get back, then there's not much we can do to help. Best Erick On Mon, Sep 26, 2011 at 7:18 AM, dan whelan d...@adicio.com

drastic performance decrease with 20 cores

2011-09-26 Thread Bictor Man
Hi everyone, Sorry if this issue has been discussed before, but I'm new to the list. I have a solr (3.4) instance running with 20 cores (around 4 million docs each). The instance has allocated 13GB in a 16GB RAM server. If I run several sets of queries sequentially in each of the cores, the I/O

Solr Cloud Number of Shard Limitation?

2011-09-26 Thread Jamie Johnson
Is there any limitation, be it technical or for sanity reasons, on the number of shards that can be part of a solr cloud implementation?

Re: mlt content stream help

2011-09-26 Thread dan whelan
OK. This is exactly what I did. With a fresh download of Solr 3.2, unpack and go to the example directory, start Solr: java -jar start.jar then go to exampledocs and run: ./post.sh *xml Then go here:

Re: Solr stopword problem in Query

2011-09-26 Thread Rahul Warawdekar
Hi Isan, Does your search return any documents when you remove the 'at' keyword and just search for Coke studio MTV ? Also, can you please provide the snippet of schema.xml file where you have mentioned this field name and its type description ? On Mon, Sep 26, 2011 at 6:09 AM, Isan Fulia

solr DIH for mongodb

2011-09-26 Thread Kiwi de coder
Hi, do we have any DIH plugin for MongoDB? Regards, kiwi

Re: drastic performance decrease with 20 cores

2011-09-26 Thread Shawn Heisey
On 9/26/2011 9:33 AM, Bictor Man wrote: Hi everyone, Sorry if this issue has been discussed before, but I'm new to the list. I have a solr (3.4) instance running with 20 cores (around 4 million docs each). The instance has allocated 13GB in a 16GB RAM server. If I run several sets of queries

Re: drastic performance decrease with 20 cores

2011-09-26 Thread François Schiettecatte
You have not said how big your index is but I suspect that allocating 13GB for your 20 cores is starving the OS of memory for caching file data. Have you tried 6GB with 20 cores? I suspect you will see the same performance as 6GB 10 cores. Generally it is better to allocate just enough memory

how to implemente a query like like '%pattern%'

2011-09-26 Thread libnova
Hi all. How can we do a query similar to 'like'? If I have this phrase as a single token in the index: "This phrase has various words" (using KeywordTokenizerFactory), and I would like an exact match of "phrase has various" or "various words", for instance... How can I do this? Thanks a lot.

Re: SOLR error with custom FacetComponent

2011-09-26 Thread Chris Hostetter
: : Unfortunately the facet fields are not static. The field are dynamic SOLR : fields and are generated by different applications. : The field names will be populated into a data store (like memcache) and : facets have to be driven from that data store. : : I need to write a Custom

aggregate functions in Solr?

2011-09-26 Thread Esteban Donato
Hello guys, I need to implement a functionality which requires something similar to aggregate functions in SQL. My Solr schema looks like this: -doc_id: integer -date: date -value1: integer -value2: integer Basically the index contains some numerical values (value1, value2, etc) per doc

Re: Unique Key error on trunk

2011-09-26 Thread Viswa S
You can replicate it with the example app by replacing the id definition in schema.xml with <field name="id" type="uuid" indexed="true" stored="true" default="NEW" />, then removing the id field in one of the example doc XML files and posting it to Solr. Thanks Viswa On Sep 26, 2011, at 12:15 AM, Viswa S

Re: mlt content stream help

2011-09-26 Thread Chris Hostetter
Dan: The disconnect here seems to be that the example URLs on the MoreLikeThisHandler wiki page assume a /mlt request handler exists, but no handler by that name has ever actually existed in the Solr example configs. (the wiki page doesn't explicitly state that those URLs will work with

RE: SOLR Index Speed

2011-09-26 Thread Jaeger, Jay - DOT
500 / second would be 1,800,000 per hour (much more than 500K documents). 1) how big is each document? 2) how big are your index files? 3) as others have recently written, make sure you don't give your JRE so much memory that your OS is starved for memory to use for file system cache. JRJ
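The arithmetic in this reply can be checked with a few lines of Python. The 500 documents per second rate and the 500K corpus size come from the thread; the variable names are only illustrative.

```python
# Figures taken from the thread: 500 docs/second and a 500K-document corpus.
docs_per_second = 500
corpus_size = 500_000

seconds_per_hour = 60 * 60
docs_per_hour = docs_per_second * seconds_per_hour

# Time to index the whole corpus at that sustained rate.
seconds_needed = corpus_size / docs_per_second
minutes_needed = seconds_needed / 60

if __name__ == "__main__":
    print(docs_per_hour, seconds_needed, round(minutes_needed, 1))
```

At a sustained 500 documents per second, the 500K corpus would take 1,000 seconds, roughly 17 minutes, which is why the reply goes on to ask about document and index sizes.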

How to reserve ids?

2011-09-26 Thread Gabriele Kahlout
Hello, While indexing there are certain urls/ids I'd never want to appear in the search results (so be indexed). Is there already a 'supported by design' mechanism to do that which you can point me to, or should I just create this blacklist as a processor in the update chain? -- Regards, K. Gabriele

Boost Exact matches on Specific Fields

2011-09-26 Thread balaji
Hi all, I am new to SOLR and have a question on boosting exact terms to the top on a particular field. For example: I have a text field named ts_category and I want to give more boost to this field than to other fields, so in my query I pass the following in the QF params qf=body^4.0

Re: how to implemente a query like like '%pattern%'

2011-09-26 Thread Tomás Fernández Löbbe
If you need those kinds of searches then you should probably not be using the KeywordTokenizerFactory. Is there any reason why you can't switch to a WhitespaceTokenizer, for example? Then you could use a simple phrase query for your search case. If you need everything as a Token, you could use a
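To make the suggestion concrete, here is a rough Python sketch (not Solr code) of what whitespace tokenization plus a phrase match amounts to: the indexed text is split on whitespace, and a phrase matches only when its tokens appear contiguously and in order. The function name phrase_matches is hypothetical, and real analyzers may also lowercase or filter tokens, which this sketch does not do.

```python
def phrase_matches(indexed_text: str, phrase: str) -> bool:
    """Return True if the whitespace tokens of `phrase` occur contiguously,
    in order, within the whitespace tokens of `indexed_text`."""
    tokens = indexed_text.split()
    wanted = phrase.split()
    if not wanted:
        return False
    n = len(wanted)
    # Slide a window of n tokens over the indexed tokens.
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    text = "This phrase has various words"
    print(phrase_matches(text, "phrase has various"))
    print(phrase_matches(text, "has var"))
```

Unlike SQL's LIKE '%pattern%', this matches whole tokens only, so a fragment such as "has var" does not match.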

RE: A fieldType for a address street

2011-09-26 Thread Jaeger, Jay - DOT
We used copyField to copy the address to two fields: 1. Which contains just the first token up to the first whitespace 2. Which copies all of it, but translates to lower case. Then our users can enter either a street number, a street name, or both. We copied all of it to the second field
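As a rough illustration of the two derived values described above, the Python sketch below computes what each target field would hold for a given address. In Solr this would be done by the analyzers on the copyField targets; the field names street_number and address_lower are hypothetical and do not come from the post.

```python
def address_fields(address: str) -> dict:
    """Derive the two values described in the post from one address string:
    the first token up to the first whitespace, and the full address in lower case."""
    stripped = address.strip()
    first_token = stripped.split(None, 1)[0] if stripped else ""
    return {
        "street_number": first_token,       # hypothetical field name
        "address_lower": stripped.lower(),  # hypothetical field name
    }


if __name__ == "__main__":
    print(address_fields("123 Main St"))
```

A user query of "123", "main st", or "123 main st" can then be matched against one field or the other.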

RE: SOLR Index Speed

2011-09-26 Thread Lan
Are you batching the documents before sending them to the solr server? Are you doing a commit only at the end? Also since you have 32 cores, you can try upping the number of concurrent updaters from 16 to 32. Jaeger, Jay - DOT wrote: 500 / second would be 1,800,000 per hour (much more than

Re: how to implemente a query like like '%pattern%'

2011-09-26 Thread Chris Hostetter
: References: : cafwsjvnqkaufwspqrkm4sckb-0gvak-vktkfrnmfwgzwltm...@mail.gmail.com : In-Reply-To: : cafwsjvnqkaufwspqrkm4sckb-0gvak-vktkfrnmfwgzwltm...@mail.gmail.com : Subject: how to implemente a query like like '%pattern%' https://people.apache.org/~hossman/#threadhijack Thread

Re: Unique Key error on trunk

2011-09-26 Thread Chris Hostetter
: Subject: Re: Unique Key error on trunk : : : You can replicate it with the example app by replacing the id definition in schema.xml with : : <field name="id" type="uuid" indexed="true" stored="true" default="NEW" /> Thanks for reporting this Viswa, I've filed a bug to track it...

Searching multiple fields

2011-09-26 Thread Mark
I have a use case where I would like to search across two fields but I do not want to weight a document that has a match in both fields higher than a document that has a match in only 1 field. For example. Document 1 - Field A: Foo Bar - Field B: Foo Baz Document 2 - Field A: Foo Blarg -

Re: How to apply filters to stored data

2011-09-26 Thread Chris Hostetter
: Hi Erick, The problem I am trying to solve is to filter invalid entities. : Users might misspell or enter a new entity name. These new/invalid entities : need to pass through a KeepWordFilter so that they won't pollute our : autocomplete result. How are you doing autocomplete? If you are using

Re: drastic performance decrease with 20 cores

2011-09-26 Thread Bictor Man
Hi guys, thanks for your replies. indeed the filesystem caching seems to be the difference. sadly I can't add more memory and the 6GB/20core combination doesn't work. so I'll just try to tweak it as much as I can. thanks a lot. 2011/9/26 François Schiettecatte fschietteca...@gmail.com You

Re: How to apply filters to stored data

2011-09-26 Thread Jithin
Is UpdateProcessor triggered when updating an existing document or for new documents also? On Tue, Sep 27, 2011 at 6:00 AM, Chris Hostetter-3 [via Lucene] ml-node+s472066n3371110...@n3.nabble.com wrote: : Hi Erick, The problem I am trying to solve is to filter invalid entities. : Users

Re: external file field partial data match in key field

2011-09-26 Thread abhayd
I found the answer to my question: basically it works only with a complete match.

Any plans to support function queries on score?

2011-09-26 Thread Way Cool
Hi, guys, Do you have any plans to support function queries on score field? for example, sort=floor(product(score, 100)+0.5) desc? So far I am getting the following error: undefined field score I can't use subquery in this case because I am trying to use secondary sorting, however I will be
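For readers wondering what the requested sort expression would compute, here is a small Python sketch. floor(product(score, 100) + 0.5) rounds score * 100 to the nearest integer, so documents with nearly equal scores land in the same bucket and a secondary key can break the tie. The sample documents and the use of the name as the secondary key are assumptions for illustration only.

```python
import math


def bucketed_score(score: float) -> int:
    """Python equivalent of floor(product(score, 100) + 0.5)."""
    return math.floor(score * 100 + 0.5)


# Hypothetical (name, score) pairs; not data from the thread.
docs = [("b", 0.1231), ("a", 0.1229), ("c", 0.30)]

# Primary sort: bucketed score, descending. Secondary sort: name, ascending.
ranked = sorted(docs, key=lambda d: (-bucketed_score(d[1]), d[0]))

if __name__ == "__main__":
    print([name for name, _ in ranked])
```

Here "a" and "b" share a bucket even though their raw scores differ slightly, so the secondary key decides their order.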

Re: Searching multiple fields

2011-09-26 Thread Otis Gospodnetic
Hi Mark, Eh, I don't have Lucene/Solr source code handy, but I *think* for that you'd need to write custom Lucene similarity. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Mark

Re: How to reserve ids?

2011-09-26 Thread Otis Gospodnetic
Hi Gabriele, Either the latter option, or just treat them as stop words if you just want to remove those urls/ids from indexed docs (may still get highlighted). Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

Re: Boost Exact matches on Specific Fields

2011-09-26 Thread Way Cool
If I were you, probably I will try defining two fields: 1. ts_category as a string type 2. ts_category1 as a text_en type Make sure copy ts_category to ts_category1. You can use the following as qf in your dismax: qf=body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0 or something like that. YH
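A minimal Python sketch of how the suggested qf value could be passed as request parameters, assuming the dismax parser is selected with defType=dismax. The qf string is taken from the reply above; the function name and the sample query are hypothetical, and no host or handler path is assumed.

```python
from urllib.parse import urlencode, parse_qs

# qf value as suggested in the reply: the string field gets the highest boost.
QF = "body^4.0 title^5.0 ts_category^10.0 ts_category1^5.0"


def build_dismax_query_string(user_query: str) -> str:
    """URL-encode the dismax parameters; '^' and spaces must be escaped in a URL."""
    params = {"defType": "dismax", "q": user_query, "qf": QF}
    return urlencode(params)


if __name__ == "__main__":
    qs = build_dismax_query_string("running shoes")
    print(qs)
    print(parse_qs(qs)["qf"])
```

The intent of the two-field setup is that an exact match on the string field ts_category contributes the larger boost, while the analyzed copy ts_category1 still catches partial matches.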

Re: solr DIH for mongodb

2011-09-26 Thread Otis Gospodnetic
Hi, Here is a 1 month old thread I found on search-lucene -- didn't even have to do a search, I got it as a suggestion from AutoComplete when I started typing the word mongodb :) http://search-lucene.com/m/8AEE31AaTd32 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch

Re: Update ingest rate drops suddenly

2011-09-26 Thread Otis Gospodnetic
Aha!  See, it was the DB after all! ;)  Thanks for following up, I was curious. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: eks dev eks...@yahoo.co.uk To: solr-user

Re: SOLR Index Speed

2011-09-26 Thread Otis Gospodnetic
Hello, PS: solr streamindex  is not option because we need to submit javabin... If you are referring to StreamingUpdateSolrServer, then the above statement makes no sense and you should give SUSS a try. Are you sure your 16 reducers produce more than 500 docs/second? I think somebody already

Re: error while replication

2011-09-26 Thread Otis Gospodnetic
Rajat, What version?  If 3.4.0, I'd try 3.4.0 first. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: shinkanze rajatrastogi...@gmail.com To: solr-user@lucene.apache.org Sent:

Re: matching reponse and request

2011-09-26 Thread Otis Gospodnetic
Hi Roland, Have a look at hit #1 here: http://search-lucene.com/?q=manifoldcf&fc_project=Solr I think this is what you are after. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From:

Re: solr DIH for mongodb

2011-09-26 Thread Kiwi de coder
Wow, this search engine is powerful! Too bad that after looking through it, I still have no solution. Seems like I need to get my hands dirty to make one :) kiwi On Tue, Sep 27, 2011 at 12:08 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi, Here is a 1 month old thread I found on

Re: Boost Exact matches on Specific Fields

2011-09-26 Thread Balaji S
Hi, do you mean to copy the String field to a Text field, or the reverse? This is the approach I am currently following. Step 1: Created a FieldType <fieldType name="string_lower" class="solr.TextField" sortMissingLast="true" omitNorms="true"> <analyzer> <tokenizer

Re: drastic performance decrease with 20 cores

2011-09-26 Thread Otis Gospodnetic
The following should help with size estimation: http://search-lucene.com/?q=estimate+memory&fc_project=Solr http://issues.apache.org/jira/browse/LUCENE-3435 I'll just add that with that much RAM you'll be more than fine. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene

Re: solr DIH for mongodb

2011-09-26 Thread Otis Gospodnetic
From: Kiwi de coder kiwio...@gmail.com wow, this search engine is powerful ! Thanks, glad it helps. too bad after look throught it, still got not solution. seem like I need to get my hand dirty to make one :) :) Please consider contributing: http://wiki.apache.org/solr/HowToContribute Otis

Re: How to implement Spell Checker using Solr?

2011-09-26 Thread anupamxyz
I have been able to setup Solr Spell checker on my web application. It is a file based spell checker that i have implemented. I would like to add that the same isn't that accurate, since I haven't applied any specific algorithm for having the most relevant search result. Kindly do let me know in

Re: How to implement Spell Checker using Solr?

2011-09-26 Thread tamanjit.bin...@yahoo.co.in
Firstly, just to make it clear: the dictionary is made out of already indexed terms, or rather it is built upon them, if you are using *<str name="classname">solr.IndexBasedSpellChecker</str>*, which you are. Next, a lot of changes are required for your *solrconfig.xml*: 1. <str name="field">spell</str> is the name of

Re: Solr stopword problem in Query

2011-09-26 Thread Isan Fulia
Hi Rahul, I also tried searching Coke Studio MTV but no documents were returned. Here is the snippet of my schema file. <fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true"> <analyzer type="index"> <tokenizer

Re: what is delata query and how to write?

2011-09-26 Thread Gora Mohanty
On Tue, Sep 27, 2011 at 10:51 AM, nagarjuna nagarjuna.avul...@gmail.com wrote: Hi everybody. right now i have little bit idea about the solr query ..but i am not clear about delta query wht it is? and how to write ?any sample delta query? http://lmgtfy.com/?q=solr+delta+query There

Re: what is delata query and how to write?

2011-09-26 Thread nagarjuna
Hi Gora, can you please quit giving answers like these. I may get the perfect answer from anybody but not you, so kindly please be quiet. I already googled and saw many links; as a beginner I am unable to get the main intention behind using the delta query, even we have query.and i

Re: How to reserve ids?

2011-09-26 Thread Gabriele Kahlout
I'm interested in the stopwords solution as it sounds like less work, but I'm not sure I understand how it works. Having msn.com as a stopword doesn't mean I won't get msn.com as a result for, say, 'hotmail'. My understanding is that msn.com will never make it to the similarity function and