Re: Solr Query Performance benchmarking

2017-04-28 Thread Shawn Heisey
On 4/28/2017 12:43 PM, Toke Eskildsen wrote: > Shawn Heisey wrote: >> Adding more shards as Toke suggested *might* help,[...] > I seem to have phrased my suggestion poorly. What I meant to suggest > was a switch to a single shard (with 4 replicas) setup, instead of the >

RE: Solr Query Performance benchmarking

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Beautiful, thank you. -----Original Message----- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Friday, April 28, 2017 3:07 PM To: solr-user@lucene.apache.org Subject: Re: Solr Query Performance benchmarking I use the JMeter plugins. They’ve been reorganized recently, so they

Re: Solr Query Performance benchmarking

2017-04-28 Thread Walter Underwood
I use the JMeter plugins. They’ve been reorganized recently, so they aren’t where I originally downloaded them. Try this: https://jmeter-plugins.org/wiki/RespTimePercentiles/ https://jmeter-plugins.org/wiki/JMeterPluginsCMD/

Re: SolrIndexSearcher#getDocList() method returns zero results, if query includes tdate range query

2017-04-28 Thread Victor Solakhian
I used org.apache.solr.search.LuceneQParser instead of org.apache.lucene.queryparser.classic.QueryParser and now our code works. Here are some excerpts: ... QParser qParser = getParser(core, solrQueryString); Query query = qParser.parse(); ... private QParser getParser(final

RE: Solr Query Performance benchmarking

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Walter, If you can share a pointer to that JMeter add-on, I'd love it. -----Original Message----- From: Walter Underwood [mailto:wun...@wunderwood.org] Sent: Friday, April 28, 2017 2:53 PM To: solr-user@lucene.apache.org Subject: Re: Solr Query Performance benchmarking I use production logs

RE: Import Handler using shell scripts

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
Attached is a Python script I use, with slight redactions, on several data import jobs. The main points here are: * Watch the job until the import finishes * Always send email whether it succeeds or fails * Put the hostname, and whether it was a success, in the subject for quick removal *
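The attachment itself is not reproduced in this listing. As a rough sketch of the first three points only, and not the author's actual script: the loop below polls until the import is no longer busy, and a helper builds a subject line that leads with the hostname and outcome. The names wait_for_import, subject_line and fetch_status are hypothetical; fetch_status stands for any callable that returns the parsed JSON of a DIH status request, and the sketch assumes that payload carries a "status" field that reads "busy" while an import is running. Sending the mail (for example with smtplib) and deciding what counts as success are left to the caller.

```python
import socket
import time


def wait_for_import(fetch_status, poll_seconds=10.0, sleep=time.sleep):
    """Poll until DIH stops reporting 'busy'; return the last status payload."""
    while True:
        # fetch_status is a hypothetical callable returning the parsed status JSON.
        status = fetch_status()
        if status.get("status") != "busy":
            return status
        sleep(poll_seconds)


def subject_line(succeeded, hostname=None):
    """Put host and outcome first so mail filters can sort on the subject."""
    host = hostname or socket.gethostname()
    outcome = "SUCCESS" if succeeded else "FAILURE"
    return f"[{host}] Solr import {outcome}"
```

Passing the sleep function in as a parameter keeps the loop easy to exercise without real waiting.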

Re: Solr Query Performance benchmarking

2017-04-28 Thread Walter Underwood
I use production logs to get a mix of common and long-tail queries. It is very hard to get a realistic distribution with synthetic queries. A benchmark run goes like this, with a big shell script driving it. 1. Reload the collection to clear caches. 2. Split the log into a cache warming set
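The step list is cut off in this listing, so the following is only a guess at what step 2 might look like in Python rather than the shell script described. It assumes the logged queries have already been extracted into a list, and that the first part of the log warms the caches while the remainder is the measured set. The function name split_queries and the 20% default are illustrative, not from the original message.

```python
def split_queries(queries, warm_fraction=0.2):
    """Split logged queries, in log order, into a warming set and a measured set."""
    if not 0.0 <= warm_fraction <= 1.0:
        raise ValueError("warm_fraction must be between 0 and 1")
    # Everything before the cut warms the caches; the rest is what gets timed.
    cut = int(len(queries) * warm_fraction)
    return queries[:cut], queries[cut:]
```

Keeping the log order, instead of shuffling, preserves the real mix of repeated and long-tail queries in both sets.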

Re: Solr performance on EC2 linux

2017-04-28 Thread Erick Erickson
Well, 6.4.0 had a pretty severe performance issue, so if you were using that release you might see this; 6.4.2 is the most recent 6.4 release. But I have no clue how changing Linux settings would alter that, and I sure can't square that issue with you having such different performance between local

Re: Import Handler using shell scripts

2017-04-28 Thread Erik Hatcher
Yes, via the HTTP API (via curl or other tool). See the commands and URL examples here: https://cwiki.apache.org/confluence/display/solr/Uploading+Structured+Data+Store+Data+with+the+Data+Import+Handler#UploadingStructuredDataStoreDatawiththeDataImportHandler-DataImportHandlerCommands
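For a script that would rather not shell out to curl, the same HTTP call can be sketched in Python. This is an illustration under assumptions: the handler is registered at /dataimport (the conventional path, but it is whatever solrconfig.xml says), and the host and core name below are placeholders. The helper names dih_url and run_dih are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def dih_url(base_url, core, command, **params):
    """Build a DataImportHandler request URL for the given core and command."""
    query = {"command": command, "wt": "json"}
    query.update(params)
    # Assumes the handler is registered at /dataimport in solrconfig.xml.
    return f"{base_url.rstrip('/')}/{core}/dataimport?{urlencode(query)}"


def run_dih(base_url, core, command, **params):
    """Send the command to Solr and return the raw response body as text."""
    with urlopen(dih_url(base_url, core, command, **params)) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder host and core; clean=false keeps existing documents in place.
    print(dih_url("http://localhost:8983/solr", "mycore", "full-import",
                  clean="false", commit="true"))
```

Passing clean=false explicitly is a deliberate choice here, since a full-import that cleans first removes the existing index contents.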

Re: Solr Query Performance benchmarking

2017-04-28 Thread Toke Eskildsen
Shawn Heisey wrote: > Adding more shards as Toke suggested *might* help,[...] I seem to have phrased my suggestion poorly. What I meant to suggest was a switch to a single shard (with 4 replicas) setup, instead of the current 2 shards (with 2 replicas). - Toke

Re: Solr Query Performance benchmarking

2017-04-28 Thread Erick Erickson
Well, the best way to get no cache hits is to set the cache sizes to zero ;). That provides worst-case scenarios and tells you exactly how much you're relying on caches. I'm not talking the lower-level Lucene caches here. One thing I've done is use the TermsComponent to generate a list of terms

Import Handler using shell scripts

2017-04-28 Thread Vijay Kokatnur
Is it possible to call dataimport handler from a shell script? I have not found any documentation regarding this. Any pointers? -- Best, Vijay

RE: Poll: Master-Slave or SolrCloud?

2017-04-28 Thread Davis, Daniel (NIH/NLM) [C]
I am also very surprised. Even though I am no longer using my solr-config-tool, the main thing I like about SolrCloud is how easy it is to bring up a new collection and set up the schema and fields that you want. I also like that I don't need to manage replication in the solr configuration.

Re: Solr Query Performance benchmarking

2017-04-28 Thread Rick Leir
(aside: Using Gatling or JMeter?) Question: How can you easily randomize something in the query so you get no cache hits? I think there are several levels of caching. -- Sorry for being brief. Alternate email is rickleir at yahoo dot com

Re: Poll: Master-Slave or SolrCloud?

2017-04-28 Thread Rick Leir
Shawn, Would you consider writing this up in a blog? Thanks -- Rick On April 28, 2017 11:04:02 AM EDT, Shawn Heisey wrote: >On 4/24/2017 8:58 AM, Otis Gospodnetić wrote: >> I'm really really surprised here. Back in 2013 we did a poll to see >how >> people were running

Re: Clean checkbox on DIH

2017-04-28 Thread Alexandre Rafalovitch
Sounds like a good candidate for JIRA issue to me (maybe there is one even?). Then people can vote, propose patches or offer alternative solutions (e.g. a big exclamation mark or what not). Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, new and experienced On 28

Re: Clean checkbox on DIH

2017-04-28 Thread Vijay Kokatnur
Even though it's a minority, I think it should be disabled by default. That's a cleaner approach. Accidentally running it in prod without unchecking 'clean' can be disastrous. On Apr 28, 2017, at 8:01AM, Mahmoud Almokadem wrote: > > Thanks Shawn, > > We already using a

Solr performance on EC2 linux

2017-04-28 Thread Jeff Wartes
tldr: Recently, I tried moving an existing SolrCloud configuration from a local datacenter to EC2. Performance was roughly 1/10th what I’d expected, until I applied a bunch of Linux tweaks. This should’ve been a straight port: one datacenter server -> one EC2 node. Solr 5.4, SolrCloud, Ubuntu

Re: Solr Query Performance benchmarking

2017-04-28 Thread Erick Erickson
re: the q vs. fq question. My claim (not verified) is that the fastest of all would be q=*:*={!cache=false}. That would bypass the scoring that putting it in the "q" clause would entail as well as bypass the filter cache. But I have to agree with Walter, this is very suspicious IMO. Here's what
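The request string above looks damaged in this listing. One reading consistent with the explanation is a match-all q with the real query moved into an fq that carries the {!cache=false} local parameter, so nothing is scored and the filter cache is skipped. The sketch below builds parameters under that assumption; the function name uncached_filter_params and the sample query are hypothetical.

```python
from urllib.parse import urlencode


def uncached_filter_params(user_query):
    """Match all docs in q and move the real query into an uncached fq."""
    # {!cache=false} is a Solr local parameter that keeps this filter
    # out of the filter cache.
    return urlencode({"q": "*:*", "fq": "{!cache=false}" + user_query})
```

The result is a URL-encoded query string that can be appended to a /select request.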

Re: Poll: Master-Slave or SolrCloud?

2017-04-28 Thread Shawn Heisey
On 4/24/2017 8:58 AM, Otis Gospodnetić wrote: > I'm really really surprised here. Back in 2013 we did a poll to see how > people were running Master-Slave (4.x back then) and SolrCloud was a bit > more popular than Master-Slave: > https://sematext.com/blog/2013/02/25/poll-solr-cloud-or-not/ > >

Re: Clean checkbox on DIH

2017-04-28 Thread Mahmoud Almokadem
Thanks Shawn, We are already using shell scripts to do our import, using the full-import command to do our delta import, and everything has worked well for several years. But the default in the UI is full import with clean and commit. If I press the Execute button by mistake the whole index is cleaned

Re: Solr Query Performance benchmarking

2017-04-28 Thread Walter Underwood
More “unrealistic” than “amazing”. I bet the set of test queries is smaller than the query result cache size. Results from cache are about 2 ms, but network communication to the shards would add enough overhead to reach 40 ms. wunder Walter Underwood wun...@wunderwood.org
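A quick way to check that suspicion is to count the distinct queries in the test set and compare the count with the configured cache size. A minimal sketch, assuming the queries are available as a list of strings and that cache_size is the size configured for queryResultCache in solrconfig.xml; the function name fits_in_cache is made up for this example.

```python
def fits_in_cache(queries, cache_size):
    """Return (distinct_count, True if every distinct query could stay cached)."""
    distinct = len(set(queries))
    return distinct, distinct <= cache_size
```

If the second value is True, a long run mostly measures cache lookups rather than real query execution.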

Re: Solr Query Performance benchmarking

2017-04-28 Thread Shawn Heisey
On 4/27/2017 5:20 PM, Suresh Pendap wrote: > Max throughput that I get: 12000 to 12500 reqs/sec > 95 percentile query latency: 30 to 40 msec These numbers are *amazing* ... far better than I would have expected to see on a 27GB index, even in a situation where it fits entirely into available

Re: Clean checkbox on DIH

2017-04-28 Thread Shawn Heisey
On 4/28/2017 5:11 AM, Mahmoud Almokadem wrote: > I'd like to request to uncheck the "Clean" checkbox by default on DIH page, > cause it cleaned the whole index about 2TB when I click Execute button by > wrong. Or show a confirmation message that the whole index will be cleaned!! When somebody is

Re: Empty value fields not indexed

2017-04-28 Thread Shawn Heisey
On 4/27/2017 10:06 PM, Zheng Lin Edwin Yeo wrote: > I'm using Solr 6.4.2, and I realized that for those fields which has no > values, the field name is not index into Solr. > > It was working fine in the previous version. > > Any reason for this or any settings which needs to be done so that the >

Re: 1 main collection or multiple smaller collections?

2017-04-28 Thread Rick Leir
Derek You could have one document per supplier which has no product info. It would have a flag to indicate this. Then your supplier search is simple. But grouping would be better, so the supplier search can show product counts and categories and ... +1 Walter on designing back from the

Clean checkbox on DIH

2017-04-28 Thread Mahmoud Almokadem
Hello, I'd like to request that the "Clean" checkbox be unchecked by default on the DIH page, because it cleaned the whole index (about 2 TB) when I clicked the Execute button by mistake. Or show a confirmation message that the whole index will be cleaned! Sincerely, Mahmoud

Re: Poll: Master-Slave or SolrCloud?

2017-04-28 Thread Charlie Hull
Like Sematext, we help clients with both ES and Solr. A particular difference is that ES is easier to start with (lots of sensible defaults) but then once you have got going (and perchance have thrown many millions of items at it) you can run into trouble because you don't really understand

Re: Solr Query Performance benchmarking

2017-04-28 Thread Toke Eskildsen
On Thu, 2017-04-27 at 23:20 +, Suresh Pendap wrote: > Number of Solr Nodes: 4 > Number of shards: 2 > replication-factor:  2 > Index size: 55 GB > Shard/Core size: 27.7 GB > maxConnsPerHost: 1000 The overhead of sharding is not trivial. Your overall index size is fairly small, relative to