Re: What should focus be on hardware for solr servers?

Michael Della Bitta Thu, 14 Feb 2013 07:55:38 -0800

My dual-core, HT-enabled Dell Latitude from last year has this CPU:
model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
bogomips : 4988.65


An m3.xlarge reports:
model name : Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
bogomips : 4000.14

I tried running geekbench and phoronix-test-suite and failed at both...
Anybody have a favorite, free, CLI benchmarking suite?

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky <j...@basetechnology.com> wrote:
> That raises the question of how your average professional notebook computer
> (PC or Mac or Linux) compares to a garden-variety cloud server such as an
> Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document
> ingestion rate or how many documents you can load before load and/or query
> performance starts to fall off the cliff. Anybody have any numbers? I mean,
> is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel?
> (With all the usual caveats that "it all depends" and "your mileage will
> vary.") But the intent would be for a similar workload on both (like loading
> the Wikipedia dump).
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Erick Erickson
> Sent: Thursday, February 14, 2013 7:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: What should focus be on hardware for solr servers?
>
>
> One data point: I can comfortably index and search the Wikipedia dump (11M
> articles, 5M with text) on my MacBook Pro. Admittedly not heavy-duty
> queries, but....
>
> Erick
>
>
> On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro <m...@mshapiro.net> wrote:
>
>> Excellent, thank you very much for the reply!
>>
>> On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>>
>> > Matthew Shapiro [m...@mshapiro.net] wrote:
>> >
>> > > Sorry, I should clarify our current statistics.  First of all I meant
>> > > 183k documents (not 183, whoops). Around 100k of those are full-fledged
>> > > HTML articles (not web pages but articles in our CMS with HTML content
>> > > inside of them),
>> >
>> > If an article is around 10-30 pages (or the equivalent), this is still a
>> > small corpus.
>> >
>> > > the rest of the data are more like key/value data records with a lot
>> > > of attached meta data for searching.
>> >
>> > If the number of unique categories (model, author, playtime, lix,
>> > favorite_band, year...) in the meta data is in the lower hundreds, you
>> > should be fine.
>> >
>> > > Also, what I meant by search without a search term is that probably
>> > > > 80% (hard to confirm due to the lack of stats given by the GSA) of
>> > > our searches are done on pure metadata clauses without any searching
>> > > through the content itself,
>> >
>> > That clarifies a lot, thanks. So we have roughly speaking 4000*5
>> > queries/day ~= 14 queries/minute. Guessing wildly that your peak time
>> > traffic is about 5 times that, we end up with about 1 query/second. That
>> > is a very light load for the Solr installation we're discussing.
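
[Editor's sketch] The back-of-the-envelope estimate above can be reproduced as follows. This is only a worked version of the arithmetic in the message; the variable names are illustrative, and the 5x peak multiplier is the "wild guess" from the message, not a measured value.

```python
# Figures taken from the message: 4000*5 queries per day,
# with peak traffic assumed to be 5 times the daily average.
QUERIES_PER_DAY = 4000 * 5
PEAK_FACTOR = 5  # assumption from the message, not a measurement

MINUTES_PER_DAY = 24 * 60

# Average rate spread evenly over the whole day.
avg_per_minute = QUERIES_PER_DAY / MINUTES_PER_DAY

# Peak rate, converted from per-minute to per-second.
peak_per_second = avg_per_minute * PEAK_FACTOR / 60

print(f"average: {avg_per_minute:.1f} queries/minute")
print(f"peak:    {peak_per_second:.2f} queries/second")
```

This gives roughly 13.9 queries/minute on average and about 1.16 queries/second at the assumed peak, in line with the "about 1 query/second" figure.
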
>> >
>> > > so for example "give me documents that have a content type of
>> > > video, that are marked for client X, have a category of Y or Z, and
>> > > were published to platform A, ordered by date published".
>> >
>> > That is a near-trivial query and you should get a reply very fast on
>> > modest hardware.
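
[Editor's sketch] One way such a metadata-only request might be expressed against Solr's standard select handler is with a match-all `q`, one `fq` filter per clause, and a `sort`. All field names below (`content_type`, `client`, `category`, `platform`, `date_published`) are hypothetical, since the thread does not show the schema, and the optional phrase search on `title` and `description` is likewise an assumption about how the fields are named.

```python
from urllib.parse import urlencode


def build_params(phrase=None):
    """Build Solr request parameters for the example query.

    Field names are hypothetical. `phrase` is assumed not to contain
    double quotes or other characters that need query escaping.
    """
    if phrase is None:
        # No search term: match everything, let the filters do the work.
        q = "*:*"
    else:
        # Search term: require the phrase in both title and description.
        q = f'title:"{phrase}" AND description:"{phrase}"'
    return [
        ("q", q),
        ("fq", "content_type:video"),
        ("fq", "client:X"),
        ("fq", "category:(Y OR Z)"),
        ("fq", "platform:A"),
        ("sort", "date_published desc"),
    ]


# Query string to append to the select handler URL of your own core.
print(urlencode(build_params()))
print(urlencode(build_params("My Video")))
```

Putting each clause in its own `fq` lets Solr cache the filters independently, which suits a workload where the same metadata clauses recur across many requests.
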
>> >
>> > > The searches that use a search term are more like: use the same query
>> > > from the example as before, but find me all the documents that have the
>> > > string "My Video" in their title and description.
>> >
>> > Unless you experiment with fuzzy matches and phrase slop, this should
>> > also be fast. Ignoring analyzers, there is practically no difference
>> > between a meta data field and a larger content field in Solr.
>> >
>> > Your current search (guessing here) iterates over all terms in the content
>> > fields and takes a comparatively large penalty when a large document is
>> > encountered. The inverted index in Solr means that the search terms are
>> > looked up in a dictionary whose entries refer to the documents they
>> > belong to. The penalty for having thousands or millions of terms as
>> > compared to tens or hundreds in a field in an inverted index is very small.
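
[Editor's sketch] The idea of an inverted index can be illustrated with a toy version: a dictionary maps each term to the set of documents containing it, so a lookup touches only the query terms and never scans document text. This is a simplified illustration, not how Lucene or Solr actually implement it; the whitespace tokenizer and the function names are assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # One dictionary lookup per query term, then intersect the id sets.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "My Video about Solr hardware",
    2: "a very long article " + "filler " * 10000 + "video",
    3: "key value record",
}
index = build_inverted_index(docs)
print(search(index, "my video"))
```

The cost of `search` depends on the number of query terms and the size of the matching id sets, not on how long document 2 is, which is the point made in the paragraph above.
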
>> >
>> > We're still in "any random machine you've got available"-land so I
>> > second Michael's suggestion.
>> >
>> > Regards,
>> > Toke Eskildsen
>>
>
