My dual-core, HT-enabled Dell Latitude from last year has this CPU:

model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
bogomips   : 4988.65

An m3.xlarge reports:

model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
bogomips   : 4000.14

I tried running Geekbench and phoronix-test-suite and failed at both...
Anybody have a favorite, free, CLI benchmarking suite?

Michael Della Bitta
Appinions
www.appinions.com

On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> That raises the question of how your average professional notebook computer
> (PC or Mac or Linux) compares to a garden-variety cloud server such as an
> Amazon EC2 m1.large (or m3.xlarge) in terms of performance, such as document
> ingestion rate or how many documents you can load before load and/or query
> performance starts to fall off the cliff. Anybody have any numbers? I mean,
> is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel?
> (With all the usual caveats that "it all depends" and "your mileage will
> vary".) But the intent would be a similar workload on both (like loading
> the Wikipedia dump).
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Erick Erickson
> Sent: Thursday, February 14, 2013 7:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: What should focus be on hardware for solr servers?
>
> One data point: I can comfortably index and search the Wikipedia dump (11M
> articles, 5M with text) on my MacBook Pro. Admittedly not heavy-duty
> queries, but....
>
> Erick
>
> On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro <m...@mshapiro.net> wrote:
>
>> Excellent, thank you very much for the reply!
>>
>> On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>>
>> > Matthew Shapiro [m...@mshapiro.net] wrote:
>> >
>> > > Sorry, I should clarify our current statistics. First of all I meant
>> > > 183k documents (not 183, woops).
>> > > Around 100k of those are full-fledged HTML articles (not web pages but
>> > > articles in our CMS with HTML content inside of them),
>> >
>> > If an article is around 10-30 pages (or the equivalent), this is still a
>> > small corpus.
>> >
>> > > the rest of the data are more like key/value data records with a lot
>> > > of attached metadata for searching.
>> >
>> > If the number of unique categories (model, author, playtime, lix,
>> > favorite_band, year...) in the metadata is in the lower hundreds, you
>> > should be fine.
>> >
>> > > Also, what I meant by search without a search term is that probably
>> > > more than 80% (hard to confirm due to the lack of stats given by the
>> > > GSA) of our searches are done on pure metadata clauses without any
>> > > searching through the content itself,
>> >
>> > That clarifies a lot, thanks. So we have, roughly speaking, 4000*5
>> > queries/day ~= 14 queries/minute. Guessing wildly that your peak-time
>> > traffic is about 5 times that, we end up with about 1 query/second. That
>> > is a very light load for the Solr installation we're discussing.
>> >
>> > > so for example "give me documents that have a content type of video,
>> > > that are marked for client X, have a category of Y or Z, and were
>> > > published to platform A, ordered by date published".
>> >
>> > That is a near-trivial query and you should get a reply very fast on
>> > modest hardware.
>> >
>> > > The searches that use a search term are more like: use the same query
>> > > from the example before, but find me all the documents that have the
>> > > string "My Video" in its title and description.
>> >
>> > Unless you experiment with fuzzy matches and phrase slop, this should
>> > also be fast. Ignoring analyzers, there is practically no difference
>> > between a metadata field and a larger content field in Solr.
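Toke's load estimate is plain arithmetic. A minimal Python sketch reproducing it, taking the "4000*5 queries/day" figure and the wild 5x peak factor straight from the thread (the helper names are hypothetical):

```python
# Back-of-the-envelope query-load estimate using the figures quoted above.
# QUERIES_PER_DAY and PEAK_FACTOR come from the thread; the rest is arithmetic.

QUERIES_PER_DAY = 4000 * 5   # as given in the thread (20,000 queries/day)
PEAK_FACTOR = 5              # "guessing wildly" that peak is ~5x the average


def average_qpm(per_day: int) -> float:
    """Average queries per minute, spread evenly over a 24-hour day."""
    return per_day / (24 * 60)


def peak_qps(per_day: int, peak_factor: float) -> float:
    """Peak queries per second, assuming peak = peak_factor * average."""
    return average_qpm(per_day) * peak_factor / 60


if __name__ == "__main__":
    print(f"average: {average_qpm(QUERIES_PER_DAY):.1f} queries/minute")
    print(f"peak:    {peak_qps(QUERIES_PER_DAY, PEAK_FACTOR):.2f} queries/second")
```

Run as a script, it prints about 13.9 queries/minute on average and about 1.16 queries/second at peak, consistent with the "~14 queries/minute" and "about 1 query/second" estimates in the thread.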
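The metadata-only query Matthew describes maps naturally onto Solr's standard `q`, `fq` and `sort` request parameters. Here is a sketch that only builds the request URLs. It assumes a core named `articles` on localhost and uses hypothetical field names (`content_type`, `client`, `category`, `platform`, `published_date`, `title`, `description`). The descending sort, and treating "title and description" as a match in either field, are also assumptions:

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint and core name; adjust to the real installation.
SOLR_SELECT = "http://localhost:8983/solr/articles/select"


def metadata_params(client: str, categories: list[str], platform: str) -> list[tuple[str, str]]:
    """Pure metadata query: match all documents, then narrow with filter queries."""
    cats = " OR ".join(categories)
    return [
        ("q", "*:*"),                      # no search term: match everything
        ("fq", "content_type:video"),      # each fq is cached separately in the filterCache
        ("fq", f"client:{client}"),
        ("fq", f"category:({cats})"),
        ("fq", f"platform:{platform}"),
        ("sort", "published_date desc"),   # newest first (assumed direction)
    ]


def term_params(phrase: str, base: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Same filters and sort, but replace *:* with a phrase search on two fields."""
    # OR is an assumption; use AND if the phrase must appear in both fields.
    q = f'title:"{phrase}" OR description:"{phrase}"'
    return [("q", q)] + [p for p in base if p[0] != "q"]


if __name__ == "__main__":
    base = metadata_params("X", ["Y", "Z"], "A")
    print(SOLR_SELECT + "?" + urlencode(base))
    print(SOLR_SELECT + "?" + urlencode(term_params("My Video", base)))
```

Putting each metadata clause in its own `fq`, rather than folding everything into `q`, lets Solr cache each filter independently and keeps the filters out of relevance scoring. This suits a workload where most searches are pure metadata filters.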
>> >
>> > Your current search (guessing here) iterates over all terms in the content
>> > fields and takes a comparatively large penalty when a large document is
>> > encountered. With the inverted index in Solr, search terms are looked up
>> > in a dictionary that refers to the documents they belong to. In an
>> > inverted index, the penalty for a field having thousands or millions of
>> > terms, as compared to tens or hundreds, is very small.
>> >
>> > We're still in "any random machine you've got available"-land, so I
>> > second Michael's suggestion.
>> >
>> > Regards,
>> > Toke Eskildsen
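Toke's point about the inverted index can be illustrated with a toy Python sketch. This is not Lucene's actual implementation, whose term dictionary and postings structures are far more elaborate. Each term maps to the set of documents containing it, so answering a query costs one lookup per query term. The scan-based alternative, by contrast, re-reads every term of every document. The function names and sample documents are hypothetical, and the query is a simple AND of terms rather than a phrase match:

```python
from collections import defaultdict


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def search_all_terms(index: dict[str, set[int]], query: str) -> set[int]:
    """AND query: one dictionary lookup per query term, then intersect postings."""
    postings = [index.get(t, set()) for t in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


def scan_all_terms(docs: dict[int, str], query: str) -> set[int]:
    """No index: read every term of every document on every query."""
    terms = query.lower().split()
    return {d for d, text in docs.items()
            if all(t in text.lower().split() for t in terms)}


if __name__ == "__main__":
    docs = {
        1: "My Video about Solr",
        2: "a long article " + "filler " * 10000 + "video",
        3: "my notes",
    }
    index = build_inverted_index(docs)
    print(sorted(search_all_terms(index, "my video")))  # [1]
    print(sorted(scan_all_terms(docs, "my video")))     # [1]
```

Document 2 carries 10,000 filler terms, but in the index they collapse into a single one-entry postings list for `filler`. The indexed search never touches them, whereas the scan re-reads them on every query.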