Or perhaps we should develop our own, Solr-based benchmark...

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Thu, Feb 14, 2013 at 10:54 AM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote:
> My dual-core, HT-enabled Dell Latitude from last year has this CPU:
> model name : Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
> bogomips : 4988.65
>
> An m3.xlarge reports:
> model name : Intel(R) Xeon(R) CPU E5645 @ 2.40GHz
> bogomips : 4000.14
>
> I tried running geekbench and phoronix-test-suite and failed at both...
> Anybody have a favorite, free, CLI benchmarking suite?
>
> Michael Della Bitta
>
>
> On Thu, Feb 14, 2013 at 8:10 AM, Jack Krupansky <j...@basetechnology.com> wrote:
>> That raises the question of how your average professional notebook computer
>> (PC or Mac or Linux) compares to a garden-variety cloud server such as an
>> Amazon EC2 m1.large (or m3.xlarge) in terms of performance such as document
>> ingestion rate or how many documents you can load before load and/or query
>> performance starts to fall off the cliff. Anybody have any numbers? I mean,
>> is a MacBook Pro half of an EC2 m1.large? Twice? Less? More? Any rough feel?
>> (With all the usual caveats that "it all depends" and "your mileage will
>> vary.") But the intent would be for a similar workload on both (like loading
>> the Wikipedia dump.)
>>
>> -- Jack Krupansky
>>
>> -----Original Message-----
>> From: Erick Erickson
>> Sent: Thursday, February 14, 2013 7:31 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: What should focus be on hardware for solr servers?
>>
>>
>> One data point: I can comfortably index and search the Wikipedia dump (11M
>> articles, 5M with text) on my Macbook Pro. Admittedly not heavy-duty
>> queries, but....
>>
>> Erick
>>
>>
>> On Wed, Feb 13, 2013 at 4:01 PM, Matthew Shapiro <m...@mshapiro.net> wrote:
>>
>>> Excellent, thank you very much for the reply!
>>>
>>> On Wed, Feb 13, 2013 at 2:08 PM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>>>
>>> > Matthew Shapiro [m...@mshapiro.net] wrote:
>>> >
>>> > > Sorry, I should clarify our current statistics. First of all I meant 183k
>>> > > documents (not 183, whoops). Around 100k of those are full-fledged HTML
>>> > > articles (not web pages but articles in our CMS with HTML content inside
>>> > > of them),
>>> >
>>> > If an article is around 10-30 pages (or the equivalent), this is still a
>>> > small corpus.
>>> >
>>> > > the rest of the data are more like key/value data records with a lot
>>> > > of attached meta data for searching.
>>> >
>>> > If the number of unique categories (model, author, playtime, lix,
>>> > favorite_band, year...) in the meta data is in the lower hundreds, you
>>> > should be fine.
>>> >
>>> > > Also, what I meant by search without a search term is that probably >80%
>>> > > (hard to confirm due to the lack of stats given by the GSA) of our searches
>>> > > are done on pure metadata clauses without any searching through the content
>>> > > itself,
>>> >
>>> > That clarifies a lot, thanks. So we have roughly speaking 4000*5
>>> > queries/day ~= 14 queries/minute. Guessing wildly that your peak time
>>> > traffic is about 5 times that, we end up with about 1 query/second. That is
>>> > a very light load for the Solr installation we're discussing.
>>> >
>>> > > so for example "give me documents that have a content type of
>>> > > video, that are marked for client X, have a category of Y or Z, and were
>>> > > published to platform A, ordered by date published".
>>> >
>>> > That is a near-trivial query and you should get a reply very fast on
>>> > modest hardware.
>>> >
>>> > > The searches that use a search term are more like: use the same query from the
>>> > > example as before, but find me all the documents that have the string "My Video"
>>> > > in its title and description.
>>> >
>>> > Unless you experiment with fuzzy matches and phrase slop, this should also
>>> > be fast. Ignoring analyzers, there is practically no difference between a
>>> > meta data field and a larger content field in Solr.
>>> >
>>> > Your current search (guessing here) iterates all terms in the content
>>> > fields and takes a comparatively large penalty when a large document is
>>> > encountered. The inversion of index in Solr means that the search terms are
>>> > looked up in a dictionary that refers to the documents they belong to. The
>>> > penalty for having thousands or millions of terms as compared to tens or
>>> > hundreds in a field in an inverted index is very small.
>>> >
>>> > We're still in "any random machine you've got available"-land so I second
>>> > Michael's suggestion.
>>> >
>>> > Regards,
>>> > Toke Eskildsen
>>>
>>
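
For anyone following along, here is a rough sketch of the inverted-index idea Toke describes above: terms are looked up in a dictionary that points at the documents containing them, so a lookup does not have to scan document text. This is a toy illustration only and not how Solr/Lucene is actually implemented. The class name TinyInvertedIndex, the field names, and the three sample documents are all made up for the example, and tokenizing is just lowercase plus whitespace split (no analyzers).

```python
from collections import defaultdict


class TinyInvertedIndex:
    """Toy inverted index: (field, term) -> set of document ids.

    Hypothetical illustration only; Lucene's real data structures differ.
    """

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, fields):
        # Split every field value into lowercase terms and record which
        # document each (field, term) pair occurs in.
        for field, value in fields.items():
            for term in str(value).lower().split():
                self.postings[(field, term)].add(doc_id)

    def lookup(self, field, term):
        # A single dictionary lookup; the text of the documents is never scanned.
        return self.postings.get((field, term.lower()), set())


index = TinyInvertedIndex()
index.add(1, {"content_type": "video", "client": "X", "category": "Y",
              "title": "My Video about Solr"})
index.add(2, {"content_type": "article", "client": "X", "category": "Z",
              "title": "A long article " + "filler " * 10000})
index.add(3, {"content_type": "video", "client": "X", "category": "Z",
              "title": "Another clip"})

# "content type of video, marked for client X, category Y or Z"
videos = index.lookup("content_type", "video")
client_x = index.lookup("client", "X")
cat_y_or_z = index.lookup("category", "Y") | index.lookup("category", "Z")
metadata_hits = videos & client_x & cat_y_or_z

# Same metadata filter, plus both terms "my" and "video" in the title.
# (This is a plain AND of terms, not a phrase match.)
title_hits = index.lookup("title", "my") & index.lookup("title", "video")
term_hits = metadata_hits & title_hits

print(sorted(metadata_hits))  # [1, 3]
print(sorted(term_hits))      # [1]
```

Note that document 2 carries ten thousand extra words in its title, yet each lookup is still one dictionary access followed by set intersections. In this sketch at least, the metadata-only query and the query with a search term cost about the same.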