That definitely will be a useful tool in this conversion, thanks.

On Wed, Feb 13, 2013 at 12:25 PM, Michael Della Bitta
<michael.della.bi...@appinions.com> wrote:
> Ooops: https://code.google.com/p/solrmeter/
>
> Michael Della Bitta
>
> ------------------------------------------------
> Appinions
> 18 East 41st Street, 2nd Floor
> New York, NY 10017-6271
>
> www.appinions.com
>
> Where Influence Isn’t a Game
>
>
> On Wed, Feb 13, 2013 at 12:25 PM, Michael Della Bitta
> <michael.della.bi...@appinions.com> wrote:
>
>> Matthew,
>>
>> With an index that small, you should be able to build a proof of
>> concept on your own hardware and discover how it performs using
>> something like SolrMeter:
>>
>>
>> On Wed, Feb 13, 2013 at 12:21 PM, Matthew Shapiro <m...@mshapiro.net> wrote:
>>
>>> Thanks for the reply.
>>>
>>>> If the bulk of the searches are exactly the same (e.g. the empty search),
>>>> the result will be cached. If 5,683 searches/month is the real count, this
>>>> sounds like a very low number of searches over a very limited corpus. Just
>>>> about any machine should be fine. I guess I am missing something here.
>>>> Could you elaborate a bit? How large is a document, how many do you expect
>>>> to handle, what do you expect a query to look like, how should the result
>>>> be presented?
>>>
>>> Sorry, I should clarify our current statistics. First of all, I meant 183k
>>> documents (not 183, whoops). Around 100k of those are full-fledged HTML
>>> articles (not web pages, but articles in our CMS with HTML content inside
>>> them); the rest of the data are more like key/value records with a lot of
>>> attached metadata for searching.
>>>
>>> Also, what I meant by "search without a search term" is that probably 80%
>>> (hard to confirm due to the lack of stats given by the GSA) of our searches
>>> are done on pure metadata clauses, without any searching through the content
>>> itself. For example: "give me documents that have a content type of video,
>>> are marked for client X, have a category of Y or Z, and were published to
>>> platform A, ordered by date published". The searches that do use a search
>>> term look much like that same query, but additionally ask for all documents
>>> that have the string "My Video" in their title and description. From the way
>>> the GSA provides us statistics (which are pretty bare), it appears it does
>>> not count "no search term" searches as part of those numbers (the GSA is not
>>> really built for searching without terms either, and we've had various
>>> issues using it this way because of it).
>>>
>>> The reason we are using the GSA for this, and not our MSSQL database, is
>>> that some of this data requires multiple, expensive joins, and we do need
>>> full-text search for when users want that option. Also for faceting.
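For reference, the metadata-only request described above maps almost directly onto Solr filter queries, with faceting handled in the same request. Below is a minimal SolrJ sketch, assuming a hypothetical core named "cms" and made-up field names (contentType, client, category, platform, publishedDate):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class MetadataQueryExample {
        public static void main(String[] args) throws Exception {
            // Core URL and field names below are hypothetical examples.
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/cms");

            SolrQuery q = new SolrQuery("*:*");        // no search term: match everything
            q.addFilterQuery("contentType:video",      // metadata clauses become filter queries
                             "client:clientX",
                             "category:(Y OR Z)",
                             "platform:A");
            q.setSort("publishedDate", SolrQuery.ORDER.desc);
            q.setFacet(true);                          // facet counts come back with the same request
            q.addFacetField("category", "platform");
            q.setRows(20);

            QueryResponse rsp = solr.query(q);
            System.out.println("Hits: " + rsp.getResults().getNumFound());
        }
    }

The full-text variant keeps the same filters and only swaps the main query, e.g. new SolrQuery("title:\"My Video\" OR description:\"My Video\"").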
>>>
>>> On Wed, Feb 13, 2013 at 11:24 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>>>
>>>> Matthew Shapiro [m...@mshapiro.net] wrote:
>>>>
>>>> [Hardware for Solr]
>>>>
>>>>> What type of hardware (at a high level) should I be looking for? Are the
>>>>> main constraints disk I/O, memory size, processing power, etc.?
>>>>
>>>> That depends on what you are trying to achieve. Broadly speaking, "simple"
>>>> search and retrieval is mainly I/O bound. The easy way to handle that is to
>>>> use SSDs as storage. However, a lot of people like the old-school solution
>>>> and compensate for the slow seeks of spinning drives by adding RAM and
>>>> warming the searcher or index files. So: either SSD or RAM on the I/O side.
>>>> That is, if the corpus is non-trivial in size, which brings us to...
>>>>
>>>>> Right now we have about 183 documents stored in the GSA (which will go up a
>>>>> lot once we are on Solr, since the GSA is limiting). The search systems are
>>>>> used to display core information on several of our homepages, so our search
>>>>> traffic is pretty significant (the GSA reports 5,683 searches in the last
>>>>> month; however, I am 99% sure this is not correct and is not counting
>>>>> search requests without any search terms, which make up most of our search
>>>>> traffic).
>>>>
>>>> If the bulk of the searches are exactly the same (e.g. the empty search),
>>>> the result will be cached. If 5,683 searches/month is the real count, this
>>>> sounds like a very low number of searches over a very limited corpus. Just
>>>> about any machine should be fine. I guess I am missing something here.
>>>> Could you elaborate a bit? How large is a document, how many do you expect
>>>> to handle, what do you expect a query to look like, how should the result
>>>> be presented?
>>>>
>>>> Regards,
>>>> Toke Eskildsen
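On the warming point: Solr can prime its caches itself through firstSearcher/newSearcher listeners in solrconfig.xml, but a cruder external variant is simply to replay the handful of filter-only queries the homepages actually issue after each deploy or commit, so the first real visitors hit warm caches. A rough SolrJ sketch along those lines, with the core URL and filter clauses again made up for illustration:

    import java.util.Arrays;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class WarmupExample {
        public static void main(String[] args) throws Exception {
            // Core URL and filter combinations are hypothetical examples.
            HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/cms");

            // The few filter combinations the homepages actually use.
            List<String[]> commonFilters = Arrays.asList(
                    new String[] {"contentType:video", "platform:A"},
                    new String[] {"contentType:article", "client:clientX"});

            for (String[] filters : commonFilters) {
                SolrQuery q = new SolrQuery("*:*");
                q.addFilterQuery(filters);
                q.setSort("publishedDate", SolrQuery.ORDER.desc);
                q.setRows(0);      // results are discarded; the point is to touch caches and index files
                solr.query(q);
            }
        }
    }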