Ah, that's a very different number.  Yes, assuming your docs are web pages, a 
single reasonably equipped machine should be able to handle that and a few 
dozen QPS.
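
For the single-index option, restricting a search to any combination of sites usually comes down to a filter query. The following is only a rough sketch, assuming each document carries a hypothetical `site` field and that Solr is reachable at a made-up local URL. `q` and `fq` are standard Solr request parameters; the function name and field name are illustrative only.

```python
from urllib.parse import urlencode


def build_site_filtered_query(base_url, user_query, sites, site_field="site"):
    """Return a Solr select URL restricted to the given sites via fq.

    `site_field` is an assumed field name; use whatever field your
    schema stores the site identifier in.
    """
    params = [("q", user_query)]
    if sites:
        # OR the selected sites together in a single filter query.
        # Sorting keeps the filter string stable for the same set of sites.
        quoted = ['"%s"' % s.replace('"', '\\"') for s in sorted(sites)]
        params.append(("fq", "%s:(%s)" % (site_field, " OR ".join(quoted))))
    return base_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_site_filtered_query(
        "http://localhost:8983/solr", "lucene",
        ["b.example.org", "a.example.com"]))
```

Keeping the site restriction in `fq` rather than in `q` means it does not affect relevance scoring, and the same combination of sites produces the same filter string on every request.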

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Dietrich <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Wednesday, March 26, 2008 2:18:53 PM
Subject: Re: How to index multiple sites with option of combining results in search

Makes sense, but probably overkill for my requirements. I wasn't
really talking 275*200000; more likely the total would be something
like four million documents. I was under the assumption that a single
machine, or a simple distributed index, should be able to handle that.
Is that wrong?

-ds

On Wed, Mar 26, 2008 at 2:05 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Dietrich,
>
>  I don't think there are established practices in the open (yet).  You could 
> design your application with a site(s)->shard mapping and then, knowing which 
> sites are involved in the query, search only the relevant shards.  This will 
> be efficient, but it would require careful management on your part.
>
>  Putting everything in a single index would just not work with "normal" 
> machines, I think.
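
The site(s)->shard mapping described above could be sketched roughly as follows. This is an illustration under stated assumptions, not an established practice: the host names, the mapping table, and the function names are all hypothetical, and the only Solr feature relied on is the `shards` request parameter from distributed search (SOLR-303), which takes a comma-separated list of shard addresses.

```python
from urllib.parse import urlencode

# Hypothetical site -> shard assignment. In a real deployment the
# application would have to maintain this table itself.
SITE_TO_SHARD = {
    "a.example.com": "shard1.example.net:8983/solr",
    "b.example.org": "shard1.example.net:8983/solr",
    "c.example.net": "shard2.example.net:8983/solr",
}


def shards_for_sites(sites, mapping=SITE_TO_SHARD):
    """Return the sorted, de-duplicated shards that hold the given sites."""
    return sorted({mapping[site] for site in sites})


def build_sharded_query(coordinator_url, user_query, sites, mapping=SITE_TO_SHARD):
    """Build a select URL that searches only the shards relevant to `sites`."""
    shards = ",".join(shards_for_sites(sites, mapping))
    params = [("q", user_query), ("shards", shards)]
    return coordinator_url + "/select?" + urlencode(params)


if __name__ == "__main__":
    print(build_sharded_query(
        "http://shard1.example.net:8983/solr", "lucene",
        ["a.example.com", "c.example.net"]))
```

When several sites share one shard, as in the table above, a query for a subset of them would still need a per-site filter on top of the shard selection.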
>
>
>  Otis
>  --
>  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>  ----- Original Message ----
>  From: Dietrich <[EMAIL PROTECTED]>
>  To: solr-user@lucene.apache.org
>  Sent: Wednesday, March 26, 2008 10:47:55 AM
>  Subject: Re: How to index multiple sites with option of combining results in search
>
>  I understand that, and that makes sense. But, coming back to the
>  original question:
>  >  >  When performing searches,
>  >  >  I need to be able to search against any combination of sites.
>  >  >  Does anybody have suggestions what the best practice for a scenario
>  >  >  like that would be, considering  both indexing and querying
>  >  >  performance? Put everything into one index and filter when performing
>  >  >  the queries, or creating a separate index for each one and combining
>  >  >  results when performing the query?
>
>  Are there any established best practices for that?
>
>  -ds
>
>  On Tue, Mar 25, 2008 at 11:25 PM, Otis Gospodnetic
>  <[EMAIL PROTECTED]> wrote:
>  > Dietrich,
>  >
>  >  I pointed to SOLR-303 because 275 * 200,000 looks like too big a
>  >  number for a single machine to handle.
>  >
>  >
>  >  Otis
>  >  --
>  >  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>  >
>  >  ----- Original Message ----
>  >  From: Dietrich <[EMAIL PROTECTED]>
>  >  To: solr-user@lucene.apache.org
>  >  Sent: Tuesday, March 25, 2008 7:00:17 PM
>  >  Subject: Re: How to index multiple sites with option of combining results in search
>  >
>  >  On Tue, Mar 25, 2008 at 6:12 PM, Otis Gospodnetic
>  >  <[EMAIL PROTECTED]> wrote:
>  >  > Sounds like SOLR-303 is a must for you.
>  >  Why? I see the benefits of using a distributed architecture in
>  >  general, but why do you recommend it specifically for this scenario?
>  >  > Have you looked at Nutch?
>  >  I don't want to (or need to) use a crawler. I am using a crawler-based
>  >  system now, and it does not offer the flexibility I need when it comes
>  >  to custom schemes and faceting.
>  >  >
>  >  >  Otis
>  >  >  --
>  >  >  Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>  >  >
>  >  >
>  >  >
>  >  >  ----- Original Message ----
>  >  >  From: Dietrich <[EMAIL PROTECTED]>
>  >  >  To: solr-user@lucene.apache.org
>  >  >  Sent: Tuesday, March 25, 2008 4:15:23 PM
>  >  >  Subject: How to index multiple sites with option of combining results in search
>  >  >
>  >  >  I am planning to index 275+ different sites with Solr, each of which
>  >  >  might have anywhere up to 200 000 documents. When performing searches,
>  >  >  I need to be able to search against any combination of sites.
>  >  >  Does anybody have suggestions what the best practice for a scenario
>  >  >  like that would be, considering  both indexing and querying
>  >  >  performance? Put everything into one index and filter when performing
>  >  >  the queries, or creating a separate index for each one and combining
>  >  >  results when performing the query?