Re: How best to handle a reasonable amount to data (25TB+)

2012-02-08 Thread Petite Abeille
On Feb 8, 2012, at 10:14 AM, Danil ŢORIN wrote:
> For example if you only query data for 1 month intervals, and you
> partition by date, you can calculate in which shard your data can be
> found, and query just that shard.
This is what one calls "partition pruning" in database terms. http://en.

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-08 Thread Danil ŢORIN
It also depends on your queries. For example, if you only query data for 1 month intervals, and you partition by date, you can calculate in which shard your data can be found, and query just that shard. If you can find a partition key that is always present in the query, you can create a gazillion
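The date-based shard lookup described above can be sketched roughly as follows. This is a minimal illustration, not anything posted in the thread: it assumes one shard per calendar month, and the `YYYY-MM` shard naming and the function names `month_key` and `shards_for_range` are hypothetical.

```python
from datetime import date
from typing import List


def month_key(d: date) -> str:
    """Return the (hypothetical) name of the monthly shard holding date d."""
    return f"{d.year:04d}-{d.month:02d}"


def shards_for_range(start: date, end: date) -> List[str]:
    """Return the monthly shard keys overlapping the inclusive range [start, end].

    Only these shards need to be queried; all others are pruned.
    """
    if end < start:
        raise ValueError("end must not be before start")
    keys = []
    year, month = start.year, start.month
    # Walk month by month until we pass the month containing `end`.
    while (year, month) <= (end.year, end.month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return keys
```

A query covering 15 January to 8 February 2012 would then touch only the two shards returned by `shards_for_range(date(2012, 1, 15), date(2012, 2, 8))`, no matter how many years of data are stored.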

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Li Li
It's up to your machines. In our application, we index about 30,000,000 (30M) docs/shard, and the response time is about 150 ms. Our machine has about 48 GB of memory; about 25 GB is allocated to Solr and the rest is used for the disk cache in Linux. If calculated by our application, indexing 1.25T docs will
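As a rough back-of-the-envelope sketch using only the two figures quoted above (about 30M docs per shard and 1.25T docs in total), the shard count can be estimated as below. It assumes every shard holds the same number of documents, which is a simplification, and the function name `shards_needed` is hypothetical.

```python
def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    """Return the number of equally sized shards needed to hold total_docs."""
    if docs_per_shard <= 0:
        raise ValueError("docs_per_shard must be positive")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_docs // docs_per_shard)


TOTAL_DOCS = 1_250_000_000_000   # 1.25T docs, figure from the thread
DOCS_PER_SHARD = 30_000_000      # 30M docs/shard, figure from the thread

print(shards_needed(TOTAL_DOCS, DOCS_PER_SHARD))  # prints 41667
```

At that shard size the estimate is on the order of tens of thousands of shards, so the per-shard document count is the figure that dominates the hardware sizing.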

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Peter Miller
't all fit in memory. That's a great resource you reference. Thanks so much, The Captn -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, 8 February 2012 1:38 PM To: java-user@lucene.apache.org Subject: Re: How best to handle a reasonable

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Erick Erickson
to:peter.mil...@objectconsulting.com.au] > Sent: Wednesday, 8 February 2012 12:20 PM > To: java-user@lucene.apache.org > Subject: RE: How best to handle a reasonable amount to data (25TB+) > > Whoops! Very poor basic maths, I should have written it down. I was thinking > 13 shards. But yes,

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Peter Miller
ing.com.au] Sent: Wednesday, 8 February 2012 12:20 PM To: java-user@lucene.apache.org Subject: RE: How best to handle a reasonable amount to data (25TB+) Whoops! Very poor basic maths, I should have written it down. I was thinking 13 shards. But yes, 13,000 is a bit different. Now I'm in even mo

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Peter Miller
ch across seven years of data. Thanks a lot, The Captn -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Wednesday, 8 February 2012 12:39 AM To: java-user@lucene.apache.org Subject: Re: How best to handle a reasonable amount to data (25TB+) I'm curiou

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-07 Thread Erick Erickson
:peter.c.e...@gmail.com] > Sent: Monday, 6 February 2012 5:29 PM > To: java-user@lucene.apache.org > Subject: Re: How best to handle a reasonable amount to data (25TB+) > > it sounds not an issue of lucene but the logic of your app. > if you're afraid too many docs in one index

RE: How best to handle a reasonable amount to data (25TB+)

2012-02-06 Thread Peter Miller
llion documents, so that makes the 1.25 trillion number look reasonable. Any other thoughts? Thanks, The Captn. -Original Message- From: ppp c [mailto:peter.c.e...@gmail.com] Sent: Monday, 6 February 2012 5:29 PM To: java-user@lucene.apache.org Subject: Re: How best to handle a reasonable amou

Re: How best to handle a reasonable amount to data (25TB+)

2012-02-05 Thread ppp c
It sounds like this is not an issue with Lucene but with the logic of your app. If you're afraid of having too many docs in one index, you can make multiple indexes, then search across them, merge the results, and you're done. On Mon, Feb 6, 2012 at 10:50 AM, Peter Miller < peter.mil...@objectconsulting.com.au> wrote: > Hi, > > I have
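The "search across several indexes, then merge" step can be sketched in plain Python as follows. This is an illustration only and does not use Lucene's API: the `(score, doc_id)` hit format, the shard names, and the function name `merge_top_k` are hypothetical, and it assumes scores from different indexes are directly comparable, which may not hold in practice.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document id)


def merge_top_k(per_shard_hits: Dict[str, List[Hit]], k: int) -> List[Hit]:
    """Merge the hit lists returned by each index and keep the k best by score."""
    all_hits = (hit for hits in per_shard_hits.values() for hit in hits)
    # nlargest returns the k highest-scoring hits in descending score order.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


results = {
    "shard-a": [(0.9, "a1"), (0.4, "a2")],
    "shard-b": [(0.7, "b1"), (0.1, "b2")],
}
print(merge_top_k(results, 3))  # prints [(0.9, 'a1'), (0.7, 'b1'), (0.4, 'a2')]
```

Each index only needs to return its own top k hits, so the merge step stays cheap even when the number of indexes is large.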

How best to handle a reasonable amount to data (25TB+)

2012-02-05 Thread Peter Miller
Hi, I have a little bit of an unusual set of requirements, and I am looking for advice. I have researched the archives, and seen some relevant posts, but they are fairly old and not specifically a match, so I thought I would give this a try. We will eventually have about 50TB raw, non-searchab