Thanks Chris,

This makes sense. At any time we show users a trend graph for all the tweets
relevant to them in the last 15 days. So I guess keeping a shard for the
last 15-20 days of data would be a good option, with all the older data moved
to different shards, each holding 2 months of data.

I have no experience with sharding right now. Could you point me to some
resource on date-wise sharding?

Regards,
Rohit

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: 17 September 2011 00:19
To: solr-user@lucene.apache.org
Subject: RE: Out of memory


: Actually I am storing twitter streaming data into the core, so the rate of
: index is about 12tweets(docs)/second. The same solr contains 3 other cores
        ...
: .         At any given time I don't need data older than the past 15 days,
: unless someone queries for it explicitly. How can this be achieved?

so you are adding 12 docs a second, and you need to keep all docs forever, 
in case someone asks for a specific doc, but otherwise you only typically 
need to search for docs in the past 15 days.

if your index is going to grow w/o bounds at this rate forever then it 
doesn't matter what tricks you try, or how you tune things -- you are 
always going to run out of resources unless you adopt some sort of 
distributed approach.

off the cuff, i would suggest indexing all of the docs for a single "day" 
in one shard, and making most of your searches be a distributed request 
against the most recent 15 shards.
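
A minimal sketch of the "most recent 15 shards" idea, assuming one Solr core
per day. The host, the core naming scheme (tweets_YYYYMMDD) and the helper
names are hypothetical; only the comma-separated `shards` request parameter
is standard Solr distributed search.

```python
from datetime import date, timedelta

# Assumption: every daily core lives on this host. Adjust for a real setup.
SOLR_HOST = "localhost:8983"


def shard_for_day(day: date) -> str:
    """Return the shard address for one day (hypothetical naming scheme)."""
    return f"{SOLR_HOST}/solr/tweets_{day:%Y%m%d}"


def recent_shards(today: date, days: int = 15) -> str:
    """Build the value of the `shards` parameter for the last `days` days,
    newest first, as a comma-separated list."""
    return ",".join(
        shard_for_day(today - timedelta(days=offset)) for offset in range(days)
    )


def search_params(query: str, today: date, days: int = 15) -> dict:
    """Request parameters for a distributed search over the recent shards."""
    return {"q": query, "shards": recent_shards(today, days)}
```

The resulting dict can be sent as query parameters to the select handler of
any one of the cores, which then fans the request out to the listed shards.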

you didn't say how people "query for it explicitly" when looking for older 
docs -- if it's by date then when a user asks for a specific date range 
you can just query those shards explicitly, if it's by some unique id then 
you'll want to cache in your application the min/max id of the docs in 
each shard (easy enough to determine by looping over them all and doing a 
stats query)
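
A rough sketch of both routing cases, assuming daily shards and numeric ids.
The function names and the shape of the cached `id_ranges` mapping are
hypothetical; the min/max values themselves would come from a per-shard
request with `stats=true` and `stats.field` set to the id field.

```python
from datetime import date, timedelta


def days_in_range(start: date, end: date) -> list:
    """All days from start to end inclusive; each maps to one daily shard."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def shards_for_id(doc_id: int, id_ranges: dict) -> list:
    """Return the shards whose cached (min_id, max_id) range covers doc_id.

    id_ranges maps shard name -> (min_id, max_id), filled in once by the
    application from a stats query against each shard.
    """
    return [
        shard
        for shard, (lo, hi) in id_ranges.items()
        if lo <= doc_id <= hi
    ]
```

For example, with a cache of {"tweets_20110915": (100, 199),
"tweets_20110916": (200, 299)}, a lookup for id 250 only needs to query
tweets_20110916.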


-Hoss
