Thanks Chris,

This makes sense. At any time we show users a trend graph for all the tweets relevant to them in the last 15 days, so I guess keeping a shard for the last 15-20 days of data would be a good option, with all the other data moved to separate shards holding 2 months of data each.
I have no idea about sharding right now; could you point me to some resources on date-wise sharding?

Regards,
Rohit

-----Original Message-----
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: 17 September 2011 00:19
To: solr-user@lucene.apache.org
Subject: RE: Out of memory

: Actually I am storing twitter streaming data into the core, so the rate of
: index is about 12 tweets (docs)/second. The same solr contains 3 other cores
...
: . At any given time I don't need data more than past 15 days, unless
: someone queries for it explicitly. How can this be achieved?

so you are adding 12 docs a second, and you need to keep all docs forever, in case someone asks for a specific doc, but otherwise you only typically need to search for docs in the past 15 days.

if your index is going to grow w/o bounds at this rate forever then it doesn't matter what tricks you try, or how you tune things -- you are always going to run out of resources unless you adopt some sort of distributed approach.

off the cuff, i would suggest indexing all of the docs for a single "day" in one shard, and making most of your searches be a distributed request against the most recent 15 shards.

you didn't say how people "query for it explicitly" when looking for older docs -- if it's by date then when a user asks for a specific date range you can just query those shards explicitly, if it's by some unique id then you'll want to cache in your application the min/max id of the docs in each shard (easy enough to determine by looping over them all and doing a stats query)

-Hoss
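
For illustration, the one-shard-per-day idea above could be sketched on the application side roughly as follows. This is only a sketch under assumptions, not something from the thread: the host, the core naming scheme (tweets_YYYYMMDD), and the helper names are all hypothetical, and the min/max id cache is assumed to have been filled beforehand (for example from a stats query against each shard). The resulting string is meant to be passed as the shards parameter of a Solr distributed search request.

```python
from datetime import date, timedelta

# Assumption: hypothetical Solr host and one core per calendar day.
SOLR_HOST = "localhost:8983/solr"


def shard_name(day):
    """Core name for a given day, e.g. tweets_20110917 (hypothetical scheme)."""
    return "tweets_" + day.strftime("%Y%m%d")


def shards_for_range(start, end):
    """Shard addresses for every day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        SOLR_HOST + "/" + shard_name(start + timedelta(days=i))
        for i in range(days)
    ]


def recent_shards(today, window=15):
    """Shard addresses for the most recent `window` days, ending today."""
    return shards_for_range(today - timedelta(days=window - 1), today)


def shards_param(shards):
    """Comma-separated value for the `shards` request parameter."""
    return ",".join(shards)


def shard_for_id(id_ranges, doc_id):
    """Find the shard whose cached (min_id, max_id) range contains doc_id.

    id_ranges maps shard name -> (min_id, max_id). Assumption: ids are
    numeric and the ranges of different shards do not overlap.
    Returns None when no cached range matches.
    """
    for name, (lo, hi) in id_ranges.items():
        if lo <= doc_id <= hi:
            return name
    return None


if __name__ == "__main__":
    # Default searches: the most recent 15 daily shards.
    print(shards_param(recent_shards(date(2011, 9, 17))))
    # Explicit date-range query: only the shards covering that range.
    print(shards_param(shards_for_range(date(2011, 7, 1), date(2011, 7, 3))))
```

Keeping the shard list computation in the application means the default 15-day search and an explicit date-range search differ only in which days are passed in; the id lookup depends entirely on how the min/max cache is populated and refreshed, which is left open here.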