Even if you're happy with the disk sizes and with indexing and search performance, there are still holes:
Solr updates whole documents, not individual fields, so when you have a million documents that say "German" and should say "French", you have to reindex a million documents.

There are no tools for managing distributed indexes, so you're on your own.

Distributed TF/IDF is coming, but it will never be perfect, so managing your own distributed relevance strategies is a must.

On Wed, Feb 3, 2010 at 5:41 PM, AJ Asver <a...@scoopler.com> wrote:
> Hi all,
>
> I work on search at Scoopler.com, a real-time search engine which uses Solr.
> We currently use Solr for indexing but then fetch data from our CouchDB
> cluster using the IDs Solr returns. We are now considering storing a larger
> portion of data in Solr's index itself so we don't have to hit the DB too.
> Assuming that we are still storing data on the DB (for backend and backup
> purposes), are there any significant disadvantages to using Solr as a data
> store too?
>
> We currently run a master-slave setup on EC2 using x-large slave instances
> to allow for the disk cache to use as much memory as possible. I imagine we
> would definitely have to add more slave instances to accommodate the extra
> data we're storing (and make sure it stays in memory).
>
> Any tips would be really helpful.
> --
> AJ Asver
> Co-founder, Scoopler.com
>
> +44 (0) 7834 609830 / +1 (415) 670 9152
> a...@scoopler.com
>
> Follow me on Twitter: http://www.twitter.com/_aj
> Add me on Linkedin: http://www.linkedin.com/in/ajasver
> or YouNoodle: http://younoodle.com/people/ajmal_asver
>
> My Blog: http://ajasver.com
>

--
Lance Norskog
goks...@gmail.com