Even if you're happy with disk sizes and indexing and search performance,
there are still holes:

Updates apply to whole documents, not to individual fields, so when you
have a million documents that say "German" and should say "French", you
have to reindex a million documents.
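
To make the cost concrete, here is a minimal sketch of whole-document
updates. ToyIndex and fix_language are hypothetical names for a toy
in-memory model, not Solr's API; the only assumption is that adding a
document under an existing id replaces the old one entirely.

```python
# Toy model of a whole-document index. This is an illustration only,
# not Solr's API: ToyIndex and fix_language are hypothetical names.
class ToyIndex:
    def __init__(self):
        self.docs = {}    # id -> full stored document
        self.writes = 0   # count of whole-document writes

    def add(self, doc):
        # Adding a document with an existing id replaces it entirely;
        # there is no way to touch a single field.
        self.docs[doc["id"]] = dict(doc)
        self.writes += 1


def fix_language(index, source_docs, wrong, right):
    """Re-add every document whose language field holds the wrong value."""
    fixed = 0
    for doc in source_docs:
        if doc["language"] == wrong:
            # The full document has to be rebuilt and resubmitted,
            # even though only one field changes.
            index.add(dict(doc, language=right))
            fixed += 1
    return fixed
```

Correcting one field across N documents costs N full document writes,
and you need every other field of each document on hand to rebuild it.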

There are no tools for managing distributed indexes, so you're on your own.

Distributed TF/IDF is coming, but it will never be perfect, so you will
have to manage your own distributed relevance strategies.
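
As a rough illustration of why per-shard scoring drifts, the sketch
below compares the IDF each shard computes locally with the IDF
computed over the combined collection. The shard names and counts are
made up, and the formula is a smoothed IDF in the style of Lucene's
classic similarity, used here only as an assumption.

```python
import math


def idf(doc_freq, num_docs):
    # Smoothed inverse document frequency: 1 + ln(N / (df + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical shards: (documents in shard, documents containing the term).
shards = {
    "shard_a": (1000, 10),    # term is rare on this shard
    "shard_b": (1000, 500),   # term is common on this shard
}

# What each shard would compute from its own statistics.
local_idf = {name: idf(df, n) for name, (n, df) in shards.items()}

# What a single index over all documents would compute.
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = idf(total_df, total_docs)

for name, value in sorted(local_idf.items()):
    print(name, round(value, 2))
print("global", round(global_idf, 2))
```

The same term is weighted very differently on the two shards, so scores
merged from them are not directly comparable unless term statistics are
shared or the documents are distributed evenly.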

On Wed, Feb 3, 2010 at 5:41 PM, AJ Asver <a...@scoopler.com> wrote:
> Hi all,
>
> I work on search at Scoopler.com, a real-time search engine which uses Solr.
>  We currently use Solr for indexing but then fetch data from our couchdb
> cluster using the IDs solr returns.  We are now considering storing a larger
> portion of data in Solr's index itself so we don't have to hit the DB too.
>  Assuming that we are still storing data on the db (for backend and back up
> purposes) are there any significant disadvantages to using solr as a data
> store too?
>
> We currently run a master-slave setup on EC2 using x-large slave instances
> to allow for the disk cache to use as much memory as possible.  I imagine we
> would definitely have to add more slave instances to accommodate the extra
> data we're storing (and make sure it stays in memory).
>
> Any tips would be really helpful.
> --
> AJ Asver
> Co-founder, Scoopler.com
>
> +44 (0) 7834 609830 / +1 (415) 670 9152
> a...@scoopler.com
>
>
> Follow me on Twitter: http://www.twitter.com/_aj
> Add me on Linkedin: http://www.linkedin.com/in/ajasver
> or YouNoodle: http://younoodle.com/people/ajmal_asver
>
> My Blog: http://ajasver.com
>



-- 
Lance Norskog
goks...@gmail.com
