A database, just to store uncommitted documents in case they might be updated, seems like it will have a pretty major impact on indexing performance. A Lucene-only implementation would seem to be much lighter on resources.
-Yonik

On Thu, Dec 4, 2008 at 11:32 AM, Noble Paul നോബിള് नोब्ळ् <[EMAIL PROTECTED]> wrote:
> The solution will be an UpdateRequestProcessor (which itself is
> pluggable). I am implementing a JDBC-based one. I'll test with H2 and
> MySQL (and maybe Derby).
>
> We will ship the H2 (embedded) jar.
>
> On Thu, Dec 4, 2008 at 9:53 PM, Ryan McKinley <[EMAIL PROTECTED]> wrote:
>> Again, I would hope that Solr builds a storage-agnostic solution.
>>
>> As long as we have a simple interface to load/store documents, it should be
>> easy to write a JDBC/ehcache/disk/Cassandra/whatever implementation.
>>
>> ryan
>>
>> On Dec 4, 2008, at 10:29 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>
>>> Cassandra does not meet our requirements.
>>> We do not need that kind of scalability.
>>>
>>> Moreover, its future is uncertain and they are trying to incubate it into
>>> Apache.
>>>
>>> On Thu, Dec 4, 2008 at 8:52 PM, Sami Siren <[EMAIL PROTECTED]> wrote:
>>>>
>>>> Yet another possibility: http://wiki.apache.org/incubator/Cassandra
>>>>
>>>> It at least claims to be scalable; no personal experience.
>>>>
>>>> --
>>>> Sami Siren
>>>>
>>>> Noble Paul നോബിള് नोब्ळ् wrote:
>>>>>
>>>>> Another persistence solution is ehcache with diskstore. It even has
>>>>> replication.
>>>>>
>>>>> I have never used ehcache, so I cannot comment on it.
>>>>>
>>>>> Any comments?
>>>>>
>>>>> --Noble
>>>>>
>>>>> On Wed, Dec 3, 2008 at 8:50 PM, Noble Paul നോബിള് नोब्ळ्
>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>> On Wed, Dec 3, 2008 at 5:52 PM, Grant Ingersoll <[EMAIL PROTECTED]>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Dec 3, 2008, at 1:28 AM, Noble Paul നോബിള് नोब्ळ् wrote:
>>>>>>>>
>>>>>>>> The code can be written against JDBC. But we need to test the DDL and
>>>>>>>> data types on all the supported DBs.
>>>>>>>>
>>>>>>>> But which one would we like to ship with Solr as a default option?
>>>>>>>
>>>>>>> Why do we need a default option?
>>>>>>> Is this something that is intended to be on by default? Or do you
>>>>>>> mean just to have one for unit tests to work?
>>>>>>
>>>>>> Default does not mean that it is enabled by default. But if it is
>>>>>> enabled, I can have defaults for stuff like driver, URL, DDL, etc., and
>>>>>> the user may not need to provide an extra jar.
>>>>>>
>>>>>>> I don't know if it is still the case, but I often find embedded DBs to
>>>>>>> be quite annoying since you often can't connect to them from other
>>>>>>> clients outside of the JVM, which makes debugging harder. Of course,
>>>>>>> maybe I just don't know the tricks to do it. Derby is one DB that you
>>>>>>> can still connect to even when it is embedded.
>>>>>>
>>>>>> Embedded is the best bet for us because of performance reasons and
>>>>>> zero management.
>>>>>> The users can still read the data through Solr itself.
>>>>>>
>>>>>>> Also, whatever is chosen needs to scale to millions of documents, and
>>>>>>> I wonder about an embedded DB doing that. I also have a hard time
>>>>>>> believing that both a DB w/ millions of docs and Solr can live on the
>>>>>>> same machine, which is presumably what an embedded DB must do.
>>>>>>> Presumably, it also needs to be able to be replicated, right?
>>>>>>
>>>>>> Millions of docs?
>>>>>> Then you must configure a remote DB for storage reasons
>>>>>> and must manage the replication separately.
>>>>>>
>>>>>>>> H2 looks impressive. The (small) jar is just 667KB and the memory
>>>>>>>> footprint is small too.
>>>>>>>> --Noble
>>>>>>>>
>>>>>>>> On Wed, Dec 3, 2008 at 10:30 AM, Ryan McKinley <[EMAIL PROTECTED]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> check http://www.h2database.com/ in my view the best embedded DB
>>>>>>>>> out there.
>>>>>>>>>
>>>>>>>>> from the maker of HSQLDB... is second round.
>>>>>>>>>
>>>>>>>>> However, for anything Solr, I would hope it would just rely on
>>>>>>>>> JDBC.
>>>>>>>>>
>>>>>>>>> On Dec 2, 2008, at 12:08 PM, Shalin Shekhar Mangar wrote:
>>>>>>>>>>
>>>>>>>>>> HSQLDB has a limit of up to 8GB of data. In Solr, you might want to
>>>>>>>>>> go beyond that without a commit.
>>>>>>>>>>
>>>>>>>>>> On Tue, Dec 2, 2008 at 10:33 PM, Dawid Weiss
>>>>>>>>>> <[EMAIL PROTECTED]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Isn't HSQLDB an option? Its performance ranges a lot depending on
>>>>>>>>>>> the volume of data and queries, but otherwise the license looks
>>>>>>>>>>> BSDish.
>>>>>>>>>>>
>>>>>>>>>>> http://hsqldb.org/web/hsqlLicense.html
>>>>>>>>>>>
>>>>>>>>>>> Dawid
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Shalin Shekhar Mangar.
>>>>>>>>
>>>>>>>> --
>>>>>>>> --Noble Paul
>>>>>>>
>>>>>>> --------------------------
>>>>>>> Grant Ingersoll
>>>>>>>
>>>>>>> Lucene Helpful Hints:
>>>>>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>>>>>> http://wiki.apache.org/lucene-java/LuceneFAQ
>>>>>>
>>>>>> --
>>>>>> --Noble Paul
>>>
>>> --
>>> --Noble Paul
>
> --
> --Noble Paul
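
For illustration only: the "simple interface to load/store documents" that Ryan asks for in the thread might look roughly like the Java sketch below. The names UncommittedDocumentStore, InMemoryDocumentStore, DocumentStoreDemo and applyUpdate are hypothetical and made up for this example; none of them are Solr or Lucene APIs. The in-memory map is just a stand-in for whatever backend is eventually chosen (JDBC, ehcache, a Lucene-only store, and so on).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical interface (not a Solr API): the only contract the update
// logic needs from a store that holds documents not yet committed.
interface UncommittedDocumentStore {
    void store(String id, Map<String, Object> fields);

    Optional<Map<String, Object>> load(String id);

    // Assumed to be called once a commit makes the stored copies redundant.
    void clear();
}

// Stand-in backend for the sketch. A JDBC, ehcache, or Lucene-backed class
// would implement the same three methods.
final class InMemoryDocumentStore implements UncommittedDocumentStore {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    @Override
    public void store(String id, Map<String, Object> fields) {
        // Defensive copy so later changes by the caller do not leak in.
        docs.put(id, new HashMap<>(fields));
    }

    @Override
    public Optional<Map<String, Object>> load(String id) {
        Map<String, Object> found = docs.get(id);
        if (found == null) {
            return Optional.empty();
        }
        Map<String, Object> copy = new HashMap<>(found);
        return Optional.of(copy);
    }

    @Override
    public void clear() {
        docs.clear();
    }
}

final class DocumentStoreDemo {
    // Merge the changed fields into the stored copy (or into an empty
    // document if none is stored) and write the result back.
    static Map<String, Object> applyUpdate(UncommittedDocumentStore store,
                                           String id,
                                           Map<String, Object> changes) {
        Map<String, Object> merged = store.load(id).orElseGet(HashMap::new);
        merged.putAll(changes);
        store.store(id, merged);
        return merged;
    }

    public static void main(String[] args) {
        UncommittedDocumentStore store = new InMemoryDocumentStore();

        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc1");
        doc.put("title", "first draft");
        store.store("doc1", doc);

        Map<String, Object> changes = new HashMap<>();
        changes.put("title", "second draft");

        System.out.println(applyUpdate(store, "doc1", changes));
    }
}
```

Because applyUpdate only sees the interface, swapping the in-memory class for another backend would not touch the update logic. The sketch says nothing about the performance concern raised at the top of this message; that depends entirely on the backend.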