You may also be interested in looking at things like solrbase (on GitHub).

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi;
>
> First of all, I should mention that I am new to Solr and am doing some
> research on it. I plan to crawl some websites with Nutch and then index
> them with Solr (Nutch 2.1, Solr/SolrCloud 4.2).
>
> I am wondering about something. I have a cloud of machines that crawls
> websites and stores the documents. I then send those documents to
> SolrCloud, which indexes them and saves the resulting indexes. I know from
> Information Retrieval theory that it *may* not be efficient to store
> indexes in a NoSQL database (they are something like linked lists, and if
> you store them in that kind of database you *may* end up with a sparse
> representation. By the way, there may be solutions for this; if you can
> explain them, you are welcome to.)
>
> However, Solr stores some documents too (e.g. for highlighting), so some
> of my documents will be duplicated. Since I will have many documents,
> those duplicated documents may cause a problem for me. So is there any way
> to avoid storing those documents in Solr and point to them in HBase (where
> I save my crawled documents) instead, or, instead of pointing, to store
> them directly in HBase (is that efficient or not)?
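
The "index in Solr, keep the content in HBase" idea from the question can be sketched roughly as below. This is only an illustration of the pattern under stated assumptions, not Solr or HBase client code: `DocumentStore` and `PointerIndex` are hypothetical in-memory stand-ins (for an HBase table keyed by row key, and for a Solr index whose content field is indexed but not stored), and the row keys and page texts are made up.

```python
from collections import defaultdict


class DocumentStore:
    """Hypothetical stand-in for the HBase table that holds crawled pages."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, content):
        self._rows[row_key] = content

    def get(self, row_key):
        return self._rows[row_key]


class PointerIndex:
    """Hypothetical stand-in for a search index that keeps only row keys.

    The text is tokenized and indexed, but the original content is not
    kept here, so nothing is duplicated between index and store.
    """

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, row_key, content):
        # Naive whitespace tokenization, just for illustration.
        for term in content.lower().split():
            self._postings[term].add(row_key)

    def search(self, term):
        # Returns matching row keys only, never the document text.
        return sorted(self._postings.get(term.lower(), set()))


def search_and_fetch(index, store, term):
    """Query the index for row keys, then load each document from the store."""
    return [(key, store.get(key)) for key in index.search(term)]


store = DocumentStore()
index = PointerIndex()

# Made-up example pages keyed by made-up row keys.
pages = {
    "com.example/a": "Solr indexes crawled pages",
    "com.example/b": "HBase stores crawled pages",
}
for key, text in pages.items():
    store.put(key, text)   # full content lives in one place only
    index.add(key, text)   # the index holds terms and the pointer

results = search_and_fetch(index, store, "hbase")
```

In Solr itself, the closest equivalent is to declare the large content field as indexed but not stored in the schema and to return only the unique key from queries. One trade-off to keep in mind: Solr's highlighting works on stored field text, so highlighting on that field would then have to be done outside Solr or the field kept stored after all.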