You may also be interested in looking at things like solrbase (on GitHub).

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Sat, Apr 6, 2013 at 6:01 PM, Furkan KAMACI <furkankam...@gmail.com> wrote:
> Hi,
>
> First of all, I should mention that I am new to Solr and am researching
> it. What I am trying to do is crawl some websites with Nutch and then
> index them with Solr (Nutch 2.1, Solr/SolrCloud 4.2).
>
> I am wondering about something. I have a cluster of machines that crawls
> websites and stores the documents. Then I send those documents to
> SolrCloud, which indexes them and saves the indexes. I know from
> information retrieval theory that it *may* not be efficient to store
> indexes in a NoSQL database (indexes are something like linked lists, and
> if you store them in that kind of database you *may* end up with a sparse
> representation; there may be solutions for this, though, and if you can
> explain them, you are welcome to).
>
> However, Solr stores some documents too (e.g. for highlighting), so some
> of my documents will be duplicated. Since I will have many documents, the
> duplicates may cause a problem for me. Is there any way to avoid storing
> the documents in Solr and instead point to them in HBase (where I save my
> crawled documents)? Or, instead of pointing to them, could I store the
> index directly in HBase (and would that be efficient)?