Carey,

Some options:
* Just read your BDB with your existing Java code and use SolrJ to index to Solr in batches and in parallel
* Dump your BDB to CSV and use Solr's fast CSV import (the CSV update handler)
* Use Hadoop MapReduce to index to Lucene or Solr in parallel
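The first option might look roughly like the sketch below. This assumes SolrJ (3.x-era) is on the classpath and a Solr core is running at the URL shown; `MyRecord`, `readAllRecordsFromBdb()`, and the field names `id`/`title` are placeholders for your own BDB access code and schema:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BdbToSolr {
    private static final int BATCH_SIZE = 1000; // tune for your document size

    public static void main(String[] args) throws Exception {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");
        List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(BATCH_SIZE);

        // Placeholder: replace with your Berkeley DB cursor iteration.
        for (MyRecord rec : readAllRecordsFromBdb()) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", rec.getId());
            doc.addField("title", rec.getTitle());
            batch.add(doc);
            if (batch.size() >= BATCH_SIZE) {
                solr.add(batch);   // one HTTP round trip per batch, not per doc
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            solr.add(batch);       // flush the final partial batch
        }
        solr.commit();             // commit once at the end, not per batch
    }
}
```

To parallelize, run several of these loops over disjoint key ranges, or swap in StreamingUpdateSolrServer, which buffers documents and sends them over multiple threads for you.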

Yes, you can index with the Lucene API directly, but you will have to make sure
all the analysis you configure there is identical to what your Solr schema
specifies -- otherwise queries going through Solr won't match the indexed terms.

Otis
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/


>________________________________
>From: Carey Sublette <csuble...@local.com>
>To: "solr-user@lucene.apache.org" <solr-user@lucene.apache.org>
>Sent: Wednesday, November 9, 2011 3:06 PM
>Subject: Importing Big Data From Berkeley DB to Solr
>
>Hi:
>
>I have a massive data repository (hundreds of millions of records) stored in 
>Berkeley DB with Java code to access it, and I need an efficient method to 
>import it into Solr for indexing. I cannot find a straightforward Java data 
>import API that I can load the data with.
>
>There is no JDBC for the DataImportHandler to call, it is not a simple file, 
>and the inefficiencies (and extra code) needed to submit it as HTTP calls, or 
>as XML feeds, etc. are measures of last resort only.
>
>Can I call a Lucene API in a Solr installation to do this somehow?
>
>Thanks
>