Solr commits aren't the issue, I'd guess. All the time is
probably being spent getting the data from MySQL.

I've had some luck writing to Solr from a DB through a
SolrJ program; here's a place to get started:
searchhub.org/2012/02/14/indexing-with-solrj/
You can peel out the Tika bits pretty easily, I should
think.
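
Roughly, the shape of such a program looks like the sketch
below. This is just an illustration: the JDBC URL, credentials,
collection name, table, and field names are all made up, and
you'd adapt the query to pull whatever your schema needs.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class DbIndexer {
      public static void main(String[] args) throws Exception {
        // All names below (URL, credentials, collection, table, fields)
        // are placeholders for illustration only.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:mysql://localhost:3306/mydb", "user", "pass");
             SolrClient solr = new HttpSolrClient.Builder(
                 "http://localhost:8983/solr/mycollection").build()) {

          List<SolrInputDocument> batch = new ArrayList<>();
          try (Statement stmt = conn.createStatement();
               ResultSet rs = stmt.executeQuery(
                   "SELECT id, vendor_uid, product_uid, amount FROM transactions")) {
            while (rs.next()) {
              SolrInputDocument doc = new SolrInputDocument();
              doc.addField("id", rs.getString("id"));
              doc.addField("vendor_uid", rs.getString("vendor_uid"));
              doc.addField("product_uid", rs.getString("product_uid"));
              doc.addField("amount", rs.getDouble("amount"));
              batch.add(doc);

              // Send documents to Solr in batches rather than one at a time.
              if (batch.size() >= 1000) {
                solr.add(batch);
                batch.clear();
              }
            }
          }
          if (!batch.isEmpty()) {
            solr.add(batch);
          }
          // A single commit at the end; no need to commit per batch.
          solr.commit();
        }
      }
    }

The main wins over DIH here are that you control the batching
and the commits, and you can do the secondary lookups however
you like, which leads to the next point.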

One technique I've used is to cache
some of the DB tables in Java memory to keep
from having to do the secondary lookup(s). This only
really works if the "secondary table" is small enough to fit in
the Java heap, of course. You can do some creative
things with caching partial tables if you can sort appropriately.
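
The rough idea is below; again, the table and column names are
invented for the example. Load the small lookup table into a
HashMap keyed by UID once, up front, then consult the map while
building each document.

    import java.sql.Connection;
    import java.sql.ResultSet;
    import java.sql.SQLException;
    import java.sql.Statement;
    import java.util.HashMap;
    import java.util.Map;

    public class VendorCache {
      // Hypothetical lookup table, cached by UID so each main-table row
      // can be enriched in memory instead of issuing a per-row SELECT
      // against MySQL.
      private final Map<String, String> vendorNamesByUid = new HashMap<>();

      public void load(Connection conn) throws SQLException {
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT uid, name FROM vendors")) {
          while (rs.next()) {
            vendorNamesByUid.put(rs.getString("uid"), rs.getString("name"));
          }
        }
      }

      public String nameFor(String vendorUid) {
        return vendorNamesByUid.get(vendorUid); // null if the UID isn't cached
      }
    }

In the indexing loop you'd then call something like
doc.addField("vendor_name", cache.nameFor(rs.getString("vendor_uid")))
instead of running a second query per row.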

Best,
Erick

On Thu, May 26, 2016 at 9:01 AM, John Blythe <j...@curvolabs.com> wrote:
> hi all,
>
> I've got layered entities in my Solr import. It's calling on some
> transactional data from a MySQL instance. There are two fields that are
> used to then look up other information from other tables via their related
> UIDs, one of which has its own child entity with yet another select statement
> to grab more data.
>
> It fetches at about 120/s but processes at ~50-60/s. We currently only have
> close to 500k records, but it's growing quickly and thus is becoming
> increasingly painful to make modifications due to the reimport that needs
> to then occur.
>
> I feel like I'd seen some threads regarding commits of new data,
> master/slave, or SolrCloud/sharding that could help in some ways related to
> this, but as of yet I can't scrounge them up with my searches (ironic :p).
>
> Can someone help by pointing me to some good material related to this sort
> of thing?
>
> thanks-
