Solr commits aren't the issue, I'd guess. All the time is probably being spent getting the data from MySQL.
I've had some luck writing to Solr from a DB through a SolrJ program; here's a place to get started: searchhub.org/2012/02/14/indexing-with-solrj/. You can peel out the Tika bits pretty easily, I should think.

One technique I've used is to cache some of the DB tables in Java's memory to keep from having to do the secondary lookup(s). This only really works if the "secondary table" is small enough to fit in Java's memory, of course. You can also do some creative things with caching partial tables if you can sort appropriately.

Best,
Erick

On Thu, May 26, 2016 at 9:01 AM, John Blythe <j...@curvolabs.com> wrote:
> hi all,
>
> i've got layered entities in my Solr import. it's calling on some
> transactional data from a MySQL instance. there are two fields that are
> used to then look up other information from other tables via their related
> UIDs, one of which has its own child entity with yet another select
> statement to grab up more data.
>
> it fetches at about 120/s but processes at ~50-60/s. we currently only
> have close to 500k records, but it's growing quickly and thus is becoming
> increasingly painful to make modifications due to the reimport that needs
> to then occur.
>
> i feel like i'd seen some threads regarding commits of new data,
> master/slave, or solrcloud/sharding that could help in some ways related
> to this, but as of yet can't scrounge them up with my searches (ironic :p).
>
> can someone help by pointing me to some good material related to this
> sort of thing?
>
> thanks-
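For what it's worth, the caching technique Erick describes can be sketched roughly like this. It's a minimal stand-alone illustration, not code from the linked article: the table and field names (vendors, vendor_id, vendor_name) are made up, and the comments mark where the real JDBC load and SolrJ document-building calls would go.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of caching a small "secondary" lookup table in memory so each
// transactional row can be enriched with an O(1) map lookup instead of a
// per-row secondary SELECT against MySQL.
public class CachedLookupIndexer {

    // In a real program this would be filled once via JDBC, e.g.
    //   SELECT vendor_id, vendor_name FROM vendors
    // Only viable when the secondary table fits in the JVM heap.
    static final Map<Integer, String> vendorCache = new HashMap<>();

    static String vendorNameFor(int vendorId) {
        // Replaces the per-row lookup query entirely
        return vendorCache.getOrDefault(vendorId, "unknown");
    }

    public static void main(String[] args) {
        // Stand-in for the one-time JDBC load of the lookup table
        vendorCache.put(1, "Acme");
        vendorCache.put(2, "Globex");

        // Stand-in for iterating the main transactional result set.
        // In a SolrJ program, each row would become a SolrInputDocument
        // (with the cached vendor name added) sent to Solr in batches.
        int[] transactionVendorIds = {1, 2, 1};
        for (int vendorId : transactionVendorIds) {
            System.out.println(vendorId + " -> " + vendorNameFor(vendorId));
        }
    }
}
```

The same shape works for the "partial table" variant: if the main query is sorted by the lookup key, you can load and evict cache entries in key order instead of holding the whole table.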