Thanks for the e-mail. I probably should have provided more details,
but I was more interested in making sure I was approaching the problem
correctly (using DIH, with one big SELECT statement for millions of
rows) instead of solving this specific problem. Here's a partial
stacktrace from this specific problem:

Caused by: Can not read response from server.
Expected to read 4 bytes, read 0 bytes before connection was
unexpectedly lost.
        at com.mysql.jdbc.MysqlIO.readFully(
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(
        ... 22 more
Apr 21, 2011 3:53:28 AM
org.apache.solr.handler.dataimport.EntityProcessorBase getNext
SEVERE: getNext() failed for query 'REDACTED'
Communications link failure

The last packet successfully received from the server was 128
milliseconds ago.  The last packet sent successfully to the server was
25,273,484 milliseconds ago.

A custom indexer, so that's a fairly common practice? So when you are
dealing with these large indexes, do you try not to fully rebuild them
when you can? It's not a nightly thing, but something to do in case of
a disaster? Is there a difference in the performance of an index that
was built all at once vs. one that has had delta inserts and updates
applied over a period of months?
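
To make sure I understand what a custom indexer would involve, here is a minimal sketch of the read side as I imagine it: instead of one long-running SELECT, it walks the table in primary-key order with many short queries, so no single result set has to stay open for hours. Everything in it is an assumption for illustration. The `records` table, its `id` and `title` columns, and the `send_to_index` placeholder are hypothetical, and `sqlite3` stands in for MySQL only so the example runs on its own.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str]


def iter_batches(conn: sqlite3.Connection, batch_size: int = 1000) -> Iterator[List[Row]]:
    """Yield rows in primary-key order, one short query per batch."""
    last_id = 0  # assumes ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, title FROM records WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the last id seen, so a failed run can restart from here.
        last_id = rows[-1][0]


def make_demo_db(n: int) -> sqlite3.Connection:
    """Build a small in-memory table standing in for the real database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany(
        "INSERT INTO records (id, title) VALUES (?, ?)",
        [(i, f"title {i}") for i in range(1, n + 1)],
    )
    conn.commit()
    return conn


def send_to_index(rows: List[Row]) -> int:
    """Hypothetical placeholder: a real indexer would post these rows to Solr."""
    return len(rows)


if __name__ == "__main__":
    demo = make_demo_db(25)
    total = sum(send_to_index(batch) for batch in iter_batches(demo, batch_size=10))
    print(total)
```

If that is roughly the idea, the last id processed could also be stored somewhere, which would let a rebuild pick up where it left off after a disconnect instead of starting over.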

Thank you for your insight.

On Thu, Apr 21, 2011 at 4:31 PM, Chris Hostetter wrote:
> : For a new project, I need to index about 20M records (30 fields) and I
> : have been running into issues with MySQL disconnects, right around
> : 15M. I've tried several remedies I've found on blogs, changing
> If you can provide some concrete error/log messages and the details of how
> you are configuring your datasource, that might help folks provide better
> suggestions -- you've said you run into a problem but you haven't provided
> any details for people to go on in giving you feedback.
> : resolved the issue. It got me wondering: Is this the way everyone does
> : it? What about 100M records up to 1B; are those all pulled using DIH
> : and a single query?
> I've only recently started using DIH, and while it definitely has a lot
> of quirks/annoyances, it seems like a pretty good 80/20 solution for
> indexing with Solr -- but that doesn't mean it's perfect for all
> situations.
> Writing custom indexer code can certainly make sense in a lot of cases --
> particularly where you already have a data publishing system that you want
> to tie into directly -- the trick is to ensure you have a decent strategy
> for rebuilding the entire index should the need arise (but this is really
> only an issue if your primary indexing solution is incremental -- many use
> cases can be satisfied just fine with a brute force "full rebuild
> periodically" implementation).
> -Hoss
