: For a new project, I need to index about 20M records (30 fields) and I
: have been running into issues with MySQL disconnects, right around
: 15M. I've tried several remedies I've found on blogs, changing

if you can provide some concrete error/log messages and the details of how 
you are configuring your datasource, that might help folks provide better 
suggestions -- you've said you run into a problem, but you haven't provided 
any details for people to go on in giving you feedback.
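For what it's worth, MySQL disconnects partway through a very large import are frequently caused by the JDBC driver trying to buffer the entire result set in memory.  With DIH you can tell the MySQL driver to stream rows instead by setting batchSize="-1" on the datasource.  A sketch of what that looks like (the URL, credentials, table, and field names here are all placeholders, not anything from your setup):

```xml
<!-- data-config.xml: JDBC datasource tuned for very large MySQL imports.
     batchSize="-1" makes DIH call setFetchSize(Integer.MIN_VALUE), which
     tells MySQL Connector/J to stream rows one at a time instead of
     buffering the whole result set in memory. -->
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb?autoReconnect=true"
              user="user" password="pass"
              batchSize="-1" readOnly="true"/>
  <document>
    <entity name="record" query="SELECT id, title FROM records">
      <field column="id"    name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```

No guarantee that's your problem, of course -- which is why the actual error messages would help.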

: resolved the issue. It got me wondering: Is this the way everyone does
: it? What about 100M records up to 1B; are those all pulled using DIH
: and a single query?

I've only recently started using DIH, and while it definitely has a lot 
of quirks/annoyances, it seems like a pretty good 80/20 solution for 
indexing with Solr -- but that doesn't mean it's perfect for all 
situations.

Writing custom indexer code can certainly make sense in a lot of cases -- 
particularly where you already have a data publishing system that you want 
to tie into directly -- the trick is to ensure you have a decent strategy 
for rebuilding the entire index should the need arise (but this is really 
only an issue if your primary indexing solution is incremental -- many use 
cases can be satisfied just fine with a brute force "full rebuild 
periodically" implementation).
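To illustrate what a brute-force full rebuild usually boils down to: stream rows out of the database, buffer N docs at a time, POST each batch to Solr's update handler, and issue a single commit at the end.  A minimal stdlib-only sketch -- the Solr URL and batch size are assumptions for illustration, and the HTTP send is pluggable so the batching logic is the focus:

```python
import json
from urllib.request import Request, urlopen

SOLR_UPDATE_URL = "http://localhost:8983/solr/mycore/update"  # assumed URL
BATCH_SIZE = 1000  # tune to taste; bigger batches = fewer HTTP round trips

def post_json(payload, url=SOLR_UPDATE_URL):
    """POST a JSON payload (a list of docs, or a command) to Solr."""
    req = Request(url, data=json.dumps(payload).encode("utf-8"),
                  headers={"Content-Type": "application/json"})
    urlopen(req).read()

def index_all(rows, send=post_json):
    """Full rebuild: stream `rows` (dicts of field -> value) to Solr in
    fixed-size batches, then commit once at the end.  Returns the number
    of document batches sent."""
    batch, batches = [], 0
    for row in rows:
        batch.append(row)
        if len(batch) >= BATCH_SIZE:
            send(batch)
            batches += 1
            batch = []
    if batch:               # flush the final partial batch
        send(batch)
        batches += 1
    send({"commit": {}})    # one commit per rebuild, not one per batch
    return batches
```

The "one commit at the end" part matters for throughput -- committing after every batch will slow a 20M-record rebuild to a crawl.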


-Hoss
