: For a new project, I need to index about 20M records (30 fields) and I
: have been running into issues with MySQL disconnects, right around
: 15M. I've tried several remedies I've found on blogs, changing
If you can provide some concrete error/log messages and the details of how
you are configuring your datasource, that might help folks provide better
suggestions -- you've said you run into a problem, but you haven't provided
any details for people to go on in giving you feedback.

: resolved the issue. It got me wondering: Is this the way everyone does
: it? What about 100M records up to 1B; are those all pulled using DIH
: and a single query?

I've only recently started using DIH, and while it definitely has a lot of
quirks/annoyances, it seems like a pretty good 80/20 solution for indexing
with Solr -- but that doesn't mean it's perfect for all situations.

Writing custom indexer code can certainly make sense in a lot of cases --
particularly where you already have a data publishing system that you want
to tie into directly. The trick is to ensure you have a decent strategy for
rebuilding the entire index should the need arise (but this is really only
an issue if your primary indexing solution is incremental -- many use cases
can be satisfied just fine with a brute force "full rebuild periodically"
implementation).

-Hoss
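
For the MySQL disconnect symptom specifically, one thing worth ruling out
(a sketch only, not a confirmed diagnosis of your setup -- the URL,
credentials, and table name below are hypothetical) is the Connector/J
driver buffering the entire result set in memory. DIH's JdbcDataSource
accepts batchSize="-1", which it translates into a fetchSize of
Integer.MIN_VALUE so the MySQL driver streams rows instead of reading all
of them up front -- a frequent culprit when big imports die partway
through:

```xml
<dataConfig>
  <!-- batchSize="-1" makes JdbcDataSource request streaming result
       sets from MySQL Connector/J, so 20M rows are read one at a
       time rather than buffered in the JVM heap. -->
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              user="solr" password="secret"
              batchSize="-1"
              readOnly="true"/>
  <document>
    <entity name="record" query="SELECT * FROM records">
      <!-- field mappings for the 30 columns go here -->
    </entity>
  </document>
</dataConfig>
```

If that doesn't change the behavior, the actual exception from the Solr
log would narrow things down a lot.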