I have a simple program that uses the BatchInserter to load rows from a SQL database. I'm running it on a modestly configured Windows machine with 2 GB of RAM, with the maximum heap set to 500 MB.
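The 500 MB heap limit, and the memory growth described below, can be checked from inside the JVM rather than from Windows Task Manager. Task Manager reports memory for the whole process, which is not the same thing as the Java heap. What follows is a minimal sketch that uses only the standard `java.lang.management` API. The class and method names are hypothetical, and the loop merely simulates per-row work, so it is not the actual loader. It prints the heap limit the JVM actually received, then appends used heap and cumulative GC time to a progress line every 5000 rows.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Neo4j): reports the JVM's own view of the heap.
public class HeapReport {

    private static final long MB = 1024L * 1024L;

    /** Heap limit in MB; roughly the -Xmx value, often reported slightly lower. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** Accumulated time (ms) spent in garbage collection by all collectors so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** One-line heap summary to append to an existing progress log message. */
    static String summary() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format("heapUsed=%dMB committed=%dMB max=%dMB gcTime=%dms",
                heap.getUsed() / MB, heap.getCommitted() / MB, maxHeapMb(), totalGcMillis());
    }

    public static void main(String[] args) {
        final int logInterval = 5000; // same interval as the progress logging in the post
        System.out.println("Max heap: " + maxHeapMb() + "MB");
        long rows = 0;
        for (int i = 0; i < 20000; i++) {
            // Stand-in for extracting a row's properties and creating a node from them.
            Map<String, Object> properties = new HashMap<String, Object>();
            properties.put("id", i);
            rows++;
            if (rows % logInterval == 0) {
                System.out.println("Added node:" + rows + " " + summary());
            }
        }
    }
}
```

Run it with `java -Xmx500m HeapReport`. If `heapUsed` stays close to `max` while `gcTime` climbs steeply between log lines, the JVM is probably spending its time collecting. If `heapUsed` stays flat while Task Manager keeps growing, the growth is outside the Java heap.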
Initially it ran out of memory quite quickly, so I introduced an index flush after every 5000 nodes, and all appeared to be well. Further into the load, however, it hops along nicely but the allocated memory (as shown in Windows Task Manager) grows and grows. I suspect it eventually reaches the maximum heap size. At about 2M nodes it abruptly stops making any discernible progress. It doesn't fail: the logging I've put in every 5000 nodes simply stops, and the CPU sits at 100% (garbage collecting, I suspect).

Is there something I should be doing periodically, in addition to the index flush, to stop the heap from being exhausted?

My code is really simple. Here's the method that loads the nodes from each table:

    public long restoreCollection() {
        resolveSql();
        _log.debug("restore collection:" + getCollectionName() + " using: " + _sql
                + " and:" + Arrays.deepToString(_columns));
        final BatchInserterIndex _index = makeIndex();
        // One node represents the whole collection (table)
        final long collectionNode = _inserter.createNode(MapUtil.map("name", getCollectionName()));
        _log.debug("Query db...");
        getJdbcTemplate().query(_sql, new Object[] {}, new RowCallbackHandler() {
            public void processRow(ResultSet row) throws SQLException {
                // One node per row, linked to the collection node
                final Map<String, Object> properties = extractproperties(row);
                long node = _inserter.createNode(properties);
                _inserter.createRelationship(node, collectionNode, RdmRelationship.MEMBER_OF, null);
                if (_index != null) {
                    for (DbColumn col : _columns) {
                        if (col.isIndexed()) {
                            _index.add(node, MapUtil.map(col.getName(), properties.get(col.getName())));
                        }
                    }
                }
                _collectionSize++;
                // Every FLUSH_INTERVAL rows: flush the index and log progress
                if (_collectionSize % FLUSH_INTERVAL == 0) {
                    if (_index != null) {
                        _index.flush();
                    }
                    _log.debug("Added node:" + _collectionSize + " to: " + getCollectionName());
                }
            }
        });
        // long collectionNode = -1;
        if (_index != null) {
            _index.flush();
        }
        _log.debug("Completed restoring " + _collectionSize + " to: " + getCollectionName());
        return collectionNode;
    }

Around that is a higher-level method that handles all
of the tables:

    public void run() {
        throwIfNull(_restorers, "Restorers missing");
        throwIfNull(_inserter, "Batch inserter missing");
        int totalNodes = 0;
        int totalRelationships = 0;
        try {
            for (CollectionRestorer r : _restorers) {
                long collection = r.restoreCollection();
                totalNodes += r.getCollectionSize();
                _inserter.createRelationship(_inserter.getGraphDbService().getReferenceNode().getId(),
                        collection, RdmRelationship.CLASS_EXTENT, null);
            }
            for (ParentChildRelationshipBuilder r : _relators) {
                r.makeRelationships();
                totalRelationships += r.getRelations();
            }
        } finally {
            _inserter.shutdown();
            _log.info("Batch inserter shutdown. Created: " + totalNodes + " nodes and "
                    + totalRelationships + " relationships");
        }
    }

Any suggestions welcome.

_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user