I have a simple program that uses the BatchInserter to load rows from a SQL 
database. I'm running it on a modestly configured Windows machine with 2GB of 
RAM, with the max heap set to 500M.
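
For reference, the setup is roughly the following (assuming the Neo4j 1.x 
BatchInserterImpl API; the store path and buffer sizes shown are illustrative, 
not my exact values), and the JVM is launched with -Xmx500m:

    // Sketch of the inserter setup described above - illustrative values only.
    import java.util.Map;

    import org.neo4j.helpers.collection.MapUtil;
    import org.neo4j.kernel.impl.batchinsert.BatchInserter;
    import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

    public class LoaderSetup {
        public static BatchInserter openInserter(String storeDir) {
            // The optional config map sizes the store buffers the batch
            // inserter uses for nodes, relationships and properties.
            Map<String, String> config = MapUtil.stringMap(
                    "neostore.nodestore.db.mapped_memory", "50M",
                    "neostore.propertystore.db.mapped_memory", "90M",
                    "neostore.relationshipstore.db.mapped_memory", "50M");
            return new BatchInserterImpl(storeDir, config);
        }
    }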

Initially it ran out of memory quite soon, so I introduced a flush after every 
5000 nodes and all appeared to be well. Further into the data load, though, it 
hops along nicely while the allocated memory (visible in the Windows Task 
Manager) grows and grows until, I suspect, it reaches the max heap size. By 
that point it has written about 2M nodes, and then it abruptly stops making 
any discernible progress. It doesn't fail; the logging I've put in for every 
5000 nodes simply stops and the CPU sits at 100% - garbage collecting, I 
suspect.
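
I haven't yet re-run it with GC logging to confirm that; presumably something 
like the following HotSpot flags (assumed - not what I currently pass) would 
show whether the pauses really are collections:

    java -Xmx500m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps <usual loader class and arguments>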

Is there something I should be doing periodically, in addition to the index 
flush, to stop the heap exhaustion? My code is really simple; here's the 
method for loading nodes from each table:-

  public long restoreCollection() {
        resolveSql();
        _log.debug("restore collection:" + getCollectionName() + " using: "
                + _sql + " and:" + Arrays.deepToString(_columns));
        final BatchInserterIndex _index = makeIndex();
        final long collectionNode = _inserter.createNode(MapUtil.map("name",
                getCollectionName()));
 
        _log.debug("Query db...");
        getJdbcTemplate().query(_sql, new Object[] {},
                new RowCallbackHandler() {
                    public void processRow(ResultSet row) throws SQLException {
                        final Map<String, Object> properties =
                                extractproperties(row);
                        long node = _inserter.createNode(properties);
                        _inserter.createRelationship(node, collectionNode,
                                RdmRelationship.MEMBER_OF, null);
                        if (_index != null)
                            for (DbColumn col : _columns) {
                                if (col.isIndexed())
                                    _index.add(node, MapUtil.map(col.getName(),
                                            properties.get(col.getName())));
                            }
                        _collectionSize++;
                        if (_collectionSize % FLUSH_INTERVAL == 0) {
                            if (_index != null)
                                _index.flush();
                            _log.debug("Added node:" + _collectionSize
                                    + " to: " + getCollectionName());
                        }
                    }
                });
 
        // long collectionNode = -1;
        if (_index != null) {
            _index.flush();
        }
        _log.debug("Completed restoring " + _collectionSize + " to: "
                + getCollectionName());
        return collectionNode;
    }
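
makeIndex() isn't shown above; for illustration, a typical Neo4j 1.x Lucene 
batch index setup looks roughly like this (the provider field and the "exact" 
config are illustrative - my actual method may differ slightly):

    // Illustrative sketch of makeIndex(); it would slot into the same class as
    // restoreCollection() above and reuse its _inserter field.
    // Imports assumed: org.neo4j.graphdb.index.BatchInserterIndex,
    // org.neo4j.graphdb.index.BatchInserterIndexProvider,
    // org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider.
    private BatchInserterIndexProvider _indexProvider;

    protected BatchInserterIndex makeIndex() {
        if (_indexProvider == null)
            _indexProvider = new LuceneBatchInserterIndexProvider(_inserter);
        // "exact" gives an exact-match (non-fulltext) Lucene index.
        return _indexProvider.nodeIndex(getCollectionName(),
                MapUtil.stringMap("type", "exact"));
    }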
 

and then, around that, a higher-level function that handles all the tables:-

    public void run() {
        throwIfNull(_restorers, "Restorers missing");
        throwIfNull(_inserter, "Batch inserter missing");
        int totalNodes = 0;
        int totalRelationships = 0;
        try {
            for (CollectionRestorer r : _restorers) {
                long collection = r.restoreCollection();
                totalNodes += r.getCollectionSize();
                _inserter.createRelationship(_inserter.getGraphDbService()
                        .getReferenceNode().getId(), collection,
                        RdmRelationship.CLASS_EXTENT, null);
            }
            for (ParentChildRelationshipBuilder r : _relators) {
                r.makeRelationships();
                totalRelationships += r.getRelations();
            }
        } finally {
            _inserter.shutdown();
            _log.info("Batch inserter shutdown.  Created: " + totalNodes + " 
nodes and "
                    + totalRelationships + " relationships");
        }
    }
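
One related thing I'm unsure of: if makeIndex() is backed by a 
LuceneBatchInserterIndexProvider as sketched earlier, I believe the provider 
buffers index entries itself and wants its own shutdown before the inserter's, 
roughly:

    // Illustrative: flush and close the (assumed) Lucene index provider first,
    // then shut the inserter down.
    if (_indexProvider != null)
        _indexProvider.shutdown();
    _inserter.shutdown();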
 
Any suggestions welcome.