Contrary to what I indicated earlier, the program is exhausting the heap.  So 
far I have:-
 
Tried significantly reducing the default parameters as below:-
 
Physical mem: 1535MB, Heap size: 496MB
use_memory_mapped_buffers=false
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=30M
neostore.propertystore.db.arrays.mapped_memory=5M
neo_store=c:\neo4j-advanced-1.3\data\graph.db\neostore
neostore.relationshipstore.db.mapped_memory=10M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.mapped_memory=20M
dump_configuration=true
cache_type=weak
neostore.nodestore.db.mapped_memory=10M
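
For completeness, I'm also passing the same settings programmatically as the Map second argument to the BatchInserterImpl constructor, roughly as below (a simplified sketch - buildConfig is just an illustrative name, and the store path is a placeholder):

```java
import java.util.HashMap;
import java.util.Map;

public class InserterConfig {
    // Builds the same settings as the properties file above so they can be
    // handed to the batch inserter programmatically, e.g.:
    //   new BatchInserterImpl("c:/neo4j-advanced-1.3/data/graph.db", buildConfig());
    static Map<String, String> buildConfig() {
        Map<String, String> config = new HashMap<String, String>();
        config.put("use_memory_mapped_buffers", "false");
        config.put("cache_type", "weak");
        config.put("neostore.nodestore.db.mapped_memory", "10M");
        config.put("neostore.relationshipstore.db.mapped_memory", "10M");
        config.put("neostore.propertystore.db.mapped_memory", "20M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "30M");
        config.put("neostore.propertystore.db.arrays.mapped_memory", "5M");
        config.put("neostore.propertystore.db.index.mapped_memory", "1M");
        config.put("neostore.propertystore.db.index.keys.mapped_memory", "1M");
        config.put("dump_configuration", "true");
        return config;
    }

    public static void main(String[] args) {
        // dump_configuration=true should make the inserter log what it
        // actually received, which is how I'm verifying the map is used.
        System.out.println(buildConfig());
    }
}
```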
 
Split my data reading code into small selects (thinking that maybe the JDBC 
driver was accumulating memory)
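
One more thing I intend to check - this is an assumption about the driver rather than something I've confirmed: some JDBC drivers buffer the entire ResultSet in memory unless told otherwise, in which case splitting into small selects wouldn't help because each select still materialises fully. A forward-only, read-only statement with an explicit fetch size should at least hint the driver to stream, along these lines (StreamingQuery and FETCH_SIZE are illustrative names):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class StreamingQuery {
    // Ask the driver to fetch this many rows at a time instead of
    // materialising the whole ResultSet in memory.
    static final int FETCH_SIZE = 1000;

    // Prepare a statement configured for streaming-style iteration.
    // Whether the driver honours the hint is driver-specific.
    static PreparedStatement prepareStreaming(Connection conn, String sql)
            throws SQLException {
        PreparedStatement stmt = conn.prepareStatement(sql,
                ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
        stmt.setFetchSize(FETCH_SIZE);
        return stmt;
    }

    public static void main(String[] args) {
        System.out.println("fetch size hint: " + FETCH_SIZE);
    }
}
```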
 
Neither has made a noticeable difference – it runs out of memory after loading 
about the same amount of data.
 
I’m trying to do a relatively simple thing – write a batch importer that loads 
data one row at a time and writes one node at a time.  I don’t want anything 
cached.  It would seem reasonable that the default configuration when using the 
BatchInserter would support this, but so far it has not been possible. 
 
Any further suggestions folks?
Regards,
Paul Bandler

Sent from my iPhone

On 2 Jun 2011, at 08:53, Paul Bandler <pband...@cseuk.co.uk> wrote:

> I monitored the heap using jconsole and, much to my surprise, observed that the 
> heap stayed relatively stable while the overall memory occupancy of the 
> process grew steadily until it reached ~500MB.  I'm now rather confused as 
> to what else can be consuming memory like that... Any ideas folks?
> 
> Sent from my iPhone
> 
> On 1 Jun 2011, at 20:52, Michael Hunger <michael.hun...@neotechnology.com> 
> wrote:
> 
>> props passed in to the batchinserter
>> look into messages.log -
>> you'll see the different GC behaviour
>> 
>> Michael
>> 
>> Sent from my iBrick4
>> 
>> 
>> Am 01.06.2011 um 20:44 schrieb Paul Bandler <pband...@cseuk.co.uk>:
>> 
>>> Is that simply set as a system property or via the Map passed as the second 
>>> parameter to the BatchInserterImpl constructor?  I've tried both and 
>>> doesn't seem to help.  Is there some way I can verify that it's being used?
>>> 
>>> I'm using 1.3 
>>> 
>>> On 1 Jun 2011, at 18:49, Michael Hunger wrote:
>>> 
>>>> you could use cache_type=weak
>>>> in the db properties
>>>> 
>>>> you can easily introspect java programs (heap) using jmap, jconsole or 
>>>> visualvm
>>>> 
>>>> what version of neo4j are you using?
>>>> 
>>>> index.flush just sets a flag for immediate index querying
>>>> 
>>>> Sent from my iBrick4
>>>> 
>>>> 
>>>> Am 01.06.2011 um 19:18 schrieb Paul Bandler <pband...@cseuk.co.uk>:
>>>> 
>>>>> I have a simple program that uses the BatchInserter to load rows from a 
>>>>> SQL database and am running it on a modestly configured Windows machine 
>>>>> with 2GB of RAM and setting the max heap to 500M.
>>>>> 
>>>>> Initially it was running out of memory quite soon, so I introduced a flush 
>>>>> after every 5000 nodes and it appeared that all was well.  But further into 
>>>>> the data load it appears to hop along nicely while the memory allocated 
>>>>> (visible in Windows Task Manager) grows and grows until, I suspect, it 
>>>>> reaches the max heap size.  Having written about 2M nodes it abruptly stops 
>>>>> making any further discernible progress.  It doesn't fail; the logging I've 
>>>>> put in for every 5000 nodes just stops and the CPU is 100% used - garbage 
>>>>> collecting, I suspect.
>>>>> 
>>>>> Is there something I should be doing periodically, in addition to the index 
>>>>> flush, to stop the heap exhaustion?  My code is really simple; here's the 
>>>>> method for loading nodes from each table:-
>>>>> 
>>>>> public long restoreCollection() {
>>>>>   resolveSql();
>>>>>   _log.debug("restore collection:" + getCollectionName() + " using: "
>>>>>           + _sql + " and:" + Arrays.deepToString(_columns));
>>>>>   final BatchInserterIndex _index = makeIndex();
>>>>>   final long collectionNode = _inserter.createNode(MapUtil.map("name",
>>>>>           getCollectionName()));
>>>>> 
>>>>>   _log.debug("Query db...");
>>>>>   getJdbcTemplate().query(_sql, new Object[] {},
>>>>>           new RowCallbackHandler() {
>>>>>               public void processRow(ResultSet row) throws SQLException {
>>>>>                   final Map<String, Object> properties =
>>>>>                           extractproperties(row);
>>>>>                   long node = _inserter.createNode(properties);
>>>>>                   _inserter.createRelationship(node, collectionNode,
>>>>>                           RdmRelationship.MEMBER_OF, null);
>>>>>                   if (_index != null)
>>>>>                       for (DbColumn col : _columns) {
>>>>>                           if (col.isIndexed())
>>>>>                               _index.add(node, MapUtil.map(col.getName(),
>>>>>                                       properties.get(col.getName())));
>>>>>                       }
>>>>>                   _collectionSize++;
>>>>>                   if ((_collectionSize % FLUSH_INTERVAL == 0)) {
>>>>>                       if (_index != null)
>>>>>                           _index.flush();
>>>>>                       _log.debug("Added node:" + _collectionSize
>>>>>                               + " to: " + getCollectionName());
>>>>>                   }
>>>>>               }
>>>>>           });
>>>>> 
>>>>>   // long collectionNode = -1;
>>>>>   if (_index != null) {
>>>>>       _index.flush();
>>>>>   }
>>>>>   _log.debug("Completed restoring " + _collectionSize + " to: "
>>>>>           + getCollectionName());
>>>>>   return collectionNode;
>>>>> }
>>>>> 
>>>>> 
>>>>> and then around that a higher level function that handles all tables:-
>>>>> 
>>>>> public void run() {
>>>>>   throwIfNull(_restorers, "Restorers missing");
>>>>>   throwIfNull(_inserter, "Batch inserter missing");
>>>>>   int totalNodes = 0;
>>>>>   int totalRelationships = 0;
>>>>>   try {
>>>>>       for (CollectionRestorer r : _restorers) {
>>>>>           long collection = r.restoreCollection();
>>>>>           totalNodes += r.getCollectionSize();
>>>>>           _inserter.createRelationship(_inserter.getGraphDbService()
>>>>>                   .getReferenceNode().getId(), collection,
>>>>>                   RdmRelationship.CLASS_EXTENT, null);
>>>>>       }
>>>>>       for (ParentChildRelationshipBuilder r : _relators) {
>>>>>           r.makeRelationships();
>>>>>           totalRelationships += r.getRelations();
>>>>> 
>>>>>       }
>>>>>   } finally {
>>>>>       _inserter.shutdown();
>>>>>       _log.info("Batch inserter shutdown.  Created: " + totalNodes
>>>>>               + " nodes and " + totalRelationships + " relationships");
>>>>>   }
>>>>> }
>>>>> 
>>>>> Any suggestions welcome.
>>>>> _______________________________________________
>>>>> Neo4j mailing list
>>>>> User@lists.neo4j.org
>>>>> https://lists.neo4j.org/mailman/listinfo/user
>>> 