On 10/08/12 06:34, Michael Brunnbauer wrote:

> Hello Andy,
>
> [tdbloader2]
>
> On Thu, Aug 09, 2012 at 06:53:59PM +0200, Michael Brunnbauer wrote:
>> INFO  Add: 55,550,000 Data (Batch: 52 / Avg: 8,785)
>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>> Any idea what a good value for -Xmx for 1B+ triples would be?
>> I will try with 16384 now.
>
> -Xmx16384M throws the memory error after 478 million triples:
>
> INFO  Add: 478,600,000 Data (Batch: 247 / Avg: 13,627)
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded

478 million / 16G heap

This is bizarre.

Previously, with 32G heap:
INFO  Add: 55,500,000 Data (Batch: 98 / Avg: 10,335)
INFO    Elapsed: 5,369.59 seconds [2012/08/09 17:45:44 CEST]
INFO  Add: 55,550,000 Data (Batch: 52 / Avg: 8,785)
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

which is 55 million, far fewer triples than you reached after decreasing the heap size.

This morning, I have loaded (this is the end of the data phase, reformatted):

11:46:38 INFO  loader               ::
   Add: 747,400,000 Data (Batch: 187,969 / Avg: 131,924)
11:46:43 INFO  loader               ::
   Total: 747,436,151 tuples : 5,669.75 seconds :
   131,828.81 tuples/sec [2012/08/10 11:46:43 UTC]

with no change to tdbloader2 other than fixing the classpath setting bug, so it's running with -Xmx1200M.
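For reference: if tdbloader2 follows the same convention as the other TDB wrapper scripts and takes its heap setting from the JVM_ARGS environment variable (an assumption on my part; check the script if it has no effect), a bigger heap for a big load can be passed along these lines, where the location and file name are only placeholders:

  # assumes the script honours JVM_ARGS; otherwise edit the -Xmx value in the script itself
  JVM_ARGS="-Xmx16G" tdbloader2 --loc /data/tdb data.nt.gz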

The machine is a 34G machine on Amazon. I even forgot to halt the large dataset it is hosting, but it's not public yet and only the odd developer is testing against it.


What is the data like? The data shape should only affect the building of the node table.

Many long literals? (That might explain why the default setting was not enough.)

But that does not explain why decreasing the heap size lets it get further.
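If it keeps dying in the data phase, GC logging would at least show whether the collector is thrashing near the limit before the "GC overhead limit exceeded" hits. These are standard HotSpot flags; passing them this way assumes the same JVM_ARGS hook as above:

  # print each collection, with details and timestamps, to stdout during the load
  JVM_ARGS="-Xmx16G -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps" tdbloader2 --loc /data/tdb data.nt.gz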

Unrelated:

I have noticed the parameters to sort(1) could be a lot better ...

e.g.
--buffer-size=50%  --parallel=3

I'll try that out, but you're crashing out in the data phase before index creation.
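For anyone who wants to try those flags by hand first: with GNU coreutils sort (--parallel needs a reasonably recent version) a run over one of the intermediate files would look roughly like this, file names being placeholders:

  # use up to half of RAM as the sort buffer, 3 sort threads, unique lines, write the result to a file
  sort --buffer-size=50% --parallel=3 -u -o spo.sorted spo.unsorted

Inside tdbloader2 the sort invocation sits in the script itself, so presumably trying this there means editing the script; I have not checked whether there is an environment hook for it.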

        Andy


> Regards,
>
> Michael Brunnbauer

