On 14/12/13 22:18, Rick Moynihan wrote:
On Thu, Dec 12, 2013 at 2:22 PM, Andy Seaborne <[email protected]> wrote:
Hi Rick,
On 12/12/13 11:03, Rick Moynihan wrote:
Hi all,
I have a script which dumps 2 modestly sized N-Triples files into Fuseki
via curl and an HTTP PUT.
e.g. the script does the following 2 actions:
curl -X PUT --data-binary @data/file-1.nt -H 'Content-Type: text/plain' \
  'http://localhost:3030/linkeddev-test/data?graph=http://foo-bar.org/graph1'
curl -X PUT --data-binary @data/file-2.nt -H 'Content-Type: text/plain' \
  'http://localhost:3030/linkeddev-test/data?graph=http://foo-bar.org/graph2'
And it does them one after the other, never in parallel?
Yes they're sequential, never parallel. Is parallel update an issue?
No, it shouldn't be - there can be only one actively running write
transaction, mediated by an internal lock.
File 1 is 162 MB
File 2 is 223 MB
so about 1.6 and 2.2 million triples?
740,000 and 1.6 million.
Sometimes this imports fine, other times the import takes minutes, Fuseki
consumes 380% CPU and I have to kill it after a few minutes.
When it's fine, how long does it take?
Approximately 2m 40s for both datasets.
It might be GC pressure: the JVM GCs very hard but doesn't make
significant progress - this can show as very high CPU, nothing happening,
and then an OOME. How much heap have you given the Java process?
The other thing to look at is memory-mapped files. TDB uses mmapped files,
which are not part of the Java heap. Don't give Fuseki all of RAM
for the heap - leave as much for the OS to use for file system cache as
possible (but Fuseki still needs a decent heap to manage transactions).
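To make the heap advice concrete, here is a minimal sketch of launching Fuseki with a larger heap, assuming the standalone fuseki-server script (which picks up extra JVM options from the JVM_ARGS environment variable); the TDB path and dataset name are illustrative:

```shell
# Cap the Java heap at 4 GB with -Xmx, leaving the remaining RAM free
# for the OS file-system cache that TDB's memory-mapped files rely on.
# /path/to/tdb and the /linkeddev-test service name are placeholders.
JVM_ARGS="-Xmx4G" ./fuseki-server --update --loc=/path/to/tdb /linkeddev-test
```

The key point is that -Xmx should be well below physical RAM, since TDB's mmapped file space sits outside the heap.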
Thanks for the advice. Raising the heap from 1.2 GB to 4 GB seems to have
made the problem disappear.
OK then - looks like it was close to out-of-memory and a GC was getting
scheduled very frequently.
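One way to confirm this diagnosis rather than guess at it is to turn on GC logging; a sketch, again assuming the fuseki-server script and its JVM_ARGS variable:

```shell
# -verbose:gc prints a line per collection; -XX:+PrintGCDetails adds
# per-generation sizes and pause times (HotSpot JVMs of this era).
# Frequent full GCs that reclaim little memory confirm heap pressure.
JVM_ARGS="-Xmx4G -verbose:gc -XX:+PrintGCDetails" ./fuseki-server /linkeddev-test
```

If the log shows back-to-back full collections recovering almost nothing, the heap is too small for the workload.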
I assume it's a 64-bit machine, but which OS? (Even amongst Linuxes,
handling of mmap varies for reasons I don't understand.)
It's a Mac.
R.