I have had the machine running for hours now but, to be fair, I haven't produced any load in the meantime.

On 24.10.22 14:17, Andy Seaborne wrote:
Hi Bob, good article!

Especially the "check your data before loading" bit.


https://bobdc.com/miscfiles/dataset2.ttl
You can remove all those "rdfs:subClassOf" triples. That all happens automatically.

On 23/10/2022 20:36, Bob DuCharme wrote:
> The good news is that I have gotten Fuseki running on a free tier AWS
> EC2 instance with very little trouble and was able to use the HTML
> interface and the SPARQL endpoint, as described at
> https://www.bobdc.com/blog/ec2fuseki/
>
> The bad news: it just randomly stops, even when there has been no
> querying activity, typically after 30-60 minutes of being up:
>
>    17:17:50 INFO  Server          ::   OS:     Linux
> 5.10.144-127.601.amzn2.x86_64 amd64
>    17:17:50 INFO  Server          ::   PID:    3314
>    17:17:51 INFO  Server          :: Started 2022/10/23 17:17:51 UTC on
> port 3030
>    Killed
>
> The instance has 1GB of memory. I had only loaded 162K of data.
>
> Should I set JVM_ARGS differently from the default?

Yes - as Lorenz says.

The heap needs to be smaller than the machine's memory, leaving a bit of space for everything else (OS, file system cache). A rough guess: 0.75G.
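For example (a sketch; the exact value depends on what else runs on the instance), the heap limit can be passed through the JVM_ARGS environment variable, which the fuseki-server script passes on to the java command it launches:

```shell
# Cap the Java heap at 750 MB, leaving roughly 250 MB for the OS and
# file system cache on a 1 GB instance. "/dataset2" is Bob's dataset
# name from the blog post; adjust as needed.
JVM_ARGS="-Xmx750m" ./fuseki-server --mem /dataset2
```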


What I think is happening is that even when "nothing" is happening, there is still a small amount of work going on: not from Fuseki itself but, for example, the UI pings the server and a bit of Java runs.

The heap will slowly increase because there is no pressure to do a full GC. If the heap size is set larger than the machine's memory, eventually a request to grow the heap beyond what the OS allows happens, and the OS kills the process. There is no Java/Fuseki log message.

Even though this work is very small, on a t2.micro "eventually" might be quite soon.
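Since the kill comes from the OS rather than Java, the evidence ends up in the kernel log, not in Fuseki's log. On Linux, something like this should show it (assuming it really was the OOM killer):

```shell
# The OOM killer logs to the kernel ring buffer, not to the
# Java/Fuseki logs. Either of these should show the kill:
dmesg | grep -i "killed process"
# or, on systemd machines such as Amazon Linux 2:
journalctl -k | grep -i "out of memory"
```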


Another factor you may come across later, when using TDB2 on a small instance, is that the TDB2 caches will need to be tuned smaller for safety. Most likely, at 162K all the data ends up in RAM and the node table cache never gets very big, so it won't be a problem.

Since it's 162K of data and read-only ("publishing"), I'd try putting everything in memory at startup.

# Transactional in-memory dataset.
PREFIX :    <#>
PREFIX ja:  <http://jena.hpl.hp.com/2005/11/Assembler#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

:dataset rdf:type ja:MemoryDataset ;
    ja:data "data1.trig" ;  ## Or a triples format such as .ttl.
    .

which is equivalent to

   fuseki-server --file DATA --update /dataset2 ## --update optional

or load with a script. The downside: updates are lost when the server restarts.
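"Load with a script" could be, for example, a POST to the dataset's data endpoint using the SPARQL Graph Store Protocol (the /dataset2 name and data1.ttl file here are just illustrative, matching the setup above):

```shell
# Upload a Turtle file into the default graph of the running server
# via the SPARQL Graph Store Protocol endpoint.
curl -X POST \
     -H 'Content-Type: text/turtle' \
     --data-binary @data1.ttl \
     'http://localhost:3030/dataset2/data?default'
```

This needs the server started with --update, and with an in-memory dataset the loaded data is still lost when the server stops.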


It is possible to tune down the TDB cache sizes. And, if anyone is really desperate, there are 32-bit JVMs (but don't go there unless you really have to).

The mechanism is rather clunky to apply from Fuseki at the moment.

    Andy

>
> Thanks,
>
> Bob
>
