On 05/09/14 12:58, Hugh Cayless wrote:
Hello all,
I've used Jena-Fuseki previously, but when I needed to reload all my data
(I'm using TDB), I've generally erased the contents of my data directory
and recreated it because it's faster than dropping the graph. I'm noticing
now though that if I issue a SPARQL DROP ALL update, the graph does indeed
get dropped, but if I check the size of my data directory, it's the same as
it was. When my data gets added back, the data directory gets that much
larger, eventually causing me to run out of free space on the volume.
Is there some sort of vacuum procedure I need to run to clear the stale
data? Or a reset command that will restore the contents of the data
directory to its default, empty state? It would be nice to be able to do
this without stopping Fuseki, as it will be serving other databases besides
the one I'm currently messing with.
Thanks,
Hugh
Hugh,
Space is not returned to the OS, so the files do not get smaller. Freed
space is partially reused, but not as well as it could be.
The node table is not cleared up - NodeIds are reused should RDF data be
added again with the same URIs or literals. BNodes will likely be fresh
ones so they do waste space in the node tables. The cost of reference
counting node usage would be very high.
In the indexes, space should be reused, but it isn't reused as well as it
should be, and it's only reused within the same JVM run. A restart loses
the chance to reuse the space.
I'm afraid the only reset is to stop the server and delete the files.
Fuseki2 will add the option of deleting a database. However, on MS
Windows, the well-known Java bug that memory-mapped files can't be
deleted until the JVM exits blocks even this.
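As a rough sketch of the stop-and-delete reset, assuming the server has
already been shut down: the Python below is not part of Jena or Fuseki, and
the function names are made up for illustration. It empties a data directory
in place, leaving the directory itself, and reports how many bytes the files
were occupying.

```python
import shutil
from pathlib import Path


def dir_size_bytes(directory: Path) -> int:
    """Total size in bytes of all regular files under `directory`."""
    return sum(p.stat().st_size for p in directory.rglob("*") if p.is_file())


def reset_data_directory(directory: Path) -> int:
    """Delete everything inside `directory`, keeping the directory itself.

    Hypothetical helper: only call it while the server is stopped.
    Returns the number of bytes the files occupied before the reset.
    """
    occupied = dir_size_bytes(directory)
    for entry in directory.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a subdirectory and its contents
        else:
            entry.unlink()         # remove a regular file or a symlink
    return occupied
```

Calling reset_data_directory(Path(...)) on the database location, then
restarting the server and reloading, gives the same result as erasing and
recreating the directory by hand.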
Andy