Re: Jena GeoSparql

Greg Fri, 16 Jul 2021 10:31:02 -0700

Hello,

Just to make an additional point to Lorenz's response about omitting
options. If the -t TDB folder option is being used then any inferences
and conversion will be applied and stored in the persistent dataset.
This means that file loading and the inferencing/conversion options are
not required after the first start-up. Therefore, the dataset can be
prepared in the first start-up (or as a series of data preparation
start-ups) and then subsequent start-ups can be quicker and more
minimal. I've updated the GeoSPARQL Fuseki readme to add clarity on
these points.
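As a rough sketch of the two kinds of start-up described above, the following assembles the command lines using only options that appear in this thread (--tdb, -rf, --convert_geo, -i). The class and method names are hypothetical and not part of Jena, and the jar, folder and file names are simply the ones quoted further down; the GeoSPARQL Fuseki readme remains the authoritative option list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Jena): builds the two kinds of
// GeoSPARQL Fuseki command lines discussed in this thread.
public class FusekiStartup {

    // First start-up: load the RDF file into the TDB folder and apply the
    // conversion/inference options so the results are stored persistently.
    public static List<String> prepareCommand(String jar, String tdbDir, String rdfFile) {
        List<String> cmd = baseCommand(jar, tdbDir);
        cmd.add("-rf");
        cmd.add(rdfFile);
        cmd.add("--convert_geo");
        cmd.add("-i");
        return cmd;
    }

    // Later start-ups: only point the server at the prepared TDB folder.
    public static List<String> serveCommand(String jar, String tdbDir) {
        return baseCommand(jar, tdbDir);
    }

    // Shared prefix: java -jar <jar> --tdb <folder>
    public static List<String> baseCommand(String jar, String tdbDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jar);
        cmd.add("--tdb");
        cmd.add(tdbDir);
        return cmd;
    }

    public static void main(String[] args) {
        String jar = "jena-fuseki-geosparql-3.17.0.jar";
        System.out.println(String.join(" ", prepareCommand(jar, "spatialdb2", "nhle_spatial3.ttl")));
        System.out.println(String.join(" ", serveCommand(jar, "spatialdb2")));
    }
}
```

Only include --convert_geo or -i in the first command if the data actually needs them, as Lorenz notes below.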


Alternatively, the data can be loaded using the TDB API and the
GeoSPARQL Jena functions as also mentioned. Then GeoSPARQL Fuseki can be
started on the fully prepared TDB dataset.

All the best,

Greg

On 16/07/2021 08:01, Lorenz Buehmann wrote:

It's always a matter of whether you need one of those options or not.

So if you don't need inference, do not enable it.

If you do not need to convert WGS84 lat/long to WKT notation, do not
enable it.

In the end your query has to work and ideally you get your result as
fast as possible. But I prefer completeness over performance. Always
tune your setup and queries afterwards.

On 15.07.21 21:33, Matt Whitby wrote:

StackOverflow said - rightly or wrongly - that one is always just
running a 32-bit JVM (under Windows).

Lorenz suggested omitting the --convert_geo param, which seemed to push
it through.


On Thu, 15 Jul 2021 at 19:19, Andy Seaborne <a...@apache.org> wrote:

-Xmx1000m is about 1 GB - that's small.

Are you running a 32-bit Java? There is a comment on StackOverflow
saying you are. In that case the maximum possible heap size is about 1.5G.

In 32 bit mode, TDB can't cache in the same way. It will be slow.
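One way to check both points is a few lines of plain Java run on the same JVM as the server. This is a minimal sketch: the class name is made up, and "sun.arch.data.model" is a HotSpot/OpenJDK system property that other JVMs may not set, so a missing value is reported as unknown.

```java
// Prints the data model and the maximum heap of the JVM it runs on.
public class JvmHeapCheck {

    // Convert a byte count to whole mebibytes.
    public static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    // Interpret the "sun.arch.data.model" property (assumption: HotSpot/OpenJDK;
    // other JVMs may return null, which is reported as "unknown").
    public static String describeDataModel(String dataModel) {
        if ("32".equals(dataModel)) {
            return "32-bit";
        }
        if ("64".equals(dataModel)) {
            return "64-bit";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println("JVM data model: "
                + describeDataModel(System.getProperty("sun.arch.data.model")));
        System.out.println("os.arch: " + System.getProperty("os.arch"));
        // maxMemory() is the most heap the JVM will try to use (driven by -Xmx).
        System.out.println("Max heap (MiB): " + toMiB(Runtime.getRuntime().maxMemory()));
    }
}
```

Running it with the same heap option as the server (for example `java -Xmx1000m JvmHeapCheck` after compiling) shows roughly what heap the server would get; the reported figure can be slightly below the -Xmx value. Changing -Xss does not affect it, since that option sets the thread stack size rather than the heap.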

You also dropped the -i (RDFS inference), which also reduces the
memory footprint.

      Andy

System java for me:
openjdk version "17-ea" 2021-09-14
OpenJDK Runtime Environment (build 17-ea+19-Ubuntu-1ubuntu1)
OpenJDK 64-Bit Server VM (build 17-ea+19-Ubuntu-1ubuntu1, mixed mode,
sharing)

Even Java8:
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)


On 15/07/2021 13:47, Matt Whitby wrote:

I tried it without the --convert_geo param, and with -Xmx1000m for the
heap size and that did seem to fix it.

Thank you.

java -jar -Xmx1000m jena-fuseki-geosparql-3.17.0.jar --tdb spatialdb2




On Thu, 15 Jul 2021 at 13:12, Lorenz Buehmann <buehm...@informatik.uni-leipzig.de> wrote:

1.) do you need convert_geo param?

2.) -Xss increases the stack size; if you run out of heap memory you
should increase -Xmx to e.g. 4G or something, depending on your
machine.
If the geospatial index has been generated I'm wondering what is done
in-memory, so if you don't need the convert_geo option, you should
omit it and try again.

On 15.07.21 13:28, Matt Whitby wrote:

Hi Andy, Lorenz..

I tried the TDBLoader and it generated the indexes, etc.



boot up geosparql with the tdb file and I run out of memory too.

java -jar jena-fuseki-geosparql-3.17.0.jar --convert_geo --tdb spatialdb2 -i

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

Now, I'm sure the solution is to add a param akin to -Xss8m to boot
the JVM with more memory.

java -jar -Xss8m jena-fuseki-geosparql-3.17.0.jar --convert_geo --tdb spatialdb2

Does this sound right to you?

On Tue, 13 Jul 2021 at 22:09, Andy Seaborne <a...@apache.org> wrote:

      On 13/07/2021 11:31, Matt Whitby wrote:
      > Morning all.
      >
      >
      > It's just on my laptop (though with 64gb of memory so more than enough I
      > would assume).

      Unless you have changed something, it is using 16G.

      However, it is not clear that it is a memory space issue.

      >
      > The file is about 850mb, so not that big in the scheme of things.

      >
      > I don't see any log files.

      They have been printed to stdout as shown. They can go elsewhere (it's
      log4j2).

      >
      > The full stack trace is...

      There was a Java Error (the code doesn't print it - an oversight). Out
      of memory is an error.

      How long after the "DatasetOperations :: Reading RDF - Started - File:"
      output does it fail?

      It is worth checking that the file parses correctly, e.g. some encoding
      errors become Java "errors" in 3.17.0.

       > 11:26:01 INFO  DatasetOperations :: In-Memory Dataset

      That means there are multiple copies in-memory during loading.
      This does not explain using 16G.

      But the database can be loaded using the TDB bulkloader separately from
      the server starting, and the persistent database can then be passed in so
      the file does not have to be read at each start-up.

           Andy
      >
      > C:\Data\apache-jena-fuseki-3.17.0>java -jar jena-fuseki-geosparql-3.17.0.jar --convert_geo -rf "nhle_spatial3.ttl" -i
      >
      > 11:26:01 INFO  Main            :: Arguments Received: [--convert_geo, -rf, nhle_spatial3.ttl, -i]
      > 11:26:01 INFO  DatasetOperations :: Server Configuration: port=3030, datsetName=ds, loopbackOnly=true, updateAllowed=false, inference=true, applyDefaultGeometry=false, validateGeometryLiteral=false, convertGeoPredicates=true, removeGeoPredicates=false, queryRewrite=true, tdbFile=null, fileGraphFormats=[FileGraphFormat{rdfFile=nhle_spatial3.ttl, graphName=, rdfFormat=Turtle/pretty}], fileGraphDelimiters=[], indexEnabled=true, indexSizes=[-1, -1, -1], indexExpiries=[5000, 5000, 5000], spatialIndexFile=null, tdb2=false, help=false
      > 11:26:01 INFO  DatasetOperations :: In-Memory Dataset
      > 11:26:02 INFO  DatasetOperations :: Reading RDF - Started - File: nhle_spatial3.ttl, Graph Name: , RDF Format: Turtle/pretty
      > 11:26:02 WARN  system          :: The "SIS_DATA" environment variable is not set.
      > Exception in thread "main" org.apache.jena.sparql.JenaTransactionException: Write transaction - no commit or abort before end()
      >          at org.apache.jena.sparql.core.TransactionalLock.error(TransactionalLock.java:179)
      >          at org.apache.jena.sparql.core.TransactionalLock.end(TransactionalLock.java:162)
      >          at org.apache.jena.sparql.core.DatasetGraphMap.end(DatasetGraphMap.java:80)
      >          at org.apache.jena.sparql.core.DatasetImpl.end(DatasetImpl.java:164)
      >          at org.apache.jena.fuseki.geosparql.DatasetOperations.loadData(DatasetOperations.java:170)
      >          at org.apache.jena.fuseki.geosparql.DatasetOperations.setup(DatasetOperations.java:68)
      >          at org.apache.jena.fuseki.geosparql.Main.main(Main.java:64)

      >
      >
      >
      >
      > On Mon, 12 Jul 2021 at 20:32, Andy Seaborne <a...@apache.org> wrote:
      >
      >> There would have been more logging and more stack trace.
      >>
      >> But it looks like you are loading all the data into memory.
      >> Did it log "In-Memory Dataset"?
      >>
      >> How big is nhle_spatial5.ttl? How big is the machine it is running on?
      >>
      >>       Andy
      >>
      >> On 12/07/2021 14:44, Matt Whitby wrote:
      >>> Good afternoon all.
      >>>
      >>> I've been trying to import a TTL file into GeoSparql. It's worked fine for
      >>> smaller datasets, but now they're getting bigger (the production one would
      >>> be about 900mb) I'm getting an error.
      >>>
      >>> java -jar jena-fuseki-geosparql-3.17.0.jar --convert_geo -rf "nhle_spatial5.ttl" -i
      >>>
      >>> Error:
      >>>
      >>> Reading RDF - Started - File: nhle_spatial5.ttl, Graph Name: , RDF Format: Turtle/pretty
      >>> Exception in thread "main" org.apache.jena.sparql.JenaTransactionException: Write transaction - no commit or abort before end()
      >>>    at org.apache.jena.sparql.core.TransactionalLock.error(TransactionalLock.java:179)

      >>>
      >>> Might I be correct in thinking it's a size issue?
      >>>
      >>>
      >>> Kind Regards,
      >>> M
      >>>
      >>
      >
      >



--
Matt
Southend. Essex, England

