Wolfgang, here is another link (one I did not yet find in your link list),
this time on setting up Wikidata with Blazegraph on Google Cloud (GCE):
https://addshore.com/2019/10/your-own-wikidata-query-service-with-no-limits-part-1/
On Thu, Jun 11, 2020 at 7:14 AM Wolfgang Fahl wrote:
On 10.06.20 at 17:46, Marco Neumann wrote:
> Wolfgang, I hear you and I've added a dataset today with 1 billion triples
> and will continue to try to add larger datasets over time.
> http://www.lotico.com/index.php/JENA_Loader_Benchmarks
>
> If you are only specifically interested in the wikidata
Exactly Andy, thank you for the additional context, and as a matter of fact
we already query / manipulate 150bn+ triples in a LOD cloud as distributed
sets every day.
But of course we frequently see practitioners in the community who look at
the Semantic Web and Jena specifically primarily as a
On 09/06/2020 12:18, Wolfgang Fahl wrote:
Marco
thank you for sharing your results. Could you please try to make the
sample size 10 and 100 times bigger for the discussion we currently have
at hand. Getting to a billion triples has not been a problem for the
WikiData import. From 1-10
Hi Andy,
Thanks for the helpful pointers by you and others.
I will change the heap settings to see if this at least allows the process to
finish. For reference, the machine has 128GB of main memory and a regular HDD
attached.
I also changed the logging settings to see the progress (would be
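A sizing sketch may be useful here (the numbers below are my assumptions, not recommendations from this thread). As far as I understand, TDB2 keeps its indexes in memory-mapped files, so the OS page cache rather than the JVM heap does most of the caching; giving nearly all of a 128GB machine to `-Xmx` can starve that cache, and a modest heap is often the faster setting:

```shell
# Sketch only: TDB2 indexes are memory-mapped, so the OS page cache,
# not the JVM heap, does most of the caching during a bulk load.
# 128 GB is the machine described in this thread; the 1/16 split
# below is an assumption, not a tested recommendation.
ram_gb=128
heap_gb=$(( ram_gb / 16 ))          # 8 GB heap, ~120 GB left for the page cache
export JVM_ARGS="-Xmx${heap_gb}G"   # picked up by the tdb2.tdbloader script
echo "JVM_ARGS=$JVM_ARGS"
```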
Marco
thank you for sharing your results. Could you please try to make the
sample size 10 and 100 times bigger for the discussion we currently have
at hand. Getting to a billion triples has not been a problem for the
WikiData import. From 1-10 billion triples it gets tougher and
for >10 billion
same here, I get the best performance on a single bare-metal box with SSD
and fast DDR RAM. Cloud datacenters tend to be selective about hardware,
and you can get the fast dedicated machines in only a few locations.
http://www.lotico.com/index.php/JENA_Loader_Benchmarks
In addition keep in mind
It may be that SSD is the important factor.
1/ From a while ago, on truthy:
https://lists.apache.org/thread.html/70dde8e3d99ce3d69de613b5013c3f4c583d96161dec494ece49a412%40%3Cusers.jena.apache.org%3E
before tdb2.tdbloader was a thing.
2/ I did some (not open) testing on a mere 800M and
Hi Johannes,
thank you for bringing this issue to the mailing list again.
At
https://stackoverflow.com/questions/61813248/jena-tdbloader-performance-and-limits
there is a question describing the issue and at
http://wiki.bitplan.com/index.php/Get_your_own_copy_of_WikiData#Test_with_Apache_Jena
a
Wouldn't it be a good idea to have a page in the Fuseki/TDB2
documentation with benchmark results and/or user-reported loading
statistics, including hardware specs?
It would also be useful to map such specs to the AWS instance types:
https://aws.amazon.com/ec2/instance-types/
On Mon, Jun 8, 2020
Hi Johannes,
On 08/06/2020 16:54, Hoffart, Johannes wrote:
Hi,
I want to load the full Wikidata dump, available at
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 to use in
Jena.
I tried it using the tdb2.tdbloader with $JVM_ARGS set to -Xmx120G. Initially,
the
Thanks Johannes for starting this thread. I am facing the exact same
problem with tdb2. For any significantly large file for that matter, it
takes forever to load. I hope this problem has a solution.
Thank you.
-Ahmed
On Mon, Jun 8, 2020 at 11:55 AM Hoffart, Johannes wrote:
Hi,
I want to load the full Wikidata dump, available at
https://dumps.wikimedia.org/wikidatawiki/entities/latest-all.ttl.bz2 to use in
Jena.
I tried it using the tdb2.tdbloader with $JVM_ARGS set to -Xmx120G. Initially,
the progress (measured by dataset size) is quick. It slows down very much
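For reference, a minimal invocation sketch (the paths and loader choice below are my assumptions, not taken from this thread; `tdb2.tdbloader` ships with the Apache Jena distribution):

```shell
# Hypothetical load command. --loader=parallel trades more RAM/CPU
# for higher throughput. bz2 input is decoded on a single core, so
# decompressing the dump first (e.g. with lbzip2) may speed up parsing.
DUMP=latest-all.ttl.bz2
DB=/data/tdb2/wikidata              # assumed target database directory
CMD="tdb2.tdbloader --loader=parallel --loc $DB $DUMP"
echo "$CMD"                         # run with JVM_ARGS set beforehand
```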