On 16/12/2021 10:52, Andy Seaborne wrote:
...
I am getting a slowdown during data ingestion. However, your summary
figures don't show that in the ingest phase. The full logs may have the
signal in them, but less pronounced.
My working assumption is now that it is random access to the node t
On 16/12/2021 12:32, LB wrote:
I couldn't get access to the full log as the output was too verbose for
the screen and I forgot to pipe into a file ...
Yes - familiar ...
Maybe xloader should capture its logging.
I can confirm the triples.tmp.gz size was something around 35-40G if I
rememb
I couldn't get access to the full log as the output was too verbose for
the screen and I forgot to pipe into a file ...
I can confirm the triples.tmp.gz size was something around 35-40G if I
remember correctly.
I am rerunning the load now to a) keep logs and b) see if increasing the number
of threa
Awesome!
I'm really pleased to hear the news.
That's better than I feared at this scale!
How big is triples.tmp.gz? 2* that size, and the database size is the
peak storage space used. My estimate is about 40G making 604G overall.
I'd appreciate having the whole log file. Could you email it to
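The arithmetic behind that estimate can be sketched as follows. This is only an illustration of the rule of thumb stated above (peak space is roughly twice the size of triples.tmp.gz plus the final database size); the function name is made up for the example, and the 524G database figure is the directory total reported elsewhere in this thread.

```python
def peak_storage_gb(triples_tmp_gz_gb, database_gb):
    """Rough peak disk usage during the load, in GB.

    Assumes the rule of thumb from the thread: twice the size of
    triples.tmp.gz, plus the size of the finished database.
    """
    return 2 * triples_tmp_gz_gb + database_gb


# Estimated 40G temporary file and the 524G database directory total.
print(peak_storage_gb(40, 524))  # 604

# Lower end of the 35-40G range reported for triples.tmp.gz.
print(peak_storage_gb(35, 524))  # 594
```

With the reported 35-40G range for triples.tmp.gz, the peak would land somewhere between roughly 594G and 604G.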
Thank you Lorenz. I am now running this test myself again with a larger
disk. You may want to consider running a full load of Wikidata as well. The
timing info and disk space you have should be sufficient.
Did we figure out a place to post the parser messages?
Marco
On Thu, Dec 16, 2021 at 10:0
Sure
wikidata-tdb/Data-0001:
total 524G
-rw-r--r-- 1 24 Dez 15 05:41 GOSP.bpt
-rw-r--r-- 1 8,0M Dez 14 12:21 GOSP.dat
-rw-r--r-- 1 8,0M Dez 14 12:21 GOSP.idn
-rw-r--r-- 1 24 Dez 15 05:41 GPOS.bpt
-rw-r--r-- 1 8,0M Dez 14 12:21 GPOS.dat
-rw-r--r-- 1 8,0M Dez 14 12:21 GPOS.idn
-rw-r--r-- 1 2
Thank you Lorenz, can you please post a directory listing for Data-0001 with
file sizes?
On Thu, Dec 16, 2021 at 8:49 AM LB
wrote:
> Loading of latest WD truthy dump (6.6 billion triples) Bzip2 compressed:
>
> Server:
>
> AMD Ryzen 9 5950X (16C/32T)
> 128 GB DDR4 ECC RAM
> 2 x 3.84 TB NVMe SSD
>
Andy Seaborne created JENA-2220:
---
Summary: Move Apache HttpClient-related code into jena-jdbc
Key: JENA-2220
URL: https://issues.apache.org/jira/browse/JENA-2220
Project: Apache Jena
Issue Type
Andy Seaborne created JENA-2221:
---
Summary: Remove usage of Apache HttpClient in transition code.
Key: JENA-2221
URL: https://issues.apache.org/jira/browse/JENA-2221
Project: Apache Jena
Issue T
Loading of latest WD truthy dump (6.6 billion triples) Bzip2 compressed:
Server:
AMD Ryzen 9 5950X (16C/32T)
128 GB DDR4 ECC RAM
2 x 3.84 TB NVMe SSD
Environment:
- Ubuntu 20.04.3 LTS
- OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.20.04)
- Jena 4.3.1
Command:
tools/apache-