For your information

a) It took 10.2 days to load the Wikidata RDF dump
(wikidata-20190513-all-BETA.ttl, 379G) into Blazegraph 2.1.5.
The resulting bigdata.jnl file turned out to be 1.3T.

Server specifications:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:              1
CPU MHz:               1200.476
CPU max MHz:           3000.0000
CPU min MHz:           1200.0000
BogoMIPS:              4197.65
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
RAM: 128G
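
As a side note on reading a listing like the one above: the "CPU(s)" figure counts hardware threads, i.e. sockets x cores per socket x threads per core. Below is a minimal sketch in Python that checks this, assuming the listing is plain "Key: value" text; the helper names parse_lscpu and cpu_topology are made up for illustration, and the sample text is copied from the listing.

```python
# Hypothetical helpers; the sample lines are copied from the listing above.
LSCPU_SAMPLE = """\
CPU(s):                16
Thread(s) per core:    2
Core(s) per socket:    8
Socket(s):             1
"""


def parse_lscpu(text):
    """Turn 'Key:   value' lines into a dict of stripped strings."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' survive.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def cpu_topology(fields):
    """Return (physical cores, hardware threads) from the parsed fields."""
    sockets = int(fields["Socket(s)"])
    cores = sockets * int(fields["Core(s) per socket"])
    threads = cores * int(fields["Thread(s) per core"])
    return cores, threads


fields = parse_lscpu(LSCPU_SAMPLE)
cores, threads = cpu_topology(fields)
print(f"{cores} physical cores, {threads} hardware threads")
# The product should match the logical CPU count reported in the listing.
assert threads == int(fields["CPU(s)"])
```

For this server the script reports 8 physical cores and 16 hardware threads, which matches the "CPU(s): 16" line.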

b) It took 43 hours to load the Wikidata RDF dump
(wikidata-20190610-all-BETA.ttl, 383G) into the dev version of Virtuoso
07.20.3230.
I had to patch Virtuoso because it gave the following error each
time I loaded the RDF data:

09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
RDF type and a non-geometry content

The resulting virtuoso.db file turned out to be 340G.

Server specifications:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                12
On-line CPU(s) list:   0-11
Thread(s) per core:    2
Core(s) per socket:    6
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 63
Model name:            Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping:              2
CPU MHz:               1199.920
CPU max MHz:           3800.0000
CPU min MHz:           1200.0000
BogoMIPS:              6984.39
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              15360K
NUMA node0 CPU(s):     0-11
RAM: 128G
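
To put the two loads side by side, here is a rough back-of-the-envelope calculation in Python using only the figures reported in a) and b). It assumes "G" and "T" are binary units (1T = 1024G), which the message does not state, and load_stats is a made-up helper name. The two loads ran on different servers and different dump versions, so this is an indication rather than a controlled benchmark.

```python
G_PER_T = 1024  # assumption: sizes in the message are binary units


def load_stats(dump_g, hours, store_g):
    """Average load rate and on-disk expansion for one bulk load."""
    return {
        "rate_g_per_hour": dump_g / hours,       # dump gigabytes ingested per hour
        "store_to_dump_ratio": store_g / dump_g,  # database file size / dump size
    }


# Figures taken from a) and b) above.
blazegraph = load_stats(dump_g=379, hours=10.2 * 24, store_g=1.3 * G_PER_T)
virtuoso = load_stats(dump_g=383, hours=43, store_g=340)

for name, stats in (("Blazegraph 2.1.5", blazegraph),
                    ("Virtuoso 07.20.3230", virtuoso)):
    print(f"{name}: {stats['rate_g_per_hour']:.1f} G/hour, "
          f"store is {stats['store_to_dump_ratio']:.1f}x the dump size")
```

Under these assumptions the Blazegraph load averaged about 1.5 G/hour with a journal roughly 3.5 times the dump size, while the Virtuoso load averaged about 8.9 G/hour with a database file about 0.9 times the dump size.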

Best,


On Tue, Jun 4, 2019 at 4:37 PM, Vi to <vituzzu.w...@gmail.com> wrote:
>
> V4 has 8 cores instead of 6.
>
> But well, it's a server grade config on purpose!
>
> Vito
>
> On Tue, Jun 4, 2019 at 4:32 PM, Guillaume Lederrey 
> <gleder...@wikimedia.org> wrote:
>>
>> On Tue, Jun 4, 2019 at 3:14 PM Vi to <vituzzu.w...@gmail.com> wrote:
>> >
>> > AFAIR it's a double Xeon E5-2620 v3.
>> > With modern CPUs frequency is not so significant.
>>
>> Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
>> 2.10GHz (so v4 instead of v3, but the difference is probably minimal).
>>
>> > Vito
>> >
>> > On Tue, Jun 4, 2019 at 1:00 PM, Adam Sanchez 
>> > <a.sanche...@gmail.com> wrote:
>> >>
>> >> Thanks Guillaume!
>> >> One more question: what is the CPU frequency (GHz)?
>> >>
>> >> On Tue, Jun 4, 2019 at 12:25 PM, Guillaume Lederrey
>> >> <gleder...@wikimedia.org> wrote:
>> >> >
>> >> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez <a.sanche...@gmail.com> 
>> >> > wrote:
>> >> > >
>> >> > > Hello,
>> >> > >
>> >> > > Does anybody know the minimum hardware requirements (disk size and
>> >> > > RAM) for loading the Wikidata dump into Blazegraph?
>> >> >
>> >> > The actual hardware requirements will depend on your use case. But for
>> >> > comparison, our production servers are:
>> >> >
>> >> > * 16 cores (hyper threaded, 32 threads)
>> >> > * 128G RAM
>> >> > * 1.5T of SSD storage
>> >> >
>> >> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
>> >> > > The bigdata.jnl file, which stores all the triple data in Blazegraph,
>> >> > > is 478G and still growing.
>> >> > > I have a 1T disk, but it is almost full now.
>> >> >
>> >> > The current size of our jnl file in production is ~670G.
>> >> >
>> >> > Hope that helps!
>> >> >
>> >> >     Guillaume
>> >> >
>> >> > > Thanks,
>> >> > >
>> >> > > Adam
>> >> > >
>> >> > > _______________________________________________
>> >> > > Wikidata mailing list
>> >> > > Wikidata@lists.wikimedia.org
>> >> > > https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Guillaume Lederrey
>> >> > Engineering Manager, Search Platform
>> >> > Wikimedia Foundation
>> >> > UTC+2 / CEST

