Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-25 Thread Ted Thibodeau Jr
On Jun 20, 2019, at 08:37 AM, Adam Sanchez  wrote:
> 
> For your information
> 
> ...
> b) It took 43 hours to load the Wikidata RDF dump
> (wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
> 07.20.3230.
> I had to patch Virtuoso because it was giving the following error each
> time I loaded the RDF data
> 
> 09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
> 42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
> RDF type and a non-geometry content
> 
> The virtuoso.db file turned out to be 340G.
> 
> Server technical features
> 
> Architecture:  x86_64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Little Endian
> CPU(s):12
> On-line CPU(s) list:   0-11
> Thread(s) per core:2
> Core(s) per socket:6
> Socket(s): 1
> NUMA node(s):  1
> Vendor ID: GenuineIntel
> CPU family:6
> Model: 63
> Model name:Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
> Stepping:  2
> CPU MHz:   1199.920
> CPU max MHz:   3800.
> CPU min MHz:   1200.
> BogoMIPS:  6984.39
> Virtualization:VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache:  256K
> L3 cache:  15360K
> NUMA node0 CPU(s): 0-11
> RAM: 128G
> 
> Best,



Hi, Adam --

We're quite interested in the time your Wikidata load took on
Virtuoso, as it seems rather slow, given our experience with
other large (and much larger!) data sets.

The hardware information you provided focused primarily on the 
processors -- but RAM and disk details are much more important 
to data loads.

Also, there are some significant Virtuoso configuration settings
(in the INI file) which have an impact.
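
For example, the two [Parameters] settings that usually dominate
bulk-load behaviour are NumberOfBuffers and MaxDirtyBuffers.  As a
rough illustration only -- an unofficial rule of thumb along the lines
of the commented memory-size examples in the stock virtuoso.ini, with
the little Python helper below being just a sketch, not a prescription:

# Rough sizing sketch for the two virtuoso.ini [Parameters] settings that
# matter most for bulk loads.  The rule of thumb (about 2/3 of RAM for the
# buffer pool, 8 KB per buffer page, dirty buffers at about 3/4 of the pool)
# mirrors the commented memory-size examples in the stock INI file; treat
# the output as a starting point, not a prescription.

def suggest_buffer_settings(ram_gib: float) -> dict:
    pool_bytes = ram_gib * (2.0 / 3.0) * 1024 ** 3   # ~2/3 of RAM for the buffer pool
    number_of_buffers = int(pool_bytes // 8192)      # one database page = 8 KB
    max_dirty_buffers = number_of_buffers * 3 // 4
    return {"NumberOfBuffers": number_of_buffers,
            "MaxDirtyBuffers": max_dirty_buffers}

if __name__ == "__main__":
    print(suggest_buffer_settings(128))   # e.g. the 128G machine in this thread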

We'd like to get the info that would let us fill in the blanks
on this spreadsheet (itself a work in progress), so we can do 
some analysis, and likely provide some tuning hints that would 
bring the Virtuoso Wikidata load time down significantly.

   
https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5OITtrbFw/edit?usp=sharing

You can see the settings in use for some other deployments, on
the "Current" tab, which may in themselves show you some places
you could improve things immediately.

Last, we would appreciate knowing exactly what you patched to
get around the geodata error, as there are a few open issues 
along those lines, which are also works in progress.

Thanks,

Ted



--
A: Yes.  http://www.idallen.com/topposting.html
| Q: Are you sure?   
| | A: Because it reverses the logical flow of conversation.
| | | Q: Why is top posting frowned upon?

Ted Thibodeau, Jr.   //   voice +1-781-273-0900 x32
Senior Support & Evangelism  //mailto:tthibod...@openlinksw.com
 //  http://twitter.com/TallTed
OpenLink Software, Inc.  //  http://www.openlinksw.com/
 20 Burlington Mall Road, Suite 322, Burlington MA 01803
 Weblog-- http://www.openlinksw.com/blogs/
 Community -- https://community.openlinksw.com/
 LinkedIn  -- http://www.linkedin.com/company/openlink-software/
 Twitter   -- http://twitter.com/OpenLink
 Facebook  -- http://www.facebook.com/OpenLinkSoftware
Universal Data Access, Integration, and Management Technology Providers






___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-20 Thread Kingsley Idehen
On 6/20/19 12:48 PM, hellm...@informatik.uni-leipzig.de wrote:
> Hi Adam,
> the server specs you posted are not so important. What disks did you use?
>
> They should be SSDs or 15k RPM SAS drives to make it faster.
>
> Virtuoso can parse with multiple threads if you split the files before
> loading, but HDD speed is still the bottleneck.
>
> Sebastian


Yep!

And if the shared-nothing cluster edition is in use, you can run the
bulk loaders in parallel across each of the nodes in the cluster, which
will reduce the load time too.

We have cluster configurations behind our LOD instance where all of
DBpedia was loaded in 15 minutes flat, and I don't mean via some massive
cluster setup, just what we have behind our LOD Cloud cache instance :)


Kingsley
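
On a single server (without the cluster edition), the standard bulk
loader gives a similar kind of parallelism: ld_dir() registers the
files in a load queue, and several concurrent rdf_loader_run() sessions
drain that queue at once.  A minimal sketch, assuming isql is on PATH,
a local server on port 1111 with the default dba/dba credentials,
pre-split chunks in a directory covered by DirsAllowed, and a
placeholder graph IRI:

# Minimal sketch of a single-server parallel bulk load with Virtuoso's
# standard loader: register the (pre-split) files once with ld_dir(), drain
# the load queue with several concurrent rdf_loader_run() sessions, then
# checkpoint.  Assumes isql is on PATH, the server listens on 1111 with the
# default dba/dba credentials, and /data/wikidata-chunks (a placeholder) is
# covered by DirsAllowed in virtuoso.ini.
import subprocess

ISQL = ["isql", "1111", "dba", "dba"]
WORKERS = 6  # placeholder; tune to the machine, leaving some cores free

def isql_exec(sql: str) -> subprocess.Popen:
    return subprocess.Popen(ISQL + [f"exec={sql}"])

# 1. register the Turtle chunks with the loader's queue (placeholder graph IRI)
isql_exec("ld_dir('/data/wikidata-chunks', '*.ttl', 'http://www.wikidata.org/');").wait()

# 2. several loader sessions pull files off the queue in parallel
loaders = [isql_exec("rdf_loader_run();") for _ in range(WORKERS)]
for p in loaders:
    p.wait()

# 3. make the load durable
isql_exec("checkpoint;").wait()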

>
> On June 20, 2019 2:37:16 PM GMT+02:00, Adam Sanchez
>  wrote:
>
> For your information
>
> a) It took 10.2 days to load the Wikidata RDF dump
> (wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
> The bigdata.jnl file turned out to be 1.3T
>
> Server technical features
>
> Architecture:  x86_64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Little Endian
> CPU(s):16
> On-line CPU(s) list:   0-15
> Thread(s) per core:2
> Core(s) per socket:8
> Socket(s): 1
> NUMA node(s):  1
> Vendor ID: GenuineIntel
> CPU family:6
> Model: 79
> Model name:Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
> Stepping:  1
> CPU MHz:   1200.476
> CPU max MHz:   3000.
> CPU min MHz:   1200.
> BogoMIPS:  4197.65
> Virtualization:VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache:  256K
> L3 cache:  20480K
> RAM: 128G
>
> b) It took 43 hours to load the Wikidata RDF dump
> (wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
> 07.20.3230.
> I had to patch Virtuoso because it was giving the following error each
> time I loaded the RDF data
>
> 09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
> 42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
> RDF type and a non-geometry content
>
> The virtuoso.db file turned out to be 340G.
>
> Server technical features
>
> Architecture:  x86_64
> CPU op-mode(s):32-bit, 64-bit
> Byte Order:Little Endian
> CPU(s):12
> On-line CPU(s) list:   0-11
> Thread(s) per core:2
> Core(s) per socket:6
> Socket(s): 1
> NUMA node(s):  1
> Vendor ID: GenuineIntel
> CPU family:6
> Model: 63
> Model name:Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
> Stepping:  2
> CPU MHz:   1199.920
> CPU max MHz:   3800.
> CPU min MHz:   1200.
> BogoMIPS:  6984.39
> Virtualization:VT-x
> L1d cache: 32K
> L1i cache: 32K
> L2 cache:  256K
> L3 cache:  15360K
> NUMA node0 CPU(s): 0-11
> RAM: 128G
>
> Best,
>
>
> Le mar. 4 juin 2019 à 16:37, Vi to  a écrit :
>
> V4 has 8 cores instead of 6. But well, it's a server grade
> config on purpose! Vito Il giorno mar 4 giu 2019 alle ore
> 16:32 Guillaume Lederrey  ha scritto:
>
> On Tue, Jun 4, 2019 at 3:14 PM Vi to
>  wrote:
>
> AFAIR it's a double Xeon E5-2620 v3. With modern CPUs
> frequency is not so significant. 
>
> Our latest batch of servers are: Intel(R) Xeon(R) CPU
> E5-2620 v4 @ 2.10GHz (so v4 instead of v3, but the
> difference is probably minimal).
>
> Vito Il giorno mar 4 giu 2019 alle ore 13:00 Adam
> Sanchez  ha scritto:
>
> Thanks Guillaume! One question more, what is the
> CPU frequency (GHz)? Le mar. 4 juin 2019 à 12:25,
> Guillaume Lederrey  a
> écrit :
>
> On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez
>  wrote:
>
> Hello, Does somebody know the minimal
> hardware requirements (disk size and RAM)
> for loading wikidata dump in Blazegraph? 
>
> The actual hardware requirements will depend
> on your use case. But for comparison, our
> production servers are: * 16 cores (hyper
> threaded, 32 threads) * 128G RAM * 1.5T of SSD
> storage
>
> The downloaded dump file
>   

Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-20 Thread hellmann
Hi Adam,
the server specs you posted are not so important. What disks did you use?

They should be SSDs or 15k RPM SAS drives to make it faster.

Virtuoso can parse with multiple threads if you split the files before loading,
but HDD speed is still the bottleneck.

Sebastian 
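
One naive way to do that pre-split, sketched here under the assumption
that the dump keeps its @prefix declarations at the top and that every
statement ends with " ." at the end of a line (true of the flat
Wikidata Turtle dumps, not of arbitrary Turtle); file names and chunk
size are placeholders:

# Naive pre-split of a flat Turtle dump into chunks that can be bulk-loaded
# in parallel: the @prefix header is copied into every chunk, and cuts only
# happen on lines ending in " ." so statements stay whole.  Good enough for
# the flat Wikidata .ttl dumps, but not a general Turtle splitter; paths and
# chunk size are placeholders.
from itertools import chain
from pathlib import Path

DUMP = Path("wikidata-20190610-all-BETA.ttl")
OUT_DIR = Path("chunks")
STATEMENTS_PER_CHUNK = 5_000_000

def open_chunk(n, prefixes):
    out = (OUT_DIR / f"part-{n:05d}.ttl").open("w", encoding="utf-8")
    out.writelines(prefixes)                    # every chunk re-declares the prefixes
    return out

OUT_DIR.mkdir(exist_ok=True)
with DUMP.open(encoding="utf-8") as src:
    prefixes, first_body_line = [], ""
    for line in src:                            # collect the @prefix header
        if line.startswith(("@prefix", "@base")):
            prefixes.append(line)
        else:
            first_body_line = line
            break

    chunk_no = statements = 0
    out = open_chunk(chunk_no, prefixes)
    for line in chain([first_body_line], src):  # stream the rest of the dump
        out.write(line)
        if line.rstrip().endswith(" ."):        # a statement just ended
            statements += 1
            if statements == STATEMENTS_PER_CHUNK:
                out.close()
                chunk_no, statements = chunk_no + 1, 0
                out = open_chunk(chunk_no, prefixes)
    out.close()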

On June 20, 2019 2:37:16 PM GMT+02:00, Adam Sanchez  
wrote:
>For your information
>
>a) It took 10.2 days to load the Wikidata RDF dump
>(wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
>The bigdata.jnl file turned out to be 1.3T
>
>Server technical features
>
>Architecture:  x86_64
>CPU op-mode(s):32-bit, 64-bit
>Byte Order:Little Endian
>CPU(s):16
>On-line CPU(s) list:   0-15
>Thread(s) per core:2
>Core(s) per socket:8
>Socket(s): 1
>NUMA node(s):  1
>Vendor ID: GenuineIntel
>CPU family:6
>Model: 79
>Model name:Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
>Stepping:  1
>CPU MHz:   1200.476
>CPU max MHz:   3000.
>CPU min MHz:   1200.
>BogoMIPS:  4197.65
>Virtualization:VT-x
>L1d cache: 32K
>L1i cache: 32K
>L2 cache:  256K
>L3 cache:  20480K
>RAM: 128G
>
>b) It took 43 hours to load the Wikidata RDF dump
>(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
>07.20.3230.
>I had to patch Virtuoso because it was giving the following error each
>time I loaded the RDF data
>
>09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
>42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
>RDF type and a non-geometry content
>
>The virtuoso.db file turned out to be 340G.
>
>Server technical features
>
>Architecture:  x86_64
>CPU op-mode(s):32-bit, 64-bit
>Byte Order:Little Endian
>CPU(s):12
>On-line CPU(s) list:   0-11
>Thread(s) per core:2
>Core(s) per socket:6
>Socket(s): 1
>NUMA node(s):  1
>Vendor ID: GenuineIntel
>CPU family:6
>Model: 63
>Model name:Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
>Stepping:  2
>CPU MHz:   1199.920
>CPU max MHz:   3800.
>CPU min MHz:   1200.
>BogoMIPS:  6984.39
>Virtualization:VT-x
>L1d cache: 32K
>L1i cache: 32K
>L2 cache:  256K
>L3 cache:  15360K
>NUMA node0 CPU(s): 0-11
>RAM: 128G
>
>Best,
>
>
>Le mar. 4 juin 2019 à 16:37, Vi to  a écrit :
>>
>> V4 has 8 cores instead of 6.
>>
>> But well, it's a server grade config on purpose!
>>
>> Vito
>>
>> Il giorno mar 4 giu 2019 alle ore 16:32 Guillaume Lederrey
> ha scritto:
>>>
>>> On Tue, Jun 4, 2019 at 3:14 PM Vi to  wrote:
>>> >
>>> > AFAIR it's a double Xeon E5-2620 v3.
>>> > With modern CPUs frequency is not so significant.
>>>
>>> Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
>>> 2.10GHz (so v4 instead of v3, but the difference is probably
>minimal).
>>>
>>> > Vito
>>> >
>>> > Il giorno mar 4 giu 2019 alle ore 13:00 Adam Sanchez
> ha scritto:
>>> >>
>>> >> Thanks Guillaume!
>>> >> One question more, what is the CPU frequency (GHz)?
>>> >>
>>> >> Le mar. 4 juin 2019 à 12:25, Guillaume Lederrey
>>> >>  a écrit :
>>> >> >
>>> >> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez
> wrote:
>>> >> > >
>>> >> > > Hello,
>>> >> > >
>>> >> > > Does somebody know the minimal hardware requirements (disk
>size and
>>> >> > > RAM) for loading wikidata dump in Blazegraph?
>>> >> >
>>> >> > The actual hardware requirements will depend on your use case.
>But for
>>> >> > comparison, our production servers are:
>>> >> >
>>> >> > * 16 cores (hyper threaded, 32 threads)
>>> >> > * 128G RAM
>>> >> > * 1.5T of SSD storage
>>> >> >
>>> >> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is
>379G.
>>> >> > > The bigdata.jnl file which stores all the triples data in
>Blazegraph
>>> >> > > is 478G but still growing.
>>> >> > > I had 1T disk but is almost full now.
>>> >> >
>>> >> > The current size of our jnl file in production is ~670G.
>>> >> >
>>> >> > Hope that helps!
>>> >> >
>>> >> > Guillaume
>>> >> >
>>> >> > > Thanks,
>>> >> > >
>>> >> > > Adam
>>> >> > >
>>> >> > > ___
>>> >> > > Wikidata mailing list
>>> >> > > Wikidata@lists.wikimedia.org
>>> >> > > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>> >> >
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Guillaume Lederrey
>>> >> > Engineering Manager, Search Platform
>>> >> > Wikimedia Foundation
>>> >> > UTC+2 / CEST
>>> >> >
>>> >> > ___
>>> >> > Wikidata mailing list
>>> >> > Wikidata@lists.wikimedia.org
>>> >> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>> >>
>>> >> ___
>>> >> Wikidata mailing list
>>> >> Wikidata@lists.wikimedia.org
>>> >> 

Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-20 Thread fn
Is that with SSDs? Isn't the bottleneck the I/O traffic to the
disks? (I suppose you are not loading into RAM?) What was your
hardware configuration?


best regards
Finn
http://people.compute.dtu.dk/faan/

On 20/06/2019 14:37, Adam Sanchez wrote:

For your information

a) It took 10.2 days to load the Wikidata RDF dump
(wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
The bigdata.jnl file turned out to be 1.3T

Server technical features

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):16
On-line CPU(s) list:   0-15
Thread(s) per core:2
Core(s) per socket:8
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 79
Model name:Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:  1
CPU MHz:   1200.476
CPU max MHz:   3000.
CPU min MHz:   1200.
BogoMIPS:  4197.65
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  20480K
RAM: 128G

b) It took 43 hours to load the Wikidata RDF dump
(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
07.20.3230.
I had to patch Virtuoso because it was giving the following error each
time I loaded the RDF data

09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
RDF type and a non-geometry content

The virtuoso.db file turned out to be 340G.

Server technical features

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):12
On-line CPU(s) list:   0-11
Thread(s) per core:2
Core(s) per socket:6
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 63
Model name:Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping:  2
CPU MHz:   1199.920
CPU max MHz:   3800.
CPU min MHz:   1200.
BogoMIPS:  6984.39
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  15360K
NUMA node0 CPU(s): 0-11
RAM: 128G

Best,


On Tue, Jun 4, 2019 at 16:37, Vi to  wrote:


V4 has 8 cores instead of 6.

But well, it's a server grade config on purpose!

Vito

On Tue, Jun 4, 2019 at 16:32, Guillaume Lederrey  wrote:


On Tue, Jun 4, 2019 at 3:14 PM Vi to  wrote:


AFAIR it's a double Xeon E5-2620 v3.
With modern CPUs frequency is not so significant.


Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
2.10GHz (so v4 instead of v3, but the difference is probably minimal).


Vito

On Tue, Jun 4, 2019 at 13:00, Adam Sanchez  wrote:


Thanks Guillaume!
One question more, what is the CPU frequency (GHz)?

On Tue, Jun 4, 2019 at 12:25, Guillaume Lederrey  wrote:


On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez  wrote:


Hello,

Does somebody know the minimal hardware requirements (disk size and
RAM) for loading wikidata dump in Blazegraph?


The actual hardware requirements will depend on your use case. But for
comparison, our production servers are:

* 16 cores (hyper threaded, 32 threads)
* 128G RAM
* 1.5T of SSD storage


The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
The bigdata.jnl file which stores all the triples data in Blazegraph
is 478G but still growing.
I had a 1T disk but it is almost full now.


The current size of our jnl file in production is ~670G.

Hope that helps!

 Guillaume


Thanks,

Adam

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata




--
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


___
Wikidata mailing list
Wikidata@lists.wikimedia.org

Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-20 Thread Adam Sanchez
For your information

a) It took 10.2 days to load the Wikidata RDF dump
(wikidata-20190513-all-BETA.ttl, 379G) in Blazegraph 2.1.5.
The bigdata.jnl file turned out to be 1.3T

Server technical features

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):16
On-line CPU(s) list:   0-15
Thread(s) per core:2
Core(s) per socket:8
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 79
Model name:Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
Stepping:  1
CPU MHz:   1200.476
CPU max MHz:   3000.
CPU min MHz:   1200.
BogoMIPS:  4197.65
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  20480K
RAM: 128G

b) It took 43 hours to load the Wikidata RDF dump
(wikidata-20190610-all-BETA.ttl, 383G) in the dev version of Virtuoso
07.20.3230.
I had to patch Virtuoso because it was giving the following error each
time I loaded the RDF data

09:58:06 PL LOG: File /backup/wikidata-20190610-all-BETA.ttl error
42000 TURTLE RDF loader, line 2984680: RDFGE: RDF box with a geometry
RDF type and a non-geometry content

The virtuoso.db file turned out to be 340G.

Server technical features

Architecture:  x86_64
CPU op-mode(s):32-bit, 64-bit
Byte Order:Little Endian
CPU(s):12
On-line CPU(s) list:   0-11
Thread(s) per core:2
Core(s) per socket:6
Socket(s): 1
NUMA node(s):  1
Vendor ID: GenuineIntel
CPU family:6
Model: 63
Model name:Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz
Stepping:  2
CPU MHz:   1199.920
CPU max MHz:   3800.
CPU min MHz:   1200.
BogoMIPS:  6984.39
Virtualization:VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  15360K
NUMA node0 CPU(s): 0-11
RAM: 128G

Best,


On Tue, Jun 4, 2019 at 16:37, Vi to  wrote:
>
> V4 has 8 cores instead of 6.
>
> But well, it's a server grade config on purpose!
>
> Vito
>
> Il giorno mar 4 giu 2019 alle ore 16:32 Guillaume Lederrey 
>  ha scritto:
>>
>> On Tue, Jun 4, 2019 at 3:14 PM Vi to  wrote:
>> >
>> > AFAIR it's a double Xeon E5-2620 v3.
>> > With modern CPUs frequency is not so significant.
>>
>> Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
>> 2.10GHz (so v4 instead of v3, but the difference is probably minimal).
>>
>> > Vito
>> >
>> > Il giorno mar 4 giu 2019 alle ore 13:00 Adam Sanchez 
>> >  ha scritto:
>> >>
>> >> Thanks Guillaume!
>> >> One question more, what is the CPU frequency (GHz)?
>> >>
>> >> Le mar. 4 juin 2019 à 12:25, Guillaume Lederrey
>> >>  a écrit :
>> >> >
>> >> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez  
>> >> > wrote:
>> >> > >
>> >> > > Hello,
>> >> > >
>> >> > > Does somebody know the minimal hardware requirements (disk size and
>> >> > > RAM) for loading wikidata dump in Blazegraph?
>> >> >
>> >> > The actual hardware requirements will depend on your use case. But for
>> >> > comparison, our production servers are:
>> >> >
>> >> > * 16 cores (hyper threaded, 32 threads)
>> >> > * 128G RAM
>> >> > * 1.5T of SSD storage
>> >> >
>> >> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
>> >> > > The bigdata.jnl file which stores all the triples data in Blazegraph
>> >> > > is 478G but still growing.
>> >> > > I had 1T disk but is almost full now.
>> >> >
>> >> > The current size of our jnl file in production is ~670G.
>> >> >
>> >> > Hope that helps!
>> >> >
>> >> > Guillaume
>> >> >
>> >> > > Thanks,
>> >> > >
>> >> > > Adam
>> >> > >
>> >> > > ___
>> >> > > Wikidata mailing list
>> >> > > Wikidata@lists.wikimedia.org
>> >> > > https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Guillaume Lederrey
>> >> > Engineering Manager, Search Platform
>> >> > Wikimedia Foundation
>> >> > UTC+2 / CEST
>> >> >
>> >> > ___
>> >> > Wikidata mailing list
>> >> > Wikidata@lists.wikimedia.org
>> >> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >>
>> >> ___
>> >> Wikidata mailing list
>> >> Wikidata@lists.wikimedia.org
>> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >
>> > ___
>> > Wikidata mailing list
>> > Wikidata@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>>
>> --
>> Guillaume Lederrey
>> Engineering Manager, Search Platform
>> Wikimedia Foundation
>> UTC+2 / CEST
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> 

Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Vi to
V4 has 8 cores instead of 6.

But well, it's a server grade config on purpose!

Vito

On Tue, Jun 4, 2019 at 16:32, Guillaume Lederrey <
gleder...@wikimedia.org> wrote:

> On Tue, Jun 4, 2019 at 3:14 PM Vi to  wrote:
> >
> > AFAIR it's a double Xeon E5-2620 v3.
> > With modern CPUs frequency is not so significant.
>
> Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
> 2.10GHz (so v4 instead of v3, but the difference is probably minimal).
>
> > Vito
> >
> > Il giorno mar 4 giu 2019 alle ore 13:00 Adam Sanchez <
> a.sanche...@gmail.com> ha scritto:
> >>
> >> Thanks Guillaume!
> >> One question more, what is the CPU frequency (GHz)?
> >>
> >> Le mar. 4 juin 2019 à 12:25, Guillaume Lederrey
> >>  a écrit :
> >> >
> >> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez 
> wrote:
> >> > >
> >> > > Hello,
> >> > >
> >> > > Does somebody know the minimal hardware requirements (disk size and
> >> > > RAM) for loading wikidata dump in Blazegraph?
> >> >
> >> > The actual hardware requirements will depend on your use case. But for
> >> > comparison, our production servers are:
> >> >
> >> > * 16 cores (hyper threaded, 32 threads)
> >> > * 128G RAM
> >> > * 1.5T of SSD storage
> >> >
> >> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
> >> > > The bigdata.jnl file which stores all the triples data in Blazegraph
> >> > > is 478G but still growing.
> >> > > I had 1T disk but is almost full now.
> >> >
> >> > The current size of our jnl file in production is ~670G.
> >> >
> >> > Hope that helps!
> >> >
> >> > Guillaume
> >> >
> >> > > Thanks,
> >> > >
> >> > > Adam
> >> > >
> >> > > ___
> >> > > Wikidata mailing list
> >> > > Wikidata@lists.wikimedia.org
> >> > > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >> >
> >> >
> >> >
> >> > --
> >> > Guillaume Lederrey
> >> > Engineering Manager, Search Platform
> >> > Wikimedia Foundation
> >> > UTC+2 / CEST
> >> >
> >> > ___
> >> > Wikidata mailing list
> >> > Wikidata@lists.wikimedia.org
> >> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >>
> >> ___
> >> Wikidata mailing list
> >> Wikidata@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Guillaume Lederrey
On Tue, Jun 4, 2019 at 3:14 PM Vi to  wrote:
>
> AFAIR it's a double Xeon E5-2620 v3.
> With modern CPUs frequency is not so significant.

Our latest batch of servers are: Intel(R) Xeon(R) CPU E5-2620 v4 @
2.10GHz (so v4 instead of v3, but the difference is probably minimal).

> Vito
>
> Il giorno mar 4 giu 2019 alle ore 13:00 Adam Sanchez  
> ha scritto:
>>
>> Thanks Guillaume!
>> One question more, what is the CPU frequency (GHz)?
>>
>> Le mar. 4 juin 2019 à 12:25, Guillaume Lederrey
>>  a écrit :
>> >
>> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez  wrote:
>> > >
>> > > Hello,
>> > >
>> > > Does somebody know the minimal hardware requirements (disk size and
>> > > RAM) for loading wikidata dump in Blazegraph?
>> >
>> > The actual hardware requirements will depend on your use case. But for
>> > comparison, our production servers are:
>> >
>> > * 16 cores (hyper threaded, 32 threads)
>> > * 128G RAM
>> > * 1.5T of SSD storage
>> >
>> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
>> > > The bigdata.jnl file which stores all the triples data in Blazegraph
>> > > is 478G but still growing.
>> > > I had 1T disk but is almost full now.
>> >
>> > The current size of our jnl file in production is ~670G.
>> >
>> > Hope that helps!
>> >
>> > Guillaume
>> >
>> > > Thanks,
>> > >
>> > > Adam
>> > >
>> > > ___
>> > > Wikidata mailing list
>> > > Wikidata@lists.wikimedia.org
>> > > https://lists.wikimedia.org/mailman/listinfo/wikidata
>> >
>> >
>> >
>> > --
>> > Guillaume Lederrey
>> > Engineering Manager, Search Platform
>> > Wikimedia Foundation
>> > UTC+2 / CEST
>> >
>> > ___
>> > Wikidata mailing list
>> > Wikidata@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Vi to
AFAIR it's a double Xeon E5-2620 v3.
With modern CPUs frequency is not so significant.

Vito

On Tue, Jun 4, 2019 at 13:00, Adam Sanchez 
wrote:

> Thanks Guillaume!
> One question more, what is the CPU frequency (GHz)?
>
> Le mar. 4 juin 2019 à 12:25, Guillaume Lederrey
>  a écrit :
> >
> > On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez 
> wrote:
> > >
> > > Hello,
> > >
> > > Does somebody know the minimal hardware requirements (disk size and
> > > RAM) for loading wikidata dump in Blazegraph?
> >
> > The actual hardware requirements will depend on your use case. But for
> > comparison, our production servers are:
> >
> > * 16 cores (hyper threaded, 32 threads)
> > * 128G RAM
> > * 1.5T of SSD storage
> >
> > > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
> > > The bigdata.jnl file which stores all the triples data in Blazegraph
> > > is 478G but still growing.
> > > I had 1T disk but is almost full now.
> >
> > The current size of our jnl file in production is ~670G.
> >
> > Hope that helps!
> >
> > Guillaume
> >
> > > Thanks,
> > >
> > > Adam
> > >
> > > ___
> > > Wikidata mailing list
> > > Wikidata@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> >
> >
> > --
> > Guillaume Lederrey
> > Engineering Manager, Search Platform
> > Wikimedia Foundation
> > UTC+2 / CEST
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Adam Sanchez
Thanks, Guillaume!
One more question: what is the CPU frequency (GHz)?

On Tue, Jun 4, 2019 at 12:25, Guillaume Lederrey  wrote:
>
> On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez  wrote:
> >
> > Hello,
> >
> > Does somebody know the minimal hardware requirements (disk size and
> > RAM) for loading wikidata dump in Blazegraph?
>
> The actual hardware requirements will depend on your use case. But for
> comparison, our production servers are:
>
> * 16 cores (hyper threaded, 32 threads)
> * 128G RAM
> * 1.5T of SSD storage
>
> > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
> > The bigdata.jnl file which stores all the triples data in Blazegraph
> > is 478G but still growing.
> > I had a 1T disk but it is almost full now.
>
> The current size of our jnl file in production is ~670G.
>
> Hope that helps!
>
> Guillaume
>
> > Thanks,
> >
> > Adam
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Marco Neumann
Thanks, Guillaume. How does that compare to the footprint of the
Wikidata service (SQL), not WDQS? I presume it sits in a MyISAM storage
container?

On Tue, Jun 4, 2019 at 11:25 AM Guillaume Lederrey 
wrote:

> On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez 
> wrote:
> >
> > Hello,
> >
> > Does somebody know the minimal hardware requirements (disk size and
> > RAM) for loading wikidata dump in Blazegraph?
>
> The actual hardware requirements will depend on your use case. But for
> comparison, our production servers are:
>
> * 16 cores (hyper threaded, 32 threads)
> * 128G RAM
> * 1.5T of SSD storage
>
> > The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
> > The bigdata.jnl file which stores all the triples data in Blazegraph
> > is 478G but still growing.
> > I had a 1T disk but it is almost full now.
>
> The current size of our jnl file in production is ~670G.
>
> Hope that helps!
>
> Guillaume
>
> > Thanks,
> >
> > Adam
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> Guillaume Lederrey
> Engineering Manager, Search Platform
> Wikimedia Foundation
> UTC+2 / CEST
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>


-- 


---
Marco Neumann
KONA
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] minimal hardware requirements for loading wikidata dump in Blazegraph

2019-06-04 Thread Guillaume Lederrey
On Tue, Jun 4, 2019 at 12:18 PM Adam Sanchez  wrote:
>
> Hello,
>
> Does somebody know the minimal hardware requirements (disk size and
> RAM) for loading wikidata dump in Blazegraph?

The actual hardware requirements will depend on your use case. But for
comparison, our production servers are:

* 16 cores (hyper threaded, 32 threads)
* 128G RAM
* 1.5T of SSD storage

> The downloaded dump file wikidata-20190513-all-BETA.ttl is 379G.
> The bigdata.jnl file which stores all the triples data in Blazegraph
> is 478G but still growing.
> I had a 1T disk but it is almost full now.

The current size of our jnl file in production is ~670G.

Hope that helps!

Guillaume

> Thanks,
>
> Adam
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 
Guillaume Lederrey
Engineering Manager, Search Platform
Wikimedia Foundation
UTC+2 / CEST

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata