Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Paul Houle
You shouldn't have to keep anything in RAM to HDT-ize a dataset: you 
could build the dictionary by sorting on disk, and then do the joins 
that look every term up against the dictionary by sorting as well.
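
To make that concrete, here is a minimal Python sketch of the idea (not 
the HDT project's actual tooling): the dictionary is built with an 
external, disk-based sort, and the same sort-merge pattern would then 
join the triples against it. The tab-separated input layout, file names 
and chunk size are illustrative assumptions.

    # Sketch only: build an HDT-style term dictionary with bounded memory.
    import heapq
    import itertools
    import os
    import tempfile

    CHUNK = 5_000_000  # terms held in memory per sorted run; tune to your RAM


    def external_sort(items):
        """Yield items in sorted order using sorted runs on disk + a k-way merge."""
        items = iter(items)
        runs = []
        while True:
            chunk = sorted(itertools.islice(items, CHUNK))
            if not chunk:
                break
            run = tempfile.NamedTemporaryFile("w", delete=False, suffix=".run")
            run.writelines(term + "\n" for term in chunk)
            run.close()
            runs.append(run.name)
        files = [open(path) for path in runs]
        try:
            for line in heapq.merge(*files):
                yield line.rstrip("\n")
        finally:
            for f in files:
                f.close()
            for path in runs:
                os.remove(path)


    def build_dictionary(triples_path, dict_path):
        """Pass 1: assign consecutive IDs to the sorted, deduplicated terms."""
        def terms():
            with open(triples_path) as f:
                for line in f:
                    yield from line.rstrip("\n").split("\t")  # s, p, o columns

        with open(dict_path, "w") as out:
            previous, next_id = None, 0
            for term in external_sort(terms()):
                if term != previous:  # sorted order turns dedup into one comparison
                    out.write(f"{term}\t{next_id}\n")
                    next_id += 1
                    previous = term

    # Pass 2 (not shown) would externally sort the triples themselves and
    # replace each term with its ID via the same sort-merge join on dict_path.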


-- Original Message --
From: "Ettore RIZZA" 
To: "Discussion list for the Wikidata project." 


Sent: 10/1/2018 5:03:59 PM
Subject: Re: [Wikidata] Wikidata HDT dump

> what computer did you use for this? IIRC it required >512GB of RAM to
> function.


Hello Laura,

Sorry for my confusing message, I am not at all a member of the HDT 
team. But according to its creator, 100 GB "with an optimized code" 
could be enough to produce an HDT like that.


On Mon, 1 Oct 2018 at 18:59, Laura Morales wrote:
> a new dump of Wikidata in HDT (with index) is
> available [http://www.rdfhdt.org/datasets/].


Thank you very much! Keep it up!
Out of curiosity, what computer did you use for this? IIRC it 
required >512GB of RAM to function.


> You will see how Wikidata has become huge compared to other datasets.
> It contains about twice the limit of 4B triples discussed above.


There is a 64-bit version of HDT that doesn't have this limitation of 
4B triples.


> In this regard, what is in 2018 the most user-friendly way to use
> this format?


Speaking for me at least, Fuseki with an HDT store. But I know there 
are also some CLI tools from the HDT folks.


___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Ettore RIZZA
> what computer did you use for this? IIRC it required >512GB of RAM to
> function.

Hello Laura,

Sorry for my confusing message, I am not at all a member of the HDT team.
But according to its creator, 100 GB "with an optimized code" could be
enough to produce an HDT like that.

On Mon, 1 Oct 2018 at 18:59, Laura Morales wrote:

> > a new dump of Wikidata in HDT (with index) is available
> > [http://www.rdfhdt.org/datasets/].
>
> Thank you very much! Keep it up!
> Out of curiosity, what computer did you use for this? IIRC it
> required >512GB of RAM to function.
>
> > You will see how Wikidata has become huge compared to other datasets.
> > It contains about twice the limit of 4B triples discussed above.
>
> There is a 64-bit version of HDT that doesn't have this limitation of 4B
> triples.
>
> > In this regard, what is in 2018 the most user-friendly way to use
> > this format?
>
> Speaking for me at least, Fuseki with an HDT store. But I know there are
> also some CLI tools from the HDT folks.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Laura Morales
> a new dump of Wikidata in HDT (with index) is
> available [http://www.rdfhdt.org/datasets/].

Thank you very much! Keep it up!
Out of curiosity, what computer did you use for this? IIRC it required >512GB 
of RAM to function.

> You will see how Wikidata has become huge compared to other datasets.
> It contains about twice the limit of 4B triples discussed above.

There is a 64-bit version of HDT that doesn't have this limitation of 4B 
triples.

> In this regard, what is in 2018 the most user-friendly way to use this format?

Speaking for me at least, Fuseki with an HDT store. But I know there are also 
some CLI tools from the HDT folks.
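
For anyone who prefers a scripting route over Fuseki, here is a minimal 
sketch using the pyHDT Python bindings (pip install hdt); the file name 
and the Q42 lookup are illustrative assumptions only, not part of any 
setup described in this thread.

    # Minimal sketch, assuming the pyHDT package and a local wikidata.hdt file.
    from hdt import HDTDocument

    doc = HDTDocument("wikidata.hdt")   # memory-maps the HDT file
    # Empty strings act as wildcards; here: up to ten triples about Q42.
    triples, cardinality = doc.search_triples(
        "http://www.wikidata.org/entity/Q42", "", "", limit=10)
    print("estimated matches:", cardinality)
    for s, p, o in triples:
        print(s, p, o)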

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Weekly Summary #332

2018-10-01 Thread Léa Lacroix
*Here's your quick overview of what has been happening around Wikidata over
the last week.*

Discussions

   - Closed request for comments: P171, When multiple sources are cited
     for a fact, should IMDB be deleted as one of them when used,
     Findagrave removed as a source for information, How would you state
     the number of (rooms, restaurants, suites) in a hotel?, Updating
     References for External Data

Events

   - Past: IRC office hour, September 25th
   - Past: Working in a World of (linked, semantic) Open Data. Keynote by
     User:MartinPoulter at University of Stirling Life in Data Conference,
     September 28th
   - Wikidata and Wikimedia workshop (30 September) and session about data
     modeling (2 October) at the CIDOC 2018 Conference of the International
     Council of Museums, Heraklion, Crete
   - There were three Wikidata-related presentations at the 10th
     International Conference on Ecological Informatics that took place on
     23-28 September in Jena.
   - Upcoming: German-speaking WikiCon, October 3-5 in St Gallen
     (Switzerland). Several Wikidata-related talks and workshops in the
     programme.

Press, articles, blog posts

   - Property Path use in Wikidata Queries, by Gregory Todd Williams

Other Noteworthy Stuff

   - The templates Reasonator and Scholia are available, for linking *from*
     en.Wikisource pages *to* representations of Wikidata items.
   - Wikidata considered unable to support hierarchical search in
     Structured Data for Commons, see also thread.

Did you know?

   - Newest properties:
  - General datatypes: ENI number, inflection class, has inflection
    class, root, creates lexeme type
  - External identifiers: ASCE Historical Civil Engineering Landmark ID,
    Comic Vine ID, DxOMark ID, Geheugen van de VU person ID, HKCAN ID,
    Oqaasileriffik online dictionary ID, IANA Root Zone Database ID,
    Shazam track ID, Spotify show ID, Shazam artist ID, Sprockhoff Number,
    Index of Historic Collectors and Dealers of Cubism ID, ANZSRC FoR ID
   - New property proposals to review:
  - General datatypes: applies to ci

Re: [Wikidata] Wikidata HDT dump

2018-10-01 Thread Ettore RIZZA
Hello,

a new dump of Wikidata in HDT (with index) is available
[http://www.rdfhdt.org/datasets/]. You will see how Wikidata has become
huge compared to other datasets. It contains about twice the limit of 4B
triples discussed above.
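
As a back-of-the-envelope check of the "about twice the limit" remark 
(the 8B figure below is only that rough doubling, not an exact count): 
the 4B ceiling is simply what a 32-bit index can address.

    max_32bit_entries = 2 ** 32          # 4,294,967,296 -- the ~4B ceiling
    approx_wikidata = 8_000_000_000      # rough "about twice the limit" figure
    print(approx_wikidata / max_32bit_entries)  # ~1.86, hence 64-bit indexes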

In this regard, what is in 2018 the most user-friendly way to use this
format?

BR,

Ettore

On Tue, 7 Nov 2017 at 15:33, Ghislain ATEMEZING
<ghislain.atemez...@gmail.com> wrote:

> Hi Jeremie,
>
> Thanks for this info.
>
> In the meantime, what about making chunks of 3.5 billion triples (or any
> size less than 4 billion) and a script to convert the dataset? Would that
> be possible?
>
> Best,
>
> Ghislain
>
> Sent from Mail for Windows 10
>
> *From:* Jérémie Roquet
> *Sent:* Tuesday, 7 November 2017, 15:25
> *To:* Discussion list for the Wikidata project.
> *Subject:* Re: [Wikidata] Wikidata HDT dump
>
> Hi everyone,
>
> I'm afraid the current implementation of HDT is not ready to handle
> more than 4 billion triples, as it is limited to 32-bit indexes. I've
> opened an issue upstream: https://github.com/rdfhdt/hdt-cpp/issues/135
>
> Until this is addressed, don't waste your time trying to convert the
> entire Wikidata to HDT: it can't work.
>
> --
> Jérémie
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata