Re: [Wikidata] Concise/Notable Wikidata Dump

2019-12-18 Thread Edgard Marx
It certainly helps; however, I think Aidan's suggestion goes in the
direction of having an official dump distribution.

Imagine how much CO2 could be spared just by avoiding the computational
resources needed to recreate this dump every time someone needs it.

Besides, it would standardise the dataset used for research purposes.

On Wed, Dec 18, 2019, 11:26 Marco Fossati  wrote:

> Hi everyone,
>
> Benno (in CC) has recently announced this tool:
> https://tools.wmflabs.org/wdumps/
>
> I haven't checked it out yet, but it sounds related to Aidan's inquiry.
> Hope this helps.
>
> Cheers,
>
> Marco
>
> On 12/18/19 8:01 AM, Edgard Marx wrote:
> > +1
> >
> > On Tue, Dec 17, 2019, 19:14 Aidan Hogan <aid...@gmail.com> wrote:
> >
> > Hey all,
> >
> > As someone who likes to use Wikidata in their research, and likes to
> > give students projects relating to Wikidata, I am finding it more and
> > more difficult to (recommend to) work with recent versions of Wikidata
> > due to the increasing dump sizes, where even the truthy version now
> > costs considerable time and machine resources to process and handle. In
> > some cases we just grin and bear the costs, while in other cases we
> > apply an ad hoc sampling to be able to play around with the data and try
> > things quickly.
> >
> > More generally, I think the growing data volumes might inadvertently
> > scare people off taking the dumps and using them in their research.
> >
> > One idea we had recently to reduce the data size for a student project
> > while keeping the most notable parts of Wikidata was to only keep claims
> > that involve an item linked to Wikipedia; in other words, if the
> > statement involves a Q item (in the "subject" or "object") not linked to
> > Wikipedia, the statement is removed.
> >
> > I wonder would it be possible for Wikidata to provide such a dump to
> > download (e.g., in RDF) for people who prefer to work with a more
> > concise sub-graph that still maintains the most "notable" parts? While
> > of course one could compute this from the full-dump locally, making such
> > a version available as a dump directly would save clients some
> > resources, potentially encourage more research using/on Wikidata, and
> > having such a version "rubber-stamped" by Wikidata would also help to
> > justify the use of such a dataset for research purposes.
> >
> > ... just an idea I thought I would float out there. Perhaps there is
> > another (better) way to define a concise dump.
> >
> > Best,
> > Aidan
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org <mailto:Wikidata@lists.wikimedia.org>
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
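
For readers who want to try Aidan's idea straight away, here is a minimal
sketch of the filter as a SPARQL CONSTRUCT over a locally loaded truthy dump
(it is far too heavy for query.wikidata.org); the prefixes follow the standard
Wikidata RDF exports, and the sitelink test uses schema:about, which matches
any Wikimedia project, so add a check on schema:isPartOf if you want
Wikipedia links only:

PREFIX wd:     <http://www.wikidata.org/entity/>
PREFIX wdt:    <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>

CONSTRUCT { ?s ?p ?o }
WHERE
{
   ?s ?p ?o .
   # keep truthy (direct) claims only
   FILTER(STRSTARTS(STR(?p), STR(wdt:)))
   # the subject item must be the target of at least one sitelink
   FILTER EXISTS { ?pageS schema:about ?s . }
   # if the object is an entity in the wd: namespace, it must be linked as well
   FILTER(!(isIRI(?o) && STRSTARTS(STR(?o), STR(wd:)))
          || EXISTS { ?pageO schema:about ?o . })
}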


Re: [Wikidata] Concise/Notable Wikidata Dump

2019-12-17 Thread Edgard Marx
+1

On Tue, Dec 17, 2019, 19:14 Aidan Hogan  wrote:

> Hey all,
>
> As someone who likes to use Wikidata in their research, and likes to
> give students projects relating to Wikidata, I am finding it more and
> more difficult to (recommend to) work with recent versions of Wikidata
> due to the increasing dump sizes, where even the truthy version now
> costs considerable time and machine resources to process and handle. In
> some cases we just grin and bear the costs, while in other cases we
> apply an ad hoc sampling to be able to play around with the data and try
> things quickly.
>
> More generally, I think the growing data volumes might inadvertently
> scare people off taking the dumps and using them in their research.
>
> One idea we had recently to reduce the data size for a student project
> while keeping the most notable parts of Wikidata was to only keep claims
> that involve an item linked to Wikipedia; in other words, if the
> statement involves a Q item (in the "subject" or "object") not linked to
> Wikipedia, the statement is removed.
>
> I wonder would it be possible for Wikidata to provide such a dump to
> download (e.g., in RDF) for people who prefer to work with a more
> concise sub-graph that still maintains the most "notable" parts? While
> of course one could compute this from the full-dump locally, making such
> a version available as a dump directly would save clients some
> resources, potentially encourage more research using/on Wikidata, and
> having such a version "rubber-stamped" by Wikidata would also help to
> justify the use of such a dataset for research purposes.
>
> ... just an idea I thought I would float out there. Perhaps there is
> another (better) way to define a concise dump.
>
> Best,
> Aidan
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] dcatap namespace in WDQS

2019-07-30 Thread Edgard Marx
Great to see ideas envisioned in a paper [1], such as data catalogs and
KNS (Knowledge Name Services), being implemented on major knowledge graphs
and having a profound impact on publishing and querying data on the Web.

[1] https://ieeexplore.ieee.org/document/7889519

:-)

On Mon, Jul 29, 2019 at 9:03 AM Federico Leva (Nemo) 
wrote:

> Stas Malyshev, 29/07/19 04:14:
> > As part of our Wikidata Query Service setup, we maintain the namespace
> > serving DCAT-AP (DCAT Application Profile) data[1].
>
> How many of the endpoints we federate with support DCAT-AP? I suppose
> federated queries may benefit the most from it.
>
> DCAT-AP is allegedly taking off and it's given great importance for
> instance in the EU Open data maturity ranking
> <https://www.europeandataportal.eu/en/highlights/measurement-open-data-maturity-europe>:
>
> Italy, which is otherwise a laggard on open data, was scored high
> because its national portal embraced DCAT-AP.
>
> Federico
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Edgard Marx
On Sat, Oct 28, 2017 at 2:31 PM, Laura Morales  wrote:

> > KBox is an alternative to other existing architectures for publishing KBs,
> > such as SPARQL endpoints (e.g. LDFragments, Virtuoso) and dump files.
> > I should add that you can do federated queries with KBox as easily as
> > you can with SPARQL endpoints.
>
>
> OK, but I still fail to see what is the value of this? What's the reason
> why I'd want to use it rather than just start a Fuseki endpoint, or use
> linked-fragments?
>

I agree that KBox is not suited to all scenarios; rather, it fits users who
query a KG frequently and do not want to spend time downloading and indexing
dump files. KBox takes care of this cumbersome task and shifts query execution
to the client, so there are no server-side scalability issues.
BTW, if you want to work with JavaScript you can also simply start a local
endpoint:

https://github.com/AKSW/KBox/blob/master/README.md#starting-a-sparql-endpoint
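
Because an endpoint started this way speaks the plain SPARQL protocol, you can
also combine it with the public Wikidata query service through ordinary SPARQL
1.1 federation. This is a generic sketch, not KBox's own multi-base mechanism,
and the localhost URL is only a placeholder for wherever your local endpoint
actually listens:

PREFIX wd:   <http://www.wikidata.org/entity/>
PREFIX wdt:  <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?item ?label
WHERE
{
   # placeholder URL for the local endpoint
   SERVICE <http://localhost:8080/sparql> {
      ?item wdt:P31 wd:Q5 .
   }
   # fetch English labels from the public service
   SERVICE <https://query.wikidata.org/sparql> {
      ?item rdfs:label ?label .
      FILTER(lang(?label) = 'en')
   }
}
LIMIT 10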


>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Edgard Marx
Hoi Laura,

Thanks for the opportunity to clarify it.
KBox is an alternative to other existing architectures for publishing KBs,
such as SPARQL endpoints (e.g. LDFragments, Virtuoso) and dump files.
I should add that you can do federated queries with KBox as easily as you
can with SPARQL endpoints.
Here is an example:

https://github.com/AKSW/KBox#how-can-i-query-multi-bases

You can use KBox either through its Java API or from the command prompt.

best,

http://emarx.org

On Sat, Oct 28, 2017 at 1:16 PM, Laura Morales  wrote:

> > No, the idea is that each organization will have its own KNS, so users
> can add the KNS that they want.
>
> How would this compare with a traditional SPARQL endpoint + "federated
> queries", or with "linked fragments"?
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata HDT dump

2017-10-28 Thread Edgard Marx
Hoi Ghislain,

On Sat, Oct 28, 2017 at 9:54 AM, Ghislain ATEMEZING <
ghislain.atemez...@gmail.com> wrote:

> Hello emarx,
> Many thanks for sharing KBox. Very interesting project!
>

thanks


> One question, how do you deal with different versions of the KB, like the
> case here of wikidata dump?
>

KBox works with so-called KNS (Knowledge Name Service) servers, so any
dataset publisher can have their own KNS.
Each dataset has its own KN (Knowledge Name) that is distributed over the
KNS.
E.g., the Wikidata dump is https://www.wikidata.org/20160801.


> Do you fetch their repo every xx time?
>

No, the idea is that each organization will have its own KNS, so users can
add the KNS servers that they want.
Currently all datasets available in the KBox KNS are served by the KBox team.
You can check all of them at kbox.tech, or using the command line (
https://github.com/AKSW/KBox#how-can-i-list-available-knowledge-bases).


> Also, for avoiding your users to re-create the models, you can pre-load
> "models" from LOV catalog.
>

We plan to share all LOD datasets in KBox; we are currently discussing
this with the W3C, and DBpedia might have its own KNS soon.
Regarding the LOV catalog, you can help by simply asking them to publish their
catalog in KBox.

best,

http://emarx.org


>
> Cheers,
> Ghislain
>
> 2017-10-27 21:56 GMT+02:00 Edgard Marx <digam...@gmail.com>:
>
>> Hey guys,
>>
>> I don't know if you already knew about it,
>> but you can use KBox for Wikidata, DBpedia, Freebase, Lodstats...
>>
>> https://github.com/AKSW/KBox
>>
>> And yes, you can also use it to merge your graph with one of those
>>
>> https://github.com/AKSW/KBox#how-can-i-query-multi-bases
>>
>>  cheers,
>> 
>>
>>
>>
>> On Oct 27, 2017 21:02, "Jasper Koehorst" <jasperkoeho...@gmail.com>
>> wrote:
>>
>> I will look into the size of the jnl file but should that not be located
>> where the blazegraph is running from the sparql endpoint or is this a
>> special flavour?
>> Was also thinking of looking into a gitlab runner which occasionally
>> could generate a HDT file from the ttl dump if our server can handle it but
>> for this an md5 sum file would be preferable or should a timestamp be
>> sufficient?
>>
>> Jasper
>>
>>
>> > On 27 Oct 2017, at 18:58, Jérémie Roquet <jroq...@arkanosis.net> wrote:
>> >
>> > 2017-10-27 18:56 GMT+02:00 Jérémie Roquet <jroq...@arkanosis.net>:
>> >> 2017-10-27 18:51 GMT+02:00 Luigi Assom <itsawesome@gmail.com>:
>> >>> I found and share this resource:
>> >>> http://www.rdfhdt.org/datasets/
>> >>>
>> >>> there is also Wikidata dump in HDT
>> >>
>> >> The link to the Wikidata dump seems dead, unfortunately :'(
>> >
>> > … but there's a file on the server:
>> > http://gaia.infor.uva.es/hdt/wikidata-20170313-all-BETA.hdt.gz (ie.
>> > the link was missing the “.gz”)
>> >
>> > --
>> > Jérémie
>> >
>> > ___
>> > Wikidata mailing list
>> > Wikidata@lists.wikimedia.org
>> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
>
> --
>
> "*Love all, trust a few, do wrong to none*" (W. Shakespeare)
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata - short biographies

2016-02-02 Thread Edgard Marx
Hey,

I recommend not posting questions about third-party systems or software
that are not related to Wikidata or Wikimedia here.
In the case of RDFSlice there is an issues page (
https://bitbucket.org/emarx/rdfslice/issues),
where you can open an issue and someone will answer you.

I also advise you to post your command line or the error, so the developers
can better understand it and quickly fix it (if there is a problem).

best regards,
Edgard

On Tue, Feb 2, 2016 at 7:18 AM, Hampton Snowball <hamptonsnowb...@gmail.com>
wrote:

> I was able to semi-successfully use RDFSlice with the dump using Windows
> command prompt.  Only, maybe because it's a 5gb dump file I am getting java
> errors line after line as it goes through the file
> (java.lang.StringIndexOutOfBoundsException: String index out of range - 1.
> Sometimes the last number changes).
>
> I thought it might be a memory issue.  Increasing memory with the
> -Xmx2G command (or 3G, 4G) I haven't had luck with.  Any tips would be
> appreciated.
>
> Thanks
>
> On Mon, Feb 1, 2016 at 7:28 PM, Hampton Snowball <
> hamptonsnowb...@gmail.com> wrote:
>
>> Of course I meant sorry if this is a dumb question :)
>>
>>
>>
>> On Mon, Feb 1, 2016 at 7:13 PM, Hampton Snowball <
>> hamptonsnowb...@gmail.com> wrote:
>>
>>> Sorry if this is a dump question (I'm not a developer).  To run the
>>> command on the rdfslice program in mentions (" java -jar rdfslice.jar
>>> -source | -patterns  -out  -order
>>>  -debug ), can this be done with windows command
>>> prompt? or do I need some special developer version of java/console?
>>>
>>> Thanks for the tool.
>>>
>>> On Sun, Jan 31, 2016 at 3:53 PM, Edgard Marx <
>>> m...@informatik.uni-leipzig.de> wrote:
>>>
>>>> Hey,
>>>> you can simple use RDFSlice (
>>>> https://bitbucket.org/emarx/rdfslice/overview) directly on the dump
>>>> file (https://dumps.wikimedia.org/wikidatawiki/entities/20160125/)
>>>>
>>>> best,
>>>> Edgard
>>>>
>>>> On Sun, Jan 31, 2016 at 7:43 PM, Hampton Snowball <
>>>> hamptonsnowb...@gmail.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I am interested in a subset of wikidata and I am trying to find the
>>>>> best way to get it without getting a larger dataset then necessary.
>>>>>
>>>>> Is there a way to just get the "bios" that appear on the wikidata
>>>>> pages below the name of the person/organization, as well as the link to 
>>>>> the
>>>>> english wikipedia page / or all wikipedia pages?
>>>>>
> >>>>> For example from: https://www.wikidata.org/wiki/Q1652291
>>>>>
>>>>> "Turkish female given name"
>>>>> https://en.wikipedia.org/wiki/H%C3%BClya
>>>>> and optionally https://de.wikipedia.org/wiki/H%C3%BClya
>>>>>
>>>>> I know there is SPARQL which previously this list helped me construct
>>>>> a query, but I know some requests seem to timeout when looking at a large
>>>>> amount of data so I am not sure this would work.
>>>>>
>>>>> The dumps I know are the full dataset, but I am not sure if there's
>>>>> any other subset dumps available or better way of grabbing this data
>>>>>
>>>>> Thanks in advance,
>>>>> HS
>>>>>
>>>>>
>>>>> ___
>>>>> Wikidata mailing list
>>>>> Wikidata@lists.wikimedia.org
>>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>>
>>>>>
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>>
>>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Edgard Marx
Yep,

Please note that RDFSlice will extract the subset, that is, the triples that
contain the property you are looking for.
Here are three example SPARQL queries:

PS: you can try them at https://query.wikidata.org.

* For your example:

SELECT *
WHERE
{
   <http://www.wikidata.org/entity/Q1652291>  <http://schema.org/description>
?o .
filter(lang(?o)='en').
}


* For all English bios:

SELECT *
WHERE
{
   ?s <http://schema.org/description> ?o .
   filter(lang(?o)='en').
}

* For all language bios:

SELECT *
WHERE
{
   <http://www.wikidata.org/entity/Q1652291>  <http://schema.org/description>
?o .
}


best,
Edgard



On Mon, Feb 1, 2016 at 4:34 AM, Hampton Snowball <hamptonsnowb...@gmail.com>
wrote:

> Thanks. I see it requires constructing a query to only extract the data
> you want. E.g. the graph pattern:
>
>  - desired query, e.g. "SELECT * WHERE {?s ?p ?o}" or graph
> pattern e.g. "{?s ?p ?o}"
>
> Since I don't know about constructing queries, would you be able to tell
> me what would be the proper query to extract from all the pages the short
> bio, english wikipedia, maybe other wikipedias?
>
> For example from: https://www.wikidata.org/wiki/Q1652291
>
> "Turkish female given name"
> https://en.wikipedia.org/wiki/H%C3%BClya
> and optionally https://de.wikipedia.org/wiki/H%C3%BClya
>
> Thanks in advance!
>
>
> On Sun, Jan 31, 2016 at 3:53 PM, Edgard Marx <
> m...@informatik.uni-leipzig.de> wrote:
>
>> Hey,
>> you can simply use RDFSlice (
>> https://bitbucket.org/emarx/rdfslice/overview) directly on the dump file
>> (https://dumps.wikimedia.org/wikidatawiki/entities/20160125/)
>>
>> best,
>> Edgard
>>
>> On Sun, Jan 31, 2016 at 7:43 PM, Hampton Snowball <
>> hamptonsnowb...@gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I am interested in a subset of wikidata and I am trying to find the best
>>> way to get it without getting a larger dataset then necessary.
>>>
>>> Is there a way to just get the "bios" that appear on the wikidata pages
>>> below the name of the person/organization, as well as the link to the
>>> english wikipedia page / or all wikipedia pages?
>>>
> >>> For example from: https://www.wikidata.org/wiki/Q1652291
>>>
>>> "Turkish female given name"
>>> https://en.wikipedia.org/wiki/H%C3%BClya
>>> and optionally https://de.wikipedia.org/wiki/H%C3%BClya
>>>
>>> I know there is SPARQL which previously this list helped me construct a
>>> query, but I know some requests seem to timeout when looking at a large
>>> amount of data so I am not sure this would work.
>>>
>>> The dumps I know are the full dataset, but I am not sure if there's any
>>> other subset dumps available or better way of grabbing this data
>>>
>>> Thanks in advance,
>>> HS
>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata - short biographies

2016-02-01 Thread Edgard Marx
Wikidata seems to be heavily queried right now; it is a public endpoint.

However, the query below might work in RDFSlice:

PS: notice that the subject variable (?article) contains the Wikipedia link,
and it will be extracted.

SELECT *
WHERE
{
   ?article <http://schema.org/description> ?o .
   ?article <http://schema.org/about> ?o1 .
   ?article <http://www.w3.org/2000/01/rdf-schema#label> ?o2 .
}

best,
Edgard
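
For the public query service specifically, the join Hampton attempts in the
quoted message below can also be written with the standard sitelink modelling
of the Wikidata RDF exports (schema:about, schema:inLanguage, schema:isPartOf)
instead of the SUBSTR filter. A hedged sketch, with a LIMIT so it stays within
the timeout on query.wikidata.org (drop the LIMIT only when slicing a local
dump):

PREFIX schema: <http://schema.org/>

SELECT ?item ?bio ?article
WHERE
{
   ?item schema:description ?bio .
   FILTER(lang(?bio) = 'en')
   # sitelink node pointing to the item, restricted to English Wikipedia
   ?article schema:about ?item .
   ?article schema:inLanguage "en" .
   ?article schema:isPartOf <https://en.wikipedia.org/> .
}
LIMIT 10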

On Mon, Feb 1, 2016 at 5:12 PM, Hampton Snowball <hamptonsnowb...@gmail.com>
wrote:

> Thank you. This will give me the bios, however, I still want the
> associated wikipedia links.  Previously someone had given me a query that
> included the english wikipedia along with another property. You can see it
> below:
>
>
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX schema: <http://schema.org/>
>
> SELECT ?item  ?twitter ?article WHERE {
>   ?item wdt:P2002 ?twitter
>   OPTIONAL {?item rdfs:label ?item_label filter (lang(?item_label) = "en")
> .}
>
>   ?article schema:about ?item .
>   ?article schema:inLanguage "en" .
>   FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
>
>  }
> ORDER BY ASC (?article)
>
>
> *I tried to take the PREFIX header and this portion to append to some of
> your queries.  *
>
>   ?article schema:about ?item .
>   ?article schema:inLanguage "en" .
>   FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
>
>
> *The first one, which seems to be only for 1 record, just as a test seemed
> to give me an ERROR though:*
>
>
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX schema: <http://schema.org/>
>
> SELECT *
> WHERE
> {
>    <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o .
> filter(lang(?o)='en').
>
> ?article schema:about ?item .
> ?article schema:inLanguage "en" .
> FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
> }
>
> *So I assume the other queries like this would not work (would time out on
> query.wikidata.org, so I can't test):*
>
>
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX schema: <http://schema.org/>
>
> SELECT *
> WHERE
> {
>?s <http://schema.org/description> ?o .
>filter(lang(?o)='en').
>
> ?article schema:about ?item .
> ?article schema:inLanguage "en" .
> FILTER (SUBSTR(str(?article), 1, 25) = "https://en.wikipedia.org/")
> }
>
>
> So am I doing something wrong with these combined queries in the syntax?
>
> Thanks in advance again, and the help thus far!
>
>
> On Mon, Feb 1, 2016 at 1:19 AM, Edgard Marx <
> m...@informatik.uni-leipzig.de> wrote:
>
>> Yep,
>>
>> Please notes that RDFSlice will take the subset.
>> That is, the triples that contain the property that you are looking for.
>> Here go three examples of SPARQL queries:
>>
>> ps: you can try them here https://query.wikidata.org.
>>
>> ** For your example,*
>>
>> SELECT *
>> WHERE
>> {
>>    <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o .
>> filter(lang(?o)='en').
>> }
>>
>>
>> ** For all English bios:*
>>
>> SELECT *
>> WHERE
>> {
>>?s <http://schema.org/description> ?o .
>>filter(lang(?o)='en').
>> }
>>
>> ** For all language bios:*
>>
>> SELECT *
>> WHERE
>> {
>>    <http://www.wikidata.org/entity/Q1652291> <http://schema.org/description> ?o .
>> }
>>
>>
>> best,
>> Edgard
>>
>>
>>
>> On Mon, Feb 1, 2016 at 4:34 AM, Hampton Snowball <
>> hamptonsnowb...@gmail.com> wrote:
>>
>>> Thanks. I see it requires constructing a query to only extract the data
>>> you want. E.g. the graph pattern:
>>>
>>>  - desired query, e.g. "SELECT * WHERE {?s ?p ?o}" or
>>> graph pattern e.g. "{?s ?p ?o}"
>>>
>>> Since I don't know about constructing queries, would you be able to tell
>>> me what would be the proper query to extract from all the pages the s

Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Edgard Marx
Hey,
you can simply use RDFSlice (https://bitbucket.org/emarx/rdfslice/overview)
directly on the dump file (
https://dumps.wikimedia.org/wikidatawiki/entities/20160125/)

best,
Edgard

On Sun, Jan 31, 2016 at 7:43 PM, Hampton Snowball  wrote:

> Hello,
>
> I am interested in a subset of wikidata and I am trying to find the best
> way to get it without getting a larger dataset then necessary.
>
> Is there a way to just get the "bios" that appear on the wikidata pages
> below the name of the person/organization, as well as the link to the
> english wikipedia page / or all wikipedia pages?
>
> For example from: https://www.wikidata.org/wiki/Q1652291
>
> "Turkish female given name"
> https://en.wikipedia.org/wiki/H%C3%BClya
> and optionally https://de.wikipedia.org/wiki/H%C3%BClya
>
> I know there is SPARQL which previously this list helped me construct a
> query, but I know some requests seem to timeout when looking at a large
> amount of data so I am not sure this would work.
>
> The dumps I know are the full dataset, but I am not sure if there's any
> other subset dumps available or better way of grabbing this data
>
> Thanks in advance,
> HS
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Wikidata - short biographies

2016-01-31 Thread Edgard Marx
Yep,

One more reason to use RDFSlice ;-),

thanks

On Mon, Feb 1, 2016 at 7:25 AM, Stas Malyshev 
wrote:

> Hi!
>
> >
> > ** For all English bios:*
> >
> > SELECT *
> > WHERE
> > {
> >    ?s <http://schema.org/description> ?o .
> >filter(lang(?o)='en').
> > }
>
> Please don't run this on query.wikidata.org though. Please add LIMIT.
> Otherwise you'd be trying to download several millions of data items,
> which would probably time out anyway. Add something like "LIMIT 10" to it.
>
> Thanks,
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
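
To make Stas's advice concrete, this is the same "all English bios" pattern
with a LIMIT added, which is safe to try interactively on query.wikidata.org;
for the complete extraction, slicing the dump locally (as discussed earlier in
the thread) avoids the timeout altogether:

PREFIX schema: <http://schema.org/>

SELECT ?s ?o
WHERE
{
   ?s schema:description ?o .
   FILTER(lang(?o) = 'en')
}
LIMIT 10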


Re: [Wikidata] Source statistics

2015-09-07 Thread Edgard Marx
It's not an updated version, but:

dbtrends.aksw.org

best,
Edgard

On Mon, Sep 7, 2015 at 1:25 PM, André Costa 
wrote:

> Hi all!
>
> I'm wondering if there is a way (SQL, api, tool or otherwise) for finding
> out how often a particular source is used on Wikidata.
>
> The background is a collaboration with two GLAMs where we have used their
> open (and CC0) datasets to add and/or source statements on Wikidata for
> items on which they can be considered an authority. Now I figured it would
> be nice to give them back a number for just how big the impact was.
>
> While I can find out how many items should be affected I couldn't find an
> easy way, short of analysing each of these, for how many statements were
> affected.
>
> Any suggestions would be welcome.
>
> Some details: Each reference is a P248 claim + P577 claim (where the
> latter may change)
>
> Cheers,
> André / Lokal_Profil
> André Costa | GLAM-tekniker, Wikimedia Sverige | andre.co...@wikimedia.se
> | +46 (0)733-964574
>
> Support free knowledge, become a member of Wikimedia Sverige.
> Read more at blimedlem.wikimedia.se
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
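
One possible way to get the number André asks about, assuming the references
follow the pattern he describes (a reference node pointing to the GLAM's item
via P248 "stated in"). This is a sketch: wd:Q12345 is only a placeholder QID
for that source item, and the count covers statements on any item or property:

PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX pr:   <http://www.wikidata.org/prop/reference/>
PREFIX wd:   <http://www.wikidata.org/entity/>

SELECT (COUNT(DISTINCT ?statement) AS ?statements)
WHERE
{
   ?statement prov:wasDerivedFrom ?ref .
   # placeholder QID for the source item used in the P248 reference claims
   ?ref pr:P248 wd:Q12345 .
}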


Re: [Wikidata-l] Fwd: Wikipedia search log files

2013-10-18 Thread Edgard Marx
Hi Taraborelli,

I just created the project page requesting the user queries log.

https://meta.wikimedia.org/wiki/Research:User_queries

best,
Edgard


On Sat, Oct 12, 2013 at 8:44 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:

 I don't think they are going to be made publicly available because of
 the privacy issue. You could try talking to the Wikimedia research
 committee, who may be able to arrange this:

 https://meta.wikimedia.org/wiki/Research:Access_to_non-public_data

 Andrew.

 On 11 October 2013 13:06, Edgard Marx m...@informatik.uni-leipzig.de
 wrote:
  Hi Andrew,
 
  thanks and very sorry for inconvenience,
 
  Can I have access to the old files or some logs files with this data? Is
 for
  research propose.
 
  thanks in advance,
  best,
  Edgard
 
 
  On Fri, Oct 11, 2013 at 1:28 PM, Andrew Gray andrew.g...@dunelm.org.uk
  wrote:
 
  Hi Edgard,
 
  As the note at the top of the page says, these were taken down. As far
  as I know they have not been made available again.
 
  For future questions on analytics data you could try
  https://lists.wikimedia.org/mailman/listinfo/analytics
 
  Andrew.
 
  On 11 October 2013 10:34, Edgard Marx m...@informatik.uni-leipzig.de
  wrote:
   hi?
  
   there is someone ho can help me?
  
   thanks, I really appreciate it
  
   best,
   Edgard
  
   -- Forwarded message --
   From: Edgard Marx m...@informatik.uni-leipzig.de
   Date: Thu, Oct 10, 2013 at 2:34 PM
   Subject: Wikipedia search log files
   To: dvanli...@wikimedia.org
  
  
   Hi Diederik,
  
   I saw your name on this post
  
   (
 https://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/
 ).
  
   I am looking for User Search log files. I could not find them in
   http://dumps.wikimedia.org/other/search/.
  
   Could you help me? I am very new in Wikipedia dump data.
  
   best,
   Edgard
  
  
   ___
   Wikidata-l mailing list
   Wikidata-l@lists.wikimedia.org
   https://lists.wikimedia.org/mailman/listinfo/wikidata-l
  
 
 
 
  --
  - Andrew Gray
andrew.g...@dunelm.org.uk
 
 



 --
 - Andrew Gray
   andrew.g...@dunelm.org.uk


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


[Wikidata-l] Fwd: Wikipedia search log files

2013-10-11 Thread Edgard Marx
Hi,

is there someone who can help me?

Thanks, I really appreciate it.

best,
Edgard

-- Forwarded message --
From: Edgard Marx m...@informatik.uni-leipzig.de
Date: Thu, Oct 10, 2013 at 2:34 PM
Subject: Wikipedia search log files
To: dvanli...@wikimedia.org


Hi Diederik,

I saw your name on this post (https://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/).

I am looking for User Search log files. I could not find them in
http://dumps.wikimedia.org/other/search/.

Could you help me? I am very new to Wikipedia dump data.

best,
Edgard
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] Fwd: Wikipedia search log files

2013-10-11 Thread Edgard Marx
Hi Andrew,

thanks, and very sorry for the inconvenience.

Could I have access to the old files, or some log files with this data? It is
for research purposes.

thanks in advance,
best,
Edgard


On Fri, Oct 11, 2013 at 1:28 PM, Andrew Gray andrew.g...@dunelm.org.uk wrote:

 Hi Edgard,

 As the note at the top of the page says, these were taken down. As far
 as I know they have not been made available again.

 For future questions on analytics data you could try
 https://lists.wikimedia.org/mailman/listinfo/analytics

 Andrew.

 On 11 October 2013 10:34, Edgard Marx m...@informatik.uni-leipzig.de
 wrote:
  hi?
 
  there is someone ho can help me?
 
  thanks, I really appreciate it
 
  best,
  Edgard
 
  -- Forwarded message --
  From: Edgard Marx m...@informatik.uni-leipzig.de
  Date: Thu, Oct 10, 2013 at 2:34 PM
  Subject: Wikipedia search log files
  To: dvanli...@wikimedia.org
 
 
  Hi Diederik,
 
  I saw your name on this post
  (
 https://blog.wikimedia.org/2012/09/19/what-are-readers-looking-for-wikipedia-search-data-now-available/
 ).
 
  I am looking for User Search log files. I could not find them in
  http://dumps.wikimedia.org/other/search/.
 
  Could you help me? I am very new in Wikipedia dump data.
 
  best,
  Edgard
 
 
  ___
  Wikidata-l mailing list
  Wikidata-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikidata-l
 



 --
 - Andrew Gray
   andrew.g...@dunelm.org.uk


___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l