[Wikidata-tech] Wikidata item search via API based on labels and description

2014-11-05 Thread Adrian Pohl

Hello,

I have a list of place names and want to find the corresponding Wikidata 
item for each name. The list includes Köln and Düsseldorf, but also 
districts of towns, which are recorded as compounds of the superior 
administrative entity and the district, such as 
Schmallenberg-Westernbödefeld or Kerpen-Manheim.


If I look these up via the Wikidata API with the wbsearchentities action, 
I have no problems with Köln and the like [1], but I get no results 
for compounds (see e.g. [2]), although both strings are part of the label 
and the description of a Wikidata item.


Via the Wikidata interface I get the right result, though. [3]

I have looked for quite some time but couldn't find a way to query Wikidata 
programmatically and get results similar to the website search. Thus, my 
question is:


Is there a way to query Wikidata via an API over both the label fields 
and the description?
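
For illustration, the two kinds of lookup compared above can be sketched 
in Python. This is a minimal sketch, not a confirmed answer: the first 
function builds the wbsearchentities request from [1] and [2], and the 
second builds a request to MediaWiki's general full-text search module 
(action=query with list=search). Treating that module as a stand-in for 
the website search is an assumption, and the function names are 
hypothetical.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_search_url(term, language="de"):
    """Build a wbsearchentities request, as in footnotes [1] and [2]."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "format": "json",
    }
    # urlencode percent-encodes non-ASCII characters and joins pairs with "&".
    return API_ENDPOINT + "?" + urlencode(params)


def fulltext_search_url(term):
    """Build a request to MediaWiki's full-text search module.

    Assumption: list=search is used here as a candidate for getting
    results closer to the website search; this is not confirmed above.
    """
    params = {
        "action": "query",
        "list": "search",
        "srsearch": term,
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(entity_search_url("Köln"))
print(fulltext_search_url("Kerpen Manheim"))
```

Both functions only construct URLs; fetching them and comparing the 
results for a compound such as "Kerpen Manheim" would show whether the 
second request behaves like the website search.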


Background

I am working at the North Rhine-Westphalian Library Service Center 
(hbz), and we are currently building a new website for the 
North Rhine-Westphalian bibliography. [4] This bibliography collects 
articles, books and other media about places in the German federal state 
of North Rhine-Westphalia. Each record contains a string indicating 
which place a resource is about, and we want to link these strings to 
Wikidata items. Once we have those links, we will think about how to 
link from a place's Wikipedia page to a list of bibliographic resources 
about that place. See the GitHub issue on this particular problem at [5].


All the best
Adrian

[1] 
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Köln&language=de&format=json


[2] 
https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Kerpen%20Manheim&language=de&format=json


[3] https://www.wikidata.org/w/index.php?search=Kerpen+Manheim

[4] http://lobid.org/nwbib

[5] https://github.com/hbz/nwbib/issues/42
--
Adrian Pohl
hbz - Hochschulbibliothekszentrum des Landes NRW
Tel: (+49)(0)221 - 400 75 235
http://www.hbz-nrw.de

___
Wikidata-tech mailing list
Wikidata-tech@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-tech


[Wikidata-tech] Article: Facebook's Top Open Data Problems

2014-11-05 Thread Ori Livneh
Facebook just published this summary of a summit for database researchers
held at Menlo Park last September. I recommend it. It contains a clear and
concise description of Facebook's data infrastructure, and a description of
the open problems they are thinking about, which is even more interesting.

https://research.facebook.com/blog/1522692927972019/facebook-s-top-open-data-problems/

To whet your appetite, here are the problems (the summaries are mostly my
own paraphrase):

* Mobile: How should the shift toward mobile devices affect Facebook’s data
infrastructure?

* Reducing replication: How can we reduce the number of round trips between
the application and data layers?

* Impact of Caching on Availability (aka oh no, we just restarted
memcached): How do we harness the efficiency gains provided by caching
without being brought to our knees by a sudden drop in cache hit rate?

* Sampling at logging time in a distributed environment: How should we
sample log streams if we want to maintain accuracy and flexibility to
answer post-hoc queries?

* Trading storage space and CPU: TL;DR: gzip --best or gzip --fast?

* Reliability of pipelines: Pipelines are less reliable than the sum of
their parts. A pipeline composed of two systems, each 0.999 reliable,
is 0.998 reliable (0.999 x 0.999). Much sadness. What to do?

* Globally distributed warehouse: consistency models and synchronization
problems.

* Time series correlation and anomaly detection: AKA: I want an alert for
that massive memcached bytes_out spike that doesn't also wake me up with
false positives at 2AM.
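
The pipeline reliability point in the list above can be checked with a
few lines of Python. This is a minimal sketch that assumes stages fail
independently, so the end-to-end reliability is the product of the
per-stage values; the function name and the ten-stage case are
illustrative and do not come from the article.

```python
from math import prod


def pipeline_reliability(stage_reliabilities):
    """Probability that every stage succeeds, assuming independent failures."""
    return prod(stage_reliabilities)


# Two stages at 0.999 each, as in the example above.
two_stage = pipeline_reliability([0.999, 0.999])

# Illustrative extension: the same per-stage value across ten stages.
ten_stage = pipeline_reliability([0.999] * 10)

print(f"2 stages:  {two_stage:.6f}")
print(f"10 stages: {ten_stage:.6f}")
```

Two stages come out at about 0.998, and the figure keeps dropping as
stages are added, which is why long pipelines need more than reliable
individual parts.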