Re: [Wikidata] Shape Expressions arrive on Wikidata on May 28th

2019-05-30 Thread Benjamin Good
I'd like to restate the initial question.

Why did Wikidata choose ShEx instead of other approaches?

From this very detailed comparison
http://book.validatingrdf.com/bookHtml013.html  (thank you Andra!) I could
see arguments in both directions.  I'm curious to know what swayed the
Wikidata software team, as my group is currently grappling with the same
decision.

On Thu, May 30, 2019 at 7:55 AM Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> The history of ShEx is quite complex.
>
> I don't think that one can say that there were complete and conforming
> implementations of ShEx in 2017 because the main ShEx specification,
> http://shex.io/shex-semantics-20170713/ was ill-founded.  I pointed this
> out
> in https://lists.w3.org/Archives/Public/public-shex/2018Mar/0008.html
>
> There were several quite different semantics proposed for ShEx somewhat
> earlier, all with significant problems.
>
> peter
>
>
>
>
>
> On 5/30/19 12:34 AM, Andra Waagmeester wrote:
> > I really don't see the issue here. SHACL, like ShEx, is a language to
> express
> > data shapes. I adopted ShEx in a Wikidata context in 2016 when ShEx
> was
> > demonstrated at a tutorial at the SWAT4HCLS conference [1] in Amsterdam,
> where
> > it was discussed in both a tutorial and a hackathon topic. At that
> conference,
> > I was convinced that ShEx is helpful in maintaining quality in Wikidata.
> ShEx
> > offers not only the means to validate data shapes in Wikidata, but it
> also
> > provides a way to document how primary data is expressed in Wikidata.
> In 2016
> > I joined the ShEx community group [2]. Since then, I have been actively using
> ShEx
> > in defining shapes in various projects on Wikidata (e.g. Gene Wiki and
> > Wikicite).  It is not that this happened in secrecy. On the contrary, it
> was
> > discussed at both Wikimedia [3,4] and non-Wikimedia events [5,6,7].
> >
> > It is also not the case that SHACL has not been discussed in this
> context; on
> > the contrary, I have very good memories of a workshop where both were
> debated
> > (see page 24 ;) )  [8]
> >
> > IMHO  the statement that we all should adhere to one standard, simply
> because
> > it is a standard, is not a valid argument. Imagine having to dictate
> that we
> > all should speak English because it is the standard language.  In every
> single
> > talk that I have given since 2016, proponents of SHACL have been very
> vocal in
> > asking the same question over and over again "why not SHACL?", where the
> > discussion never went beyond, "You should because it is a standard". It
> is
> > also a bit disingenuous to suggest we all should adhere to SHACL because
> it is
> > the standard, while in the same sentence calling it a "Recommendation".
> >
> > Although initially I was open to SHACL as well (I use both Mac and
> Linux, so
> > why not be open to different alternatives in data shapes), some
> arguments
> > for me to prefer ShEx over SHACL are:
> > 1. Already in 2017 there were different (open) implementations. At the
> time
> > SHACL didn't have much tooling to choose from, other than one JavaScript
> > implementation and a proprietary software package.
> > 2. ShEx has a more intuitive way of describing Shapes, which is the
> compact
> > syntax (ShExC). SHACL seems to have adopted the compact syntax as well,
> but
> > only yesterday [9].
> > 3. The culture in the Shape Expression community group aligns well with
> the
> > culture in Wikidata.
> > 4. I don't want to be shackled to one standard (pun intended). I assume
> the
> > name was chosen with a shackle in mind, which puts constraints at the
> core of
> > the language. Wikidata already has different methods in place to deal
> with
> > constraints and constraint violations. In the context of Wikidata, ShEx
> should
> > specifically not be intended to impose constraints; on the contrary, it
> allows
> > expressing disagreement or variants of different shapes, whether in
> conflict
> > or not, which fits well with the NPOV concept. Symbols do matter.
> >
> > For a less personal comparison, I refer to the "Validating RDF data" book
> > which describes both ShEx and SHACL, and has a specific chapter on how
> they
> > compare and differ [10]
> >
> > Up until now, I have been using ShEx in repositories outside the Wikidata
> > ecosystem (e.g. Github), but I am really excited about the release of
> this
> > extension. I am curious about how the wiki extension will influence the
> > maintenance of schemas. Schemas are currently often expressed as static
> > images, while in practice the schemas are as fluid as the underlying data
> > itself. Being able to document these changes dynamically (the wiki way),
> can
> > be very interesting. One specific expectation I have is that it might
> make it
> > easier to write federated SPARQL queries. Currently, when writing these
> > federated queries we often have to rely on either a set of example
> queries or
> > a one-time schema description, which makes it hard to w

Re: [Wikidata] Wikidata Query Service + Mediawiki API = Love

2017-04-27 Thread Benjamin Good
Stas,

One thing that would be extremely useful right away would be an integration
of the free text search from the MediaWiki API.  That is one area SPARQL does
not handle well, MediaWiki does well, and it's pretty important for many
applications.  If there were some clever way of mixing (fast!) free text
search with SPARQL it would be quite powerful.  Imagine e.g. building
type-ahead query boxes given semantic constraints.
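
Something like the following sketch is what I have in mind (written against
the MWAPI SERVICE vocabulary described on the wiki page, as I understand it;
the "EntitySearch" template, the search term, and the type filter are
assumptions for illustration only):

SELECT ?item ?itemLabel WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org" ;
                    wikibase:api "EntitySearch" ;
                    mwapi:search "insulin" ;       # free text search term (example)
                    mwapi:language "en" .
    ?item wikibase:apiOutputItem mwapi:item .
  }
  ?item wdt:P31 wd:Q8054 .                         # semantic constraint: keep only proteins
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}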

On Thu, Apr 27, 2017 at 2:40 PM, Stas Malyshev 
wrote:

> Hi!
>
> I am developing functionality that will allow WDQS query to get data
> from Mediawiki API [1].
>
> Currently, the design is as follows:
> - The API should have a pre-defined template
> - The template also specifies which results are available from the API
>
> The need for a template currently comes from needing to convert data from
> the API's tree-like format to the tabular format that SPARQL needs, and the
> template allows us to specify how the conversion is done.
> See https://www.wikidata.org/wiki/Wikidata:WDQS_and_Mediawiki_API for
> detailed description of how it works.
>
> The prototype implementation is running on http://wdqs-test.wmflabs.org/
> (only the Categories API described on the page above is configured now, but
> more will be added soon). I'd like to hear feedback about this:
> - does the template model make sense at all? Is it enough?
> - what APIs would we want to expose?
> - any other features that would be useful?
>
> Other comments and ideas on the matter are of course always welcome.
> Please comment on the talk page[2] or reply to this message.
>
> [1] https://phabricator.wikimedia.org/T148245
> [2]
> https://www.wikidata.org/w/index.php?title=Wikidata_talk:
> WDQS_and_Mediawiki_API&action=edit
>
> Thanks,
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Fwd: HCOMP Deadline May 4

2017-04-19 Thread Benjamin Good
-- Forwarded message --

Reminder: Deadline May 4th

Full CFP: http://www.humancomputation.com/2017/submit.html

The Fifth AAAI Conference on Human Computation and Crowdsourcing (HCOMP
2017) will be held in Quebec City, Canada, Oct. 24-26, 2017. It will be
sponsored by the Association for the Advancement of Artificial Intelligence
and will be co-located with UIST (Oct. 22-25).

Important Dates
* May 4, 21:00 UTC/5:00pm EDT: Full papers (8 pages) due
* June 5–10: [Optional] Author rebuttal period
* June 25: Notification of acceptance for full papers
* June 30: Works-in-progress poster/demo submissions (2 pages) due
* August 1: Doctoral Consortium applications due
* August 15: Camera-ready versions due
* October 23: Doctoral Consortium
* October 24: Workshops, Tutorials, and Crowdcamp
* October 25-26: Main conference

HCOMP strongly believes in inviting, fostering, and promoting broad,
interdisciplinary research on crowdsourcing and human computation.
Submissions may present principles, studies, and/or applications of systems
that rely on programmatic interaction with crowds, or where human
perception, knowledge, reasoning, or physical activity and coordination
contributes to the operation of computational systems, applications, or
services. More generally, we invite submissions from the broad spectrum of
related fields and application areas including (but not limited to):
* Human-centered crowd studies: e.g., human-computer interaction, social
computing, cultural heritage, computer-supported cooperative work, design,
cognitive and behavioral sciences (psychology and sociology), management
science, economics, policy, ethics, etc.
* Applications: e.g., computer vision, databases, digital humanities,
information retrieval, machine learning, natural language (and speech)
processing, optimization, programming languages, systems, etc.
* Crowd/human algorithms: e.g., computer-supported human computation,
crowd/human algorithm design and complexity, mechanism design, etc.
* Crowdsourcing areas: e.g., citizen science, collective action, collective
knowledge, crowdsourcing contests, crowd creativity, crowd funding, crowd
ideation, crowd sensing, distributed work, freelancer economy, open
innovation, microtasks, prediction markets, wisdom of crowds, etc.

All full paper submission must be anonymized (include no information
identifying the authors or their institutions) for double-blind
peer-review. Accepted full papers will be published in the HCOMP conference
proceedings and included in the AAAI Digital Library. Submitted full papers
are allowed up to 8 pages and works-in-progress/demos are up to 2 pages
(references are not included in the page count) and must be formatted in
AAAI two-column, camera-ready style. The AAAI 2017 Author Kit is available
at http://www.aaai.org/Publications/Templates/AuthorKit17.zip. Papers must
be in trouble-free, high-resolution PDF format, formatted for US Letter
(8.5" x 11") paper, using Type 1 or TrueType fonts. Reviewers will be
instructed to evaluate paper submissions according to specific review
criteria. HCOMP is a young but quickly growing conference, with a
historical acceptance rate of 25-30% for full papers. For further details
about submitting full papers, works-in-progress, demos, and the doctoral
consortium, please visit http://www.humancomputation.com/2017/submit.html.

Conference History
HCOMP 2017 builds on a series of four successful earlier workshops held
2009–2012 and four AAAI HCOMP conferences held 2013–2016. The conference
was created by researchers from diverse fields to serve as a key focal
point and scholarly venue for the review and presentation of the highest
quality work on the principles, studies, and applications of human
computation and crowdsourcing. Prior HCOMP conferences have included work
in multiple fields, ranging from human-centered fields like human-computer
interaction, psychology, design, economics, management science,
ethnography, and social computing, to technical fields like algorithms,
machine learning, artificial intelligence, computer vision, information
retrieval, optimization, vision, speech, robotics, and planning.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Timeout querying for cities in Alaska

2017-04-05 Thread Benjamin Good
The 'all cities' part of your query seems strange.  If you replace it with
a simple 'instance of city' - ?x wdt:P31 wd:Q515 . it returns 145 results
in 341ms.  http://tinyurl.com/m4ffatg
Why would you ask for things that have 'occupation' or 'position' with
value city?
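
(For reference, the simplified variant looks roughly like this; it is
reconstructed from the description above rather than copied from the tinyurl:)

SELECT DISTINCT ?x ?xLabel
{
  ?x (wdt:P131+ | wdt:P276) wd:Q797 .   # things located in (or at) Alaska
  ?x wdt:P31 wd:Q515 .                  # direct instances of city
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}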

Also, I'm not familiar with the use of the '/' in that line of the query.
My experience with that is as a mathematical operator.  What is it supposed
to be doing?

On Wed, Apr 5, 2017 at 10:00 AM, Anatoly Zelenin  wrote:

> Hi all,
> I am building a Question Answering System for Wikidata (I will put the
> demo live in May) and would like to be able to ask questions like "Give
> me the cities in Alaska". To query Wikidata I am using the Wikidata
> SPARQL endpoint.
>
> The SPARQL query for this question looks like this [0]:
>
> SELECT DISTINCT ?x ?xLabel
> {
>   # All things in Alaska
>   ?x (wdt:P131+ | wdt:P276) wd:Q797 .
>   #  All cities
>   ?x ((wdt:P106 | wdt:P39 | wdt:P31) / wdt:P279*) wd:Q515 .
>   SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
> }
>
> As it needs to join over two very large sets (the set of all cities and
> the set of all things in Alaska) the query times out. Do you have an
> idea how I could write the query without getting a timeout?
> Thank you ;)
>
> [0]
> https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fx%20%
> 3FxLabel%0A%7B%0A%3Fx%20%28wdt%3AP131%2B%20%7C%20wdt%
> 3AP276%29%20wd%3AQ797%20.%0A%3Fx%20%28%28wdt%3AP106%20%7C%
> 20wdt%3AP39%20%7C%20wdt%3AP31%29%20%2F%20wdt%3AP279%2a%29%
> 20wd%3AQ515%20.%0ASERVICE%20wikibase%3Alabel%20%7B%0Abd%
> 3AserviceParam%20wikibase%3Alanguage%20%22en%22%20.%0A%7D%0A%7D
>
>
> --
> Best,
> Anatoly Zelenin
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Entity tagging and fact extraction (from a scholarly publisher perspective)

2016-11-12 Thread Benjamin Good
Denny,

This is a really good question: "How bad / good does Wikidata as a whole
fit the role of an open vocabulary for content tagging?"
but I suspect that it needs some qualifiers on it.

For what domains?
For what purposes?

I think the answer will vary by context and by timestamp.  (But I think
this is ultimately going to be one of the killer use cases for the
whole system.)


On Fri, Nov 11, 2016 at 5:36 PM, Thomas Krichel  wrote:

>   Andrew Smeall writes
>
> > We do use MeSH for those subjects, but this only applies to about 40% of
> > our papers. In Engineering, for example, we've had more trouble finding
> an
> > open taxonomy with the same level of depth as MeSH.
>
>   Have you found one?
>
> > For most internal applications, we need 100% coverage of all
> > subjects.
>
>   Meaning you want to have a scheme that provides at least
>   one class for any of the papers that you publish? Why?
>
> > The temptation to build a new vocabulary is strong, because it's the
> > fastest way to get to something that is non-proprietary and universal. We
> > can merge existing open vocabularies like MeSH and PLOS to get most of
> the
> > way there, but we then need to extend that with concepts from our corpus.
>
>   I am not sure I follow this. Surely, if you don't have categories
>   for engineering, you can build your own scheme and publish it. I don't
>   see this as a reason for not using MESH when that is valid for the
>   paper under consideration.
>
>   I must be missing something.
>
> --
>
>   Cheers,
>
>   Thomas Krichel  http://openlib.org/home/krichel
>   skype:thomaskrichel
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Entity tagging and fact extraction (from a scholarly publisher perspective)

2016-11-02 Thread Benjamin Good
Dario,

One message you can send is that they can and should use existing
controlled vocabularies and ontologies to construct the metadata they want
to share.  For example, MeSH descriptors would be a good way for them to
organize the 'primary topic' assertions for their articles and would make
it easy to find the corresponding items in Wikidata when uploading.  Our
group will be continuing to expand coverage of identifiers and concepts
from vocabularies like that in Wikidata - and any help there from
publishers would be appreciated!
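
For example, looking up the Wikidata item for a given MeSH descriptor is a
simple SPARQL lookup (the descriptor value below is just an example; P486 is
the MeSH descriptor ID property):

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P486 "D009369" .            # MeSH descriptor for "Neoplasms" (example value)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}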

My view here is that Wikidata can be a bridge to the terminologies and
datasets that live outside it - not really a replacement for them.  So, if
they have good practices about using shared vocabularies already, it should
(eventually) be relatively easy to move relevant assertions into the
WIkidata graph while maintaining interoperability and integration with
external software systems.

-Ben

On Wed, Nov 2, 2016 at 8:31 AM, 'Daniel Mietchen' via wikicite-discuss <
wikicite-disc...@wikimedia.org> wrote:

> I'm traveling ( https://twitter.com/EvoMRI/status/793736211009536000
> ), so just in brief:
> In terms of markup, some general comments are in
> https://www.ncbi.nlm.nih.gov/books/NBK159964/ , which is not specific
> to Hindawi but partly applies to them too.
>
> A problem specific to Hindawi (cf.
> https://commons.wikimedia.org/wiki/Category:Media_from_Hindawi) is the
> bundling of the descriptions of all supplementary files, which
> translates into uploads like
> https://commons.wikimedia.org/wiki/File:Evolution-of-Coronary-Flow-in-an-
> Experimental-Slow-Flow-Model-in-Swines-Angiographic-and-623986.f1.ogv
> (with descriptions for nine files)
> and eight files with no description, e.g.
> https://commons.wikimedia.org/wiki/File:Evolution-of-Coronary-Flow-in-an-
> Experimental-Slow-Flow-Model-in-Swines-Angiographic-and-623986.f2.ogv
> .
>
> There are other problems in their JATS, and it would be good if they
> would participate in
> http://jats4r.org/ . Happy to dig deeper with Andrew or whoever is
> interested.
>
> Where they are ahead of the curve is licensing information, so they
> could help us set up workflows to get that info into Wikidata.
>
> In terms of triple suggestions to Wikidata:
> - as long as article metadata is concerned, I would prefer to
> concentrate on integrating our workflows with the major repositories
> of metadata, to which publishers are already posting. They could help
> us by using more identifiers (e.g. for authors, affiliations, funders
> etc.), potentially even from Wikidata (e.g. for keywords/ P921, for
> both journals and articles) and by contributing to the development of
> tools (e.g. a bot that goes through the CrossRef database every day
> and creates Wikidata items for newly published papers).
> - if they have ways to extract statements from their publication
> corpus, it would be good if they would let us/ ContentMine/ StrepHit
> etc. know, so we could discuss how to move this forward.
> d.
>
> On Wed, Nov 2, 2016 at 1:42 PM, Dario Taraborelli
>  wrote:
> > I'm at the Crossref LIVE 16 event in London where I just gave a
> presentation
> > on WikiCite and Wikidata targeted at scholarly publishers.
> >
> > Beside Crossref and Datacite people, I talked to a bunch of folks
> interested
> > in collaborating on Wikidata integration, particularly from PLOS, Hindawi
> > and Springer Nature. I started an interesting discussion with Andrew
> Smeall,
> > who runs strategic projects at Hindawi, and I wanted to open it up to
> > everyone on the lists.
> >
> > Andrew asked me if – aside from efforts like ContentMine and StrepHit –
> > there are any recommendations for publishers (especially OA publishers)
> to
> > mark up their contents and facilitate information extraction and entity
> > matching or even push triples to Wikidata to be considered for ingestion.
> >
> > I don't think we have a recommended workflow for data providers for
> > facilitating triple suggestions to Wikidata, other than leveraging the
> > Primary Sources Tool. However, aligning keywords and terms with the
> > corresponding Wikidata items via ID mapping sounds like a good first
> step. I
> > pointed Andrew to Mix'n'Match as a handy way of mapping identifiers, but
> if
> > you have other ideas on how to best support 2-way integration of Wikidata
> > with scholarly contents, please chime in.
> >
> > Dario
> >
> > --
> >
> > Dario Taraborelli  Head of Research, Wikimedia Foundation
> > wikimediafoundation.org • nitens.org • @readermeter
> >
> > --
> > WikiCite 2016 – May 26-26, 2016, Berlin
> > Meta: https://meta.wikimedia.org/wiki/WikiCite_2016
> > Twitter: https://twitter.com/wikicite16
> > ---
> > You received this message because you are subscribed to the Google Groups
> > "wikicite-discuss" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to wikicite-discuss+unsubscr...@wikimedia.org.
>
> --
> WikiCite 2016 – May 26-26, 2016, Be

Re: [Wikidata] Acquiring general knowledge from Wikidata

2016-10-25 Thread Benjamin Good
Slava,

This sounds interesting. I would be happy to be a test subject in the bio or
semantic web domain.

(There is no reason this project needs to be reviewed by Wikimedia Research
or anyone else.  The data is open, please go ahead and use it to do useful
research like this.)  Whether or not any generated content actually moves
into Wikidata is a completely different question.

cheers
-Ben

On Tue, Oct 25, 2016 at 9:31 AM, Slava Sazonau  wrote:

> Yes, absolutely. Apologies for being unclear: we consider each axiom as a
> hypothesis that should be reviewed. This is indeed a jargon since logical
> statements in OWL are usually called axioms (and in our case those
> statements can be actually wrong since they are acquired from data
> automatically).
>
> Slava
>
> On 25 October 2016 at 16:27, Federico Leva (Nemo) 
> wrote:
>
>> As far as I know, an axiom by definition can't be false. What definition
>> are you using? Maybe some jargon specific to this research field?
>>
>> Nemo
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] signing license declarations

2016-10-06 Thread Benjamin Good
That would be the ideal spot - if it said what portion of the data is CC0
or is specifically declared as such for the purposes of allowing it into
Wikidata.

On Wed, Oct 5, 2016 at 10:40 PM, Federico Leva (Nemo) 
wrote:

> Benjamin Good, 05/10/2016 23:33:
>
>> somewhere the world can see
>>
>
> Like http://www.uniprot.org/help/license
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] signing license declarations

2016-10-05 Thread Benjamin Good
Yes.  E.g., according to that copyright statement from UniProt, our actions
are currently in violation.  We need to make it clear that an agreement has
been reached and have that hosted somewhere the world can see.

On Wed, Oct 5, 2016 at 2:26 PM, Federico Leva (Nemo) 
wrote:

> Benjamin Good, 05/10/2016 19:44:
>
>> As a specific example, we have informal (e.g. an email to us) permission
>> to import data from the Disease Ontology [2] and UniProt [3] but would
>> like to make those informal agreements 'official' and public.
>>
>
> Just make them add such a note to http://www.uniprot.org/help/license or
> equivalent? E.g. http://www.beic.it/it/articoli/copyright releases some
> parts in CC-0.
>
> Nemo
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] signing license declarations

2016-10-05 Thread Benjamin Good
Thanks Egon, that's a reasonable approach to keep the ball rolling.  How do
you plan to handle updates?

On Wed, Oct 5, 2016 at 12:15 PM, Egon Willighagen <
egon.willigha...@gmail.com> wrote:

> On Wed, Oct 5, 2016 at 7:44 PM, Benjamin Good 
> wrote:
> > When negotiating the import of data from resources that are not CC0, it
> > would be very valuable to have a somewhat formal process to allow them to
> > declare that some portions of their databases may be imported into
> wikidata
> > and thus join its CC0 collection.
>
> For the EPA CompTox Dashboard I asked Antony Williams to release the
> mapping data as CCZero spefically, which he did on Figshare [0].
>
> Egon
>
> 0.https://figshare.com/articles/Mapping_file_of_
> InChIStrings_InChIKeys_and_DTXSIDs_for_the_EPA_CompTox_Dashboard/3578313
>
> --
> E.L. Willighagen
> Department of Bioinformatics - BiGCaT
> Maastricht University (http://www.bigcat.unimaas.nl/)
> Homepage: http://egonw.github.com/
> LinkedIn: http://se.linkedin.com/in/egonw
> Blog: http://chem-bla-ics.blogspot.com/
> PubList: http://www.citeulike.org/user/egonw/tag/papers
> ORCID: -0001-7542-0286
> ImpactStory: https://impactstory.org/u/egonwillighagen
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] signing license declarations

2016-10-05 Thread Benjamin Good
When negotiating the import of data from resources that are not CC0, it
would be very valuable to have a somewhat formal process to allow them to
declare that some portions of their databases may be imported into wikidata
and thus join its CC0 collection.

Does wikidata have anything like this in place?  I think slight
modifications to the protocol for Commons [1] ought to be sufficient.
 (Though I would like to avoid the backlog there that apparently is at 88
days now..).

As a specific example, we have informal (e.g. an email to us) permission to
import data from the Disease Ontology [2] and UniProt [3] but would like to
make those informal agreements 'official' and public.  I suspect this will
be a very common situation going forward.

thoughts?
-Ben

[1] https://commons.wikimedia.org/wiki/Commons:Email_templates
[2] http://www.obofoundry.org/ontology/doid.html
[3] http://www.uniprot.org
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [wikicite-discuss] Re: (semi-)automatic statement references fro Wikidata from DBpedia

2016-09-01 Thread Benjamin Good
Dimitris,

This seems like a good way to seed a large-scale data and reference import
process.  The trouble here is that Wikidata already has large amounts of
such potentially useful data (e.g. most of Freebase, the results of the
StrepHit NLP system, etc.) but the processes for moving it in have thus far
gone slowly.  In fact the author of the StrepHit system for mining
facts/references for Wikidata is shifting his focus entirely to improving
that part of the pipeline (known currently as the 'primary sources' tool),
as it is the bottleneck.  It would be great to see you get involved there:
https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Semi-automatic_Addition_of_References_to_Wikidata_Statements

Once we have a good technical and social pattern for verifying predicted
claims and references at scale, we can get to the business of loading that
system up with good input.
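
(To get a feel for the backlog, here is a rough sketch of a query that samples
statements carrying no reference at all; the property P569 is an arbitrary
example:)

SELECT ?item ?value WHERE {
  ?item p:P569 ?statement .                                     # date of birth statements (example property)
  ?statement ps:P569 ?value .
  FILTER NOT EXISTS { ?statement prov:wasDerivedFrom ?ref . }   # no reference attached
}
LIMIT 100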

my two cents..
-ben


On Thu, Sep 1, 2016 at 7:53 AM, Dimitris Kontokostas 
wrote:

> Hmm, it is hard to interpret no feedback at all here; it could be
> a) the data is not usable for Wikidata
> b) this is not an interesting idea for Wikidata (now) or
> c) this is not a good place to ask
>
> Based on the very high activity on this list I could only guess (b), even
> though this idea came from the Wikidata community 1+ year ago. This is
> probably not relevant now.
> https://lists.wikimedia.org/pipermail/wikidata/2015-June/006366.html
>
> For reference, this is the prototype extractor that generated the cited
> facts which can be run on newer dumps
> https://github.com/dbpedia/extraction-framework/blob/master/
> core/src/main/scala/org/dbpedia/extraction/mappings/CitedFac
> tsExtractor.scala
>
> Best,
> Dimitris
>
> On Tue, Aug 30, 2016 at 9:16 PM, Dario Taraborelli <
> dtarabore...@wikimedia.org> wrote:
>
>> cc'ing wikicite-discuss, this is going to be of relevance to many people
>> there too.
>>
>> On Mon, Aug 29, 2016 at 11:09 PM, Dimitris Kontokostas > > wrote:
>>
>>> You can have a look here.
>>> http://downloads.dbpedia.org/temporary/citations/enwiki-2016
>>> 0305-citedFacts.tql.bz2
>>> it is a quad file that contains DBpedia facts and I replaced the context
>>> with the citation when the citation is on the exact same line with the
>>> extracted fact. e.g.
>>>
>>>  <
>>> http://dbpedia.org/property/work> "An American in Paris"@en <
>>> https://www.bnote.de/?set=werk_detail&kompid=246&bnnr=16963&lc=en> .
>>>
>>> It is based on a complete English dump from ~April and contains roughly
>>> 1M cited facts
>>> This is more like a proof-of-concept and there are many ways to improve
>>> and make it more usable for Wikidata
>>>
>>> let me know what you think
>>>
>>>
>>> On Mon, Aug 29, 2016 at 1:38 AM, Brill Lyle 
>>> wrote:
>>>
 Yes? I think so. Except I would like to see fuller citations extracted
 / sampled from / to? I don't have the technical skill to understand the
 extraction completely but Yes. I think there is very rich data in Wikipedia
 that is very extractable.

 Could this approach be a good candidate reference suggestions in
 Wikidata?
 (This particular one is already a reference but the anthem and GDP in
 the attachment are not for example)



 - Erika

 *Erika Herzog*
 Wikipedia *User:BrillLyle
 *

 On Sat, Aug 27, 2016 at 9:37 AM, Dimitris Kontokostas <
 kontokos...@informatik.uni-leipzig.de> wrote:

> Hi,
>
> I had this idea for some time now but never got to test/write it down.
> DBpedia extracts detailed context information in Quads (where
> possible) on where each triple came from, including the line number in the
> wiki text.
> Although each DBpedia extractor is independent, using this context
> there is a small window for combining output from different extractors,
> such as the infobox statements we extract from Wikipedia and the very
> recent citation extractors we announced [1]
>
> I attach a very small sample from the article about Germany where I
> filter out the related triples and order them by the line number they were
> extracted from e.g.
>
> dbr:Germany dbo:populationTotal "82175700"^^xsd:nonNegativeInteger  <
> http://en.wikipedia.org/wiki/Germany?oldid=736355524#
> *absolute-line=66*&template=Infobox_country&property=population_est
> imate&split=1&wikiTextSize=10&plainTextSize=10&valueSize=8> .
>  ilungen/2016/08/PD16_295_12411pdf.pdf;jsessionid=996EC2DF0A8
> D510CF89FDCBC74DBAE9F.cae2?__blob=publicationFile> dbp:isCitedBy
> dbr:Germany  *absolute-line=66*> .
>
> Looking at the wikipedia article we see:
> |population_estimate = 82,175,700{{cite web|url=
> https://www.destatis.de/DE/PresseService/

Re: [Wikidata] pattern for linking to linked data ?

2016-08-17 Thread Benjamin Good
As Andra reminded me above, this property went through pretty extensive
discussions overlapping the one here (with examples) when it was proposed:
https://www.wikidata.org/wiki/Wikidata:Property_proposal/exact_match

I was mainly checking to ensure that no one else had been boldly working
on integrating Wikidata with the semantic web according to a different
approach before we invested our time more heavily in this one.  As it seems
that has not transpired, my intention is to lead/follow our team in this
direction - starting with an integration of the concepts in the Gene
Ontology.

-Ben

On Wed, Aug 17, 2016 at 11:44 AM, Andy Mabbett 
wrote:

> On 17 August 2016 at 00:43, Benjamin Good 
> wrote:
>
> > I am about to propose much more widespread use of Property:P2888 "exact
> > match" for linking from a wikidata item to a URI that should resolve to
> > linked data about the same concept from another semantic web resource.
> > (amongst the biomedical items that our team works with)
>
> Do you have examples, please?
>
> Generally, it's better to propose an "external-ID" type property, if
> there are a significant number of items to be identified.
>
> I'm happy to assist you if you decide to go down that route.
>
> --
> Andy Mabbett
> @pigsonthewing
> http://pigsonthewing.org.uk
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] pattern for linking to linked data ?

2016-08-17 Thread Benjamin Good
Perhaps it would be more productive if I give a very specific example.
 (I'd prefer a general, wikidata-wide policy but it sounds like that isn't
going to happen.)

We are working on integrating wikidata with many of the ontologies that are
part of the OBO Foundry [1].  These include, for example, the Gene Ontology
and the Disease Ontology.  Bringing the concepts represented in these
ontologies in as items in wikidata makes it possible to author claims that
capture knowledge about the relationships between, for example, genes,
biological processes, diseases, and drugs.  These claims are thus far
mostly drawn from associated public databases.  They serve to populate
infoboxes on Wikipedias and, we hope, will also help foster the growth of
new applications that can help to capture more knowledge for re-use by the
wikidata community.  Importantly, these imports also bind the wikidata
community here to the community of biomedical researchers over there.
Establishing a coherent pattern for binding the concepts in these
ontologies to the corresponding items in wikidata is important for two key
reasons:

(1) The ontologies and other linked data resources that use them have a lot
of data that is never likely to get into wikidata and vice versa.
Establishing clear mappings makes it possible to integrate that knowledge
(mostly) automatically.  (AKA the whole idea of the semantic web...).  The
more consistent the pattern of mapping, the more automation is possible.
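
As a sketch of what (1) can look like in practice, once an item carries an
exact-match URI a federated query can follow it straight into the external
resource (the UniProt endpoint and class below are illustrative choices on my
part, not an agreed pattern):

SELECT ?protein ?proteinLabel ?externalURI WHERE {
  ?protein wdt:P31 wd:Q8054 ;             # items typed as protein
           wdt:P2888 ?externalURI .       # exact match to an external URI
  SERVICE <https://sparql.uniprot.org/sparql> {
    ?externalURI a <http://purl.uniprot.org/core/Protein> .   # resolve it on the UniProt side
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
}
LIMIT 10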

(2) It is vitally important to the maintainers of these resources to be
able to track usage of their work products.  The more an ontology that is
funded for the purpose of supporting research and knowledge dissemination
can show that it is being used, the better the argument to continue its
funding.  When negotiating the import of knowledge products into a CC0
world, it is important that we can demonstrate that the items will
generally remain connected as well as give indications about how they are
being used.  (Accepting of course that with CC0 there is no guarantee.)

Given that context, would you support the proposal of using the exact match
property to bind this specific set of biomedical wikidata items to items
defined elsewhere on the semantic web?  If not, what would be the best
alternative?

-Ben

[1] http://www.obofoundry.org

On Wed, Aug 17, 2016 at 8:05 AM, Andra Waagmeester  wrote:

>
>
> On Wed, Aug 17, 2016 at 9:48 AM, Markus Kroetzsch <
> markus.kroetz...@tu-dresden.de> wrote:
>
>>
>>> As Gerard said, "exact" correspondence might be difficult in most cases,
>> but something slightly weaker should be ok. Something that one should note
>> is that, even in cases where two things are about the same "idea", their
>> usage in RDF is usually different if you compare Wikidata RDF to external
>> RDF. For example, a class in an external RDF document might have instances
>> assigned via rdf:type whereas a class-like item in Wikidata has instances
>> assigned via a chain of properties
>>
>
> This is exactly why the property P2888 is based on skos:exactMatch. When I
> proposed the "exact match" property this issue surfaced as well [1].
>  skos:exactMatch provides more freedom in expressing similarity between
> concepts than the related owl:sameAs.
>
>
>>
>> possibly with further quantifiers assigned to the middle element. So you
>> cannot get a simple, direct correspondence on a structural level anyway.
>>
>>
> Actually with skos:exactMatch you can.
>
>
>> There are also properties "equivalent class" (P1709), "external
>> superproperty" (P2235), and "external subproperty" (P2236). See
>>
>> https://tools.wmflabs.org/sqid/#/browse?type=properties&datatypes=5:Url
>>
>> for the list of all 34 URL-type properties. Maybe you can discover others
>> that are relevant for you.
>>
>
> The reason I proposed the exact match property was with federated queries
> in mind. Other resources can have a higher level of granularity on a given
> topic. Being able to reach out to these resources can have benefits. In the
> past we were able to federate over, for example, Wikidata and UniProt, where
> the linking URI was composed on the fly: e.g.
> BIND(IRI(CONCAT("http://purl.uniprot.org/uniprot/", ?wduniprot)) as
> ?uniprot)
>
> At first we experimented with the equivalent class property. But it was
> pointed out by ontologists that using the property for class equality is
> problematic as is partly expressed in its W3C definition: "NOTE: The use of
> owl:equivalentClass does not imply class equality". [1]
>
> Being able to store external URIs alongside Wikidata URIs would enable us to
> really make Wikidata central in the linked data cloud, allowing us to represent
> more granularity than is currently captured in Wikidata.
>
> Cheers,
>
> Andra
>
> [1] https://www.wikidata.org/wiki/Wikidata:Property_proposal/exact_match
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/li

[Wikidata] pattern for linking to linked data ?

2016-08-16 Thread Benjamin Good
I am about to propose much more widespread use of Property:P2888 "exact
match" for linking from a wikidata item to a URI that should resolve to
linked data about the same concept from another semantic web resource.
(amongst the biomedical items that our team works with)

Is there any other pattern that the community here is using to do this that
we should be following?

thanks
Ben
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Schema.org mappings in Wikidata

2016-06-17 Thread Benjamin Good
Perhaps the new 'exact match'?
https://m.wikidata.org/wiki/Property:P2888
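
That would let your example be stated as place of birth (Q1322263) -> exact
match (P2888) -> http://schema.org/birthPlace, and such mappings could then
be queried back with something like this sketch:

SELECT ?item ?schemaTerm WHERE {
  ?item wdt:P2888 ?schemaTerm .
  FILTER(STRSTARTS(STR(?schemaTerm), "http://schema.org/"))   # keep only Schema.org targets
}
LIMIT 100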

> On Jun 17, 2016, at 8:26 PM, Thad Guidry  wrote:
> 
> Oh boy.  I thought I had a few things figured out with Wikidata...until
> I read through the Property Talk discussions for 
> 
> https://www.wikidata.org/wiki/Property_talk:P2236
> 
> (with its mentions of Freebase mapping)
> 
> So...
> I've been adding a few bits of Schema.org mapping into Wikidata today, and 
> stumbled upon a few things that made me rethink a few things...lolol.
> 
> QUESTION:
> How to state that a Wikidata Entity (not a Property) such as
> place of birth https://www.wikidata.org/wiki/Q1322263 
> is the same concept or idea as a Schema.org property 
> http://schema.org/birthPlace ?
> 
> I thought I could use P2236 above... but then it seems its for WD Properties, 
> not Entities (subjects) ?
> 
> SOLUTION ? Perhaps we could do a best practice of treating 
> http://schema.org/birthPlace as an actual Identifier for the place of birth 
> concept 
> https://www.wikidata.org/wiki/Q1322263 
> 
> ...while reserving the WD Property place of birth 
> https://www.wikidata.org/wiki/Property:P19 to use equivalent property 
> https://www.wikidata.org/wiki/Property:P1628 ?
> 
> TPT and Denny didn't leave enough notes in there for what to do about 
> external mapping cases of external vocabularies that are also loosely 
> considered as metadata dictionaries as well for the common web and 
> developers, like Schema.org is.
> 
> Thoughts on the SOLUTION proposed ?
> 
> Thad
> +ThadGuidry
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [ANNOUNCEMENT] StrepHit 1.0 Beta Release

2016-06-15 Thread Benjamin Good
Hi Marco,

Where might we find some statistics on the current accuracy of the
automated claim and reference extractors?  I assume that information must
be in there somewhere, but I had trouble finding it.

This is a very ambitious project covering a very large technical territory
(which I applaud).  It would be great if your results could be synthesized
a bit more clearly so we can understand where the weak/strong points are
and where we might be able to help improve or make use of what you have
done in other domains.

-Ben


On Wed, Jun 15, 2016 at 9:06 AM, Marco Fossati 
wrote:

> [Feel free to blame me if you read this more than once]
>
> To whom it may interest,
>
> Full of delight, I would like to announce the first beta release of
> *StrepHit*:
>
> https://github.com/Wikidata/StrepHit
>
> TL;DR: StrepHit is an intelligent reading agent that understands text and
> translates it into *referenced* Wikidata statements.
> It is a IEG project funded by the Wikimedia Foundation.
>
> Key features:
> -Web spiders to harvest a collection of documents (corpus) from reliable
> sources
> -automatic corpus analysis to understand the most meaningful verbs
> -sentences and semi-structured data extraction
> -train a machine learning classifier via crowdsourcing
> -*supervised and rule-based fact extraction from text*
> -Natural Language Processing utilities
> -parallel processing
>
> You can find all the details here:
>
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Midpoint
>
> If you like it, star it on GitHub!
>
> Best,
>
> Marco
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Genes, proteins, and bad merges in general

2016-06-14 Thread Benjamin Good
Hi Tom,

I think the example you have there is actually linked up properly at the
moment?
https://en.wikipedia.org/wiki/SPATA5 is about both the gene and the protein
as are most Wikipedia articles of this nature.  And it is linked to the
gene the way we encourage modeling https://www.wikidata.org/wiki/Q18052679
- and indeed the protein item is not linked to a Wikipedia article again
following our preferred pattern.

For the moment...  _our_ merge problem seems to be mostly resolved.
Correcting the sitelinks on the non-english Wikipedias in a big batch
seemed to help slow the flow dramatically.  We have also introduced some
flexibility into the Lua code that produces infobox_gene on Wikipedia.  It
can handle most of the possible situations (e.g. wikipedia linked to
protein, wikipedia linked to gene) automatically so that helps prevent
visible disasters..

On the main issue you raise about merges..  I'm a little on the fence.
Generally I'm opposed to putting constraints in place that slow people down
- e.g. we have a lot of manual merge work that needs to be done in the
medical arena and I do appreciate that the current process is pretty fast.
I guess I would advocate a focus on making the interface more vehemently
educational as a first step.  E.g. lots of 'are you sure' etc. forms to
click through but ultimately still letting people get their work done
without enforcing an approval process.

-Ben

On Tue, Jun 14, 2016 at 10:53 AM, Tom Morris  wrote:

> Bad merges have been mentioned a couple of times recently and I think one
> of the contexts with Ben's gene/protein work.
>
> I think there are two general issues here which could be improved:
>
> 1. Merging is too easy. Because splitting/unmerging is much harder than
> merging, particularly after additional edits, the process should be biased
> to mark merging more difficult.
>
> 2. The impedance mismatch between Wikidata and Wikipedias tempts
> wikipedians who are new to wikidata to do the wrong thing.
>
> The second is a community education issue which will hopefully improve
> over time, but the first could be improved, in my opinion, by requiring
> more than one person to approve a merge. The Freebase scheme was that
> duplicate topics could be flagged for merge by anyone, but instead of
> merging, they'd be placed in a queue for voting. Unanimous votes would
> cause merges to be automatically processed. Conflicting votes would get
> bumped to a second level queue for manual handling. This wasn't foolproof,
> but caught a lot of the naive "these two things have the same name, so they
> must be the same thing" merge proposals by newbies. There are lots of
> variations that could be implemented, but the general idea is to get more
> than one pair of eyes involved.
>
> A specific instance of the structural impedance mismatch is enwiki's
> handling of genes & proteins. Sometimes they have a page for each, but
> often they have a single page that deals with both or, worse, a page whose
> text says it's about the protein, but where the page includes a gene infobox.
>
> This unanswered RFC from Oct 2015 asks whether protein & gene should be
> merged:
> https://www.wikidata.org/wiki/Wikidata:Requests_for_comment/Oxytocin_and_OXT_gene
>
> I recently ran across a similar situation where this Wikidata gene SPATA5
> https://www.wikidata.org/wiki/Q18052679 is linked to an enwiki page about
> the associated protein https://en.wikipedia.org/wiki/SPATA5, while the
> Wikidata protein is not linked to any wikis
> https://www.wikidata.org/wiki/Q21207860
>
> These differences in handling make the reconciliation process very
> difficult and the resulting errors encourage erroneous merges. The
> gene/protein case probably needs multiple fixes, but many mergers harder
> would help.
>
> Tom
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Communicating corrections back to data source

2016-06-10 Thread Benjamin Good
Hi Julie,

We've thought a lot about this, but not done anything formally yet.  There
is an example of this happening to improve the disease ontology presented
in this paper [1].

Mechanically, parties interested in a particular swath of data linked to
their resource could set up repeated SPARQL queries to watch for changes.
Beyond that, the core mediawiki API could be used to create alerts when new
discussions are started on articles or items of interest.
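
For instance, a source like the Disease Ontology could periodically run a
query along these lines against the query service to see which of 'its' items
changed recently (P699 is the Disease Ontology ID property; the cutoff date
here is arbitrary):

SELECT ?item ?doid ?modified WHERE {
  ?item wdt:P699 ?doid ;                  # items carrying a Disease Ontology ID
        schema:dateModified ?modified .   # last-modified timestamp exposed by the query service
  FILTER(?modified >= "2016-06-01T00:00:00Z"^^xsd:dateTime)
}
ORDER BY DESC(?modified)
LIMIT 100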

At some point we hope to produce a reporting site that would aggregate this
kind of information in our domain (feedback and changes  by the community)
as well as changes by our bots and provide reports back to the primary
sources and to whoever else was interested.  (Maybe we will see a start on
that this summer..)  This hasn't become a priority yet because we haven't
yet generated the community scope to make it a really valuable source of
input to the original databases.

[1] http://biorxiv.org/content/biorxiv/early/2015/11/16/031971.full.pdf


On Fri, Jun 10, 2016 at 11:31 AM, Julie McMurry 
wrote:

> It is great that WikiData provides a way for data to be curated in a
> crowd-sourced way.
> It would be even better if changes (especially corrections) could be
> communicated back to the original source so that all could benefit.
>
> Has this been discussed previously? Considered?
>
> Julie
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Possibility of data lock

2016-06-10 Thread Benjamin Good
+1000 to Markus K's main arguments above.  Yes noise will be introduced as
different people come in to make edits.  Administrators locking them out
isn't the best way to solve that problem in an open system.  There are
other options, as he raised - both technical and social.

Our group maintains hundreds of thousands of items on Wikidata with bots
that import data about genes, drugs, and diseases from a large variety of
'trusted authorities'.  We have also experienced some frustrations when
users make changes that don't fit our views (e.g. there was a thread here
about people merging gene and protein items that caused some major
headaches for us).  But, we resolved it by engaging with the people doing
the problematic edits via the talk pages and by adapting our bot code in
various ways - both by automatically fixing things that we judged to be
clearly broken (producing edits that could be tracked and argued with), by
educating community members about why things were modeled the way they
were, and by fixing processes that led to the confusion.  We could have
argued that because of our PhDs or whatever, we should 'own' these claims
that are "clearly facts...", but that would have been a huge mistake.

Looking ahead, we see that there will be an increasing challenge to keep
things in order as more and more people get involved.  Figuring out this
process will entail a lot of work for us - much more than if the 3 _total_
people that are writing the gene, disease, drug bots currently were left to
their own devices to create our own little data warehouse.  We are here
because we want to be a small part of something much more profound than
that - that will take not 3 people in a closed room, but 30,000 people
collaborating and fighting it out together.  We are a long way from 30,000
here (at least in our part of this).  Lets keep the gates as open as
possible.

-Ben

p.s. In terms of 'making our data more reliable for re-use'.  The formula
for me here is (1) get references on statements (2) develop code that
automates the processes of checking them, (3) provide the re-user with
straightforward ways to filter the data based on the combination of 1 and
2.  This can even be accomplished directly inside of Lua code that builds
Wikipedia templates from Wikidata - we have already started prototyping
that.


On Fri, Jun 10, 2016 at 4:26 AM, Markus Kroetzsch <
markus.kroetz...@tu-dresden.de> wrote:

> On 10.06.2016 12:53, Sandra Fauconnier wrote:
>
>>
>> On 10 Jun 2016, at 12:39, Yellowcard  wrote:
>>>
>>
>> However, there are single statements (!) that
>>> are proven to be correct (possibly in connection with qualifiers) and
>>> are no subject to being changed in future. Locking these statements
>>> would make them much less risky to obtain them and use them directly in
>>> Wikipedia. What would be the disadvantage of this, given that slightly
>>> experienced users can still edit them and the lock is only a protection
>>> against anonymous vandalism?
>>>
>>
>> I agree 100%, and would like to add (again) that this would also make our
>> data more reliable for re-use outside Wikimedia projects.
>>
>> There’s a huge scala of possibilities between locking harshly (no-one can
>> edit it anymore) and leaving stuff entirely open. I disagree that just one
>> tiny step away from ‘entirely open’ betrays the wiki principle.
>>
>
> I don't want to argue about principles :-). What matters is just how users
> perceive things. If they come to the site and want to make a change, and
> they cannot, you have to make sure that they understand why and what to do
> to fix it. The more you require them to learn and do before they are
> allowed to contribute, the more of them you will lose along the way. If a
> statement is not editable, then the (new or old) user has to:
>
> (1) be told why this is the case
> (2) be told what to do to change it anyway or at least to tell somebody to
> have a look at it (because there will always be errors, even in the "fixed"
> statements)
>
> These things are difficult to get right.
>
> There is a lot of discussion in recent years as to why new editors turn
> their back on Wikipedia after a short time, and one major cause is that
> many of their first edits get reverted very quickly, often by automated
> tools. I think the reasons for the reverting are often valid, so it is a
> short-term improvement to the content, yet it severely hurts the community.
> Therefore, whenever one discusses new measure to stop or undo edits, one
> should also discuss how to avoid this negative effect.
>
> I completely agree that there is a lot of middle ground to consider,
> without having to go to an extreme lock-down. However, things tend to
> develop over time, and I think it is fair to say that many Wikipedias have
> become more closed as they evolved. I am not eager to speed up this
> (natural, unavoidable?) process for Wikidata too much.
>
> The pros and cons of flagged revisions have been discussed in breadth on
> severa

Re: [Wikidata] WDQS URL shortener

2016-06-03 Thread Benjamin Good
awesome, thank you!

In case it's useful for anyone, I was using Wikidata to teach biologists and
chemists about knowledge graphs (many tinyurls toward the end)
http://www.slideshare.net/goodb/computing-on-the-shoulders-of-giants

On the second part of the course students were provided with the following
Jupyter Python notebook which runs a SPARQL query against Wikidata and
generates an output file suitable for loading in Cytoscape (a commonly used
network visualization tool in biology).  Could be useful for others that
need to teach this stuff..
https://github.com/SuLab/sparql_to_pandas/blob/master/SPARQL_pandas.ipynb

On Fri, Jun 3, 2016 at 9:26 AM, Stas Malyshev 
wrote:

> Hi!
>
> > I could have used another one on my own I guess, but the current
> > implementation is much faster and less error prone when dealing with
> > monster sparqling urls...
> >
> > please find a way to keep it
>
> There are no plans to remove URL shortening. There are plans to switch
> URL shortener to Wikimedia's own one, which is supposed to be coming up
> eventually, but before that, we plan to use existing ones. We might
> change a provider if it turns out there is a better one, but we do not
> plan to remove the functionality.
>
> --
> Stas Malyshev
> smalys...@wikimedia.org
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] WDQS URL shortener

2016-06-03 Thread Benjamin Good
I don't want to suggest a technology, but the URL shortener on wdqs has
been EXTREMELY USEFUL to me and it would be a major bummer to lose it.  I
recently taught a class and used a variety of examples of SPARQL queries
over wikidata.  Having that shortener made it much faster for me to
assemble the lecture and give people easy ways to get to those queries.

I could have used another one on my own I guess, but the current
implementation is much faster and less error prone when dealing with
monster sparqling urls...

please find a way to keep it
-Ben

On Wed, Jun 1, 2016 at 9:51 PM, Tom Morris  wrote:

> On Thu, Jun 2, 2016 at 12:22 AM, Julie McMurry 
> wrote:
>
>> While I agree the primary aim isn't shortening, the result is usually
>> much shorter by virtue of cutting out everything non essential to
>> identification.
>
>
> Except in the case of a giant query string, such as a complex SPARQL
> query.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Aren't pages too long?

2016-05-03 Thread Benjamin Good
I think no one would disagree that the current viewing experience on the
main wikidata.org interface is not ideal.  Keep in mind though that this is
fundamentally an impossible UI task.  There are many many different ways
that the data about an item in wikidata might be best used/viewed depending
on who is doing the viewing and why.  Having the same interface attempt to
support viewing data about astronomy and art history is just never going to
be great for either.  While I applaud and encourage the ongoing development
of the default interface, I think it would benefit the community to think
past that interface to the kinds of purpose built wikidata-driven
applications that serve and benefit from specific communities.  Anything we
can do to make it easier for people to build good apps (in addition to
those in the MediaWiki family) that read and write to wikidata could have a
huge impact on its ultimate impact.

One of the things that freebase (RIP) did in this regard was the Acre [1]
hosted development environment.  While clearly it didn't tip the balance
for them, I think it was a powerful idea that we could probably learn from.

-Ben

[1] http://wiki.freebase.com/wiki/Acre


On Tue, May 3, 2016 at 6:37 AM, Gerard Meijssen 
wrote:

> Hoi,
> For several years the Reasonator has done a stellar job. It is much better
> at providing information based on the Wikidata data.
>
> It just does not have a priority to provide an interface like this. The
> beauty of the Reasonator is that it can easily provide you the same
> information in other languages..
> Thanks,
>  GerardM
>
>
> https://tools.wmflabs.org/reasonator/?q=Q181
> https://tools.wmflabs.org/reasonator/?q=Q181&lang=ru
> https://tools.wmflabs.org/reasonator/?q=Q181&lang=ar
> https://tools.wmflabs.org/reasonator/?q=Q181&lang=sv
>
> On 3 May 2016 at 13:07, David Abián  wrote:
>
>> Hi,
>>
>> I think that most elements on Wikidata are nowadays too long to be
>> easily read by humans. There are many properties (which is great), the
>> information is too scattered, and this problem (if you consider it a
>> problem) will continue to grow.
>>
>> Some suggestions come to mind...
>>
>> * Visually group properties by type, using the division of
>> , or another.
>>
>> * Change our current CSS rules to show properties in a more compact way.
>>
>> * Create a table of contents that automatically appears when more than N
>> properties have been defined for an element.
>>
>> * Combine some of the above.
>>
>> Is there something discussed/planned to address this issue?
>>
>> Regards,
>>
>> --
>> David Abián - davidabian.com
>> Communications Officer
>>
>> Wikimedia España
>> Vega Sicilia, 2
>> 47008 - Valladolid
>> https://wikimedia.es
>>
>> Wikimedia España is a Spanish non-profit association with
>> CIF G-10413698, registered in the Registro Nacional de Asociaciones,
>> Grupo 1, Sección 1, National No. 597390.
>>
>> «Imagine a world in which every person
>> has free access to all knowledge.»
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] item / claim usage tracking

2016-04-26 Thread Benjamin Good
If a group (let's say either mine or Lydia's team) were interested in doing
what I propose, could we request and sign such an NDA?  This doesn't seem
insurmountable, right?

On Tue, Apr 26, 2016 at 10:01 AM, Dan Garry  wrote:

> On 26 April 2016 at 08:41, Benjamin Good  wrote:
>
>> Perhaps you could use the query log (just the list of SPARQL queries) and
>> utilize an offline installation of the query service to execute them and
>> generate aggregate statistics?
>>
>
> As a rule of thumb, if you think you've found a convenient way around
> needing an NDA... you probably haven't. ;-)
>
> The log of the list of queries would also be covered under the privacy
> policy. The log contains arbitrary, free-form user input and therefore is
> treated as containing personally identifying information until proven
> otherwise. You're correct that aggregates (like the ones that you're after)
> are generally fine to release publicly, but the person creating those
> aggregates would still need an NDA.
>
> I'm sorry for the inconvenience. The Wikimedia Foundation tries its
> hardest to safeguard user data, which can sometimes complicate processes
> like this as they're designed for maximal user safety and privacy rather
> than convenience.
>
> Hope that helps,
> Dan
>
> --
> Dan Garry
> Lead Product Manager, Discovery
> Wikimedia Foundation
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] item / claim usage tracking

2016-04-26 Thread Benjamin Good
Perhaps you could use the query log (just the list of SPARQL queries) and
utilize an offline installation of the query service to execute them and
generate aggregate statistics?  E.g., Q76 appeared in the results of X
queries submitted in January 2016..   That seems like a doable summer
student project and also seems like it ought to be okay to share such
aggregate results - pretty much the same thing as page view statistics but
for a database.

?
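As a rough sketch of how that aggregation could work (assuming a hypothetical
offline copy of the query service on localhost and a plain-text log with one
SPARQL query per line - neither exists yet in this form):

import re
from collections import Counter

import requests

# Hypothetical offline copy of the query service (NOT the public endpoint).
OFFLINE_ENDPOINT = "http://localhost:9999/bigdata/namespace/wdq/sparql"
QID_PATTERN = re.compile(r"Q\d+")

usage = Counter()

with open("query_log_2016_01.txt") as log:  # assumed: one logged query per line
    for query in log:
        try:
            resp = requests.get(
                OFFLINE_ENDPOINT,
                params={"query": query},
                headers={"Accept": "application/sparql-results+json"},
                timeout=120,
            )
            resp.raise_for_status()
        except requests.RequestException:
            continue  # skip queries that fail or time out
        # Count each item at most once per query, like a page view.
        qids = {qid
                for binding in resp.json()["results"]["bindings"]
                for value in binding.values()
                for qid in QID_PATTERN.findall(value.get("value", ""))}
        usage.update(qids)

for qid, count in usage.most_common(20):
    print(qid, count)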


On Tue, Apr 26, 2016 at 2:40 AM, Lydia Pintscher <
lydia.pintsc...@wikimedia.de> wrote:

> On Tue, Apr 26, 2016 at 1:30 AM Benjamin Good 
> wrote:
>
>> I'll start with the simple question, then give the longer context.  Is
>> there any way to know how many times an item or a claim appears in the
>> results of a query to query.wikidata.org ?   Are there any other ways to
>> quantify query/application usage of specific wikidata content?
>>
>> Background.  The gene wiki people recently attended a conference on
>> 'biocuration' (the construction and maintenance of biological databases)
>> where we gave multiple wikidata-related presentations.  The community there
>> generally had a very positive reaction to what we have been doing but many
>> were concerned about attribution.  They wanted to know that when data was
>> imported into wikidata from their resources (e.g. the Gene Ontology), that
>> there was some way to ensure that the world knew where it came from so that
>> the authors could get appropriate credit (which translates into grant money
>> which translates into their jobs).  We explained the reference model to
>> them, which helped, but still they are concerned.
>>
>> The most important consequence of moving data into wikidata is that it
>> can get used - sometimes a lot! (e.g. when displayed on Wikipedia
>> articles).  If we could quantify usage for data providers, it would really
>> help them make the argument to their funding sources that contributing to
>> wikidata increases their impact.  If we can get that across, it would help
>> bring more people, more high quality data, and more funding into the
>> wikidata fold.
>>
>> thoughts?
>>
>
> Currently there is no way to tell how much a particular data point is used
> in query results to query.wikidata.org. I am not even sure if there is a
> meaningful way to do this. We can't give access to query logs without an
> NDA with the Wikimedia Foundation for privacy reasons.
> As for usage on Wikipedia we do have statistics on that and that is
> available in the database on labs. But those are on the level of the whole
> item or sitelinks, not a particular statement. We'll look into making this
> more accessible on-wiki.
>
> Cheers
> Lydia
> --
> Lydia Pintscher - http://about.me/lydia.pintscher
> Product Manager for Wikidata
>
> Wikimedia Deutschland e.V.
> Tempelhofer Ufer 23-24
> 10963 Berlin
> www.wikimedia.de
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
>
> Registered in the register of associations of the Amtsgericht
> Berlin-Charlottenburg under number 23855 Nz. Recognized as charitable by the
> Finanzamt für Körperschaften I Berlin, tax number 27/029/42207.
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] item / claim usage tracking

2016-04-25 Thread Benjamin Good
I'll start with the simple question, then give the longer context.  Is there
any way to know how many times an item or a claim appears in the results of
a query to query.wikidata.org ?   Are there any other ways to quantify
query/application usage of specific wikidata content?

Background.  The gene wiki people recently attended a conference on
'biocuration' (the construction and maintenance of biological databases)
where we gave multiple wikidata-related presentations.  The community there
generally had a very positive reaction to what we have been doing but many
were concerned about attribution.  They wanted to know that when data was
imported into wikidata from their resources (e.g. the Gene Ontology), that
there was some way to ensure that the world knew where it came from so that
the authors could get appropriate credit (which translates into grant money
which translates into their jobs).  We explained the reference model to
them, which helped, but still they are concerned.

The most important consequence of moving data into wikidata is that it can
get used - sometimes a lot! (e.g. when displayed on Wikipedia articles).
If we could quantify usage for data providers, it would really help them
make the argument to their funding sources that contributing to wikidata
increases their impact.  If we can get that across, it would help bring
more people, more high quality data, and more funding into the wikidata
fold.

thoughts?
-Ben
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] From the game to the competition

2016-02-08 Thread Benjamin Good
I for one would certainly encourage further development of many different
interfaces that support the collection of good edits from many editors
across many different communities and I think "games" can be a part of
that.  While I understand the dangers of gamification done badly, done
well, it can be a fantastic thing for rallying a community together.  I
would not advocate that wikidata call itself a game..  or even that it
devote much if any resources to building game-like editing systems itself.
I would however encourage the development of more applications like the
wikidata game(s) that Magnus has created - but better.  Luckily all of the
things that I would suggest for improvements could be layered on top of the
code he has provided..   For me 'better' would mean:
 (1) Require some level of task redundancy as quality control.  As the
number of people playing these games grows, the increase in precision from
adding something simple like "only make the edit if 2 of 3 people agree" is
well worth the loss of recall (see the sketch after this list).
 (2) make many games for many specific communities of interest and market
them directly to them.  You don't have to call them games either.  You
could just say "we need volunteers and here is a really easy way that you
can help".
 (3) If you are going to call them games.. (and that is a key IF), then
make them more fun.  There are lots of angles to take this - not just high
score lists.  This is the principal double-edged sword here that Lydia raised
concern about above.  If you make the game element very appealing, you have
to be very careful that the game incentives are exactly aligned with what
you want to encourage the community to do.  If the game isn't quite lined
up, you can get into trouble with people over-optimizing to win the game
while hurting the system the game was built to help.
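A toy illustration of the redundancy rule in point (1), with invented data
structures:

from collections import Counter

def accept_edit(votes, min_votes=3, min_agreement=2):
    """Return the agreed-upon answer for a candidate edit, or None.

    `votes` is a list of answers (e.g. "merge", "don't merge", "skip")
    collected from independent players for the same candidate edit.
    """
    if len(votes) < min_votes:
        return None  # not enough independent opinions yet
    answer, count = Counter(votes).most_common(1)[0]
    return answer if count >= min_agreement else None

# Only the first candidate clears the 2-of-3 bar.
print(accept_edit(["merge", "merge", "skip"]))        # -> merge
print(accept_edit(["merge", "don't merge", "skip"]))  # -> None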

http://zooniverse.org is an example of a collection of volunteer-tasks (for
"citizen science") that could be gamified, but they have made the explicit
decision not to do so because they are worried about incentive alignment -
but also because they already have a community with more than 1 million
registered users!  While wikidata is off to a great start, there is a long
way to go before we have the community needed to really achieve its
potential.  I support exploring many ways to get there, including well
thought out games.

-Ben

On Sun, Feb 7, 2016 at 10:02 AM, Gerard Meijssen 
wrote:

> Hoi,
> There is no reason why we cannot have both. There are continuous
> activities where play may not be that relevant, there is also the
> occasional stuff. Where time for instance is of the essence.
>
> I think the seriousness of Wiki* is vastly overrated and is often a self
> defeating proposition.
> Thanks,
>  GerardM
>
> On 7 February 2016 at 16:09, Sandra Fauconnier <
> sandra.fauconn...@gmail.com> wrote:
>
>> Wow, these are great links, Lydia, thanks.
>>
>> I, for one, would warmly welcome more well-designed games, especially in
>> the distributed game framework that Magnus has built.
>> Not so much for the playfulness, but because it’s such an easy way to do
>> many useful edits without needing deep concentration, and because I find it
>> really interesting to see all the kind of content that we cover (the games
>> allow me to get out of the ‘filter bubble’ in which I usually edit, which
>> is the field of the visual arts).
>>
>> Game-like tools that I would like to see, would include
>> - a sourcing game to add reference URLs from RKDartists to statements
>> related to artists (birth dates, death dates, places of birth and death,
>> professions)
>> - a nice and pleasant interface that allows me to state what is depicted
>> in an artwork
>> - a better game to add Thesaurus of Geograpic Names IDs to geographic
>> entities. The TGN is now in the distributed and ‘normal’ Mix’n’Match
>> version but is hard to match in these. It really needs a good interface
>> with a large map for ‘our’ items next to the more detailed geographical
>> info contained in the TGN (tree view; is it a city/river/mountain…)
>>
>> A competition element, on the other hand, would really put me off. I
>> don’t care at all, I’m not in it for that and it would chase me away very
>> quickly.
>>
>> Sandra
>>
>>
>> On 07 Feb 2016, at 10:00, Lydia Pintscher 
>> wrote:
>>
>> On Sat, Feb 6, 2016 at 11:08 PM David Abián 
>> wrote:
>>
>>> Hi folks,
>>>
>>> It's fantastic to see that we have such interesting tools to contribute
>>> to Wikidata like Magnus' games.
>>>
>>> With Wikidata Game and The Distributed Game as a base, I think we could
>>> go further and get a tool that serve, not only as a game, but as a real
>>> competition. In particular, with the following additions and a few
>>> suggestions, I believe we could celebrate great /in situ/ Wikidata
>>> competitions over the world:
>>>
>>> * A chronometer with a start and a scheduled end while contributions are
>>> registered for the contest.
>>> * Some quorum (e.g., three) so that edits in the c

Re: [Wikidata] Brief demo of ConceptMap v1.1 - Creating dynamic concept maps from Wikidata

2016-01-11 Thread Benjamin Good
James,

This is cool.  The integration between Wikipedia and Wikidata is really
smooth and useful.  I wonder if you could let users do Wikidata editing
right in context rather than having them have to find the link on the top
there and jump out of the app and into the editor ?  There might be simple
wikidata-game type tasks that would be easy to display and execute if a
full-blown editor is out of scope.

This is highly related to an application that we are developing for
interrogating large knowledge graphs.  It currently runs on top of a
database of triples extracted automatically from the text of abstracts from
biomedical journal articles, but we are planning to open it up to include
wikidata and many other sources in its next incarnation.  The current live
version is written in python-django/mysql (source
https://bitbucket.org/sulab/kb1 ) and can be played with at
http://knowledge.bio .  The development version is being redone in Java
using many of the same technologies you are using.  Would be cool if we
could somehow join forces..  (I cc'd Richard, the main developer on the
project).  I will post a link to the source code when the first build of
the java version is ready.

-Ben


On Mon, Jan 11, 2016 at 11:14 AM,  wrote:

> Fellow Wikidata enthusiasts,
>
> In answer to questions about the purposes for ConceptMap (live at
> http://ConceptMap.io), I've expanded the blog post and updated the
> video.  Here is the updated blog post:
>
>
> http://learnjavafx.typepad.com/weblog/2016/01/brief-demo-of-conceptmap-v11-creating-dynamic-concept-maps.html
>
> Also, for your convenience, here is a link to the updated video, as well
> as the additional text in the post.  I'd appreciate any continued input and
> questions that you have:
>
> https://vimeo.com/151407497
>
> One of the core features of ConceptMap is the synchronized navigation
> between Wikipedia articles and their Wikidata semantic relationships.  In
> the brief video, for example, the Wikipedia article named "David Bowie" is
> displayed in the rightmost panel.  The semantic relationships defined
> in Wikidata for David Bowie (whose Wikidata identifier is "Q5383") are
> displayed in the center panel.  The user can see the items related to David
> Bowie (e.g. All the Young Dudes), as well as the relevant Wikidata
> properties (e.g. performer).  When the user clicks on a related item, the
> Wikipedia article for that item appears in the rightmost panel.
> Conversely, when the user clicks an article link in a Wikipedia page, the
> center panel is updated to show the relationships defined in the Wikidata
> item corresponding to that article.  This synchronized navigation feature
> enables the user to explore areas of interest using an approach that
> combines structured (Wikidata relationships) and freeform (Wikipedia links)
> navigation.
>
> Another core feature of ConceptMap is the dynamically created directed
> graph in the leftmost panel.  When a Wikipedia article is displayed and its
> corresponding Wikidata item appears in the title area, the "Pin item"
> checkbox may be used to pin/unpin the item from the concept map.  Each of
> the pinned items appear as labeled circles (nodes) in the concept map.  As
> items are added to the graph, relationships from Wikidata are displayed as
> labeled lines (links) between the nodes.
>
> As shown in the video, the Wikidata icon in the upper right portion of the
> page opens the Wikidata page for the selected item.  This is useful for
> adding missing relationships to Wikidata, as shown in the video when adding
> "space rock" to the list of genres for his song entitled Space Oddity.  To
> share or bookmark a concept map, click the button with the link icon as
> shown in the video.  A shortened URL appears that you can copy.
>
> In future posts I plan to share details of the technologies and code
> involved in creating this application.  In the meantime, whether you are
> a learner, researcher, teacher, or all of the above, I hope that you'll
> take it for a spin!
>
>
> Regards,
> James Weaver
> Twitter: @JavaFXpert
>
>
> On Sun, Jan 10, 2016, at 04:30 PM, Jo wrote:
>
> I was able to do a mind-map-like thing with it. Since some connections
> were missing in WD, I added them, so it helps people improve WD.
>
> Polyglot
> *___*
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] content for wikidata tutorial?

2015-12-08 Thread Benjamin Good
Hi everyone, the tutorial was well received, though it will be much better
next time around.  Thanks for your feedback.  It is discussed with links
to the three sub-presentations that formed the standing and talking part of
it here: http://tinyurl.com/swat4ls-wikidata

Thanks again.  It's pretty clear the community here needs to evolve a really
killer hands-on wikidata tutorial.  Not quite there yet, but good
progress!  We will have to do this again for life scientists in March 2016..

cheers!
-Ben



On Fri, Dec 4, 2015 at 6:33 PM, Jan Ainali  wrote:

> Mine was on purpose Autolist2 to make bulk edits :)
>
> Short documentation here:
> https://www.wikidata.org/wiki/Wikidata:Workshop_at_SMHI
>
> I also remember now that I had prepared a list of Swedes that did not have
> P21 (gender), to serve as a first simple edit for each participant. I made
> them first sign up on the numbered list so they each got a number, then
> pasted in the prepared list of persons and asked them to add the statement
> on the item corresponding to their number in the participants list.
>
>
>
> *With kind regards, Jan Ainali*
>
> Executive Director, Wikimedia Sverige <http://wikimedia.se>
> 0729 - 67 29 48
>
>
> *Imagine a world where every human being has free access to the sum of all
> human knowledge. That is what we do.*
> Become a member. <http://blimedlem.wikimedia.se>
>
>
> 2015-12-04 19:22 GMT+01:00 Andrew Gray :
>
>> Yes, we carefully used Autolist1 so that they couldn't easily make bulk
>> edits, to avoid this :-). It was solely a discovery tool rather than an
>> editing one.
>>
>> However, in the workshop, one person did figure out how to, and did a
>> batch of fifty on their own initiative!
>>
>> A.
>>
>> On 4 December 2015 at 18:18, Jan Ainali  wrote:
>>
>>> That is a nice idea Andrew. One thing to be aware of is editing pace. I
>>> had an advanced workshop with prepared pre-filled Autolists, and when 10-15
>>> people with new accounts on the same IP tried to add statements at the same
>>> time through Autolist there was some mechanism that kicked in (to protect
>>> Wikidata). I understand the reason for the feature and do not suggest
>>> changing it, people designing workshops just need to be aware that this
>>> feature exist.
>>>
>>>
>>> *With kind regards, Jan Ainali*
>>>
>>> Executive Director, Wikimedia Sverige <http://wikimedia.se>
>>> 0729 - 67 29 48
>>>
>>>
>>> *Imagine a world where every human being has free access to the sum of
>>> all human knowledge. That is what we do.*
>>> Become a member. <http://blimedlem.wikimedia.se>
>>>
>>>
>>> 2015-12-04 19:07 GMT+01:00 Andrew Gray :
>>>
>>>> Charles Matthews and I ran a workshop a little while ago which had
>>>> something like the fortune cookie idea.
>>>>
>>>> First, we demonstrated basic Wikidata editing (adding/changing
>>>> statements) as part of a discussion on the data structure - properties
>>>> and items, item versus text properties, etc.
>>>>
>>>> After this, we gave everyone a numbered slip with a Wikidata query (in
>>>> WDQ form) on it - mostly of the type "claim[X] and noclaim[Y]". Then
>>>> we got them to load up pre-filled Autolist links (all numbered and
>>>> ready), pick a couple of entries from the list, and try to fix
>>>> whatever was missing. (There was an unintended detour at this point
>>>> into how to interpret WDQ queries - people got the idea pretty fast
>>>> that each of these was a set of items missing particular values.)
>>>>
>>>> Queries we used were things like "people with no nationality" (though
>>>> "people born since 1600 with no nationality" would have worked
>>>> better), "people with no occupation", "buildings that don't have a
>>>> 'located in' value", etc.
>>>>
>>>> This got people making small edits very early, ensured that we had a
>>>> fresh supply of "missing cases" to work on (because the lists were
>>>> generated from scratch), and prompted a lot of very good questions for
>>>> discussion, people starting to hack the queries to find more specific
>>>> topics, etc. I was really quite pleased with the way it worked.
>>>>
>>>> Andrew.
>>>>
>>>>
>>>> On 4 December 2015 at 17:38, Benjamin Good 
>>>> wrote:
>>>> > Thanks All!

Re: [Wikidata] content for wikidata tutorial?

2015-12-04 Thread Benjamin Good
Sounds great.  Very much like what we are thinking.

On Fri, Dec 4, 2015 at 10:07 AM, Andrew Gray 
wrote:

> Charles Matthews and I ran a workshop a little while ago which had
> something like the fortune cookie idea.
>
> First, we demonstrated basic Wikidata editing (adding/changing
> statements) as part of a discussion on the data structure - properties
> and items, item versus text properties, etc.
>
> After this, we gave everyone a numbered slip with a Wikidata query (in
> WDQ form) on it - mostly of the type "claim[X] and noclaim[Y]". Then
> we got them to load up pre-filled Autolist links (all numbered and
> ready), pick a couple of entries from the list, and try to fix
> whatever was missing. (There was an unintended detour at this point
> into how to interpret WDQ queries - people got the idea pretty fast
> that each of these was a set of items missing particular values.)
>
> Queries we used were things like "people with no nationality" (though
> "people born since 1600 with no nationality" would have worked
> better), "people with no occupation", "buildings that don't have a
> 'located in' value", etc.
>
> This got people making small edits very early, ensured that we had a
> fresh supply of "missing cases" to work on (because the lists were
> generated from scratch), and prompted a lot of very good questions for
> discussion, people starting to hack the queries to find more specific
> topics, etc. I was really quite pleased with the way it worked.
>
> Andrew.
>
>
> On 4 December 2015 at 17:38, Benjamin Good 
> wrote:
> > Thanks All!
> > (and especially to Lane for by far the best compliment I've received,
> maybe
> > ever..)
> >
> > Will get back to you with the final product and some news about the
> > meeting..  Andra Waagmeester had a great idea that unfortunately we are a
> > bit late to implement.  Fortune cookies to pass out where each fortune
> is a
> > single wikidata edit that the recipient is encouraged to make..  Would
> love
> > to see that play out someday.
> >
> > -Ben
> >
> > On Fri, Dec 4, 2015 at 6:51 AM, Lane Rasberry 
> wrote:
> >>
> >> Benjamin,
> >>
> >> It might be helpful for you to get confirmation that there are no
> >> excellent polished Wikidata tutorials in existence.
> >>
> >> The good tutorials are made by people who know Wikidata, like the one
> EMW
> >> shared, but EMW is not a graphic designer and made a practical
> presentation
> >> rather than a corporate scripted slideset.
> >>
> >> Your "poof it works" article is the state of the art.
> >>
> >> <http://sulab.org/2015/10/poof-it-works-using-wikidata-to-build-wikipedia-articles-about-genes/>
> >>
> >> It is all very casual and everything understates how important and
> >> revolutionary Wikidata is. I still show your article to lots of people.
> Of
> >> all the Wikidata narratives I have read I like yours the best.
> >>
> >> yours,
> >>
> >> On Thu, Dec 3, 2015 at 4:31 PM, Benjamin Good  >
> >> wrote:
> >>>
> >>> The gene wiki people are hosting a tutorial on wikidata in Cambridge,
> UK
> >>> next Monday [1].  In the interest of making the best tutorial in the
> least
> >>> amount of preparation time.. I was wondering if anyone on the list had
> >>> content (slides, handouts, cheatsheets) that they had already used
> >>> successfully and might want to share?  We are assembling the structure
> of
> >>> the 90 minute session in a google doc [2], feel free to chime in there
> !
> >>> And of course everything we generate for that will be available online
> as
> >>> soon as it exists.
> >>>
> >>> cheers
> >>> -Ben
> >>>
> >>> [1]
> http://www.swat4ls.org/workshops/cambridge2015/programme/tutorials/
> >>>
> >>> [2]
> https://docs.google.com/document/d/1dSgm90SbQBpHqEMa17t5zQL0PB2waIKD3LKTPPknmcY/edit#heading=h.m19y528ldds8
> >>>
> >>>
> >>> ___
> >>> Wikidata mailing list
> >>> Wikidata@lists.wikimedia.org
> >>> https://lists.wikimedia.org/mailman/listinfo/wikidata
> >>>
> >>
> >>
> >>
> >> --
> >> Lane Rasberry
> >> user:bluerasberry on Wikipedia
> >> 206.801.0814
> >> l...@bluerasberry.com
> >>
> >> ___
> >> Wikidata mailing list
> >> Wikidata@lists.wikimedia.org
> >> https://lists.wikimedia.org/mailman/listinfo/wikidata
> >>
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
>
>
> --
> - Andrew Gray
>   andrew.g...@dunelm.org.uk
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] content for wikidata tutorial?

2015-12-04 Thread Benjamin Good
Thanks All!
(and especially to Lane for by far the best compliment I've received, maybe
ever..)

Will get back to you with the final product and some news about the
meeting..  Andra Waagmeester had a great idea that unfortunately we are a
bit late to implement.  Fortune cookies to pass out where each fortune is a
single wikidata edit that the recipient is encouraged to make..  Would love
to see that play out someday.

-Ben

On Fri, Dec 4, 2015 at 6:51 AM, Lane Rasberry  wrote:

> Benjamin,
>
> It might be helpful for you to get confirmation that there are no
> excellent polished Wikidata tutorials in existence.
>
> The good tutorials are made by people who know Wikidata, like the one EMW
> shared, but EMW is not a graphic designer and made a practical presentation
> rather than a corporate scripted slideset.
>
> Your "poof it works" article is the state of the art.
> <http://sulab.org/2015/10/poof-it-works-using-wikidata-to-build-wikipedia-articles-about-genes/>
>
> It is all very casual and everything understates how important and
> revolutionary Wikidata is. I still show your article to lots of people. Of
> all the Wikidata narratives I have read I like yours the best.
>
> yours,
>
> On Thu, Dec 3, 2015 at 4:31 PM, Benjamin Good 
> wrote:
>
>> The gene wiki people are hosting a tutorial on wikidata in Cambridge, UK
>> next Monday [1].  In the interest of making the best tutorial in the least
>> amount of preparation time.. I was wondering if anyone on the list had
>> content (slides, handouts, cheatsheets) that they had already used
>> successfully and might want to share?  We are assembling the structure of
>> the 90 minute session in a google doc [2], feel free to chime in there !
>> And of course everything we generate for that will be available online as
>> soon as it exists.
>>
>> cheers
>> -Ben
>>
>> [1] http://www.swat4ls.org/workshops/cambridge2015/programme/tutorials/
>> [2]
>> https://docs.google.com/document/d/1dSgm90SbQBpHqEMa17t5zQL0PB2waIKD3LKTPPknmcY/edit#heading=h.m19y528ldds8
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
>
> --
> Lane Rasberry
> user:bluerasberry on Wikipedia
> 206.801.0814
> l...@bluerasberry.com
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] content for wikidata tutorial?

2015-12-03 Thread Benjamin Good
The gene wiki people are hosting a tutorial on wikidata in Cambridge, UK
next Monday [1].  In the interest of making the best tutorial in the least
amount of preparation time.. I was wondering if anyone on the list had
content (slides, handouts, cheatsheets) that they had already used
successfully and might want to share?  We are assembling the structure of
the 90 minute session in a google doc [2], feel free to chime in there !
And of course everything we generate for that will be available online as
soon as it exists.

cheers
-Ben

[1] http://www.swat4ls.org/workshops/cambridge2015/programme/tutorials/
[2]
https://docs.google.com/document/d/1dSgm90SbQBpHqEMa17t5zQL0PB2waIKD3LKTPPknmcY/edit#heading=h.m19y528ldds8
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] provenance tracking for high volume edit sources (was Data model explanation and protection)

2015-11-10 Thread Benjamin Good
You misunderstand me if you thought I was blaming Magnus for this.  It was
a hypothesis that right now seems false and we do not yet have another
answer.  I do think it is entirely possible that a high-volume,
low-user-expertise game interface could generate problems very much like
what we are observing.  I think we should be able to track them more
transparently than we can now.
The Widar tag seems like a starting point:
https://www.wikidata.org/w/index.php?title=Special:RecentChanges&tagfilter=OAuth+CID%3A+93
but this could be improved.
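For example, edits carrying a given change tag can already be pulled from
RecentChanges programmatically; a small sketch (the tag string is copied from
the link above and may differ for other tools):

import requests

API = "https://www.wikidata.org/w/api.php"
params = {
    "action": "query",
    "list": "recentchanges",
    "rctag": "OAuth CID: 93",  # tag taken from the RecentChanges link above
    "rcprop": "title|user|timestamp|comment",
    "rclimit": 50,
    "format": "json",
}
resp = requests.get(API, params=params, timeout=30)
for change in resp.json()["query"]["recentchanges"]:
    print(change["timestamp"], change["user"], change["title"])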

-Ben
p.s. Side note on the game.  Other very similar things usually incorporate
some level of redundancy - e.g. you show the same thing to multiple people
and only keep statements where 2 or more people agree..  Lower recall but
higher precision - depends on the goal.



On Tue, Nov 10, 2015 at 9:44 AM, Finn Årup Nielsen  wrote:

> If I understand correctly:
>
> 1) Magnus' game already tags the edits with 'Widar'.
>
> 2) Magnus' game cannot merge protein and genes if they link to each other.
> With 'ortholog' and 'expressed by' Magnus' merging game does not contribute
> to the problematic merges (Magnus email from previously today: "FWIW,
> checked again. Neither game can merge two items that link to each other.
> So, if the protein is "expressed by" the gene, that pair will not even be
> suggested.").
>
> There is nothing more that Magnus can do, - except making an unmerging
> game. :-)
>
> /Finn
>
>
>
> On 11/10/2015 05:54 PM, Benjamin Good wrote:
>
>> In another thread, we are discussing the preponderance of problematic
>> merges of gene/protein items.  One of the hypotheses raised to explain
>> the volume and nature of these merges (which are often by fairly
>> inexperienced editors and/or people that seem to only do merges) was
>> that they were coming from the wikidata game.  It seems to me that
>> anything like the wikidata game that has the potential to generate a
>> very large volume of edits - especially from new editors - ought to tag
>> its contributions so that they can easily be tracked by the system.  It
>> should be easy to answer the question of whether an edit came from that
>> game (or any of what I hope to be many of its descendants).  This will
>> make it possible to debug what could potentially be large swathes of
>> problems and to make it straightforward to 'reward' game/other
>> developers with information about the volume of the edits that they have
>> enabled directly from the system (as opposed to their own tracking data).
>>
>> Please don't misunderstand me.  I am a big fan of the wikidata game and
>> actually am pushing for our group to make a bio-specific version of it
>> that will build on that code.  I see a great potential here - but
>> because of the potential scale of edits this could quickly generate, we
>> (the whole wikidata community) need ways to keep an eye on what is going
>> on.
>>
>> -Ben
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> --
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] provenance tracking for high volume edit sources (was Data model explanation and protection)

2015-11-10 Thread Benjamin Good
In another thread, we are discussing the preponderance of problematic
merges of gene/protein items.  One of the hypotheses raised to explain the
volume and nature of these merges (which are often by fairly inexperienced
editors and/or people that seem to only do merges) was that they were
coming from the wikidata game.  It seems to me that anything like the
wikidata game that has the potential to generate a very large volume of
edits - especially from new editors - ought to tag its contributions so
that they can easily be tracked by the system.  It should be easy to answer
the question of whether an edit came from that game (or any of what I hope
to be many of its descendants).  This will make it possible to debug what
could potentially be large swathes of problems and to make it
straightforward to 'reward' game/other developers with information about
the volume of the edits that they have enabled directly from the system (as
opposed to their own tracking data).

Please don't misunderstand me.  I am a big fan of the wikidata game and
actually am pushing for our group to make a bio-specific version of it that
will build on that code.  I see a great potential here - but because of the
potential scale of edits this could quickly generate, we (the whole
wikidata community) need ways to keep an eye on what is going on.

-Ben
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-11-10 Thread Benjamin Good
Finn,

Thanks, I know the gene-protein thing is confusing.  The example you raise
there shows nicely why things are set up the way they are.  One of the
challenges is that there are so many related, but fundamentally different
things to deal with that any reliance on human readable names leads almost
immediately to data disaster.. This is why we have been working hard on
bringing in all the various unique identifier properties for these items.

(The link to the mouse protein was a mistake.. the bot seems to have had
some mouse related problems lately - Andra is working to fix them.)

-Ben

On Tue, Nov 10, 2015 at 2:18 AM, Finn Årup Nielsen  wrote:

> Isn't Magnus Manske's game tagging the edit with "Widar"? I do not see
> that for, for instance, the user Hê de tekhnê makrê.
>
> I must say, being a wannabe bioinformatician, that the gene/protein data
> in Wikidata can be confusing. Take https://www.wikidata.org/wiki/Q14907009
> which had a merging problem (that I have tried to resolve).
>
> Even before merging
> https://www.wikidata.org/w/index.php?title=Q14907009&oldid=261061025 this
> human gene had three gene products "cyclin-dependent kinase inhibitor 2A",
> "P14ARF" (which to me looked like a gene symbol, I changed it to p14ARF),
> and "Tumor suppressor ARF". One of them is a mouse protein. One of the
> others link to http://www.uniprot.org/uniprot/Q8N726 Here the recommended
> name is "Tumor suppressor ARF" while alternative names are
> "Cyclin-dependent kinase inhibitor 2A" and "p14ARF". To me it seems that
> one gene codes two proteins that can be referred to by the same name.
>
> I hope my edits haven't done more damage than good. Several P1889s would
> be nice.
>
> I think, as someone suggested, that adding P1889 and having Wikibase
> merging looking at P1889 would be a solution.
>
>
> /Finn
>
>
> On 11/10/2015 12:34 AM, Benjamin Good wrote:
>
>> Magnus,
>>
>> We are seeing more and more of these problematic merges.  See:
>> http://tinyurl.com/ovutz5x for the current list of (today 61) problems.
>> Are these coming from the wikidata game?
>>
>> All of the editors performing the merges seem to be new and the edit
>> patterns seem to match the game.  I thought the edits were tagged with a
>> statement about them coming from the game, but I don't see that?  If
>> they are, could you just take genes and proteins out of the 'potential
>> merge' queue ?  I'm guessing that their frequently very similar names
>> are putting many of them into the list.
>>
>> We are starting to work on a bot to combat this, but would like to stop
>> the main source of the damage if it's possible to detect it.  This is
>> making Wikipedia integration more challenging than it already is...
>>
>> thanks
>> -Ben
>>
>>
>> On Wed, Oct 28, 2015 at 3:41 PM, Magnus Manske
>> <magnusman...@googlemail.com> wrote:
>>
>> I fear my games may contribute to both problems (merging two items,
>> and adding a sitelink to the wrong item). Both are facilitated by
>> identical names/aliases, and sometimes it's hard to tell that a pair
>> is meant to be different, especially if you don't know about the
>> intricate structures of the respective knowledge domain.
>>
>> An item-specific, but somewhat heavy-handed approach would be to
>> prevent merging of any two items where at least one has P1889, no
>> matter what it specifically points to. At least, give a warning that
>> an item is "merge-protected", and require an additional override for
>> the merge.
>>
>> If that is acceptable, it would be easy for me to filter all items
>> with P1889, from the merge game at least.
>>
>> On Wed, Oct 28, 2015 at 8:50 PM Peter F. Patel-Schneider
>> <pfpschnei...@gmail.com> wrote:
>>
>> On 10/28/2015 12:08 PM, Tom Morris wrote:
>> [...]
>>  > Going back to Ben's original problem, one tool that Freebase
>> used to help
>>  > manage the problem of incompatible type merges was a set of
>> curated sets of
>>  > incompatible types [5] which was used by the merge tools to
>> warn users that
>>  > the merge they were proposing probably wasn't a good idea.
>> People could
>>  > ignore the warning in the Freebase implementation, but
>> Wikidata could make it
>>  > a hard restriction or just a warning.
>>   

Re: [Wikidata] Data model explanation and protection

2015-11-09 Thread Benjamin Good
Magnus,

We are seeing more and more of these problematic merges.  See:
http://tinyurl.com/ovutz5x for the current list of (today 61) problems.
Are these coming from the wikidata game?

All of the editors performing the merges seem to be new and the edit
patterns seem to match the game.  I thought the edits were tagged with a
statement about them coming from the game, but I don't see that?  If they
are, could you just take genes and proteins out of the 'potential merge'
queue ?  I'm guessing that their frequently very similar names are putting
many of them into the list.

We are starting to work on a bot to combat this, but would like to stop the
main source of the damage if it's possible to detect it.  This is making
Wikipedia integration more challenging than it already is...

thanks
-Ben


On Wed, Oct 28, 2015 at 3:41 PM, Magnus Manske 
wrote:

> I fear my games may contribute to both problems (merging two items, and
> adding a sitelink to the wrong item). Both are facilitated by identical
> names/aliases, and sometimes it's hard to tell that a pair is meant to be
> different, especially if you don't know about the intricate structures of
> the respective knowledge domain.
>
> An item-specific, but somewhat heavy-handed approach would be to prevent
> merging of any two items where at least one has P1889, no matter what it
> specifically points to. At least, give a warning that an item is
> "merge-protected", and require an additional override for the merge.
>
> If that is acceptable, it would be easy for me to filter all items with
> P1889, from the merge game at least.
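A rough sketch of the pre-merge check Magnus describes, using the wbgetclaims
API (the helper functions and the example pair are illustrative, not the
actual game code):

import requests

API = "https://www.wikidata.org/w/api.php"

def has_different_from(qid):
    """True if the item carries any P1889 ("different from") statement."""
    resp = requests.get(
        API,
        params={"action": "wbgetclaims", "entity": qid,
                "property": "P1889", "format": "json"},
        timeout=30,
    )
    return bool(resp.json().get("claims", {}).get("P1889"))

def safe_to_offer_merge(qid_a, qid_b):
    # Heavy-handed rule from above: skip the pair if either item is
    # "merge-protected" by carrying P1889 at all.
    return not (has_different_from(qid_a) or has_different_from(qid_b))

print(safe_to_offer_merge("Q7187", "Q8054"))  # illustrative pair only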
>
> On Wed, Oct 28, 2015 at 8:50 PM Peter F. Patel-Schneider <
> pfpschnei...@gmail.com> wrote:
>
>> On 10/28/2015 12:08 PM, Tom Morris wrote:
>> [...]
>> > Going back to Ben's original problem, one tool that Freebase used to
>> help
>> > manage the problem of incompatible type merges was a set of curated
>> sets of
>> > incompatible types [5] which was used by the merge tools to warn users
>> that
>> > the merge they were proposing probably wasn't a good idea.  People could
>> > ignore the warning in the Freebase implementation, but Wikidata could
>> make it
>> > a hard restriction or just a warning.
>> >
>> > Tom
>>
>> I think that this idea is a good one.  The incompatibility information
>> could
>> be added to classes in the form of "this class is disjoint from that other
>> class".  Tools would then be able to look for this information and produce
>> warnings or even have stronger reactions to proposed merging.
>>
>> I'm not sure that using P1889 "different from" is going to be adequate.
>> What
>> links would be needed?  Just between a gene and its protein?  That
>> wouldn't
>> catch merging a gene and a related protein.  Between all genes and all
>> proteins?  It seems to me that this is better handled at the class level.
>>
>> peter
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Use-notes in item descriptions

2015-11-05 Thread Benjamin Good
A section on the talk page associated with the item in question would
seem to solve this (definitely real) problem - assuming that a would-be
editor was aware of the talk page.
Alternatively, you could propose a generic property with a text field that
could be added to items on an as-needed basis without any change to the
current software.  Again though, the challenge would be getting the
information in front of the user/editor at the right point in time.


On Thu, Nov 5, 2015 at 2:16 AM, Jane Darnell  wrote:

> Yes I have noticed this need for use notes, but it is specific to
> properties, isn't it? I see it in things such as choosing what to put in
> the "genre" property of an artwork. It would be nice to have some sort of
> pop-up that you can fill with more than what you put in. For example I get
> easily confused when I address the relative (as in kinship) properties;
> "father of the subject" is clear, but what about cousin/nephew etc.? You
> need more explanation room than can be stuffed in the label field to fit in
> the drop down. I have thought about this, but don't see any easy solution
> besides what you have done.
>
> On Thu, Nov 5, 2015 at 10:51 AM, James Heald  wrote:
>
>> I have been wondering about the practice of putting use-notes in item
>> descriptions.
>>
>> For example, on Q6581097 (male)
>>   https://www.wikidata.org/wiki/Q6581097
>> the (English) description reads:
>>   "human who is male (use with Property:P21 sex or gender). For
>> groups of males use with subclass of (P279)."
>>
>> I have added some myself recently, working on items in the administrative
>> structure of the UK -- for example on Q23112 (Cambridgeshire)
>>https://www.wikidata.org/wiki/Q23112
>> I have changed the description to now read
>>"ceremonial county of England (use Q21272276 for administrative
>> non-metropolitan county)"
>>
>> These "use-notes" are similar to the disambiguating hat-notes often found
>> at the top of articles on en-wiki and others; and just as those hat-notes
>> can be useful on wikis, so such use-notes can be very useful on Wikidata,
>> for example in the context of a search, or a drop-down menu.
>>
>> But...
>>
>> Given that the label field is also there to be presentable to end-users
>> in contexts outside Wikidata, (eg to augment searches on main wikis, or to
>> feed into the semantic web, to end up being used in who-knows-what
>> different ways), yet away from Wikidata a string like "Q21272276" will
>> typically have no meaning. Indeed there may not even be any distinct thing
>> corresponding to it.  (Q21272276 has no separate en-wiki article, for
>> example).
>>
>> So I'm wondering whether these rather Wikidata-specific use notes do
>> really belong in the general description field ?
>>
>> Is there a case for moving them to a new separate use-note field created
>> for them?
>>
>> The software could be adjusted to include such a field in search results
>> and drop-downs and the item summary, but they would be a separate
>> data-entry field on the item page, and a separate triple for the SPARQL
>> service, leaving the description field clean of Wikidata-specific meaning,
>> better for third-party and downstream applications.
>>
>> Am I right to feel that the present situation of just chucking everything
>> into the description field doesn't seem quite right, and we ought to take a
>> step forward from it?
>>
>>   -- James.
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Benjamin Good
For what it's worth, I tend to agree with Peter here.  It makes sense to me
to add constraints akin to 'disjoint with' at the class level.  The
problem I see is that we don't exactly have classes here as the term is
used elsewhere.  I guess in wikidata, a 'class' is any entity that happens
to be used in a subclassOf claim?

Another way forward could be to do this using properties rather than
classes.  I think this could allow us to use the constraint-checking
infrastructure that is already in place?  You could add a constraint on a
property that it is 'incompatible with' another property.  In the
protein/gene case we could pragmatically use Property:P351 (Entrez Gene
ID), incompatible with Property:P352 (UniProt protein ID).  More semantically,
we could use 'encoded by' incompatible-with 'encodes' or 'genomic start'
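A minimal sketch of what that property-level check could look like today, run
against the public query service and assuming the P351/P352 pair above as the
flagged combination:

import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Items carrying both an Entrez Gene ID (P351) and a UniProt ID (P352) would
# trip the proposed "incompatible with" constraint and are likely the result
# of a gene/protein merge.
QUERY = """
SELECT ?item WHERE {
  ?item wdt:P351 ?entrez ;
        wdt:P352 ?uniprot .
}
"""

resp = requests.get(ENDPOINT, params={"query": QUERY, "format": "json"}, timeout=60)
for binding in resp.json()["results"]["bindings"]:
    print(binding["item"]["value"])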

On Wed, Oct 28, 2015 at 5:08 PM, Peter F. Patel-Schneider <
pfpschnei...@gmail.com> wrote:

> I think that using P1889 in this way is abusing its meaning.
>
> Q16657504 P1889 Q6525093 doesn't mean that Q16657504 should not be merged
> with
> some other human item in Wikidata.
>
>
> peter
>
>
> On 10/28/2015 03:41 PM, Magnus Manske wrote:
> > I fear my games may contribute to both problems (merging two items, and
> adding
> > a sitelink to the wrong item). Both are facilitated by identical
> > names/aliases, and sometimes it's hard to tell that a pair is meant to be
> > different, especially if you don't know about the intricate structures
> of the
> > respective knowledge domain.
> >
> > An item-specific, but somewhat heavy-handed approach would be to prevent
> > merging of any two items where at least one has P1889, no matter what it
> > specifically points to. At least, give a warning that an item is
> > "merge-protected", and require an additional override for the merge.
> >
> > If that is acceptable, it would be easy for me to filter all items with
> P1889,
> > from the merge game at least.
> >
> > On Wed, Oct 28, 2015 at 8:50 PM Peter F. Patel-Schneider
> > <pfpschnei...@gmail.com> wrote:
> >
> > On 10/28/2015 12:08 PM, Tom Morris wrote:
> > [...]
> > > Going back to Ben's original problem, one tool that Freebase used
> to help
> > > manage the problem of incompatible type merges was a set of
> curated sets of
> > > incompatible types [5] which was used by the merge tools to warn
> users that
> > > the merge they were proposing probably wasn't a good idea.  People
> could
> > > ignore the warning in the Freebase implementation, but Wikidata
> could
> > make it
> > > a hard restriction or just a warning.
> > >
> > > Tom
> >
> > I think that this idea is a good one.  The incompatibility
> information  could
> > be added to classes in the form of "this class is disjoint from that
> other
> > class".  Tools would then be able to look for this information and
> produce
> > warnings or even have stronger reactions to proposed merging.
> >
> > I'm not sure that using P1889 "different from" is going to be
> adequate.  What
> > links would be needed?  Just between a gene and its protein?  That
> wouldn't
> > catch merging a gene and a related protein.  Between all genes and
> all
> > proteins?  It seems to me that this is better handled at the class
> level.
> >
> > peter
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org 
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
> >
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Benjamin Good
>> it a hard restriction or just a warning.
>>
>> Tom
>>
>> [1]
>> https://en.wikipedia.org/w/index.php?title=Reelin&diff=56108806&oldid=56101233
>> [2] http://www.freebase.com/biology/protein/entrez_gene_id
>> [3]
>> https://www.wikidata.org/w/index.php?title=Q414043&type=revision&diff=262778265&oldid=262243280
>> [4]
>> https://en.wikipedia.org/w/index.php?title=Reelin&dir=prev&action=history
>> [5] http://www.freebase.com/dataworld/incompatible_types?instances=
>>
>>
>> On Wed, Oct 28, 2015 at 1:07 PM, Benjamin Good 
>> wrote:
>>
>>> The Gene Wiki team is experiencing a problem that may suggest some areas
>>> for improvement in the general wikidata experience.
>>>
>>> When our project was getting started, we had some fairly long public
>>> debates about how we should structure the data we wanted to load [1].
>>> These resulted in a data model that, we think, remains pretty much true to
>>> the semantics of the data, at the cost of distributing information about
>>> closely related things (genes, proteins, orthologs) across multiple,
>>> interlinked items.  Now, as long as these semantic links between the
>>> different item classes are maintained, this is working out great.  However,
>>> we are consistently seeing people merging items that our model needs to be
>>> distinct.  Most commonly, we see people merging items about genes with
>>> items about the protein product of the gene (e.g. [2]).  This happens
>>> nearly every day - especially on items related to the more popular
>>> Wikipedia articles. (More examples [3])
>>>
>>> Merges like this, as well as other semantics-breaking edits, make it
>>> very challenging to build downstream apps (like the wikipedia infobox) that
>>> depend on having certain structures in place.  My question to the list is
>>> how to best protect the semantic models that span multiple entity types in
>>> wikidata?  Related to this, is there an opportunity for some consistent way
>>> of explaining these structures to the community when they exist?
>>>
>>> I guess the immediate solutions are to (1) write another bot that
>>> watches for model-breaking edits and reverts them and (2) to create an
>>> article on wikidata somewhere that succinctly explains the model and links
>>> back to the discussions that went into its creation.
>>>
>>> It seems that anyone that works beyond a single entity type is going to
>>> face the same kind of problems, so I'm posting this here in hopes that
>>> generalizable patterns (and perhaps even supporting code) can be realized
>>> by this community.
>>>
>>> [1]
>>> https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
>>> [2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
>>> [3]
>>> https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt
>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Benjamin Good
Yup, that is characteristic of our problem.  The last ProteinBoxBot edit
was about the protein item.
This query also works (finds things that are both subclass of gene and
subclass of protein)

PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT * WHERE {
  ?gene wdt:P279 wd:Q7187 .
  ?gene wdt:P279 wd:Q8054 .
}

On Wed, Oct 28, 2015 at 11:04 AM, Finn Årup Nielsen  wrote:

>
> The below SPARQL counts 14.
>
> Among them are https://www.wikidata.org/wiki/Q238509 which is "5-HT1A
> receptor human gene" in English and "5-HT₁A-Rezeptor Protein" in German.
> The last editor is ProteinBoxBot. It is coded by itself. That item has a
> split personality, so it seems that we need to do some cleaning.
>
>
> PREFIX wd: <http://www.wikidata.org/entity/>
> PREFIX wdt: <http://www.wikidata.org/prop/direct/>
> PREFIX wikibase: <http://wikiba.se/ontology#>
> PREFIX p: <http://www.wikidata.org/prop/>
> PREFIX v: <http://www.wikidata.org/prop/statement/>
> PREFIX q: <http://www.wikidata.org/prop/qualifier/>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
>
> SELECT ?item WHERE {
>   ?item wdt:P352 ?uniprot ;
> wdt:P353 ?genesymbol .
>   }
>
>
> I now see that Teugnhausen has also merged
> https://www.wikidata.org/wiki/Special:Contributions/Teugnhausen
>
> /Finn
>
>
> On 10/28/2015 06:07 PM, Benjamin Good wrote:
>
>> The Gene Wiki team is experiencing a problem that may suggest some areas
>> for improvement in the general wikidata experience.
>>
>> When our project was getting started, we had some fairly long public
>> debates about how we should structure the data we wanted to load [1].
>> These resulted in a data model that, we think, remains pretty much true
>> to the semantics of the data, at the cost of distributing information
>> about closely related things (genes, proteins, orthologs) across
>> multiple, interlinked items.  Now, as long as these semantic links
>> between the different item classes are maintained, this is working out
>> great.  However, we are consistently seeing people merging items that
>> our model needs to be distinct.  Most commonly, we see people merging
>> items about genes with items about the protein product of the gene (e.g.
>> [2]]).  This happens nearly every day - especially on items related to
>> the more popular Wikipedia articles. (More examples [3])
>>
>> Merges like this, as well as other semantics-breaking edits, make it
>> very challenging to build downstream apps (like the wikipedia infobox)
>> that depend on having certain structures in place.  My question to the
>> list is how to best protect the semantic models that span multiple
>> entity types in wikidata?  Related to this, is there an opportunity for
>> some consistent way of explaining these structures to the community when
>> they exist?
>>
>> I guess the immediate solutions are to (1) write another bot that
>> watches for model-breaking edits and reverts them and (2) to create an
>> article on wikidata somewhere that succinctly explains the model and
>> links back to the discussions that went into its creation.
>>
>> It seems that anyone that works beyond a single entity type is going to
>> face the same kind of problems, so I'm posting this here in hopes that
>> generalizable patterns (and perhaps even supporting code) can be
>> realized by this community.
>>
>> [1]
>>
>> https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
>> [2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
>> [3]
>>
>> https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> --
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Data model explanation and protection

2015-10-28 Thread Benjamin Good
Yes, I think maintaining a multi-class data model within
wikidata is a general problem.  You could imagine similar scenarios in any
domain.

Our particular gene/protein merge problem is specific to our work.  It is
not just one user (Fullerene), though; this has been happening for a while
and many have participated.  See e.g. the post here:
https://www.wikidata.org/wiki/User_talk:Andrawaag#ProteinBoxBot_Mistake.3F
and here:
https://www.wikidata.org/wiki/User_talk:DGtal#Merging_items

On Wed, Oct 28, 2015 at 10:47 AM, Finn Årup Nielsen  wrote:

> Do you think it is a general problem? The few merges that I checked were
> all done by Fullerene, and s/he has now responded after Andrawaag made a
> note on the talk page https://www.wikidata.org/wiki/User_talk:Fullerene
>
>
> /Finn
>
>
> On 10/28/2015 06:07 PM, Benjamin Good wrote:
>
>> The Gene Wiki team is experiencing a problem that may suggest some areas
>> for improvement in the general wikidata experience.
>>
>> When our project was getting started, we had some fairly long public
>> debates about how we should structure the data we wanted to load [1].
>> These resulted in a data model that, we think, remains pretty much true
>> to the semantics of the data, at the cost of distributing information
>> about closely related things (genes, proteins, orthologs) across
>> multiple, interlinked items.  Now, as long as these semantic links
>> between the different item classes are maintained, this is working out
>> great.  However, we are consistently seeing people merging items that
>> our model needs to be distinct.  Most commonly, we see people merging
>> items about genes with items about the protein product of the gene (e.g.
>> [2]]).  This happens nearly every day - especially on items related to
>> the more popular Wikipedia articles. (More examples [3])
>>
>> Merges like this, as well as other semantics-breaking edits, make it
>> very challenging to build downstream apps (like the wikipedia infobox)
>> that depend on having certain structures in place.  My question to the
>> list is how to best protect the semantic models that span multiple
>> entity types in wikidata?  Related to this, is there an opportunity for
>> some consistent way of explaining these structures to the community when
>> they exist?
>>
>> I guess the immediate solutions are to (1) write another bot that
>> watches for model-breaking edits and reverts them and (2) to create an
>> article on wikidata somewhere that succinctly explains the model and
>> links back to the discussions that went into its creation.
>>
>> It seems that anyone that works beyond a single entity type is going to
>> face the same kind of problems, so I'm posting this here in hopes that
>> generalizable patterns (and perhaps even supporting code) can be
>> realized by this community.
>>
>> [1]
>>
>> https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
>> [2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
>> [3]
>>
>> https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt
>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
> --
> Finn Årup Nielsen
> http://people.compute.dtu.dk/faan/
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Data model explanation and protection

2015-10-28 Thread Benjamin Good
The Gene Wiki team is experiencing a problem that may suggest some areas
for improvement in the general wikidata experience.

When our project was getting started, we had some fairly long public
debates about how we should structure the data we wanted to load [1].
These resulted in a data model that, we think, remains pretty much true to
the semantics of the data, at the cost of distributing information about
closely related things (genes, proteins, orthologs) across multiple,
interlinked items.  Now, as long as these semantic links between the
different item classes are maintained, this is working out great.  However,
we are consistently seeing people merging items that our model needs to be
distinct.  Most commonly, we see people merging items about genes with
items about the protein product of the gene (e.g. [2]).  This happens
nearly every day - especially on items related to the more popular
Wikipedia articles. (More examples [3])

Merges like this, as well as other semantics-breaking edits, make it very
challenging to build downstream apps (like the Wikipedia infobox) that
depend on having certain structures in place.  My question to the list is:
how do we best protect the semantic models that span multiple entity types in
wikidata?  Related to this, is there an opportunity for some consistent way
of explaining these structures to the community when they exist?

I guess the immediate solutions are to (1) write another bot that watches
for model-breaking edits and reverts them and (2) to create an article on
wikidata somewhere that succinctly explains the model and links back to the
discussions that went into its creation.
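
Very roughly, option (1) could start from something like the sketch below
(assuming the standard MediaWiki recentchanges API and the Python requests
library; the is_model_breaking check is just a hypothetical placeholder for
whatever test actually encodes the model):

import requests

API = "https://www.wikidata.org/w/api.php"

def recent_changes(limit=50):
    # Pull the latest edits from the standard MediaWiki recentchanges API.
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|ids|comment",
        "rclimit": limit,
        "format": "json",
    }
    return requests.get(API, params=params).json()["query"]["recentchanges"]

def is_model_breaking(change):
    # Hypothetical check: e.g. flag merge-like edit comments, or re-run a
    # model-checking query for the touched item.
    return "merge" in change.get("comment", "").lower()

for change in recent_changes():
    if is_model_breaking(change):
        print("needs review:", change["title"], "-", change.get("comment", ""))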

It seems that anyone that works beyond a single entity type is going to
face the same kind of problems, so I'm posting this here in hopes that
generalizable patterns (and perhaps even supporting code) can be realized
by this community.

[1]
https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Molecular_biology#Distinguishing_between_genes_and_proteins
[2] https://www.wikidata.org/w/index.php?title=Q417782&oldid=262745370
[3]
https://s3.amazonaws.com/uploads.hipchat.com/25885/699742/rTrv5VgLm5yQg6z/mergelist.txt
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Announcing Wikidata Taxonomy Browser (beta)

2015-10-22 Thread Benjamin Good
I am having the same kinds of 500 problems.  Bitbucket is generally
suffering today:  http://status.bitbucket.org

On Thu, Oct 22, 2015 at 12:27 PM, Markus Kroetzsch <
markus.kroetz...@tu-dresden.de> wrote:

> On 22.10.2015 19:29, Dario Taraborelli wrote:
>
>> I’m constantly getting 500 errors.
>>
>>
> I also observed short outages in the past, and I sometimes had to run a
> request twice to get an answer. It seems that the hosting on bitbucket is
> not very reliable. At the moment, this is still a first preview of the tool
> without everything set up as it should be. The tool should certainly move
> to Wikimedia labs in the future.
>
> Markus
>
>
>
> --
> Markus Kroetzsch
> Faculty of Computer Science
> Technische Universität Dresden
> +49 351 463 38486
> http://korrekt.org/
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] blog post: Poof it works - using Wikidata to build Wikipedia articles about genes

2015-10-22 Thread Benjamin Good
Hi everyone, thanks for the encouragement!

The next technical step is actually a reworking of the template/Lua code
for producing that infobox.  The current one works for the few genes I've
manually tested it on, but because of the way it uses arbitrary access (via
nested structures in the template code), it fails rather dramatically when
data that it is expecting is missing.  It's also really inefficient.  See (and
contribute to) the thread about this on the Module:Wikidata talk page [1].

The step after that one could end up being much harder.  Converting all of
the current infoboxes to use the 'one template to rule them all' is not
technically very hard, but not everyone on EN Wikipedia is necessarily
excited about this transition.  See e.g. the post complaining about our
experiments by one very experienced Wikipedian [2].  The real sticking
point is the challenge of making it easy for would-be editors of the data
in that infobox to edit it.  Especially given the complexity of our data
model here, it's not exactly easy for a new person to find the right
wikidata items to edit.  I'm hopeful that ongoing work will ease that
process!

Happy pre-birthday Wikidata!
-Ben

[1]
https://en.wikipedia.org/wiki/Module_talk:Wikidata#Efficient_pattern_for_gathering_many_attributes_from_wikidata.3F
[2]
https://en.wikipedia.org/wiki/User_talk:JohnBlackburne#Gene_Articles_ARF6_and_RREB1



On Thu, Oct 22, 2015 at 2:21 AM, Federico Leva (Nemo) 
wrote:

> Lydia Pintscher, 22/10/2015 00:00:
>
>>
>> http://i9606.blogspot.de/2015/10/poof-it-works-using-wikidata-to-build.html
>> Yes! Yes! Yes! ;-)
>> GeneWiki people: <3
>>
>
> So the next step is mass replacements like
> https://en.wikipedia.org/w/index.php?title=ARF6&diff=prev&oldid=686833606
> and then stopping the updates to the sub-templates?
>
> Nemo
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Importing Freebase (Was: next Wikidata office hour)

2015-09-28 Thread Benjamin Good
If we want more domain-specific wikidata curators we absolutely have to
improve the flow of:
(1) viewing an article on Wikipedia
(2) discovering the associated item on wikidata
(3) making useful contributions to the item and the items surrounding it in
the graph

That little link on the side of every article in Wikipedia is literally
invaluable... and is the main thing that distinguishes wikidata from
freebase (IMHO).  The (large) technical differences pale in comparison.  I
know that people are already working on that flow, but I think it's worth
emphasizing here as we consider the requirements for scaling up the community
as we scale up data.

2 cents..
-Ben


On Mon, Sep 28, 2015 at 4:12 PM, John Erling Blad  wrote:

> I would like to add old URLs that seem to be a source but do not
> reference anything in the claim. For example, in an item about a person, the
> name or the birth date of the person does not appear on the page, yet the
> page is used as a source for the person's birth date.
>
>
> On Mon, Sep 28, 2015 at 11:44 PM, Stas Malyshev 
> wrote:
>
>> Hi!
>>
>> > I see that 19.6k statements have been approved through the tool, and
>> > 5.1k statements have been rejected - which means that about 1 in 5
>> > statements is deemed unsuitable by the users of primary sources.
>>
>> From my (limited) experience with Primary Sources, there are several
>> kinds of things there that I had rejected:
>>
>> - Unsourced statements that contradict what is written in Wikidata
>> - Duplicate claims already existing in Wikidata
>> - Duplicate claims with worse data (i.e. less accurate location, less
>> specific categorization, etc) or unnecessary qualifiers (such as adding
>> information which is already contained in the item to item's qualifiers
>> - e.g. zip code for a building)
>> - Source references that do not exist (404, etc.)
>> - Source references that do exist but either duplicate existing one (a
>> number of sources just refer to different URL of the same data) or do
>> not contain the information they should (e.g. link to newspaper's
>> homepage instead of specific article)
>> - Claims that are almost obviously invalid (e.g. "United Kingdom" as a
>> genre of a play)
>>
>> I think at least some of these - esp. references that do not exist and
>> duplicates with no refs - could be removed automatically, thus raising
>> the relative quality of the remaining items.
>>
>> OTOH, some of the entries can be made self-evident - i.e. if we talk
>> about movie and Freebase has IMDB ID or Netflix ID, it may be quite easy
>> to check if that ID is valid and refers to a movie by the same name,
>> which should be enough to merge it.
>>
>> Not sure if those one-off things worth bothering with, just putting it
>> out there to consider.
>>
>> --
>> Stas Malyshev
>> smalys...@wikimedia.org
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Announcing the release of the Wikidata Query Service

2015-09-08 Thread Benjamin Good
Chiming in with my excitement that the SPARQL service is 'officially' up!

As suggested somewhere far above, it would be great for the community to
catalogue the queries that are most important for their use cases but that do
not do well on the SPARQL endpoint.  It's likely that the list isn't going
to be super-long (in terms of query structure), so it might make sense
to establish dedicated, optimized web services (apart from the
endpoint) to call upon when those kinds of queries need to be executed.
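
One example of the kind of query I mean is the class-plus-subclasses pattern
that James raises below; a minimal sketch of that shape (assuming the public
endpoint and the Python requests library, with gene, Q7187, standing in as the
class) would be:

import requests

# Count items that are an instance (P31) of gene or of any of its
# subclasses (P279*), using a SPARQL property path.
PATH_QUERY = """
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
SELECT (COUNT(?item) AS ?count) WHERE {
  ?item wdt:P31/wdt:P279* wd:Q7187 .
}
"""

result = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": PATH_QUERY, "format": "json"},
).json()
print(result["results"]["bindings"][0]["count"]["value"])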

-Ben


On Tue, Sep 8, 2015 at 8:58 AM, James Heald  wrote:

> On 08/09/2015 16:15, Markus Krötzsch wrote:
>
>>
>> Yes, path queries (called TREE queries in WDQ) are usually faster in
>> WDQ. I think that WDQ is better optimised for this type of queries. This
>> is also what I had in mind with what I wrote: if you narrow down your
>> query language to specific use cases and (possibly) a subset of the
>> data, then you may be able to achieve a better performance in return.
>> There is always a trade-off there. SPARQL is rather complex (if you look
>> at the query examples page, you get an idea of the possibilities), but
>> there is a price to pay for this. I still hope that path queries in
>> particular can be made faster in the future (it still is a rather recent
>> SPARQL feature and I am sure BlazeGraph are continuously working on
>> improving their code).
>>
>> Markus
>>
>
> Thanks, Markus.
>
> Path queries are pretty important for Wikidata, though, because of the way
> Wikidata is constructed: in practice you almost never want to query for an
> instance (P31) of a class -- you will almost always want to include its
> subclasses too.
>
> Another query I tried that gave trouble was when somebody asked how to find
> (or even count) all statements referenced to
>http://www.lefigaro.fr/...
> -- see
> https://www.wikidata.org/wiki/Wikidata:Project_chat#WQS:_Searching_for_items_with_reference_to_.22Le_Figaro.22
>
> It may be that there's a better solution than my newbie attempt at
>http://tinyurl.com/pxlrkd7
> -- but on the face of it, it looks as if WQS is timing out trying to make
> a list of all URLs that are the target of a P854 in a reference, and
> falling over trying.
>
> But perhaps people with more SPARQL experience than my approximately 24
> hours may be able to suggest a better way round this?
>
> All best,
>
>   James.
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] WikiConference USA

2015-08-28 Thread Benjamin Good
Some portion of the Gene Wiki team is going to WikiConference USA in
Washington DC this October. http://wikiconferenceusa.org/wiki/2015/Main_Page

Is anyone from wikidata going?

We are trying to decide what we should present.

I see that EMW has submitted a workshop proposal about wikidata
http://wikiconferenceusa.org/wiki/Submissions:2015/An_ambitious_Wikidata_tutorial

and that there is a health-information workshop in the works as well
http://wikiconferenceusa.org/wiki/Submissions:2015/Wikipedia_for_Health_Research_and_Data

None of us has ever been to one of these.  Thoughts on how we can make a
meaningful contribution?

thanks
-Ben
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] wikidata workflow in Wikipedia

2015-07-22 Thread Benjamin Good
I've noticed that a number of EN Wikipedia articles are starting to use
wikidata, e.g. all of these templates:
https://en.wikipedia.org/wiki/Category:Templates_using_data_from_Wikidata

This is great, and I have already started trying to encourage more of this
in the areas we are working on (genes, drugs, diseases).  But concerns
raised more than 2 years ago now still seem very valid:
https://en.wikipedia.org/wiki/Wikipedia:Wikidata/Workflow

I'm particularly concerned about the edit button.  When a Wikipedia user
hits edit on an article built using wikidata content, how are they supposed
to edit the wikidata content?  Obviously the well-informed can sort out
which wikidata item is relevant and which properties are being used and go
edit at wikidata, but this can't be the solution.

What is the status of this issue?

thanks very much
-Ben
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Freebase is dead, long live :BaseKB

2015-07-17 Thread Benjamin Good
They wrote a really insightful paper about how their processes for
large-scale data curation worked.  Among many other things, they
investigated Mechanical Turk 'micro tasks' versus hourly workers and
generally found the latter to be more cost-effective.

"The Anatomy of a Large-Scale Human Computation Engine"
http://wiki.freebase.com/images/e/e0/Hcomp10-anatomy.pdf

(I have the PDF in case you want it after that link expires..)

-Ben

p.s. As a side note, I tend to agree with the camps on this list that think it
would be an enormous waste if the work that went into the content in
freebase was not leveraged effectively for wikidata.  It's not easy to raise
millions of dollars for data curation.



On Fri, Jul 17, 2015 at 7:56 AM, Paul Houle  wrote:

> I know Freebase used oDesk.  Note the number in question is 3000
> judgements per person per day.  I've run tasks on Mechanical Turk and I also
> make my own judgement sets for various things, and I'd agree with that rate;
> that comes to 9.6 seconds per judgement (assuming an eight-hour day), which
> I can believe.  If you are that fast you can make a living of it and never
> have to get out of your pyjamas, but as a manager you have to do something
> about people who do huge amounts of fast but barely acceptable work.
>
>   Note that they had $57M of funding
>
> https://www.crunchbase.com/organization/metawebtechnologies
>
> and if the fully loaded cost of those FTE equivalents was $50,000 via
> oDesk,  it would cost $5 M to get 100 million facts processed.  So
> practically they could have got a lot done.  Metaweb and oDesk had
> interlocking directorates
>
>
> https://www.crunchbase.com/organization/metawebtechnologies/insights/current-people/odesk
>
> so they probably had a great relationship with oDesk,  which would have
> helped.
>
> Dealing with "turks" I would estimate that I'd ask each question somewhere
> between 2 and 3 times on the average to catch most of the errors and
> ambiguous cases and also get an estimate of how many I didn't catch.
>
>
>
> On Thu, Jul 16, 2015 at 8:23 PM, Eric Sun  wrote:
>
>> >> Note that Freebase did a lot of human curation and we know they could
>> get
>> about 3000 verifications  of facts by "non-experts" a day who were paid
>> for
>> their efforts.  That scales out to almost a million facts per FTE per
>> year.
>>
>>
>> Where can I find out more about how they were able to do such
>> high-volume human curation?  3000/day is a huge number.
>>
>>
>>
>> On Thu, Jul 16, 2015 at 5:01 AM, 
>> wrote:
>>
>>> Date: Wed, 15 Jul 2015 15:25:27 -0400
>>> From: Paul Houle 
>>> To: "Discussion list for the Wikidata project."
>>> 
>>> Subject: [Wikidata] Freebase is dead, long live :BaseKB
>>> Message-ID:
>>> <
>>> cae__kdqt55e7k7xhmeubcu9qrwrkomu_60nduygcthnkc7d...@mail.gmail.com>
>>> Content-Type: text/plain; charset="utf-8"
>>>
>>> For those who are interested in the project of getting something out of
>>> Freebase for use in Wikidata or somewhere else,  I'd like to point out
>>>
>>> http://basekb.com/gold/
>>>
>>> this is a completely workable solution for running queries out of Freebase
>>> after the MQL API goes dark.
>>>
>>> I have been watching the discussion about the trouble moving Freebase
>>> data
>>> to Wikidata and let me share some thoughts.
>>>
>>> First, quality is in the eye of the beholder, and if somebody defines
>>> quality as a matter of citing your sources, then that is their definition
>>> of 'quality' and they can attain it.  You might have some other
>>> definition
>>> of quality and be appalled that Wikidata has so little to say about a
>>> topic
>>> that has caused much controversy and suffering:
>>>
>>> https://www.wikidata.org/wiki/Q284451
>>>
>>> there are ways to attain that too.
>>>
>>> Part of the answer is that different products are going to be used in
>>> different places.  For instance,  one person might need 100% coverage of
>>> books he wants to talk about,  another one might want a really great
>>> database of ski areas,  etc.
>>>
>>> Note that Freebase did a lot of human curation and we know they could get
>>> about 3000 verifications  of facts by "non-experts" a day who were paid
>>> for
>>> their efforts.  That scales out to almost a million facts per FTE per
>>> year.
>>>
>>>
>>>
>>> --
>>> Paul Houle
>>>
>>> *Applying Schemas for Natural Language Processing, Distributed Systems,
>>> Classification and Text Mining and Data Lakes*
>>>
>>> (607) 539 6254paul.houle on Skype   ontolo...@gmail.com
>>> https://legalentityidentifier.info/lei/lookup/
>>> 
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
>
> --
> Paul Houle
>

Re: [Wikidata] Goal: Establish a framework to engage with data engineers and open data organizations

2015-07-01 Thread Benjamin Good
Quim,

I'm not familiar with GLAM or what you are really asking for here.  Could
you elaborate a little?  Our group is actively engaged in writing bots for
populating wikidata with trusted biomedical information and for using that
information to drive applications such as Wikipedia.  Processes for making
this easier would be most welcome.  A lot of what we are doing and hoping
to do is described on this bot page:
https://www.wikidata.org/wiki/User:ProteinBoxBot

?
-Ben



On Wed, Jul 1, 2015 at 9:23 AM, David Cuenca Tudela 
wrote:

> Hello Quim,
>
> There was always the issue of where to publish datasets from partner
> organisations like a http://datahub.io/
>
> Is that being considered in this new iteration?
>
> Cheers,
> Micru
>
> On Wed, Jul 1, 2015 at 6:19 PM, Romaine Wiki 
> wrote:
>
>> Hello Quim,
>>
>> We have in Belgium (as Wikimedia Belgium) a partner organisation who is
>> together with us working with cultural institutions to get open datasets to
>> be used in Wikidata.
>>
>> So yes, we are interested.
>>
>> Greetings,
>> Romaine
>>
>> 2015-07-01 17:31 GMT+02:00 Quim Gil :
>>
>>> Hi, it's first of July and I would like to introduce you a quarterly
>>> goal that the Engineering Community team has committed to:
>>>
>>> Establish a framework to engage with data engineers and open data
>>> organizations
>>> https://phabricator.wikimedia.org/T101950
>>>
>>> We are missing a community framework allowing Wikidata content and tech
>>> contributors, data engineers, and open data organizations to collaborate
>>> effectively. Imagine GLAM applied to data.
>>>
>>> If all goes well, by the end of September we would like to have basic
>>> documentation and community processes for open data engineers and
>>> organizations willing to contribute to Wikidata, and ongoing projects
>>> with one open data org.
>>>
>>> If you are interested, get involved! We are looking for
>>>
>>> * Wikidata contributors with good institutional memory
>>> * people that has been in touch with organizations willing to contribute
>>> their open data
>>> * developers willing to help improving our software and programming
>>> missing pieces
>>> * also contributors familiar with the GLAM model(s), what works and what
>>> didn't work
>>>
>>> This goal has been created after some conversations with Lydia Pintscher
>>> (Wikidata team) and Sylvia Ventura (Strategic Partnerships). Both are on
>>> board, Lydia assuring that this work fits into what is technically
>>> effective, and Sylvia checking our work against real open data
>>> organizations willing to get involved.
>>>
>>> This email effectively starts the bootstrapping of this project. I will
>>> start creating subtasks under that goal based on your feedback and common
>>> sense.
>>>
>>> --
>>> Quim Gil
>>> Engineering Community Manager @ Wikimedia Foundation
>>> http://www.mediawiki.org/wiki/User:Qgil
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>> ___
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>
>>
>
>
> --
> Etiamsi omnes, ego non
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Benjamin Good
I think this is clearly an evolutionary process.  In the short term,
wikidata needs to support Wikipedia use cases as Andrew mentioned above
(thank you for the clarification).  In the long term, this function and all
other functions will (in my opinion) best be served by a transition into
more and more of an entity graph where claims are made about things in the
world rather than about constructs in a database.  Perhaps there is some
form of the WikiData game that could be generated to support this process
for lists.

The intervening period is going to be a challenge in terms of modeling and
in application-level hiding of weird ontological situations where objects
are being described like (item1: instanceOf, WikipediaList AND item1:
subclassOf moons of jupiter), but there is no way around it.  And it's 100%
worthwhile to do whatever it takes to keep things integrated with Wikipedia
and to further establish wikidata as indispensable there.

-Ben






On Mon, Jun 15, 2015 at 9:11 AM, Thad Guidry  wrote:

> In General,
>
> I think Wikidata needs to decide going forward if it will be a strict
> Entity Graph...or if it will be a Big Graph of all things Wikipedia.
> It's an important question...if it decides on the latter...then just give a
> way to filter out non-entities for the API and Search users.
>
>
> Thad
> +ThadGuidry <https://www.google.com/+ThadGuidry>
>
> On Mon, Jun 15, 2015 at 11:07 AM, Thad Guidry 
> wrote:
>
>> Benjamin has the right idea... and we did similar in Freebase in handling
>> that same way... sometimes it was a manual labor of love... most of the
>> time, we just deleted them and hoped that Wikipedia would make them real
>> topic entities later on for us to properly absorb.
>>
>> How Wikidata decided to handle, I don't care...if you keep them around,
>> then just give users a way to filter them out in your API's is all that I
>> ask. :)
>>
>>
>> Thad
>> +ThadGuidry <https://www.google.com/+ThadGuidry>
>>
>> On Mon, Jun 15, 2015 at 10:53 AM, Benjamin Good > > wrote:
>>
>>> This is an important question.  There are apparently 196,839 known list
>>> items based on a query for instanceOf Wikipedia list item
>>> (CLAIM[31:13406463])
>>>
>>> http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D
>>>
>>> I tend to agree with Thad that these kinds of items aren't really what
>>> we want filling in WikiData.  In fact replacing them with the ability to
>>> generate them automatically based on queries is a primary use case for
>>> wikidata.  But just deleting them doesn't entirely make sense either
>>> because they are key signposts into things that ought to be brought into
>>> wikidata properly.  The items in these lists clearly matter..
>>>
>>> Ideally we could generate a bot that would examine each of these lists
>>> and identify the unifying properties that should be added to the items
>>> within the list that would enable the list to be reproduced by a query.
>>>
>>> I disagree that this reasoning suggests deleting items about categories
>>> and disambiguation pages. - both of these clearly have functions in
>>> wikidata.  I'm not sure what the function of a list entity is.
>>>
>>>
>>> On Mon, Jun 15, 2015 at 8:47 AM, Federico Leva (Nemo) <
>>> nemow...@gmail.com> wrote:
>>>
>>>> By this reasoning we should also delete items about categories or
>>>> disambiguation pages.
>>>>
>>>> Thad Guidry, 15/06/2015 17:21:
>>>>
>>>>> Ex. List of tallest buildings in Wuhan -
>>>>> https://www.wikidata.org/wiki/Q6642364
>>>>>
>>>>
>>>> What's the issue here? The item doesn't actually contain any list,
>>>> there is no duplication or information "clumped together".
>>>>
>>>> Nemo
>>>>
>>>>
>>>> ___
>>>> Wikidata mailing list
>>>> Wikidata@lists.wikimedia.org
>>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>>
>>>
>>>
>>> ___
>>> Wikidata mailing list
>>> Wikidata@lists.wikimedia.org
>>> https://lists.wikimedia.org/mailman/listinfo/wikidata
>>>
>>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Lists of things as entities in Wikidata

2015-06-15 Thread Benjamin Good
This is an important question.  There are apparently 196,839 known list
items based on a query for instanceOf Wikipedia list item
(CLAIM[31:13406463])
http://tools.wmflabs.org/autolist/autolist1.html?q=CLAIM%5B31%3A13406463%5D

I tend to agree with Thad that these kinds of items aren't really what we
want filling in WikiData.  In fact, replacing them with the ability to
generate them automatically based on queries is a primary use case for
wikidata.  But just deleting them doesn't entirely make sense either
because they are key signposts into things that ought to be brought into
wikidata properly.  The items in these lists clearly matter.

Ideally we could generate a bot that would examine each of these lists and
identify the unifying properties that should be added to the items within
the list that would enable the list to be reproduced by a query.
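
A rough sketch of what such a bot could do (assuming the standard Wikipedia
and Wikidata APIs and the Python requests library; the list title is just the
example from this thread, and only the first 50 members are checked): pull the
articles linked from the list page, look up their Wikidata statements, and
count which property/value pairs the members share.

import requests
from collections import Counter

WP_API = "https://en.wikipedia.org/w/api.php"
WD_API = "https://www.wikidata.org/w/api.php"

def list_members(list_title):
    # Article-namespace links from the Wikipedia list page.
    data = requests.get(WP_API, params={
        "action": "query", "titles": list_title, "prop": "links",
        "plnamespace": 0, "pllimit": "max", "format": "json",
    }).json()
    page = next(iter(data["query"]["pages"].values()))
    return [link["title"] for link in page.get("links", [])]

def claims_for(titles):
    # Wikidata statements for up to 50 enwiki titles at a time.
    data = requests.get(WD_API, params={
        "action": "wbgetentities", "sites": "enwiki",
        "titles": "|".join(titles[:50]), "props": "claims", "format": "json",
    }).json()
    return [entity.get("claims", {}) for entity in data.get("entities", {}).values()]

shared = Counter()
for claims in claims_for(list_members("List of tallest buildings in Wuhan")):
    for prop, statements in claims.items():
        for statement in statements:
            value = statement.get("mainsnak", {}).get("datavalue", {}).get("value")
            if isinstance(value, dict) and "id" in value:  # item-valued claims only
                shared[(prop, value["id"])] += 1

for (prop, target), count in shared.most_common(10):
    print(prop, target, count)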

I disagree that this reasoning suggests deleting items about categories and
disambiguation pages - both of these clearly have functions in wikidata.
I'm not sure what the function of a list entity is.


On Mon, Jun 15, 2015 at 8:47 AM, Federico Leva (Nemo) 
wrote:

> By this reasoning we should also delete items about categories or
> disambiguation pages.
>
> Thad Guidry, 15/06/2015 17:21:
>
>> Ex. List of tallest buildings in Wuhan -
>> https://www.wikidata.org/wiki/Q6642364
>>
>
> What's the issue here? The item doesn't actually contain any list, there
> is no duplication or information "clumped together".
>
> Nemo
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] accessing data from a wikidata concept page

2015-06-11 Thread Benjamin Good
Lydia,

Showing the link to the concept URI on the client intended for human
browsers is a little confusing.  I tried that link, but the content
negotiation fooled me into thinking it was just a redirect and led me
astray.
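
(For what it is worth, asking the concept URI explicitly for a
machine-readable type does seem to behave; a quick sketch, assuming the Python
requests library and that the URI content-negotiates the way it appears to:)

import requests

# Ask the concept URI for JSON explicitly; the server should redirect to
# Special:EntityData and return the structured record rather than the
# human-readable page.
r = requests.get(
    "https://www.wikidata.org/entity/Q423111",
    headers={"Accept": "application/json"},
)
print(r.url)  # where the redirect actually landed
print(list(r.json()["entities"].keys()))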

An example that I have found very valuable in my work is UniProt - one of
the earliest adopters of semantic web technology in the life sciences. See
for example,

http://www.uniprot.org/uniprot/P15692
and click on 'Format' in the middle of the page there.  It provides both
human-readable and various computational representations of the data.

At one point in time, I recall that freebase added an 'RDF' link to each of
their pages.  Obviously they aren't the model to follow for everything...
 but I remember that being a well-received step forward.

-Ben

p.s. Having very visible links to PDF versions of these pages but not
structured data just seems wrong. ;)








On Thu, Jun 11, 2015 at 6:47 AM, Markus Krötzsch <
mar...@semantic-mediawiki.org> wrote:

> On 11.06.2015 15:06, Lydia Pintscher wrote:
>
>> On Thu, Jun 11, 2015 at 9:08 AM, Markus Krötzsch
>>  wrote:
>>
>>> Hi Ben,
>>>
>>> That's a very good point. We should really have direct links to JSON and
>>> RDF
>>> for each item (e.g., at the top-right, which seems to be the custom now
>>> on
>>> many web sites). We don't have an XML export (unless you count RDF/XML).
>>>
>>
>> Do you have links to some examples how other websites do this?
>>
>
> Here are two examples:
> * http://www.deutsche-biographie.de/sfz99.html
> * http://d-nb.info/gnd/1026312019 (that's more to the right than to the
> top)
>
> Wikipedia already has links in the upper right of its pages to map
> services. Somewhere similar might be intuitive. Of course, this type of
> prominent link only makes sense for sites that are primarily data
> repositories. BBC Music, for example, does not have an RDF download link on
> every page, although they provide RDF.
>
> Markus
>
>
>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] accessing data from a wikidata concept page

2015-06-10 Thread Benjamin Good
I recently introduced wikidata to a (very computationally savvy) colleague
by sending him this link:
https://www.wikidata.org/wiki/Q423111

His response is indicative of an interface problem that I think is actually
very important:
"Is there a simple way to get the RDF for a given concept?  The page seems
to only present the english names for the concept and its linked concepts."

Leaving aside RDF, it is really not straightforward for newcomers to get
from a concept page like that to the corresponding structured data.  This
could be solved with the consistent addition of a simple link like "view
json/xml/rdf" to each of the concept pages on wikidata.  They would just be
links to the API calls: e.g.
http://www.wikidata.org/w/api.php?action=wbgetentities&ids=Q423111 in this
case.

As the concept pages themselves get tossed around a lot, such an addition
could be extremely valuable in teaching the uninitiated what it's all about
and would come at very little cost - to me, this button is akin to the
'view source' action on web pages - an absolutely fundamental part of how
the web grows - even now.

-Ben
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata