Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-09 Thread Marco Fossati

Hi Markus, everyone,

The project proposal is currently in active development.
I would like to focus now on the dissemination of the idea and the 
engagement of the Wikidata community.

Hence, I would love to gather feedback on the following question:

Does StrepHit sound interesting and useful to you?

It would be great if you could report your thoughts on the project talk 
page:

https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References

Cheers!

On 9/8/15 2:02 PM, wikidata-requ...@lists.wikimedia.org wrote:

Date: Mon, 07 Sep 2015 16:47:16 +0200
From: Markus Krötzsch
To: "Discussion list for the Wikidata project."

Subject: Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the
    primary sources tool

Dear Marco,

Sounds interesting, but the project page still has a lot of gaps. Will
you notify us again when you are done? It is a bit tricky to endorse a
proposal that is not finished yet;-)

Markus



--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-07 Thread Markus Krötzsch

Dear Marco,

Sounds interesting, but the project page still has a lot of gaps. Will 
you notify us again when you are done? It is a bit tricky to endorse a 
proposal that is not finished yet ;-)


Markus



Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-05 Thread Marco Fossati

Hi Gerard,

Let me add a further reply to your comment.

On 9/5/15 2:01 PM, wikidata-requ...@lists.wikimedia.org wrote:

> Message: 3
> Date: Fri, 4 Sep 2015 19:26:38 +0200
> From: Gerard Meijssen
>
> No.
> Quality is not determined by sources. Sources do lie.
>
> When you want quality, you seek sources where they matter most. It is not
> by going for "all" of them

I completely agree with you that many sources can be flawed. I left out 
the term "trustworthy" before "sources" and have since added it in the 
Wikidata project chat.
The IEG proposal will also include an investigation phase to select a 
set of authoritative sources, see the first task in the proposal work 
package:

https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References#Work_Package

I'll expand on this.

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j



Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-05 Thread Marco Fossati

Dear all,

On 9/5/15 2:01 PM, wikidata-requ...@lists.wikimedia.org wrote:

> Message: 3
> Date: Fri, 4 Sep 2015 19:26:38 +0200
> From: Gerard Meijssen
>
> Quality is not determined by sources. Sources do lie.
>
> When you want quality, you seek sources where they matter most.

Thanks @Gerard for your criticism; let me reply to your concerns.
The following references offer a different view from yours. They inspired 
me when I was developing the idea:


https://www.wikidata.org/wiki/Wikidata:Referencing_improvements_input
http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-the-pie-bigger/
https://tools.wmflabs.org/wikidata-todo/sourcery.html
https://phabricator.wikimedia.org/T76230
https://phabricator.wikimedia.org/T76232
https://phabricator.wikimedia.org/T76231
https://phabricator.wikimedia.org/T90881


> Message: 4
> Date: Fri, 4 Sep 2015 19:34:22 +0200
> From: Lydia Pintscher
>
> Thank you for working on this, Marco. This is a great step forward. I
> wish you good luck for the IEG proposal!

Thanks @Lydia for your encouragement!

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j



Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-04 Thread Lydia Pintscher

Thank you for working on this, Marco. This is a great step forward. I
wish you good luck for the IEG proposal!


Cheers
Lydia

-- 
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.

Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg (district court) under number 23855 Nz. Recognized
as a charitable organization by the Finanzamt für Körperschaften I Berlin
(tax office for corporations), tax number 27/681/51985.



Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-04 Thread Gerard Meijssen
Hoi,
The danger of blanket statements is that they are often easy to refute. No.
Quality is not determined by sources. Sources do lie.

When you want quality, you seek sources where they matter most. It is not
by going for "all" of them; it is where Wikidata differs from other sources.

Arguably, and I do make that argument, Wikidata is so underdeveloped in
the statement department that having more data with a reasonable
expectation of quality will trump quality for a much smaller dataset.
Thanks,
  GerardM



[Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool

2015-09-04 Thread Marco Fossati

[Apologies if you have already read this in the Wikidata project chat]

Hi everyone,

As Wikidatans, we all know how much data quality matters.
We all know what high quality stands for: statements need to be 
validated via references to external, non-wiki, sources.


That's why the primary sources tool is being developed:
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
And that's why I am preparing the StrepHit IEG proposal:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References

StrepHit (pronounced "strep hit", means "Statement? repherence it!") is 
a Natural Language Processing pipeline that understands human language, 
extracts structured data from raw text and produces Wikidata statements 
with reference URLs.


As a demonstration to support the IEG proposal, you can find the 
**FBK-strephit-soccer** dataset uploaded to the primary sources tool 
backend.

It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and 
start playing with the data.


What is the biggest difference that sets StrepHit datasets apart from 
the currently uploaded ones?

At least one reference URL is always guaranteed for each statement.
This means that whenever StrepHit proposes a new statement that is not 
yet in Wikidata, it also proposes external references for it.
We do not want to manually reject all the new statements with no 
reference, right?
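
To make the guarantee above concrete, here is a minimal sketch. It is not 
StrepHit's actual code: the `Statement` class, its field names, the list of 
wiki domains and the sample values are all hypothetical. It only illustrates 
the rule that a statement is proposed when it carries at least one 
external, non-wiki reference URL, and is dropped otherwise.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse

# Assumption: "non-wiki" is approximated by excluding these Wikimedia domains.
WIKI_DOMAINS = ("wikipedia.org", "wikidata.org", "wikimedia.org")


@dataclass
class Statement:
    """Hypothetical candidate statement (not a real StrepHit or Wikidata type)."""
    subject: str
    prop: str
    value: str
    reference_urls: List[str] = field(default_factory=list)


def is_external(url: str) -> bool:
    """Return True if the URL has a host that is outside the listed wiki domains."""
    host = urlparse(url).hostname or ""
    if not host:
        return False
    return not any(host == d or host.endswith("." + d) for d in WIKI_DOMAINS)


def with_references(statements: List[Statement]) -> List[Statement]:
    """Keep only statements backed by at least one external reference URL."""
    return [
        s for s in statements
        if any(is_external(u) for u in s.reference_urls)
    ]


# Placeholder identifiers and URLs, for illustration only.
candidates = [
    Statement("ITEM_A", "PROP_X", "ITEM_B", ["http://example.org/bio"]),
    Statement("ITEM_A", "PROP_Y", "ITEM_C"),  # no reference at all
    Statement("ITEM_D", "PROP_X", "ITEM_E", ["https://en.wikipedia.org/wiki/Foo"]),
]
proposed = with_references(candidates)
```

With the placeholder data above, only the first candidate is kept: the 
second has no reference, and the third cites only a wiki page.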


If you like the idea, please endorse the StrepHit IEG proposal!

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
