Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
Hi Markus, everyone,

The project proposal is currently in active development.
I would now like to focus on disseminating the idea and engaging the Wikidata community.
Hence, I would love to gather feedback on the following question:
Does StrepHit sound interesting and useful to you?

It would be great if you could report your thoughts on the project talk page:
https://meta.wikimedia.org/wiki/Grants_talk:IEG/StrepHit:_Wikidata_Statements_Validation_via_References

Cheers!

On 9/8/15 2:02 PM, wikidata-requ...@lists.wikimedia.org wrote:
Date: Mon, 07 Sep 2015 16:47:16 +0200
From: Markus Krötzsch
To: "Discussion list for the Wikidata project."
Subject: Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
Message-ID: <55eda374.2090...@semantic-mediawiki.org>
Content-Type: text/plain; charset=utf-8; format=flowed

Dear Marco,

Sounds interesting, but the project page still has a lot of gaps. Will you notify us again when you are done? It is a bit tricky to endorse a proposal that is not finished yet ;-)

Markus

On 04.09.2015 17:01, Marco Fossati wrote:
> [Begging pardon if you have already read this in the Wikidata project chat]
>
> Hi everyone,
>
> As Wikidatans, we all know how much data quality matters.
> We all know what high quality stands for: statements need to be validated via references to external, non-wiki, sources.
>
> That's why the primary sources tool is being developed:
> https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
> And that's why I am preparing the StrepHit IEG proposal:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>
> StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text and produces Wikidata statements with reference URLs.
>
> As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend.
> It's a small dataset serving the soccer domain use case.
> Please follow the instructions on the project page to activate it and start playing with the data.
>
> What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones?
> At least one reference URL is always guaranteed for each statement.
> This means that if StrepHit finds some new statement that was not there in Wikidata before, it will always propose its external references.
> We do not want to manually reject all the new statements with no reference, right?
>
> If you like the idea, please endorse the StrepHit IEG proposal!

--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata
Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
Dear Marco,

Sounds interesting, but the project page still has a lot of gaps. Will you notify us again when you are done? It is a bit tricky to endorse a proposal that is not finished yet ;-)

Markus

On 04.09.2015 17:01, Marco Fossati wrote:
> [Begging pardon if you have already read this in the Wikidata project chat]
>
> Hi everyone,
>
> As Wikidatans, we all know how much data quality matters.
> We all know what high quality stands for: statements need to be validated via references to external, non-wiki, sources.
>
> That's why the primary sources tool is being developed:
> https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
> And that's why I am preparing the StrepHit IEG proposal:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>
> StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text and produces Wikidata statements with reference URLs.
>
> As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend.
> It's a small dataset serving the soccer domain use case.
> Please follow the instructions on the project page to activate it and start playing with the data.
>
> What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones?
> At least one reference URL is always guaranteed for each statement.
> This means that if StrepHit finds some new statement that was not there in Wikidata before, it will always propose its external references.
> We do not want to manually reject all the new statements with no reference, right?
>
> If you like the idea, please endorse the StrepHit IEG proposal!
>
> Cheers,
Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
Hi Gerard,

Let me add a further reply to your comment.

On 9/5/15 2:01 PM, wikidata-requ...@lists.wikimedia.org wrote:
Message: 3
Date: Fri, 4 Sep 2015 19:26:38 +0200
From: Gerard Meijssen

No. Quality is not determined by sources. Sources do lie. When you want quality, you seek sources where they matter most. It is not by going for "all" of them

I completely agree with you that many sources can be flawed.
I may have neglected the term "trustworthy" before "sources"; I have since added it in the Wikidata project chat.
The IEG proposal will also include an investigation phase to select a set of authoritative sources. See the first task in the proposal work package:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References#Work_Package

I'll expand on this.

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
Dear all,

On 9/5/15 2:01 PM, wikidata-requ...@lists.wikimedia.org wrote:
Message: 3
Date: Fri, 4 Sep 2015 19:26:38 +0200
From: Gerard Meijssen

Quality is not determined by sources. Sources do lie. When you want quality, you seek sources where they matter most.

Thanks @Gerard for your criticism. Let me reply to your concerns.
The following references run counter to your points. They inspired me while I was developing the idea:
https://www.wikidata.org/wiki/Wikidata:Referencing_improvements_input
http://blog.wikimedia.de/2015/01/03/scaling-wikidata-success-means-making-the-pie-bigger/
https://tools.wmflabs.org/wikidata-todo/sourcery.html
https://phabricator.wikimedia.org/T76230
https://phabricator.wikimedia.org/T76232
https://phabricator.wikimedia.org/T76231
https://phabricator.wikimedia.org/T90881

Message: 4
Date: Fri, 4 Sep 2015 19:34:22 +0200
From: Lydia Pintscher

Thank you for working on this, Marco. This is a great step forward. I wish you good luck for the IEG proposal!

Thanks @Lydia for your encouragement!

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
On Fri, Sep 4, 2015 at 5:01 PM, Marco Fossati wrote:
> [Begging pardon if you have already read this in the Wikidata project chat]
>
> Hi everyone,
>
> As Wikidatans, we all know how much data quality matters.
> We all know what high quality stands for: statements need to be validated via references to external, non-wiki, sources.
>
> That's why the primary sources tool is being developed:
> https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
> And that's why I am preparing the StrepHit IEG proposal:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>
> StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text and produces Wikidata statements with reference URLs.
>
> As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend.
> It's a small dataset serving the soccer domain use case.
> Please follow the instructions on the project page to activate it and start playing with the data.
>
> What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones?
> At least one reference URL is always guaranteed for each statement.
> This means that if StrepHit finds some new statement that was not there in Wikidata before, it will always propose its external references.
> We do not want to manually reject all the new statements with no reference, right?
>
> If you like the idea, please endorse the StrepHit IEG proposal!

Thank you for working on this, Marco. This is a great step forward. I wish you good luck for the IEG proposal!

Cheers
Lydia

--
Lydia Pintscher - http://about.me/lydia.pintscher
Product Manager for Wikidata

Wikimedia Deutschland e.V.
Tempelhofer Ufer 23-24
10963 Berlin
www.wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V. (Society for the Promotion of Free Knowledge).
Registered in the register of associations of the Berlin-Charlottenburg district court under number 23855 Nz. Recognized as a non-profit organization by the tax office for corporations I Berlin, tax number 27/681/51985.
Re: [Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
Hi,

The danger of blanket statements is that they are often easy to refute.

No. Quality is not determined by sources. Sources do lie. When you want quality, you seek sources where they matter most. It is not by going for "all" of them; it is where Wikidata differs from other sources.

Arguably, and I do make that argument, Wikidata is so underdeveloped in the statement department that having more data with a reasonable expectation of quality will trump higher quality for a much smaller dataset.

Thanks,
GerardM

On 4 September 2015 at 17:01, Marco Fossati wrote:
> [Begging pardon if you have already read this in the Wikidata project chat]
>
> Hi everyone,
>
> As Wikidatans, we all know how much data quality matters.
> We all know what high quality stands for: statements need to be validated via references to external, non-wiki, sources.
>
> That's why the primary sources tool is being developed:
> https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
> And that's why I am preparing the StrepHit IEG proposal:
> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>
> StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text and produces Wikidata statements with reference URLs.
>
> As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend.
> It's a small dataset serving the soccer domain use case.
> Please follow the instructions on the project page to activate it and start playing with the data.
>
> What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones?
> At least one reference URL is always guaranteed for each statement.
> This means that if StrepHit finds some new statement that was not there in Wikidata before, it will always propose its external references.
> We do not want to manually reject all the new statements with no reference, right?
>
> If you like the idea, please endorse the StrepHit IEG proposal!
>
> Cheers,
> --
> Marco Fossati
> http://about.me/marco.fossati
> Twitter: @hjfocs
> Skype: hell_j
[Wikidata] [ANNOUNCEMENT] first StrepHit dataset for the primary sources tool
[Begging pardon if you have already read this in the Wikidata project chat]

Hi everyone,

As Wikidatans, we all know how much data quality matters.
We all know what high quality stands for: statements need to be validated via references to external, non-wiki, sources.

That's why the primary sources tool is being developed:
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
And that's why I am preparing the StrepHit IEG proposal:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References

StrepHit (pronounced "strep hit", means "Statement? repherence it!") is a Natural Language Processing pipeline that understands human language, extracts structured data from raw text and produces Wikidata statements with reference URLs.

As a demonstration to support the IEG proposal, you can find the **FBK-strephit-soccer** dataset uploaded to the primary sources tool backend.
It's a small dataset serving the soccer domain use case.
Please follow the instructions on the project page to activate it and start playing with the data.

What is the biggest difference that sets StrepHit datasets apart from the currently uploaded ones?
At least one reference URL is always guaranteed for each statement.
This means that if StrepHit finds some new statement that was not there in Wikidata before, it will always propose its external references.
We do not want to manually reject all the new statements with no reference, right?

If you like the idea, please endorse the StrepHit IEG proposal!

Cheers,
--
Marco Fossati
http://about.me/marco.fossati
Twitter: @hjfocs
Skype: hell_j
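
A minimal sketch, in Python, of the "at least one reference URL per statement" guarantee described in the announcement. The Statement class, its field names, the placeholder IDs and the example.org URL are all hypothetical; none of them is taken from StrepHit or from the primary sources tool. The sketch only shows the idea of dropping candidate statements that carry no reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """Hypothetical candidate statement; not the StrepHit data model."""
    subject: str   # placeholder item ID
    prop: str      # placeholder property ID
    value: str     # placeholder value
    reference_urls: List[str] = field(default_factory=list)


def with_references(statements: List[Statement]) -> List[Statement]:
    """Keep only statements backed by at least one non-empty reference URL."""
    return [s for s in statements if any(u.strip() for u in s.reference_urls)]


# Illustrative data only: one referenced statement, one unreferenced.
candidates = [
    Statement("Q_PLAYER", "P_MEMBER_OF_TEAM", "Q_TEAM",
              ["http://example.org/article"]),
    Statement("Q_PLAYER", "P_POSITION", "Q_ROLE"),
]

kept = with_references(candidates)
for s in kept:
    print(s.subject, s.prop, s.value, s.reference_urls)
```

With such a filter in place, every statement that reaches a reviewer has something to check it against, which is the point made above about not having to reject unreferenced statements by hand.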