[Wikimedia-l] soweego 2 proposal

2020-02-19 Thread Marco Fossati
[You can safely skip this message if you have already seen it in the 
Wikidata mailing list, and apologies for the spam]


Hi everyone,

---
TL;DR: soweego 2 is on its way.
   Here's the Project Grant proposal:

https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2
---

Does the name *soweego* ring a bell?
It's an artificial intelligence system that links Wikidata to large catalogs [1].
It's a close friend of Mix'n'match [2], which mainly caters for small 
catalogs.


The next big step is to check Wikidata content against third-party 
trusted sources.
In a nutshell, we want to enable feedback loops between Wikidatans and 
catalog maintainers.
The ultimate goal is to foster mutual benefits in the open knowledge 
landscape.


I'd be really grateful if you could have a look at the proposal page [3].

Can't wait for your feedback.
Best,

Marco

[1] https://soweego.readthedocs.io/
[2] https://tools.wmflabs.org/mix-n-match/
[3] https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego_2

___
Wikimedia-l mailing list, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
New messages to: Wikimedia-l@lists.wikimedia.org
Unsubscribe: https://lists.wikimedia.org/mailman/listinfo/wikimedia-l


[Wikimedia-l] [announcement] soweego 1.0 release

2019-08-06 Thread Marco Fossati
[Please disregard this message if you have already read it in the 
Wikidata mailing list, and apologies for the distraction]


Hi everyone,


TL;DR: soweego version 1 is out!
https://soweego.readthedocs.io/
Like it? Star it!


The soweego team is delighted to announce the release of *version 1* [1]!
If you like it, why don't you click on the Star button?

*soweego* links Wikidata to large catalogs through machine learning.
It partners with Mix'n'match [2], which mainly deals with small catalogs.

The soweego bot [3] is currently uploading *255 k confident* links to 
Wikidata: see it in action [4]!
*126 k medium-confident* links are instead getting into Mix'n'match for 
curation: see the current catalogs [5-13].


The soweego team has also worked hard to address the following community 
requests:
1. sync Wikidata to external catalogs & check them to spot 
   inconsistencies in Wikidata;
2. import new catalogs with reasonable effort.

Thinking of the best way to contribute? Try to *import a new catalog* [14].

Best,

Marco

[1] https://soweego.readthedocs.io/
[2] https://tools.wmflabs.org/mix-n-match/
[3] https://www.wikidata.org/wiki/User:Soweego_bot
[4] https://xtools.wmflabs.org/ec/wikidata.org/Soweego%20bot
[5] https://tools.wmflabs.org/mix-n-match/#/catalog/2694
[6] https://tools.wmflabs.org/mix-n-match/#/catalog/2695
[7] https://tools.wmflabs.org/mix-n-match/#/catalog/2696
[8] https://tools.wmflabs.org/mix-n-match/#/catalog/2709
[9] https://tools.wmflabs.org/mix-n-match/#/catalog/2710
[10] https://tools.wmflabs.org/mix-n-match/#/catalog/2711
[11] https://tools.wmflabs.org/mix-n-match/#/catalog/2478
[12] https://tools.wmflabs.org/mix-n-match/#/catalog/2712
[13] https://tools.wmflabs.org/mix-n-match/#/catalog/2713
[14] https://soweego.readthedocs.io/en/latest/new_catalog.html



[Wikimedia-l] Call for support: soweego project

2017-09-24 Thread Marco Fossati
[Begging pardon if you have already read this in more specific
Wikimedia mailing lists]

Hi everyone,

You may be aware of the StrepHit project:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
And of the Wikidata primary sources tool:
https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool

While the StrepHit team is building its next version, I'd like to
invite you to have a look at a new project proposal.
The main goal is to add a high volume of identifiers to Wikidata,
ensuring live maintenance of links.

Do you think that Wikidata should become the central linking hub of
open knowledge?

If so, I'd be really grateful if you could endorse the *soweego* project:
https://meta.wikimedia.org/wiki/Grants:Project/Hjfocs/soweego

Of course, any comment is more than welcome on the discussion page.

Looking forward to your valuable feedback.
Best,

Marco



[Wikimedia-l] StrepHit IEG renewal: call for support

2016-07-24 Thread Marco Fossati

[Begging pardon if you read this multiple times]

Dear all,

Remember StrepHit [1], the Natural Language Processing pipeline that
extracts structured data from raw text and produces Wikidata [2]
statements with reference URLs?

StrepHit got funded for 6 months by a Wikimedia IEG [3].
Its datasets are now uploaded to the *primary sources tool* [4, 5]: the
goal of the tool is to provide a standard workflow for data donations to 
Wikidata.


Now, we believe that the primary sources tool *really needs lots of 
improvements*, which we have been collecting in a request for comment [5].

That's why we opened a renewal request for the StrepHit IEG [6].

If you like the idea, please endorse the renewal!

Best regards,
Marco Fossati

[1] https://github.com/Wikidata/StrepHit
[2] https://www.wikidata.org
[3] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
[4] https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool
[5] https://github.com/Wikidata/primarysources
[6] https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Renewal



Re: [Wikimedia-l] [Wikidata] [ANNOUNCEMENT] StrepHit 1.0 Beta Release

2016-06-15 Thread Marco Fossati

Hi Ben,

On 6/15/16 18:24, Benjamin Good wrote:

> Hi Marco,
>
> Where might we find some statistics on the current accuracy of the
> automated claim and reference extractors?  I assume that information
> must be in there somewhere, but I had trouble finding it.

The StrepHit pipeline (codebase) is ready, while the project is ongoing.
We are not there yet, and will publish performance values in the final 
report.


> This is a very ambitious project covering a very large technical
> territory (which I applaud).  It would be great if your results could be
> synthesized a bit more clearly so we can understand where the
> weak/strong points are and where we might be able to help improve or
> make use of what you have done in other domains.

Sure, this will be done in the final report.
In the meantime, you can have a look at the midpoint report summary:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Midpoint#Summary

Best,

Marco


> -Ben
>
> On Wed, Jun 15, 2016 at 9:06 AM, Marco Fossati <foss...@spaziodati.eu>
> wrote:
>
>> [Feel free to blame me if you read this more than once]
>>
>> To whom it may concern,
>>
>> Full of delight, I would like to announce the first beta release of
>> *StrepHit*:
>>
>> https://github.com/Wikidata/StrepHit
>>
>> TL;DR: StrepHit is an intelligent reading agent that understands
>> text and translates it into *referenced* Wikidata statements.
>> It is an IEG project funded by the Wikimedia Foundation.
>>
>> Key features:
>> - Web spiders to harvest a collection of documents (corpus) from
>>   reliable sources
>> - automatic corpus analysis to understand the most meaningful verbs
>> - sentences and semi-structured data extraction
>> - train a machine learning classifier via crowdsourcing
>> - *supervised and rule-based fact extraction from text*
>> - Natural Language Processing utilities
>> - parallel processing
>>
>> You can find all the details here:
>> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Midpoint
>>
>> If you like it, star it on GitHub!
>>
>> Best,
>>
>> Marco
>>
>> ___
>> Wikidata mailing list
>> wikid...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata




[Wikimedia-l] [ANNOUNCEMENT] StrepHit 1.0 Beta Release

2016-06-15 Thread Marco Fossati

[Feel free to blame me if you read this more than once]

To whom it may concern,

Full of delight, I would like to announce the first beta release of 
*StrepHit*:


https://github.com/Wikidata/StrepHit

TL;DR: StrepHit is an intelligent reading agent that understands text 
and translates it into *referenced* Wikidata statements.
It is an IEG project funded by the Wikimedia Foundation.

Key features:
- Web spiders to harvest a collection of documents (corpus) from reliable 
  sources
- automatic corpus analysis to understand the most meaningful verbs
- sentences and semi-structured data extraction
- train a machine learning classifier via crowdsourcing
- *supervised and rule-based fact extraction from text*
- Natural Language Processing utilities
- parallel processing

You can find all the details here:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Midpoint

If you like it, star it on GitHub!

Best,

Marco



Re: [Wikimedia-l] Funding Citation Hunt

2016-04-26 Thread Marco Fossati
Dear James,

Regarding StrepHit, you must have missed the Timeline & progress tab,
as well as the Midpoint report one:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Timeline
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References/Midpoint

You can find all the updates about the project there.
Best,

Marco

Date: Sat, 23 Apr 2016 10:00:43 -0600
From: James Salsman 
To: "wikimedia-l@lists.wikimedia.org"

Subject: Re: [Wikimedia-l] Funding Citation Hunt

What is the status of
https://meta.wikimedia.org/wiki/Research:Wikipedia_Knowledge_Graph_with_DeepDive
and
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
?
There have been no updates on either at all this year, that I've been able
to find, even though at least one of them is supposed to be producing
monthly status reports.


Re: [Wikimedia-l] [REMINDER] StrepHit IEG project kick-off seminar

2016-01-15 Thread Marco Fossati

Hi everyone,

The seminar will start in a few minutes.
Cheers,

Marco

On 1/11/16 16:52, Marco Fossati wrote:

> Here is the link for the online streaming:
> https://youtu.be/uvfd_HmPOrc
>
> Cheers,
>
> Marco
>
> 2016-01-11 16:11 GMT+01:00 Marco Fossati <foss...@spaziodati.eu>:
>
>> Dear all,
>>
>> This is a kind reminder for the upcoming StrepHit IEG project
>> kick-off seminar.
>> Schedule: 15 January 2016, 11:00 am
>>
>> **Important update:** the location has moved to downtown Trento.
>> **New location:** Aula Grande - Fondazione Bruno Kessler, Via
>> S.Croce 77, Trento, Italy - http://www.openstreetmap.org/way/67197096
>>
>> The seminar will be streamed online; a link will be shared as soon
>> as it is available.
>>
>> See you in Trento!
>> Cheers,
>>
>> Marco
>>
>> 2015-12-23 17:03 GMT+01:00 Marco Fossati <foss...@spaziodati.eu>:
>>
>>> [Begging pardon if you read this multiple times]
>>>
>>> Hi everyone,
>>>
>>> I would like to announce with great pleasure the StrepHit IEG
>>> project kick-off seminar.
>>> Of course, you are all invited to attend.
>>>
>>> The event will be held on a special day: Wikipedia's birthday!
>>>
>>> Below you can find the details.
>>>
>>> Schedule: 15 January 2016, 11:00 am, Luigi Stringa Conference Room
>>> Location: Fondazione Bruno Kessler, Via Sommarive 18, Povo,
>>> Trento, Italy - http://www.openstreetmap.org/way/28933739
>>>
>>> Abstract: We kick off StrepHit, a project funded by the
>>> Wikimedia Foundation through the Individual Engagement Grants
>>> program.
>>> StrepHit is a Natural Language Processing pipeline that
>>> understands human language, extracts facts from text and
>>> produces Wikidata statements with reference URLs.
>>> It will enhance the data quality of Wikidata by suggesting
>>> references to validate statements, and will help Wikidata become
>>> the gold-standard hub of the Open Data landscape.
>>>
>>> Link:
>>> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>>>
>>> Speaker's bio: Marco Fossati is a researcher with a double
>>> background in Natural Languages and Information Technologies. He
>>> works at the Data and Knowledge Management (DKM) research unit
>>> at Fondazione Bruno Kessler, Trento, Italy. He is a member of the
>>> DBpedia Association board of trustees, founder and
>>> representative of its Italian chapter. He has interdisciplinary
>>> skills both in linguistics and in programming. His research
>>> focuses on bridging the gap between Natural Language Processing
>>> techniques and Large Scale Structured Knowledge Bases in order
>>> to drive the Web of Data towards its full potential.
>>>
>>> See you in Trento and long live Wikipedia!
>>> Cheers,
>>>
>>> Marco






Re: [Wikimedia-l] [REMINDER] StrepHit IEG project kick-off seminar

2016-01-11 Thread Marco Fossati
Here is the link for the online streaming:
https://youtu.be/uvfd_HmPOrc

Cheers,

Marco

2016-01-11 16:11 GMT+01:00 Marco Fossati <foss...@spaziodati.eu>:

> Dear all,
>
> This is a kind reminder for the upcoming StrepHit IEG project kick-off
> seminar.
> Schedule: 15 January 2016, 11:00 am
>
> **Important update:** the location has moved to downtown Trento.
> **New location:** Aula Grande - Fondazione Bruno Kessler, Via S.Croce 77,
> Trento, Italy - http://www.openstreetmap.org/way/67197096
>
> The seminar will be streamed online; a link will be shared as soon as it
> is available.
>
> See you in Trento!
> Cheers,
>
> Marco
>
> 2015-12-23 17:03 GMT+01:00 Marco Fossati <foss...@spaziodati.eu>:
>
>> [Begging pardon if you read this multiple times]
>>
>> Hi everyone,
>>
>> I would like to announce with great pleasure the StrepHit IEG project
>> kick-off seminar.
>> Of course, you are all invited to attend.
>>
>> The event will be held on a special day: Wikipedia's birthday!
>>
>> Below you can find the details.
>>
>> Schedule: 15 January 2016, 11:00 am, Luigi Stringa Conference Room
>> Location: Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, Italy
>> - http://www.openstreetmap.org/way/28933739
>>
>> Abstract: We kick off StrepHit, a project funded by the Wikimedia
>> Foundation through the Individual Engagement Grants program.
>> StrepHit is a Natural Language Processing pipeline that understands human
>> language, extracts facts from text and produces Wikidata statements with
>> reference URLs.
>> It will enhance the data quality of Wikidata by suggesting references to
>> validate statements, and will help Wikidata become the gold-standard hub of
>> the Open Data landscape.
>>
>> Link:
>> https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
>>
>> Speaker's bio: Marco Fossati is a researcher with a double background in
>> Natural Languages and Information Technologies. He works at the Data and
>> Knowledge Management (DKM) research unit at Fondazione Bruno Kessler,
>> Trento, Italy. He is a member of the DBpedia Association board of trustees,
>> founder and representative of its Italian chapter. He has interdisciplinary
>> skills both in linguistics and in programming. His research focuses on
>> bridging the gap between Natural Language Processing techniques and Large
>> Scale Structured Knowledge Bases in order to drive the Web of Data towards
>> its full potential.
>>
>> See you in Trento and long live Wikipedia!
>> Cheers,
>>
>> Marco
>>
>
>

[Wikimedia-l] [ANNOUNCEMENT] StrepHit IEG project kick-off seminar

2015-12-23 Thread Marco Fossati
[Begging pardon if you read this multiple times]

Hi everyone,

I would like to announce with great pleasure the StrepHit IEG project
kick-off seminar.
Of course, you are all invited to attend.

The event will be held on a special day: Wikipedia's birthday!

Below you can find the details.

Schedule: 15 January 2016, 11:00 am, Luigi Stringa Conference Room
Location: Fondazione Bruno Kessler, Via Sommarive 18, Povo, Trento, Italy -
http://www.openstreetmap.org/way/28933739

Abstract: We kick off StrepHit, a project funded by the Wikimedia
Foundation through the Individual Engagement Grants program.
StrepHit is a Natural Language Processing pipeline that understands human
language, extracts facts from text and produces Wikidata statements with
reference URLs.
It will enhance the data quality of Wikidata by suggesting references to
validate statements, and will help Wikidata become the gold-standard hub of
the Open Data landscape.

Link:
https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References

Speaker's bio: Marco Fossati is a researcher with a double background in
Natural Languages and Information Technologies. He works at the Data and
Knowledge Management (DKM) research unit at Fondazione Bruno Kessler,
Trento, Italy. He is a member of the DBpedia Association board of trustees,
founder and representative of its Italian chapter. He has interdisciplinary
skills both in linguistics and in programming. His research focuses on
bridging the gap between Natural Language Processing techniques and Large
Scale Structured Knowledge Bases in order to drive the Web of Data towards
its full potential.

See you in Trento and long live Wikipedia!
Cheers,

Marco