[Wikidata] [Virtual OM-2020] Final CFP: 15th workshop on Ontology Matching collocated with ISWC

2020-07-13 Thread Pavel Shvaiko
--
 FINAL CALL FOR CONTRIBUTIONS
   THE SUBMISSION DEADLINE IS ON AUGUST 10TH, 2020
--
The Fifteenth International Workshop on
ONTOLOGY MATCHING
(OM-2020)
http://om2020.ontologymatching.org/
   November 2nd or 3rd, 2020,
 International Semantic Web Conference (ISWC) Workshop Program,
VIRTUAL CONFERENCE


BRIEF DESCRIPTION AND OBJECTIVES
Ontology matching is a key interoperability enabler for the Semantic Web,
as well as a useful technique in some classical data integration tasks
dealing with the semantic heterogeneity problem. It takes ontologies
as input and determines as output an alignment, that is, a set of
correspondences between the semantically related entities of those
ontologies.
These correspondences can be used for various tasks, such as ontology
merging, data interlinking, query answering or navigation over knowledge
graphs.
Thus, matching ontologies enables the knowledge and data expressed
with the matched ontologies to interoperate.


The workshop has three goals:
1.
To bring together leaders from academia, industry and user institutions
to assess how academic advances are addressing real-world requirements.
The workshop will strive to improve academic awareness of industrial
and final user needs, and therefore, direct research towards those needs.
Simultaneously, the workshop will serve to inform industry and user
representatives about existing research efforts that may meet their
requirements. The workshop will also investigate how the ontology
matching technology is going to evolve, especially with respect to
data interlinking, knowledge graph and web table matching tasks.

2.
To conduct an extensive and rigorous evaluation of ontology matching
and instance matching (link discovery) approaches through
the OAEI (Ontology Alignment Evaluation Initiative) 2020 campaign:
http://oaei.ontologymatching.org/2020/

3.
To examine similarities and differences from other, old, new
and emerging, techniques and usages, such as web table matching
or knowledge embeddings.


This year, in sync with the main conference, we encourage submissions
specifically devoted to: (i) datasets, benchmarks and replication studies,
services, software, methodologies, protocols and measures
(not necessarily related to OAEI), and (ii) application of
the matching technology in real-life scenarios and assessment
of its usefulness to the final users.


TOPICS of interest include but are not limited to:
Business and use cases for matching (e.g., big, open, closed data);
Requirements to matching from specific application scenarios (e.g.,
public sector, homeland security);
Application of matching techniques in real-world scenarios (e.g., in
cloud, with mobile apps);
Formal foundations and frameworks for matching;
Novel matching methods, including link prediction, ontology-based
access;
Matching and knowledge graphs;
Matching and deep learning;
Matching and embeddings;
Matching and big data;
Matching and linked data;
Instance matching, data interlinking and relations between them;
Privacy-aware matching;
Process model matching;
Large-scale and efficient matching techniques;
Matcher selection, combination and tuning;
User involvement (including both technical and organizational aspects);
Explanations in matching;
Social and collaborative matching;
Uncertainty in matching;
Expressive alignments;
Reasoning with alignments;
Alignment coherence and debugging;
Alignment management;
Matching for traditional applications (e.g., data science);
Matching for emerging applications (e.g., web tables, knowledge graphs).


SUBMISSIONS
Contributions to the workshop can be made as technical papers and
posters/statements of interest addressing different issues of ontology
matching, as well as by participating in the OAEI 2020 campaign.
Long technical papers should be at most 12 pages, short technical papers
at most 5 pages, and posters/statements of interest should not exceed
2 pages.
All contributions have to be prepared using the LNCS Style:
http://www.springer.com/computer/lncs?SGWID=0-164-6-793341-0
and should be submitted in PDF format (no later than August 10th, 2020)
through the workshop submission site at:

https://www.easychair.org/conferences/?conf=om2020

Contributors to the OAEI 2020 campaign have to follow the campaign
conditions and schedule at http://oaei.ontologymatching.org/2020/.


DATES FOR TECHNICAL PAPERS AND POSTERS:
August 10th, 2020: Deadline for the submission of papers.
September 11th, 2020: Deadline for the notification of acceptance/rejection.
September 21st, 2020: Deadline for the camera-ready versions.

Re: [Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Kingsley Idehen
On 7/13/20 1:41 PM, Adam Sanchez wrote:
> Hi,
>
> I have to launch 2 million queries against a Wikidata instance.
> I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with RAID 
> 0).
> The queries are simple, just 2 types.
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?s = ?param)
> }
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?o = ?param)
> }
>
> If I use a Java ThreadPoolExecutor takes 6 hours.
> How can I speed up the queries processing even more?
>
> I was thinking :
>
> a) to implement a Virtuoso cluster to distribute the queries or
> b) to load Wikidata in a Spark dataframe (since Sansa framework is
> very slow, I would use my own implementation) or
> c) to load Wikidata in a Postgresql table and use Presto to distribute
> the queries or
> d) to load Wikidata in a PG-Strom table to use GPU parallelism.
>
> What do you think? I am looking for ideas.
> Any suggestion will be appreciated.
>
> Best,


Hi Adam,

You need to increase the memory available to Virtuoso. If you are at
your limits, that's when the Cluster Edition comes in handy, i.e., it
enables you to build a large pool of memory from a sharded DB horizontally
partitioned over a collection of commodity computers.
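
For illustration, the main knobs here are the buffer settings in the
[Parameters] section of virtuoso.ini; the values below are placeholders
only, to be replaced with the numbers from the sizing guidance in the
Virtuoso documentation (roughly two thirds of free RAM divided by the
8 KB page size):

[Parameters]
; placeholder values; size NumberOfBuffers to ~2/3 of free RAM / 8 KB pages
NumberOfBuffers  = 42000000
MaxDirtyBuffers  = 31000000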

There is a public Google Spreadsheet covering a variety of public
Virtuoso instances that should aid you in this process [1].

Links:

[1]
https://docs.google.com/spreadsheets/d/1-stlTC_WJmMU3xA_NxA1tSLHw6_sbpjff-5OITtrbFw/edit#gid=812792186

-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software   
Home Page: http://www.openlinksw.com
Community Support: https://community.openlinksw.com
Weblogs (Blogs):
Company Blog: https://medium.com/openlink-software-blog
Virtuoso Blog: https://medium.com/virtuoso-blog
Data Access Drivers Blog: 
https://medium.com/openlink-odbc-jdbc-ado-net-data-access-drivers

Personal Weblogs (Blogs):
Medium Blog: https://medium.com/@kidehen
Legacy Blogs: http://www.openlinksw.com/blog/~kidehen/
  http://kidehen.blogspot.com

Profile Pages:
Pinterest: https://www.pinterest.com/kidehen/
Quora: https://www.quora.com/profile/Kingsley-Uyi-Idehen
Twitter: https://twitter.com/kidehen
Google+: https://plus.google.com/+KingsleyIdehen/about
LinkedIn: http://www.linkedin.com/in/kidehen

Web Identities (WebID):
Personal: http://kingsley.idehen.net/public_home/kidehen/profile.ttl#i
http://id.myopenlink.net/DAV/home/KingsleyUyiIdehen/Public/kingsley.ttl#this




___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Amirouche Boubekki
On Mon, 13 Jul 2020 at 21:22, Adam Sanchez  wrote:
>
> I have 14T SSD  (RAID 0)
>
> On Mon, 13 Jul 2020 at 21:19, Amirouche Boubekki  wrote:
> >
> > On Mon, 13 Jul 2020 at 19:42, Adam Sanchez  wrote:
> > >
> > > Hi,
> > >
> > > I have to launch 2 million queries against a Wikidata instance.
> > > I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with 
> > > RAID 0).
> > > The queries are simple, just 2 types.
> >
> > How much SSD in Gigabytes do you have?
> >
> > > select ?s ?p ?o {
> > > ?s ?p ?o.
> > > filter (?s = ?param)
> > > }

Can you confirm that the above query is the same as:

select ?p ?o {
  param ?p ?o
}

Where param is one of the two million params.
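
If so, binding the concrete IRI directly (rather than filtering a full
?s ?p ?o scan) is usually much cheaper on the database side. A minimal
sketch of that substitution with Apache Jena's ParameterizedSparqlString,
assuming a Jena client and a local Virtuoso endpoint (both of which are
placeholder assumptions here):

import org.apache.jena.query.ParameterizedSparqlString;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;

public class OneLookup {
    public static void main(String[] args) {
        // Template with a placeholder variable instead of a FILTER.
        ParameterizedSparqlString pss =
            new ParameterizedSparqlString("SELECT ?p ?o WHERE { ?param ?p ?o }");
        // Bind the placeholder to one of the two million concrete IRIs.
        pss.setIri("param", "http://www.wikidata.org/entity/Q42");

        try (QueryExecution qe = QueryExecutionFactory.sparqlService(
                "http://localhost:8890/sparql",   // placeholder endpoint
                pss.asQuery())) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                System.out.println(rs.next());
            }
        }
    }
}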

Also, did you investigate where the bottleneck is? Look into disk
usage and CPU load. glances [0] can provide that information.

Can you run the thread pool on another machine?

Some back-of-the-envelope calculation: 2,000,000 queries in 6 hours
(21,600 seconds) means your system achieves roughly 10.8 milliseconds
per query; AFAIK, that is good.

[0] https://github.com/nicolargo/glances/

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Adam Sanchez
I have 14T SSD  (RAID 0)

On Mon, 13 Jul 2020 at 21:19, Amirouche Boubekki  wrote:
>
> On Mon, 13 Jul 2020 at 19:42, Adam Sanchez  wrote:
> >
> > Hi,
> >
> > I have to launch 2 million queries against a Wikidata instance.
> > I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with 
> > RAID 0).
> > The queries are simple, just 2 types.
>
> How much SSD in Gigabytes do you have?
>
> > select ?s ?p ?o {
> > ?s ?p ?o.
> > filter (?s = ?param)
> > }
>
> Is that the same as:
>
> select ?p ?o {
>  param ?p ?o
> }
>
> Where param is one of the two million params.
>
> > select ?s ?p ?o {
> > ?s ?p ?o.
> > filter (?o = ?param)
> > }
> >
> > If I use a Java ThreadPoolExecutor takes 6 hours.
> > How can I speed up the queries processing even more?
> >
> > I was thinking :
> >
> > a) to implement a Virtuoso cluster to distribute the queries or
> > b) to load Wikidata in a Spark dataframe (since Sansa framework is
> > very slow, I would use my own implementation) or
> > c) to load Wikidata in a Postgresql table and use Presto to distribute
> > the queries or
> > d) to load Wikidata in a PG-Strom table to use GPU parallelism.
> >
> > What do you think? I am looking for ideas.
> > Any suggestion will be appreciated.
> >
> > Best,
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
>
>
> --
> Amirouche ~ https://hyper.dev
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Amirouche Boubekki
On Mon, 13 Jul 2020 at 19:42, Adam Sanchez  wrote:
>
> Hi,
>
> I have to launch 2 million queries against a Wikidata instance.
> I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with RAID 
> 0).
> The queries are simple, just 2 types.

How much SSD in Gigabytes do you have?

> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?s = ?param)
> }

Is that the same as:

select ?p ?o {
 param ?p ?o
}

Where param is one of the two million params.

> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?o = ?param)
> }
>
> If I use a Java ThreadPoolExecutor takes 6 hours.
> How can I speed up the queries processing even more?
>
> I was thinking :
>
> a) to implement a Virtuoso cluster to distribute the queries or
> b) to load Wikidata in a Spark dataframe (since Sansa framework is
> very slow, I would use my own implementation) or
> c) to load Wikidata in a Postgresql table and use Presto to distribute
> the queries or
> d) to load Wikidata in a PG-Strom table to use GPU parallelism.
>
> What do you think? I am looking for ideas.
> Any suggestion will be appreciated.
>
> Best,
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata



-- 
Amirouche ~ https://hyper.dev

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Imre Samu
> How can I speed up the queries processing even more?

imho: drop the unwanted data as early as you can (aggressive
pre-filtering; do not import what you do not need).

> Any suggestion will be appreciated.

In your case:
- I would check the RDF dumps:
https://www.wikidata.org/wiki/Wikidata:Database_download#RDF_dumps
- I would try to write a custom pre-filter for the 2 million parameters
(simple text parsing, e.g. in GoLang using multiple cores, or other fast
code; see the sketch below)
- and then just load the results into PostgreSQL.

I have good experience parsing and filtering the Wikidata JSON dump
(gzipped) and loading the result into a PostgreSQL database.
I can run the full code on my laptop, and the result in my case is about
12 GB in PostgreSQL.

The biggest problem is the memory requirement of the "2 million
parameters", but you can choose some fast key-value storage like RocksDB,
and there are other low-tech parsing solutions as well.
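
A minimal sketch of that kind of line-based pre-filter, written in Java
rather than Go purely for illustration; it assumes the usual
one-entity-per-line layout of the JSON dump, an id set that fits in
memory, and placeholder file names:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.GZIPInputStream;

public class DumpFilter {
    public static void main(String[] args) throws IOException {
        // The ~2 million wanted ids, one per line, e.g. "Q42" (placeholder file).
        Set<String> wanted = new HashSet<>(Files.readAllLines(Paths.get("params.txt")));

        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                 new GZIPInputStream(new FileInputStream("wikidata-all.json.gz")),
                 StandardCharsets.UTF_8));
             BufferedWriter out = Files.newBufferedWriter(Paths.get("filtered.json"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // One entity per line; a cheap substring check on the entity id
                // avoids full JSON parsing for rejected lines.
                int idPos = line.indexOf("\"id\":\"");
                if (idPos < 0) continue;               // skips the "[" / "]" wrapper lines
                int start = idPos + 6;
                int end = line.indexOf('"', start);
                if (end > start && wanted.contains(line.substring(start, end))) {
                    out.write(line);
                    out.newLine();
                }
            }
        }
    }
}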

Regards,
 Imre






Adam Sanchez  wrote (on Mon, 13 Jul 2020, 19:42):

> Hi,
>
> I have to launch 2 million queries against a Wikidata instance.
> I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with
> RAID 0).
> The queries are simple, just 2 types.
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?s = ?param)
> }
>
> select ?s ?p ?o {
> ?s ?p ?o.
> filter (?o = ?param)
> }
>
> If I use a Java ThreadPoolExecutor takes 6 hours.
> How can I speed up the queries processing even more?
>
> I was thinking :
>
> a) to implement a Virtuoso cluster to distribute the queries or
> b) to load Wikidata in a Spark dataframe (since Sansa framework is
> very slow, I would use my own implementation) or
> c) to load Wikidata in a Postgresql table and use Presto to distribute
> the queries or
> d) to load Wikidata in a PG-Strom table to use GPU parallelism.
>
> What do you think? I am looking for ideas.
> Any suggestion will be appreciated.
>
> Best,
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] New URL for OpenRefine reconciliation service

2020-07-13 Thread Reuben Honigwachs
In OpenRefine 3.3, adding the new service with the above URL works fine,
but when trying to remove the old service with the "x", OpenRefine starts
reconciling instead; it then fails and hangs. Is it the same for you all?
Thanks.

On Wed, 8 Jul 2020 at 15:42, Hay (Husky)  wrote:

> Cheers, this seems to work again for me!
>
> -- Hay
>
> On Wed, Jul 8, 2020 at 2:10 PM Antonin Delpeuch (lists)
>  wrote:
> >
> > Hi,
> >
> > This change is now live! If you cannot reconcile to Wikidata anymore,
> > delete the Wikidata reconciliation service and add it again with the new
> > URL:
> >
> > https://wdreconcile.toolforge.org/en/api
> >
> > or by replacing "en" by any other Wikimedia language code.
> >
> > Cheers,
> >
> > Antonin
> >
> > On 15/06/2020 00:22, Antonin Delpeuch (lists) wrote:
> > > Hi,
> > >
> > > The upcoming domain name migration to on the Wikimedia Toolforge
> implies
> > > that OpenRefine users need to update their Wikidata reconciliation
> > > service to the new endpoint:
> > >
> > > https://wdreconcile.toolforge.org/en/api
> > >
> > > or by replacing "en" by any other Wikimedia language code.
> > >
> > > The new home page of the service is at:
> > >
> > > https://wdreconcile.toolforge.org/
> > >
> > > This new endpoint will be available by default in the upcoming release
> > > of OpenRefine (3.4).
> > >
> > > For details about why an automatic migration via redirects is sadly not
> > > possible, see this Phabricator ticket:
> > >
> > > https://phabricator.wikimedia.org/T254172
> > >
> > > Cheers,
> > >
> > > Antonin
> > >
> > >
> >
> >
> > ___
> > Wikidata mailing list
> > Wikidata@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikidata
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] 2 million queries against a Wikidata instance

2020-07-13 Thread Adam Sanchez
Hi,

I have to launch 2 million queries against a Wikidata instance.
I have loaded Wikidata in Virtuoso 7 (512 RAM, 32 cores, SSD disks with RAID 0).
The queries are simple, just 2 types.

select ?s ?p ?o {
?s ?p ?o.
filter (?s = ?param)
}

select ?s ?p ?o {
?s ?p ?o.
filter (?o = ?param)
}

If I use a Java ThreadPoolExecutor, it takes 6 hours.
How can I speed up the query processing even more?

I was thinking:

a) to implement a Virtuoso cluster to distribute the queries or
b) to load Wikidata in a Spark dataframe (since Sansa framework is
very slow, I would use my own implementation) or
c) to load Wikidata in a Postgresql table and use Presto to distribute
the queries or
d) to load Wikidata in a PG-Strom table to use GPU parallelism.

What do you think? I am looking for ideas.
Any suggestion will be appreciated.
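
For reference, the client-side fan-out described above might look roughly
like the following minimal sketch (a fixed thread pool plus Apache Jena
against a local endpoint; the endpoint URL, pool size and parameter list
are placeholder assumptions, not the actual code):

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.ResultSet;

public class BulkQueries {
    private static final String ENDPOINT = "http://localhost:8890/sparql"; // placeholder

    public static void main(String[] args) throws InterruptedException {
        // In practice this would hold the ~2 million parameter IRIs.
        List<String> params = List.of("http://www.wikidata.org/entity/Q42");
        ExecutorService pool = Executors.newFixedThreadPool(32); // one worker per core

        for (String iri : params) {
            pool.submit(() -> {
                // Look up the IRI as subject; a second task could do the same as object.
                String q = "SELECT ?p ?o WHERE { <" + iri + "> ?p ?o }";
                try (QueryExecution qe = QueryExecutionFactory.sparqlService(ENDPOINT, q)) {
                    ResultSet rs = qe.execSelect();
                    while (rs.hasNext()) {
                        rs.next(); // consume / store the bindings as needed
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(6, TimeUnit.HOURS);
    }
}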

Best,

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


[Wikidata] Weekly Summary #424

2020-07-13 Thread Mohammed Sadat Abdulai
Here's your quick overview of what has been happening around Wikidata over
the last week.
Events

   - Upcoming: Next Linked Data for Libraries LD4 Wikidata Affinity Group
   call: New co-facilitator introductions and tools for editathons, 14 July.
   - Upcoming video: Wikipedia Weekly Network - LIVE Wikidata editing #12,
   18 July (Facebook, YouTube).
   - Past: Celtic Knot Wikimedia language conference, online, July 9-10.
   Videos, documentation and slides of sessions are accessible from the
   program page.

Press, articles, blog posts, videos


   - Using OpenRefine with Wikidata for the first time, by Addshore
   - Using Wikidata in Mathematica and Wolfram, by Toni Schindler
   - Fostering the presence of museums on Wikipedia and Wikidata, by
   Debora Lopomo
   - Extending the Met’s reach with Wikidata, by Jennie Choi
   - Wikidata-related videos from the Celtic Knot Conference:
      - Wikidata and how you can use it to support minority languages
      (Mohammed Sadat)
      - Wikidata-powered infoboxes (Pau Cabot)
      - Hands-on Wikidata and lexicographical data (Nicolas Vigneron and
      Léa Lacroix)
      - FAIR linguistic data thanks to norm data – Wikidata as part of the
      research project VerbaAlpina (Christina Mutter)
      - Sámi placenames on Wikidata (Jon Harald Søby)
   - Video: Wikipedia Weekly Network - Wikidata editing #11 (Facebook,
   YouTube)
   - Video: Introducing the Wikidata powered Dictionary of Welsh Biography
   timeline (Jason Evans), YouTube
   - Video: Diving into Wikidata workshop (Will Kent), YouTube
   - Video: Linked data for libraries with Wikibase, Infrastructure track
   (Jens Ohlig), YouTube

Tool of the week

   - DragNDrop.js 
   makes it possible to show a Wikipedia article and drag items to Wikidata.

Other Noteworthy Stuff

   - QGIS can now render results from a Wikidata SPARQL query with a new
   plugin.
   - New feature in *V for Wikipedia* that uses Wikidata identifiers to
   direct users to their favorite streaming app.

Did you know?

   - Newest properties:
      - General datatypes: Dowker-Thistlethwaite name
      - External identifiers: AlloCiné theater ID, Maniadb artist ID,
      Group Properties wiki ID, Oberwolfach mathematician ID, Archive Of
      Our Own tag, GameBanana video game ID, Spanish Olympic Committee
      athlete ID
   - New property proposals to review:
      - General datatypes: Image link, numeric ID, peak bagging
      classification, vanDerWaalsConstantA, vanDerWaalsConstantB


Re: [Wikidata] Differences in label searching with SPARQL and MediaWiki API

2020-07-13 Thread David Causse
On Sat, Jul 11, 2020 at 7:12 PM Thad Guidry  wrote:

> This query times out:
>
> SELECT ?item ?label
> WHERE
> {
>   ?item wdt:P31 ?instance ;
> rdfs:label ?label ;
> rdfs:label ?enLabel .
>   FILTER(CONTAINS(lcase(?label), "Soriano")).
>   FILTER(?instance != wd:Q5).
>   SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
> }
> LIMIT 100
>
> I have this feeling that it's not actually using an index or even asking
> the right question and so is slow and times out?
>
>
Indeed, none of the criteria in your query allows the triple store to
determine an index to follow to extract the results in a timely manner.
The sole non-negative criterion would be FILTER(CONTAINS(lcase(?label),
"Soriano")), but being in a FILTER, and moreover a function, it cannot be
used to determine an index to work on.
The only way to speed up your query would be to introduce a discriminant
"matching" criterion.

However the MediaWiki wbsearchentities API does seem to use an index and is
> performant for label searching:
>
> https://www.wikidata.org/w/api.php?action=wbsearchentities&search=soriano&language=en
>
>
wbsearchentities is backed by Elasticsearch, which is optimized for such
lookups.

How can I get my SPARQL query to be more performant or asking the right
> question?
>
>
Unfortunately I don't see an obvious way to adapt your SPARQL query and
keep exactly the same semantics, but to illustrate the problem:

SELECT ?item ?label WHERE {
  ?item wdt:P31 ?instance ;
rdfs:label "Soriano"@en .
  FILTER(?instance != wd:Q5).
}
LIMIT 100

will return results in a timely manner, only because we helped the graph
traversal with an initial path on ?item rdfs:label "Soriano"@en.

But by combining the query service and the Wikidata API [0] backed by
Elasticsearch, I think you can extract what you want:

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {
  bd:serviceParam wikibase:endpoint "www.wikidata.org";
wikibase:api "EntitySearch";
mwapi:search "soriano";
mwapi:language "en".
  ?item wikibase:apiOutputItem mwapi:item.
  }
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
LIMIT 100

This query will first contact EntitySearch (an alias to wbsearchentities),
which will pass the items it found to the triple store, which in turn can
now query the graph in a timely manner. Obviously this solution only works
if the number of items returned by wbsearchentities remains reasonable.

0: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
--
David C.
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata