Re: [Wikidata] SPARQL power users and developers

2016-10-02 Thread Yuri Astrakhan
I would highly recommend using the X-Analytics header for this, and
establishing "well known" key names. X-Analytics gets parsed into
key-value pairs (an object field) by our Varnish/Hadoop infrastructure,
whereas the user agent is basically a semi-free-form text string. Also,
the user agent cannot be set by JavaScript clients, so we would
constantly have to perform two types of analysis: queries that came from a
"backend" and queries that were made by a browser.

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] SPARQL power users and developers

2016-10-02 Thread Stas Malyshev
Hi!

> I'll try to throw in a #TOOL: comment where I can remember using SPARQL,
> but I'll be bound to forget a few...

Thanks, though using a distinct User-Agent may be easier for analysis,
since those are stored as separate fields, and operating on a separate
field is much easier than extracting comments from the query field,
e.g. when doing Hive data processing.

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Stas Malyshev
Hi!

> Would it help if I add the following header to every large batch of queries?

I think having a distinct User-Agent header (maybe with a URL linking to
the rest of the info) would be enough. This is recorded by the request
log and can be used later in processing.

In general, whenever you create a bot that performs a large amount of
processing, it is good practice to send a distinct User-Agent header so
people on the other side know what's going on.
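A minimal sketch of such a bot, assuming the common "Name/version (URL; email)" shape for the User-Agent string (the bot name, URL, and address below are made-up placeholders):

```python
import urllib.parse
import urllib.request

def user_agent(name: str, version: str, url: str, email: str) -> str:
    """Build a descriptive User-Agent so server-side logs can identify the bot."""
    return f"{name}/{version} ({url}; {email})"

# All identifiers here are placeholders for illustration.
UA = user_agent("ExampleWikidataBot", "0.1", "https://example.org/bot", "bot@example.org")

def run_query(sparql: str) -> bytes:
    url = "https://query.wikidata.org/sparql?" + urllib.parse.urlencode(
        {"query": sparql, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": UA})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()
```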

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Andra Waagmeester
>
> Once we have this, we would like to analyse for content (which properties
> and classes are used, etc.) but also for query feature (how many OPTIONALs,
> GROUP BYs, etc. are used). Ideas on what to analyse further are welcome. Of
> course, SPARQL can only give a partial idea of "usage", since Wikidata
> content can be used in ways that don't involve SPARQL. Moreover, counting
> raw numbers of queries can also be misleading: we have had cases where a
> single query result was discussed by hundreds of people (e.g. the Panama
> papers query that made it to Le Monde online), but in the logs it will
> still show up only as a single query among millions.
>

Yes, I agree, and we certainly need to look into different metrics on how
Wikidata is used. I am happy to join the discussion, but even a partial
view of the usage is already a big step forward. A lot of the data being
fed into Wikidata through the different bots resulted from funded
initiatives. Currently, we have no way of demonstrating to funders how
using Wikidata to disseminate their efforts benefits the community at
large. Simply counting the shared use of different properties would
already be a crude but useful metric of the dissemination of scientific
knowledge across different domains.
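As a rough illustration, even a naive scan of logged query texts could produce such a metric. This sketch just regex-matches property IDs, so it would also count IDs appearing in comments or string literals; the sample queries are invented:

```python
import re
from collections import Counter
from itertools import combinations

PROP = re.compile(r"\bP\d+\b")

def property_cooccurrence(queries):
    """Count, for each pair of properties, how many queries mention both."""
    pairs = Counter()
    for q in queries:
        for pair in combinations(sorted(set(PROP.findall(q))), 2):
            pairs[pair] += 1
    return pairs

# Toy example: P351 (NCBI gene ID) and P699 (Disease Ontology ID) in one query.
log = [
    "SELECT ?g WHERE { ?g wdt:P351 ?ncbi ; wdt:P699 ?doid . }",
    "SELECT ?x WHERE { ?x wdt:P351 ?ncbi . }",
]
print(property_cooccurrence(log)[("P351", "P699")])  # -> 1
```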


Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Markus Kroetzsch

On 30.09.2016 20:47, Denny Vrandečić wrote:

Markus, do you have access to the corresponding HTTP request logs? The
fields there might be helpful (although I might be overly optimistic
about it)


Yes, we can access all logs. For bot-based queries, this should be very 
helpful indeed. I can still think of several cases where this won't help 
much:


* People writing a quick Python (or whatever) script to run thousands of 
queries, without setting a meaningful user agent.
* Web applications like Reasonator or SQID that cause the client to run 
SPARQL queries when viewing a page (in this case, the user agent that 
gets logged is the user's browser).


But, yes, we will definitely look at all signals that we can get from 
the data.


Best,

Markus





Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Maarten Dammers

Hi Denny,

On 30-09-16 20:47, Denny Vrandečić wrote:
Markus, do you have access to the corresponding HTTP request logs? The 
fields there might be helpful (although I might be overly optimistic 
about it)
I was about to say the same. I use pywikibot quite a lot, and it sends 
some nice headers, as described at 
https://www.mediawiki.org/wiki/API:Main_page#Identifying_your_client .


Maarten



Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Sebastian Burgstaller
Hi Markus,

I assume I qualify for (1) and (2). I can add an identifiable comment
with a '#Tool:' prefix to every major SPARQL query done by our tools.

One bot run usually generates a few very heavy queries, and tens of
thousands of smaller ones, depending on the actual task the bot performs.
All of this serves to keep the data in WD consistent, avoid duplicates,
etc., and, in principle, acts as a combination of database connector and
Wikidata API wrapper.

Best,
Sebastian

-- 

Sebastian Burgstaller-Muehlbacher, PhD
Research Associate
Andrew Su Lab
MEM-216, Department of Molecular and Experimental Medicine
The Scripps Research Institute
10550 North Torrey Pines Road
La Jolla, CA 92037



Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Markus Kroetzsch

On 30.09.2016 19:50, Andra Waagmeester wrote:

Just curious while we are on the topic. When you are inspecting the
headers to separate "organic" queries from bot queries, would it be
possible to count the times a set of properties is used in the
different queries? This would be a nice way to demonstrate to the
original external resources how "their" data is used and which
combinations of properties are used together with "their" properties
(e.g. P351 for NCBI gene or P699 for the Disease Ontology). It would be
interesting to know how often, for example, those two properties are
used in one single query.


Yes, we definitely want to do such analyses. The first task is to clean 
up and group/categorize queries so we can get a better understanding (if 
a property is used in 100K queries a day, it would still be nice to know 
if they come from a single script or from many users).


Once we have this, we would like to analyse for content (which 
properties and classes are used, etc.) but also for query feature (how 
many OPTIONALs, GROUP BYs, etc. are used). Ideas on what to analyse 
further are welcome. Of course, SPARQL can only give a partial idea of 
"usage", since Wikidata content can be used in ways that don't involve 
SPARQL. Moreover, counting raw numbers of queries can also be 
misleading: we have had cases where a single query result was discussed 
by hundreds of people (e.g. the Panama papers query that made it to Le 
Monde online), but in the logs it will still show up only as a single 
query among millions.


Best,

Markus




Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Denny Vrandečić
Markus, do you have access to the corresponding HTTP request logs? The
fields there might be helpful (although I might be overly optimistic about
it)


Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Yuri Astrakhan
I guess I qualify for #2 several times:
* The <maplink> & <mapframe> tags support access to the geoshapes service,
which in turn can make requests to WDQS. For example, see
https://en.wikipedia.org/wiki/User:Yurik/maplink (click on "governor's
link").

* The <graph> wiki tag supports the same geoshapes service, as well as
direct queries to WDQS. This graph uses both (one to get all countries, the
other to get the list of disasters):
https://www.mediawiki.org/wiki/Extension:Graph/Demo/Sparql/Largest_disasters

* There has been some discussion to allow direct WDQS querying from maps
too - e.g. to draw points of interest based on Wikidata (very easy to
implement, but we should be careful to cache it properly).

Since all these queries are called from either Node.js or our JavaScript,
we could attach extra headers, like X-Analytics, which is already handled
by Varnish. Also, Node.js queries could set the user agent string.



Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Andra Waagmeester
Just curious while we are on the topic. When you are inspecting the headers
to separate "organic" queries from bot queries, would it be possible
to count the times a set of properties is used in the different queries?
This would be a nice way to demonstrate to the original external resources
how "their" data is used and which combinations of properties are used
together with "their" properties (e.g. P351 for NCBI gene or P699 for the
Disease Ontology). It would be interesting to know how often, for example,
those two properties are used in one single query.

Cheers,

Andra


Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Markus Kroetzsch

On 30.09.2016 16:18, Andra Waagmeester wrote:

Would it help if I add the following header to every large batch of queries?

###
# access: (http://query.wikidata.org
or https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={SPARQL} .)
# contact: email, account name, Twitter name, etc.
# bot: True/False
# .
##


This is already more detailed than what I had in mind. Having a way to 
tell apart bots and tools from "organic" queries would already be great. 
We are mainly looking for something that will help us to understand 
sudden peaks of activity. For this, it might be enough to have a short 
signature (a URL could be given, but a tool name with a version would 
also be fine). This is somewhat like the "user agent" field in HTTP.


But you are right that some formatting convention may help further here. 
How about this:


#TOOL:

Then one could look for comments of this form without knowing all the 
tools upfront. Of course, this is just a hint in any case, since one 
could always use the same comment in any manually written query.
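On the analysis side, such signatures could then be tallied without knowing the tools upfront; a minimal sketch (the tool name and queries below are invented):

```python
import re
from collections import Counter

# Matches a comment line of the proposed form "#TOOL: <signature>".
TOOL_RE = re.compile(r"^\s*#TOOL:\s*(.+?)\s*$", re.MULTILINE)

def count_tools(query_log):
    """Tally #TOOL: signatures across logged query texts."""
    counts = Counter()
    for query in query_log:
        m = TOOL_RE.search(query)
        counts[m.group(1) if m else "(unsigned)"] += 1
    return counts

log = [
    "#TOOL: examplebot 0.3\nSELECT ?i WHERE { ?i wdt:P31 wd:Q5 } LIMIT 10",
    "SELECT * WHERE { ?s ?p ?o } LIMIT 1",
]
print(count_tools(log)["examplebot 0.3"])  # -> 1
```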


Best regards,

Markus





Re: [Wikidata] SPARQL power users and developers

2016-09-30 Thread Andra Waagmeester
Would it help if I add the following header to every large batch of queries?

###
# access: http://query.wikidata.org or
#   https://query.wikidata.org/bigdata/namespace/wdq/sparql?query={SPARQL}
# contact: email, account name, Twitter name, etc.
# bot: True/False
# .
##
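A batch client could prepend such a block programmatically before submitting each query. A minimal sketch, assuming this header format; the contact address and query are placeholders:

```python
def with_header(sparql: str, contact: str, bot: bool) -> str:
    """Prefix a SPARQL query with the metadata comment block proposed above."""
    header = "\n".join([
        "###",
        "# access: https://query.wikidata.org/bigdata/namespace/wdq/sparql",
        "# contact: " + contact,
        "# bot: " + str(bot),
        "##",
    ])
    return header + "\n" + sparql

q = with_header("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 3",
                "jane.doe@example.org", True)
```

Because the comment travels inside the query string itself, it survives in the query logs even when the HTTP-level metadata does not.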



[Wikidata] SPARQL power users and developers

2016-09-30 Thread Markus Kroetzsch

Dear SPARQL users,

We are starting a research project to investigate the use of the 
Wikidata SPARQL Query Service, with the goal to gain insights that may 
help to improve Wikidata and the query service [1]. Currently, we are 
still waiting for all data to become available. Meanwhile, we would like 
to ask for your input.


Preliminary analyses show that the use of the SPARQL query service 
varies greatly over time, presumably because power users and software 
tools are running large numbers of queries. For a meaningful analysis, 
we would like to understand such high-impact biases in the data. We 
therefore need your help:


(1) Are you a SPARQL power user who sometimes runs large numbers of 
queries (over 10,000)? If so, please let us know how your queries might 
typically look so we can identify them in the logs.


(2) Are you the developer of a tool that launches SPARQL queries? If so, 
then please let us know if there is any way to identify your queries.


If (1) or (2) applies to you, then it would be good if you could include 
an identifying comment into your SPARQL queries in the future, to make 
it easier to recognise them. In return, this would enable us to provide 
you with statistics on the usage of your tool [2].


Further feedback is welcome.

Cheers,

Markus


[1] https://meta.wikimedia.org/wiki/Research:Understanding_Wikidata_Queries

[2] Pending permission by the WMF. Like all Wikimedia usage data, the 
query logs are under strict privacy protection, so we will need to get 
clearance before sharing any findings with the public. We hope, however, 
that there won't be any reservations against publishing non-identifying 
information.


--
Prof. Dr. Markus Kroetzsch
Knowledge-Based Systems Group
Faculty of Computer Science
TU Dresden
+49 351 463 38486
https://iccl.inf.tu-dresden.de/web/KBS/en

___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata