Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread David Lowe
Thanks, all!
I ran each query separately and reassembled the results in a spreadsheet,
so I think I got what I was after. I'm brand new to SPARQL, so I'll try to
work through your corrected query.
Thanks again,
d

On Fri, Nov 6, 2015 at 5:40 PM, James Heald wrote:

> Hi David,
>
> I think the issue with your query was with the line
>
> OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}
>
>
> The problem is that if a photographer doesn't have a P27, ?nat isn't bound
> by the previous OPTIONAL line. When the engine then reaches the line above
> with ?nat unbound, it treats it as a directive to start binding labels for
> the *entire database* ... which is why it is just as well that Stas turns
> over an egg timer for each query.  :-)
>
> The way around this is to nest the two OPTIONAL clauses, one inside the
> other:
>
>    OPTIONAL {?photographer wdt:P27 ?nat .
>       OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}
>    }
>
> This should now run fine.  (Provided you remember to remove the old
> OPTIONAL line).
>
> All best,
>
>James.
>
>
>
>
>
> On 06/11/2015 19:38, David Lowe wrote:
>
>> Might this be affecting our searches? The following query times out very
>> quickly on Chrome, and runs forever in Firefox before crashing the whole
>> browser (or is there a problem with my query?)
>>
>> PREFIX wd: 
>> PREFIX wdt: 
>> PREFIX wikibase: 
>> PREFIX p: 
>> PREFIX v: 
>> PREFIX q: 
>> PREFIX rdfs: 
>>
>> SELECT ?photographer ?photographer_label ?nat ?nat_label ?dob ?dod WHERE {
>>    ?photographer wdt:P106 wd:Q33231 .  # occupation (P106): photographer (Q33231)
>>    OPTIONAL {?photographer wdt:P27 ?nat .}   # country of citizenship (P27)
>>    OPTIONAL {?photographer wdt:P569 ?dob .}  # date of birth (P569)
>>    OPTIONAL {?photographer wdt:P570 ?dod .}  # date of death (P570)
>>
>>    OPTIONAL {?photographer rdfs:label ?photographer_label filter (lang(?photographer_label) = "en") .}
>>    OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}
>>    #OPTIONAL {?cob rdfs:label ?cob_label filter (lang(?cob_label) = "en") .}
>>    #OPTIONAL {?state rdfs:label ?state_label filter (lang(?state_label) = "en") .}
>> }
>>
>>
>
> ___
> Wikidata mailing list
> Wikidata@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
___
Wikidata mailing list
Wikidata@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata


Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread James Heald

Hi David,

I think the issue with your query was with the line

OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}


The problem is that if a photographer doesn't have a P27, ?nat isn't
bound by the previous OPTIONAL line. When the engine then reaches the
line above with ?nat unbound, it treats it as a directive to start
binding labels for the *entire database* ... which is why it is just as
well that Stas turns over an egg timer for each query.  :-)


The way around this is to nest the two OPTIONAL clauses, one inside the 
other:


   OPTIONAL {?photographer wdt:P27 ?nat .
      OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}
   }

This should now run fine.  (Provided you remember to remove the old 
OPTIONAL line).
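
To illustrate why the sibling OPTIONAL blows up, here is a minimal Python sketch of the join behaviour on a tiny toy dataset. The names (PHOTOGRAPHERS, CITIZENSHIP, LABELS, flat_optionals, nested_optionals) and the two photographers are assumptions made up for illustration, and each photographer is simplified to at most one P27 value. It is a toy model of the OPTIONAL semantics, not a description of how the query engine is implemented.

```python
# Toy data for illustration only.
PHOTOGRAPHERS = ["alice", "bob"]
CITIZENSHIP = {"alice": "Q30"}  # bob has no P27 statement
LABELS = {"Q30": "United States of America", "Q142": "France", "Q183": "Germany"}


def flat_optionals():
    """Model two sibling OPTIONALs: P27 first, then the ?nat label."""
    rows = []
    for person in PHOTOGRAPHERS:
        nat = CITIZENSHIP.get(person)  # OPTIONAL {?photographer wdt:P27 ?nat .}
        if nat is not None:
            # ?nat is bound, so only that item's label can join.
            rows.append((person, nat, LABELS.get(nat)))
        else:
            # ?nat is unbound, so it is compatible with every label triple.
            matches = [(person, item, label) for item, label in LABELS.items()]
            # OPTIONAL keeps the row even if nothing matches at all.
            rows.extend(matches or [(person, None, None)])
    return rows


def nested_optionals():
    """Model the nested form: the label is only looked up when P27 matched."""
    rows = []
    for person in PHOTOGRAPHERS:
        nat = CITIZENSHIP.get(person)
        label = LABELS.get(nat) if nat is not None else None
        rows.append((person, nat, label))
    return rows


if __name__ == "__main__":
    print(len(flat_optionals()), "rows with sibling OPTIONALs")
    print(len(nested_optionals()), "rows with nested OPTIONALs")
```

With this toy data, bob (who has no P27) picks up one row per label in the sibling version, but only a single row with empty ?nat and ?nat_label in the nested version.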


All best,

   James.




On 06/11/2015 19:38, David Lowe wrote:

Might this be affecting our searches? The following query times out very
quickly on Chrome, and runs forever in Firefox before crashing the whole
browser (or is there a problem with my query?)

PREFIX wd: 
PREFIX wdt: 
PREFIX wikibase: 
PREFIX p: 
PREFIX v: 
PREFIX q: 
PREFIX rdfs: 

SELECT ?photographer ?photographer_label ?nat ?nat_label ?dob ?dod WHERE {
   ?photographer wdt:P106 wd:Q33231 .  # occupation (P106): photographer (Q33231)
   OPTIONAL {?photographer wdt:P27 ?nat .}   # country of citizenship (P27)
   OPTIONAL {?photographer wdt:P569 ?dob .}  # date of birth (P569)
   OPTIONAL {?photographer wdt:P570 ?dod .}  # date of death (P570)

   OPTIONAL {?photographer rdfs:label ?photographer_label filter (lang(?photographer_label) = "en") .}
   OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}
   #OPTIONAL {?cob rdfs:label ?cob_label filter (lang(?cob_label) = "en") .}
   #OPTIONAL {?state rdfs:label ?state_label filter (lang(?state_label) = "en") .}
}






Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread David Lowe
Might this be affecting our searches? The following query times out very
quickly on Chrome, and runs forever in Firefox before crashing the whole
browser (or is there a problem with my query?)

PREFIX wd: 
PREFIX wdt: 
PREFIX wikibase: 
PREFIX p: 
PREFIX v: 
PREFIX q: 
PREFIX rdfs: 

SELECT ?photographer ?photographer_label ?nat ?nat_label ?dob ?dod WHERE {
   ?photographer wdt:P106 wd:Q33231 .  # occupation (P106): photographer (Q33231)
   OPTIONAL {?photographer wdt:P27 ?nat .}   # country of citizenship (P27)
   OPTIONAL {?photographer wdt:P569 ?dob .}  # date of birth (P569)
   OPTIONAL {?photographer wdt:P570 ?dod .}  # date of death (P570)

   OPTIONAL {?photographer rdfs:label ?photographer_label filter (lang(?photographer_label) = "en") .}
   OPTIONAL {?nat rdfs:label ?nat_label filter (lang(?nat_label) = "en") .}
   #OPTIONAL {?cob rdfs:label ?cob_label filter (lang(?cob_label) = "en") .}
   #OPTIONAL {?state rdfs:label ?state_label filter (lang(?state_label) = "en") .}
}

On Fri, Nov 6, 2015 at 1:27 PM, Neil Harris wrote:

> On 06/11/15 18:04, Mikhail Popov wrote:
>
>> Hi! We looked at the logs. 21,740,641 requests are coming from a single IP
>> without a user agent that we can't geolocate because it's in the 10 range.
>>
>> Looking into the actual queries revealed that it's probably a broken bot.
>> Stas said "the query makes no sense and is broken" and that it "looks like
>> somebody trying to download whole DB in very weird way but is doing it all
>> wrong."
>>
>> We are investigating the issue.
>>
>> – *Mikhail Popov* // Data Analyst, Discovery
>>
>>
>>
> Mikhail,
>
> If by "in the 10 range", you mean an IPv4 address of the form 10.x.x.x,
> then it's an RFC1918 address, and more than likely coming from inside your
> own network.
>
> Neil
>
>
>


Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Tom Morris
Tangential question - is there a similar dashboard for WDQ (no S)?  Or
better yet, one that charts both query services so that they can be
compared?

Tom

On Fri, Nov 6, 2015 at 9:27 AM, James Heald wrote:

> Does anyone know what's going on with the Sparql service ?
>
> Up until a couple of days ago, the most hits ever in one day was about
> 6000.
>
> But according to
>  http://searchdata.wmflabs.org/wdqs/
>
> two days ago suddenly there were 6.77 *million* requests, and yesterday
> over 21 million.
>
> Does anyone know what sort of requests these are, and whether they are all
> coming from the same place ?
>
>-- James.
>


Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Neil Harris

On 06/11/15 18:04, Mikhail Popov wrote:

Hi! We looked at the logs. 21,740,641 requests are coming from a single IP
without a user agent that we can't geolocate because it's in the 10 range.

Looking into the actual queries revealed that it's probably a broken bot.
Stas said "the query makes no sense and is broken" and that it "looks like
somebody trying to download whole DB in very weird way but is doing it all
wrong."

We are investigating the issue.

– *Mikhail Popov* // Data Analyst, Discovery




Mikhail,

If by "in the 10 range", you mean an IPv4 address of the form 10.x.x.x, 
then it's an RFC1918 address, and more than likely coming from inside 
your own network.
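
For reference, a minimal Python sketch of that check, using the standard-library ipaddress module. The helper name is_rfc1918 and the sample addresses are made up for illustration; the three blocks are the private IPv4 ranges set aside by RFC 1918.

```python
import ipaddress

# The three private IPv4 blocks defined in RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(addr: str) -> bool:
    """Return True if addr falls inside one of the RFC 1918 blocks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in block for block in RFC1918_BLOCKS)


if __name__ == "__main__":
    # Sample addresses are arbitrary, not taken from the logs.
    for sample in ("10.20.30.40", "8.8.8.8"):
        print(sample, is_rfc1918(sample))
```

An address that passes this check is not routable on the public internet, which is why geolocation has nothing to work with.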


Neil




Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Mikhail Popov
Hi! We looked at the logs. 21,740,641 requests are coming from a single IP
without a user agent that we can't geolocate because it's in the 10 range.

Looking into the actual queries revealed that it's probably a broken bot.
Stas said "the query makes no sense and is broken" and that it "looks like
somebody trying to download whole DB in very weird way but is doing it all
wrong."

We are investigating the issue.

– *Mikhail Popov* // Data Analyst, Discovery


Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Stas Malyshev
Hi!

> Does anyone know what's going on with the Sparql service ?
> 
> Up until a couple of days ago, the most hits ever in one day was about
> 6000.
> 
> But according to
>  http://searchdata.wmflabs.org/wdqs/
> 
> two days ago suddenly there were 6.77 *million* requests, and yesterday
> over 21 million.
> 
> Does anyone know what sort of requests these are, and whether they are
> all coming from the same place ?

Looks like yes, they are coming from the same place, and there seems to
be a bot there doing something wrong. So if anybody knows whose bot it is,
please ask that person to seek advice and guidance (which I would be glad
to provide) on how to make it work properly :)

-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Stas Malyshev
Hi!

> Might this be affecting our searches? The following query times out very
> quickly on Chrome, and runs forever in Firefox before crashing the whole
> browser (or is there a problem with my query?)

The symptoms you describe suggest that this query returns too many results
and the browser runs out of memory. Try the query with LIMIT 10 first and
see what happens.
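
As a rough sketch of what that could look like, the Python snippet below holds the query text from this thread (with the two ?nat OPTIONALs nested, as suggested elsewhere in the thread) and appends a LIMIT clause. The helper with_limit is hypothetical, and the PREFIX lines are left out on the assumption that the endpoint predefines wd:, wdt: and rdfs:; add them back if yours does not.

```python
# Query text from the thread, with the ?nat label OPTIONAL nested inside
# the P27 OPTIONAL. Assumes the endpoint predefines wd:, wdt: and rdfs:.
QUERY = """
SELECT ?photographer ?photographer_label ?nat ?nat_label ?dob ?dod WHERE {
  ?photographer wdt:P106 wd:Q33231 .
  OPTIONAL {
    ?photographer wdt:P27 ?nat .
    OPTIONAL { ?nat rdfs:label ?nat_label . FILTER (lang(?nat_label) = "en") }
  }
  OPTIONAL { ?photographer wdt:P569 ?dob . }
  OPTIONAL { ?photographer wdt:P570 ?dod . }
  OPTIONAL {
    ?photographer rdfs:label ?photographer_label .
    FILTER (lang(?photographer_label) = "en")
  }
}
"""


def with_limit(query: str, n: int) -> str:
    """Hypothetical helper: append a LIMIT clause so at most n rows return."""
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return f"{query.rstrip()}\nLIMIT {n}"


if __name__ == "__main__":
    # Paste the printed text into the query editor to try a small sample.
    print(with_limit(QUERY, 10))
```

If the limited query comes back quickly, the limit can be raised step by step to see where the browser starts to struggle.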

As for the bot activities affecting other users, the effect seems to be
negligible, so if this query is slow, it is slow on its own merits :)


-- 
Stas Malyshev
smalys...@wikimedia.org



Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Kingsley Idehen
On 11/6/15 9:27 AM, James Heald wrote:
> Does anyone know what's going on with the Sparql service ?
>
> Up until a couple of days ago, the most hits ever in one day was about
> 6000.
>
> But according to
>  http://searchdata.wmflabs.org/wdqs/
>
> two days ago suddenly there were 6.77 *million* requests, and
> yesterday over 21 million.
>
> Does anyone know what sort of requests these are, and whether they are
> all coming from the same place ?
>
>-- James. 

Look up #SPARQL on Twitter :)

-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software 
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this





Re: [Wikidata] Use of Sparql service is going through the roof

2015-11-06 Thread Kingsley Idehen
On 11/6/15 1:04 PM, Mikhail Popov wrote:
> Hi! We looked at the logs. 21,740,641 requests are coming from a
> single IP without a user agent that we can't geolocate because it's in
> the 10 range.
>
> Looking into the actual queries revealed that it's probably a broken
> bot. Stas said "the query makes no sense and is broken" and that it
> "looks like somebody trying to download whole DB in very weird way but
> is doing it all wrong."
>
> We are investigating the issue.
>
> – *Mikhail Popov*// Data Analyst, Discovery

That will always happen; folks always want to dump the entire DB.

It takes a while for clarity to arise.

This has been the DBpedia experience for years [1].

[1]
https://docs.google.com/document/d/12VljKl-yDNBoMGb_FnQWiXDAaZC3VnQHqy-E9iD8Mz4/edit
-- DBpedia Usage Report

-- 
Regards,

Kingsley Idehen   
Founder & CEO 
OpenLink Software 
Company Web: http://www.openlinksw.com
Personal Weblog 1: http://kidehen.blogspot.com
Personal Weblog 2: http://www.openlinksw.com/blog/~kidehen
Twitter Profile: https://twitter.com/kidehen
Google+ Profile: https://plus.google.com/+KingsleyIdehen/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Personal WebID: http://kingsley.idehen.net/dataspace/person/kidehen#this


