[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2017-02-22 Thread Smalyshev
Smalyshev added a comment.
POST can be used with any queries, same as GET, though I'm not sure how it would be cached, so I'd recommend using POST only for big queries now, unless you don't have a choice (e.g. the tool only supports POST). I'll update about caching once I know it.

TASK DETAIL
  https://phabricator.wikimedia.org/T112151

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Ricordisamoa, Arbnos, Jonas, chasemp, JanZerebecki, BBlack, Andrew, 
Deskana, Joe, gerritbot, nichtich, Jneubert, Karima, Aklapper, Smalyshev, 
Th3d3v1ls, Ramalepe, Liugev6, EBjune, merbst, Avner, Lewizho99, Maathavan, 
debt, TerraCodes, Gehel, D3r1ck01, FloNight, Xmlizer, Izno, jkroll, 
Wikidata-bugs, Jdouglas, aude, Manybubbles, fbstj, santhosh, Mbch331, Jay8g, 
Krenair

___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2017-02-22 Thread gerritbot
gerritbot added a comment.
Change 243883 merged by Gehel:
Allow SPARQL endpoint to be queried via POST

https://gerrit.wikimedia.org/r/243883


[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2017-02-14 Thread Smalyshev
Smalyshev added a comment.
Java/Blazegraph seems to have a URL length limit of around 8K, so we may have to support real POST on the Blazegraph side anyway.


[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2017-01-13 Thread Jneubert
Jneubert added a comment.
For examples of the federated query use case, see https://github.com/zbw/sparql-queries/tree/master/wikidata#power-queries (queries with econ_pers or ebds in the name). The intermediate set fed into Wikidata was about 450,000 GND IDs. Currently, that requires setting up a custom Wikidata endpoint.


[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-11-19 Thread Smalyshev
Smalyshev added a comment.

> Primarily it's that we can't cache them in Varnish like we should be able to

Caching SPARQL results via Varnish is not a good idea in most cases. For two 
reasons:

1. Usually the same query is not repeated very often (unless somebody has a 
buggy script, etc.). Every client would run its own query and once the result 
is returned, it will go away and another client would use another query.
2. When the query **is** repeated, it is usually because the client expects 
something to change - like getting an update from Wikidata. Since we do not 
have any real way to invalidate Varnish when the underlying data changes, 
caching here would only hinder the clients.

Also, these results can be pretty big - so we'd be storing tons of information 
which is not useful. Of course, there are exceptions and some scenarios would 
still benefit from caching, but most would not.

> but as mentioned before it's going to affect how multi-DC request routing 
> works as well,

There are no plans for a multi-DC setup for WDQS, as far as I know.

> AFAICS there is no general solution to this problem.

We do not need a general solution for the whole HTTP world. We need a 
particular solution for a specific practical issue, and we have everything we 
need to actually solve it - except the willingness to solve it.

> But still, trying to stick to standards here isn't just a matter of theory 
> trumping user requests.

I think in this particular case it is exactly that, as all objections so far, 
when applied to what happens with WDQS, appear to be only theoretical.

All proposed solutions require creating a whole new middleware layer, and I 
don't think anybody is ready to allocate resources to develop and maintain this 
layer.

> If there's a problem with client tooling, we can submit patches to those 
> client projects

Not all of those are open source projects, and they would not spend resources 
on rewriting their tools just because one particular endpoint (ours) does not 
support POST. Also, who exactly would be spending time on learning these tools, 
developing and submitting those patches?




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-11-19 Thread JanZerebecki
JanZerebecki added a comment.

Yes. But: 1) This ticket is only about compatibility with existing clients we 
didn't create and don't control. Those clients are conforming implementations 
of SPARQL 1.1. 2) Options 2 and 3 in 
https://phabricator.wikimedia.org/T112151#1818392 are not defined in the SPARQL 
1.1 protocol. See 
http://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#query-operation . 3) 
That standard explicitly allows POST for read queries. (I'll spare you a long 
rant about W3C standards.)
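
For reference, the SPARQL 1.1 protocol section linked above defines three ways 
to send a query: GET, POST with a URL-encoded form, and POST with the query as 
the unencoded body. A minimal sketch of the three request shapes follows; the 
endpoint URL and the helper names are assumptions made for illustration, and 
the requests are only built, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint URL, used only to build example requests.
ENDPOINT = "https://query.wikidata.org/sparql"
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"


def query_via_get(endpoint: str, query: str) -> Request:
    """Query via GET: the query string travels in the 'query' URL parameter."""
    return Request(f"{endpoint}?{urlencode({'query': query})}", method="GET")


def query_via_urlencoded_post(endpoint: str, query: str) -> Request:
    """Query via POST with URL-encoded parameters in the request body."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def query_via_direct_post(endpoint: str, query: str) -> Request:
    """Query via POST directly: the unencoded query is the request body."""
    return Request(
        endpoint,
        data=query.encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/sparql-query"},
    )
```

All three are read-only query operations under the protocol; only the transport 
differs.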




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-11-19 Thread BBlack
BBlack added a comment.

POST isn't just theoretically imperfect.  Proliferation of POST for what should 
be cacheable, readonly, idempotent queries is a serious long-term problem for 
us.  Primarily it's that we can't cache them in Varnish like we should be able 
to (which absorbs over 90% of all public traffic before things get down to the 
application layers), but as mentioned before it's going to affect how multi-DC 
request routing works as well, the current design of which is built on the idea 
that HTTP methods are used correctly (that we can sticky users to the primary 
writeable DC on POST for updates, and otherwise distribute their load fairly 
when they're doing only GETs and haven't done POST in a while).  Of course we 
already have cases like this though, even in the MediaWiki API.   But still, 
trying to stick to standards here isn't just a matter of theory trumping user 
requests.  It's a matter of saving ourselves future pain when we debug 
performance and availability issues across our entire infrastructure as a 
whole, including this one of many application services.

AFAICS there is no general solution to this problem.  You could call it a 
deficiency in the HTTP protocol itself.  On the one hand, we have strong 
reasons to prefer that methods are used correctly (i.e. GET used for readonly 
idempotent operations, POST used for non-idempotent operations that modify 
server-side data).  On the other hand, we face a serious restriction in the 
amount of data that can be passed in a GET query which can only be 
realistically worked around with long POST body data.

There's no right answer, but I think answers which make use of GET are the 
better answers here (because they preserve the meaning of the methods).  Most 
of the possible workarounds with GET basically boil down to:

1. Find a way to keep the query strings under 2K (which should be reasonable 
for a whole lot of data models!).  For this you can encode queries better (e.g. 
use short key names instead of long textual ones, etc), you can compress the 
query strings prior to encoding, etc.  Often it's simply a matter of bad data 
models, or very inefficient encoding of said data model into the URI.  For 
cases where the basic entropy of the possible query strings is always going to 
exceed 2K even after efficient coding and compression (which is probably always 
going to be the case for an open-ended generic query language like SPARQL?), 
you can look at the other two options:

2. Making it a two-phase operation: a POST of a large query to "save" the query 
itself for that user/session under some label/index, and then one or more GET 
operations after that which make use of the saved query and actually execute it 
for results.  Persisting these (as opposed to just nailing a POST in front of 
every GET) would be ideal, as is letting users share/reuse common queries where 
it makes sense, etc.

3. Using headers.  Header limits are generally higher than URI limits, and 
usually the limit is tunable server-side, so you can for instance put the bulk 
of the data in a request header like `X-SPARQL: ` (and then still do things 
from (1) to keep the size in check).

The idea of compressing away comments and indents isn't "bad practice" - it 
would in fact be very good practice in this case.  HTTP URIs are not text 
editors or source-code-storage, they're essentially just the wire protocol for 
transmission of a compressible idea at this layer.  If there's a problem with 
client tooling, we can submit patches to those client projects and/or nudge 
them in the direction of these arguments.
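
Option 1 above (shrinking the query string before it goes into the URI) can be 
measured with a small sketch like the one below. It is only an illustration: 
the minifier is deliberately naive, and the compressed token format is a 
hypothetical encoding that no endpoint in this thread is known to accept, so 
the code just reports how many characters each form would need.

```python
import base64
import re
import zlib
from urllib.parse import quote

URL_BUDGET = 2000  # the rough "2K" figure used in this thread


def strip_comments_and_indent(query: str) -> str:
    """Naive minifier: drop full-line '#' comments and collapse whitespace.

    It does not parse SPARQL, so it is only safe for queries whose comments
    sit on their own lines and whose string literals contain no runs of
    whitespace.
    """
    lines = [ln for ln in query.splitlines() if not ln.lstrip().startswith("#")]
    return re.sub(r"\s+", " ", " ".join(lines)).strip()


def compressed_token(query: str) -> str:
    """Deflate the query and encode it with the URL-safe base64 alphabet."""
    packed = zlib.compress(query.encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).decode("ascii")


def decode_token(token: str) -> str:
    """Inverse of compressed_token(), as a server would have to implement it."""
    return zlib.decompress(base64.urlsafe_b64decode(token)).decode("utf-8")


def sizes(query: str) -> dict:
    """Character counts of the percent-encoded raw, minified and packed forms."""
    minified = strip_comments_and_indent(query)
    return {
        "raw": len(quote(query, safe="")),
        "minified": len(quote(minified, safe="")),
        "compressed": len(compressed_token(minified)),
    }
```

Comparing the values returned by `sizes()` against `URL_BUDGET` shows whether a 
given query fits under the limit; for short queries the compressed form can be 
longer than the minified one because of the fixed overhead of the encoding.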




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-11-04 Thread Jneubert
Jneubert added a comment.

This is not only a question of the tools in use; it also imposes length 
restrictions on queries, which are transmitted in the GET URL.

To cite the Stack Overflow entry mentioned above:

  "Extremely long URLs are usually a mistake. URLs over 2,000 characters will 
  not work in the most popular web browsers. Don't use them if you intend your 
  site to work for the majority of Internet users."

Complex queries hit this limit quite easily (for examples from another 
application domain, see 
https://github.com/jneubert/skos-history/tree/master/sparql/stw). And we won't 
enforce bad practices like dropping comments and indents, using one-character 
variable names and the like.

Another use case is queries with long VALUES lists as input (e.g. of names or 
codes from another application). Such clauses can also easily exceed any 
reasonable GET URL length.
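
To make the VALUES case concrete, here is a minimal sketch that builds such a 
query and checks how long the resulting GET URL would be. The endpoint URL, 
the property used in the query, the generated IDs and the 2,000-character 
threshold are all placeholders chosen for illustration.

```python
from urllib.parse import urlencode

ENDPOINT = "https://query.wikidata.org/sparql"  # assumed endpoint URL
GET_LIMIT = 2000  # threshold taken from the quote above; real limits vary


def values_query(ids):
    """Build a query whose input is a VALUES list of external identifiers."""
    values = " ".join(f'"{i}"' for i in ids)
    # wdt:P227 stands in for whichever property links to the external IDs.
    return f"SELECT ?item WHERE {{ VALUES ?id {{ {values} }} ?item wdt:P227 ?id }}"


def get_url_length(endpoint, query):
    """Length of the full GET URL after percent-encoding the query."""
    return len(f"{endpoint}?{urlencode({'query': query})}")


def pick_method(endpoint, query, limit=GET_LIMIT):
    """Prefer GET, fall back to POST once the URL would exceed the limit."""
    return "GET" if get_url_length(endpoint, query) <= limit else "POST"


small = values_query([f"id{n:04d}" for n in range(5)])
large = values_query([f"id{n:04d}" for n in range(1000)])
print(pick_method(ENDPOINT, small), get_url_length(ENDPOINT, small))
print(pick_method(ENDPOINT, large), get_url_length(ENDPOINT, large))
```

A list of a thousand short identifiers is already several times over the 
threshold, before any comments or indentation are counted.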




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-11-03 Thread Smalyshev
Smalyshev added a comment.

It is not a question of resources. No resources are needed for it; it's already 
done and the only thing missing is permission from Ops to actually run it. It 
costs us literally nothing but the willingness to do it.

Now we have a choice: we can serve our users, who asked us for it, however 
theoretically imperfect using POST for queries may be, or we can reject them, 
telling them their tools do not conform to our high theoretical standards, so 
they cannot get access to our data and can do nothing about it (telling them 
to fix those tools is akin to telling people to fix Windows XP - it's just not 
going to happen, they are users, not creators of those tools). I do not see 
anybody who benefits from such a rejection, but since it's not my decision, I 
don't see what else I can do here. The patch is there; if there is ever a 
decision to use it, it'll still be there. Until then, access to our data will 
not be possible for those users.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-11-03 Thread chasemp
chasemp added a comment.

In https://phabricator.wikimedia.org/T112151#1743222, @Smalyshev wrote:

> > I am looking for a use case we are trying to solve.
>
>
> The use case is users using a SPARQL query tool that only does POST. 
> Unfortunately, such tools exist, and in non-negligible numbers.
>
> > At first glance, it seems like this means mangling all POST to GET would 
> > result in an incompatibility with the sparql protocol.
>
>
> @chasemp I think you're missing an important point here - lack of SPARQL 
> Update support on the query endpoint is **intentional**. That's the whole 
> point of the exercise - how to allow sending POST without allowing SPARQL 
> Update. We do not want to be compatible with SPARQL Update in the query 
> endpoint - in fact, we want to explicitly forbid it, since we don't want the 
> whole internet to mess with our database (there's the wikidata.org site for 
> that ;)). That's why we do not want to allow anybody from outside to POST to 
> Blazegraph. However, we do want to allow tools that use POST to do SPARQL 
> Query to do so. Unfortunately, distinguishing SPARQL Query from SPARQL Update 
> and other update requests may be non-trivial to do from something like nginx, 
> so it is much safer and easier to just never send POST to Blazegraph, thus 
> ensuring we never produce an update.
>
> We could, of course, take a stance that since REST dictates retrieval queries 
> should go through GET, we only support GET. However, this stance sounds to me 
> unpractical and not user-friendly, as most users do not care about the purity 
> of our REST track record (in fact, most of them have only the vaguest idea of 
> REST and their requirements about POST/GET) and just want their SPARQL tool 
> (which they probably use with a dozen of other SPARQL endpoints by now) to 
> work with WDQS endpoint.


Hey @Smalyshev sorry you were ready to roll on this and it got locked up in 
debate.  I appreciate what you are saying here I really do.  In large part it 
comes down to the idea of what is practical as you said.  We have opposite 
ideas in this case.  From my POV here we are trying to hide the complexity of 
poorly written clients on our end and we have few resources to do such things 
over the long term.  We do make many decisions about how far down this gradient 
to travel in the pursuit of sane user expectations.  SNI is a good example: we 
are using it though we know a not insignificant number of users are stuck in 
XP land.  I'm not saying this is a 1:1 example.

I do think we are missing each other on the 'use case' bit here.  I believe I 
understand this:

> The use case is users using a SPARQL query tool that only does POST. 
> Unfortunately, such tools exist, and in non-negligible numbers.


From this angle is every poorly written tool a use case?  We don't seem to have 
a backlog of users who are held up trying to use POST to make queries, and 
without specific examples to push this narrative along I am viewing this as 
premature complexity.  We know poorly written tools exist for everything.  We 
generally don't try to head them off at the pass.

I did look through a bunch of sparql tools to get an idea and there was at 
least one default POSTer that I found, but it can be changed.  The majority of 
tools I see do the sane thing:

https://github.com/BorderCloud/SPARQL/blob/master/Curl.php#L244
https://github.com/RDFLib/sparqlwrapper/blob/master/SPARQLWrapper/Wrapper.py#L503

Not that this is a slam dunk argument either way.   Maybe we revisit in the 
near future but at the moment I can't see my way to this making sense to 
support.  If you feel strongly about this bring it up in Scrum of Scrums to get 
more visibility?




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread Smalyshev
Smalyshev added a comment.

> I am looking for a use case we are trying to solve.


The use case is users using a SPARQL query tool that only does POST. 
Unfortunately, such tools exist, and in non-negligible numbers.

> At first glance, it seems like this means mangling all POST to GET would 
> result in an incompatibility with the sparql protocol.


@chasemp I think you're missing an important point here - lack of SPARQL Update 
support on the query endpoint is **intentional**. That's the whole point of the 
exercise - how to allow sending POST without allowing SPARQL Update. We do not 
want to be compatible with SPARQL Update in the query endpoint - in fact, we 
want to explicitly forbid it, since we don't want the whole internet to mess 
with our database (there's the wikidata.org site for that ;)). That's why we do 
not want to allow anybody from outside to POST to Blazegraph. However, we do 
want to allow tools that use POST to do SPARQL Query to do so. Unfortunately, 
distinguishing SPARQL Query from SPARQL Update and other update requests may be 
non-trivial to do from something like nginx, so it is much safer and easier to 
just never send POST to Blazegraph, thus ensuring we never produce an update.
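
The idea of never forwarding a POST can be sketched in a few lines. This is 
not the configuration from the patch under discussion (its contents are not 
shown in this thread); it is a hypothetical Python illustration of a front 
layer that accepts a SPARQL query sent via POST and produces the GET URL that 
would be sent to the backend instead. The function name, the exception and the 
backend URL in the usage note are assumptions.

```python
from urllib.parse import parse_qs, urlencode


class NotAQuery(ValueError):
    """Raised when a POST does not carry a plain SPARQL query."""


def post_to_get_url(endpoint: str, content_type: str, body: bytes) -> str:
    """Turn a SPARQL-protocol POST into the equivalent GET URL.

    Only the 'query' parameter is carried over; anything that looks like an
    update, or that uses another content type, is refused instead of being
    forwarded.
    """
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/sparql-query":
        # Direct POST: the body is the unencoded query text.
        query = body.decode("utf-8")
    elif media_type == "application/x-www-form-urlencoded":
        params = parse_qs(body.decode("utf-8"))
        if "update" in params or "query" not in params:
            raise NotAQuery("only read queries are forwarded")
        query = params["query"][0]
    else:
        raise NotAQuery(f"unsupported content type: {media_type}")
    return f"{endpoint}?{urlencode({'query': query})}"
```

For example, `post_to_get_url("http://localhost:9999/sparql", 
"application/x-www-form-urlencoded", b"query=ASK+%7B%7D")` yields a GET URL 
with the same query, so the backend only ever sees GET requests. Note that the 
rewritten URL is still subject to the backend's own URL length limit.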

We could, of course, take a stance that since REST dictates retrieval queries 
should go through GET, we only support GET. However, this stance sounds to me 
unpractical and not user-friendly, as most users do not care about the purity 
of our REST track record (in fact, most of them have only the vaguest idea of 
REST and their requirements about POST/GET) and just want their SPARQL tool 
(which they probably use with a dozen of other SPARQL endpoints by now) to work 
with WDQS endpoint.

Answering @BBlack's question, we do not know why these tools only use POST, but 
most importantly, it's completely beside the point **why** they do it - 
whatever the reasons are, they do it, and we're not going to change that. So we 
can either support them or refuse to support them. I do not see any practical 
reason to refuse support given that we can do it.

As for cookies and multi-DC setup, many non-browser clients would just ignore 
cookies completely anyway. For browser-based clients, I don't think that in the 
foreseeable future we'd get traffic numbers that would make it a problem. 
The traffic numbers now are low, and if they rise most of the traffic would be 
bots, tools and widgets feeding data from SPARQL, not browser traffic. If it 
ever does become a problem, it should be trivial to make an exception for 
requests coming to query.wikidata.org - they are easily identifiable. Also, we 
will never have all or a significant part of the overall traffic produced by 
these tools - these tools are user-friendly frontends, but the bulk of the work 
will be done by automated tools, just like a typical SQL database would serve 
most traffic via programmatic connections, not via a Web interface.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread JanZerebecki
JanZerebecki added a comment.

The first sentence in the task is the use case I looked at while coming up with 
my suggestion. To explain why these tools would use POST I came up with the 
Request-URI length. But another workable explanation is that the tool just 
doesn't bother to differentiate between write (update) and read-only query and 
thus to support both it always uses POST, which is a superset of what works 
with GET.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread chasemp
chasemp added a comment.

In https://phabricator.wikimedia.org/T112151#1742627, @JanZerebecki wrote:

> In https://phabricator.wikimedia.org/T112151#1742574, @chasemp wrote:
>
> > At first glance, it seems like this means mangling all POST to GET would 
> > result in an incompatibility with the sparql protocol.
>
>
> I didn't suggest that. I suggested to apply GET semantics to POST requests 
> that indicate this via a part in the query of their Request-URI.


Most of the discussion in this ticket is surrounding 
https://gerrit.wikimedia.org/r/#/c/243883/

> > If we are saying we don't want to leave clients (which follow a somewhat 
> > odd spec to my mind) in the dust I don't think we are accomplishing that, 
> > and as Brandon outlined POST in the Wikimedia multi-dc world has specific 
> > consequences.
>
>
> Would you describe how my suggestion doesn't accomplish that?


I wasn't meaning to respond to the outline you provided necessarily, but is 
anyone being limited by this now?  Are users submitting queries from IE at the 
moment that are limited in this way?

I am looking for a use case we are trying to solve.  The sort of 'we are trying 
to do X now and can't because of this' that we can revolve the discussion 
around.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread JanZerebecki
JanZerebecki added a comment.

In https://phabricator.wikimedia.org/T112151#1742574, @chasemp wrote:

> At first glance, it seems like this means mangling all POST to GET would 
> result in an incompatibility with the sparql protocol.


I didn't suggest that. I suggested to apply GET semantics to POST requests that 
indicate this via a part in the query of their Request-URI.

> If we are saying we don't want to leave clients (which follow a somewhat odd 
> spec to my mind) in the dust I don't think we are accomplishing that, and as 
> Brandon outlined POST in the Wikimedia multi-dc world has specific 
> consequences.


Would you describe how my suggestion doesn't accomplish that?




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread chasemp
chasemp added a subscriber: chasemp.
chasemp added a comment.

Yesterday in IRC it seemed like there was consensus we don't have any actual 
(quantifiable) concerns regarding query length at the moment.

  SMalyshev: chasemp: I'm not worried too much about query length

The 2008 guidelines seem explicit that GET is the mechanism of choice up to the 
point that length for GET isn't viable.  The 2013 guidelines still indicate the 
use of POST for queries but also outline that it isn't the only operation for 
which POST is necessary:

  This protocol is a companion to the use of both SPARQL Update and SPARQL 
  Query over the SPARQL protocol via HTTP POST. Both protocols specify 
  different operations performed via the HTTP POST method.

At first glance, it seems like this means mangling all POST to GET would result 
in an incompatibility with the sparql protocol.

If we are saying we don't want to leave clients (which follow a somewhat odd 
spec to my mind) in the dust I don't think we are accomplishing that, and as 
Brandon outlined POST in the Wikimedia multi-dc world has specific consequences.

Besides knowing that at some point length is a factor, do we have a 
demonstrable need for this functionality (which will be very nuanced in 
implementation)?




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread JanZerebecki
JanZerebecki added a comment.

Short: Because IE is broken.
Long: The problem here is that while 
https://tools.ietf.org/html/rfc2616#section-3.2.1 says "The HTTP protocol does 
not place any a priori limit on the length of a URI", reality doesn't conform. 
On the server side our caching layer is AFAIK even in violation of "SHOULD be 
able to handle URIs of unbounded length if they provide GET-based forms that 
could generate such URIs". It probably caps at around 16k. In reality we can 
probably live with that as the limit on queries for the public 
query.wikidata.org. But while we might manage to fix all the SPARQL tools, we 
can't fix all the browsers. Which leaves us with: "Servers ought to be cautious 
about depending on URI lengths above 255 bytes".

To best serve the listed requirements we probably need to have generic handling 
for this in the layer that differentiates between POST and GET in the described 
multi data center way. It would e.g. know by a certain query argument in the 
Request-URI that this POST request has GET semantics with the actual resource 
specified by the Request-URI together with the request body. Should we then 
transform the request to a GET request or handle this special case all the way 
in the stack? Any better ideas?
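
A minimal sketch of that check, assuming a marker argument in the Request-URI: 
the parameter name `semantics` and its value `get` are hypothetical, since the 
proposal above does not name one.

```python
from urllib.parse import parse_qs, urlsplit

MARKER = "semantics"  # hypothetical query argument; no name is specified above


def effective_method(method: str, request_uri: str) -> str:
    """Return the method whose semantics the routing layer should apply.

    A POST whose Request-URI carries the marker argument is treated as a
    read-only request (GET semantics); every other request keeps its method.
    """
    method = method.upper()
    if method != "POST":
        return method
    args = parse_qs(urlsplit(request_uri).query)
    if args.get(MARKER, [""])[0].lower() == "get":
        return "GET"
    return "POST"
```

Whether the request is then literally rewritten to GET or only routed as one is 
the open question raised above; this sketch covers only the classification 
step.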




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-21 Thread BBlack
BBlack added a comment.

In https://phabricator.wikimedia.org/T112151#1739918, @Smalyshev wrote:

> > As Andrew said above, why not support this directly in WDQS if you have to 
> > support it at all?
>
>
> Because in Blazegraph, allowing POST means allowing write requests. This is 
> not good for security.


So even in blazegraph, GET and POST have standard semantic meanings... why is 
it that clients don't honor this?

> > even though most (all?) SPARQL traffic is readonly and probably can be 
> > cached
>
>
> I don't think it is a good idea to cache SPARQL queries. They are big, they 
> are rarely repeated as-is, and if they are repeated this is usually because 
> the client expects new result. At least until we have some setup that 
> repeatedly requests same data over SPARQL, I don't see much point in caching 
> SPARQL responses.


It would be wiser to give them some (perhaps minimal) cacheability via 
Cache-Control, if nothing else as a buffer against simplistic DoS attacks that 
spam the same query at high rates...

> > This also conflicts with our overall strategy for multi-datacenter work,
>
>
> Given that there is no multi-datacenter setup for wdqs, I'm not sure how it 
> is relevant. Most SPARQL clients for which it is relevant wouldn't probably 
> have cookie storage mechanisms anyway.


Eventually there will be multi-datacenter for **everything**, so yes, that 
includes wdqs, and it is very relevant.  Every service will have to account 
for it in the long run in how it manages state.  One of our mechanisms for 
balancing traffic while avoiding state replication lag works like this: a 
normal GET request (no special cookie) is balanced between the DCs based on 
geography and load, but POST requests are always directed to the primary DC 
only.  Additionally, once a POST request is seen, it sets a short-duration 
session cookie that maps all of that client's GET requests to the primary DC 
as well (so that they don't suffer lag effects when reading the results of 
their own modifications).  If a read-(only|mostly) service uses POST for all 
of its traffic due to client deficiency, all of that traffic will be stuck on 
the primary DC and will not benefit from read load balancing.
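
The routing rule described above could be sketched like this. It is a 
simplified model rather than the actual traffic-layer configuration: the 
cookie name, the 10-second duration and the `Request`/`route` names are 
assumptions, and the cookie's expiry is stored as its value purely to keep the 
example self-contained.

```python
from dataclasses import dataclass, field

PRIMARY_DC = "primary"
STICKY_COOKIE = "sticky_primary"  # hypothetical cookie name
STICKY_TTL_SECONDS = 10           # hypothetical duration


@dataclass
class Request:
    method: str
    cookies: dict = field(default_factory=dict)


def route(request, nearest_dc, now):
    """Return (datacenter, cookies_to_set) for a single request."""
    if request.method == "POST":
        # Potential write: always the primary DC, and pin the client's
        # following reads there for a short while.
        return PRIMARY_DC, {STICKY_COOKIE: now + STICKY_TTL_SECONDS}
    expiry = request.cookies.get(STICKY_COOKIE)
    if expiry is not None and expiry > now:
        # Recent POST from this client: keep its reads on the primary DC.
        return PRIMARY_DC, {}
    # Plain read: balance by geography and load.
    return nearest_dc, {}
```

Under this model a client that sends every query as POST never reaches the 
last branch, which is the load-balancing loss the paragraph above warns about.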




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-20 Thread Smalyshev
Smalyshev added a comment.

> As Andrew said above, why not support this directly in WDQS if you have to 
> support it at all?


Because in Blazegraph, allowing POST means allowing write requests. This is not 
good for security.

> let the POSTs come through the rest of the stack unmolested, and deal with it 
> inside WDQS


This would mean building some middleware layer to filter read requests from 
write requests, which would create both a lot of unnecessary work and a 
security issue if the filtering is not perfect. Not allowing POST on the 
Blazegraph side automatically cuts off all modification requests, since the GET 
processing engine in Blazegraph only supports read queries. POST processing 
supports both, so we'd have to work much harder to achieve the same level of 
security.
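
As a rough illustration of the alternative discussed in this thread 
(converting the POST to a GET before it reaches Blazegraph), the sketch below 
rewrites a form-encoded POST body into a GET URL. It is not the change under 
review; the function name and the example endpoint are assumptions. The 
`query` and `update` parameter names come from the SPARQL 1.1 Protocol.

```python
from urllib.parse import parse_qs, urlencode


def post_body_to_get_url(endpoint, body):
    """Turn a form-encoded SPARQL POST body into an equivalent GET URL.

    Raises ValueError for anything that is not a single plain query, so
    update requests are rejected before they reach the backend.
    """
    params = parse_qs(body, keep_blank_values=True)
    if "update" in params:
        raise ValueError("SPARQL updates are not accepted")
    queries = params.get("query", [])
    if len(queries) != 1:
        raise ValueError("expected exactly one 'query' parameter")
    # Only the query itself is forwarded; any other parameter is dropped.
    return endpoint + "?" + urlencode({"query": queries[0]})
```

For example, `post_body_to_get_url("https://example.org/sparql", body)` with a 
body produced by `urlencode({"query": "ASK { ?s ?p ?o }"})` yields a URL that 
the read-only GET path can serve. The query text itself is never inspected, 
because the backend's GET handling already refuses modifications.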

> even though most (all?) SPARQL traffic is readonly and probably can be cached


I don't think it is a good idea to cache SPARQL queries. They are big, they are 
rarely repeated as-is, and if they are repeated this is usually because the 
client expects a new result. At least until we have some setup that repeatedly 
requests the same data over SPARQL, I don't see much point in caching SPARQL 
responses.

> This also conflicts with our overall strategy for multi-datacenter work,


Given that there is no multi-datacenter setup for wdqs, I'm not sure how it is 
relevant. Most SPARQL clients for which it is relevant probably wouldn't have 
cookie storage mechanisms anyway.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-20 Thread BBlack
BBlack added a subscriber: BBlack.
BBlack added a comment.

I'm not a fan of this on a few levels:

1. As Andrew said above, why not support this directly in WDQS if you have to 
support it at all? That is, let the POSTs come through the rest of the stack 
unmolested, and deal with it inside WDQS, instead of a dubious POST-to-GET 
conversion in the midst of our inbound traffic stack.
2. We don't want more app-specific hacks in nginx|varnish|apache config than 
necessary, and this seems unnecessary.
3. Regardless of which layer handles the issue: It breaks the semantics of GET 
as readonly and POST as a potential write-op.  POST responses are, as a rule, 
not cacheable, even though most (all?) SPARQL traffic is readonly and probably 
can be cached with at least some minimal TTL.  This also conflicts with our 
overall strategy for multi-datacenter work, which involves sticky-routing via 
cookies to the primary DC at the applayer for POST requests, but allowing 
normal balancing of GET/read requests.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-20 Thread Andrew
Andrew added a subscriber: Andrew.
Andrew added a comment.

WDQS is an internal tool, so if its API doesn't conform to clients' 
expectations then this is a bug in WDQS, yes?  Would it not be better to fix 
that rather than adding mysterious middleware obfuscation?




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-08 Thread Smalyshev
Smalyshev added a comment.

I've tested it in labs and it seems OK, would like somebody from ops to review 
if it's OK for production.




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-07 Thread Karima
Karima added a comment.

You can test your endpoint with this tool (choose View->Show endpoint bar and 
enter the URL of your endpoint).

http://openuplabs.tso.co.uk/demos/sparqleditor

Try to respect the standard. (There is again a problem with the content 
negotiation.)
http://www.w3.org/TR/sparql11-protocol/#query-operation




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-10-05 Thread gerritbot
gerritbot added a subscriber: gerritbot.
gerritbot added a comment.

Change 243883 had a related patch set uploaded (by Smalyshev):
Allow SPARQL endpoint to be queried via POST

https://gerrit.wikimedia.org/r/243883




[Wikidata-bugs] [Maniphest] [Commented On] T112151: Support POST for SPARQL query endpoint

2015-09-23 Thread Jneubert
Jneubert added a subscriber: Jneubert.
Jneubert added a comment.

Another argument for enabling POST requests is that GET requests / URLs are 
limited in length. The exact limit varies by browser (see 
http://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers),
but often seems to be about 2000 characters. I have hit this limit in the past 
with several real queries.
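
A client could work around the length limit roughly as sketched below, 
assuming the endpoint accepts both methods: use GET while the full URL stays 
short and fall back to a form-encoded POST otherwise. The 2000-character 
threshold is only the rough figure mentioned above, and `build_request` and 
its return shape are hypothetical names for this example.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # rough figure from the comment above, not a hard limit


def build_request(endpoint, sparql_query, max_url_length=MAX_URL_LENGTH):
    """Return (method, url, body, headers) for a SPARQL query."""
    encoded = urlencode({"query": sparql_query})
    url = endpoint + "?" + encoded
    headers = {"Accept": "application/sparql-results+json"}
    if len(url) <= max_url_length:
        # Short enough: a plain GET keeps the request cacheable.
        return "GET", url, None, headers
    # Too long for a URL: send the same parameters in the request body.
    headers["Content-Type"] = "application/x-www-form-urlencoded"
    return "POST", endpoint, encoded.encode("ascii"), headers
```

Preferring GET whenever the query fits matches the advice elsewhere in this 
thread to reserve POST for big queries.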

