[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2018-04-30 Thread Smalyshev
Smalyshev added a comment.
New cluster is up and serving traffic.

TASK DETAIL
https://phabricator.wikimedia.org/T178492

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Smalyshev
Cc: Daniel_Mietchen, BBlack, Lydia_Pintscher, Volans, mobrovac, Jonas, Aklapper, debt, Smalyshev, Gehel, Lahi, PDrouin-WMF, Gq86, E1presidente, Ramsey-WMF, Cparle, Darkminds3113, SandraF_WMF, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Avner, FloNight, Xmlizer, Susannaanas, Eevans, Aschroet, Jane023, jkroll, Hardikj, Wikidata-bugs, Jdouglas, PKM, Base, matthiasmullie, aude, Tobias1984, Manybubbles, Ricordisamoa, Fabrice_Florin, Raymond, Steinsplitter, Mbch331
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-13 Thread Gehel
Gehel added a comment.

In T178492#3754832, @Lydia_Pintscher wrote:
I believe people will also want to run arbitrary queries. However I guess the majority coming from users who are searching files on the Commons website will be fairly controlled.


So we want users to be able to run arbitrary searches on Commons, but not arbitrary SPARQL queries, right? (At least in the context of the structured data on Commons project; we of course still want users to be able to run arbitrary SPARQL queries on the current query.wikidata.org.) Feel free to ping me for a chat (IRC, Hangouts, other, ...) to clarify this further if you think that's needed.


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-13 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.
I believe people will also want to run arbitrary queries. However I guess the majority coming from users who are searching files on the Commons website will be fairly controlled.


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-13 Thread Gehel
Gehel added a comment.

In T178492#3754772, @Lydia_Pintscher wrote:

In T178492#3725811, @Gehel wrote:

I have heard WDQS mentioned in the context of structured data on commons, but I have no idea how it is going to be used. If anyone has a pointer to some documentation / discussion, please let me know.



We will have a lot of structured data on Commons in the same way/format as it currently exists on Wikidata. We will want to provide people with the ability to search/query for media files on Commons based on the statements on these files. People will also want to combine the data on Commons and Wikidata in queries. To make this all work we'll want to load that data into WDQS as well (or set up a separate query service but iirc Stas said no and I can see the benefits of not doing it.) I am not sure we are much farther in planning than this.


As I understand it, this will allow users to run queries in a fairly controlled way, not to run arbitrary SPARQL queries. This means that we can probably ensure fairly easily that those queries have a reasonable cost and don't endanger the "controlled" WDQS cluster. This pattern is similar to what we do with any MySQL-backed service: we don't allow random SQL, but we take input from users and inject it into known queries.
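
A minimal sketch of that "known query plus validated input" pattern, in Python. Everything named here is an assumption made for illustration and is not taken from this task: the template, the use of the `wdt:P180` predicate, the `build_query` helper and the limits are all hypothetical. The point is only that user input is validated against a strict pattern before it is placed into a fixed query, so the shape (and roughly the cost) of the query stays under the service's control.

```python
import re

# Hypothetical fixed query: the only variable parts are one item ID and a row limit.
QUERY_TEMPLATE = "SELECT ?file WHERE {{ ?file wdt:P180 wd:{item} . }} LIMIT {limit}"

# Item IDs must look like Q42; anything else is rejected outright.
ITEM_ID = re.compile(r"Q[1-9][0-9]*")

# Illustrative cap on result size so a single request stays cheap.
MAX_LIMIT = 100


def build_query(item: str, limit: int = 50) -> str:
    """Return the fixed query with validated user input filled in."""
    if not ITEM_ID.fullmatch(item):
        raise ValueError(f"not a valid item ID: {item!r}")
    if isinstance(limit, bool) or not isinstance(limit, int):
        raise ValueError("limit must be an integer")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    return QUERY_TEMPLATE.format(item=item, limit=limit)
```

With this shape, a caller can only choose which item to look up and how many rows to fetch; input that tries to smuggle in extra SPARQL fails validation and never reaches the query service.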

I'm not too worried about the additional data at this point (we have more headroom on storage space than we have on computational resources).


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-13 Thread Lydia_Pintscher
Lydia_Pintscher added a comment.

In T178492#3725811, @Gehel wrote:
Random thoughts gathered in multiple discussions (note that this is very much brainstorming; nothing written below is an actual decision on a path to follow):


I have heard WDQS mentioned in the context of structured data on commons, but I have no idea how it is going to be used. If anyone has a pointer to some documentation / discussion, please let me know.



We will have a lot of structured data on Commons in the same way/format as it currently exists on Wikidata. We will want to provide people with the ability to search/query for media files on Commons based on the statements on these files. People will also want to combine the data on Commons and Wikidata in queries. To make this all work we'll want to load that data into WDQS as well (or set up a separate query service but iirc Stas said no and I can see the benefits of not doing it.) I am not sure we are much farther in planning than this.


Since we have 2 active / active WDQS clusters (eqiad / codfw), we could use one of them to serve internal traffic and one as external endpoint. This defeats the purpose of having a backup datacenter, so that's not a long term solution.
The hard-to-support use case is the external endpoint, since it is subject to uncontrolled load. Do we want to continue supporting this use case? (I very much think we should continue to support it, but it isn't my place to make that decision; that question should be asked, and answered.)


Yes that is definitely a use-case we want to continue to support.


Since each WDQS node is independent and does its own updates by querying wikidata, increasing the number of nodes increases the load on wikidata. Is that an issue? Is there a way to better share the resources required for updates?




[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-06 Thread Smalyshev
Smalyshev added a comment.
I think for now limiting it by IP should probably work? I think IP ranges from production hosts, labs and outside are segregated?
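
A minimal sketch of what limiting by IP range could look like, using Python's standard `ipaddress` module. The ranges below are placeholders (RFC 1918 private blocks), not the actual production or labs ranges, and `is_internal` is a hypothetical helper name; in practice this kind of check would more likely live in a load balancer or firewall rule than in application code.

```python
import ipaddress

# Placeholder ranges standing in for "production hosts"; the real ranges
# are not given in this thread and would have to be filled in.
INTERNAL_RANGES = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def is_internal(addr: str) -> bool:
    """Return True if addr falls inside one of the allowed internal ranges."""
    ip = ipaddress.ip_address(addr)  # raises ValueError on malformed input
    return any(ip in net for net in INTERNAL_RANGES)
```

A request handler for the "controlled" cluster could then reject any client for which `is_internal` returns False, provided the address it sees is the real client address and not one taken from a header the client can set.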


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-06 Thread Gehel
Gehel added a comment.
It might make sense to add whitelisting / authentication / authorization on the "controlled" WDQS cluster to ensure the usage is under control. This might or might not be required. If we go that way, we might want to treat this as a more general problem and address it in a more general way.


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-01 Thread Gehel
Gehel added a comment.

In T178492#3726074, @mobrovac wrote:
Perhaps a viable solution for the short- to mid-term is to have two LVS endpoints in each DC: one that serves external traffic, and another one internal, much like we have api.svc.{site}.wmnet and api-async.svc.{site}.wmnet for the MW API, respectively.


But if those 2 LVS endpoints are served by the same cluster, we've not achieved anything in terms of isolation...


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-01 Thread mobrovac
mobrovac added a comment.

In T178492#3725811, @Gehel wrote:

Since we have 2 active / active WDQS clusters (eqiad / codfw), we could use one of them to serve internal traffic and one as external endpoint. This defeats the purpose of having a backup datacenter, so that's not a long term solution.



Perhaps a viable solution for the short- to mid-term is to have two LVS endpoints in each DC: one that serves external traffic, and another one internal, much like we have api.svc.{site}.wmnet and api-async.svc.{site}.wmnet for the MW API, respectively.


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-11-01 Thread Gehel
Gehel added a comment.
Random thoughts gathered in multiple discussions (note that this is very much brainstorming; nothing written below is an actual decision on a path to follow):


I have heard WDQS mentioned in the context of structured data on commons, but I have no idea how it is going to be used. If anyone has a pointer to some documentation / discussion, please let me know.
Since we have 2 active / active WDQS clusters (eqiad / codfw), we could use one of them to serve internal traffic and one as external endpoint. This defeats the purpose of having a backup datacenter, so that's not a long term solution.
The hard-to-support use case is the external endpoint, since it is subject to uncontrolled load. Do we want to continue supporting this use case? (I very much think we should continue to support it, but it isn't my place to make that decision; that question should be asked, and answered.)


[Wikidata-bugs] [Maniphest] [Commented On] T178492: Create a more controlled WDQS cluster

2017-10-18 Thread Jonas
Jonas added a comment.
My understanding is that the structured data on commons project will rely on WDQS for some of its functionalities. If we want that project to be stable, we need to address the WDQS stability issues.

In what form does it rely on it?
What I can say is that #wikibase-quality-constraints depends on WDQS and also on short response times.
Could we maybe create an official list?