dbarratt added a comment.

@Tpt

I've been writing a somewhat complex query and I've run into a pretty big performance issue: the query takes about 12-13 seconds to execute. :(

I do have some time to work on this, so I'm mostly looking for your opinion on what the best option is.

Here are some ideas I have for resolving this:

  1. Collect all of the concurrent requests and execute them simultaneously with Guzzle (or the like), which relies on curl_multi_exec() to run the requests in parallel. The problem with doing this is that the requests have to be "collected" at each level of the hierarchy and the results then put back where they belong. I can't think of an easy way to do that with the GraphQL resolvers (unless the library can accept something like a Guzzle promise and do the collection for us?).
  2. Run Wikibase Client on Toolforge and use the Client to access Wikidata's database directly. I suppose the code would be moved into a MediaWiki extension so it could define the routes within MediaWiki; I don't know if this is actually possible. It doesn't bring concurrency, but it would speed up the requests substantially (to the point where concurrency is not needed). I'm not sure whether Wikidata's repository is available in the replicas (I imagine it is?). This also doesn't fix anything with SPARQL (i.e. multiple SPARQL queries still wouldn't run concurrently), although fetching the entities after the query would at least be quick. If there's a chance this could one day operate on Wikidata itself then this is a good option; otherwise I don't really like it, because it requires the software to run on Toolforge (i.e. I couldn't just run the GraphQL server on my own server).
  3. Rewrite in JavaScript (Node.js) and use Apollo Server (or the like). This would naturally allow the requests to run concurrently and asynchronously (i.e. a group of requests would be resolved individually rather than as a whole group); see the sketch after this list. As a plus, all of the requests would continue to execute against production. This seems like the most work, but it also has the biggest benefit. Also, if we wanted to run this as a production service, it could use the API over the local network, and anyone who wants to run it on their own server would be able to do that as well.

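To make option #3 concrete, here's a rough sketch of what a resolver could look like on Node.js. This assumes Node 18+ (for the built-in fetch()) and the dataloader npm package; the Item/claims schema names are made up for illustration and aren't from the actual codebase.

```
// Batches entity lookups: every ID requested while one level of the query
// resolves is collected and fetched in a single wbgetentities call.
import DataLoader from 'dataloader';

const entityLoader = new DataLoader(
  async (ids) => {
    const url = 'https://www.wikidata.org/w/api.php' +
      '?action=wbgetentities&format=json&ids=' +
      encodeURIComponent(ids.join('|'));
    const response = await fetch(url);
    const body = await response.json();
    // DataLoader expects results in the same order as the requested keys.
    return ids.map((id) => (body.entities && body.entities[id]) || null);
  },
  { maxBatchSize: 50 } // wbgetentities accepts at most 50 IDs per request
);

// A resolver only awaits its own entity; loads made in the same tick get
// coalesced into one HTTP request, and separate batches run concurrently.
const resolvers = {
  Item: {
    claims: async (item) => {
      const entity = await entityLoader.load(item.id);
      return entity ? entity.claims : null;
    },
  },
};
```

With something like this, sibling fields collapse into one API round trip and independent branches of the query run in parallel, which is exactly the collect-and-put-back behavior that's hard to retrofit onto the PHP resolvers in option #1.
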
What do you think? I'm leaning towards Option #3 as it gives the most bang for the buck, but I wanted to make sure you were good with that option before I go rewriting everything. :)

