[Wikidata-bugs] [Maniphest] [Commented On] T177275: Add ordinal variable to MWAPI service calls

2017-11-05 Thread gerritbot
gerritbot added a comment.
Change 388287 merged by jenkins-bot:
[wikidata/query/rdf@master] Add option to fetch ordinal of the result in MWAPI query:

https://gerrit.wikimedia.org/r/388287
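For reference, a minimal sketch of how the new option can be used from SPARQL, assuming the wikibase:apiOrdinal flag behaves as documented for the WDQS MWAPI service; the endpoint, API, and search term here are only illustrative:

$ curl -sG https://query.wikidata.org/sparql \
    --data-urlencode 'format=json' \
    --data-urlencode 'query=
  SELECT ?title ?ordinal WHERE {
    SERVICE wikibase:mwapi {
      bd:serviceParam wikibase:endpoint "en.wikipedia.org" ;
                      wikibase:api "Search" ;
                      mwapi:srsearch "Berlin" .
      ?title   wikibase:apiOutput  mwapi:title .
      # assumed: ?ordinal receives the position of each result in the API response
      ?ordinal wikibase:apiOrdinal true .
    }
  }'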


[Wikidata-bugs] [Maniphest] [Updated] T114904: Migrate wb_items_per_site to using prefixed entity IDs instead of numeric IDs

2017-11-05 Thread hoo
hoo added a comment.
Note: I just also found T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index while looking at this… maybe this can be done at once?!


[Wikidata-bugs] [Maniphest] [Created] T179793: Consider dropping the "wb_items_per_site.wb_ips_site_page" index

2017-11-05 Thread hoo
hoo created this task. hoo added projects: Wikidata, MediaWiki-extensions-WikibaseRepository, DBA. Herald added a subscriber: Aklapper.
TASK DESCRIPTION
From db1070:

KEY `wb_ips_site_page` (`ips_site_page`),

This is useful for queries where we want to find a given linked page by title (like "Berlin"), but don't know the site id (like "enwiki"). We don't do these kinds of queries within the software and I can barely imagine a purpose for this.
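For illustration, a hypothetical query of the shape this index serves, using the current table and column names (the title "Berlin" is just an example):

$ mysql wikidatawiki -e \
    "SELECT ips_item_id, ips_site_id
       FROM wb_items_per_site
      WHERE ips_site_page = 'Berlin';"

Without knowing ips_site_id in advance, only the wb_ips_site_page index could serve this lookup.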

The only way to use this via Wikibase is SiteLinkLookup::getLinks, which allows such queries to be crafted (but no one currently does this). The method's documentation states:

Note: if the conditions are not very selective the result set can be very big.
Thus the caller is responsible for not executing too expensive queries in its context.

If we want, we could make the implementations of getLinks throw if $siteIds is not set, but I'm not sure that's even needed here (as per the above comment).


[Wikidata-bugs] [Maniphest] [Commented On] T114904: Migrate wb_items_per_site to using prefixed entity IDs instead of numeric IDs

2017-11-05 Thread hoo
hoo added a comment.
Given the size of the table, changing this shouldn't be overly horrible. It's a fair bit of migration work… but I assume doing this for maintenance queries and consistency is worth it.
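To make the scope concrete, a hypothetical sketch of the migration direction discussed here; the column name ips_entity_id and its type are made up, and this is not a reviewed plan:

$ mysql wikidatawiki -e \
    "ALTER TABLE wb_items_per_site
       ADD COLUMN ips_entity_id VARBINARY(32) DEFAULT NULL;  -- hypothetical new column
     UPDATE wb_items_per_site
        SET ips_entity_id = CONCAT('Q', ips_item_id);"

A real migration would of course batch the UPDATE rather than rewriting the whole table in one statement.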


[Wikidata-bugs] [Maniphest] [Created] T179792: Watching more pages in Wikidata

2017-11-05 Thread ChristianKl
ChristianKl created this task. ChristianKl added a project: Wikidata. Herald added a subscriber: Aklapper.
TASK DESCRIPTION
One of Wikidata's core problems at the moment is that it doesn't have enough review of edits, so vandalism often takes a while to get reverted.
This is partly because information is spread across more pages than on Wikipedia.

Imagine 10 Wikidata items, each edited by 3 people for 10 edits apiece, versus one Wikipedia article with 100 edits by 30 people: the total number of edits on each side is the same (100).
The Wikipedia article, however, is on the watchlists of many people, while each of the Wikidata items is watched by only three.

What could be done to change this?
Currently, pages are put on a watchlist when new statements get created; however, only the page on which the statement is made is added to the watchlist.

We need more events where pages get added to watchlists:


- Adding properties to the watchlist when they are used
- Adding the linked property to the watchlist


That means when I add the statement "hand" "anatomical location" "free upper limb", all of those pages get added to my watchlist, plus the relevant talk pages.
At the time of this writing, the talk page for "anatomical location" has a "Number of page watchers who visited recent edits" of 4. That means it's effectively not possible to have a discussion about how the property should be used with the other people who use it.
If we automatically add the property to the watchlist when it gets used, it suddenly becomes possible to have a discussion about how "anatomical location" should be used that gets seen by the people who use the property. That's valuable for standardizing its usage, and it will bring new users into contact with discussions with other users sooner.

The second issue this solves is being able to clean up vandalism to the labels of highly used properties and items quickly. As Wikidata gets used more, it becomes increasingly unacceptable that it takes hours to revert vandalism on the item of a country.
This change would increase the number of watchers of such highly used pages enough that vandalism gets reverted fast.

Does this create watchlist overload? There should be an option under "Preferences/Watchlist" to turn this feature off. Having the option is in line with the current ability to manage which pages get added to the watchlist, and it allows high-activity users to solve the problem for themselves.

On the other hand, this feature might increase the need for the ability to filter the watchlist by language. If 5,000 people follow the item for Germany and labels get added to it in 20 seldom-used languages, it's likely not efficient for 5,000 people who can't read those languages to see the edits. The work required for filtering seems doable and not directly urgent.


[Wikidata-bugs] [Maniphest] [Commented On] T170779: Wikidata search suggestions do not display on screen if character whose decomposition contains nukta is present in search query

2017-11-05 Thread hoo
hoo added a comment.

In T170779#3734809, @Smalyshev wrote:
@Snaterlicious, @hoo, @thiemowmde Do you know why the check is there and what it is meant to be doing? @tstarling raised the following concern:

The search term is normalized by the server using $wgContLang->normalize(), which potentially includes transformations beyond NFC, especially if the content language is Arabic or Malayalam. So even if you do client-side NFC using the same version of Unicode as the server, there is at least a hypothetical possibility of a hang.


Replied on gerrit.


[Wikidata-bugs] [Maniphest] [Commented On] T175230: Wikidata identifier links don't respect nofollow configuration

2017-11-05 Thread ChristianKl
ChristianKl added a comment.
Quora links to us with do-follow and we link to them with do-follow. In this case, I don't see a problem.


[Wikidata-bugs] [Maniphest] [Commented On] T179681: Add HDT dump of Wikidata

2017-11-05 Thread Addshore
Addshore added a comment.
@Smalyshev, we discussed dumping the JNL files used by Blazegraph directly at points during WikidataCon.
I'm aware that isn't an HDT dump, but I'm wondering if this would help in any way.


[Wikidata-bugs] [Maniphest] [Unassigned] T143424: [Task] Explore the Entity Relevancy Scoring for Wikidata

2017-11-05 Thread thalhamm
thalhamm removed thalhamm as the assignee of this task.


[Wikidata-bugs] [Maniphest] [Commented On] T175230: Wikidata identifier links don't respect nofollow configuration

2017-11-05 Thread Nemo_bis
Nemo_bis added a comment.
This is still happening.


[Wikidata-bugs] [Maniphest] [Commented On] T179681: Add HDT dump of Wikidata

2017-11-05 Thread Arkanosis
Arkanosis added a comment.
FWIW, I've just tried to convert the ttl dump of the 1st of November 2017 on a machine with 378 GiB of RAM and 0 GiB of swap, and… well… it failed with std::bad_alloc after more than 21 hours of runtime. Granted, there was another process eating ~100 GiB of memory, but I thought it would be okay; I was proven wrong.

As I was optimistic, I ran the conversion directly from the ttl.gz file, maybe preventing some memory mapping optimization, and also added the -i flag to generate the index at the same time. I'll re-run the conversion without these in the hope of finally getting the hdt file.
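Concretely, the planned re-run would look something like this (same flags as the original invocation minus -i, reading the decompressed ttl; gzip's -k flag keeps the archive):

$ gunzip -k wikidata-20171101-all.ttl.gz
$ /usr/bin/time -v rdf2hdt -f ttl -p wikidata-20171101-all.ttl wikidata-20171101-all.hdt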

So, here are the statistics I got from the failed attempt:

$ /usr/bin/time -v rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz  wikidata-20171101-all.hdt
Catch exception load: std::bad_alloc
ERROR: std::bad_alloc
Command exited with non-zero status 1
Command being timed: "rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz wikidata-20171101-all.hdt"
User time (seconds): 64999.77
System time (seconds): 10906.79
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 21:13:25
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 200475524
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 703
Minor (reclaiming a frame) page faults: 8821385485
Voluntary context switches: 36774
Involuntary context switches: 4514261
Swaps: 0
File system inputs: 81915000
File system outputs: 2767696
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
/usr/bin/time -v rdf2hdt -f ttl -i -p wikidata-20171101-all.ttl.gz   64999,77s user 10906,80s system 99% cpu 21:13:25,50 total

NB: the exceptionally long runtime is the result of the conversion being single-threaded, while the machine has a lot of threads but relatively low per-thread performance (2.3 GHz). The process wasn't under memory pressure until it crashed (there was no swap anyway) and wasn't waiting much for I/O, so it was all CPU-bound.
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs