Hi Olya, Lucie, and Wikidatans, Very interesting projects. And thanks for publishing, Lucie - very helpful!
With regard to Swahili, Arabic (both African languages!) and Esperanto, and leveraging Google Translate / GNMT, I've been looking at this Google GNMT gif image - https://1.bp.blogspot.com/-jwgtcgkgG2o/WDSBrwu9jeI/AAAAAAAABbM/2Eobq-N9_nYeAdeH-sB_NZGbhyoSWgReACLcB/s1600/image01.gif - and wondering how the triples of Wikidata's Linked Open Data structured Knowledge Base (KB) would stream through this in multiple smaller languages. I couldn't deduce it from this paper - https://arxiv.org/pdf/1803.07116.pdf - which says here, for example ...

2.1 Encoding the Triples

The encoder part of the model is a feed-forward architecture that encodes the set of input triples into a fixed dimensionality vector, which is subsequently used to initialise the decoder. Given a set of un-ordered triples $F_E = \{f_1, f_2, \ldots, f_R : f_j = (s_j, p_j, o_j)\}$, where $s_j$, $p_j$ and $o_j$ are the one-hot vector representations of the respective subject, property and object of the $j$-th triple, we compute an embedding $h_{f_j}$ for the $j$-th triple by forward propagating as follows:

$$h_{f_j} = q\left(W_h \left[W_{in} s_j ; W_{in} p_j ; W_{in} o_j\right]\right), \quad (1)$$

$$h_{F_E} = W_F \left[h_{f_1} ; \ldots ; h_{f_{R-1}} ; h_{f_R}\right], \quad (2)$$

where $h_{f_j}$ is the embedding vector of each triple $f_j$, and $h_{F_E}$ is a fixed-length vector representation for all the input triples $F_E$. $q$ is a non-linear activation function, and $[\ldots ; \ldots]$ represents vector concatenation. $W_{in}$, $W_h$, $W_F$ are trainable weight matrices. Unlike (Chisholm et al., 2017), our encoder is agnostic with respect to the order of input triples. As a result, the order of a particular triple $f_j$ in the triples set does not change its significance towards the computation of the vector representation of the whole triples set, $h_{F_E}$.

... whether this encoder would address streaming triples through GNMT. Would it? And since Swahili, Arabic and Esperanto are all active languages in https://translate.google.com/, no further coding on the GNMT side would be necessary.
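To make equations (1) and (2) from the quoted paper more concrete, here is a minimal Python sketch of the triple encoder. It is only an illustration under stated assumptions, not the authors' implementation: the weights are random stand-ins for trained values, `tanh` is assumed for the activation q (the excerpt only says "non-linear"), the number of triples R is fixed so that the concatenation in (2) has a known size, and the class name `TripleEncoder`, the dimensions, and the five-item toy vocabulary are all hypothetical.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (stand-ins for trained values)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TripleEncoder:
    """Hypothetical feed-forward encoder following equations (1) and (2)."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, num_triples, seed=0):
        rng = random.Random(seed)
        self.num_triples = num_triples  # R, assumed fixed
        # W_in is stored with one row per vocabulary item, so multiplying
        # W_in by a one-hot vector reduces to a row lookup.
        self.w_in = make_matrix(vocab_size, embed_dim, rng)
        self.w_h = make_matrix(hidden_dim, 3 * embed_dim, rng)
        self.w_f = make_matrix(hidden_dim, num_triples * hidden_dim, rng)

    def encode_triple(self, s, p, o):
        """Equation (1): h_fj = q(W_h [W_in s_j ; W_in p_j ; W_in o_j])."""
        concat = self.w_in[s] + self.w_in[p] + self.w_in[o]
        return [math.tanh(v) for v in matvec(self.w_h, concat)]  # q assumed to be tanh

    def encode(self, triples):
        """Equation (2): h_FE = W_F [h_f1 ; ... ; h_fR]."""
        if len(triples) != self.num_triples:
            raise ValueError("expected exactly %d triples" % self.num_triples)
        concat = []
        for s, p, o in triples:
            concat.extend(self.encode_triple(s, p, o))
        return matvec(self.w_f, concat)


# Toy vocabulary mapping Wikidata-style identifiers to indices (illustrative only).
vocab = {"Q42": 0, "P31": 1, "Q5": 2, "P106": 3, "Q36180": 4}
triples = [
    (vocab["Q42"], vocab["P31"], vocab["Q5"]),
    (vocab["Q42"], vocab["P106"], vocab["Q36180"]),
]
encoder = TripleEncoder(vocab_size=len(vocab), embed_dim=4, hidden_dim=3, num_triples=2)
h_fe = encoder.encode(triples)
print(len(h_fe))  # fixed-length vector of size hidden_dim
```

The sketch shows that the encoder's output is a single fixed-length vector used to initialise a decoder, not a sentence of text, which is relevant to the question of whether triples could be fed to a text-to-text system such as GNMT as they are.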
(I'm curious how WUaS might best grow small languages not yet in either Wikipedia/Wikidata's 287-301 languages or in GNMT's ~100+ languages.) How could your Wikidata / Wikibabel work interface with Google GNMT more fully over time, building on your great Wikidata coding/papers?

Cheers,
Scott
https://en.wikipedia.org/wiki/User:Scott_WUaS

On Mon, Jun 18, 2018 at 5:17 AM, Gerard Meijssen <gerard.meijs...@gmail.com> wrote:

> Hoi,
> On average there is little or no support for subjects that have to do with
> Africa. When I check the articles for politicians, for instance, I find that
> even current presidents, let alone ministers, are missing in African
> Wikipedias. So it is wonderful that there have been projects that deal with
> gaps, but what if there is hardly anything?
>
> What this approach brings us is at least information. Basic information in
> lists, info boxes, maybe an additional line of text.
>
> What we apparently have not done is learn from the Cebuano experience. The
> biggest issue was not the quality of the new information but the
> integration with Wikidata. Everything was new and it did not link with what
> we already knew. What we bring in this way is integrated information, and as
> long as data is not saved as an article, the quality provided improves as
> Wikidata gains better intel.
>
> If anything, the experience of the Welsh Wikipedia brings us more than
> GapFinder or the Tiger editathon because it is more in line with this
> approach.
> Thanks,
> GerardM
>
> On 18 June 2018 at 13:19, Amir E. Aharoni <amir.ahar...@mail.huji.ac.il>
> wrote:
>
>> 2018-06-18 2:12 GMT+03:00 Olya Irzak <oir...@gmail.com>:
>>
>>> Dear Wikidata community,
>>>
>>> We're working on a project called Wikibabel to machine-translate parts
>>> of Wikipedia into underserved languages, starting with Swahili.
>>>
>>> In hopes that some of our ideas can be helpful to machine translation
>>> projects, we wrote a blog post about how we prioritized which pages to
>>> translate, and which categories need a human in the loop:
>>> https://medium.com/@oirzak/wikibabel-equalizing-information-access-on-a-budget-4038f750e90e
>>>
>>> Rumor has it that the Wikidata community has thought deeply about
>>> information access. We'd love your feedback on our work. Please let us know
>>> about past / ongoing machine translation related projects so we can learn
>>> from & collaborate with them.
>>
>> I'm not sure how deeply the Wikidata community has thought about it.
>>
>> One project that does something related to what you're doing is GapFinder
>> ( https://www.mediawiki.org/wiki/GapFinder ). As far as I know, the
>> GapFinder frontend is not actively developed, but the recommendation API
>> behind it is being actively maintained and developed. You should ask
>> the Research team for more info (see
>> https://www.mediawiki.org/wiki/Wikimedia_Research ).
>>
>> Project Tiger is also doing something similar:
>> https://meta.wikimedia.org/wiki/Project_Tiger_Editathon_2018
>>
>> As a general comment, displaying machine-translated text in a way that
>> makes it appear to have been written by humans is misleading and damaging. I
>> don't know any Swahili, but in languages that I can read (Russian, Hebrew,
>> Catalan, Spanish, French, German), machine translation is at best good
>> as an aid to a human writing a translation, and it's
>> never good for actually reading. I also don't understand why you invest
>> credits in pre-machine-translating articles that people can
>> machine-translate for free, but maybe I'm missing something about how your
>> project works.
>>
>> _______________________________________________
>> Wikidata mailing list
>> Wikidata@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikidata

--
- Scott MacLeod - Founder & President
- https://twitter.com/WorldUnivAndSch
- World University and School
- http://worlduniversityandschool.org
- http://scottmacleod.com
- CC World University and School - like CC Wikipedia with best STEM-centric CC OpenCourseWare - incorporated as a nonprofit university and school in California, and is a U.S. 501 (c) (3) tax-exempt educational organization.