Hi Leila,
I can point you to two methods: CL-ESA and CL-CNG.
Cross-Language Explicit Semantic Analysis (CL-ESA):
http://www.uni-weimar.de/medien/webis/publications/papers/stein_2008b.pdf
This model allows for language-independent comparison of texts without
relying on parallel corpora or
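
CL-CNG, by contrast, just compares character n-gram profiles, which can
work reasonably well for related languages. To illustrate the idea, a
minimal sketch in Python (the 3-gram size and the sample strings are my
own toy choices, not from the paper):

# Minimal sketch of the CL-CNG idea: compare texts across (related)
# languages via character n-gram overlap, with no translation step.
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Lowercased character n-gram counts, whitespace collapsed."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

en = "The university was founded in 1860."
de = "Die Universitaet wurde 1860 gegruendet."
print(cosine(char_ngrams(en), char_ngrams(de)))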
Hi Leila,
Off the top of my head, I can only think of this paper, which I read a
while ago:
https://eprints.soton.ac.uk/403386/1/tweb_gottschalk_demidova_multiwiki.pdf
I assume what needs to be considered is the (lack of) content overlap
between articles in different languages in general, as, for
Hi Scott,
On Mon, Aug 28, 2017 at 2:01 AM, Scott Hale wrote:
> Dear Leila,
>
> ==Question==
>> Do you know of a dataset we can use as ground truth for aligning
>> sections of one article in two languages?
>>
>
> This question is super interesting to me. I am not
Hoi,
Sorry to state the obvious (for me) ... We data-mine Wikipedias for
statements in Wikidata. Consequently, much of the information that could
be / should be in an article (in any and all languages) is reflected in
Wikidata. There is much that is not found in every language, and
information on some
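
One concrete way to use this: an item's sitelinks on Wikidata already
tell you which languages have an article at all. A minimal sketch
against the public wbgetentities API (Q42 is just an example item; the
filter on sitelink keys is approximate):

# Sketch: list which language Wikipedias cover a given Wikidata item.
import json
import urllib.request

def sitelinks(qid):
    url = ("https://www.wikidata.org/w/api.php"
           "?action=wbgetentities&props=sitelinks"
           f"&ids={qid}&format=json")
    req = urllib.request.Request(url,
                                 headers={"User-Agent": "example-script/0.1"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    links = data["entities"][qid]["sitelinks"]
    # Keep Wikipedia sitelinks (keys like 'enwiki', 'dewiki');
    # crude filter that drops non-Wikipedia projects like commonswiki.
    return sorted(k for k in links
                  if k.endswith("wiki") and k != "commonswiki")

print(sitelinks("Q42"))  # Douglas Adams, as an example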
Dear Leila,
==Question==
> Do you know of a dataset we can use as ground truth for aligning
> sections of one article in two languages?
>
This question is super interesting to me. I am not aware of any ground
truth data, but could imagine trying to build some from
[[Template:Translated_page]].
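
If you wanted to harvest candidates that way, the MediaWiki API's
list=embeddedin can enumerate pages that transclude the template. A
rough sketch (English Wikipedia endpoint; the template is usually
placed on talk pages, hence namespace 1; a full harvest would follow
the API's continue parameter rather than stopping at one batch):

# Sketch: enumerate pages transcluding [[Template:Translated_page]]
# on English Wikipedia (first batch only).
import json
import urllib.request

URL = ("https://en.wikipedia.org/w/api.php"
       "?action=query&list=embeddedin"
       "&eititle=Template:Translated_page"
       "&einamespace=1&eilimit=50&format=json")

req = urllib.request.Request(URL,
                             headers={"User-Agent": "example-script/0.1"})
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

for page in data["query"]["embeddedin"]:
    print(page["title"])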
Hi all,
==Question==
Do you know of a dataset we can use as ground truth for aligning
sections of one article in two languages? I'm thinking a tool such as
Content Translation may capture this data somewhere, or there may be
some other community initiative that has matched a subset of the