Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-30 Thread Martin Potthast
Hi Leila, I can point you to two methods: CL-ESA and CL-CNG. Cross-Language Explicit Semantic Analysis (CL-ESA): http://www.uni-weimar.de/medien/webis/publications/papers/stein_2008b.pdf This model allows for language-independent comparison of texts without relying on parallel corpora or

Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-30 Thread Lucie-Aimée Kaffee
Hi Leila, Off the top of my head, I can think of only this paper, which I read a while ago: https://eprints.soton.ac.uk/403386/1/tweb_gottschalk_demidova_multiwiki.pdf I assume what is to be considered is the (lack of) content overlap of articles in different languages in general as of, for

Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-29 Thread Leila Zia
Hi Scott, On Mon, Aug 28, 2017 at 2:01 AM, Scott Hale wrote: > Dear Leila, > > ==Question== >> Do you know of a dataset we can use as ground truth for aligning >> sections of one article in two languages? >> > > This question is super interesting to me. I am not

Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-28 Thread Gerard Meijssen
Hoi, Sorry to state the obvious (for me)... We datamine Wikipedias for statements in Wikipedia. Consequently, much information that could be / should be in an article (in any and all languages) is reflected by Wikidata. There is much that is not found in every language and information on some

Re: [Wiki-research-l] ground truth for section alignment across languages

2017-08-28 Thread Scott Hale
Dear Leila, ==Question== > Do you know of a dataset we can use as ground truth for aligning > sections of one article in two languages? > This question is super interesting to me. I am not aware of any ground truth data, but could imagine trying to build some from [[Template:Translated_page]].

[Wiki-research-l] ground truth for section alignment across languages

2017-08-24 Thread Leila Zia
Hi all, ==Question== Do you know of a dataset we can use as ground truth for aligning sections of one article in two languages? I'm thinking a tool such as Content Translation may capture this data somewhere, or there may be some other community initiative that has matched a subset of the