[Wikitech-l] GSOC 2014 idea

2014-02-28 Thread Roman Zaynetdinov
Hello, I would like to participate in GSoC this year for the first time,
but I am a little worried about choosing an idea. I have one and I am not
sure whether it suits this program. I would be very glad if you could take
a quick look at my idea and share your thoughts. I will be happy to receive
any feedback. Thank you.

Project Idea

What is the purpose?

Help people read complex texts by providing inline translations of unknown
words. As a non-native English-speaking student, I sometimes find it hard to
read complicated texts or articles, so I have to search for a translation or
definition every time. Why not simplify this and change the flow from
"translate and understand" to "translate, learn and understand"?

How will the inline translation appear?

While reading an article, a user may come across unknown words or words
whose meaning is confusing to them. At that point they click on the word and
the inline translation appears.
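
To make this concrete, here is a rough sketch of how the click interaction
could work in the browser. The /define endpoint and the response shape are
only assumptions for illustration, not a decided design:

    // Minimal sketch: show an inline translation when a word is
    // double-clicked (double-click selects the word in most browsers).
    // Assumes a hypothetical JSON endpoint /define?word=... returning
    // { definitions: ["..."] }.
    document.addEventListener('dblclick', async (event) => {
      const word = window.getSelection().toString().trim();
      if (!word || /\s/.test(word)) {
        return; // only single words for now
      }
      const response = await fetch('/define?word=' + encodeURIComponent(word));
      if (!response.ok) {
        return;
      }
      const data = await response.json();
      const tooltip = document.createElement('div');
      tooltip.className = 'inline-translation';
      tooltip.textContent = data.definitions.join('; ');
      tooltip.style.position = 'absolute';
      tooltip.style.left = event.pageX + 'px';
      tooltip.style.top = (event.pageY + 12) + 'px';
      document.body.appendChild(tooltip);
      setTimeout(() => tooltip.remove(), 5000);
    });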

What should be included in the inline translation?

Since this is more than just a translator, it should offer not only one
translation but several. Additional data such as synonyms could also be
included; this can be discussed during the project.

From which source should the data be gathered?

Wiktionary is the best candidate: it is open and has a broad database. It
also lends itself to growing the project later by adding support for more
languages.

Possible approaches

I have two approaches in mind right now. The first is to build a web site
on Node.js with an open API for users. Parsoid, which fits well with Node,
could be used to parse the data coming from the Wiktionary API. A small
JavaScript widget would also be required for the front-end.
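
As a very rough sketch of the first approach (only an illustration: the
choice of Node's built-in http module, the jsdom package for DOM parsing,
a recent Node with a global fetch, and the naive selector are all
assumptions, not decisions):

    // Sketch of a small Node.js service: fetch the Parsoid HTML of an
    // English Wiktionary entry, pull definition list items out of the DOM,
    // and return them as JSON. The selector will certainly need refinement.
    const http = require('http');
    const { JSDOM } = require('jsdom'); // assumed dependency

    const PARSOID = 'http://parsoid-lb.eqiad.wikimedia.org/enwiktionary/';

    http.createServer(async (req, res) => {
      const word = new URL(req.url, 'http://localhost').searchParams.get('word');
      if (!word) {
        res.writeHead(400);
        res.end('missing ?word= parameter');
        return;
      }
      const page = await fetch(PARSOID + encodeURIComponent(word));
      if (!page.ok) {
        res.writeHead(404);
        res.end('no entry found');
        return;
      }
      const dom = new JSDOM(await page.text());
      // Very naive: take the <li> items of the first ordered list, which is
      // usually a definition list on English Wiktionary pages.
      const firstList = dom.window.document.querySelector('ol');
      const definitions = firstList
        ? [...firstList.querySelectorAll('li')].map((li) => li.textContent.trim())
        : [];
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ word, definitions }));
    }).listen(8080);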

The second is to build a standalone library which can be used on other
sites as an add-on or in browser extensions. Unfortunately, this option is
less clear to me at this point.

Growth opportunities

I am living in Finland right now and I don't know Finnish well enough to
understand the locals, so this project could be expanded by adding support
for more languages, helping people like me read, learn and understand texts
in foreign languages.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] GSOC 2014 idea

2014-02-28 Thread Roman Zaynetdinov
Hi Niklas, I know that in Finnish words are inflected, just as in Russian,
and that is what causes the problems with translation. Right now I am
looking for solutions that can help find the base form of a word. I used
this language as an example to show the purpose of the tool; of course,
after English is implemented, other languages could be added with wider
support.
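
One direction I am considering, purely as a sketch: try the exact surface
form first and fall back to a base form. The lookup() and lemmatize()
functions below are hypothetical placeholders for the dictionary backend
and a language-specific morphology tool, not real libraries:

    // Toy stubs so the sketch runs; real versions would query the
    // dictionary and a proper morphological analyser.
    const lookup = async (word) => (word === 'talo' ? ['house'] : null);
    const lemmatize = async (word) => word.replace(/ssa$/, ''); // "talossa" -> "talo"

    // Try the word as clicked, then retry with its base form.
    async function findTranslation(word) {
      const exact = await lookup(word);
      if (exact) {
        return exact;
      }
      const lemma = await lemmatize(word);
      return lemma !== word ? lookup(lemma) : null;
    }

    findTranslation('talossa').then(console.log); // [ 'house' ]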


2014-02-28 19:30 GMT+02:00 Niklas Laxström niklas.laxst...@gmail.com:

 2014-02-28 11:09 GMT+02:00 Roman Zaynetdinov romanz...@gmail.com:
  From which source should the data be gathered?
 
  Wiktionary is the best candidate: it is open and has a broad database.
  It also lends itself to growing the project later by adding support for
  more languages.

 It's not obvious why you have reached this conclusion.

 1) There are many Wiktionaries, and they do not all work the same or
 have the same content.
 2) The Wiktionary data is relatively free-form text, so it is hard to
 parse to find the relevant bits.
 3) Dozens of people have mined Wiktionary already. It would make sense
 to see whether they have made the resulting databases available.
 4) There are many sources of data, some of them also open, which can
 have better coverage, or coverage of specialty areas where the
 Wiktionaries are lacking.
 5) I expect that the best results will be achieved by using multiple data
 sources.

  Growth opportunities
 
  I am living in Finland right now and I don't know Finnish well enough to
  understand the locals, so this project could be expanded by adding support
  for more languages, helping people like me read, learn and understand
  texts in foreign languages.

 I hope you enjoyed your stay here. I do not know how much Finnish you
 have learned, but after a while it should be obvious that just
 searching for the exact string the user clicked or selected will not
 work because of the agglutinative nature of the language. I advocate
 for features which work in all languages (at least in many :). If you
 implement this for English only first, it is likely that you will have
 to rewrite it to support other languages.

   -Niklas


Re: [Wikitech-l] GSOC 2014 idea

2014-02-28 Thread Roman Zaynetdinov
Thanks a lot for the feedback. I think I can discuss these options with my
mentor, I hope :).


2014-02-28 18:51 GMT+02:00 Gabriel Wicke gwi...@wikimedia.org:

 Hi Roman!

 On 02/28/2014 01:24 AM, Brian Wolff wrote:
  On 2/28/14, Roman Zaynetdinov romanz...@gmail.com wrote:
  Help people read complex texts by providing inline translations of
  unknown words. As a non-native English-speaking student, I sometimes find
  it hard to read complicated texts or articles, so I have to search for a
  translation or definition every time. Why not simplify this and change
  the flow from "translate and understand" to "translate, learn and
  understand"?

 This sounds like a great idea.

  I have two approaches in mind right now. The first is to build a web
  site on Node.js with an open API for users. Parsoid, which fits well with
  Node, could be used to parse the data coming from the Wiktionary API. A
  small JavaScript widget would also be required for the front-end.

 You could basically write a node service that pulls in the Parsoid HTML
 for a given wiktionary term, extracts the info you need from the DOM,
 and returns it in a JSON response to a client-side library.
 Alternatively (or as a first step), you could download the Parsoid HTML
 of the wiktionary article on the client and extract the info there. This
 could even be implemented as a gadget. We recently set liberal CORS
 headers to make this easy.
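 
 A rough sketch of the client-side variant (just an illustration; the
 selector and error handling are guesses that would need refinement):

   // Fetch the Parsoid HTML for a term directly in the browser (the CORS
   // headers mentioned above allow this) and extract definition items.
   async function fetchDefinitions(term) {
     const url = 'http://parsoid-lb.eqiad.wikimedia.org/enwiktionary/' +
       encodeURIComponent(term);
     const response = await fetch(url);
     if (!response.ok) {
       throw new Error('No entry for ' + term);
     }
     const html = await response.text();
     const doc = new DOMParser().parseFromString(html, 'text/html');
     // Naively treat <li> items in ordered lists as definitions; a real
     // gadget would scope this to the language section it cares about.
     return [...doc.querySelectorAll('ol > li')].map((li) => li.textContent.trim());
   }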

  Parsoid, which fits well with Node, could be used to parse the data
  coming from the Wiktionary API
 
  Just as a warning, parsing data from wiktionary into usable form is a
  lot harder than it looks, so don't underestimate this step. (Or at
  least it was several years ago when I last tried)

 The Parsoid rendering (e.g. [1]) has pretty much all semantic
 information in the DOM. There might still be wiktionary-specific issues
 that we don't know about yet, but tasks like extracting template
 parameters or the rendering of specific templates (IPA, ...) are already
 straightforward. Also see the DOM spec [2] for background.
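
 For example, template uses are marked in the Parsoid DOM with
 typeof="mw:Transclusion" and carry their parameters in a data-mw JSON
 attribute (see the DOM spec below for the exact shape), so a sketch of
 pulling them out is only a few lines of DOM work:

   // List the templates used on a Parsoid-rendered page and their
   // parameters by reading the data-mw attribute on transclusion nodes.
   function listTemplates(doc) {
     const results = [];
     for (const node of doc.querySelectorAll('[typeof~="mw:Transclusion"]')) {
       const dataMw = JSON.parse(node.getAttribute('data-mw') || '{}');
       for (const part of dataMw.parts || []) {
         if (part.template) {
           results.push({
             name: part.template.target.wt,
             params: part.template.params,
           });
         }
       }
     }
     return results;
   }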

 Gabriel

 [1]: http://parsoid-lb.eqiad.wikimedia.org/enwiktionary/foo
  Other languages via frwiktionary, fiwiktionary, ...
 [2]: https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec
