Dear Sirs,

I am grateful for your valuable feedback and suggestions. I have updated my proposal based on your input. The breakdown of the deliverables on the ideas page indeed helped me understand the requirements more clearly.

The link to my updated proposal is https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal

I request you and everyone to kindly skim through my proposal once again and suggest changes/additions. I am very excited about this project and about working with you; truth be told, 23rd April seems ages away.

Thanking you,
Yours sincerely,
Karthik

> Date: Wed, 4 Apr 2012 11:49:41 +0200
> From: "Oren Bochman" <orenboch...@gmail.com>
> To: "'Wikimedia developers'" <wikitech-l@lists.wikimedia.org>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID: <007f01cd1248$42ee6f40$c8cb4dc0$@com>
> Content-Type: text/plain; charset="utf-8"
>
> You do understand correctly!
>
> The main idea about NLP components is with POS tagger as an example:
>
> 1. a fallback system that does unsupervised POS tagging.
> 2. the ability to plug in an existing POS tagger as these become
>    available for specific languages.
>
> As supervisor, I would recommend working with 3 languages:
> English, Hebrew, and the GSoC student's native language.
>
> If we could get QA from other native speakers we would incorporate them
> into the workflow.
>
> I think that by using a deletion/reversion-based heuristic we may also be
> able to make a spam corpus to boost the accuracy of the corpora.
>
> Operation Manager
> E-mail: o...@romai-horizon.com
> Mobil: +36 30 866 6706
>
> Római Horizon Kft.
> H-1039 Budapest
> Királyok útja 291. D. ép. fszt. 2.
> Tel: +36 1 492 1492
> Fax: +36 1 266 5529
>
> -----Original Message-----
> From: wikitech-l-boun...@lists.wikimedia.org [mailto:wikitech-l-boun...@lists.wikimedia.org] On Behalf Of Amir E. Aharoni
> Sent: Tuesday, April 03, 2012 10:19 PM
> To: Wikimedia developers
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
>
> 2012/4/3 karthik prasad <karthikprasad...@gmail.com>:
> > Hello,
> > I am a GSoC aspirant and have compiled a proposal for one of the
> > project ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman]
> > I would sincerely appreciate if you could kindly go through it and
> > suggest corrections/additions so that I can settle with a coherent proposal.
> >
> > Link to my proposal :
> > https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>
> Nice, but why only English?
>
> If i understand the proposal correctly, this project is supposed to be
> able to work with almost any language with very little effort.
>
> --
> Amir Elisha Aharoni
> http://aharoni.wordpress.com
> "We're living in pieces,
> I want to live in peace." - T. Moore
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> ------------------------------
>
> Date: Wed, 4 Apr 2012 12:58:11 +0300
> From: "Amir E. Aharoni" <amir.ahar...@mail.huji.ac.il>
> To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
> Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
> Message-ID: <CACtNa8tS-PifzJS1JsF02k3qW_-7=uk-wdqnvsflglufhxn...@mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
>
> 2012/4/4 Oren Bochman <orenboch...@gmail.com>:
> > You do understand correctly!
> >
> > The main idea about NLP components is with POS tagger as an example:
>
> Just to make sure, POS = part of speech, isn't it?
>
> It's one of the most confusing TLAs in computing :)
>
> > If we could get QA from other native speakers we would incorporate them
> > into the workflow.
>
> Good. As long as there is a way to plug other languages and a way for
> speakers of other languages to contribute QA, i'm very happy.
>
> --
> Amir Elisha Aharoni
> http://aharoni.wordpress.com
> "We're living in pieces,
> I want to live in peace." - T. Moore
>
Date: Wed, 4 Apr 2012 00:28:29 -0400
From: Gregory Varnum <gregory.var...@gmail.com>
To: Wikimedia developers <wikitech-l@lists.wikimedia.org>
Subject: Re: [Wikitech-l] GSoC 2012: Proposal-Wikipedia Corpus Tools
Message-ID: <ac4c429f-a839-4911-be9b-c8928aa2d...@gmail.com>
Content-Type: text/plain; charset=utf-8

Whoops - I meant that email to be directed to Karthik - although Amir, you're welcome to read it as well. :)

-greg

On Apr 3, 2012, at 11:24 PM, Gregory Varnum <gregory.var...@gmail.com> wrote:

> Amir,
>
> Thank you for your GSOC proposal! :)
>
> Between now and Google's submission deadline on April 6th - you are invited to further modify your proposals. The GSOC page on MW.org - https://www.mediawiki.org/wiki/GSOC - and our IRC rooms - https://www.mediawiki.org/wiki/MediaWiki_on_IRC
>
> Looking over your proposal - I think you've got good background information on yourself. However, I think you should flesh out more details on the proposed project. Without more familiarity with corpus (and with no links to find that info) - it's hard for everyone to weigh in equally or to make sure your project gets the full consideration you'd like.
>
> -greg aka varnent
>
>
> On Apr 3, 2012, at 4:18 PM, Amir E. Aharoni <amir.ahar...@mail.huji.ac.il> wrote:
>
>> 2012/4/3 karthik prasad <karthikprasad...@gmail.com>:
>>> Hello,
>>> I am a GSoC aspirant and have compiled a proposal for one of the project
>>> ideas - Wikipedia Corpus Tools. [Mentor : Oren Bochman]
>>> I would sincerely appreciate if you could kindly go through it and suggest
>>> corrections/additions so that I can settle with a coherent proposal.
>>>
>>> Link to my proposal :
>>> https://www.mediawiki.org/wiki/User:Karthikprasad/gsoc2012proposal
>>
>> Nice, but why only English?
>>
>> If i understand the proposal correctly, this project is supposed to be
>> able to work with almost any language with very little effort.
>>
>> --
>> Amir Elisha Aharoni
>> http://aharoni.wordpress.com
>> "We're living in pieces,
>> I want to live in peace." - T. Moore