Dear Karthik Prasad and other GSOC candidates,

 

I was not receiving this list before, but I am now.

 

The GSOC proposal should be specified by the student.

 

I can expand on the details of these projects.

I can also answer specific questions you have about expectations.

 

To best match you with a suitable high-impact project, to what extent are
you familiar with:

*Java and other programming languages?

*PHP?

*Apache Lucene?

*Natural Language Processing?

*Corpus Linguistics?

*WordNet?

 

The listed projects would be wrapped as services, consumed by downstream
projects, or both.

 

The corpus project is the simplest but requires a lot of attention to
detail. If successful, it would be picked up by many researchers and
companies who do not have the resources for such CPU-intensive tasks.

For WMF, it would provide us with a standardized body of text for future
NLP work. A part-of-speech tagged corpus would be immediately useful for an
80% accurate word sense disambiguation in the search engine.

 

Automatic summaries are not a strategic priority, AFAIK:

1. Most articles provide a kind of abstract in their intro.

2. Something like this is already provided in the dumps for Yahoo.

3. I have been using a great pop-up preview widget in Wiktionary for a
year or so.

 

I do think it would be a great project for learning how to become a
MediaWiki developer, but it is small for a GSOC project.
However, I cannot speak for Jebald and other mentors in the cellular and
other teams who might be interested in this.



If your easy grader is working, it could be the basis of another very
exciting GSOC project aimed at article quality.

An NLP-savvy "smart" article quality assessment service could improve or
expand on the current bots that grade articles.
Grammar and spelling are two good indicators (features). However, a full
assessment of Wikipedia articles would require more features, both
stylistic and information-based. Once you have covered sufficient
features, building discriminators based on samples of graded articles
would require some data mining ability.

 

However, since there is an existing bot undergoing upgrades, we would have
to check with its small dev team what it is currently doing, and the
project would be subject to community oversight.

 

Yours Sincerely,

 

Oren Bochman

 

MediaWiki Search Developer

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
