> I've been writing projects
> for university and for a computer lab I work at, but it's mostly small,
> one-off sysadmin things and usually the emphasis is more on "xyz server
> has to be back up before we open tomorrow" than writing good, clean code.
> So, yes, I'd welcome other suggestions.

Cool!  So, I'm assuming you're looking forward to an opportunity to write
good, clean code as a summer project.  :)

> There are ways to make [Python-based extensions] run faster if
> performance is a concern. For example, mod_python or mod_wsgi, or
> explicitly pulling the Python out into a standalone daemon that listens
> for requests from the webserver.

Personally, I'd avoid trying to make that pitch for a GSoC project.  While
you're right that Python is a pretty defensible choice when embarking on a
large project, trading one dependency for another for this size/scale of
project won't be as compelling as eliminating a dependency altogether.

Of course, as I say that, I see Platonides disagrees with me here.  Choosing
Python is not a huge disadvantage in this context, but it's not going to
have the same unanimous(-ish) approval that using PHP would.

> Another possibility would be writing it in C to avoid all interpreter
> overhead, and using a foreign function interface. Unfortunately, I'm not
> familiar with PHP's FFI. Google takes me to
>    http://wiki.php.net/rfc/php_native_interface
> which seems to think that as of a year ago there weren't any good ones,
> but this doesn't look too painful:
>    http://theserverpages.com/php/manual/en/zend.creating.php

I think straight PHP would be fine for this particular project.  The
downside of a C implementation is that, while it's almost certainly going to
have the best performance characteristics, it also makes it more likely to
fall into disrepair and be a possible source of buffer overruns and other
security issues.

The nice thing about a PHP port (if done correctly) is that it would be a
trivial install for small wikis and Wikipedia alike.  That translates into
more usage, which in turn translates into a higher likelihood that it stays
maintained.

That said, there have got to be a ton of projects that could benefit from
PHP->native C bindings.  I'm going to leave it to some other folks to
suggest projects in this area.

> I'm most familiar with Python and C, for whatever that's worth coming
> from an undergrad who didn't know Python existed five years ago. I
> learned PHP to maintain the web interfaces of an in-house print system
> at work, but I haven't used it for anything as involved as what we're
> discussing here. So, in terms of productivity, yes, if I have to work in
> PHP my mentor will probably get asked a few more newbie questions.
> In terms of happiness, though, it'd be a great opportunity to dig into
> PHP and finally learn to use it as more than really smart CSS with a
> database connection. Although I prefer Python or even C because I think
> I'd be more useful, I wouldn't be very upset at all if it turned out you
> guys were willing to let me learn PHP on your time.

There are a few Python-based things that might be interesting, but I think
you'll get a lot more love for doing something in PHP or C.  Since this is a
student internship, you shouldn't be bashful about using this as a learning
opportunity.

I'd only caution against convincing yourself (and us) that you'll be more
interested in learning something like PHP than you truly are.  It might help
you land a spot, but it will work against you in having a successful
project, and this has such high visibility that you'll really want to be
successful.  So, if you find yourself thinking about doing this in PHP and
having your inner voice say "meh", then I'd recommend sticking to your guns
and proposing this or something else in Python and/or C.

> > 2.  Are you zeroing in on <math> parsing and parsing in general because
> > that's an area that you're already developing expertise in and/or are
> > deeply interested in getting into, or is that just something that looked
> > kinda interesting to learn about relative to other opportunities you
> > considered?
> I like the <math> parsing project because it seems well-suited for a
> third-year undergrad who knows LaTeX and reads a few other functional
> languages and has studied lex/yacc before in his coursework. The goals
> are clear, and I know how to break them down into smaller problems and
> how to tackle each one. It's a little isolated from the rest of
> Mediawiki, so I don't need to grok the entire code base.
> Basically, this looks like a way to make a concrete contribution despite
> being a newcomer to the project. That doesn't mean I'm not happy to
> entertain alternatives, just that they have a pretty high bar to clear.

This is a really smart way of thinking about it, and it's great that
you're sizing up the project scope so carefully.  I agree with you
that finding something reasonably well-contained is going to be the best
strategy for success.

> > 3.  Are you coming at this as someone who is already deep into
> > Wikipedia/MediaWiki usage who is looking to resolve particular things
> > (like <math> parsing) that are painful as an end user, or are you more
> > casually involved and more interested in applying in this project
> > because it looks like we've got a lot of interesting programming
> > problems to solve?
> The second. I just want to tackle a problem that's near but not quite
> beyond my limits, and if I can help out a site I use daily, so much the
> better.

Wonderful!  Great reason to get involved!

