On Tue, Jun 30, 2009 at 12:56 PM, Aryeh Gregor <simetrical+wikil...@gmail.com> wrote:
> On Tue, Jun 30, 2009 at 12:16 PM, Brion Vibber<br...@wikimedia.org> wrote:
>> * PHP
>>
>> Advantage: Lots of webbish people have some experience with PHP or can
>> easily find references.
>>
>> Advantage: we're pretty much guaranteed to have a PHP interpreter
>> available. :)
>>
>> Disadvantage: PHP is difficult to lock down for secure execution.
>
> I think it would be easy to provide a very simple locked-down version,
> with most of the features gone.  You could, for instance, only permit
> variable assignment, use of built-in operators, a small whitelist of
> functions, and conditionals.  You could omit loops, function
> definitions, and abusable functions like str_repeat() (let alone
> exec(), eval(), etc.) from a first pass.  This would still be vastly
> more powerful, more readable, and faster than ParserFunctions.
>
> Hopefully, we could make this secure enough for your average
> shared-host website to run it by default with no special measures
> taken and without much risk.  Installations with more access and
> higher security requirements, like Wikimedia, could shell out to a
> process that's sandboxed on the OS level to be on the safe side.  I'd
> like to hear what Tim thinks about the possibility of securing PHP
> like this.
>
> Of course, PHP is evil, and supporting it sucks.  :(  But if we
> *really* *really* need to support users who can't shell out to other
> programs, I think it's the only real language that's a feasible
> solution.
>
>
> I'd encourage you to consider requiring exec() support for full use of
> Wikipedia templates, though.  Many really big shared hosts allow it,
> like 1and1.com.  Anyone big enough to include much Wikipedia content
> will likely be on at least a VPS anyway.  And if your host doesn't
> support exec(), then at *worst* you can still get the articles in a
> totally usable form -- just run Special:ExpandTemplates on all the
> article's templates.  You can then transclude those on a per-article
> basis; we could update Special:Export to make this easier.  The only
> problem in this case would be that you can't easily change the
> formatting of all the templates at once -- but such a small site would
> likely have few enough articles to do it by hand, if they even want
> to.
>
> I think saying that users without exec() support get to use Wikipedia
> content in a somewhat less usable form would be just fine, and it
> would *really* open up our options.  We could support basically any
> programming language in that case.
>
>> * Python
>>
>> Advantage: A Python interpreter will be present on most web servers,
>> though not necessarily all. (Windows-based servers especially.)
>>
>> Wash: Python is probably better known than Lua, but not as well as PHP
>> or JS.
>>
>> Disadvantage: Like PHP, Python is difficult to lock down securely.
>
> It doesn't matter whether it's present, does it?  If the user has
> exec() support, they could download a binary interpreter for *any*
> language to their webspace and run it from there regardless of whether
> the language is supported on the host.  So Python is on exactly the
> same level as Lua here.
>
> Much though I love Python, Lua looks like the better option.  First of
> all, it's *very* small.  sudo apt-get install lua50 on my machine uses
> up only 180 KB of disk space, and the package is 30 KB gzipped.  Our
> current tarballs are 10 MB; we could easily just chuck in Lua binaries
> for Linux x86-32 and Windows without even noticing the size increase,
> and allow users to enable it with one line in LocalSettings.php.  By
> contrast, python2.6 is around 10 MB uncompressed, 2.5 MB compressed.
> Perl is twice that size.  Windows users, or users with exec() allowed
> but open_basedir preventing access to /usr/bin, would have to obtain
> Python/Perl/etc. themselves.
>
> It looks to me like Lua would be a lot easier to sandbox.  It seems
> pretty simple to deny all I/O within the language itself, so you'd
> (hopefully) just need memory and CPU limits.  Both of those could be
> implemented on Linux with hard setrlimit() values plus nice.  Similar
> things exist on Windows, hopefully accessible by command line somehow.
> If we're shipping binaries with MediaWiki, we could even hack the
> code if necessary, to use whatever sandboxing mechanisms the OS makes
> available, although hopefully that would be unneeded.
>
> I don't think we should fixate too much on how many people know the
> language.  It's not hard to pick up a new language if you already know
> one, and Lua has the reputation of being simple (although I haven't
> tried to learn it).  I think Lua is the best option here.
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
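
For concreteness, the "hard setrlimit() values plus nice" idea quoted
above might look roughly like the sketch below.  It is written in
Python purely for illustration and is Unix-only; run_limited,
_apply_limits and the two limit constants are hypothetical names and
values, not anything that exists in MediaWiki.

```python
import os
import resource
import subprocess
import sys

CPU_SECONDS = 1                    # hypothetical CPU-time cap, in seconds
MEMORY_BYTES = 512 * 1024 * 1024   # hypothetical address-space cap


def _apply_limits():
    # Runs in the child process after fork() and before exec().
    # Soft and hard limits are set to the same value, so the child
    # cannot raise them again.
    resource.setrlimit(resource.RLIMIT_CPU, (CPU_SECONDS, CPU_SECONDS))
    resource.setrlimit(resource.RLIMIT_AS, (MEMORY_BYTES, MEMORY_BYTES))
    os.nice(10)  # lower the scheduling priority as well


def run_limited(argv, stdin_text=""):
    """Run argv under the limits above; return (returncode, stdout)."""
    proc = subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,
        text=True,
        preexec_fn=_apply_limits,
        timeout=30,  # wall-clock backstop in case the child just sleeps
    )
    return proc.returncode, proc.stdout
```

A caller would pass the interpreter's command line as argv.  On Linux
a child that exceeds the CPU limit is killed by a signal, which shows
up here as a negative return code.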

In addition to resource limits, any scheme had better make sure that
what's passed into the programming language and what's passed out
makes sense.  For example, you shouldn't have it generating raw HTML,
and you probably shouldn't let it mess with strip markers.  Some of
this may be automatic, depending on how it's integrated into the
parser.  One would probably also want to limit the size of the
allowed output (e.g. don't let it send 5 MB to the user).  Depending
on the integration, there may be other control sequences that one
needs to catch when it returns as well.
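
A minimal sketch of that kind of output check, in Python for
illustration only.  The names check_script_output, RejectedOutput,
MAX_OUTPUT_BYTES and MARKER_CHAR are hypothetical, the size cap is
arbitrary, and the sketch assumes that strip markers can be recognised
by a DEL (0x7f) control character.  Escaping every "<" is also cruder
than a real integration would want, since the parser permits some HTML
tags in wikitext.

```python
import html

MAX_OUTPUT_BYTES = 64 * 1024   # hypothetical cap, well under the 5 MB example
MARKER_CHAR = "\x7f"           # assumption: strip markers contain a DEL character


class RejectedOutput(Exception):
    """Raised when a script's return value fails one of the checks."""


def check_script_output(text):
    """Validate text returned by a sandboxed script and neutralise HTML."""
    if len(text.encode("utf-8")) > MAX_OUTPUT_BYTES:
        raise RejectedOutput("output too large")
    if MARKER_CHAR in text:
        raise RejectedOutput("output contains a strip-marker character")
    # Escape &, < and > so the script cannot emit raw HTML.
    return html.escape(text, quote=False)
```

The checks run on the way out of the sandbox, before the result is
handed back to the parser, so they apply regardless of which scripting
language ends up being chosen.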

On a separate point, one of the limitations of stand-alone sandboxes
is that they would make it harder for the code to call other template
pages.  One of the few virtues of the current template code is that
it is relatively modular, with more complex templates being built out
of less complex ones.  If this programming language is meant to
replace that, then it would also need to be able to reference the
results of other template pages.  One solution is to pre-expand those
sections (similar to what is done now, I believe), but that can get
rather delicate once one has programming constructs like variable
assignment, looping, and recursion, since the template parameters
won't necessarily be fixed at the Preprocessor stage.

-Robert Rohde
