[web2py:27856] Re: Fwd: My thoughts on web2py

mdipierro Sun, 02 Aug 2009 12:21:41 -0700

I am about to catch a plane so I will be short. We can talk more about
this in a couple of days.


> > First a word of warning.  I send you this mail in private because I  
> > want
> > to share my thoughts on the web2py code and because I hope I can help
> > with that somehow.  You're free to publish this mail if you want.

This is constructive criticism so I appreciate it.

> > As you might remember I came in contract with web2py a long ago when  
> > you
> > were evaluating pygments as a syntax highlighter for the web2py
> > documentation or something and you decided to implement your own  
> > because
> > you were unable to "ship" pygments as part of web2py.  As far as I
> > remember py2app or py2exe were not able to pick up the on-demand  
> > imports
> > in the lexer module.
>
> > I have been watching the development ever since because (And please
> > don't think bad of me because of that) it appeared to me that web2py  
> > was
> > a joke.  I remember reading some very early revisions of your PDF
> > documentation where I remember a code that looks something like that:
>
> >    db.define_table('something', db.Field('meh', 'date',
> >                    default=date.today()))
>
> > That line freaked me out.  This line alone showed me that web2py is
> > doing something very, very weird.  From this line I could see that it
> > means one of the following two things:
>
> > - either the code has a bug for long running applications (in this  
> > case
> >  when the application code runs longer than a day in which case the
> >  default would have to change dynamically)
>
> > - or the global scope has different semantics and is reevaluated each
> >  request.
>
> > Turns out the latter is what happens in web2py.

Correct.

> There still is a race
> > condition but only a small one.

No there is not. Why do you think there is a race condition?

> The bigger problem with this  
> > however is
> > that this means the (not so) static definitions are reevaluated each
> > request.

Correct. A small price to pay for a lot of flexibility.
We are working on lazy evaluation of tables.
There is also a speedup if you press the "compile" button in admin.

> > Especially for the database table definitions (which cause many  
> > function
> > calls) this means an enormous per-request penality if you have many of
> > them.  I even remember reading a case like this in the web2py  
> > newsgroup
> > about someone who had twenty table definitions or something and the
> > majority of the request time was spent setting up the tables.

There was a slow down problem that has been fixed by Alexey. I agree
there is still room for improvement.

> > However that also means another thing.  Standard Python  
> > optimizations no
> > longer apply.  In Python usually modules are cached in an interpreter
> > bound dictionary for reusing.  This of course does not apply for  
> > web2py.
> > However the second problem here is that only the local scope gets
> > certain optimizations in cpython such as fast-local lookup.  Instead  
> > of
> > storing the variables in a dictionary they are instead stored in an an
> > object where items are looked up by index (fast locals) rather than  
> > by a
> > string.
>
> > Now of course one can argue that this is the way it's intended.  
> > That's
> > nice but for that, the documentation is lacking.

Yes it is true that documentation is lacking.

>  There is no
> > explanation of what implications this has on the code.  And in
> > comparison with "enterprise ready" that gives me a strange feeling  
> > about
> > the project.
>
> > I'm not even exactly sure what kills web2py for me in the end, but I'm
> > pretty sure that part of the reason is the way the project is  
> > advertised.

We prefer to talk about "advocating" not "advertising" since we do not
sell it.
Mistakes were made on my side.

> > web2py does not have worse code than many popular libraries out there.

Thanks. I take is as a compliment.

> > BeautifulSoup comes to mind.  That library does some really ugly  
> > things
> > there including monkeypatching of the standard library.  It does
> > different things depending on the version of Python it's running on  
> > and
> > in the code are some really ugly idioms like a unicode subclass that  
> > has
> > a reference to the parsed tree that results in memory leaks very  
> > easily
> > because it's nearly impossible to get rid of the reference (actually
> > just by exploiting some weird Python internals that are mostly
> > undocumented).  But the author of BeautifulSoup does not advertise  
> > it as
> > a backwards-compatible, enterprise ready HTML parser.
>
> > The web2py code on the other hand has some really problematic idioms  
> > as
> > well.  The execution of models, views and whatnot in a made-up  
> > namespace
> > comes to mind.

Yes. It was a design decision to be closer to Rails than Django in
this respect. I think this is what is making web2py popular, more than
anything else.

> Due to the decision of being backwards compatible it
> > will be nearly impossible to "fix" those things.

We think of them as features, not bugs.

> However there are so
> > many implications nobody yet knows about which is terribel for large
> > projects.
>
> > For example the database table redfinition problem cited above.  Of
> > course that particular problem can be worked around with making the
> > definition code internally lazy or something but there can be more
> > nobody yet knows about.

Alexey has already done this. It achieves a 10-20% speedup. I am
rewriting the DAL completely to make it easier to port on other
backends and faster. The new one may support lazy evaluations of
tables. Time permitting,

> > Especially side-effects of code execution can do some terrible damage.
> > Decorators that would register code would instantly leak for example.
> > And some libraries provide auto-registration that you would not
> > instantly know about.

Can you provide an example of what you mean? I am pretty sure there is
no leak.

> > Also libraries like pickle do not support weird situations like this.
> > So I guess in the end I'm not (only) complaining about the web2py code
> > or decisions there, but mainly how web2py is advertised as enterprise
> > ready when nobody knows how these idioms scale or behave.

I disagree with "nobody knows how these idioms scale or behave"?
We have been doing this for 2 years. We know very well how it works
and scales.
It actually works very well because as soon as a request is completed
the environment is deleted, this is clean, avoids conflicts when
multiple apps are running, and prevents memory leaks.

> > Another part is that web2py (and this is something web2py has in  
> > common
> > with so many other web frameworks) is the craze to reimplement
> > everything.  I know you don't want to pull depedencies in but from  
> > what
> > I can see you're providing one-click installers for the framework  
> > which
> > could ship external dependencies.  Both py2exe and py2app provide
> > support for external dependencies and I can't see what "not  
> > depending on
> > webob/werkzeug" gives you there.

web2py started as a teaching tool. It was important (and still is)
that I can go over every line of code with my students know exactly
why it is there.

> > The reason why I'm addressing this is that there are so many
> > limitations, problems and weird aspects in WSGI and HTTP that are hard
> > to tackle and many people get wrong.

I agree.

>  I'm not saying you're getting it
> > wrong, I haven't even studied the code in question, but I'm pretty  
> > sure
> > it would make sense to collaborate on that.

I like the idea. Some conditions have to be met:
1) we have a new contributor license agreement on web2py-developers
2) patches must not break backward compatibility
3) patches must either make the code smaller/faster, or add a needed
functionality.

> > About the code quality itself, not the (IMO) broken concepts I picked
> > some of the snippets I found quickly searching with the help of ack:
>
> > - Missing `close()` calls with files (ignored matches from  
> > compileapp):
>
> >  gluon/fileutils.py:160:    f.write(open(tarname, 'rb').read())
> >  gluon/fileutils.py:246:        data = open(filename, 'rb').read()
> >  gluon/highlight.py:313:    data = open(sys.argv[1]).read()
> >  gluon/languages.py:37:    lang_text = open(filename, 'r').read()...
> >  gluon/languages.py:153:        data = open(file, 'r').read()
> >  gluon/main.py:107:web2py_version = open(os.path.join(web2py_path, ...
> >  gluon/rewrite.py:26:        exec open('routes.py', 'r').read() ...
> >  gluon/template.py:104:            text = open(os.path.join(path, ...
> >  gluon/template.py:118:            parent = open(t, 'rb').read()
> >  gluon/template.py:134:            child = open(t, 'rb').read()
> >  gluon/tools.py:1934:        return urllib.urlopen(url).read()
> >  gluon/widget.py:55:ProgramVersion = open('VERSION', ...
>
> > - Same in streamer.py for checking for file existence
> >  (open(static_file))

I do not see the problem. Files are automatically closed when the file
object goes out of scope.
Can you provide an example when it does not?

> > - Template engine seems to be missing proper escapes.  Having ''' in  
> > the
> >  template should be enough to break it.

I do not believe that is the case but I may be wrong. Can you provide
an example of what you say?

> > - Whitespace between "else" and colon in the template engine is  
> > unlikely
> >  to work with the current parsing rules as far as I can see.

I agree it will not work.

> > - xmlrpc system sends text/xml content type which is with a missing
> >  charset property set to us-ascii.  No charset is set so it has to be
> >  specified explicitly (proper header would be application/xml;
> >  charset=utf-8)

This has been fixed by user Mateusz. The patch will be incorporated
soon.

> > - Fileutils has a listdir method that does very inefficient deleting
> >  from the list.  Current code.
>
> >  50         for dir in dirs[:]:
> >  51             if dir.startswith('.'):
> >  52                 dirs.remove(dir)
>
> >  That creates a shallow copy, for each item it performs an O(n) lookup
> >  for the item and the remove is a O(n) operation as well.  Even
> >  unecessary because you have the position if you enumerate over the
> >  list.
> >  However you can do that with a list comprehension:
>
> >  50         dirs[:] = [x for x in dirs if not x.startswith('.')]

Thanks. We will do so.

> > - Having classes in local scope (fileutils.py for the tarfile) makes  
> > it
> >  incredible hard to reuse code.  Not a good idea, move that into the
> >  global scope instead.  Will also help performance.

Thanks. We will do so.

> > - There is no bug in shutil.copyfileobj.  That function works perfetly
> >  and does what the documentation says it should do.  Maybe you were
> >  copying from the wsgi input stream.  That of course won't work that
> >  easy because there is no end-of-file marker in the input stream.  See
> >  werkzeug's LimitedStream for ways how to fix that problem.  A similar
> >  class also exists in WebOb but I don't remember the name.

I am not sure what you are referring to here. Is this about a comment
in the code?

> > - You should not inherit from BaseException unless you are dealing  
> > with
> >  special exceptions that should not be catched by an "except  
> > Exception"
>
> > - there are realtive imports all over the place.  This is a deprecated
> >  unfeature, switch to "from gluon.html import blah" instead of the
> >  current "from html import blah".

This is an aesthetic issue. The current code works well with all
supported python interpreters. Anyway, I do not oppose changing it,
but it is tedious to do.

> > - the email code in tools is vulnerable to header injections if the
> >  input is unfiltered.  If this is intentional it should be documented
> >  in the code.

Can you provide an example? If this is the case, it is not
intentional.

> > - There is no proper undecoding of the path info in main.py but spaces
> >  are transformed into underscores.  However a plus is converted into a
> >  space as well which is against the RFC.  Plus is only an alias for  
> > %20
> >  in the querystring of a URL.

web2py does not allow spaces in incoming urls. It just replaces them
with _. It should also replace + with _ . Are you saying that is not
the case? Perhaps you see a problem that I do not see. can you provide
an example of when this would be a problem?

> > - request.vars is either a dict of lists or a dict of plain strings.
> >  This means the user has to perform instance checks each access, not
> >  very clever.  Other systems fix that by providing multi-value dicts.

This cannot be changed because would break backward compatibility.
I think it is fine as it is. This is how the cgi module parses vars.

> > - time.strftime should not be used for header-date formatting because
> >  it's affected by locales.  That means if a user is using locales it
> >  will break all headers sent that deal with dates.

We do specify the format string. How do you suggest we change it?

> > This list was created in ~15 minutes from quickly checking some files.
> > If one does a check on the code more things will come up.  Some things
> > are also underspecified or unclear in the code.  For example how the
> > code is supposed to deal with unicode (it does that in very different
> > ways) and how headers are quoted/unquoted, how URLs are decoded etc.
> > Also the code architecture is suboptimal in places.  Hardcoded support
> > for Google appengine in places where you wouldn't expect it (like
> > validators etc.)

Unfortunately that is hard to avoid. GAE does things differently.

> > For an "enterprise" framework this is a bit too much smelly code  
> > that is
> > found in no time.
> > What would I fix?  Stop advertising the framework as enterprise ready
> > until it matured more.  And do less advertising in general.  And if  
> > you
> > can, fix some of the things in the code.  Those that are caused by
> > design choices won't be fixed anyway.

As I mentioned we say "advocating" not "advertising". The "enterprise"
feature of web2py is not that it is perfect (and I do not know of any
enterprise application that is perfect, what would perfect mean
anyway).

By "enterprise" we mean:
- our target users are small and medium businesses and non-profits
(the term enterprise includes both)
- we understand the needs of users by guaranteeing backward
compatibility thus protecting their long term investment in web2py
(this is not negotiable)
- we care about usability and thus we try make web2py as easy as
possible.
- we fix bugs regularly, usually within 24 of when they are reported.

"enterprise" does not mean that code cannot be improved. There is no
code that cannot be improved.
I will look into your comments more in details and I encourage other
users to do so too.
I appreciate them and they will help us improve.

web2py was originally copyrighted as "Enterprise Web Framework" so
this stuck with the name. The name changed to Gluon because there is a
company called "EWF". The name changed again to web2py because there
is a company that owns the "Gluon" trademark. It is makes you feel
better feel free to read "Enterprise" as the "Star Trek Starship USS
Enterprise NCC-1701". Perhaps we should change the tagline. web2py: to
boldly go where no web framework has been before". ;-)

Thanks Armin,

Massimo
--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To post to this group, send email to web2py@googlegroups.com
To unsubscribe from this group, send email to 
web2py+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en
-~----------~----~----~----~------~----~------~--~---

[web2py:27856] Re: Fwd: My thoughts on web2py

Reply via email to