Really interesting thread... Thanks Massimo for the clear explanation
of web2py's pros and cons... I also like the lookup table cache trick...

Richard


On Sat, Dec 29, 2012 at 5:30 PM, Massimo Di Pierro <
massimo.dipie...@gmail.com> wrote:

> The problem is that SQLA is, from this perspective, lower level. You can
> use scripts/extract_pgsql_models.py to introspect the database and "guess"
> the model, but while this information is sufficient for SQLA, it is not
> sufficient for web2py. Web2py needs to know more about a column than is
> known to the database. For example, a varchar field could be a hashed
> password, or a serialized list of references, etc. Introspection does not
> help you except in trivial cases.
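>
> For example (a minimal sketch; the 'account' table and its field names
> are made up for illustration), both fields below would be plain varchar
> columns at the database level, and only the model tells web2py how to
> treat them:
>
>     from gluon.dal import DAL, Field
>     db = DAL('sqlite://storage.sqlite')
>     db.define_table('account',
>         # stored as varchar, but web2py knows to treat it as a
>         # hashed password and render it with a password widget
>         Field('password', 'password'),
>         # stored as varchar, but actually a serialized list of ids
>         Field('friends', 'list:reference account'))
>
> No amount of introspection can recover those two field types from the
> schema alone.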
>
> Massimo
>
>
>
> On Saturday, 29 December 2012 15:30:10 UTC-6, Michele Comitini wrote:
>
>> This thread reminds me of a feature I would like to see in the DAL.
>>
>> What the DAL is missing is an introspection feature SQLAlchemy has:
>> legacy db introspection. So with the DAL the developer always needs to
>> write the metadata to describe the db schema.
>>
>> I faced that with a fairly complex sqlite schema. I hate to write things
>> that are already there ;-) ... so I *had* to make a simple script to
>> translate the DDL of an sqlite schema into proper DAL data definition
>> instructions; it can be found under the scripts directory in recent
>> web2py versions. The DAL should have introspection and eventually build
>> in-memory cached table definitions on the fly. Optionally the developer
>> could decide to make those table definitions persistent, to reduce
>> start-up times and allow customizations.
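>>
>> Very roughly, the idea of such a script (this is not the actual one
>> under scripts/, just a sketch of the approach; the type mapping is
>> simplified):
>>
>>     import sqlite3
>>
>>     # crude sqlite -> DAL type mapping
>>     TYPEMAP = {'INTEGER': 'integer', 'TEXT': 'string',
>>                'REAL': 'double', 'BLOB': 'blob'}
>>
>>     def sqlite_to_dal(dbfile):
>>         conn = sqlite3.connect(dbfile)
>>         tables = [r[0] for r in conn.execute(
>>             "SELECT name FROM sqlite_master WHERE type='table'")]
>>         for table in tables:
>>             fields = []
>>             for cid, name, ctype, notnull, dflt, pk in conn.execute(
>>                     "PRAGMA table_info(%s)" % table):
>>                 if name == 'id':
>>                     continue  # the DAL adds its own id field
>>                 dal_type = TYPEMAP.get(ctype.upper(), 'string')
>>                 fields.append("Field(%r, %r)" % (str(name), dal_type))
>>             print("db.define_table(%r,\n    %s,\n    migrate=False)" %
>>                   (str(table), ',\n    '.join(fields)))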
>>
>> mic
>>
>>
>>
>> 2012/12/29 Massimo Di Pierro <massimo....@gmail.com>
>>
>> To better illustrate my previous point I built an example which may be
>>> useful to others who are concerned about performance.
>>>
>>> https://dl.dropbox.com/u/18065445/Tmp/a1000.zip
>>>
>>> It contains two apps, a1000_1 and a1000_2.
>>>
>>> Both apps define 1000 tables, each with 10 columns (no attributes,
>>> validators, widgets, no auth). They have one controller which exposes a
>>> grid: http://..../a1000_1/default/index/t00001 gives you the grid for
>>> table t00001.
>>>
>>> a1000_1 uses normal web2py models (defined in db.py and db_tables.py)
>>> but is bytecode compiled, uses lazy_tables, and disables migrations. On
>>> my slow laptop it serves one page in 0.9 seconds. It means it takes less
>>> than 0.9ms to define each lazy table.
>>>
>>> a1000_2 defines all tables in modules/db_modules.py and imports db in
>>> models/db.py. Now it takes 80ms/req. More than 50% of that is because of
>>> the complex template and cookie/session/language logic. Basically there
>>> is no overhead in creating the 1000 tables.
>>>
>>> While web2py defaults to a1000_1, other frameworks default to a1000_2,
>>> but web2py can do that too!
>>>
>>> What are the drawbacks of the second approach? There are two, and they
>>> are not web2py-specific.
>>>
>>> 1) The web server occasionally creates new processes. They will need to
>>> reload the module. This takes time, and some users will experience the
>>> slowdown.
>>> 2) Now the db object is shared by all requests (not the connections;
>>> requests still get separate db sessions/transactions). This means you
>>> cannot alter the attributes of models (db.t00001.f001.readable=False)
>>> because this would affect all concurrent requests and following
>>> requests.
>>>
>>> Web2py makes you pay a price in order to have one new clean db object at
>>> every request. You pay this price so that you do not have to worry about
>>> changing its state and affecting other requests. This costs ~1ms/table on
>>> my laptop.
>>>
>>> With conditional models this is a non-issue, since you only pay the
>>> price for the tables you use in the action.
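>>>
>>> For example (a sketch of the convention; the file names below are made
>>> up):
>>>
>>>     models/db.py                  # executed at every request
>>>     models/default/tables.py      # executed only for the default controller
>>>     models/default/index/big.py   # executed only for default/index
>>>
>>> Tables defined in models/default/tables.py cost nothing in requests
>>> served by other controllers.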
>>>
>>> Hope I am making some sense.
>>>
>>> Massimo
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Saturday, 29 December 2012 11:30:19 UTC-6, Massimo Di Pierro wrote:
>>>>
>>>> TL;DR: if you can do it with others you can do it with web2py, although
>>>> the default behavior may be different.
>>>>
>>>> There are some important differences between the web2py DAL and
>>>> SQLAlchemy: 1) SQLAlchemy has better support for non-integer primary
>>>> keys (the DAL supports them too, but they make automatic form handling
>>>> very difficult, so we want to discourage that); 2) SQLAlchemy allows
>>>> multiple database sessions per thread over the same connection (in
>>>> web2py, if you want to use multiple sessions in the same thread, you
>>>> need multiple connections; I have never seen a use case for this);
>>>> 3) frameworks which use SQLAlchemy define tables only once, when the
>>>> program starts, while web2py defines tables at every request. They are
>>>> both equally reliable and fast.
>>>>
>>>> 3) causes a performance penalty for web2py compared with other
>>>> frameworks when a large number of tables is present. Yet this is not a
>>>> DAL limitation. It is because of the way web2py chooses to use the DAL:
>>>> we want to allow different http requests to see changes in the schema.
>>>> This is part of the price for the easy-to-use approach. Using
>>>> conditional models goes a long way towards reducing this overhead. If
>>>> you use the DAL from a Python script or a background process of another
>>>> framework (for example Tornado), you do not have the overhead problem.
>>>> Even in web2py there are ways to avoid it completely, although they are
>>>> not documented yet (define tables at top-level in a module, import db
>>>> from the module, then call db.reconnect() in models).
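>>>>
>>>> A minimal sketch of that undocumented pattern (the file names, table
>>>> names and connection string are just examples):
>>>>
>>>>     # modules/db_modules.py -- imported once per process
>>>>     from gluon.dal import DAL, Field
>>>>     db = DAL('sqlite://storage.sqlite', pool_size=10,
>>>>              migrate=False, lazy_tables=True)
>>>>     db.define_table('t00001', Field('f001'), Field('f002'))
>>>>     # ... the remaining tables ...
>>>>
>>>>     # models/db.py -- executed at every request
>>>>     from db_modules import db
>>>>     db.reconnect()  # fresh connection/transaction for this request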
>>>>
>>>> Let me put it another way. In web2py you can put logic in modules or in
>>>> models. They are two types of files (one is imported, one is executed).
>>>> The more logic you put in models, the more agile it is. The more logic
>>>> you put in modules, the faster it is. Other frameworks do not make this
>>>> distinction (models are modules and are imported too) and therefore they
>>>> give you no option.
>>>>
>>>> Web2py can definitely be used to build a complex system with many
>>>> tables. As with any framework, the bottleneck will always be database
>>>> access, so you need lots of caching. Whether it is the right system to
>>>> interface with an existing database I cannot say; it depends on the
>>>> existing database, but you will have issues with any framework you
>>>> choose.
>>>>
>>>> Massimo
>>>>
>>>>
>>>>
>>>> On Saturday, December 29, 2012 11:02:34 AM UTC-6, Alex Glaros wrote:
>>>>>
>>>>> Hi Lazaro and Massimo,
>>>>>
>>>>> The Fi$cal project is not my project; I don't know how many tables it
>>>>> has, but it is an example of highly relational financial software,
>>>>> chosen so that web2py members could understand my question clearly.
>>>>> Most projects will be in strict 3rd Normal Form.
>>>>>
>>>>> Let me be more transparent regarding my goals.
>>>>>
>>>>> I have a nonprofit government improvement organization
>>>>> (http://www.gov-ideas.com/) and part of our mission is to recommend
>>>>> software architecture to government. I am evaluating web2py as a
>>>>> candidate for prototyping applications. I'm very excited that
>>>>> Massimo's vision of lowering the average person's programming entry
>>>>> barrier also works to improve government by allowing rapid
>>>>> prototyping.
>>>>>
>>>>> The next step is to assess how best to migrate prototypes to
>>>>> production. There may be no further steps necessary, or, for large
>>>>> projects like Fi$cal, there may be a consensus choice such as moving
>>>>> the entire project to Python/SQLAlchemy, which provides the most
>>>>> stability while allowing some web2py components to be copied for
>>>>> reuse.
>>>>>
>>>>> It is a significant responsibility to make these recommendations and I
>>>>> need many diverse advisors to select products that reduce risk in
>>>>> government software development.
>>>>>
>>>>>
>>>>>    1. At this time is there a consensus regarding the top candidates
>>>>>    for open source business-oriented, highly relational software tools?
>>>>>    2. What would the top 10 look like?
>>>>>    3. What are the reasons that each one was selected?
>>>>>    4. At what points or areas would other products surpass web2py?
>>>>>
>>>>>
>>>>> Any comments would be much appreciated,
>>>>>
>>>>> Alex
>>>>>
>>>>> On Saturday, December 29, 2012 8:47:42 AM UTC-8, Massimo Di Pierro
>>>>> wrote:
>>>>>>
>>>>>> I should add that in the system I looked at, almost all tables were
>>>>>> de-normalized. All tables storing a key would also store the value
>>>>>> corresponding to the code. This is for two reasons: 1) if you have
>>>>>> thousands of tables you cannot join everything all the time; 2) for
>>>>>> auditing purposes it should be possible to change a value in a lookup
>>>>>> table without changing the corresponding previous values of records
>>>>>> created before.
>>>>>>
>>>>>> This means that while I needed the lookup tables for the IS_IN_SET()
>>>>>> and IS_IN_DB() validators, I never needed more than two joins.
>>>>>>
>>>>>> Again, the whole design seems crazy to many of us, but it has its
>>>>>> plus sides from an auditing point of view.
>>>>>>
>>>>>> Massimo
>>>>>>
>>>>>> On Friday, 28 December 2012 22:39:51 UTC-6, Massimo Di Pierro wrote:
>>>>>>>
>>>>>>> I have some experience with a large PeopleSoft system which claims
>>>>>>> thousands of tables. It turns out almost everything I needed was in
>>>>>>> fewer than 10 tables.
>>>>>>>
>>>>>>> Basically most of the tables were lookup tables, used simply as a
>>>>>>> key:value store for the possible values of a field. For example, a
>>>>>>> table GENDER {'M':'Male','F':'Female','O':'Other'}.
>>>>>>>
>>>>>>> So 10 tables with 100 columns each and one lookup table for each
>>>>>>> column gives you more than 10,000 tables. I suspect that is your
>>>>>>> case too.
>>>>>>>
>>>>>>> All systems that claim an insane number of tables belong to this
>>>>>>> category.
>>>>>>>
>>>>>>> The way to handle it is to load all lookup tables into a cache and
>>>>>>> use the cache instead of database access to convert key<->value. In
>>>>>>> fact the values in lookup tables almost never change.
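>>>>>>>
>>>>>>> As a sketch of this trick in a web2py model file (where db, cache
>>>>>>> and IS_IN_SET are available; it assumes every lookup table has
>>>>>>> 'code' and 'value' fields, and the 'person' table is invented):
>>>>>>>
>>>>>>>     def lookup(table):
>>>>>>>         # load the whole lookup table once; keep it in RAM for a day
>>>>>>>         return cache.ram('lookup_' + table,
>>>>>>>             lambda: dict((r.code, r.value)
>>>>>>>                          for r in db().select(db[table].ALL)),
>>>>>>>             time_expire=86400)
>>>>>>>
>>>>>>>     genders = lookup('gender')  # no db hit after the first call
>>>>>>>     db.person.gender.requires = IS_IN_SET(genders)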
>>>>>>>
>>>>>>> I suggest that before you embark on this venture you do the
>>>>>>> following exercise: make a list of all table names; for each table,
>>>>>>> make a list of the fields in the table and count the number of
>>>>>>> records (more or less).
>>>>>>>
>>>>>>> You will find many tables with fewer than 100 records and fewer
>>>>>>> than 10 columns. You will find a few tables with more than 10
>>>>>>> columns and more than 100000 records. You need to find out how many
>>>>>>> tables belong to one category and how many to the other.
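>>>>>>>
>>>>>>> A rough sketch of that exercise for an sqlite database (adapt the
>>>>>>> catalog queries and the file name to your actual backend):
>>>>>>>
>>>>>>>     import sqlite3
>>>>>>>
>>>>>>>     conn = sqlite3.connect('legacy.db')
>>>>>>>     tables = [r[0] for r in conn.execute(
>>>>>>>         "SELECT name FROM sqlite_master WHERE type='table'")]
>>>>>>>     for t in tables:
>>>>>>>         ncols = len(list(conn.execute("PRAGMA table_info(%s)" % t)))
>>>>>>>         nrows = conn.execute(
>>>>>>>             "SELECT COUNT(*) FROM %s" % t).fetchone()[0]
>>>>>>>         kind = 'lookup' if nrows < 100 and ncols < 10 else 'data'
>>>>>>>         print(t, ncols, nrows, kind)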
>>>>>>>
>>>>>>> If this is the case, as I suspect, then you can use web2py, but you
>>>>>>> need to set up some clever caching system to handle the lookup
>>>>>>> tables. It would be the same with other frameworks, since you don't
>>>>>>> want to join everything all the time or your database will grind to
>>>>>>> a halt.
>>>>>>>
>>>>>>> It is also possible I am completely wrong in my assumption.
>>>>>>>
>>>>>>> In the case of the PeopleSoft system I studied, they were also
>>>>>>> storing all past versions of each record in the same table as the
>>>>>>> current record. Basically every record had two dates (valid_from,
>>>>>>> valid_until). Current records had valid_until set to 2999-12-31.
>>>>>>> Records would never be modified. The process for modifying a record
>>>>>>> consisted of creating a copy of the current record, editing the
>>>>>>> copy, setting valid_until=now for the previously current record, and
>>>>>>> updating all references pointing to the record. Of course all tables
>>>>>>> used the same mechanism for versioning, thus making the update
>>>>>>> process very slow and cumbersome, and all tables unnecessarily
>>>>>>> large. Yet this simplifies auditing, because you can go back to any
>>>>>>> moment in time simply by filtering records in a query.
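>>>>>>>
>>>>>>> A sketch of that versioning scheme in DAL terms (the 'employee'
>>>>>>> table and the dates are invented for illustration):
>>>>>>>
>>>>>>>     from gluon.dal import DAL, Field
>>>>>>>     import datetime
>>>>>>>
>>>>>>>     db = DAL('sqlite://storage.sqlite')
>>>>>>>     FOREVER = datetime.date(2999, 12, 31)
>>>>>>>     db.define_table('employee',
>>>>>>>         Field('name'),
>>>>>>>         Field('valid_from', 'date'),
>>>>>>>         Field('valid_until', 'date', default=FOREVER))
>>>>>>>
>>>>>>>     # current records only
>>>>>>>     current = db(db.employee.valid_until == FOREVER).select()
>>>>>>>
>>>>>>>     # audit: the state of the table at any past moment
>>>>>>>     day = datetime.date(2010, 6, 1)
>>>>>>>     snapshot = db((db.employee.valid_from <= day) &
>>>>>>>                   (db.employee.valid_until > day)).select()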
>>>>>>>
>>>>>>> The reason I am explaining all of this is that you are probably
>>>>>>> going to have to deal with something like this. The problem is not
>>>>>>> web2py vs other frameworks. The problem will be that you need
>>>>>>> special logic to handle those tables, which is foreign to web2py and
>>>>>>> to many modern frameworks which simply assume more modern database
>>>>>>> design practices.
>>>>>>>
>>>>>>> My suggestion is to start small and see what happens. Find out who
>>>>>>> your primary target users are. Find which tables they need to access
>>>>>>> and create a web interface for those tables. You will probably be
>>>>>>> able to factor the interaction with the database into many small
>>>>>>> apps.
>>>>>>>
>>>>>>> Massimo
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Friday, 28 December 2012 09:38:25 UTC-6, Alex Glaros wrote:
>>>>>>>>
>>>>>>>> Can web2py be used for highly complex relational databases for
>>>>>>>> large fiscal projects? Example: California's Fi$cal project -
>>>>>>>> http://www.fiscal.ca.gov/ - with roughly 10,000 tables and many
>>>>>>>> complex joins.
>>>>>>>>
>>>>>>>> What components of web2py would start to get slow or not work well
>>>>>>>> with so many tables?
>>>>>>>>
>>>>>>>> If web2py would instead be better used to prototype the Fi$cal
>>>>>>>> system, what would be good production-version candidates to migrate
>>>>>>>> to? Pure Python using SQLAlchemy? Java? Anything that would make
>>>>>>>> migration easier, such as Python-based frameworks?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Alex Glaros
>>>>>>>
