The problem is that SQLA is, from this perspective, lower level. You can 
use scripts/extract_pgsql_models.py to introspect the database and "guess" 
the model, but while this information is sufficient for SQLA, it is not 
sufficient for web2py. Web2py needs to know more about a column than is 
known to the database. For example, a varchar field could be a hashed 
password, or a serialized list of references, etc.  Introspection does not 
help you except in trivial cases.
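
For example, both of these field definitions map to plain varchar columns 
in the database, yet web2py treats them very differently. A minimal sketch 
(the table and field names are invented; in a web2py model file DAL and 
Field are already defined):

    db = DAL('sqlite://storage.sqlite')
    db.define_table('account',
        # stored as varchar, but meant to hold a hashed password
        Field('secret', 'password'),
        # also stored as varchar, but a serialized list of record ids
        Field('friends', 'list:reference account'))

Introspection sees two varchar columns and cannot tell them apart.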

Massimo



On Saturday, 29 December 2012 15:30:10 UTC-6, Michele Comitini wrote:
>
> This thread reminds me of a feature I would like to see in the DAL.
>
>  What the DAL is missing is a feature SQLAlchemy has: legacy db 
> introspection.  So with the DAL the developer always needs to write the 
> metadata that describes the db schema.  
>
> I faced that in a fairly complex sqlite schema. I hate to write things 
> that are already there ;-) ... so I *had* to write a simple script to 
> translate the DDL of a sqlite schema into proper DAL data definition 
> instructions; it can be found under the scripts directory in recent 
> web2py versions.
> The DAL should have introspection and eventually build in-memory cached 
> table definitions on the fly.  Optionally the developer could decide to 
> make those table definitions persistent, to reduce start-up times and to 
> allow customizations.
>
> mic
>
>
>
> 2012/12/29 Massimo Di Pierro <massimo....@gmail.com <javascript:>>
>
>> To better illustrate my previous point I built an example which may be 
>> useful to others who are concerned about performance.
>>
>> https://dl.dropbox.com/u/18065445/Tmp/a1000.zip
>>
>> It contains two apps, a1000_1 and a1000_2.
>>
>> Both apps define 1000 tables, each with 10 columns (no attributes, 
>> validators, or widgets, and no auth). They have one controller which 
>> exposes a grid: http://..../a1000_1/default/index/t00001 gives you the 
>> grid for table t00001.
>>
>> a1000_1 uses normal web2py models (defined in db.py and db_tables.py) 
>> but it is bytecode compiled, uses lazy_tables, and disables migrations. 
>> On my slow laptop it serves one page in 0.9 seconds. That means it takes 
>> less than 0.9ms to define each lazy table. 
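>>
>> In DAL terms the a1000_1 setup looks roughly like this (a sketch; the 
>> connection string is a placeholder and the apps in the zip may differ in 
>> details):
>>
>>     db = DAL('sqlite://storage.sqlite',
>>              lazy_tables=True,        # instantiate a table only on first access
>>              migrate_enabled=False)   # migrations disabled
>>     # 1000 tables, 10 string columns each (f001..f010)
>>     for n in range(1, 1001):
>>         db.define_table('t%05i' % n,
>>                         *[Field('f%03i' % i) for i in range(1, 11)])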
>>
>> a1000_2 defines all tables in modules/db_modules.py and imports db in 
>> models/db.py. Now it takes 80ms/req, and more than 50% of that is due to 
>> the complex template and the cookie/session/language logic. Basically 
>> there is no overhead in creating the 1000 tables.
>>
>> While web2py defaults to a1000_1, other frameworks default to a1000_2, 
>> but web2py can do it too!
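>>
>> A sketch of the a1000_2 layout (the file names follow the ones above; 
>> the actual apps may differ in details):
>>
>>     # modules/db_modules.py -- imported once per process,
>>     # not re-executed at every request
>>     from gluon.dal import DAL, Field
>>     db = DAL('sqlite://storage.sqlite', migrate_enabled=False)
>>     for n in range(1, 1001):
>>         db.define_table('t%05i' % n,
>>                         *[Field('f%03i' % i) for i in range(1, 11)])
>>
>>     # models/db.py -- executed at every request
>>     from db_modules import db
>>     db.reconnect()  # fresh connection, shared table definitions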
>>
>> What are the drawbacks of the second approach?  There are two, and they 
>> are not web2py specific.
>>
>> 1) The web server occasionally creates new processes. These will need to 
>> reload the module. This takes time, and some users will experience the 
>> slowdown.
>> 2) Now the db object is shared by all requests (not the connection; each 
>> request still gets its own db session/transaction). This means you 
>> cannot alter the attributes of models (db.t00001.f001.readable=False) 
>> because this would affect all concurrent and subsequent requests. 
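>>
>> For example, with a shared db object this common idiom becomes unsafe (a 
>> sketch of a controller action):
>>
>>     def index():
>>         # with per-request models this change is discarded at the end of
>>         # the request; with a module-level db it leaks into every
>>         # concurrent and subsequent request
>>         db.t00001.f001.readable = False
>>         return dict(grid=SQLFORM.grid(db.t00001))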
>>
>> Web2py makes you pay a price in order to have a new, clean db object at 
>> every request. You pay this price so that you do not have to worry about 
>> changing its state and affecting other requests. This costs ~1ms/table 
>> on my laptop.
>>
>> With conditional models (models placed in subfolders of models/ named 
>> after the controller, so that they are executed only for that 
>> controller) this is a non-issue, since you only pay the price for the 
>> tables you use in the action.
>>
>> Hope I am making some sense.
>>
>> Massimo
>>
>>
>>
>>
>>
>>
>>
>> On Saturday, 29 December 2012 11:30:19 UTC-6, Massimo Di Pierro wrote:
>>>
>>> TL;DR: if you can do it with other frameworks you can do it with 
>>> web2py, although the default behavior may be different.
>>>
>>> There are some important differences between the web2py DAL and 
>>> SQLAlchemy: 1) SQLAlchemy has better support for non-integer primary 
>>> keys (the DAL supports them too, but they make automatic form handling 
>>> very difficult, so we want to discourage them); 2) SQLAlchemy allows 
>>> multiple database sessions per thread over the same connection (in 
>>> web2py, if you want to use multiple sessions in the same thread, you 
>>> need multiple connections; I have never seen a use case for this); 3) 
>>> frameworks which use SQLAlchemy define tables only once, when the 
>>> program starts, while web2py defines tables at every request. They are 
>>> both equally reliable and fast.
>>>
>>> 3) causes a performance penalty for web2py compared with other 
>>> frameworks when a large number of tables is present. Yet this is not a 
>>> DAL limitation; it is a consequence of the way web2py chooses to use 
>>> the DAL. We want to allow different http requests to see changes in the 
>>> schema. This is part of the price for the easy-to-use approach. Using 
>>> conditional models goes a long way towards reducing this overhead. If 
>>> you use the DAL from a python script or a background process of another 
>>> framework (for example Tornado), you do not have the overhead problem. 
>>> Even in web2py there are ways to avoid it completely, although they are 
>>> not documented yet (define tables at top-level in a module, import db 
>>> from the module, then call db.reconnect() in models).
>>>
>>> Let me put it another way. In web2py you can put logic in modules or in 
>>> models. They are two types of files (one is imported, one is executed). 
>>> The more logic you put in models, the more agile it is. The more logic 
>>> you put in modules, the faster it is. Other frameworks do not make this 
>>> distinction (their models are modules and are imported too) and 
>>> therefore they give you no option.  
>>>
>>> Web2py can definitely be used to build a complex system with many 
>>> tables. As with any framework, the bottleneck will always be database 
>>> access, so you need lots of caching. Whether it is the right system to 
>>> interface with an existing database I cannot say; that depends on the 
>>> existing database, and you will have issues with any framework you 
>>> choose.
>>>
>>> Massimo
>>>
>>>
>>>
>>> On Saturday, December 29, 2012 11:02:34 AM UTC-6, Alex Glaros wrote:
>>>>
>>>> Hi Lazaro and Massimo,
>>>>
>>>> The Fi$cal project is not my project; I don't know how many tables it 
>>>> has, but it is an example of highly relational financial software, 
>>>> chosen so that web2py members could understand my question clearly. 
>>>> Most projects will be in strict 3rd Normal Form.
>>>>
>>>> Let me be more transparent regarding my goals.
>>>>
>>>> I have a nonprofit government improvement organization 
>>>> (http://www.gov-ideas.com/) and part of our mission is to recommend 
>>>> software architecture to government. I am evaluating web2py as a 
>>>> candidate for prototyping applications. I'm very excited that 
>>>> Massimo's vision of lowering the average person's programming entry 
>>>> barrier also works to improve government by allowing rapid 
>>>> prototyping. 
>>>>
>>>> The next step is to assess how best to migrate prototypes to 
>>>> production. There may be no further steps necessary, or, for large 
>>>> projects like Fi$cal, there may be a consensus such as moving the 
>>>> entire project to Python/SQLAlchemy, which provides the most stability 
>>>> while still allowing some web2py components to be copied for reuse.
>>>>
>>>> It is a significant responsibility to make these recommendations, and 
>>>> I need many diverse advisors to select products that reduce risk in 
>>>> government software development. 
>>>>
>>>>
>>>>    1. At this time, is there a consensus regarding the top candidates 
>>>>    for open-source, business-oriented, highly relational software tools? 
>>>>    2. What would the top 10 look like? 
>>>>    3. What are the reasons each one was selected? 
>>>>    4. At what points or in what areas would other products surpass web2py?
>>>>
>>>>
>>>> Any comments would be much appreciated,
>>>>
>>>> Alex
>>>>
>>>> On Saturday, December 29, 2012 8:47:42 AM UTC-8, Massimo Di Pierro 
>>>> wrote:
>>>>>
>>>>> I should add that in the system I looked at, almost all tables were 
>>>>> de-normalized. Every table storing a key would also store the value 
>>>>> corresponding to the code. This is for two reasons: 1) if you have 
>>>>> thousands of tables you cannot join everything all the time; 2) for 
>>>>> auditing purposes it should be possible to change a value in a lookup 
>>>>> table without changing the corresponding previous values of records 
>>>>> created before.
>>>>>
>>>>> This means that while I needed the lookup tables for the IS_IN_SET() 
>>>>> and IS_IN_DB() validators, I never needed more than two joins.
>>>>>
>>>>> Again, the whole design seems crazy to many of us, but it has its 
>>>>> plus sides from an auditing point of view.
>>>>>
>>>>> Massimo
>>>>>
>>>>> On Friday, 28 December 2012 22:39:51 UTC-6, Massimo Di Pierro wrote:
>>>>>>
>>>>>> I have some experience with a large PeopleSoft system which claims 
>>>>>> thousands of tables. It turns out almost everything I needed was in 
>>>>>> fewer than 10 tables.
>>>>>>
>>>>>> Basically most of the tables were lookup tables, used simply as a 
>>>>>> key:value store for the possible values of a field. For example, a 
>>>>>> table GENDER with {'M':'Male','F':'Female','O':'Other'}.
>>>>>>
>>>>>> So 10 tables with 100 columns each and one lookup table for each 
>>>>>> column already gives you more than 1,000 tables. I suspect that is 
>>>>>> your case too.
>>>>>>
>>>>>> All systems that claim an insane number of tables belong to this 
>>>>>> category.
>>>>>>
>>>>>> The way to handle it is to load all lookup tables in cache and use 
>>>>>> the cache instead of database access to convert key<->value. In fact 
>>>>>> the values in lookup tables almost never change.
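>>>>>>
>>>>>> A sketch of that pattern in web2py (the gender table and its fields 
>>>>>> are invented; cache.ram and IS_IN_SET are standard web2py tools):
>>>>>>
>>>>>>     def load_genders():
>>>>>>         # one database hit; afterwards the mapping lives in RAM
>>>>>>         rows = db().select(db.gender.ALL)
>>>>>>         return dict((r.code, r.description) for r in rows)
>>>>>>     GENDERS = cache.ram('genders', load_genders, time_expire=3600)
>>>>>>
>>>>>>     db.person.gender.requires = IS_IN_SET(GENDERS)  # no join needed
>>>>>>     # key -> value conversion becomes GENDERS[key]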
>>>>>>
>>>>>> I suggest that before you embark on this venture you do the 
>>>>>> following exercise: make a list of all table names. For each table, 
>>>>>> make a list of the fields in the table and count the number of 
>>>>>> records (more or less).
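>>>>>>
>>>>>> With the DAL pointed at the legacy database, that census is a short 
>>>>>> loop (a sketch, assuming the tables are already defined on db and 
>>>>>> have integer id keys):
>>>>>>
>>>>>>     for name in db.tables:
>>>>>>         table = db[name]
>>>>>>         print name, len(table.fields), db(table.id > 0).count()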
>>>>>>
>>>>>> You will find many tables with fewer than 100 records and fewer than 
>>>>>> 10 columns. You will find a few tables with more than 10 columns and 
>>>>>> more than 100,000 records. You need to find out how many tables 
>>>>>> belong to one category and how many to the other.
>>>>>>
>>>>>> If this is the case, as I suspect, then you can use web2py, but you 
>>>>>> need to set up some clever caching system to handle the lookup 
>>>>>> tables. It would be the same with other frameworks, since you don't 
>>>>>> want to join everything all the time or your database will grind to 
>>>>>> a halt.
>>>>>>
>>>>>> It is also possible I am completely wrong in my assumption.
>>>>>>
>>>>>> In the case of the PeopleSoft system I studied, they were also 
>>>>>> storing all past versions of each record in the same table as the 
>>>>>> current record. Basically every record had two dates (valid_from, 
>>>>>> valid_until). Current records had valid_until set to 2999-12-31. 
>>>>>> Records would never be modified. The process for modifying a record 
>>>>>> consisted of creating a copy of the current record, editing the 
>>>>>> copy, setting valid_until=now on the previously current record, and 
>>>>>> updating all references pointing to the record. Of course all tables 
>>>>>> used the same mechanism for versioning, thus making the update 
>>>>>> process very slow and cumbersome, and all tables unnecessarily 
>>>>>> large. Yet this simplifies auditing, because you can go back to any 
>>>>>> moment in time simply by filtering records in a query.
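>>>>>>
>>>>>> In DAL terms, reconstructing the state at any moment is a single 
>>>>>> filter (a sketch; db.contract is an invented table following the 
>>>>>> valid_from/valid_until convention above):
>>>>>>
>>>>>>     when = request.now  # or any past datetime
>>>>>>     rows = db((db.contract.valid_from <= when) &
>>>>>>               (db.contract.valid_until > when)).select()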
>>>>>>
>>>>>> The reason I am explaining all of this is that you are probably 
>>>>>> going to have to deal with something like this. The problem is not 
>>>>>> web2py vs other frameworks. The problem will be that you need 
>>>>>> special logic to handle those tables, and that logic is foreign to 
>>>>>> web2py and to many modern frameworks, which simply assume more 
>>>>>> modern database design practices.
>>>>>>
>>>>>> My suggestion is to start small and see what happens. Find out who 
>>>>>> your primary target users are. Find which tables they need to access 
>>>>>> and create a web interface for those tables. You will probably be 
>>>>>> able to factor the interaction with the database into many small 
>>>>>> apps.
>>>>>>
>>>>>> Massimo
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Friday, 28 December 2012 09:38:25 UTC-6, Alex Glaros wrote:
>>>>>>>
>>>>>>> Can web2py be used for highly complex relational databases for large 
>>>>>>> fiscal projects? Example: California's Fi$cal project - 
>>>>>>> http://www.fiscal.ca.gov/ - with roughly 10,000 tables and many 
>>>>>>> complex joins.
>>>>>>>
>>>>>>> What components of web2py would start to get slow or not work well 
>>>>>>> with so many tables? 
>>>>>>>
>>>>>>> If web2py would instead be better used to prototype the Fi$cal 
>>>>>>> system, what would be good production-version candidates to migrate 
>>>>>>> to? Pure Python using SQLAlchemy? Java? Anything that would make 
>>>>>>> migration easier, such as Python-based frameworks?
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Alex Glaros
>>>>>>