Re: [web2py] Re: Using SQLFORM.grid with large datasets

2013-04-18 Thread Niphlod
the count affects only the grid for "pagination" purposes. There's no other 
reason to do a count to display a grid.
This means that if you pass cache_count=10, you'll end up not having any 
"1,2,3" links at the bottom to fetch the next "page" of results.
Putting that aside, you can pass whatever "thingy" you fit appropriate for 
the job (fixed int, cache tuple, custom code).

On Thursday, April 18, 2013 11:26:20 AM UTC+2, Johann Spies wrote:
>
> Thanks. 
>
> I will test your suggestions.
>
> I am currently working with Postgresql 9.1 and it takes nearly 5 minutes 
> to count a table > 42 000 000 records.
>
> It will if we can have the option in the grid to not do a count of the 
> result as it is part of the problem in my case.  If for instance the query 
> is db.table.id > 0  I would prefer to use the other way to get an 
> approximate of the total records in the table.
>
>
> If a query would request more records than the total in the table, 
> postgresql would not have a problem.
>
> e.g.  in a table with 15 entries I have tested a query with 'limit 15 
> offset 10' and there was no problem.  Only last 5 records were shown.
>
> Regards
> Johann
>
>
> On 17 April 2013 23:28, Niphlod > wrote:
>
>> check trunk as soon as the Pull Request gets merged
>>
>>
>> def test3(dbset, request_vars):
>> ##you can retrieve the current query with
>> ##dbset._select(), i.e. dbset is a "db(query)" object
>> 
>> ##request_vars are the current request.vars
>> ###so you can check for keywods etc etc etc
>> ###checking for groupby, distinct, etc is up to you!
>> 
>> ##given that is a callable you may cache it "externally"
>> ## with a return dbset.count(cache=(cache.ram, 60))
>> ## and do all your crazy things
>> 
>> ## method that works on postgresql only for a full-table-count
>> result = db.executesql("SELECT reltuples::integer FROM pg_class 
>> WHERE oid = 'public.awesome'::regclass;")
>> return result[0][0]
>>
>> def test2():
>> #mode1
>> cache_count = 127
>> #mode2
>> cache_count = (cache.ram, 60)
>> #mode3
>> cache_count = test3
>> grid = SQLFORM.grid(db.awesome, cache_count=cache_count)
>> return dict(grid=grid)
>>
>>
>> Can I leave documentation up to someone willing to test it and report 
>> back ? :P
>>
>>
>> PS: fixed 8 hours later.web2py is 
>> awesome
>>
>> PS2: on a test table with 4M rows, two 'string' fields, postgresql 9.2 
>> takes 1.9sec to do a "standard" count(*). 
>> Things definitely improved (although it remains a heavy operation for 
>> MVCC databases) on the count(*) since previous versions.
>>
>>
>>  -- 
>>  
>> --- 
>> You received this message because you are subscribed to the Google Groups 
>> "web2py-users" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to web2py+un...@googlegroups.com .
>> For more options, visit https://groups.google.com/groups/opt_out.
>>  
>>  
>>
>
>
>
> -- 
> Because experiencing your loyal love is better than life itself, 
> my lips will praise you.  (Psalm 63:3)
>  

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




Re: [web2py] Re: Using SQLFORM.grid with large datasets

2013-04-18 Thread Johann Spies
Thanks.

I will test your suggestions.

I am currently working with Postgresql 9.1 and it takes nearly 5 minutes to
count a table > 42 000 000 records.

It will if we can have the option in the grid to not do a count of the
result as it is part of the problem in my case.  If for instance the query
is db.table.id > 0  I would prefer to use the other way to get an
approximate of the total records in the table.


If a query would request more records than the total in the table,
postgresql would not have a problem.

e.g.  in a table with 15 entries I have tested a query with 'limit 15
offset 10' and there was no problem.  Only last 5 records were shown.

Regards
Johann


On 17 April 2013 23:28, Niphlod  wrote:

> check trunk as soon as the Pull Request gets merged
>
>
> def test3(dbset, request_vars):
> ##you can retrieve the current query with
> ##dbset._select(), i.e. dbset is a "db(query)" object
>
> ##request_vars are the current request.vars
> ###so you can check for keywods etc etc etc
> ###checking for groupby, distinct, etc is up to you!
>
> ##given that is a callable you may cache it "externally"
> ## with a return dbset.count(cache=(cache.ram, 60))
> ## and do all your crazy things
>
> ## method that works on postgresql only for a full-table-count
> result = db.executesql("SELECT reltuples::integer FROM pg_class WHERE
> oid = 'public.awesome'::regclass;")
> return result[0][0]
>
> def test2():
> #mode1
> cache_count = 127
> #mode2
> cache_count = (cache.ram, 60)
> #mode3
> cache_count = test3
> grid = SQLFORM.grid(db.awesome, cache_count=cache_count)
> return dict(grid=grid)
>
>
> Can I leave documentation up to someone willing to test it and report back
> ? :P
>
>
> PS: fixed 8 hours later.web2py is
> awesome
>
> PS2: on a test table with 4M rows, two 'string' fields, postgresql 9.2
> takes 1.9sec to do a "standard" count(*).
> Things definitely improved (although it remains a heavy operation for MVCC
> databases) on the count(*) since previous versions.
>
>
>  --
>
> ---
> You received this message because you are subscribed to the Google Groups
> "web2py-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to web2py+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>



-- 
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




[web2py] Re: Using SQLFORM.grid with large datasets

2013-04-17 Thread Niphlod
check trunk as soon as the Pull Request gets merged


def test3(dbset, request_vars):
##you can retrieve the current query with
##dbset._select(), i.e. dbset is a "db(query)" object

##request_vars are the current request.vars
###so you can check for keywods etc etc etc
###checking for groupby, distinct, etc is up to you!

##given that is a callable you may cache it "externally"
## with a return dbset.count(cache=(cache.ram, 60))
## and do all your crazy things

## method that works on postgresql only for a full-table-count
result = db.executesql("SELECT reltuples::integer FROM pg_class WHERE 
oid = 'public.awesome'::regclass;")
return result[0][0]

def test2():
#mode1
cache_count = 127
#mode2
cache_count = (cache.ram, 60)
#mode3
cache_count = test3
grid = SQLFORM.grid(db.awesome, cache_count=cache_count)
return dict(grid=grid)


Can I leave documentation up to someone willing to test it and report back 
? :P


PS: fixed 8 hours later.web2py is 
awesome

PS2: on a test table with 4M rows, two 'string' fields, postgresql 9.2 
takes 1.9sec to do a "standard" count(*). 
Things definitely improved (although it remains a heavy operation for MVCC 
databases) on the count(*) since previous versions.

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




[web2py] Re: Using SQLFORM.grid with large datasets

2013-04-17 Thread Niphlod
currently not but it is planned to be configurable (at the very minimum, 
cacheable). 

Your method BTW works only if all the table is shown, doesn't take into 
account a possible condition passed to the grid (such as db.table.user_id 
== auth.user_id) or keyword queries.

I'll start working on that as soon as possible, shouldn't be hard.
Talking about "extensibility", what you'd like to pass as a parameter ?
I'm thinking to allow:
 - a callable (to which query and keywords are passed) --> you can pass the 
result of your custom query in this case, just watch out for the 
shortcomings explained earlier
 - an integer (which will trigger the "cache this count" for n seconds)

Would that be enough?

On Wednesday, April 17, 2013 12:58:27 PM UTC+2, Johann Spies wrote:
>
> By large datasets I mean a database of which one of the tables contain 
> more than 42 million records.
>
> Using SQLFORM.grid on such a dataset is very slow because the process 
> wants to count the total number of records.
>
> I am using Postgresql.  I can get an estimated size of the table by 
> replacing count(*) with 
>
> # SELECT reltuples::integer FROM pg_class WHERE oid = 
> 'isi.ritem'::regclass;
>  reltuples
> ---
>   42183232
> (1 row)
>
> And I get the result in a fraction of a second. When working on such a 
> scale the exact number of rows (count is also not always accurate on an 
> active database) is not a necessity.
>
> Is there a way I can replace the count(*) in the grid with the query 
> illustrated above?
>
> Regards
> Johann
> -- 
> Because experiencing your loyal love is better than life itself, 
> my lips will praise you.  (Psalm 63:3)
>  

-- 

--- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to web2py+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.