Well, for the kind of thing I'm building with templates, a 15% difference
between hash and array access (over possibly millions upon millions of
database rows) is pretty significant.  I'm already buffering and iterating
through the rows, so that's not a problem.  Building a hash for every row
does, obviously, make a huge difference in memory usage.

The constant folding really does the trick, and with a little setup on my
part (hidden in my own subclass of Template.pm that has some pre-built
defaults etc.) the template syntax is totally transparent.  DBI currently
has a "misfeature" (something that actually works the way you would like
until it is fixed and then your software breaks): fetchrow_hashref()
doesn't give you the same hashref on every call, the way
fetchrow_arrayref() gives you the same arrayref...  So until that is fixed,
I have to double up on the hash copying into my "iterator"...  Which is one
of the reasons why my particular DBI situation is a tad inefficient without
constant folding at the moment...
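
For illustration only, here is a minimal sketch of a row iterator that
fills one reused hash from fetchrow_arrayref(), along the lines suggested
in the quoted message below.  RowIterator, next_row and FakeHandle are
hypothetical names invented for this sketch; FakeHandle merely stands in
for a DBI statement handle so the code runs without a database.  The only
DBI features assumed are $sth->fetchrow_arrayref and $sth->{NAME_lc}.

```perl
use strict;
use warnings;

# Hypothetical iterator: keeps one row in memory instead of a list of hashes.
package RowIterator {
    sub new {
        my ($class, $sth) = @_;
        # $sth->{NAME_lc} is DBI's arrayref of lower-cased column names.
        my @names = @{ $sth->{NAME_lc} };
        return bless { sth => $sth, names => \@names, row => {} }, $class;
    }

    # Copy the next row into the single reused hash; returns undef when done.
    sub next_row {
        my ($self) = @_;
        my $values = $self->{sth}->fetchrow_arrayref or return;
        @{ $self->{row} }{ @{ $self->{names} } } = @$values;   # hash slice
        return $self->{row};
    }
}

# Stand-in for a statement handle (assumption: used only to make this runnable).
package FakeHandle {
    sub new {
        my ($class, $names, @rows) = @_;
        return bless { NAME_lc => $names, rows => [@rows] }, $class;
    }
    sub fetchrow_arrayref { return shift @{ $_[0]{rows} } }
}

package main;

my $sth = FakeHandle->new([qw(field_a field_b)], [1, 'x'], [2, 'y']);
my $it  = RowIterator->new($sth);

my @seen;
while (my $row = $it->next_row) {
    push @seen, "$row->{field_a}:$row->{field_b}";
}
print join(',', @seen), "\n";
```

Because the same hash is refilled on every call, anything that needs to
keep a row past the next fetch has to copy it first.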

But maybe I'll set my eyes on DBI's implementation and see if constant
folding might better be done there...   Though it's beyond the scope of
this mailing list, I could see potential for a "fetchrow_resultset()" that
gives you hash-like access (and indeed could be tied) but would use
constant folding transparently...
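
As a rough sketch of what such hash-like access could look like (no
"fetchrow_resultset()" exists in DBI; ArrayRowHash is a hypothetical class
made up for this example), a tied hash can map column names to positions
once and then read straight from the row array:

```perl
use strict;
use warnings;

# Hypothetical tied hash: looks like a hash, reads from an array.
package ArrayRowHash {
    # Arguments: column names in query order, and a reference to the row array.
    sub TIEHASH {
        my ($class, $names, $row) = @_;
        my %index;
        @index{@$names} = (0 .. $#$names);    # name => position, built once
        return bless { names => $names, index => \%index, row => $row }, $class;
    }
    sub FETCH {
        my ($self, $key) = @_;
        my $i = $self->{index}{$key};
        return defined $i ? $self->{row}[$i] : undef;
    }
    sub EXISTS   { exists $_[0]{index}{ $_[1] } }
    sub FIRSTKEY { $_[0]{pos} = 0; $_[0]{names}[0] }
    sub NEXTKEY  { $_[0]{names}[ ++$_[0]{pos} ] }
    sub STORE    { die "row is read-only\n" }
}

package main;

my @row = (1, 'x');    # stands in for the array that is refilled on each fetch
tie my %data, 'ArrayRowHash', [qw(field_a field_b)], \@row;

my $first = "$data{field_a}/$data{field_b}";
@row = (2, 'y');       # "next fetch": same array, new contents
my $second = "$data{field_a}/$data{field_b}";
print "$first $second\n";
```

Tied access costs a method call per lookup, so whether this actually beats
a plain per-row hash would have to be benchmarked.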

or... Maybe I'll make my own incredibly unsafe and insecure (but very quick)
Stash replacement that accesses the data directly.....

Aw... never mind.  I always try to over-engineer things.  ;-)


But, for now, I'm sticking with the constant folding; I'm very impressed
with the performance difference (it's measurable in both CPU and memory
usage terms).  Anything that can be "factored" out of the run-time
interpretation phase and brought (transparently) into the template parsing
stage (which usually happens only once, at the beginning) is an obvious
gain when the interpretation phase happens millions of times.
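
To make the principle concrete, here is a toy sketch.  It is not how the
Template Toolkit implements constant folding; compile_template is a
hypothetical helper written for this example.  The field names are
resolved to array positions once, when the template text is parsed, so
rendering a row does no name lookups at all:

```perl
use strict;
use warnings;

# Hypothetical mini-compiler: turns "[% data.NAME %]" tags into closures
# that index straight into a row array.
sub compile_template {
    my ($text, $names) = @_;
    my %index;
    @index{@$names} = (0 .. $#$names);

    # The capture group makes split return field names at odd positions.
    my @parts = split /\[%\s*data\.(\w+)\s*%\]/, $text;
    my @ops;
    for my $n (0 .. $#parts) {
        if ($n % 2) {
            my $i = $index{ $parts[$n] };
            die "unknown field '$parts[$n]'\n" unless defined $i;
            push @ops, sub { $_[0][$i] };    # position fixed at parse time
        }
        else {
            my $literal = $parts[$n];
            push @ops, sub { $literal };
        }
    }
    # The returned closure is the "run-time" phase: one call per row.
    return sub { my ($row) = @_; join '', map { $_->($row) } @ops };
}

my $render = compile_template(
    "[% data.field_a %]=[% data.field_b %];",
    [qw(field_a field_b)],
);
my $out = join '', map { $render->($_) } [1, 'x'], [2, 'y'];
print "$out\n";
```

The name-to-index hash is consulted only inside compile_template; each
call to the returned closure touches only the array.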

Thanks for the insightful comments!

-Bryan



> -----Original Message-----
> From: Perrin Harkins [mailto:[EMAIL PROTECTED]]
> Sent: Monday, July 08, 2002 10:23 PM
> To: Shannon, Bryan
> Cc: '[EMAIL PROTECTED]'
> Subject: Re: [Templates] DBI Hash access faster? Recommendations?
> 
> 
> Shannon, Bryan wrote:
> > To make the template more meaningful, I'm using fetchrow_hashref() in
> > DBI to provide template variables that are easy for template writers
> > to write;
> > 
> > This works quite well, but for large queries, the program spends most
> > of its time in Template::Stash::XS::get() (about 36% of its time,
> > according to dprofpp)
> 
> This is unlikely to change much if you use arrays.  The Stash access 
> shows up on your profile because it does quite a bit of work: expanding
> the dot notation, testing what kind of variable or object it has, etc.
> 
> > ... Also, the overhead of using a hash for each row of data gets in
> > the way with large row counts.
> 
> If memory overhead is the problem, you can avoid fetching all the data
> at once by using an iterator that fetches one row at a time.
> (Technically DBI fetches multiple rows at once with most databases, but
> you can avoid having more than what it's configured to fetch in memory
> at one time.)  I think there's an example in the DBI plugin that will 
> show you how to do it.
> 
> > So, what I'm thinking about doing is creating something that will
> > allow me to supply a "map" that will allow template writers to still
> > use [% data.field_a %] [% data.fieldb %] etc, but will use the map to
> > map field_a into its column-ordered position in an array; This way I
> > can use fetchrow_arrayref() instead of fetchrow_hashref() (for the
> > sake of memory/speed efficiency).
> 
> IIRC, the reason fetchrow_hashref is slow has more to do with looking up
> the names of the columns, than with the fact that it's a hash.  If you
> just use fetchrow_arrayref (or better yet, bind_cols) and then put those
> values into your own hashref, it may be faster.
> 
> > This would sort of be a Stash equivalent of a
> > Perl pseudo-hash
> 
> Part of the reason pseudo-hashes got dropped from Perl is that they 
> weren't that much faster than hashes.  Seriously, my benchmarks have 
> shown only about 15% speed difference between arrays and hashes, and 
> that was ages ago.  I think you are much too worried about the speed of
> hash lookups.
> 
> > I'm not letting templates use DBI themselves, the templates just
> > provide the presentation for the rows, and the actual program engine
> > decides how to best perform the query... It just provides the data as
> > hashes right now.
> 
> Good plan.  I think you should stick with it that way, but maybe write
> your own iterator class around a statement handle to take care of the 
> memory problem you're having.  You can pass that iterator object to the
> template instead of a list of hashes.
> 
> - Perrin
> 

