Side comment:

I understand what Will is saying and I think he has a valid point. But
I believe the point matters far less now than it used to. The bit of
contention that Will raises here is about just how much disk access
any given process does. Eliminate disk reads and the process speeds
up - or so it used to be.

I confess that over the last decade both my certainty about such
matters and my concern have continued to dwindle. It used to be easy
when there was a single fixed disk with some number of platters and
heads, sectors were 512 bytes, the rotation rate was 5400rpm, and you
could measure memory plus L2 cache and get some idea of where your
data was and what the latency to reach it was going to be like.
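
To make that concrete: with nothing more than the rotation rate you
could put a floor under access time. A back-of-the-envelope sketch in
U2 BASIC (the 5400rpm figure is from above; the seek number is a
made-up typical value, not from any particular drive):

    * Classic drive arithmetic: average rotational delay is half a
    * revolution. AVG.SEEK is a hypothetical average seek time in ms.
    RPM = 5400
    MS.PER.REV = 60000 / RPM       ;* ~11.1 ms per revolution
    AVG.ROT = MS.PER.REV / 2       ;* ~5.6 ms average rotational latency
    AVG.SEEK = 9                   ;* assumed, ms
    PRINT 'Estimated average access: ':AVG.ROT + AVG.SEEK:' ms'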

These days sites are running many different kinds of drives. SSDs
(and now hybrid SSHDs) have as much capacity as spinning disks,
making "disk" almost as fast as "cache" (depending on which kind of
cache), and the cost has come down to nearly the same as hard drives.
Those tiny platters spinning in a box are almost as obsolete as
9-track tape. Now couple that with virtualization, where you have no
idea which part of the virtual machine is in cache. Couple that with
RAID, where multiple drives and controllers add some latency but
striping also spreads the work and reduces hits on any one disk.
That's just the hardware, and I didn't even mention caching
controllers.

When working with traditional MV systems we can also map processes to
memory and allocate more frames to a process to eliminate hits on
overflow and file space, and thus reduce the number of actual file
reads. With U2 and other systems the OS caches for us, whether on
disk swap space or in memory.
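
You can add a layer of that yourself, too. A minimal sketch of
caching repeated reads inside a program - the file and variable names
are hypothetical, and I'm showing the UniVerse flavor of LOCATE:

    * Serve repeat lookups from our own cache so each record is READ
    * from the file at most once. PRODUCTS and ID.LIST are made up.
    OPEN 'PRODUCTS' TO F.PROD ELSE STOP
    ID.LIST = '1001':@AM:'1002':@AM:'1001'   ;* note the repeat
    CACHE.IDS = '' ; CACHE.RECS = ''
    FOR I = 1 TO DCOUNT(ID.LIST, @AM)
       ID = ID.LIST<I>
       LOCATE ID IN CACHE.IDS<1> SETTING POS THEN
          REC = CACHE.RECS<POS>         ;* no file access at all
       END ELSE
          READ REC FROM F.PROD, ID ELSE REC = ''
          CACHE.IDS<-1> = ID
          CACHE.RECS<-1> = REC
       END
    NEXT I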

So what IS a "disk read" anymore? A "read" issued by a program is
largely decoupled from actual disk activity these days. As I said
above, we're not even really talking about "disk" anymore. Sure, at a
higher level we just want to reduce the number of READ statements,
regardless of where in the universe the data comes from (oh, I didn't
mention virtualized data in the cloud either...), but these days a
READ statement is more like a "virtual" read: it just tells the
system to get the data from wherever it is now - it's not a directive
to go to "disk".

I've lost touch with all of the places where data can be. But I also
realized a while back that it's futile to beat my head against a wall
chasing disk reads around for better performance this week, because
I'm never really going to have a good answer, and it's just going to
change next week anyway.

Well, that's how I see it. YMMV
Comments?
T 

> From: Wjhonson 
> I didn't miss it.
> The point of the request was from the beginning to the ending.
> Of course the first *portion* will be quick and use few disk reads.
> I was discussing the full example.
> 

> From: Wol

> Wjhonson wrote:
> > If your file is small enough, and your system idle enough that the
> > file remains *in memory* for all possible scenarios below, then
> > you may not notice speed issues.
> >
> > However, the monster in the kitchen is the number of DISK READS
> > you are doing.  If your prior reads get cycled out before they are
> > read again, then you should run a single combined select which
> > will do all accesses at the same instant.


> You missed the fact that the first select is based on an index. That
> should not go anywhere near the data anyway. So doing it before or
> at the same time as the other selects is irrelevant.
> 
> But yes. I based my recommendations on minimizing the number of
> disk accesses ...

