But... using an external select & then reading in the records means grabbing the records from disk or cache twice.

Like most things, it's a judgement call. Are you going to select the entire file and evaluate each record in BASIC? No-brainer; use an internal select. Are you sorting? Then maybe the external select makes sense. Selecting a subset for processing? Hmm... the next question I'd ask is, "Can I use an index?" Yes? External select. No? Internal.

If it's a big enough file relative to my box size, and I need to sort a subset, I might even build a temp file & populate it. SSELECT'ing 800MB versus doing an internal select & then sorting the 2MB I actually need? I might go for choice #2.
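
The trade-off in that last comparison can be sketched in a few lines. This is a toy Python model, not UniVerse code: the `records` dictionary, the `wanted` predicate, and both function names are assumptions made up for illustration. It only shows that sorting the whole ID set and then filtering gives the same result as filtering first and sorting the small subset, while the second route sorts far fewer keys.

```python
def sort_everything_then_filter(records, wanted):
    """Model of sort-selecting the whole file: sort every ID, then keep the subset."""
    return [rid for rid in sorted(records) if wanted(records[rid])]


def filter_then_sort(records, wanted):
    """Model of an internal select that keeps matching IDs, then sorts only those."""
    subset = [rid for rid, rec in records.items() if wanted(rec)]
    return sorted(subset)


# Hypothetical data: string keys, so the sort is alphabetic, not numeric.
records = {
    "10": {"open": True},
    "2": {"open": False},
    "33": {"open": True},
    "4": {"open": False},
}


def is_open(rec):
    return rec["open"]
```

Both functions return the same ordered subset; the difference is only how many keys go through the sort.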

Unlike the 4-letter word which we shall not mention here, this is a situational call.

I did, however, agree with the notion that it's ok to allow programmers to use the 4-letter word when the power is out, but only then.



"Our greatest duty in this life is to help others. And please, if you can't help them, could you at least not hurt them?" - H.H. the Dalai Lama
"When buying & selling are controlled by legislation, the first thing to be bought & sold are the legislators" - P.J. O'Rourke
Dan Fitzgerald





From: "Allen E. Elwood" <[EMAIL PROTECTED]>
Reply-To: u2-users@listserver.u2ug.org
To: <u2-users@listserver.u2ug.org>
Subject: RE: [U2] Fw: More U2 programming hints
Date: Tue, 4 Oct 2005 14:45:04 -0700

This is the way it was explained to me way back in '88. The internal select
is slower on the whole file, but immediate in response.  It works the same
as LIST.  If I list a file with 2,000,000 records I get immediate response.

If I want to process an entire file, then external select is slower on
response, i.e. I have to wait for 2 million records to be selected before
processing begins, but is quicker in processing all records.

The internal is slower due to the system having to stop what it's doing,
find the next group, break out the individual IDs from that group, and then
return them to the program - over and over again as it makes its way through
the file.

hth!

Allen

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Stevenson, Charles
Sent: Tuesday, October 04, 2005 14:24
To: u2-users@listserver.u2ug.org
Cc: Louis Windsor
Subject: RE: [U2] Fw: More U2 programming hints


This is a bit disconcerting.
BASIC SELECT should be faster than EXECUTE "SELECT..."
Maybe the smart people can weigh in on this:

> From: Louis Windsor
>
> A few years ago we used the BASIC SELECT FILE as opposed to
> the EXECUTE "SELECT FILE".
>
> We updated UniVerse (don't ask from what version to what
> version as I don't remember) and overnight ALL our programs
> ran five or six times longer.

Completely contrary to my experience and counter-intuitive, too.

> We were told (by VMark) that the BASIC SELECT now selected
> each group but it could be optioned to work the "old" way.

Hmmm, do I vaguely, hazily remember something about that?  Maybe on this
list? Maybe in release notes?  No uvconfig option jumps out at me.
I don't think flavor would matter, or $OPTIONS [-]VAR.SELECT.
$OPTIONS FSELECT  would slow the BASIC SELECT down to approximately the
same as EXECUTE "SELECT...",  but not make it slower.
Louis, do you, perchance, use $OPTIONS FSELECT?  Maybe buried in a
$include file common to every program?

> I wrote a conversion program to change ALL BASIC SELECTs to
> executed SELECTs in the source and recompiled and that is the
> way we have done it ever since.
>
> I don't know if things are different now but we have grown to
> prefer EXECUTEd selects as selection criteria can be included.

Louis, can you run a simple benchmark and see if this is still true?
Or show us an example of your own?

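  INTERNAL:
    * BASIC SELECT against the file variable.
    OPEN "[really big file]" TO F ELSE STOP
    * Each CRT shows a label, wall-clock time, and CPU time (SYSTEM(9)).
    CRT 'I1', TIMEDATE(), SYSTEM(9)
    SELECT F
    CRT 'I2', TIMEDATE(), SYSTEM(9)
    * Read every record once; the record itself is not used.
    LOOP WHILE READNEXT ID
       READ REC FROM F, ID ELSE NULL
    REPEAT
    CRT 'I3', TIMEDATE(), SYSTEM(9)

  EXECUTED:
    * Same loop, but the select list comes from an executed SELECT.
    OPEN "[really big file]" TO F ELSE STOP
    CRT 'E1', TIMEDATE(), SYSTEM(9)
    EXECUTE "SELECT [really big file]"
    CRT 'E2', TIMEDATE(), SYSTEM(9)
    LOOP WHILE READNEXT ID
       READ REC FROM F, ID ELSE NULL
    REPEAT
    CRT 'E3', TIMEDATE(), SYSTEM(9)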

(Run each a couple times, to allow for i/o differences in loading data
buffer cache.)

There should be virtually no elapsed time between I1:I2 above, but long
elapsed time between E1:E2.
I expect I2:I3 to approximately equal E2:E3.


Let me explain why this is counter-intuitive.

Normally, the BASIC SELECT statement itself does not actually do any
select on the file.  It merely sets things up behind the scenes so that
subsequent READNEXTs each get the next id from the file opened to
F ("next" meaning as stored on disk).
UV keeps track of where it is in the file, unbeknownst to you.  Sorta
like it keeps track of where it is for REMOVE or attribute-level
EXTRACTs.
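
A rough way to picture the difference is a toy Python model. This is not UniVerse code: the `GROUPS` layout, the counter, and all function names are assumptions invented for the illustration. The internal select behaves like a generator that does nothing until the first ID is requested, while the executed select walks every group before handing anything back.

```python
# Toy model of a hashed file: a list of groups, each holding record IDs.
# The layout and names are illustrative assumptions, not UniVerse internals.
GROUPS = [["A1", "A7"], [], ["B2"], ["C3", "C9", "C4"]]

groups_touched = 0  # how many groups have been visited so far


def scan_groups():
    """Yield IDs group by group, in stored order, counting groups visited."""
    global groups_touched
    for group in GROUPS:
        groups_touched += 1
        for record_id in group:
            yield record_id


def internal_select():
    """Lazy: no group is visited until the caller asks for the next ID."""
    return scan_groups()


def executed_select():
    """Eager: walks the whole file up front and returns a finished list."""
    return iter(list(scan_groups()))


def groups_read_before_first_id(select):
    """Return (groups visited by the select itself, groups visited once the first ID is in hand)."""
    global groups_touched
    groups_touched = 0
    id_list = select()
    after_select = groups_touched
    next(id_list)  # the READNEXT equivalent
    return after_select, groups_touched
```

In this model `groups_read_before_first_id(internal_select)` gives `(0, 1)` and `groups_read_before_first_id(executed_select)` gives `(4, 4)`, yet both selects hand back the same IDs in the same order. That matches the expectation of almost no time between I1:I2 and a long wait between E1:E2.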



Exceptions to internal being faster than executed:

1. SSELECT FILEVAR  (i.e., 2 S's, SortSelect).
   You gotta read the whole file first to sort the keys.
   (And it's an alpha-type sort, even for numeric keys.)

2. $OPTIONS FSELECT
   Makes SELECT FILEVAR populate @SELECTED, and to do so means traversing
   the file.

3. Louis Windsor.  Poor bloke, they're out to get him.


Chuck Stevenson
-------
u2-users mailing list
u2-users@listserver.u2ug.org
To unsubscribe please visit http://listserver.u2ug.org/