I have a client that needs roughly 250,000 items removed monthly from a file
containing 5-6 million records.

Since the keys (records) aren't needed for any purpose other than
deletion once they are old, a standard DATA/BASIC SELECT statement is
just about as fast as you can get. It isn't encumbered by the number of
selected keys, because it doesn't retain them for any other purpose.

This simple program looks like this:

    OPEN "MASTER" TO F.MASTER ELSE STOP
    OPEN "REFERENCE" TO F.REFERENCE ELSE STOP
    * Cutoff date: anything dated on or before this gets deleted
    DEL.DTE = ICONV("01/02/2003","D")
    SELECT F.MASTER
10  READNEXT ID ELSE STOP
    READV DTE FROM F.MASTER, ID, 5 ELSE GOTO 10  ;* attribute 5 holds the date
    IF DTE GT DEL.DTE THEN GOTO 10
    DELETE F.MASTER, ID
    DELETE F.REFERENCE, ID
    GOTO 10
END

Of course if there's additional use for the deleted keys, then use another
approach.
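For instance, if the keys do need to be kept, or worked through in
batches as in the thread below, one option is to SAVE.LIST the selection
and then walk the saved list from BASIC. This is only a rough sketch;
the list name PURGE.KEYS and the 10,000 chunk size are made up for
illustration:

    OPEN "MASTER" TO F.MASTER ELSE STOP
    OPEN "REFERENCE" TO F.REFERENCE ELSE STOP
    EXECUTE "GET.LIST PURGE.KEYS"  ;* activates the saved list as list 0
    CNT = 0
    LOOP
       READNEXT ID ELSE EXIT
       DELETE F.MASTER, ID
       DELETE F.REFERENCE, ID
       CNT = CNT + 1
       IF MOD(CNT, 10000) = 0 THEN
          PRINT CNT : " keys processed"  ;* log progress between chunks
       END
    REPEAT
    END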

my 1 cent.
----- Original Message -----
From: "ashish ratna" <[EMAIL PROTECTED]>
To: "Anthony Youngman" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Monday, March 15, 2004 7:55 AM
Subject: RE: Help Needed regarding performance improvement of delete query


Hi Wol,

The scenario is this:

We have a master file with more than 3-4 million records and a
corresponding reference file which contains reference data for this
master file.

Now we run our purge program, which selects records from the master file
on the basis of date. The corresponding data should then be deleted from
the other file (the reference file).

For this requirement we have adopted the following approach: select the
records from the master file on the basis of date, save the list of those
record IDs, and then use that list to select the records from the
reference file.

The issue is that this list contains more than 0.5 million IDs, and I
want to take only a few record IDs from it at a time (say 10,000) for
further processing.

Any pointers for this problem will be very helpful.

Thanks in advance.

Ashish.



-----Original Message-----
From: Anthony Youngman [mailto:[EMAIL PROTECTED]
Sent: Monday, March 15, 2004 4:50 PM
To: ashish ratna
Subject: RE: Help Needed regarding performance improvement of delete
query


Ahhh

I thought you were selecting records and deleting them. So the first
thousand would have disappeared, and you would obviously get a different
thousand next time round because the first lot would have gone :-)
Sounds like that's not the case.

In that case, do you have a field that is repeated across many records,
but where each individual value (or range of values) wouldn't return too
big a bunch of records? Or do you have a numeric id - you could declare
an index on that ...

Let's take that numeric id idea - and then you'll have to build on it
for yourself. Declare an i-descriptor as @ID[3]. That logically
partitions your file into a thousand pieces. Declare an index on this.
The first term in your select will then be "WITH IDESC EQ
whatever-number-you-want". That'll reduce the load on the database for
each pass; you'll just need to wrap it in a loop where "whatever-number"
goes from 0 to 999, along the lines of the sketch below.
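
A rough sketch of how that might hang together - IDESC and PURGE.DATE
(the date field's dictionary item) are only illustrative names, and the
index itself would be built once at TCL with CREATE.INDEX MASTER IDESC
followed by BUILD.INDEX MASTER IDESC:

    * Sketch: purge in 1,000 passes using an indexed I-descriptor (@ID[3])
    OPEN "MASTER" TO F.MASTER ELSE STOP
    OPEN "REFERENCE" TO F.REFERENCE ELSE STOP
    FOR N = 0 TO 999
       EXECUTE "SELECT MASTER WITH IDESC EQ " : N : " AND WITH PURGE.DATE LT '02 JAN 2003'"
       LOOP
          READNEXT ID ELSE EXIT
          DELETE F.MASTER, ID
          DELETE F.REFERENCE, ID
       REPEAT
    NEXT N
    END

The IDESC term is there purely to cap how much each pass has to touch;
whether the date criterion can also be resolved from an index depends on
what you've indexed.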

Actually, what I would probably do is declare my i-descriptor as
MOD(@ID, 30) + 1 and then run this purge daily with "whatever-number" as
today's day of the month. Obviously, it'll do nothing on the 31sts, and
in March it'll do two months' worth of work on the 29th and 30th (because
February doesn't have those days), but you've reduced the "hit" on the
system considerably.

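Wiring that up looks much the same, except the driving loop disappears -
each night's run just selects today's bucket (again, the names are only
illustrative, and in practice the cutoff date would be derived from
DATE() rather than hard-coded):

    BUCKET = OCONV(DATE(), "DD")  ;* today's day of the month, 1-31
    EXECUTE "SELECT MASTER WITH IDESC EQ " : BUCKET : " AND WITH PURGE.DATE LT '02 JAN 2003'"
    * ...then the same READNEXT / DELETE loop as in the sketch above
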
Without knowing what you're doing in more detail, it's difficult to give
you any proper advice, but certainly I'll try and think of any tips that
you can build upon, like this. But you need to work out what's right for
you :-)

NB Any reason for taking this off the mailing list? By all means cc it
to me, but if you keep it on the list there are other people who may be
able to help too - I'm very good at overviews, but I fall short on the
logic - I think there is a way to get the next thousand records, but I
haven't got a clue what the syntax is ...

Cheers,
Wol

-----Original Message-----
From: ashish ratna [mailto:[EMAIL PROTECTED]
Sent: 15 March 2004 10:33
To: Anthony Youngman
Subject: RE: Help Needed regarding performance improvement of delete
query

Hi,

Thanks for the nice suggestions.
I have another question: once I have processed 1000 records using
SAMPLE 1000, how can I select the next 1000 records on the next run
(i.e. records 1001 to 2000, and so on)?

I tried a few combinations but didn't succeed. Can you tell me the
syntax for that?

Thanks again.

Regards,
Ashish.



-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Behalf Of Anthony Youngman
Sent: Monday, March 15, 2004 2:32 PM
To: U2 Users Discussion List
Subject: RE: Help Needed regarding performance improvement of delete
query


This might help speed things up a bit ...

Firstly, of course, is your file properly sized?

Secondly, (and in this case you will need to run the SELECT / DELETE
sequence several times) try putting a SAMPLE 1000 (or whatever number
makes sense) at the end of your select.

Basically, this means that the SELECT runs until it finds that number of
records and then stops, so each pass doesn't load the system as heavily.
Building one huge select list will stress your RAM badly ... looping
through smaller passes won't, though you really do need to use indices
to reduce the stress even more ...
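
In other words, each pass is just your normal select with SAMPLE tacked
on - here PURGE.DATE stands in for whatever your date dictionary item is
actually called:

    SELECT MASTER WITH PURGE.DATE LT "02 JAN 2003" SAMPLE 1000

Because each pass deletes the records it selected, the next SAMPLE 1000
picks up a fresh batch; you just keep repeating until the select comes
back empty.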

Create an index on various fields that you're using as your select
criteria. If you're selecting old records, then you need to select on
date, and this really will make life both easy and fast. The more
closely you can guarantee that a select, acting on a single index, will
pick up only or mostly records that you are going to delete, the better.
That will SERIOUSLY reduce the time taken and the performance hit.
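
Building the index is a one-off job at TCL; assuming the date field's
dictionary item is called PURGE.DATE, it would look something like:

    CREATE.INDEX MASTER PURGE.DATE
    BUILD.INDEX MASTER PURGE.DATE

After that, a select whose first criterion is WITH PURGE.DATE ... can be
satisfied from the index rather than by scanning all 5-6 million records.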

Cheers,
Wol

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
On Behalf Of ashish ratna
Sent: 15 March 2004 08:25
To: [EMAIL PROTECTED]
Subject: Help Needed regarding performance improvement of delete query

Hi All,

We are working on purging old data from the database, but we are facing
performance problems with this.

We are using a select query which is built dynamically on the basis of
the number of records. We want to know if there is any limit on the size
of a query in UniVerse.

Although the UniVerse help PDF says there is no limit on the length of a
select query, when we run the program on a file with more than 0.5
million records it gives the error:

"Pid 14433 received a SIGSEGV for stack growth failure. Possible causes:
insufficient memory or swap space, or stack size exceeded maxssiz.

Memory fault(coredump)"

If there is no limit on the size of the query, then please suggest some
other possible solution which would help us reduce the query time and
complete the process successfully without this error.

Thanks in advance.

Regards,

Ashish.




