Re: Managing EOF caching

Ken Anderson Mon, 23 Jan 2006 09:47:10 -0800

Jerry,

I understand the confusion - particularly since WO is doing these updates with a primary key in the where clause, and therefore, the DB should use that first, right?

Unfortunately, most SQL databases are not tuned this way. If there are many more attributes in the where clause than meet a finite number of circumstances, they usually resort to what's called a 'table scan', which means it just goes from top to bottom like an array. In my experience with large data sets, this can be catastrophic.

You have to look at it from the database point of view - if you know the primary key of the row, and you know you only want 1 row, why the heck did you put all these other fields in the where clause? To them, it means they're going to have to construct a complex data set, building subsets of subsets, trying to optimize using indexes but often failing.

In my experience, you should have a single key field, and a single lock field. While many people advocate the datetime field for a lock, I don't like it. Databases sometimes have a hard time comparing date fields, and sometimes they get shuffled inside your app and may end up with a different time zone, missing milliseconds, or have a resolution problem between the database representation and NSTimestamp. I advocate an integer, ever increasing, as the lock field.

To put this into practice, I subclass EOEditingContext and override saveChanges(). In it, I create an instance of a Transaction EO, and tell all the EO's being inserted or updated to takeValueForKey:@"transaction". When the save completes, all the EOs that were updated or inserted have the same integer in the trans_id field (the foreign key to the db_transaction table or whatever you call it).

My db updates always look like:

update table set blah=blah, trans_id = 1203 where oid = 42 and trans_id = 1093

For large tables, I'll put an index on trans_id so that the database can just get an intersection of the 2 indexes and voila! your record is updated (or not).

Ken

On Jan 20, 2006, at 5:15 PM, Jerry W. Walker wrote:

Hi, Ken,

Clearly, the results of Mike Schrag's small test confirmed your assertion about the databases doing more work. It wasn't entirely clear to me that they would or why they would.

I would challenge your assertion that this is not WO related, however, since nearly all WO applications depend on an underlying database and significant optimizations of the interaction with that database can be helpful in most cases and critical in a few.

My original assertion that the impact of reducing the locking fields (or reducing the concomitant clauses in the SQL WHERE clause) would have little impact stemmed from two sources, 1) my relative ignorance of the underlying internal database mechanisms, and 2) my resistance to premature optimization in development, particularly those stemming from untested heuristics.

Given the recent chat session we had about this topic, I would ask that you share some of that wisdom about database internals (and why this is a good idea) with the larger community. :-)

Furthermore, I would love to see your particular response to this problem offered to the community as well.

Fond regards,
Jerry

On Jan 20, 2006, at 4:39 PM, Ken Anderson wrote:

This is all database related, not WebObjects related. Larger where clauses will undoubtedly cause the database to perform more work.

On Jan 20, 2006, at 2:34 PM, Mike Schrag wrote:

Locking on one date column (which i think was the example earlier?) ends up with:

Main.main: 10000 with all columns locked = 25574ms
Main.main: 10000 with one column locked = 14125ms

I think it just really depends on the type of app you have ... If you're doing 100,000 updates, you probably care that it takes an extra 190 seconds (or whatever) to do the update. But if you're just updating periodically "normal" datasets, I really don't think the extra milliseconds are going to get you that much.

ms

On Jan 20, 2006, at 1:20 PM, Jerry W. Walker wrote:

Hi, Mike,

Thanks for taking the time to do this.

If I understand your test correctly, in each of the two cases, you've looped through the 10,000 EOs updating a single one of your 12 attributes. You then did a saveChanges to update the database, resulting in 10,000 individual SQL updates hitting the database, correct?

So, the result was 10,000 round trips to the DB, one for each row updated?

If all of that is correct, and if it's a trivial change to test, would it be possible to test the difference between a SINGLE locked column vs all columns locked?

You've taken the time to do something helpful here and I don't want to reward you by suggesting you need to do a lot more work, but the test I'm suggesting would put the results more in line with the original contention, that locking a single attribute saves a significant amount over locking all the attributes (in your case, 12).

If you would rather not take the tests any further, would you mind sending me a tarball of your current test project? I would be pleased to do it.

If all of the above is correct, your results say that the difference for a single update is approximately:

27.753 sec / 10,000 = 2.7 ms / update

vs.

7.111 / 10,000 = .7 ms / update

So we've achieved a 2 ms savings per update between no checking and checking 12 columns.

If that's the case, my untested (ignorant?) opinion that the difference is insignificant (and is probably less than 10 ms) stands corrected. Much as I hate to be wrong, my yearning for facts over both my ignorant opinions and other WO "old wive's tales" leads me to offer you unequivocal thanks for your efforts.

Regards,
Jerry

On Jan 20, 2006, at 12:54 PM, Mike Schrag wrote:

I haven't done any timings, but I find it hard to believe that this technique will effect any significant savings for most databases. As Art indicates, the only effect of reducing the number of lock columns is to correspondingly reduce the number of columns in the WHERE clause for updates and deletes.
I've always wondered this, too, so I threw together a quick test. I created two identical entities which have 3 ints, 3 booleans, 3 strings, and 3 dates as properties, but one locks all attrs and one locks none. I then inserted 10,000 of each into the db, followed by an update of the same attribute for each with the same change in value.

This was run on the latest version of FrontBase on an Intel iMac with WO 5.3:

Main.main: 10000 with all columns locked = 25753ms
Main.main: 10000 with no columns locked = 7111ms

So in this test, it is 3.6x faster to not lock any columns in this example, with this database, one this hardware. Obviously db benchmarks are never this simple, so take this with a grain of salt.

ms

--
__ Jerry W. Walker,
WebObjects Developer/Instructor for High Performance Industrial Strength Internet Enabled Systems

   jerrywwalker@gee-em-aye-eye-ell.com
   203 278-4085       office

_______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/lists%40anderhome.com

This email sent to [EMAIL PROTECTED]

--
__ Jerry W. Walker,
WebObjects Developer/Instructor for High Performance Industrial Strength Internet Enabled Systems

   jerrywwalker@gee-em-aye-eye-ell.com
   203 278-4085       office

 _______________________________________________
Do not post admin requests to the list. They will be ignored.
Webobjects-dev mailing list      ([email protected])
Help/Unsubscribe/Update your Subscription:
http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com


This email sent to [email protected]

Re: Managing EOF caching

Reply via email to