On 06.11.2008, at 14:33, Yung-Luen Lan wrote:

Do you have any trick or pattern to recycle EC? Call System.gc()?

Work in batches, call ec.dispose(), set ec = null, create a new one.
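
A minimal sketch of that pattern, for illustration only. `BatchContext`, `FakeContext` and `BatchInserter` are hypothetical stand-ins and not the WebObjects API; in a real application the context would be an editing context, and only `saveChanges()` and `dispose()` correspond to calls mentioned in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for an editing context; NOT the WebObjects API.
interface BatchContext {
    void insert(String row);   // register one new object
    void saveChanges();        // flush pending inserts to the store
    void dispose();            // release everything the context holds
}

// In-memory fake, present only so the sketch can run without a database.
class FakeContext implements BatchContext {
    static int created = 0;
    static int disposed = 0;
    static final List<String> store = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    FakeContext() { created++; }
    public void insert(String row) { pending.add(row); }
    public void saveChanges() { store.addAll(pending); pending.clear(); }
    public void dispose() { pending.clear(); disposed++; }
}

class BatchInserter {
    // Inserts all rows, saving and replacing the context every batchSize rows
    // so that no single context ever holds more than one batch.
    static void insertAll(Iterable<String> rows, int batchSize,
                          Supplier<BatchContext> factory) {
        BatchContext ec = factory.get();
        int inBatch = 0;
        for (String row : rows) {
            ec.insert(row);
            if (++inBatch == batchSize) {
                ec.saveChanges();
                ec.dispose();
                ec = factory.get();   // the old context is now unreachable
                inBatch = 0;
            }
        }
        ec.saveChanges();             // flush the last, possibly partial batch
        ec.dispose();
    }
}
```

With 2500 rows and a batch size of 1000, this sketch goes through three contexts, and each one is disposed before the next is created.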

What do you expect? It has to create 150k insert statements, one for each object, then copy these to the JDBC driver, which executes them one by one on the database, and so on. It's just a plainly inefficient way of creating rows in the table.

I'm not sure about this. Maybe grouping those SQL statements into one
transaction could help?
(Or is that already done by EOF?)

That is already done.

Again: you are running out of memory. Read the error message.

First of all: EOF is not built for bulk operations. If you want to do
something like that, you need to find other ways. What I prefer to do for
something like that:

- use ERXFetchSpecificationBatchIterator and iterate over the objects in
small batches of 100 or 200 rows

- create the CSV file on disk with an output stream that doesn't keep the
whole thing in memory

- deliver the file when the operation is done
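
The steps above could look roughly like the following sketch. `CsvExport` and its `BatchSource` interface are hypothetical names; `BatchSource` stands in for whatever hands out the objects in small batches (the thread suggests ERXFetchSpecificationBatchIterator for that, which is not used here). The point is only that each batch is written to disk and then dropped, so the whole file never sits in memory.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class CsvExport {
    // Hypothetical batch source: returns up to `limit` rows starting at
    // `offset`, or an empty list once the data is exhausted.
    interface BatchSource {
        List<String[]> fetch(int offset, int limit);
    }

    // Quotes a field if it contains a comma, a quote or a line break.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Streams all rows to `target`, one batch at a time; returns the row count.
    static long export(BatchSource source, int batchSize, Path target) {
        long written = 0;
        try (BufferedWriter out =
                 Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (int offset = 0; ; offset += batchSize) {
                List<String[]> batch = source.fetch(offset, batchSize);
                if (batch.isEmpty()) break;
                for (String[] row : batch) {
                    StringBuilder line = new StringBuilder();
                    for (int i = 0; i < row.length; i++) {
                        if (i > 0) line.append(',');
                        line.append(escape(row[i]));
                    }
                    out.write(line.toString());
                    out.write('\n');
                    written++;
                }
                // The batch goes out of scope here and can be collected.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return written;
    }
}
```

Once `export` returns, the finished file can be delivered from disk.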

Yeah, that is actually what I do, following your suggestions, except for
the first one. I'm still learning what a fetch specification is. Thanks
again.

Oh, I see. If you iterate over hundreds of thousands of objects, you need to clear out the ec (see above).

Think about what is going on in that case. You have 150000 objects in Java land, you might have relationships, you create string representations for each and every insert statement, maybe more than one string per statement (remember, String is immutable and the GC always comes too late), you pass that to the JDBC adaptor as one transaction so that one keeps its own copy of the statements (not sure, but likely if you expect the worst case), and so on. If each of your objects is around 1k in size, you have 150000k or around 146MB, and that is just to keep the objects around. How much memory did you say you give your Java apps?
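
The estimate above can be checked with a few lines of arithmetic. `MemoryEstimate` is a hypothetical helper, and the 1k (1024 bytes) per object is the rough assumption from the message, not a measured value.

```java
class MemoryEstimate {
    // Rough heap needed just to hold the objects, in megabytes
    // (1 MB taken as 1024 * 1024 bytes).
    static double megabytes(long objectCount, long bytesPerObject) {
        return objectCount * bytesPerObject / (1024.0 * 1024.0);
    }
}
```

`MemoryEstimate.megabytes(150000, 1024)` comes out at about 146.5, before counting relationships, SQL strings or any copies held by the JDBC layer.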

I totally agree with your point. Let me break this into two parts: space and time.

Memory: it appears that an EC isn't suited for inserting a lot of EOs
into the database, because the memory footprint of an EC works like this:
hold objects in memory, then discard or save them to the db all at once.
Other ORM tools like ActiveRecord or python.db don't seem to have the
concept of an EC;

Yes. And running out of memory is not about the editing context as such, but about Java, your memory settings for the app and so on.

But really, bulk operations need some special handling in any decent persistence framework.

Performance: I ran some benchmarks on my database. 150,000 insertions
into the same table:

Raw SQL, transaction: 23s
Raw SQL, no transaction: 154s
EC, saveChanges every 1000 EO inserted: 273s

Compared with raw SQL without a transaction, the EC method is not bad
at all (only about 1.8x slower). I don't care about reducing the wait
time from five minutes to half of that. Totally acceptable. :-)

Yeah, I thought so too. It's decent, depending on the database structure, that is. If you need real performance, you need to go to database-specific things.

What we do in some cases is:

1. Create a file with copy statements that use the PostgreSQL COPY command to load the data into a temp table

2. Use "insert into the_table select * from the_temp_table ..."

This performs way faster than anything else on PostgreSQL. It just depends what you need there.
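
A rough sketch of step 1, assuming the default text format of PostgreSQL's COPY (tab-separated columns, `\N` for NULL, backslash escapes). `PgCopyText` is a hypothetical helper, the table names are the placeholders from the list above, and the SQL strings are only one plausible way to frame the two steps.

```java
class PgCopyText {
    // Statements to run around the generated data; table names are placeholders.
    static final String CREATE_TEMP =
        "CREATE TEMP TABLE the_temp_table (LIKE the_table)";
    static final String COPY_IN =
        "COPY the_temp_table FROM STDIN";
    static final String MOVE_ROWS =
        "INSERT INTO the_table SELECT * FROM the_temp_table";

    // Escapes one value for COPY text format; null becomes \N.
    static String field(String value) {
        if (value == null) return "\\N";
        return value.replace("\\", "\\\\")   // backslash first, then the rest
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    // Builds one tab-separated, newline-terminated data line.
    static String row(String... values) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) sb.append('\t');
            sb.append(field(values[i]));
        }
        return sb.append('\n').toString();
    }
}
```

The lines produced by `row` would be streamed to a file or straight to the COPY command, so no per-row insert statement is ever built.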

Ah, if my previous post offended people, I apologize. What I really
meant is absolutely not "why WebObjects is bad and old", but "This
should be easy to do in the 21st century. I must be doing something
wrong. What's the correct way?"

Don't do it that way. ;-)

The thing is that EOF keeps an object graph in memory. If you just want to dump rows into a table, get rid of the object graph if you don't need it. That's where the memory and time overhead goes.

There is one thing to learn from here: do not expect EOF to be fast automatically if you are dealing with hundreds of thousands or millions of objects. Whenever I expect something to exceed a couple hundred objects, I create batches, do batch fetching / faulting, recycle editing contexts if I have to, or go to the low level. It's just something to be aware of. Unfortunately it applies in many places, and you'll probably find more over time.

There are many, many tricks you can and should use when doing bulk operations with EOF. It's something you need to dig into, maybe ask about here, watch the generated SQL in the logfile and so on. It's a big topic.

Personally I never found EOF to be particularly slow, which is a complaint lots of people have expressed over time, but you definitely have to work WITH the tool, not AGAINST it. And some natural or naive approaches are just plain fighting the tool. And if you fight WO, you have to be either incredibly good, or it will win. I'm not good enough to win against WO, therefore I try not to fight it ... ;-)

cug