Thanks! This is great. I¹ll try it out!
On 12/19/07 3:01 PM, "Kieran Kelleher" <[EMAIL PROTECTED]> wrote: > And BTW, these massive insert operations are usually done in a background > thread (either autonomous or with LongResponse meta-refreshing page) and IIRC > (without looking at my code but I usually have one or a pool of OSC's just for > EOF intensive background threads) I am probably using a different > ObjectStoreCoordinator to the regular OSC(s) being used for user R-R handling. > Using the default OSC for a few 100 thousand may affect application > responsiveness. > > On Dec 19, 2007, at 1:32 PM, Kieran Kelleher wrote: > >> Do them one at a time using adaptor operations in a loop ( I regularly run >> some processes that insert 500,000 to 1,000,000 rows using this method and it >> is pretty fast. I forget what the numbers were, but it is acceptable. The >> code pattern was copied from ERX's updateRows and ChannelOperation, so the >> credit goes to Anjo for the channel handling logic. (see below this snippet >> for primary key handling.....). Anyway, copy these two methods to a static >> class and try it. Test it with a millisecond timer before you discount it as >> too slow. For me it is plenty fast enough ...... I forgot how much faster >> than regular editing context inserting and saving .... maybe 200 times faster >> or something ... I forget.......... also at the bottom of the email I have >> pasted in a method from a piece of code that uses these two static utility >> methods just to give you an idea of usage .... this is code from one of my >> apps, but it will give you an idea of how to do what you want ........ I >> colored the snippet where the work happens in red (not sure if mailing list >> will carry that or not) >> >> HTH, Kieran >> >> /** >> * Inserts a raw row. You are responsible to ensure it has a new valid >> primary key >> * @see EOAdaptorChannel#insertRow(NSDictionary, EOEntity) >> * @param ec >> * @param rawRow >> * @param entityName >> * @return 1 >> */ >> public static int insertRawRowForEntity(EOEditingContext ec, NSDictionary >> rawRow, String entityName){ >> EOEntity entity = entityNamed(entityName); >> String modelName = entity.model().name(); >> >> boolean wasOpen = true; >> EOAdaptorChannel channel = null; >> int rows = 0; >> ec.lock(); >> try { >> EODatabaseContext dbc = >> EOUtilities.databaseContextForModelNamed(ec, modelName); >> dbc.lock(); >> try { >> channel = dbc.availableChannel().adaptorChannel(); >> wasOpen = channel.isOpen(); >> if(!wasOpen) { >> channel.openChannel(); >> } >> channel.adaptorContext().beginTransaction(); >> try { >> channel.insertRow(rawRow, entity); >> channel.adaptorContext().commitTransaction(); >> } catch(RuntimeException ex) { >> channel.adaptorContext().rollbackTransaction(); >> throw ex; >> } >> } finally { >> if(!wasOpen) { >> channel.closeChannel(); >> } >> dbc.unlock(); >> } >> } finally { >> ec.unlock(); >> } >> // TODO : The JDBCChannel concrete subclass has a rowsProcessedCount, >> so perhaps use that >> return 1; >> } >> >> >> >> Before starting the batch insert, I grab an array of primary keys based on >> how many I need and just pull them form the array as I loop thru inserting. >> >> /** >> * @param ec >> * @param entityName >> * @return >> * @see >> com.webobjects.eoaccess.EOAdaptorChannel#primaryKeysForNewRowsWithEntity(int, >> EOEntity) >> */ >> public static NSArray primaryKeysForNewRowsWithEntity(EOEditingContext >> ec, int count, String entityName) { >> EOEntity entity = entityNamed(entityName); >> String modelName = entity.model().name(); >> NSArray primaryKeysArray = null; >> >> boolean wasOpen = true; >> EOAdaptorChannel channel = null; >> int rows = 0; >> ec.lock(); >> try { >> EODatabaseContext dbc = >> EOUtilities.databaseContextForModelNamed(ec, modelName); >> dbc.lock(); >> try { >> channel = dbc.availableChannel().adaptorChannel(); >> wasOpen = channel.isOpen(); >> if(!wasOpen) { >> channel.openChannel(); >> } >> channel.adaptorContext().beginTransaction(); >> try { >> primaryKeysArray = >> channel.primaryKeysForNewRowsWithEntity(count, entity); >> channel.adaptorContext().commitTransaction(); >> } catch(RuntimeException ex) { >> channel.adaptorContext().rollbackTransaction(); >> throw ex; >> } >> } finally { >> if(!wasOpen) { >> channel.closeChannel(); >> } >> dbc.unlock(); >> } >> } finally { >> ec.unlock(); >> } >> return primaryKeysArray; >> } >> >> >> >> ****** EXAMPLE OF USAGE (lifted from an existing app) ************** >> >> /** >> * OK, the EO direction was way too slow. Here is the faster more "bare >> metal" approach. >> * First simply deleteRowsDescribedByQualifier. >> * Then fetch all geoPoints. >> * Then insertRows >> * All done with creating EO's only because we need the speed performance >> * due to having 10's of thousands of rows per each geospec >> * TODO: Logic to update a radius by comparing to geoPoints and setting >> all isOverlapExclude to false >> */ >> public void resetGeoPointsInRadius(){ >> if (isNewObject()) { >> throw new RuntimeException("We cannot perform raw row operations >> on related geospatial points for a CTGeographySpec that is not yet saved" >> + ". The owner object in question is " + ownerObject()); >> >> } //~ if (isNewObject()) >> >> StopWatch stopWatch = new StopWatch(); >> stopWatch.start(); >> >> // First delete all related geopoints >> int rowsAffected = >> ERXEOAccessUtilities.deleteRowsDescribedByQualifier(editingContext(), >> CTGeoSpatialPoint.ENTITY_NAME, geoSpatialPointsQualifier()); >> if (log.isDebugEnabled()) >> log.debug("DELETE rowsAffected = " + rowsAffected); >> stopWatch.split(); >> if (log.isDebugEnabled()) >> log.debug("split time after deleting old geospatial points = " + >> stopWatch); >> stopWatch.unsplit(); >> >> // Sometimes we cannot get a geocoded centroid, so check this first >> if (centroidGeoPoint() != null) { >> // Recreate all related items in radius >> NSArray geoPointRawRows = >> GeoPoint.Utilities.pointsInRadius(centroidGeoPoint(), radiusMiles(), >> geoPointTypeEntityName()); >> >> // For speed requirements, we insert geospatial points directly >> as raw rows >> Object oidGeographySpec = rawPrimaryKey(); >> Number geoPointType = geoPointType(); >> Integer isOverLapExcludeDefault = new Integer(0); >> // grab primary keys for the new rows >> NSArray geoSpatialPointPrimaryKeys = >> WKEOUtils.primaryKeysForNewRowsWithEntity(editingContext(), >> geoPointRawRows.count(), CTGeoSpatialPoint.ENTITY_NAME); >> >> for (int i = 0; i < geoPointRawRows.count(); i++) { >> NSDictionary currentGeoPointRawRow = (NSDictionary) >> geoPointRawRows.objectAtIndex(i); >> NSMutableDictionary geoSpatialRawRow = new >> NSMutableDictionary(); >> geoSpatialRawRow.takeValueForKey(oidGeographySpec, >> CTGeoSpatialPoint.KEY_OID_GEOGRAPHY_SPEC); >> >> geoSpatialRawRow.takeValueForKey(currentGeoPointRawRow.valueForKey("oid"), >> CTGeoSpatialPoint.KEY_OID_GEO_POINT); >> geoSpatialRawRow.takeValueForKey(geoPointType, >> CTGeoSpatialPoint.KEY_GEO_POINT_TYPE); >> >> geoSpatialRawRow.takeValueForKey(currentGeoPointRawRow.valueForKey("distance" >> ), CTGeoSpatialPoint.KEY_DISTANCE); >> geoSpatialRawRow.takeValueForKey(isOverLapExcludeDefault, >> CTGeoSpatialPoint.KEY_IS_OVERLAP_EXCLUDE); >> >> NSDictionary pkDict = (NSDictionary) >> geoSpatialPointPrimaryKeys.objectAtIndex(i); >> // Merge the pk dict >> geoSpatialRawRow.addEntriesFromDictionary(pkDict); >> >> if (log.isDebugEnabled() && i < 10) { >> log.debug("geoSpatialRawRow = " + geoSpatialRawRow); >> } >> WKEOUtils.insertRawRowForEntity(editingContext(), >> geoSpatialRawRow, CTGeoSpatialPoint.ENTITY_NAME); >> } >> >> stopWatch.stop(); >> if (log.isDebugEnabled()) >> log.debug("Time to reset program geospec for " + program() + >> " = " + stopWatch); >> } else { >> if (log.isDebugEnabled()) >> log.debug("Could not get centroidGeoPoint for " + this); >> } //~ if (centroidGeoPoint() != null) >> >> } >> >> >> >> >> On Dec 19, 2007, at 12:45 PM, Charles Koppelman wrote: >> >>> So ERX provides us with a great way to do massive updates >>> (ERXEOAccessUtilities.updateRowsDescribedByQualifier). Is there a way that >>> anyone on this list has come up with to do massive INSERTs? >>> >>> Looking at the code, it seems that inserts are done exclusively through >>> insertRow methods, meaning one insert statement per row. However, SQL >>> supports multiple inserts into the same table like: >>> >>> INSERT INTO MY_TABLE >>> (ID, COLUMN_A, COLUMN_B) >>> VALUES (1, 'A', 'B'), >>> (2, 'C', 'D'), >>> (3, 'E', 'F'); >>> >>> Has anyone implemented this? >>> >>> Charles Koppelman >>> >>> Programmer >>> Youth For Understanding, USA >>> >>> _______________________________________________ >>> Do not post admin requests to the list. They will be ignored. >>> Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) >>> Help/Unsubscribe/Update your Subscription: >>> http://lists.apple.com/mailman/options/webobjects-dev/kieran_lists%40mac.com >>> >>> This email sent to [EMAIL PROTECTED] >>> >> > >
_______________________________________________ Do not post admin requests to the list. They will be ignored. Webobjects-dev mailing list (Webobjects-dev@lists.apple.com) Help/Unsubscribe/Update your Subscription: http://lists.apple.com/mailman/options/webobjects-dev/archive%40mail-archive.com This email sent to [EMAIL PROTECTED]