Hi Peter,

I'm making a few assumptions about batch sizes in the flow, mainly that a single flowfile (representing a batch) goes into JsonToSql and many flowfiles come out.
An inline ExecuteStreamCommand processor placed before the PutSQL processor may work. The ESC would contain a script that deletes the records from the old table and then passes the 'original' flowfile on to PutSQL, so the kill and the fill happen almost within the same transaction. An additional validation flow could help, since I could see the potential for database downtime occurring mid-batch. (A rough sketch of the kind of script I have in mind is at the bottom of this message.)

For the second issue (preventing the mixing of batches), you could set the flowfile threshold count to "1" on the connection into PutSQL. That way back pressure is on except when the queue is empty. Additionally, a 1-minute run schedule on the penultimate processor would further ensure you pull one flowfile at a time.

The delete-first-flowfile idea is an interesting one. A PutFile/GetFile combo, where new flowfiles overwrite older flowfiles of the same filename on disk, would accomplish this. I don't think there is currently an in-memory version of it, but... putting on my Rube Goldberg hat, I'm envisioning two UpdateAttribute processors and one RouteOnAttribute processor in a loop. Leveraging flowfile priority (newest first) on the connections and the run schedule on the UpdateAttribute processors, one *could* engineer conditions where non-new flowfiles don't get an attribute "refreshed" in time and are subsequently routed out (or aged out) of the loop. A few new challenges arise in this thought experiment, namely applying proper back pressure to the loop as a whole, and failures when a heavily loaded instance takes more time than expected.

I'll think about this a bit more, run some tests, and get back to you.

Thanks,
Lee

On Aug 26, 2016, at 7:45 AM, Peter Wicks (pwicks) <pwi...@micron.com> wrote:

I have a source SQL table that I'm reading with a SQL SELECT statement. I want to kill and fill a destination SQL table with this source data on an interval. My non-kill-and-fill pattern is: ExecuteSQL -> Avro To JSON -> JSON To SQL -> PutSQL.

I'm trying to come up with a good way to delete the existing data before loading the new data. One option I've considered is to mark the original Avro file with a UUID and add this attribute as a field in the destination table, then do a split off, ReplaceText, and delete all rows where the UUID doesn't match this batch. I think this could work, but I'm worried about timing the SQL DELETE; I kind of want the kill and the fill steps to happen in a single transaction.

The other issue is what happens if PutSQL has to go down for a while due to database downtime and I get several kill-and-fill batches piled up. Is there a way I can use backpressure to make sure only a single file gets converted from JSON to SQL at a time, in order to avoid mixing batches?

I also considered FlowFile expiration, but is there a way I can tell NiFi to only expire a FlowFile when a new FlowFile has entered the queue? Ex: one FlowFile in the queue, no expiration occurs; a second (newer) FlowFile enters the queue, then the first file expires itself.

Thanks,
Peter
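
Here's the rough delete-then-pass-through sketch I mentioned above. The database path, table name, and the use of Python's built-in sqlite3 module are placeholder assumptions, not anything from your flow; you'd swap in the client library and DELETE statement for your actual destination.

#!/usr/bin/env python3
# Sketch only: delete the old rows, then echo the incoming flowfile content
# back out unchanged so PutSQL still receives it downstream.
import sys
import sqlite3  # placeholder driver; use the client library for your real database

DB_PATH = "/data/dest.db"   # hypothetical destination database
TABLE = "dest_table"        # hypothetical destination table

def main():
    # ExecuteStreamCommand pipes the flowfile content to the script's stdin.
    payload = sys.stdin.buffer.read()

    conn = sqlite3.connect(DB_PATH)
    try:
        # The "kill" step. TABLE is a fixed constant here, not user input.
        conn.execute("DELETE FROM %s" % TABLE)
        conn.commit()
    finally:
        conn.close()

    # Echo the original content so it continues on to PutSQL.
    sys.stdout.buffer.write(payload)
    sys.stdout.flush()

if __name__ == "__main__":
    main()

If I remember right, with the Output Destination Attribute property left blank, ExecuteStreamCommand replaces the flowfile content with the script's stdout, so echoing the payload back out is what keeps the original content flowing to PutSQL.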