Hi,

I'm interested in a use case where a long-running job produces results as it 
goes. The job may die, so it must be restartable and must continue from the 
last known-good point.

For this use case, it seems best to use the "IPC Streaming Format" and write 
out the batches as they are generated.

However, once the job is finished it would also be beneficial to have random 
access into the file. It seems like this is possible by:


  1.  Manually creating a file with the correct magic number/padding bytes and 
then seeking past them.
  2.  Writing batches out as they appear.
  3.  Doing a pass over the record batches to gather the information required 
to generate the footer data.

Whilst this seems possible, it doesn't seem like it is a use case that has come 
up before. However, this does surprise me because adding index information to a 
"completed" file seems like a genuinely useful thing to want to do.

Has anyone encountered something similar before?

Is there an easier way to achieve this? That is, does this functionality, or 
parts of it, exist in another language that I can bind to from Python?

Best,

Sam


