Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Chamikara Jayalath
> I believe (and hope) that behavior of IOChannelFactory.match() matches the behavior of gsutil. > > On Thu, Oct 27, 2016 at 1:48 PM Chamikara Jayalath wrote: > > BTW I'm in favor of using a sub-directory and possibly aski

Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Chamikara Jayalath
BTW I'm in favor of using a sub-directory and possibly asking users to update their glob pattern while also allowing users to optionally specify a temporary path in the future, as you propose. Thanks, Cham On Thu, Oct 27, 2016 at 1:45 PM Chamikara Jayalath wrote: > On Thu, Oct 27, 2016

Re: Placement of temporary files by FileBasedSink

2016-10-27 Thread Chamikara Jayalath
m with the same ACLs. It is also more UI friendly, easier to clean up, and does more to explicitly indicate that this is really one sharded file. Perhaps there's a pitfall I am overlooking?

Re: Placement of temporary files by FileBasedSink

2016-10-20 Thread Chamikara Jayalath
Can this be prevented by moving temporary files (copy + delete individually) at finalization instead of copying all of them and performing a bulk delete? You can support task failures by ignoring renames when the destination exists. The Python SDK currently does this (and puts temp files in a sub-dire
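
The per-file move described above can be sketched roughly as follows. This is only a minimal illustration on a local filesystem, not the SDK's actual FileBasedSink code: the function name finalize_moves and the (moved, skipped) return value are hypothetical, and a real sink would go through its own file-system abstraction rather than os and shutil.

```python
import os
import shutil


def finalize_moves(temp_dir, final_dir):
    """Move each temp file on its own (copy, then delete the source).

    Hypothetical helper for illustration only. A file whose destination
    already exists is skipped, so a retried finalization does not fail
    on files that an earlier attempt already moved.
    """
    os.makedirs(final_dir, exist_ok=True)
    moved, skipped = [], []
    for name in sorted(os.listdir(temp_dir)):
        src = os.path.join(temp_dir, name)
        dst = os.path.join(final_dir, name)
        if os.path.exists(dst):
            # Destination exists: ignore this rename instead of failing.
            skipped.append(name)
            continue
        shutil.copyfile(src, dst)  # copy ...
        os.remove(src)             # ... then delete this one source file
        moved.append(name)
    return moved, skipped
```

Because each source is deleted only after its own copy succeeds, a failure partway through leaves the remaining temp files in place for a retry. Any leftovers in the temp sub-directory (for example, sources that were skipped) would still need a separate clean-up step, which this sketch leaves out.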

Re: Subclassing iobase.Source and ptransform.PTransform in the Python SDK

2016-03-08 Thread Chamikara Jayalath
Hi Joseph, the Python SDK currently does not support creating new sources. The sources that are currently available are backed by the Google Dataflow service. Theoretically, it should be possible to get new sources working just for DirectPipelineRunner by hacking the SDK, but this has not been tested properly.