I'd like to read Hadoop SequenceFiles/MapFiles stored on Google Cloud Storage,
via a glob such as "gs://bucket/link/SequenceFile-*", using the Python SDK.

I can't find any good adapters for this, and the only Hadoop Filesystem
reader I've found seems to read only from "hdfs://" URLs.

I'd like to use Dataflow and GCS exclusively, so that we can start mixing Beam
pipelines in with our current Hadoop pipelines.

Is this feature supported now, or will it be supported in the future?
Does anyone have good suggestions for a performant way to do this?

I'd also like to be able to write back out to a SequenceFile if possible.

Thanks!
