Hi,
Maybe you could share some code so we can get a better picture of
what is going on.
The last time I had to read from HDFS (normally, in our pipelines, HDFS
is just a sink), we used FileIO:
https://beam.apache.org/documentation/sdks/javadoc/2.3.0/index.html?org/apache/beam/sdk/io/FileIO.html
It gives you ReadableFile(s), which we read as regular files; in our
case, we converted each line into the object expected by the next
transform.
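Roughly something along these lines (just a sketch: the HDFS path/port
and splitting on newlines are placeholders, and for hdfs:// paths you
also need the Hadoop file system module and matching Hadoop
configuration on the classpath):

import java.io.IOException;
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ReadFromHdfs {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Match files on HDFS (placeholder path) and read each matched file,
    // splitting its contents into lines.
    PCollection<String> lines = p
        .apply(FileIO.match().filepattern("hdfs://namenode:8020/data/input/*"))
        .apply(FileIO.readMatches())
        .apply(FlatMapElements
            .into(TypeDescriptors.strings())
            .via((FileIO.ReadableFile file) -> {
              try {
                return Arrays.asList(file.readFullyAsUTF8String().split("\n"));
              } catch (IOException e) {
                throw new RuntimeException(e);
              }
            }));

    // From here, each line can be parsed into whatever object the next
    // transform expects (e.g. with MapElements).

    p.run().waitUntilFinish();
  }
}

Note that readFullyAsUTF8String() pulls the whole file into memory, so
this only makes sense for reasonably small files; otherwise open()
gives you a channel to stream from.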
Cheers,
Leonardo Campos
On 02.09.2018 00:05, Mahesh Vangala wrote:
Hello all -
I have installed pseudo-distributed YARN and Spark.
My Beam pipeline reads from a file with TextIO, and it runs fine when I
launch the pipeline using --master spark://master.
However, I am having difficulties getting this to run with --master
yarn.
I am pretty sure that reading a local file with TextIO on YARN is
causing the issues.
I did look into the Beam API (beam.sdk.io.hadoop) and Spark, but had no
luck finding the right info.
If you could nudge me in the right direction, that'd be great!
Thank you for your help.
Regards,
Mahesh
--
MAHESH VANGALA
(PH) 443-326-1957
(WEB) MVANGALA.COM [1]
Links:
------
[1] http://mvangala.com