Hi,
Maybe you could share some code so we can get a better picture of
what is going on.
The last time I had to read from HDFS (normally, in our pipelines, HDFS
is just a sink), we used FileIO:
https://beam.apache.org/documentation/sdks/javadoc/2.3.0/index.html?org/apache/beam/sdk/io/FileIO.html
It gives you ReadableFile(s), which we read as regular files; in our
case, we converted each line into the object expected by the next
transform.
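Roughly something along these lines (just a sketch: the HDFS path/port
and splitting on newlines are placeholders, and for hdfs:// paths you
also need the Hadoop file system module and matching Hadoop
configuration on the classpath):

import java.io.IOException;
import java.util.Arrays;

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.FlatMapElements;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class ReadFromHdfs {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Match files on HDFS (placeholder path) and read each matched file,
    // splitting its contents into lines.
    PCollection<String> lines = p
        .apply(FileIO.match().filepattern("hdfs://namenode:8020/data/input/*"))
        .apply(FileIO.readMatches())
        .apply(FlatMapElements
            .into(TypeDescriptors.strings())
            .via((FileIO.ReadableFile file) -> {
              try {
                return Arrays.asList(file.readFullyAsUTF8String().split("\n"));
              } catch (IOException e) {
                throw new RuntimeException(e);
              }
            }));

    // From here, each line can be parsed into whatever object the next
    // transform expects (e.g. with MapElements).

    p.run().waitUntilFinish();
  }
}

Note that readFullyAsUTF8String() pulls the whole file into memory, so
this only makes sense for reasonably small files; otherwise open()
gives you a channel to stream from.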
Cheers,
Leonardo Campos
On 02.09.2018 00:05, Mahesh Vangala wrote:
Hello all -
I have installed pseudo-distributed YARN and Spark.
My Beam pipeline reads from a file with TextIO, and it runs fine when I
launch the pipeline using --master spark://master.
However, I am having difficulties getting this to run with --master
yarn.
I am pretty sure that reading a local file with TextIO on YARN is
causing the issues.
I did look into the Beam API (beam.sdk.io.hadoop) and Spark, but had no
luck finding the right info.
If you could nudge me in the right direction, that'd be great!
Thank you for your help.
Regards,
Mahesh
--
MAHESH VANGALA
(PH) 443-326-1957
(WEB) MVANGALA.COM [1]
Links:
------
[1] http://mvangala.com