Yang, I appreciate your help! Please let me know if I can provide any other info.
I resubmitted my executable jar file as a step to the Flink EMR cluster, and here are all the exceptions. I see two of them.
I fished them out of /var/log/Hadoop/<STEP-ID>/syslog
2020-01-21 16:31:37,587 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask (Split Reader: Custom File Source -> Sink: Unnamed (11/16)): Error during disposal of stream operator.
java.lang.NullPointerException
    at org.apache.flink.streaming.api.functions.source.ContinuousFileReaderOperator.dispose(ContinuousFileReaderOperator.java:165)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.disposeAllOperators(StreamTask.java:582)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:481)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)
2020-01-21 16:31:37,591 INFO org.apache.flink.runtime.taskmanager.Task (Split Reader: Custom File Source -> Sink: Unnamed (8/16)): Split Reader: Custom File Source -> Sink: Unnamed (8/16) (865a5a078c3e40cbe2583afeaa0c601e) switched from RUNNING to FAILED.
java.lang.UnsupportedOperationException: Recoverable writers on Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
    at org.apache.flink.runtime.fs.hdfs.HadoopRecoverableWriter.<init>(HadoopRecoverableWriter.java:57)
    at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.createRecoverableWriter(HadoopFileSystem.java:202)
    at org.apache.flink.core.fs.SafetyNetWrapperFileSystem.createRecoverableWriter(SafetyNetWrapperFileSystem.java:69)
    at org.apache.flink.streaming.api.functions.sink.filesystem.Buckets.<init>(Buckets.java:112)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink$RowFormatBuilder.createBuckets(StreamingFileSink.java:242)
    at org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink.initializeState(StreamingFileSink.java:327)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
    at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
    at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:878)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:392)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
    at java.lang.Thread.run(Thread.java:748)
From: Yang Wang <[email protected]>
Date: Saturday, January 18, 2020 at 7:58 PM
To: Senthil Kumar <[email protected]>
Cc: "[email protected]" <[email protected]>
Subject: Re: Location of flink-s3-fs-hadoop plugin (Flink 1.9.0 on EMR)
I think this exception is not because the Hadoop version isn't high enough.
It seems that the "s3" URI scheme could not be recognized by `S3FileSystemFactory`, so it falls back to the `HadoopFsFactory`.
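A quick way to check which FileSystem implementation Flink actually resolves for the "s3" scheme is something like the following (just a rough sketch; the bucket name is a placeholder):

import java.net.URI;

import org.apache.flink.core.fs.FileSystem;

public class S3SchemeCheck {

    public static void main(String[] args) throws Exception {
        // Ask Flink for the FileSystem registered for the "s3" scheme.
        // "some-bucket" is only a placeholder.
        FileSystem fs = FileSystem.get(new URI("s3://some-bucket/"));

        // If flink-s3-fs-hadoop is loaded, this should print a class from that
        // module; if it prints the generic HadoopFileSystem wrapper, the lookup
        // fell back to the HadoopFsFactory.
        System.out.println(fs.getClass().getName());
    }
}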
Could you share the debug-level jobmanager/taskmanager logs so that we can confirm whether the classpath and FileSystem are loaded correctly?
Best,
Yang
Senthil Kumar <[email protected]<mailto:[email protected]>> 于2020年1月17日周五
下午10:57写道:
Hello all,
Newbie here!
We are running on Amazon EMR with the following installed in the EMR Software Configuration:
Hadoop 2.8.5
JupyterHub 1.0.0
Ganglia 3.7.2
Hive 2.3.6
Flink 1.9.0
I am trying to get a streaming job reading from one S3 bucket and writing into another S3 bucket using the StreamingFileSink.
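Here is roughly what the job looks like (a simplified sketch; the bucket names and paths are placeholders, not the real ones):

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;

public class S3ToS3Job {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Read text files from the source bucket (placeholder path).
        DataStream<String> lines = env.readTextFile("s3://source-bucket/input/");

        // Write each record to the destination bucket (placeholder path)
        // using the row-format StreamingFileSink.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("s3://destination-bucket/output/"),
                        new SimpleStringEncoder<String>("UTF-8"))
                .build();

        lines.addSink(sink);
        env.execute("s3-to-s3-streaming-copy");
    }
}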
I got the infamous exception:
Caused by: java.lang.UnsupportedOperationException: Recoverable writers on
Hadoop are only supported for HDFS and for Hadoop version 2.7 or newer
According to this, I needed to install flink-s3-fs-hadoop-1.9.0.jar in
/usr/lib/flink/lib
https://stackoverflow.com/questions/55517566/amazon-emr-while-submitting-job-for-apache-flink-getting-error-with-hadoop-recov
That did not work.
Further googling revealed that for Flink 1.9.0 and above (according to this page):
https://ci.apache.org/projects/flink/flink-docs-stable/ops/filesystems/
it seems that I need to install the jar file in the plugins directory
(/usr/lib/flink/plugins/s3-fs-hadoop)
That did not work either.
At this point, I am not sure what to do and would appreciate some help!
Cheers
Kumar