I was getting the following error without it:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
No lease on /.gz.parquet (inode ): File does not exist. [Lease. Holder:
DFSClient_NONMAPREDUCE_, pendingcreates: 1]
I think that is due to a deadlock.
I am a bit curious: why is the synchronization on finalLock needed?
Thanks
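For what it's worth, a LeaseExpiredException like the one above usually means two clients raced for the same HDFS path: the NameNode grants the file lease to one DFSClient and revokes it when another opens the same file, so the first writer's pending create fails. Synchronizing on a shared lock serializes the writers so only one holds the lease at a time. A minimal plain-Java sketch of that race (no Spark or HDFS involved; the class, the list standing in for the shared output path, and all names here are illustrative, not from the original job):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

public class SerializedWrites {
    // Shared lock object, analogous to finalLock in the snippet below.
    static final Object finalLock = new Object();
    // Stand-in for the single shared output target all threads write to.
    static final List<String> sharedOutput = new ArrayList<>();

    static void write(int partition) {
        // Without this synchronized block, concurrent add() calls on the
        // unsynchronized ArrayList can lose or corrupt entries -- roughly
        // the way two DFSClients racing for one file invalidate each
        // other's lease on the NameNode.
        synchronized (finalLock) {
            sharedOutput.add("rows-from-partition-" + partition);
        }
    }

    public static void main(String[] args) {
        // parallelStream()/parallel() runs write() from multiple threads.
        IntStream.range(0, 8).parallel().forEach(SerializedWrites::write);
        if (sharedOutput.size() != 8) {
            throw new AssertionError("lost writes: " + sharedOutput.size());
        }
        System.out.println("wrote " + sharedOutput.size() + " partitions");
    }
}
```

An alternative to locking, assuming the job allows it, is to give each thread a distinct output path so no two writers ever share a lease.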
> On Oct 23, 2015, at 8:25 AM, Anubhav Agarwal wrote:
>
> I have a Spark job that creates 6 million rows in RDDs. I convert the RDD
> into a DataFrame and write it to HDFS. Currently it takes 3 minutes to write
> it to HDFS.
> I am using Spark 1.5.1 with YARN.
> Here is the snippet:
> RDDList.parallelStream().forEach(mapJavaRDD -> {
>     if (mapJavaRDD != null) {