Spark UNEVENLY distributing data

Alchemist Sat, 19 May 2018 15:41:08 -0700

I am trying to parallelize a simple Spark program that processes HBase data.

    // Get HBase RDD
    JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = jsc
            .newAPIHadoopRDD(conf, TableInputFormat.class,
                    ImmutableBytesWritable.class, Result.class);
    long count = hBaseRDD.count();

I see only two lines in the logs: ZooKeeper starts and ZooKeeper stops.

The problem is that my program is as slow as the largest bar. I found that
ZooKeeper takes a long time before shutting down (about 11 minutes between
these two log lines):

18/05/19 17:26:55 INFO zookeeper.ClientCnxn: Session establishment complete on server :2181, sessionid = 0x163662b64eb046d, negotiated timeout = 40000
18/05/19 17:38:00 INFO zookeeper.ZooKeeper: Session: 0x163662b64eb046d closed
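
To put a number on "as slow as the largest bar", here is a minimal plain-Java sketch that computes a skew ratio (largest partition divided by the mean partition size) from per-partition record counts. The class name PartitionSkew, the method skewRatio, and the counts in main are all made up for illustration. They are not part of Spark or HBase and are not measurements from this job. A ratio near 1.0 would suggest an even spread, while a large ratio would suggest that one partition dominates the runtime.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Spark or HBase API.
class PartitionSkew {

    // Returns max(partition size) / mean(partition size).
    // 1.0 means perfectly even; larger values mean more skew.
    static double skewRatio(long[] recordsPerPartition) {
        if (recordsPerPartition.length == 0) {
            throw new IllegalArgumentException("no partitions");
        }
        long max = Arrays.stream(recordsPerPartition).max().getAsLong();
        double mean = Arrays.stream(recordsPerPartition).average().getAsDouble();
        // All partitions empty: treat as even rather than dividing by zero.
        return mean == 0.0 ? 1.0 : max / mean;
    }

    public static void main(String[] args) {
        // Assumed, made-up counts: three small partitions and one large one.
        long[] counts = {1_000, 1_200, 900, 50_000};
        System.out.printf("skew ratio: %.2f%n", skewRatio(counts));
    }
}
```

In a real job the counts would have to come from the RDD itself or from the task metrics in the Spark UI. This sketch only shows the arithmetic.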
