I am trying to write a simple Spark program that processes HBase data in
parallel.

// Get HBase RDD
JavaPairRDD<ImmutableBytesWritable, Result> hBaseRDD = jsc
    .newAPIHadoopRDD(conf, TableInputFormat.class,
        ImmutableBytesWritable.class, Result.class);
long count = hBaseRDD.count();

Only two
Hi,
w.r.t. ElementTrackingStore, since it is backed by KVStore, there should be
other classes which occupy significant memory.
Can you pastebin the top 10 entries from the heap dump?
Thanks
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
Does nobody have any idea?
Is filtering after aggregation in Structured Streaming supported, but possibly
buggy? See the following line in the example from the earlier mail:
...
.where(F.expr("distinct_username >= 2"))
...
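
To make the expected behaviour of that filter concrete, here is a minimal plain-Python sketch of the semantics only (no Spark involved). The grouping key, the sample events, and the function name are assumptions made up for illustration; only the `distinct_username >= 2` condition comes from the line above.

```python
from collections import defaultdict


def filter_after_aggregation(events, min_distinct=2):
    """Group (key, username) pairs, count distinct usernames per key,
    then keep only the keys whose count reaches the threshold."""
    users_by_key = defaultdict(set)
    for key, username in events:
        users_by_key[key].add(username)

    # Aggregation step: one entry per key with its distinct-username count.
    aggregated = {key: len(users) for key, users in users_by_key.items()}

    # Filter step, same intent as: .where(F.expr("distinct_username >= 2"))
    return {key: n for key, n in aggregated.items() if n >= min_distinct}


# Hypothetical sample data: (grouping key, username) pairs.
events = [
    ("10.0.0.1", "alice"),
    ("10.0.0.1", "bob"),
    ("10.0.0.2", "carol"),
    ("10.0.0.2", "carol"),
]
result = filter_after_aggregation(events)
```

Only the first key survives here, because it is the only one with at least two distinct usernames. That is the result I would expect the streaming query to produce for each group once the aggregate has been updated.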