val batchArr: Array[Float] = new Array(1000)
// fill data into batchArr
val rddBatch = sparkContext.parallelize(batchArr, 100)
rddBatch.cache()
rddBatch.first()  // trigger a job so the cached RDD is materialized
globalRddList.append(rddBatch)
}
```
Best regards,
maqy
Hi Jinxin,
Thanks for your suggestions, I will try to use foreachpartition later.
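For reference, a minimal sketch of the foreachPartition approach suggested here: each executor opens its own connection and streams its partition out, so the data never has to fit in the driver's heap (avoiding the collect() problem discussed in this thread). The host, port, and single-Float-column schema below are illustrative assumptions, not details from the thread:

```
import java.io.DataOutputStream
import java.net.Socket

// Hypothetical endpoint for the TensorFlow-side receiver (assumption).
val host = "tf-host"
val port = 9000

df.rdd.foreachPartition { iter =>
  // One connection per partition, opened on the executor, not the driver.
  val socket = new Socket(host, port)
  val out = new DataOutputStream(socket.getOutputStream)
  try {
    iter.foreach { row =>
      // Serialize each field however the receiver expects; here we assume
      // a single Float column for illustration.
      out.writeFloat(row.getFloat(0))
    }
    out.flush()
  } finally {
    socket.close()
  }
}
```

Because the rows are pushed from the executors, driver memory and spark.driver.maxResultSize are no longer the bottleneck; the tradeoff is that the receiver must accept multiple concurrent connections.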
Best regards,
maqy
From: Tang Jinxin
Sent: April 23, 2020, 7:31
To: maqy
Cc: Andrew Melo; user@spark.apache.org
Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?
Hi maqy,
Thanks for
, and after a few minutes, the shell will report
this error.
Best regards,
maqy
From: Tang Jinxin
Sent: April 22, 2020, 23:16
To: maqy
Cc: user@spark.apache.org
Subject: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available
Maybe the data transferred over the network (using collect()) is too large, and the deserialization seems to take some time.
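One knob worth checking when a large collect() fails (my assumption, not something diagnosed in this thread) is the driver-side result-size limit and heap, since collect() must hold the entire result on the driver. Values below are illustrative:

```
# spark-submit flags (values are illustrative)
--driver-memory 32g
--conf spark.driver.maxResultSize=0   # 0 = unlimited; the default is 1g
```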
Best wishes,
maqy
From: Andrew Melo
Sent: April 22, 2020, 21:02
To: maqy
Cc: Michael Artz; user@spark.apache.org
Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?
On Wed, Apr 22, 2020 at
Today I met the same problem using rdd.collect(); the RDD's element type is Tuple2[Int, Int], and the problem appears when the amount of data reaches about 100 GB.
I guess there may be something wrong with deserialization. Has anyone else encountered this problem?
Best regards,
maqy
I will traverse this Dataset to convert it to Arrow and send it to TensorFlow through a socket.
I tried using toLocalIterator() to traverse the Dataset instead of collecting it to the driver, but toLocalIterator() creates a lot of jobs and brings a lot of time overhead.
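For reference, a minimal sketch of the toLocalIterator() pattern described above (the variable names are illustrative). Each partition is fetched by a separate job, which explains both the many jobs observed and the low driver-memory footprint:

```
import scala.collection.JavaConverters._
import org.apache.spark.sql.Row

// Dataset.toLocalIterator returns a java.util.Iterator; each partition is
// fetched on demand by its own job, so the driver holds at most one
// partition in memory at a time.
val it: Iterator[Row] = df.toLocalIterator().asScala

it.foreach { row =>
  // convert each Row to Arrow and write it to the socket here
}
```

This is the opposite tradeoff to collect(): one job and one huge driver-side array versus many small jobs and bounded driver memory.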
Best regards,
maqy
driver and keep its data format?
Best regards,
maqy