In the driver, can I GC myArray after getting an RDD via sparkContext.parallelize(myArray, 100)?

2020-08-31 Thread maqy
```scala
val batchArr: Array[Float] = new Array(1000)
// fill data into tmpArr
val rddBatch = sparkContext.parallelize(batchArr, 100)
rddBatch.cache()
rddBatch.first()
globalRddList.append(rddBatch)
} // closes an enclosing loop whose opening is cut off in this preview
```
Best regards, maqy
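The crux here: parallelize() keeps a reference to the source array inside the resulting RDD's lineage, so the driver cannot GC the array while the RDD can still recompute from it. A minimal sketch of one workaround, assuming a live SparkContext `sc`; the checkpoint path and array size are illustrative, and whether the JVM actually reclaims the array afterward still depends on Spark-internal references, so this is a sketch, not a guarantee:

```scala
import org.apache.spark.SparkContext

val sc: SparkContext = ???              // provided by the application
sc.setCheckpointDir("/tmp/spark-checkpoints")  // illustrative path

var myArray: Array[Float] = Array.fill(1000)(0f)
val rddBatch = sc.parallelize(myArray, 100)
rddBatch.cache()
rddBatch.checkpoint()                   // marks the RDD; the next action writes it out
rddBatch.count()                        // materializes both the cache and the checkpoint
myArray = null                          // lineage is truncated; drop the driver-side reference
```

Checkpointing replaces the recompute-from-the-array lineage with the checkpointed data, which is what makes dropping the driver-side reference safe.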

Re: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-23 Thread maqy
Hi Jinxin, Thanks for your suggestions, I will try to use foreachPartition later. Best regards, maqy  From: Tang Jinxin  Sent: April 23, 2020 7:31  To: maqy  Cc: Andrew Melo; user@spark.apache.org  Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?  Hi maqy, Thanks for …
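A minimal sketch of the foreachPartition approach being suggested, assuming `ds` is the Dataset[Row] in question and a receiver (e.g., the TensorFlow side) listens on a placeholder host/port; the per-row serialization is a stub. Each executor streams its own partition directly, so nothing is collected on the driver:

```scala
import java.io.DataOutputStream
import java.net.Socket
import org.apache.spark.sql.{Dataset, Row}

val ds: Dataset[Row] = ???              // the dataset from the original question

ds.foreachPartition { (rows: Iterator[Row]) =>
  val socket = new Socket("receiver-host", 9999)   // placeholder endpoint
  val out = new DataOutputStream(socket.getOutputStream)
  try {
    rows.foreach { row =>
      out.writeInt(row.size)            // stub: serialize the row properly here (e.g., Arrow)
    }
    out.flush()
  } finally {
    socket.close()
  }
}
```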

Re: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread maqy
…, and after a few minutes, the shell will report this error. Best regards, maqy  From: Tang Jinxin  Sent: April 22, 2020 23:16  To: maqy  Cc: user@spark.apache.org  Subject: Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available  Maybe …

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread maqy
… network (using collect()) is too large, and the deserialization seems to take some time. Best wishes, maqy  From: Andrew Melo  Sent: April 22, 2020 21:02  To: maqy  Cc: Michael Artz; user@spark.apache.org  Subject: Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?  On Wed, Apr 22, 2020 at …
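For reference, two driver-side settings commonly involved when a single collect() ships this much data. The keys are standard Spark configuration; the sizes are placeholders, and spark.driver.memory generally has to be set before the driver JVM starts (e.g., via spark-submit), so it appears here only to name the knob:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("large-collect")
  .config("spark.driver.maxResultSize", "120g")  // default is 1g; collect() fails beyond it
  .config("spark.driver.memory", "64g")          // only effective if set before JVM launch
  .getOrCreate()
```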

Re: [Spark SQL] [Beginner] Dataset[Row] collect to driver throws java.io.EOFException: Premature EOF: no length prefix available

2020-04-22 Thread maqy
Today I met the same problem using rdd.collect(); the RDD's element type is Tuple2[Int, Int]. The problem appears when the amount of data reaches about 100 GB. I guess there may be something wrong with deserialization. Has anyone else encountered this problem? Best regards, maqy
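One way to sidestep a single huge collect() is to pull results one partition per job, which bounds how much the driver deserializes at a time (this is essentially what RDD.toLocalIterator does internally). A minimal sketch, assuming an RDD[(Int, Int)] `rdd` on a live SparkContext `sc`:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

val sc: SparkContext = ???
val rdd: RDD[(Int, Int)] = ???

rdd.cache()
rdd.count()                              // materialize once so per-partition jobs are cheap
rdd.partitions.indices.foreach { i =>
  val part: Array[(Int, Int)] =
    sc.runJob(rdd, (it: Iterator[(Int, Int)]) => it.toArray, Seq(i)).head
  // process `part`, then let it become garbage before fetching the next one
}
```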

Re: Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread maqy
I will traverse this Dataset to convert it to Arrow and send it to TensorFlow through a socket. I tried toLocalIterator() to traverse the dataset instead of collecting it to the driver, but toLocalIterator() will create a lot of jobs and bring a lot of time consumption. Best regards, maqy
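toLocalIterator() runs one Spark job per partition, which is where the many jobs come from. A minimal sketch of reducing the job count by caching and coalescing first, assuming `ds` is the Dataset[Row] in question; the partition count 16 is arbitrary, and fewer partitions means bigger batches held on the driver:

```scala
import org.apache.spark.sql.{Dataset, Row}

val ds: Dataset[Row] = ???

val batched = ds.coalesce(16).cache()    // fewer partitions => fewer per-partition jobs
val it: java.util.Iterator[Row] = batched.toLocalIterator()
while (it.hasNext) {
  val row = it.next()
  // convert `row` to Arrow and push it to TensorFlow here
}
```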

Can I collect Dataset[Row] to driver without converting it to Array[Row]?

2020-04-22 Thread maqy
… driver and keep its data format? Best regards, maqy