It seems you want to dedupe your data after the merge, so set(a + b) should
also work; you can ditch the list comprehension.
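
For illustration, a minimal sketch of that suggestion (the sample lists are
made up, and it assumes the list elements are hashable, e.g. tuples; plain
dicts are not hashable and would need freezing first):

    a = [("x", 1), ("y", 2)]
    b = [("y", 2), ("z", 3)]

    merged = list(set(a + b))  # concatenate, then dedupe in one pass
    print(sorted(merged))      # [('x', 1), ('y', 2), ('z', 3)]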
On 5 Aug 2015 23:55, "gen tang" wrote:
Hi,
Thanks a lot for your reply.
It seems that the problem comes from the slowness of the second piece of code.
I rewrote the code as list(set([i.items for i in a] + [i.items for i in b]))
and the program returns to normal.
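
As a hedged sketch of that dedupe-by-items idea (the sample data is made up;
note that dict.items() returns an unhashable list in Python 2 and a view in
Python 3, so each dict is frozen into a sorted tuple of pairs first):

    a = [{"id": 1}, {"id": 2}]
    b = [{"id": 2}, {"id": 3}]

    def freeze(d):
        # Turn a dict into a hashable, order-independent key.
        return tuple(sorted(d.items()))

    # Dedupe on the frozen form, then thaw back to dicts.
    unique = [dict(t) for t in {freeze(d) for d in a + b}]
    print(sorted(unique, key=lambda d: d["id"]))  # [{'id': 1}, {'id': 2}, {'id': 3}]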
By the way, I find that while the computation is running, the UI shows
scheduler delay. However, I
On Mon, Aug 3, 2015 at 9:00 AM, gen tang wrote:
Hi,
Recently, I met some problems with scheduler delay in pyspark. I worked on
this problem for several days without success, so I am coming here to ask
for help.
I have a key-value pair rdd like rdd[(key, list[dict])] and I tried to
merge the values by "adding" the two lists.
If I do reduceByKey as fo
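
A minimal sketch of the setup described above (the keys, dict contents, app
name, and "local[2]" master are assumptions for illustration):

    from pyspark import SparkContext

    sc = SparkContext("local[2]", "merge-list-values")

    rdd = sc.parallelize([
        ("k1", [{"a": 1}]),
        ("k1", [{"a": 2}]),
        ("k2", [{"b": 3}]),
    ])

    # "Adding" two lists concatenates them, so each key ends up with one
    # combined list of dicts.
    merged = rdd.reduceByKey(lambda xs, ys: xs + ys)
    print(merged.collect())
    # e.g. [('k1', [{'a': 1}, {'a': 2}]), ('k2', [{'b': 3}])] (order not guaranteed)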