from:"孫澤恩"

Re: How to merge fragmented IDs into one cluster if one/more IDs are shared

2017-10-05 Thread 孫澤恩

Hi there, About GraphX: I think graph processing parses your data into (VertexA) - [Edge1] - (VertexB). As we can see, the Graph class of GraphX contains edges and vertices, so the first line of your data would be parsed into uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_3 as vertices.
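To make the vertex/edge idea concrete, here is a minimal plain-Python sketch (not GraphX itself) of how rows of IDs could be turned into vertices and edges and then merged into clusters. It assumes each row is a comma-separated list of IDs; the helper names row_to_edges and connected_components, and every sample row except the first, are made up for illustration. It uses a union-find structure in place of a distributed connected-components job.

```python
from collections import defaultdict


def row_to_edges(row):
    """Split one comma-separated row into (vertices, edges).

    Assumption: every ID that appears in the same row belongs together,
    so linking each ID to the first one is enough to connect the row.
    """
    ids = [part.strip() for part in row.split(",") if part.strip()]
    vertices = sorted(set(ids))  # duplicates such as uuid_3_3 collapse here
    if not vertices:
        return [], []
    anchor = vertices[0]
    edges = [(anchor, other) for other in vertices[1:]]
    return vertices, edges


def connected_components(rows):
    """Group IDs into clusters: rows that share any ID end up in one cluster."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller ID as the root so results are deterministic.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    for row in rows:
        vertices, edges = row_to_edges(row)
        for vertex in vertices:
            find(vertex)
        for a, b in edges:
            union(a, b)

    clusters = defaultdict(set)
    for vertex in list(parent):
        clusters[find(vertex)].add(vertex)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sample rows; only the first one comes from the thread.
sample_rows = [
    "uuid_3_1,uuid_3_2,uuid_3_3,uuid_3_3",
    "uuid_3_3,uuid_4_1",
    "uuid_5_1",
]
clusters = connected_components(sample_rows)
```

In this sketch the first two rows share uuid_3_3, so they fall into one cluster, while uuid_5_1 stays alone. On a real cluster the same edges would presumably be fed to a graph library's connected-components routine rather than a single-machine union-find.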

How to read LZO file in Spark?

2017-09-27 Thread 孫澤恩

Hi All, Currently, I am following this blog post, http://blog.cloudera.com/blog/2009/11/hadoop-at-twitter-part-1-splittable-lzo-compression/, so I can use hdfs dfs -text to read the LZO file. But I want to

Re: partitionBy causing OOM

2017-09-25 Thread 孫澤恩

Hi Amit, Maybe you can change the spark.sql.shuffle.partitions configuration. The default is 200; changing this property changes the number of tasks when you are using the DataFrame API. > On 26 Sep 2017, at 1:25 AM, Amit Sela wrote: > > I'm trying to run a simple pyspark
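As a rough illustration of the suggestion above, the property can be set on an existing session through its runtime configuration. The helper name set_shuffle_partitions and the value 400 are assumptions for the sketch, not something from the thread; the right number depends on the data size and executor memory.

```python
def set_shuffle_partitions(spark, num_partitions):
    """Set spark.sql.shuffle.partitions on a SparkSession-like object.

    `spark` is expected to expose `conf.set(key, value)`, as a
    pyspark.sql.SparkSession does. `num_partitions` is a hypothetical
    tuning value chosen by the caller.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be a positive integer")
    # More shuffle partitions means more, smaller tasks per shuffle stage.
    spark.conf.set("spark.sql.shuffle.partitions", str(num_partitions))


# Usage with a real session (requires pyspark, so it is left commented out):
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()
# set_shuffle_partitions(spark, 400)
```

The same property can also be passed at submit time with --conf spark.sql.shuffle.partitions=400, which avoids changing the job code.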