Re: [GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Weiwei Zhang
...ighted edges.
>
> From: Weiwei Zhang <wzhan...@dons.usfca.edu>
> Sent: Thursday, December 1, 2016 2:41 PM
> Subject: [GraphFrame, Pyspark] Weighted Edge in PageRank
> To: user <user@spark.apache.org>
>
> Hi guys,
>
> I am trying to

[GraphFrame, Pyspark] Weighted Edge in PageRank

2016-12-01 Thread Weiwei Zhang
Hi guys, I am trying to compute the PageRank for the locations in the following dummy DataFrame:

src  des  shared_gas_stations
A    B    2
A    C    10
C    E    3
D    E    12
E    G    5
...

I have tried the
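If the built-in GraphFrames PageRank does not take an edge-weight column into account (an assumption worth checking against the GraphFrames documentation for your version), one workaround is to compute a weighted PageRank directly. The sketch below is plain Python, not the GraphFrames API. It assumes edges are directed `src -> des`, treats `shared_gas_stations` as a transition weight normalized by each source's total out-weight, and uses only the visible rows of the dummy data. The function name `weighted_pagerank` is hypothetical.

```python
# Minimal weighted PageRank via power iteration (plain Python, no Spark).
# Assumptions: edges are directed src -> des, and each edge weight
# (shared_gas_stations) is normalized by the source's total out-weight.
from collections import defaultdict


def weighted_pagerank(edges, damping=0.85, max_iter=100, tol=1e-10):
    """Return {vertex: rank}; edges is an iterable of (src, dst, weight > 0)."""
    edges = list(edges)
    nodes = set()
    out_weight = defaultdict(float)
    for src, dst, w in edges:
        nodes.update((src, dst))
        out_weight[src] += w
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by dangling vertices (no out-edges) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        base = (1 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        # Each source passes rank to its targets in proportion to edge weight.
        for src, dst, w in edges:
            new_rank[dst] += damping * rank[src] * w / out_weight[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Visible rows of the dummy data: (src, des, shared_gas_stations).
edges = [("A", "B", 2), ("A", "C", 10), ("C", "E", 3), ("D", "E", 12), ("E", "G", 5)]
ranks = weighted_pagerank(edges)
for vertex, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(vertex, round(score, 4))
```

B and C both receive rank only from A, so C ends up above B purely because its edge weight (10) is larger than B's (2); an unweighted PageRank would give them equal rank.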

Configure Spark Resource on AWS CLI Not Working

2016-02-26 Thread Weiwei Zhang
Hi there, I am trying to configure memory for Spark using the AWS CLI. However, I got the following message: *A client error (ValidationException) occurred when calling the RunJobFlow operation: Cannot specify args for application 'Spark' when release label is used.* In the aws 'create-cluster'
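The error says that, once a release label is used, Spark cannot be given per-application args. On release-label EMR clusters, Spark properties are usually supplied as configuration classifications instead. The sketch below assumes the `spark-defaults` classification and uses placeholder memory values; `spark_memory_configurations` is a hypothetical helper, not part of any AWS SDK.

```python
# Build the JSON for `aws emr create-cluster --configurations`.
# Assumption: on release-label clusters, Spark properties go in the
# "spark-defaults" classification instead of per-application args.
import json


def spark_memory_configurations(executor_memory, driver_memory):
    """Hypothetical helper returning a one-classification configuration list."""
    return [
        {
            "Classification": "spark-defaults",
            "Properties": {
                "spark.executor.memory": executor_memory,
                "spark.driver.memory": driver_memory,
            },
        }
    ]


# Placeholder values; size them for the actual instance types.
config_json = json.dumps(spark_memory_configurations("4g", "2g"), indent=2)
print(config_json)
```

After saving the printed JSON to a file (for example a hypothetical `spark-config.json`), it can be passed as `aws emr create-cluster --release-label ... --applications Name=Spark --configurations file://./spark-config.json` together with the cluster's other options.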

Re: Behind the scene of RDD to DataFrame

2016-02-21 Thread Weiwei Zhang
> ... conversions (from Scala types to Catalyst types) are involved, but no
> shuffling.
>
> Hemant Bhanawat <https://www.linkedin.com/in/hemant-bhanawat-92a3811>
> www.snappydata.io
>
> On Sun, Feb 21, 2016 at 11:48 AM, Weiwei Zhang <wzhan...@dons.usfca.edu>
> wrote:
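The reply's point, that `toDF()` converts values record by record without shuffling, can be pictured with a toy model. This is plain Python, not Spark's implementation: an "RDD" is modeled as a list of partitions, and the hypothetical `to_internal` function stands in for the Scala-to-Catalyst type conversion. Each output partition is built only from its matching input partition, which is what makes the step narrow. One hedged caveat for PySpark: without an explicit schema, `toDF()` may inspect some rows to infer types, which can run a small job, though still not a shuffle.

```python
# Toy model (plain Python, not Spark) of a per-partition conversion.
# An "RDD" here is just a list of partitions, each a list of records.


def convert_partitions(partitions, convert):
    # Each output partition is built only from the matching input partition,
    # so no record moves between partitions: a narrow step, no shuffle.
    return [[convert(record) for record in part] for part in partitions]


def to_internal(record):
    # Hypothetical stand-in for the Scala-to-Catalyst type conversion.
    name, count = record
    return (str(name), int(count))


rdd = [[("A", "2"), ("B", "10")], [("C", "3")]]
df_partitions = convert_partitions(rdd, to_internal)
print(df_partitions)  # [[('A', 2), ('B', 10)], [('C', 3)]]
```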

Behind the scene of RDD to DataFrame

2016-02-20 Thread Weiwei Zhang
Hi there, could someone explain what happens behind the scenes of rdd.toDF()? More importantly, will this step involve a lot of shuffles and cause a surge in the size of intermediate files? Thank you. Best regards, Vivian

Pyspark SQL Join Failure

2015-12-19 Thread Weiwei Zhang
Hi all, I got this error when I tried to use the 'join' function to left outer join two DataFrames in PySpark 1.4.1. Please point out where I made mistakes. Thank you.
Traceback (most recent call last):
  File "/Users/wz/PycharmProjects/PysparkTraining/Airbnb/src/driver.py",
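The traceback is cut off, so the actual error can't be diagnosed from this excerpt. For reference, in the PySpark 1.4 DataFrame API a left outer join can be written positionally as `left.join(right, left["id"] == right["id"], "left_outer")`. The sketch below is plain Python, not Spark; it only spells out the expected left-outer-join semantics so a result can be sanity-checked. `left_outer_join`, `listings`, `reviews`, and their columns are hypothetical.

```python
# Left outer join semantics on plain Python dicts (no Spark).
from collections import defaultdict


def left_outer_join(left, right, key):
    """Keep every left row; fill right-side columns with None when unmatched.

    As in SQL, None keys never match. If a non-key column name appears on
    both sides, the right value wins in this sketch.
    """
    index = defaultdict(list)
    for right_row in right:
        if right_row[key] is not None:
            index[right_row[key]].append(right_row)
    right_cols = sorted({c for row in right for c in row if c != key})
    joined = []
    for left_row in left:
        k = left_row[key]
        matches = index.get(k, []) if k is not None else []
        if not matches:
            joined.append({**left_row, **{c: None for c in right_cols}})
        for right_row in matches:
            joined.append({**left_row, **{c: right_row.get(c) for c in right_cols}})
    return joined


# Hypothetical tables.
listings = [{"id": 1, "city": "SF"}, {"id": 2, "city": "LA"}]
reviews = [{"id": 1, "score": 5}, {"id": 1, "score": 4}]
joined_rows = left_outer_join(listings, reviews, "id")
for row in joined_rows:
    print(row)
```

Unlike this sketch, a Spark join on an expression typically keeps both sides' columns, including duplicate names such as `id`, which can make later column references ambiguous.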

Is Feature Transformations supported by Spark export to PMML

2015-10-15 Thread Weiwei Zhang
Hi folks, I am trying to find out whether Spark's PMML export supports feature transformations. I know that in R, I need to specify local transformations and attributes using the "pmml" and "pmmlTransformations" libraries. The example I read for Spark simply applies the "toPMML" function and it
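To make "local transformations" concrete: PMML can describe a preprocessing step as a `DerivedField` inside a `LocalTransformations` element, for example a min-max normalization expressed with `NormContinuous` and two `LinearNorm` anchor points. The sketch below builds such a fragment with Python's standard `xml.etree` module. The field name `price`, the range 0 to 500, and the helper `min_max_derived_field` are hypothetical, and the sketch makes no claim about what Spark's `toPMML` does or does not emit.

```python
# Build a minimal PMML LocalTransformations fragment with the stdlib.
import xml.etree.ElementTree as ET


def min_max_derived_field(field, lo, hi):
    """Hypothetical helper: DerivedField mapping [lo, hi] linearly onto [0, 1]."""
    derived = ET.Element(
        "DerivedField", name=f"{field}_norm", optype="continuous", dataType="double"
    )
    norm = ET.SubElement(derived, "NormContinuous", field=field)
    # Two LinearNorm anchors define the piecewise-linear mapping.
    ET.SubElement(norm, "LinearNorm", orig=str(lo), norm="0")
    ET.SubElement(norm, "LinearNorm", orig=str(hi), norm="1")
    return derived


local = ET.Element("LocalTransformations")
local.append(min_max_derived_field("price", 0, 500))
xml_text = ET.tostring(local, encoding="unicode")
print(xml_text)
```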