Re: Use derived column for other derived column in the same statement

2019-04-21 Thread Shraddha Shah
Also, the same question applies to a groupby agg operation: how can we use one aggregated result (say min(amount)) to derive another aggregated column?

On Sun, Apr 21, 2019 at 11:24 PM Rishi Shah wrote:
> Hello All,
>
> How can we use a derived column1 for deriving another column in the same
> dataframe

Use derived column for other derived column in the same statement

2019-04-21 Thread Rishi Shah
Hello All,

How can we use a derived column1 for deriving another column in the same dataframe operation statement? Something like:

df = df.withColumn('derived1', lit('something')).withColumn('derived2', col('derived1') == 'something')

--
Regards,
Rishi Shah

Usage of Explicit Future in Spark program

2019-04-21 Thread Chetan Khatri
Hello Spark Users,

Someone has suggested breaking 5-5 unpredictable transformation blocks into Future[ONE STRING ARGUMENT], claiming this can tune the performance. I am wondering whether this is an appropriate use of an explicit Future in Spark. Sample code is below:

def writeData( tableName: String):

Re: Writing to Aerospike from Spark with bulk write with user authentication fails

2019-04-21 Thread Mich Talebzadeh
Just as an add-on, I see this in the Aerospike log:

Apr 21 2019 17:52:24 GMT: INFO (security): (security.c:5483) permitted | client: 50.140.197.220:33466 | authenticated user: mich | action: login | detail: user=mich
*Apr 21 2019 17:52:25 GMT: INFO (security): (security.c:5483) not authenticated |

Writing to Aerospike from Spark with bulk write with user authentication fails

2019-04-21 Thread Mich Talebzadeh
Has anyone worked with user authentication on the Aerospike Enterprise version? As far as I know, one can create a client with Aerospike authentication as follows, which works for a single put:

import com.aerospike.spark.sql._
import com.aerospike.client.Bin
import com.aerospike.client.Key
import

Re: Spark job running for long time

2019-04-21 Thread rajat kumar
Hi Yeikel,

I cannot copy anything from the system, but I have seen the explain output. It was doing a sortMergeJoin for all tables. There are 10 tables, all of them joined with left outer joins. Of the 10 tables, one is 50MB and a second is 200MB; the rest are big tables. Also the data is in