Re: how to use lit() in spark-java

2018-03-23 Thread Anil Langote
You have to import functions: dataset.withColumn(columnName, functions.lit("constant")). Thank you, Anil Langote. Sent from my iPhone. From: 崔苗 <cuim...@danale.com> Sent: Friday, March 23, 2018 8:33 AM Subject: how to use lit() in spark-java To: <user@spa
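A minimal Scala sketch of the same org.apache.spark.sql.functions.lit call the reply shows for the Java API; the Dataset and column names are hypothetical and it assumes a spark-shell session where `spark` is the SparkSession:

    import org.apache.spark.sql.functions.lit
    import spark.implicits._

    // Hypothetical two-column Dataset; lit() wraps a constant value so it can be used as a Column.
    val ds = Seq(("a", 1), ("b", 2)).toDF("key", "value")
    val withConst = ds.withColumn("source", lit("constant"))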

Re: Spark Inner Join on pivoted datasets results in an empty dataset

2017-10-19 Thread Anil Langote
Is there any limit on the number of columns used in an inner join? Thank you, Anil Langote. Sent from my iPhone. From: Anil Langote <anillangote0...@gmail.com> Sent: Thursday, October 19, 2017 5:01 PM Subject: Spark Inner Joi

Spark Inner Join on pivoted datasets results in an empty dataset

2017-10-19 Thread Anil Langote
0 records. Is there anything I am missing here? Is there any better way to pivot the multiple columns? I cannot combine them, because my aggregation columns are arrays of doubles. The pivot1 and pivot2 datasets are derived from the same parent dataset and the group-by columns are the same; all I am doing is an inner join of these two datasets on the same group-by columns, so why doesn't it work? Thank you, Anil Langote
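A minimal sketch of the pattern being described, with hypothetical data and column names (assuming a spark-shell session). Joining on a Seq of key-column names keeps a single copy of the keys and avoids ambiguous-column problems; it is also worth checking that the key columns contain no nulls, since rows with null join keys never match in an inner join:

    import org.apache.spark.sql.functions.sum
    import spark.implicits._

    // Hypothetical parent dataset with two group-by keys and two pivot columns.
    val parent = Seq(
      ("g1", "x", "p", "u", 1.0),
      ("g1", "x", "q", "v", 2.0),
      ("g2", "y", "p", "u", 3.0)
    ).toDF("key1", "key2", "pivotA", "pivotB", "value")

    val pivot1 = parent.groupBy("key1", "key2").pivot("pivotA").agg(sum("value"))
    val pivot2 = parent.groupBy("key1", "key2").pivot("pivotB").agg(sum("value"))

    // Inner join on the shared group-by columns.
    val joined = pivot1.join(pivot2, Seq("key1", "key2"), "inner")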

Issue with caching

2017-01-27 Thread Anil Langote
with the same configuration it takes 40 mins; why is this happening? Best Regards, Anil Langote +1-425-633-9747
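Since the thread is about caching, a minimal sketch of explicit caching plus an action to materialize it; the dataset, names, and storage level are assumptions for illustration:

    import org.apache.spark.storage.StorageLevel

    val ds = spark.range(0, 1000000).toDF("id")

    val cached = ds.persist(StorageLevel.MEMORY_AND_DISK)
    cached.count()   // an action is needed once so the cache is actually populated;
                     // later actions on `cached` then reuse it instead of recomputing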

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread Anil Langote
you Anil Langote +1-425-633-9747 From: ayan guha <guha.a...@gmail.com> Date: Sunday, January 8, 2017 at 10:26 PM To: Anil Langote <anillangote0...@gmail.com> Cc: Holden Karau <hol...@pigscanfly.ca>, user <user@spark.apache.org> Subject: Re: Efficient look up in K

Re: Efficient look up in Key Pair RDD

2017-01-08 Thread Anil Langote
use case. Best Regards, Anil Langote +1-425-633-9747 > On Jan 8, 2017, at 8:17 PM, Holden Karau <hol...@pigscanfly.ca> wrote: > > To start with, caching and having a known partitioner will help a bit; then > there is also the IndexedRDD project, but in general Spark might not be t
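A minimal sketch of the caching-plus-known-partitioner advice from the quoted reply; the pair RDD, key type, and partition count are assumptions, and it assumes `sc` is the SparkContext from a spark-shell session:

    import org.apache.spark.HashPartitioner

    // Hypothetical pair RDD keyed by String.
    val pairs = sc.parallelize(Seq("a" -> Array(1.0, 2.0), "b" -> Array(3.0, 4.0)))

    // With a known partitioner, lookup() only scans the single partition that can
    // hold the key; persisting avoids recomputing the RDD on every repeated lookup.
    val indexed = pairs.partitionBy(new HashPartitioner(8)).persist()
    indexed.count()                 // materialize the cache once

    val hits: Seq[Array[Double]] = indexed.lookup("a")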

Efficient look up in Key Pair RDD

2017-01-08 Thread Anil Langote
by a given key? Thank you Anil Langote

Spark Aggregator for array of doubles

2017-01-04 Thread Anil Langote
can be done in Scala only. How can I define an Aggregator which takes an array of doubles as input? Note that I have a Parquet file as my input. Any pointers are highly appreciated; I read that a Spark UDAF is slow and Aggregators are the way to go. Best Regards, Anil Langote +1-425-633-9747
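A minimal Scala sketch of an Aggregator that sums an array-of-doubles column element-wise; the Kryo encoders are just a short way to make the sketch compile and are an assumption, not a recommendation:

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    object ArraySum extends Aggregator[Array[Double], Array[Double], Array[Double]] {
      // Empty buffer; the first reduce call adopts the length of the incoming row.
      def zero: Array[Double] = Array.empty[Double]

      def reduce(buf: Array[Double], row: Array[Double]): Array[Double] = merge(buf, row)

      def merge(b1: Array[Double], b2: Array[Double]): Array[Double] =
        if (b1.isEmpty) b2
        else if (b2.isEmpty) b1
        else b1.zip(b2).map { case (x, y) => x + y }

      def finish(reduction: Array[Double]): Array[Double] = reduction

      def bufferEncoder: Encoder[Array[Double]] = Encoders.kryo[Array[Double]]
      def outputEncoder: Encoder[Array[Double]] = Encoders.kryo[Array[Double]]
    }

On a typed Dataset such as a hypothetical Dataset[(String, Array[Double])] read from the Parquet file, it could be applied with groupByKey(_._1).mapValues(_._2).agg(ArraySum.toColumn).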

Re: Parquet with group by queries

2016-12-21 Thread Anil Langote
count(*), col1, col2, col3, aggregationFunction(doublecol) from table group by col1, col2, col3 having count(*) > 1. The above query's group-by columns will change; similarly, I have to run 100 queries on the same data set. Best Regards, Anil Langote +1-425-633-9747 > On Dec 21, 2016, at 11:41 AM
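A minimal sketch of running one such query against a Parquet source; the path, view name, and the choice of sum() as the aggregate are assumptions. When the same file backs all 100 query variants, caching the DataFrame once before running them avoids re-reading the Parquet data each time:

    val df = spark.read.parquet("/path/to/input.parquet")   // hypothetical path
    df.cache()                                               // reused by all query variants
    df.createOrReplaceTempView("mytable")

    val result = spark.sql(
      """SELECT count(*), col1, col2, col3, sum(doublecol)
        |FROM mytable
        |GROUP BY col1, col2, col3
        |HAVING count(*) > 1""".stripMargin)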

Parquet with group by queries

2016-12-21 Thread Anil Langote
in this regard is appreciated. Best Regards, Anil Langote +1-425-633-9747

Re: DataSet is not able to handle 50,000 columns to sum

2016-11-11 Thread Anil Langote
are suggesting. Best Regards, Anil Langote +1-425-633-9747 > On Nov 11, 2016, at 7:10 PM, ayan guha <guha.a...@gmail.com> wrote: > > You can explore grouping sets in SQL and write an aggregate function to add > an array-wise sum. > > It will boil down to something like
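A minimal sketch of the GROUPING SETS idea from the quoted reply; the table, column names, and plain sum() aggregate are assumptions (in the thread's real case an array-wise aggregate would replace sum()):

    import spark.implicits._

    // Hypothetical table; GROUPING SETS computes several group-bys in a single pass.
    val df = Seq(("a", "x", 1.0), ("a", "y", 2.0), ("b", "x", 3.0)).toDF("col1", "col2", "doublecol")
    df.createOrReplaceTempView("t")

    val rolled = spark.sql(
      """SELECT col1, col2, sum(doublecol) AS total
        |FROM t
        |GROUP BY col1, col2
        |GROUPING SETS ((col1, col2), (col1), ())""".stripMargin)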

DataSet is not able to handle 50,000 columns to sum

2016-11-11 Thread Anil Langote
sults against the keys. The same process will be repeated for the next combinations. Thank you, Anil Langote +1-425-633-9747

Running YARN with Spark not working with Java 8

2016-08-25 Thread Anil Langote
Hi All, I have a cluster with 1 master and 6 slaves which uses the pre-built version of Hadoop 2.6.0 and Spark 1.6.2. I was running Hadoop MR and Spark jobs without any problem with OpenJDK 7 installed on all the nodes. However, when I upgraded OpenJDK 7 to OpenJDK 8 on all nodes, spark submit and
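The message is cut off here, but a frequently reported symptom when moving a YARN cluster to Java 8 is containers being killed for exceeding virtual-memory limits, because Java 8 reserves more virtual memory than Java 7. Whether that is the cause in this thread is only an assumption; the usual workaround is a yarn-site.xml change on every NodeManager (followed by a restart), for example:

    <!-- Assumption: relax or disable YARN's virtual-memory check. -->
    <property>
      <name>yarn.nodemanager.vmem-check-enabled</name>
      <value>false</value>
    </property>
    <property>
      <name>yarn.nodemanager.vmem-pmem-ratio</name>
      <value>4</value>
    </property>

Raising spark.yarn.executor.memoryOverhead on spark-submit is another commonly used knob for the same symptom.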

Append is not working with data frame

2016-04-20 Thread Anil Langote
days of data. Thank you, Anil Langote > On Apr 20, 2016, at 1:12 PM, Wei Chen <wei.chen.ri...@gmail.com> wrote: > > Found it. In case someone else is looking for this: > cvModel.bestModel.asInstanceOf[org.apache.spark.ml.classification.LogisticRegressionModel].weights >
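A minimal sketch of an append-mode DataFrame write of the kind the subject line refers to; the data, partition column, and output path are hypothetical, and it assumes a spark-shell session:

    import org.apache.spark.sql.SaveMode
    import spark.implicits._

    // Hypothetical one-day slice to add to an existing Parquet dataset.
    val newDay = Seq(("2016-04-20", 1.0), ("2016-04-20", 2.5)).toDF("date", "value")

    newDay.write
      .mode(SaveMode.Append)          // add to what is already there instead of overwriting
      .partitionBy("date")
      .parquet("/path/to/daily_data") // hypothetical output path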