You have to import org.apache.spark.sql.functions and then use:
dataset.withColumn(columnName, functions.lit("constant"))
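A fuller sketch (assuming a SparkSession named spark; the input path and the column name "source" are made up for illustration):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;

// Hypothetical input; any Dataset<Row> works the same way.
Dataset<Row> dataset = spark.read().parquet("/path/to/input");

// lit() wraps a literal value in a Column, which is what withColumn() expects.
Dataset<Row> withConstant = dataset.withColumn("source", functions.lit("constant"));
withConstant.show();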
Thank you
Anil Langote
Sent from my iPhone
_
From: 崔苗
Sent: Friday, March 23, 2018 8:33 AM
Subject: how to use lit() in spark-java
To:
Hi guys,
I want to add a constant column to a dataset; how can I use lit() in Java?
Is there any limit on the number of columns used in an inner join?
Thank you
Anil Langote
Sent from my iPhone
_
From: Anil Langote <anillangote0...@gmail.com>
Sent: Thursday, October 19, 2017 5:01 PM
Subject: Spark Inner Join on pivoted datasets results empty dataset
100 records.
Is there anything I am missing here? Is there any better way to pivot
multiple columns? I cannot combine them, because my aggregation columns are
arrays of doubles.
The pivot1 & pivot2 datasets are derived from the same parent dataset and the
group-by columns are the same; all I am doing is an inner join on these two
datasets with the same group-by columns, so why doesn't it work?
Thank you
Anil Langote
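For reference, a minimal sketch of the pattern described above (Spark 2.x Java API; the parent dataset and all column names are hypothetical). Joining with a using-columns list keeps a single copy of the shared key columns, which avoids one common source of confusing join results:

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.functions;
import scala.collection.JavaConverters;
import scala.collection.Seq;

// Two pivots derived from the same parent, grouped by the same key columns.
Dataset<Row> pivot1 = parent.groupBy("k1", "k2").pivot("cat1").agg(functions.first("val"));
Dataset<Row> pivot2 = parent.groupBy("k1", "k2").pivot("cat2").agg(functions.first("val"));

// Inner join on the shared group-by columns, passed as a Scala Seq.
Seq<String> keys =
    JavaConverters.asScalaBufferConverter(Arrays.asList("k1", "k2")).asScala();
Dataset<Row> joined = pivot1.join(pivot2, keys, "inner");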
with the same configuration it takes 40 mins. Why is this happening?
Best Regards,
Anil Langote
+1-425-633-9747
Thank you
Anil Langote
+1-425-633-9747
From: ayan guha
Date: Sunday, January 8, 2017 at 10:26 PM
To: Anil Langote
Cc: Holden Karau, user
Subject: Re: Efficient look up in Key Pair RDD
Have you tried something like GROUPING SETS? That seems to be exactly what
you are looking for.
use case.
Best Regards,
Anil Langote
+1-425-633-9747
> On Jan 8, 2017, at 8:17 PM, Holden Karau wrote:
>
> To start with, caching and having a known partitioner will help a bit; then
> there is also the IndexedRDD project, but in general Spark might not be the
> best tool for the job
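A minimal sketch of the caching / known-partitioner part of that advice (Java API; assumes a JavaSparkContext named sc, with made-up data):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.HashPartitioner;
import org.apache.spark.api.java.JavaPairRDD;
import scala.Tuple2;

JavaPairRDD<String, Double> pairs = sc.parallelizePairs(Arrays.asList(
    new Tuple2<>("a", 1.0), new Tuple2<>("b", 2.0), new Tuple2<>("a", 3.0)));

// With a known partitioner and a cached RDD, lookup() only scans the one
// partition that can hold the key instead of the whole RDD.
JavaPairRDD<String, Double> indexed =
    pairs.partitionBy(new HashPartitioner(8)).cache();

List<Double> valuesForA = indexed.lookup("a");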
given key?
Thank you
Anil Langote
or in Java, or can it be done in Scala only? How can I define the aggregator
which takes an array of doubles as input? Note that I have a parquet file as
my input.
Any pointers are highly appreciated; I read that Spark UDAFs are slow and
Aggregators are the way to go.
Best Regards,
Anil Langote
+1-425-633-9747
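For what it's worth, here is a minimal sketch of such a typed Aggregator in Java (Spark 2.x API; the class name is made up, and Kryo encoders are used only because there is no built-in encoder for double[]):

import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.expressions.Aggregator;

// Element-wise sum of double[] values; the empty array is the neutral element.
public class ArraySum extends Aggregator<double[], double[], double[]> {

  @Override
  public double[] zero() {
    return new double[0];
  }

  @Override
  public double[] reduce(double[] buffer, double[] input) {
    return add(buffer, input);
  }

  @Override
  public double[] merge(double[] b1, double[] b2) {
    return add(b1, b2);
  }

  @Override
  public double[] finish(double[] reduction) {
    return reduction;
  }

  @Override
  public Encoder<double[]> bufferEncoder() {
    return Encoders.kryo(double[].class);
  }

  @Override
  public Encoder<double[]> outputEncoder() {
    return Encoders.kryo(double[].class);
  }

  // Element-wise addition; the longer array's tail is copied through.
  private static double[] add(double[] a, double[] b) {
    double[] longer = a.length >= b.length ? a : b;
    double[] shorter = a.length >= b.length ? b : a;
    double[] out = longer.clone();
    for (int i = 0; i < shorter.length; i++) {
      out[i] += shorter[i];
    }
    return out;
  }
}

It can then be applied with something like ds.select(new ArraySum().toColumn()) on a Dataset<double[]>.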
select count(*), col1, col2, col3, aggregationFunction(doublecol) from table
group by col1, col2, col3 having count(*) > 1
The above query's group-by columns will change; similarly, I have to run 100
queries on the same data set.
Best Regards,
Anil Langote
+1-425-633-9747
> On Dec 21, 2016, at 11:41 AM
in this regard is appreciated.
Best Regards,
Anil Langote
+1-425-633-9747
are suggesting.
Best Regards,
Anil Langote
+1-425-633-9747
> On Nov 11, 2016, at 7:10 PM, ayan guha wrote:
>
> You can explore grouping sets in SQL and write an aggregate function to do an
> array-wise sum.
>
> It will boil down to something like
>
> Select attr1,attr2.
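A minimal sketch of the GROUPING SETS idea (assuming the data is registered as a temp view named t; the view and column names are hypothetical):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// One query covers several group-by combinations at once; columns that are
// not part of a given grouping set come back as NULL in its rows.
Dataset<Row> result = spark.sql(
    "SELECT col1, col2, col3, count(*) AS cnt " +
    "FROM t " +
    "GROUP BY col1, col2, col3 " +
    "GROUPING SETS ((col1, col2), (col1, col3), (col2, col3))");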
results against the keys.
The same process will be repeated for the next combinations.
Thank you
Anil Langote
+1-425-633-9747
Hi All,
I have a cluster with 1 master and 6 slaves which uses the pre-built versions
of Hadoop 2.6.0 and Spark 1.6.2. I was running Hadoop MR and Spark jobs
without any problem with OpenJDK 7 installed on all the nodes. However, when
I upgraded OpenJDK 7 to OpenJDK 8 on all nodes, spark-submit and spark-
days of data.
Thank you
Anil Langote
> On Apr 20, 2016, at 1:12 PM, Wei Chen wrote:
>
> Found it. In case someone else is looking for this:
> cvModel.bestModel.asInstanceOf[org.apache.spark.ml.classification.LogisticRegressionModel].weights
>
> On Tue, Apr 19, 2016 at
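The Java equivalent is just a cast (a sketch for the Spark 1.6-era API the thread is using; note that later releases replace weights() with coefficients()):

import org.apache.spark.ml.classification.LogisticRegressionModel;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.mllib.linalg.Vector;

// Assumes a CrossValidatorModel named cvModel, already fitted elsewhere.
LogisticRegressionModel best = (LogisticRegressionModel) cvModel.bestModel();
Vector weights = best.weights();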