Can this be done using DFs?
scala> val HiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
scala> val d = HiveContext.table("test.dummy")
d: org.apache.spark.sql.DataFrame = [id: int, clustered: int, scattered:
int, randomised: int, random_string: string, small_vc: string, padding:
string]
scala> var m = d.agg(max($"id"))
m: org.apache.spark.sql.DataFrame = [max(id): int]
How can I join these two? In other words, I want all the rows whose id
equals the max value in m. Something like:
d.filter($"id" = m) ?
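For what it's worth, the usual DataFrame patterns here (a sketch, not from the thread) are either to collect the aggregate into a local value with `d.agg(max($"id")).first().getInt(0)` and filter with `===` (a bare `=` won't compile), or to join `d` against the aggregated DataFrame. The same logic on a plain Scala collection, runnable without Spark:

```scala
// In-memory analogue of "all rows where id = max(id)".
// The hypothetical DataFrame equivalent would be:
//   val maxId  = d.agg(max($"id")).first().getInt(0)  // collect the scalar
//   val result = d.filter($"id" === maxId)            // note ===, not =
case class Rec(id: Int, smallVc: String)

val rows = List(Rec(1, "a"), Rec(5, "b"), Rec(5, "c"), Rec(3, "d"))

// Step 1: compute the aggregate (the "subquery").
val maxId = rows.map(_.id).max

// Step 2: filter the full set against it.
val result = rows.filter(_.id == maxId)

println(result) // keeps every row carrying the max id, not just one
```

Note that if several rows share the max id, both approaches return all of them, matching the semantics of the IN-subquery.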
Thanks
On 25/02/2016 22:58, Mohammad Tariq wrote:
AFAIK, this isn't supported yet. A ticket
<https://issues.apache.org/jira/browse/SPARK-4226> is in progress though.
Tariq, Mohammad
about.me/mti
On Fri, Feb 26, 2016 at 4:16 AM, Mich Talebzadeh <
[email protected]> wrote:
>
> Hi,
>
> I guess the following confirms that Spark does not support subqueries:
>
> val d = HiveContext.table("test.dummy")
>
> d.registerTempTable("tmp")
>
> HiveContext.sql("select * from tmp where id IN (select max(id) from tmp)")
>
> It crashes, whereas the same SQL works fine in Hive itself on the
> underlying table:
>
> select * from dummy where id IN (select max(id) from dummy);
>
> Thanks
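Until SPARK-4226 lands, one common workaround is to rewrite the IN-subquery as a join against the aggregate. A sketch (the Spark SQL string below is an assumption, untested against the thread's Spark version), with the equivalent logic demonstrated on plain Scala collections:

```scala
// Rewriting "select * from tmp where id IN (select max(id) from tmp)"
// as a join. In Spark SQL this would hypothetically be:
//   HiveContext.sql(
//     "select t.* from tmp t " +
//     "join (select max(id) as max_id from tmp) s on t.id = s.max_id")
// The same join semantics, in memory:
case class Rec(id: Int, padding: String)

val tmp = List(Rec(2, "x"), Rec(7, "y"), Rec(7, "z"))

// The derived table: "select max(id) as max_id from tmp".
val sub = List(tmp.map(_.id).max)

// The join: keep each row of tmp whose id matches a row of the subquery.
val joined = for {
  t <- tmp
  s <- sub
  if t.id == s
} yield t

println(joined) // rows with the max id, same result as the IN-subquery
```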