Kevin gave you the answer you need, but I'd like to comment on your subject
line. SQL is a limited form of FP. Sure, it lacks anonymous functions and
has other limitations, but it's declarative, as good FP programs should
be, and it offers an important subset of the operators ("combinators") you
want.
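
To make that concrete, here's a toy illustration with plain Scala
collections (just a sketch, not code from this thread): SQL's WHERE,
SELECT, and aggregation line up with the filter, map, and fold/sum
combinators.

// SELECT SUM(amount) FROM sales WHERE month = 'JAN', as combinators:
val sales = Seq(("JAN", 100.0), ("JAN", 50.0), ("FEB", 75.0))
val janTotal = sales.filter(_._1 == "JAN").map(_._2).sum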

Also, on a practical note, use the DataFrame API whenever you can, rather
than dropping down to the RDD API, because the DataFrame API is far more
performant. It's a classic case where restricting your options enables more
aggressive optimizations behind the scenes. Michael Armbrust's talk at Spark
Summit East makes this point nicely:
http://www.slideshare.net/databricks/structuring-spark-dataframes-datasets-and-streaming
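
To illustrate, your first query could be written directly with DataFrame
operations, keeping the whole pipeline inside the optimized API. A minimal,
untested sketch, assuming the hiveContext and the smallsales, times, and
channels tables from your code below:

import org.apache.spark.sql.functions._

// join the fact table to its dimensions, then aggregate,
// mirroring the WHERE and GROUP BY clauses of the SQL version
val totals = hiveContext.table("smallsales")
  .join(hiveContext.table("times"), "time_id")
  .join(hiveContext.table("channels"), "channel_id")
  .groupBy("calendar_month_desc", "channel_desc")
  .agg(sum("amount_sold").as("TotalSales"))

totals.orderBy("calendar_month_desc", "channel_desc").show()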

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Mon, Feb 22, 2016 at 6:45 PM, Kevin Mellott <kevin.r.mell...@gmail.com>
wrote:

> In your example, the *rs* instance should be a DataFrame object. In other
> words, the result of *HiveContext.sql* is a DataFrame that you can
> manipulate using *filter*, *map*, etc.
>
>
> http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.hive.HiveContext
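>
> For example, once you have *rs* you can keep working in the DataFrame API
> (a minimal, untested sketch; the threshold is hypothetical):
>
> // rows whose monthly total exceeds some threshold
> val big = rs.filter(rs("TotalSales") > 1000)
> // distinct channel names in the result
> val channels = rs.select("channel_desc").distinct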
>
>
> On Mon, Feb 22, 2016 at 5:16 PM, Mich Talebzadeh <
> mich.talebza...@cloudtechnologypartners.co.uk> wrote:
>
>> Hi,
>>
>> I have data stored in Hive tables on which I want to do some simple manipulation.
>>
>> Currently in Spark I do the following: I get the result set from the Hive
>> tables using SQL and register it as a temporary table in Spark.
>>
>> Ideally I could then get the result set into a DF and work on the DF to slice
>> and dice the data using functional programming with filter, map, split, etc.
>>
>> I wanted to get some ideas on how to go about it.
>>
>> thanks
>>
>> val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
>>
>> hiveContext.sql("use oraclehadoop")
>> // total sales per month and channel across the three Hive tables
>> val rs = hiveContext.sql("""SELECT t.calendar_month_desc, c.channel_desc,
>> SUM(s.amount_sold) AS TotalSales
>> FROM smallsales s, times t, channels c
>> WHERE s.time_id = t.time_id
>> AND   s.channel_id = c.channel_id
>> GROUP BY t.calendar_month_desc, c.channel_desc
>> """)
>> rs.registerTempTable("tmp")
>>
>>
>> HiveContext.sql("""
>> SELECT calendar_month_desc AS MONTH, channel_desc AS CHANNEL, TotalSales
>> from tmp
>> ORDER BY MONTH, CHANNEL
>> """).collect.foreach(println)
>> HiveContext.sql("""
>> SELECT channel_desc AS CHANNEL, MAX(TotalSales)  AS SALES
>> FROM tmp
>> GROUP BY channel_desc
>> order by SALES DESC
>> """).collect.foreach(println)
>>
>>
>> --
>>
>> Dr Mich Talebzadeh
>>
>> LinkedIn  
>> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>> http://talebzadehmich.wordpress.com
>>
>
