Re: Hive From Spark: Jdbc VS sparkContext

2017-10-13 Thread Kabeer Ahmed
My take on this might sound a bit different. Here are a few points to consider: 1. Going through Hive JDBC means the application is restricted by the number of queries that can be compiled. HS2 can only compile one SQL statement at a time, and if users submit bad SQL it can take a long time just to
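As a minimal sketch of the alternative route (reading Hive through Spark's own context rather than HS2), using the HiveContext API seen elsewhere in these threads; the database and table names are hypothetical:

    import org.apache.spark.sql.hive.HiveContext

    // Queries submitted through HiveContext are planned and executed by
    // Spark itself, so they are not serialized behind HS2's compile lock.
    val hiveContext = new HiveContext(sc)
    val df = hiveContext.sql("SELECT * FROM my_db.my_table")
    df.show()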

Re: Documentation on "Automatic file coalescing for native data sources"?

2017-05-20 Thread Kabeer Ahmed
Thank you Takeshi. As far as I can see from the code pointed to, the default number of bytes to pack into a partition is set to 128MB, the Parquet block size. Daniel, it seems you do have a need to modify the number of bytes you want to pack per partition. I am curious to know the scenario. Please
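For reference, a short sketch of changing that pack size in Spark 2.x, assuming the setting in question is spark.sql.files.maxPartitionBytes (the read path is hypothetical):

    // Default is 134217728 bytes (128MB); lower it to pack fewer bytes
    // per partition and get more, smaller input partitions.
    spark.conf.set("spark.sql.files.maxPartitionBytes", 64L * 1024 * 1024)
    val df = spark.read.parquet("/data/events")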

Re: Spark to HBase Fast Bulk Upload

2016-09-19 Thread Kabeer Ahmed
Hi, Without using Spark there are a couple of options. You can refer to this link: http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/. The gist is that you convert the data into HFiles and use the bulk-load option to get the data into HBase quickly. HTH Kabeer. On
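For the Spark side of the thread, a rough sketch of the same HFile route (the input RDD, column family, table name, and staging path are hypothetical, and a real job would also call HFileOutputFormat2.configureIncrementalLoad so the HFiles align with region boundaries):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.hbase.{HBaseConfiguration, KeyValue, TableName}
    import org.apache.hadoop.hbase.client.ConnectionFactory
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat2, LoadIncrementalHFiles}
    import org.apache.hadoop.hbase.util.Bytes

    val conf = HBaseConfiguration.create()

    // HFiles must be written in sorted row-key order.
    val hfileRdd = rdd // RDD[(String, String)], hypothetical
      .sortByKey()
      .map { case (key, value) =>
        val row = Bytes.toBytes(key)
        (new ImmutableBytesWritable(row),
         new KeyValue(row, Bytes.toBytes("cf"), Bytes.toBytes("col"),
           Bytes.toBytes(value)))
      }

    hfileRdd.saveAsNewAPIHadoopFile(
      "/tmp/hfiles",
      classOf[ImmutableBytesWritable],
      classOf[KeyValue],
      classOf[HFileOutputFormat2],
      conf)

    // Hand the finished HFiles to the region servers in one shot.
    val conn = ConnectionFactory.createConnection(conf)
    val tableName = TableName.valueOf("my_table")
    new LoadIncrementalHFiles(conf).doBulkLoad(
      new Path("/tmp/hfiles"),
      conn.getAdmin,
      conn.getTable(tableName),
      conn.getRegionLocator(tableName))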

Re: read parquetfile in spark-sql error

2016-07-25 Thread Kabeer Ahmed
I hope the sample below helps you:

    val parquetDF = hiveContext.read.parquet("hdfs://.parquet")
    parquetDF.registerTempTable("parquetTable")
    hiveContext.sql("SELECT * FROM parquetTable").collect().foreach(println)

Kabeer.

Re: write and call UDF in spark dataframe

2016-07-21 Thread Kabeer Ahmed
Divya: https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html The link gives a complete example of registering a UDAF (user-defined aggregate function), and it should give you a
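The aggregate walked through in that post is a geometric mean; a condensed version of it against the Spark 1.5 API looks roughly like this:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
    import org.apache.spark.sql.types._

    class GeometricMean extends UserDefinedAggregateFunction {
      // One Double input column.
      def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
      // Intermediate state: a count and a running product.
      def bufferSchema: StructType = StructType(
        StructField("count", LongType) ::
        StructField("product", DoubleType) :: Nil)
      def dataType: DataType = DoubleType
      def deterministic: Boolean = true
      def initialize(buffer: MutableAggregationBuffer): Unit = {
        buffer(0) = 0L
        buffer(1) = 1.0
      }
      def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
        buffer(0) = buffer.getLong(0) + 1
        buffer(1) = buffer.getDouble(1) * input.getDouble(0)
      }
      def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
        buffer1(0) = buffer1.getLong(0) + buffer2.getLong(0)
        buffer1(1) = buffer1.getDouble(1) * buffer2.getDouble(1)
      }
      def evaluate(buffer: Row): Any =
        math.pow(buffer.getDouble(1), 1.0 / buffer.getLong(0))
    }

    // Register it so it can be used from SQL and the DataFrame API.
    sqlContext.udf.register("gm", new GeometricMean)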

Re: Adding hive context gives error

2016-03-07 Thread Kabeer Ahmed
I use SBT and I have never included spark-sql. The two SBT lines are simply:

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "1.5.0",
      "org.apache.spark" %% "spark-hive" % "1.5.0"
    )

However, I do note that you are including spark-sql, and the Spark version
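With just those two dependencies, a HiveContext can be created straight from the SparkContext; a minimal sketch (the app name is arbitrary):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val sc = new SparkContext(new SparkConf().setAppName("hive-context-test"))
    // spark-hive pulls in spark-sql transitively, so no separate include is needed.
    val hiveContext = new HiveContext(sc)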

Re: UDAF support for DataFrames in Spark 1.5.0?

2016-02-18 Thread Kabeer Ahmed
I use Spark 1.5 with the CDH 5.5 distribution, and I see that UDAF support is present. From the link https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html, I read that this is an experimental feature. So it makes sense not