My take on this might sound a bit different. Here are a few points to consider
below:
1. Going through Hive JDBC means that the application is restricted by the
number of queries that can be compiled. HS2 can only compile one SQL statement
at a time, and if users have bad SQL, it can take a long time just to
Thank you Takeshi. As far as I can see from the code pointed to, the default number of bytes to pack into a partition is set to 128MB, the size of the Parquet block.

Daniel, it seems you do have a need to modify the number of bytes you want to pack per partition. I am curious to know the scenario. Please
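For what it is worth, the knobs that touch this behaviour look roughly like the below. This is only a sketch: the read-side setting exists only from Spark 2.0 onwards, and the sizes here are purely illustrative.

// Write side: the Parquet row-group ("block") size is a Hadoop/Parquet property;
// 128MB is the usual default.
sc.hadoopConfiguration.setInt("parquet.block.size", 128 * 1024 * 1024)

// Read side (Spark 2.0+ only): how many bytes get packed into one input partition.
sqlContext.setConf("spark.sql.files.maxPartitionBytes", (64 * 1024 * 1024).toString)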
Hi,
Without using Spark there are a couple of options. You can refer to the link:
http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/.
The gist is that you convert the data into HFiles and use the bulk upload
option to get the data quickly into HBase.
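If it helps, the usual two-step flow looks roughly like the below; the table
name, column mapping and paths are placeholders, so adjust them to your data:

# 1. Run ImportTsv in bulk-output mode so it writes HFiles instead of going
#    through the normal HBase write path
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
  -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1,cf:col2 \
  -Dimporttsv.bulk.output=/tmp/hfiles \
  my_table /user/me/input.tsv

# 2. Hand the generated HFiles over to the region servers
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles my_table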
HTH
Kabeer.
I hope the below sample helps you:
// read the Parquet file into a DataFrame (fill in the real HDFS path)
val parquetDF = hiveContext.read.parquet("hdfs://.parquet")
// register it as a temporary table so it can be queried with SQL
parquetDF.registerTempTable("parquetTable")
hiveContext.sql("SELECT * FROM parquetTable").collect().foreach(println)
Kabeer.
Divya:
https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html
The link gives a complete example of registering a UDAF (user-defined
aggregate function), and it should give you a good starting point.
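In case that page moves, the rough shape of a UDAF in 1.5 is along these lines. The class, the averaging logic and the table/column names below are my own toy example, not the one from the blog post.

import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types._

// A toy UDAF computing a plain average, just to show the required methods.
class SimpleAverage extends UserDefinedAggregateFunction {
  def inputSchema: StructType = StructType(StructField("value", DoubleType) :: Nil)
  def bufferSchema: StructType =
    StructType(StructField("sum", DoubleType) :: StructField("count", LongType) :: Nil)
  def dataType: DataType = DoubleType
  def deterministic: Boolean = true
  def initialize(buffer: MutableAggregationBuffer): Unit = { buffer(0) = 0.0; buffer(1) = 0L }
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0)) {
      buffer(0) = buffer.getDouble(0) + input.getDouble(0)
      buffer(1) = buffer.getLong(1) + 1L
    }
  }
  def merge(b1: MutableAggregationBuffer, b2: Row): Unit = {
    b1(0) = b1.getDouble(0) + b2.getDouble(0)
    b1(1) = b1.getLong(1) + b2.getLong(1)
  }
  def evaluate(buffer: Row): Any = buffer.getDouble(0) / buffer.getLong(1)
}

// Register the UDAF and call it from SQL (table and column names are illustrative).
sqlContext.udf.register("simpleAvg", new SimpleAverage)
sqlContext.sql("SELECT simpleAvg(value) FROM someTable").show()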
I use SBT and I have never included spark-sql. The two simple lines in SBT are
as below:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.5.0",
"org.apache.spark" %% "spark-hive" % "1.5.0"
)
However, I do note that you are using the spark-sql include, and the Spark version
I use Spark 1.5 with the CDH 5.5 distribution and I see that support is present
for UDAFs. From the link:
https://databricks.com/blog/2015/09/16/spark-1-5-dataframe-api-highlights-datetimestring-handling-time-intervals-and-udafs.html,
I read that this is an experimental feature. So it makes sense not