You can get better performance if you read and write HBase directly. You can
also use spark-phoenix; here is an example that reads data from a CSV file and
writes it into a Phoenix table:
import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

def main(args: Array[String]): Unit = {
  val sc = new SparkContext("local", "phoenix-test")
  val path = "/tmp/data"
  // comma-separated ZooKeeper quorum of the HBase cluster
  val hbaseConnectionString = "host1,host2,host3"

  // read every column as a plain string
  val customSchema = StructType(Array(
    StructField("O_ORDERKEY", StringType, true),
    StructField("O_CUSTKEY", StringType, true),
    StructField("O_ORDERSTATUS", StringType, true),
    StructField("O_TOTALPRICE", StringType, true),
    StructField("O_ORDERDATE", StringType, true),
    StructField("O_ORDERPRIORITY", StringType, true),
    StructField("O_CLERK", StringType, true),
    StructField("O_SHIPPRIORITY", StringType, true),
    StructField("O_COMMENT", StringType, true)))

  // load the pipe-delimited file with the spark-csv package
  // (com.databricks:spark-csv)
  val sqlContext = new SQLContext(sc)
  val df = sqlContext.read
    .format("com.databricks.spark.csv")
    .option("delimiter", "|")
    .option("header", "false")
    .schema(customSchema)
    .load(path)

  // write the DataFrame into the Phoenix table DATAX
  val start = System.currentTimeMillis()
  df.write.format("org.apache.phoenix.spark")
    .mode("overwrite")
    .option("table", "DATAX")
    .option("zkUrl", hbaseConnectionString)
    .save()
  val end = System.currentTimeMillis()
  println("taken time: " + ((end - start) / 1000) + "s")
}
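
The same data source can read the table back into a DataFrame as well. A
minimal sketch, reusing the sqlContext, table name and zkUrl from the example
above:

// read the DATAX table back through phoenix-spark
val readDf = sqlContext.read
  .format("org.apache.phoenix.spark")
  .option("table", "DATAX")
  .option("zkUrl", hbaseConnectionString)
  .load()
readDf.show(10)

Note that, as far as I know, the connector saves a DataFrame with Phoenix
UPSERT statements, so rows whose primary key already exists are overwritten
rather than duplicated.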
----------------------------------------
Yun Zhang
Best regards!
2018-08-06 20:10 GMT+08:00 Brandon Geise <[email protected]>:
> Thanks for the reply Yun.
>
> I’m not quite clear on how exactly this would help on the upsert side. Are
> you suggesting deriving the types from Phoenix and then doing the
> encoding/decoding and writing to and reading from HBase directly?
>
> Thanks,
>
> Brandon
>
> From: Jaanai Zhang <[email protected]>
> Reply-To: <[email protected]>
> Date: Sunday, August 5, 2018 at 9:34 PM
> To: <[email protected]>
> Subject: Re: Spark-Phoenix Plugin
>
> You can get the data types from the Phoenix metadata, then encode/decode the
> data to write/read it directly. I think this way is effective, FYI :)
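>
> For example, a rough sketch (not tested) that pulls the column types from
> the Phoenix metadata through the standard JDBC DatabaseMetaData API:
>
> import java.sql.DriverManager
>
> // connect to Phoenix via the ZooKeeper quorum
> val conn = DriverManager.getConnection("jdbc:phoenix:host1,host2,host3")
> // list the columns of table DATAX with their SQL types
> val cols = conn.getMetaData.getColumns(null, null, "DATAX", null)
> while (cols.next()) {
>   val name = cols.getString("COLUMN_NAME")
>   val sqlType = cols.getInt("DATA_TYPE") // java.sql.Types id
>   println(name + ": " + cols.getString("TYPE_NAME") + " (" + sqlType + ")")
>   // Phoenix's PDataType.fromTypeId(sqlType) should then give the codec
>   // (toBytes/toObject) for encoding/decoding values directly against HBase.
> }
> conn.close()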
>
> ----------------------------------------
>
> Yun Zhang
>
> Best regards!
>
> 2018-08-04 21:43 GMT+08:00 Brandon Geise <[email protected]>:
>
> Good morning,
>
> I’m looking at using a combination of HBase, Phoenix and Spark for a
> project, and I read that using the Spark-Phoenix plugin directly is more
> efficient than JDBC. However, it wasn’t entirely clear from the examples
> whether an upsert is performed when writing a dataframe, and how many
> fine-grained options there are for executing the upsert. Any information
> someone can share would be greatly appreciated!
>
> Thanks,
>
> Brandon