Hi, Yin

But our data is stored in a custom sequence file format, which can be read by 
our custom load functions in Pig.

I want to reuse these load functions in Spark to read the data and turn it 
into an RDD.

Best Regards,
Kevin.

From: Yin Huai [mailto:yh...@databricks.com]
Sent: March 24, 2015 11:53
To: Dai, Kevin
Cc: Paul Brown; user@spark.apache.org
Subject: Re: Use pig load function in spark

Hello Kevin,

You can take a look at our generic load 
function<https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#generic-loadsave-functions>.

For example, you can use

val df = sqlContext.load("/myData", "parquet")
to load a Parquet dataset stored in "/myData" as a 
DataFrame<https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#dataframes>.

You can use it to load data stored in various formats, such as JSON (Spark 
built-in), Parquet (Spark built-in), 
Avro<https://github.com/databricks/spark-avro>, and 
CSV<https://github.com/databricks/spark-csv>.

Thanks,

Yin

On Mon, Mar 23, 2015 at 7:14 PM, Dai, Kevin <yun...@ebay.com> wrote:
Hi, Paul

You are right.

The story is that we have a lot of Pig load functions for loading our different 
data sets.

Now we want to use Spark to read and process this data.

So we want to figure out a way to reuse our existing load functions in Spark to 
read this data.

Any ideas?

Best Regards,
Kevin.

From: Paul Brown [mailto:p...@mult.ifario.us]
Sent: March 24, 2015 4:11
To: Dai, Kevin
Subject: Re: Use pig load function in spark


The answer is "Maybe, but you probably don't want to do that."

A typical Pig load function is devoted to bridging external data into Pig's 
type system, but you don't really need to do that in Spark because it is 
(thankfully) not encumbered by Pig's type system.  What you probably want to do 
is to figure out a way to use native Spark facilities (e.g., textFile) coupled 
with some of the logic out of your Pig load function necessary to turn your 
external data into an RDD.
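
As a rough illustration of that approach, here is a minimal Scala sketch. It 
assumes the input is tab-separated text, which may not match your data at all. 
The Record type, the parseLine function, and the input path are all 
hypothetical stand-ins for whatever per-record logic your Pig load function 
actually contains.

```scala
import scala.util.Try

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object PigLoaderLogicInSpark {
  // Hypothetical record type; replace with whatever your loader produces.
  case class Record(id: String, value: Long)

  // Hypothetical parsing logic, standing in for the per-record work a Pig
  // load function does. Lines that do not match the layout yield None.
  def parseLine(line: String): Option[Record] =
    line.split("\t", -1) match {
      case Array(id, value) => Try(value.toLong).toOption.map(Record(id, _))
      case _                => None
    }

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val sc = new SparkContext(new SparkConf().setAppName("pig-loader-logic"))

    // Read raw lines with a native Spark facility, then apply the parsing
    // logic. Malformed lines are silently dropped in this sketch.
    val records: RDD[Record] =
      sc.textFile("/path/to/input") // hypothetical path
        .flatMap(line => parseLine(line).toList)

    println(records.count())
    sc.stop()
  }
}
```

Keeping the parsing logic in a plain function means it can be tested without a 
cluster. For binary formats, sc.sequenceFile or sc.newAPIHadoopFile would 
probably take the place of textFile.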


—
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Mon, Mar 23, 2015 at 2:29 AM, Dai, Kevin <yun...@ebay.com> wrote:
Hi, all

Can Spark use Pig's load functions to load data?

Best Regards,
Kevin.

