Hi Yin,

But our data is in a customized sequence file, which is read by our custom load function in Pig.
And I want to use Spark to reuse these load functions to read the data and convert it to an RDD.

Best Regards,
Kevin.

From: Yin Huai [mailto:yh...@databricks.com]
Sent: March 24, 2015 11:53
To: Dai, Kevin
Cc: Paul Brown; user@spark.apache.org
Subject: Re: Use pig load function in spark

Hello Kevin,

You can take a look at our generic load function <https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#generic-loadsave-functions>. For example, you can use

    val df = sqlContext.load("/myData", "parquet")

to load a parquet dataset stored in "/myData" as a DataFrame <https://spark.apache.org/docs/1.3.0/sql-programming-guide.html#dataframes>. You can use it to load data stored in various formats, like json (Spark built-in), parquet (Spark built-in), avro <https://github.com/databricks/spark-avro>, and csv <https://github.com/databricks/spark-csv>.

Thanks,
Yin

On Mon, Mar 23, 2015 at 7:14 PM, Dai, Kevin <yun...@ebay.com> wrote:

Hi Paul,

You are right. The story is that we have a lot of Pig load functions to load our different data, and now we want to use Spark to read and process that data. So we want to figure out a way to reuse our existing load functions in Spark to read it. Any ideas?

Best Regards,
Kevin.

From: Paul Brown [mailto:p...@mult.ifario.us]
Sent: March 24, 2015 4:11
To: Dai, Kevin
Subject: Re: Use pig load function in spark

The answer is "Maybe, but you probably don't want to do that." A typical Pig load function is devoted to bridging external data into Pig's type system, but you don't really need to do that in Spark because it is (thankfully) not encumbered by Pig's type system. What you probably want to do is to figure out a way to use native Spark facilities (e.g., textFile) coupled with some of the logic out of your Pig load function necessary to turn your external data into an RDD.

--
p...@mult.ifario.us | Multifarious, Inc. | http://mult.ifario.us/

On Mon, Mar 23, 2015 at 2:29 AM, Dai, Kevin <yun...@ebay.com> wrote:

Hi all,

Can Spark use Pig's load function to load data?

Best Regards,
Kevin.
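
As a rough illustration of Paul's suggestion (native Spark facilities plus the parsing logic lifted out of the Pig load function), a minimal Scala sketch might look like the following. It assumes the data is a standard Hadoop SequenceFile with Text keys and BytesWritable values. The Record type, the decode function, and the tab-separated value layout are hypothetical stand-ins for whatever the custom loader actually does, and "/myData" is simply borrowed from Yin's example.

```scala
import java.util.Arrays

import org.apache.hadoop.io.{BytesWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical record type standing in for whatever the Pig loader emits.
case class Record(key: String, fields: Seq[String])

object SequenceFileToRdd {
  // Hypothetical parsing logic, assumed to be lifted out of the Pig load function.
  // Here the value is assumed to be a UTF-8, tab-separated line.
  def decode(key: String, value: Array[Byte]): Record =
    Record(key, new String(value, "UTF-8").split("\t", -1).toSeq)

  // Read the SequenceFile with Spark's built-in reader and decode each entry.
  def load(sc: SparkContext, path: String): RDD[Record] =
    sc.sequenceFile(path, classOf[Text], classOf[BytesWritable]).map {
      case (k, v) =>
        // Copy only the valid bytes out of the Writable before keeping them.
        decode(k.toString, Arrays.copyOf(v.getBytes, v.getLength))
    }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("SequenceFileToRdd"))
    try load(sc, "/myData").take(5).foreach(println)
    finally sc.stop()
  }
}
```

The copy through Arrays.copyOf is there because Hadoop record readers reuse the same Writable instance for every record, so the bytes should be copied out before the RDD is cached or collected. If the files are not plain SequenceFiles, sc.newAPIHadoopFile with the underlying Hadoop InputFormat is the analogous entry point.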