Do you only want to use Scala? If not, I think you should be able to accomplish what you want with PySpark and pandas read_table.
Thank you,
Irving Duran

On 02/16/2018 06:10 PM, Lian Jiang wrote:
> Hi,
>
> I have a use case:
>
> I want to download S&P 500 stock data from the Yahoo API in parallel using
> Spark. I have all the stock symbols as a Dataset. Then I used the code
> below to call the Yahoo API for each symbol:
>
> case class Symbol(symbol: String, sector: String)
>
> case class Tick(symbol: String, sector: String, open: Double, close: Double)
>
> // symbolDs is a Dataset[Symbol]; pullSymbolFromYahoo returns a Dataset[Tick]
>
> symbolDs.map { k =>
>   pullSymbolFromYahoo(k.symbol, k.sector)
> }
>
> This statement does not compile:
>
> Unable to find encoder for type stored in a Dataset. Primitive types
> (Int, String, etc) and Product types (case classes) are supported by
> importing spark.implicits._ Support for serializing other types will
> be added in future releases.
>
> My questions are:
>
> 1. As you can see, this scenario is not traditional dataset handling
> such as count, SQL query, etc. Instead, it is more like a UDF that
> applies an arbitrary operation to each record. Is Spark good at
> handling such a scenario?
>
> 2. Regarding the compilation error, is there a fix? I did not find a
> satisfactory solution online.
>
> Thanks for the help!
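[For question 2: the error arises because pullSymbolFromYahoo returns a Dataset[Tick], and Spark has no Encoder for a Dataset nested inside another Dataset; only primitives and Product types (case classes) get encoders via spark.implicits._. A minimal sketch of one way to restructure it, assuming a local SparkSession and a stubbed pullSymbolFromYahoo that returns plain Ticks (the real Yahoo call is not shown):]

    import org.apache.spark.sql.{Dataset, SparkSession}

    object PullTicks {
      case class Symbol(symbol: String, sector: String)
      case class Tick(symbol: String, sector: String, open: Double, close: Double)

      // Hypothetical stub: a real implementation would call the Yahoo API.
      // The key change is that it returns a Seq[Tick], not a Dataset[Tick],
      // so the result of flatMap is a flat Dataset[Tick].
      def pullSymbolFromYahoo(symbol: String, sector: String): Seq[Tick] =
        Seq(Tick(symbol, sector, 0.0, 0.0))

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder
          .master("local[*]")
          .appName("sp500")
          .getOrCreate()
        // Brings the implicit Encoders for case classes into scope.
        import spark.implicits._

        val symbolDs: Dataset[Symbol] = Seq(Symbol("MSFT", "Tech")).toDS()

        // flatMap, because each symbol yields many ticks; the Encoder[Tick]
        // is resolved implicitly since Tick is a case class (a Product).
        val ticks: Dataset[Tick] = symbolDs.flatMap { k =>
          pullSymbolFromYahoo(k.symbol, k.sector)
        }
        ticks.show()
        spark.stop()
      }
    }

[On question 1: map/flatMap with an arbitrary per-record function is a supported pattern, though each executor task will make its own HTTP calls, so rate limits and retries are the caller's responsibility.]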