Hi Mich - Can you run a filter on df1 before your map, keeping only the rows where p(3).toString != "-", and then run your map?
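[A minimal sketch of that suggestion, assuming df1/columns are as defined later in the thread; the helper name isRogue is illustrative, not from the thread:]

```scala
// A row is "rogue" when a price field holds "-" instead of a number.
// Per the thread, p(3) is the Open column.
def isRogue(field: String): Boolean = field.trim == "-"

// With Spark (df1 read via spark.read ... csv, as shown below), the filter
// would run before the map so toFloat never sees "-":
// val df2 = df1.filter(p => !isRogue(p(3).toString))
//              .map(p => columns(p(0).toString, p(1).toString, p(2).toString,
//                   p(3).toString.toFloat, p(4).toString.toFloat,
//                   p(5).toString.toFloat, p(6).toString.toFloat,
//                   p(7).toString.toInt))
```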
Thanks,
Mike

On Tue, Sep 27, 2016 at 5:06 PM, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
> Thanks guys
>
> Actually these are the 7 rogue rows. Column 0 is the Volume column, which
> means there were no trades on those days:
>
> cat stock.csv | grep ",0"
> SAP SE,SAP, 23-Dec-11,-,-,-,40.56,0
> SAP SE,SAP, 21-Apr-11,-,-,-,45.85,0
> SAP SE,SAP, 30-Dec-10,-,-,-,38.10,0
> SAP SE,SAP, 23-Dec-10,-,-,-,38.36,0
> SAP SE,SAP, 30-Apr-08,-,-,-,32.39,0
> SAP SE,SAP, 29-Apr-08,-,-,-,33.05,0
> SAP SE,SAP, 28-Apr-08,-,-,-,32.60,0
>
> So one way would be to exclude the rows where there was no trade volume
> that day when cleaning up the csv file:
>
> cat stock.csv | grep -v ",0"
>
> and that works. Bear in mind that putting 0s in place of "-" would skew
> the price plot.
>
> BTW I am using Spark csv as well:
>
> val df1 = spark.read.option("header", true).csv(location)
>
> This is the class and the mapping:
>
> case class columns(Stock: String, Ticker: String, TradeDate: String,
>   Open: Float, High: Float, Low: Float, Close: Float, Volume: Integer)
> val df2 = df1.map(p => columns(p(0).toString, p(1).toString,
>   p(2).toString, p(3).toString.toFloat, p(4).toString.toFloat,
>   p(5).toString.toFloat, p(6).toString.toFloat, p(7).toString.toInt))
>
> In here I have
>
> p(3).toString.toFloat
>
> How can one check for rogue data in p(3)?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> Disclaimer: Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
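[One way to check p(3) inside the map itself, sketched with scala.util.Try; the helper name parsePrice is illustrative, not from the thread:]

```scala
import scala.util.Try

// Parse a price field, treating "-" (or anything else non-numeric) as
// missing: returns None instead of throwing NumberFormatException.
def parsePrice(s: String): Option[Float] = Try(s.trim.toFloat).toOption

// In the map, rows with any unparseable price can then be skipped:
// val df2 = df1.flatMap { p =>
//   for {
//     open  <- parsePrice(p(3).toString)
//     high  <- parsePrice(p(4).toString)
//     low   <- parsePrice(p(5).toString)
//     close <- parsePrice(p(6).toString)
//   } yield columns(p(0).toString, p(1).toString, p(2).toString,
//                   open, high, low, close, p(7).toString.toInt)
// }
```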
>
> On 27 September 2016 at 21:49, Mich Talebzadeh <mich.talebza...@gmail.com> wrote:
>>
>> I have historical prices for various stocks.
>>
>> Each csv file has 10 years of trades, one row per day.
>>
>> These are the columns defined in the class:
>>
>> case class columns(Stock: String, Ticker: String, TradeDate: String,
>>   Open: Float, High: Float, Low: Float, Close: Float, Volume: Integer)
>>
>> The issue is with the Open, High, Low and Close columns, which are all
>> defined as Float.
>>
>> Most rows are OK, like the first one below, but the second one, with "-"
>> in the Float columns, causes issues:
>>
>> Date       Open   High   Low    Close  Volume
>> 27-Sep-16  80.91  80.93  79.87  80.85  1873158
>> 23-Dec-11  -      -      -      40.56  0
>>
>> Because the prices are defined as Float, these rows cause the application
>> to crash:
>>
>> scala> val rs = df2.filter(changeToDate("TradeDate") >=
>> monthsago).select((changeToDate("TradeDate").as("TradeDate")),
>> (('Close+'Open)/2).as("AverageDailyPrice"), 'Low.as("Day's Low"),
>> 'High.as("Day's High")).orderBy("TradeDate").collect
>> 16/09/27 21:48:53 ERROR Executor: Exception in task 0.0 in stage 61.0 (TID 260)
>> java.lang.NumberFormatException: For input string: "-"
>>
>> One way is to define the prices as Strings, but that is not meaningful.
>> Alternatively, do the clean-up before putting the csv in HDFS, but that
>> becomes tedious and error prone.
>>
>> Any ideas will be appreciated.
>>
>> Dr Mich Talebzadeh
>>
>> http://talebzadehmich.wordpress.com
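[Another option worth noting, a sketch only and not tested against this data: Spark's CSV reader has a nullValue option, so "-" can be mapped to null at load time by supplying an explicit schema instead of relying on toFloat:]

```scala
import org.apache.spark.sql.types._

// Declare the price columns as FloatType up front; fields matching
// nullValue ("-") become null rather than failing a later toFloat.
val schema = StructType(Seq(
  StructField("Stock", StringType), StructField("Ticker", StringType),
  StructField("TradeDate", StringType), StructField("Open", FloatType),
  StructField("High", FloatType), StructField("Low", FloatType),
  StructField("Close", FloatType), StructField("Volume", IntegerType)))

val df1 = spark.read
  .option("header", true)
  .option("nullValue", "-")   // treat "-" as null at parse time
  .schema(schema)
  .csv(location)

// The no-trade rows can then be dropped explicitly:
val clean = df1.na.drop(Seq("Open", "High", "Low"))
```

This keeps the prices numeric without pre-cleaning the file in HDFS.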