Understood. In that case Ted’s suggestion to check the length should solve the problem.
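
A minimal sketch of such a length check in plain Scala follows. The object and helper names (AmountParser, parseAmount) and the choice to map null or empty fields to 0.0 are assumptions made for illustration, not something stated in the thread; pick whatever default suits your data.

```scala
// Hypothetical helper, not from the thread: parses a currency field such as
// "?1,187.50" and checks the length before calling substring(1).
object AmountParser {
  def parseAmount(s: String): Double =
    // Assumption: null, empty, or one-character fields count as 0.0.
    if (s == null || s.length < 2) 0.0
    // Drop the leading placeholder character and the thousands separators.
    else s.substring(1).replace(",", "").toDouble
}
```

In the map from the original post, each x.getString(n).substring(1).replace(",", "").toDouble would then become AmountParser.parseAmount(x.getString(n)), so rows like [,,,,] no longer throw StringIndexOutOfBoundsException.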

> On Feb 20, 2016, at 2:09 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>
> Hi,
>
> That is a good question.
>
> When data is exported from CSV to Linux, any character that cannot be
> transformed is replaced by ?. That question mark is not actually the
> expected “?” :)
>
> So the only way I can get rid of it is by dropping the first character using
> substring(1). I checked, and I did the same in Hive SQL.
>
> The actual field in the CSV is “£2,500.00”, which translates into “?2,500.00”.
>
> HTH
>
> Dr Mich Talebzadeh
>
> LinkedIn
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com
>
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Technology Ltd, its
> subsidiaries or their employees, unless expressly so stated. It is the
> responsibility of the recipient to ensure that this email is virus free,
> therefore neither Peridale Technology Ltd, its subsidiaries nor their
> employees accept any responsibility.
>
> From: Chandeep Singh [mailto:c...@chandeep.com]
> Sent: 20 February 2016 13:47
> To: Mich Talebzadeh <m...@peridale.co.uk>
> Cc: user @spark <user@spark.apache.org>
> Subject: Re: Checking for null values when mapping
>
> Looks like you’re using substring just to get rid of the ‘?’. Why not use
> replace for that as well? Then you wouldn’t run into index-out-of-bounds
> issues.
>
> val a = "?1,187.50"
> val b = ""
>
> println(a.substring(1).replace(",", ""))
> --> 1187.50
>
> println(a.replace("?", "").replace(",", ""))
> --> 1187.50
>
> println(b.replace("?", "").replace(",", ""))
> --> No error (prints an empty string), since neither ‘?’ nor ‘,’ is present.
>
>> On Feb 20, 2016, at 8:24 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>
>> I have a DF like the one below, reading a CSV file:
>>
>> val df = HiveContext.read.format("com.databricks.spark.csv")
>>   .option("inferSchema", "true")
>>   .option("header", "true")
>>   .load("/data/stg/table2")
>>
>> val a = df.map(x => (x.getString(0), x.getString(1),
>>   x.getString(2).substring(1).replace(",", "").toDouble,
>>   x.getString(3).substring(1).replace(",", "").toDouble,
>>   x.getString(4).substring(1).replace(",", "").toDouble))
>>
>> For most rows I am reading from the CSV file, the above mapping works fine.
>> However, at the bottom of the CSV there are a couple of rows with empty
>> columns, as below:
>>
>> [421,02/10/2015,?1,187.50,?237.50,?1,425.00]
>> [,,,,]
>> [Net income,,?182,531.25,?14,606.25,?197,137.50]
>> [,,,,]
>> [year 2014,,?113,500.00,?0.00,?113,500.00]
>> [Year 2015,,?69,031.25,?14,606.25,?83,637.50]
>>
>> As a result, I get:
>>
>> a.collect.foreach(println)
>> 16/02/20 08:31:53 ERROR Executor: Exception in task 0.0 in stage 123.0 (TID 161)
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>
>> I suspect the cause is the substring operation, say
>> x.getString(2).substring(1), on empty values, which according to the web
>> will throw this type of error.
>>
>> The easiest solution seems to be to check whether x above is not null and
>> only then do the substring operation. Can this be done without using a UDF?
>>
>> Thanks
>>
>> Dr Mich Talebzadeh