Understood. In that case Ted’s suggestion to check the length should solve the 
problem.
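
For what it's worth, here is a minimal sketch of that length check in plain Scala. The name parseAmount is a hypothetical helper of mine, not part of Spark or spark-csv, and it assumes the currency symbol is always exactly one leading character:

```scala
import scala.util.Try

object AmountParsing {
  // Hypothetical helper (not a Spark API): drop the leading currency
  // character only when the field is non-null and longer than one
  // character, then strip thousands separators and parse the rest.
  // Returns None for null, empty, or unparseable input instead of throwing.
  def parseAmount(raw: String): Option[Double] =
    Option(raw)
      .map(_.trim)
      .filter(_.length > 1)
      .flatMap(s => Try(s.substring(1).replace(",", "").toDouble).toOption)
}
```

Inside the map you could then write something like AmountParsing.parseAmount(x.getString(2)).getOrElse(0.0), or filter out the rows where it returns None, depending on how you want the blank separator rows treated.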

> On Feb 20, 2016, at 2:09 PM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
> 
> Hi,
>  
> That is a good question.
>  
> When data is exported from CSV to Linux, any character that cannot be 
> transformed is replaced by ?. That question mark is not actually a literal 
> “?” :)
>  
> So the only way I can get rid of it is by dropping the first character using 
> substring(1). I checked, and I did the same in Hive SQL.
>  
> The actual field in CSV is “£2,500.00”, which translates into “?2,500.00”.
>  
> HTH
>  
>  
> Dr Mich Talebzadeh
>  
> LinkedIn: 
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>  
> http://talebzadehmich.wordpress.com
>  
> NOTE: The information in this email is proprietary and confidential. This 
> message is for the designated recipient only, if you are not the intended 
> recipient, you should destroy it immediately. Any information in this message 
> shall not be understood as given or endorsed by Peridale Technology Ltd, its 
> subsidiaries or their employees, unless expressly so stated. It is the 
> responsibility of the recipient to ensure that this email is virus free, 
> therefore neither Peridale Technology Ltd, its subsidiaries nor their 
> employees accept any responsibility.
>  
>  
> From: Chandeep Singh [mailto:c...@chandeep.com] 
> Sent: 20 February 2016 13:47
> To: Mich Talebzadeh <m...@peridale.co.uk>
> Cc: user @spark <user@spark.apache.org>
> Subject: Re: Checking for null values when mapping
>  
> Looks like you’re using substring just to get rid of the ‘?’. Why not use 
> replace for that as well? Then you wouldn’t run into index out of bounds 
> issues on empty strings.
>  
> val a = "?1,187.50"  
> val b = ""
>  
> println(a.substring(1).replace(",", ""))
> --> 1187.50
>  
> println(a.replace("?", "").replace(",", ""))
> --> 1187.50
>  
> println(b.replace("?", "").replace(",", ""))
> --> No error; prints an empty line, since neither '?' nor ',' exists in b.
>  
>  
>> On Feb 20, 2016, at 8:24 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote:
>>  
>>  
>> I have a DF like the one below, read from a CSV file:
>>  
>>  
>> val df = HiveContext.read
>>   .format("com.databricks.spark.csv")
>>   .option("inferSchema", "true")
>>   .option("header", "true")
>>   .load("/data/stg/table2")
>>  
>> val a = df.map(x => (
>>   x.getString(0),
>>   x.getString(1),
>>   x.getString(2).substring(1).replace(",", "").toDouble,
>>   x.getString(3).substring(1).replace(",", "").toDouble,
>>   x.getString(4).substring(1).replace(",", "").toDouble))
>>  
>>  
>> For most rows I am reading from the CSV file, the above mapping works fine. 
>> However, at the bottom of the CSV there are a couple of rows with empty 
>> columns, as below:
>>  
>> [421,02/10/2015,?1,187.50,?237.50,?1,425.00]
>> [,,,,]
>> [Net income,,?182,531.25,?14,606.25,?197,137.50]
>> [,,,,]
>> [year 2014,,?113,500.00,?0.00,?113,500.00]
>> [Year 2015,,?69,031.25,?14,606.25,?83,637.50]
>>  
>> However, I get 
>>  
>> a.collect.foreach(println)
>> 16/02/20 08:31:53 ERROR Executor: Exception in task 0.0 in stage 123.0 (TID 161)
>> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>>  
>> I suspect the cause is the substring operation, say 
>> x.getString(2).substring(1), on empty values, which according to the web 
>> throws this type of error.
>>  
>>  
>> The easiest solution seems to be to check that each field of x above is not 
>> null or empty before doing the substring operation. Can this be done 
>> without using a UDF?
>>  
>> Thanks
>>  
>> Dr Mich Talebzadeh
