You can use filter with isNotNull on the relevant Column
<https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.Column>
before the map, so that rows with missing values never reach the substring call.
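
A minimal sketch of that filter, assuming df is the DataFrame from the quoted message and that columns 2 to 4 hold the amount strings. The name dropRowsWithoutAmounts is made up for illustration. Since the error is a StringIndexOutOfBoundsException rather than a NullPointerException, the blank cells are probably arriving as empty strings rather than nulls, so the sketch checks the length as well as isNotNull:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.length

// Hypothetical helper (not part of Spark): keep only rows whose amount
// columns (positions 2, 3 and 4, as in the quoted map) are non-null and
// non-empty, so substring(1) is safe on every remaining row.
// Assumes the DataFrame has at least three columns.
def dropRowsWithoutAmounts(df: DataFrame): DataFrame = {
  val amountCols: Seq[Column] =
    df.columns.slice(2, 5).toSeq.map(name => df(name))
  val keep: Column = amountCols
    .map(c => c.isNotNull && (length(c) > 0))
    .reduce(_ && _)
  df.filter(keep)
}
```

The original map can then be applied to dropRowsWithoutAmounts(df) instead of df. Whether blank cells come back as null or as "" may depend on the CSV reader options, which is why both checks are kept.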

On 20 February 2016 at 08:24, Mich Talebzadeh <m...@peridale.co.uk> wrote:

>
>
> I have a DF like below reading a csv file
>
>
>
>
>
> val df =
> HiveContext.read.format("com.databricks.spark.csv").option("inferSchema",
> "true").option("header", "true").load("/data/stg/table2")
>
>
>
> val a = df.map(x => (x.getString(0), x.getString(1),
> x.getString(2).substring(1).replace(",", "").toDouble,
> x.getString(3).substring(1).replace(",", "").toDouble,
> x.getString(4).substring(1).replace(",", "").toDouble))
>
>
>
>
>
> For most rows I am reading from the csv file the above mapping works fine.
> However, at the bottom of the csv there are a couple of rows with empty
> columns, as below
>
>
>
> [421,02/10/2015,£1,187.50,£237.50,£1,425.00]
>
> [,,,,]
>
> [Net income,,£182,531.25,£14,606.25,£197,137.50]
>
> [,,,,]
>
> [year 2014,,£113,500.00,£0.00,£113,500.00]
>
> [Year 2015,,£69,031.25,£14,606.25,£83,637.50]
>
>
>
> However, I get
>
>
>
> a.collect.foreach(println)
>
> 16/02/20 08:31:53 ERROR Executor: Exception in task 0.0 in stage 123.0
> (TID 161)
>
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>
>
>
> I suspect the cause is the substring operation, say
> x.getString(2).substring(1), on empty values, which according to the web
> will throw this type of error
>
>
>
>
>
> The easiest solution seems to be to check that the value is not null
> before doing the substring operation. Can this be done without using a UDF?
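
On the UDF question: an ordinary Scala function called inside the map closure is not a Spark SQL UDF, so the guard can live there. A sketch, where parseAmount is a hypothetical helper name and the first character is assumed to be a currency symbol as in the sample rows:

```scala
import scala.util.Try

// Hypothetical helper: drop the leading currency symbol and the thousands
// separators, then parse. Returns None for null, blank, or unparseable input
// instead of throwing.
def parseAmount(raw: String): Option[Double] =
  Option(raw)                    // null becomes None
    .map(_.trim)
    .filter(_.length > 1)        // need at least symbol + one digit
    .flatMap(s => Try(s.substring(1).replace(",", "").toDouble).toOption)
```

In the map, parseAmount(x.getString(2)) would take the place of x.getString(2).substring(1).replace(",", "").toDouble, giving None for the blank rows rather than an exception.
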
>
>
>
> Thanks
>
>
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
