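You can drop the blank rows before the map, without a UDF: call filter on the DataFrame with isNotNull on the relevant Column <https://spark.apache.org/docs/latest/api/scala/#org.apache.spark.sql.Column>. Note that isNotNull only removes nulls. A null would give a NullPointerException, whereas "String index out of range: -1" is what "".substring(1) throws, so the blank cells are probably arriving as empty strings and need a non-empty check as well.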
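Something along these lines might work. It is only a sketch, not tested against your file: the object name CleanAmounts and the helpers parseAmount and cleanRows are made up for illustration, and I am assuming the three amount columns sit at positions 2 to 4 and are read as strings, as in your map. The filter keeps a row only when each amount column is both non-null and non-empty, then the map runs on the underlying RDD.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.length

object CleanAmounts {
  // Hypothetical helper: drop the leading currency symbol and the
  // thousands separators, then parse what is left as a Double.
  def parseAmount(s: String): Double =
    s.substring(1).replace(",", "").toDouble

  // Assumption: columns 2, 3 and 4 hold the amounts, as in the original map.
  def cleanRows(df: DataFrame): RDD[(String, String, Double, Double, Double)] = {
    val amountCols = df.columns.slice(2, 5).map(name => df(name))

    // Keep a row only if every amount column is non-null AND non-empty.
    val keep = amountCols
      .map(c => c.isNotNull && length(c) > 0)
      .reduce(_ && _)

    df.filter(keep).rdd.map { x =>
      (x.getString(0), x.getString(1),
        parseAmount(x.getString(2)),
        parseAmount(x.getString(3)),
        parseAmount(x.getString(4)))
    }
  }
}
```

With that in place, CleanAmounts.cleanRows(df).collect.foreach(println) should skip the [,,,,] rows while keeping the totals rows, whose amount columns are populated. Going through .rdd keeps the map independent of whether DataFrame.map returns an RDD or a Dataset in your Spark version.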

On 20 February 2016 at 08:24, Mich Talebzadeh <m...@peridale.co.uk> wrote:

> I have a DF like below reading a csv file
>
> val df = HiveContext.read.format("com.databricks.spark.csv")
>   .option("inferSchema", "true")
>   .option("header", "true")
>   .load("/data/stg/table2")
>
> val a = df.map(x => (x.getString(0), x.getString(1),
>   x.getString(2).substring(1).replace(",", "").toDouble,
>   x.getString(3).substring(1).replace(",", "").toDouble,
>   x.getString(4).substring(1).replace(",", "").toDouble))
>
> For most rows I am reading from csv file the above mapping works fine.
> However, at the bottom of csv there are couple of empty columns as below
>
> [421,02/10/2015,?1,187.50,?237.50,?1,425.00]
> [,,,,]
> [Net income,,?182,531.25,?14,606.25,?197,137.50]
> [,,,,]
> [year 2014,,?113,500.00,?0.00,?113,500.00]
> [Year 2015,,?69,031.25,?14,606.25,?83,637.50]
>
> However, I get
>
> a.collect.foreach(println)
> 16/02/20 08:31:53 ERROR Executor: Exception in task 0.0 in stage 123.0 (TID 161)
> java.lang.StringIndexOutOfBoundsException: String index out of range: -1
>
> I suspect the cause is substring operation say x.getString(2).substring(1)
> on empty values that according to web will throw this type of error
>
> The easiest solution seems to be to check whether x above is not null and
> do the substring operation. Can this be done without using a UDF?
>
> Thanks
>
> Dr Mich Talebzadeh
>
> LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com