Hi Shim,

Now it works perfectly. Thank you so much. Actually, I am from a Java background and am learning Scala.
Thanks and Regards,
---------------------------------
Md. Rezaul Karim
PhD Researcher, Insight Centre for Data Analytics
National University of Ireland Galway
E-mail: rezaul.ka...@insight-centre.org
Web: www.insight-centre.org
Phone: +353892311519

On Thursday, November 17, 2016 2:00 PM, Hyung Sung Shim <hss...@nflabs.com> wrote:

Hello Muhammad.
Please check your bank-full.csv file first. You can also filter rows by field count in your Scala code, for example:

    val bank = bankText.map(s => s.split(";")).filter(s => s.size > 5).filter(s => s(0) != "\"age\"")

Hope this helps.

2016-11-17 21:26 GMT+09:00 Dayong <will...@gmail.com>:

Try to debug your code in an IDE. You should look at your array s, since it complains about an array index.

Thanks,
Wd

On Nov 16, 2016, at 10:44 PM, Muhammad Rezaul Karim <reza_cse...@yahoo.com> wrote:

Hi All,

I have the following Scala code (taken from https://zeppelin.apache.org/docs/0.6.2/quickstart/tutorial.html#data-retrieval) that deals with the sample bank-details data:

----------------------------------------------------------------------
val bankText = sc.textFile("/home/asif/zeppelin-0.6.2-bin-all/bin/bank-full.csv")

case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)

// split each line, filter out the header (starts with "age"), and map it into the Bank case class
val bank = bankText.map(s => s.split(";")).
    filter(s => s(0) != "\"age\"").
    map(s => Bank(s(0).toInt,
                  s(1).replaceAll("\"", ""),
                  s(2).replaceAll("\"", ""),
                  s(3).replaceAll("\"", ""),
                  s(5).replaceAll("\"", "").toInt
        )
    )

// convert to DataFrame and create a temporary table
bank.toDF().registerTempTable("bank")
----------------------------------------------------------------------

The above code segment runs successfully. However, when I try to execute the following line of code:

    bank.collect()

I get the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 6.0 failed 1 times, most recent failure: Lost task 1.0 in stage 6.0 (TID 7, localhost): java.lang.ArrayIndexOutOfBoundsException: 2
    at $anonfun$3.apply(<console>:91)
    at $anonfun$3.apply(<console>:89)

Moreover, I cannot execute the SQL queries below either; I get the same error message (i.e., ArrayIndexOutOfBoundsException: 2):

1. %sql select age, count(1) from bank where age < 30 group by age order by age
2. %sql select age, count(1) from bank where age < ${maxAge=30} group by age order by age
3. %sql select age, count(1) from bank where marital="${marital=single,single|divorced|married}" group by age order by age

Note: However, when I execute the following SQL statements, I do not get any error:

1. %sql select age from bank
2. %sql select * from bank

I don't understand what I am doing wrong here! Please, someone, help me figure this out.

Thanks and Regards,
---------------------------------
Md. Rezaul Karim
PhD Researcher, Insight Centre for Data Analytics
National University of Ireland Galway
E-mail: rezaul.karim@insight-centre.org
Web: www.insight-centre.org
Phone: +353892311519
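[Editor's note] Shim's fix works because any line of bank-full.csv with fewer than six ";"-separated fields (a blank line, a truncated row) makes an index like s(2) or s(5) throw ArrayIndexOutOfBoundsException; the length filter drops such rows before they are indexed. The sketch below demonstrates this with plain Scala collections instead of a Spark RDD (RDD map/filter behave analogously), so it runs without a cluster. The sample lines are hypothetical, made up for illustration:

```scala
// Spark-free sketch of the fix: filter(_.size > 5) drops short or
// malformed rows before any code touches indices up to s(5).
object BankFilterSketch {
  case class Bank(age: Int, job: String, marital: String, education: String, balance: Int)

  def parse(lines: Seq[String]): Seq[Bank] =
    lines.map(_.split(";"))
      .filter(_.size > 5)                 // Shim's fix: drop rows with too few fields
      .filter(s => s(0) != "\"age\"")     // drop the header row
      .map(s => Bank(s(0).toInt,
                     s(1).replaceAll("\"", ""),
                     s(2).replaceAll("\"", ""),
                     s(3).replaceAll("\"", ""),
                     s(5).replaceAll("\"", "").toInt))

  def main(args: Array[String]): Unit = {
    val lines = Seq(
      "\"age\";\"job\";\"marital\";\"education\";\"default\";\"balance\"", // header
      "58;\"management\";\"married\";\"tertiary\";\"no\";2143",            // complete row
      "",                  // blank line: splits to 1 field
      "44;\"technician\""  // truncated row: 2 fields, s(2) would throw AIOOBE: 2
    )
    println(parse(lines)) // only the complete row survives the filters
  }
}
```

Note that the truncated sample row fails at exactly index 2, matching the `ArrayIndexOutOfBoundsException: 2` in the report above, which suggests at least one row in the file has only two fields.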