Re: splitting columns into new columns

2017-07-17 Thread ayan guha
Hi, please use explode, which is designed to solve exactly your problem. Consider the example below:
>>> s = ["ERN~58XX7~^EPN~5X551~|1000"]
>>> df = sc.parallelize(s).map(lambda t: t.split('|')).toDF(['phone','id'])
>>> df.registerTempTable("t")
>>> resDF = sqlContext.sql("select id,explode(phone)
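For readers without a Spark session at hand, the effect of combining a split on ^ with explode can be sketched in plain Python. This is an illustration only, not the Spark API: `explode_rows` is a hypothetical helper, and the input record is the one from the message above. Note that this produces one row per value, which is not the same as the one-column-per-value layout asked for in the original question.

```python
def explode_rows(rows, column, sep="^"):
    """Return one output row per `sep`-separated value found in `column`.

    Hypothetical helper that mimics the row-multiplying effect of
    explode applied to a split column; it is not part of Spark.
    """
    exploded = []
    for row in rows:
        for part in row[column].split(sep):
            new_row = dict(row)      # copy so the input rows stay untouched
            new_row[column] = part   # replace the packed value with a single part
            exploded.append(new_row)
    return exploded


# Same sample record as in the message: fields separated by '|'.
record = "ERN~58XX7~^EPN~5X551~|1000"
phone, record_id = record.split("|")
long_rows = explode_rows([{"phone": phone, "id": record_id}], "phone")
```

Here `long_rows` holds two rows that share the same id, one per phone value.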

Re: splitting columns into new columns

2017-07-17 Thread nayan sharma
Hi Pralabh, thanks for your help.
val xx = columnList.map(x => x->0).toMap
val opMap = dataFrame.rdd.flatMap { row =>
  columnList.foldLeft(xx) { case (y, col) =>
    val s = row.getAs[String](col).split("\\^").length
    if (y(col) < s) y.updated(col, s) else y
  }.toList
}
val colMaxSizeMap =

Re: splitting columns into new columns

2017-07-17 Thread Pralabh Kumar
Hi Nayan, please find below a solution to your problem that works on Spark 2.
val spark = SparkSession.builder().appName("practice").enableHiveSupport().getOrCreate()
val sc = spark.sparkContext
val sqlContext = spark.sqlContext
import spark.implicits._
val dataFrame =

Re: splitting columns into new columns

2017-07-17 Thread nayan sharma
If I have 2-3 values in a column, I can easily separate them and create new columns with the withColumn option. But I am trying to do this in a loop, dynamically generating as many new columns as there are ^-separated values in the column. Can it be achieved this way? > On 17-Jul-2017, at
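The idea of generating the columns dynamically can be sketched in plain Python, independent of Spark: first find the largest number of ^-separated values per column, then emit that many numbered columns and pad shorter rows. This is a minimal sketch under stated assumptions, not the solution posted in the thread. `split_into_columns` is a hypothetical helper, rows are modelled as dicts of strings, and columns that are not listed are dropped from the result.

```python
def split_into_columns(rows, columns, sep="^"):
    """Spread `sep`-separated values of each listed column over numbered columns.

    Hypothetical helper: rows are plain dicts; missing parts become None.
    """
    # Pass 1: the widest split per column decides how many new columns it gets.
    widths = {
        col: max((len(row[col].split(sep)) for row in rows), default=0)
        for col in columns
    }
    # Pass 2: build phone1, phone2, ... for every row, padding short rows.
    result = []
    for row in rows:
        out = {}
        for col in columns:
            parts = row[col].split(sep)
            for i in range(widths[col]):
                out[f"{col}{i + 1}"] = parts[i] if i < len(parts) else None
        result.append(out)
    return result


# Sample row taken from the original question.
rows = [{"phone": "ERN~58XX7~^EPN~5X551~", "contact": "C~MXXX~MSO~^CAxxE~~3XXX5"}]
wide = split_into_columns(rows, ["phone", "contact"])
```

In Spark the same two passes would map to an aggregation that finds the maximum split length per column, followed by a loop that adds one column per index.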

Re: splitting columns into new columns

2017-07-16 Thread ayan guha
You are looking for the explode function. On Mon, 17 Jul 2017 at 4:25 am, nayan sharma wrote: > I’ve a Dataframe where in some columns there are multiple values, always > separated by ^ > > phone|contact| > ERN~58XX7~^EPN~5X551~|C~MXXX~MSO~^CAxxE~~3XXX5| > >

splitting columns into new columns

2017-07-16 Thread nayan sharma
I have a DataFrame in which some columns contain multiple values, always separated by ^:

phone|contact|
ERN~58XX7~^EPN~5X551~|C~MXXX~MSO~^CAxxE~~3XXX5|

I want to split them into new columns:

phone1|phone2|contact1|contact2|
ERN~5XXX7|EPN~5891551~|C~MXXXH~MSO~|CAxxE~~3XXX5|

How can this be achieved using loop