Have you tried examining what clean_cols actually contains? I'm suspicious of this part: mkString(", "). Try giving it an explicit type annotation: val clean_cols : Seq[String] = df.columns...
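Here's a small plain-Scala sketch of what I mean (no Spark needed; the column names are made up for illustration). The misplaced mkString collapses the whole Seq into one String, which Spark then treats as a single column name:

```scala
object CleanColsDemo extends App {
  // Hypothetical column names standing in for df.columns
  val columns = Array("asd_dt", "STATE_cd", "industry_area", "STATE_nm")

  // Wrong: mkString turns the filtered Seq into ONE comma-joined String,
  // so select() later looks for a single column literally named
  // "asd_dt, industry_area" -- hence the AnalysisException.
  val wrong: String = columns.filterNot(_.startsWith("STATE_")).mkString(", ")
  println(wrong)    // asd_dt, industry_area

  // Right: keep it as a Seq[String]...
  val cleanCols: Seq[String] = columns.filterNot(_.startsWith("STATE_")).toSeq
  println(cleanCols)

  // ...and in Spark, splat it into the varargs overload of select:
  //   df.select(cleanCols.head, cleanCols.tail: _*)
}
```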
If you get a type error, you need to work on clean_cols (I suspect yours is of type String at the moment, so it presents itself to Spark as a single column name with commas embedded). Not sure why the .drop call hangs, but in either case drop returns a new DataFrame -- it's not a setter call.

On Thu, Jul 16, 2015 at 10:57 AM, <saif.a.ell...@wellsfargo.com> wrote:
> Hi,
>
> In a hundred-column dataframe, I wish to either *select all of them
> except* or *drop the ones I don't want*.
>
> I am failing at this simple task. I tried two ways.
>
> val clean_cols = df.columns.filterNot(col_name =>
> col_name.startWith("STATE_").mkString(", ")
> df.select(clean_cols)
>
> But this throws an exception:
>
> org.apache.spark.sql.AnalysisException: cannot resolve 'asd_dt,
> industry_area,...'
> at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:63)
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
> at org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
> at org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
>
> The other thing I tried is
>
> df.columns.filter(col_name => col_name.startWith("STATE_")
> for (col <- cols) df.drop(col)
>
> But this other thing doesn't do anything or hangs up.
>
> Saif
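To illustrate the drop point from above: since drop returns a new DataFrame rather than mutating the receiver, the for-loop version throws every result away. Here's a sketch with a tiny immutable stand-in class (Frame is made up for illustration, not a Spark type) showing the broken loop and the foldLeft fix, which has the same shape against a real Spark DataFrame:

```scala
// A minimal immutable stand-in for DataFrame, just enough to show
// why `for (col <- cols) df.drop(col)` does nothing.
case class Frame(columns: Seq[String]) {
  // Like DataFrame.drop: returns a NEW Frame, never mutates this one.
  def drop(name: String): Frame = Frame(columns.filterNot(_ == name))
}

object DropDemo extends App {
  val df   = Frame(Seq("asd_dt", "STATE_cd", "industry_area"))
  val cols = df.columns.filter(_.startsWith("STATE_"))

  // Broken: each drop's return value is discarded; df is unchanged.
  for (c <- cols) df.drop(c)
  println(df.columns)      // STATE_cd is still there

  // Fixed: thread the accumulated result through with foldLeft.
  val cleaned = cols.foldLeft(df)((acc, c) => acc.drop(c))
  println(cleaned.columns) // STATE_cd is gone

  // Against Spark it would look identical:
  //   val cleaned = cols.foldLeft(df)((acc, c) => acc.drop(c))
}
```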