Have you tried examining what clean_cols actually contains? I'm suspicious of
this part: mkString(", ").
Try this:
val clean_cols : Seq[String] = df.columns...

If you get a type error there, you need to work on clean_cols (I suspect yours
is of type String at the moment, so it presents itself to Spark as a single
column name with commas embedded).
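
Something like this should do it (an untested sketch, assuming Spark 1.4+ and
that you want to keep every column whose name does not start with "STATE_"):

  // keep the filtered names as a Seq[String] -- no mkString involved
  val clean_cols: Seq[String] = df.columns.filterNot(_.startsWith("STATE_"))
  // select takes a first column name plus varargs for the rest
  val cleaned = df.select(clean_cols.head, clean_cols.tail: _*)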

Not sure why the .drop call hangs, but in either case drop returns a new
DataFrame -- it's not a setter call, so a for loop that ignores the return
value leaves df untouched.
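
If you prefer the drop route, keep the result of each call, for example by
folding over the unwanted column names (again a rough, untested sketch):

  // drop returns a new DataFrame each time, so thread it through the fold
  val state_cols = df.columns.filter(_.startsWith("STATE_"))
  val cleaned = state_cols.foldLeft(df)((acc, c) => acc.drop(c))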

On Thu, Jul 16, 2015 at 10:57 AM, <saif.a.ell...@wellsfargo.com> wrote:

>  Hi,
>
> In a hundred-column dataframe, I wish to either *select all of them
> except some* or *drop the ones I don't want*.
>
> I am failing at such a simple task; I tried two ways.
>
> val clean_cols = df.columns.filterNot(col_name =>
> col_name.startWith("STATE_").mkString(", ")
> df.select(clean_cols)
>
> But this throws exception:
> org.apache.spark.sql.AnalysisException: cannot resolve 'asd_dt,
> industry_area,...'
> at
> org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
> at
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:63)
> at
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:52)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:286)
> at
> org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:51)
> at
> org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:285)
> at 
> org.apache.spark.sql.catalyst.plans.QueryPlan.org$apache$spark$sql$catalyst$plans$QueryPlan$$transformExpressionUp$1(QueryPlan.scala:108)
> at
> org.apache.spark.sql.catalyst.plans.QueryPlan$$anonfun$2$$anonfun$apply$2.apply(QueryPlan.scala:123)
>
> The other thing I tried is
>
> df.columns.filter(col_name => col_name.startWith("STATE_")
> for (col <- cols) df.drop(col)
>
> But this other thing doesn’t do anything or hangs up.
>
> Saif
>
