Hi folks, I'm having some seemingly noob issues with the DataFrame API.

I have a DataFrame which came from the csv package.

1. What would be an easy way to cast a column to a given type? My DataFrame
columns are all typed as strings, since they came from a csv. I see a schema
getter on DataFrame but no setter.
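I'm guessing the intent is to transform the column rather than mutate the schema in place. Something like this is what I'm after (a sketch assuming `Column` has a `cast` method, and that `withColumn` can replace an existing column by name):

```scala
import org.apache.spark.sql.functions.col

// Cast the string column to an integer type and replace the
// original column under the same name.
val typed = df.withColumn("customer_id", col("customer_id").cast("int"))
typed.printSchema()  // customer_id should now be IntegerType
```

Is this the recommended pattern, or is there a bulk way to apply a typed schema after the fact?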

2. I am trying to use the syntax from various blog posts, but I can't
figure out how to reference a column by name:

scala> df.filter("customer_id"!="")
<console>:23: error: overloaded method value filter with alternatives:
  (conditionExpr: String)org.apache.spark.sql.DataFrame <and>
  (condition: org.apache.spark.sql.Column)org.apache.spark.sql.DataFrame
 cannot be applied to (Boolean)
              df.filter("customer_id"!="")
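Reading the error, `filter` wants either a `Column` or an expression `String`, not a plain Scala `Boolean` (which is what `"customer_id" != ""` evaluates to). I'm guessing one of these forms is what's needed (sketches only; I haven't confirmed which operators my Spark version supports):

```scala
// Build a Column predicate: df("name") selects the column, and
// !== is the Column-level inequality operator.
df.filter(df("customer_id") !== "")

// Or pass the whole predicate as a SQL expression string:
df.filter("customer_id != ''")
```

Is one of these preferred over the other?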

3. What would be the recommended way to drop a row containing a null value?
Is it possible to do something like this:
scala> df.filter("customer_id" IS NOT NULL)
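A bare `IS NOT NULL` like that won't compile as Scala, but I'm guessing the same idea works as an expression string or via a `Column` method (sketches, assuming an `isNotNull` method on `Column` and an `na` accessor on DataFrame):

```scala
// SQL-style predicate passed as a string:
df.filter("customer_id IS NOT NULL")

// Column-method equivalent:
df.filter(df("customer_id").isNotNull)

// Or, to drop any row containing a null in any column:
df.na.drop()
```

Is `na.drop()` the idiomatic route, or is filtering per-column preferred?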
