I'm just getting started with Spark SQL and DataFrames in 1.3.0.

I notice that the Spark API docs show a different syntax for referencing
columns in a DataFrame than the Spark SQL Programming Guide does.

For instance, the API docs for the select method show this:
df.select($"colA", $"colB")


Whereas the programming guide shows this:
df.filter(df("name") > 21).show()

I tested both, and the $"column" and df("column") syntaxes each work, but
I'm wondering which is *preferred*.  Is one the original and the other a
newer feature we should be using?
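For reference, here's a minimal sketch of the two side by side as I understand them in the 1.3.0 Scala shell (the DataFrame df and its columns "name" and "age" are hypothetical; the $ syntax assumes the implicits import is in scope):

```scala
// Assumes a SQLContext named sqlContext and a DataFrame df
// with columns "name" and "age" (illustrative names only).
import sqlContext.implicits._   // enables the $"col" shorthand

// $-syntax: concise shorthand brought in by the implicits import
df.select($"name", $"age").show()

// apply-syntax: df("col") needs no import, and makes explicit
// which DataFrame a column belongs to (useful in joins)
df.filter(df("age") > 21).show()
```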

Thanks,
Diana
(Spark Curriculum Developer for Cloudera)
