Hi,
I am trying to build the example code given at
https://spark.apache.org/docs/latest/sql-programming-guide.html#interoperating-with-rdds
The code is:

// Import factory methods provided by DataType.
import org.apache.spark.sql.types.DataType;
// Import StructType and StructField
import org.apache.spark.sql.types.StructType;
import org.apache.spark.sql.types.StructField;
// Import Row.
import org.apache.spark.sql.Row;

// sc is an existing JavaSparkContext.
SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

// Load a text file and convert each line to a JavaBean.
JavaRDD<String> people = sc.textFile("examples/src/main/resources/people.txt");

// The schema is encoded in a string
String schemaString = "name age";

// Generate the schema based on the string of schema
List<StructField> fields = new ArrayList<StructField>();
for (String fieldName : schemaString.split(" ")) {
  fields.add(DataType.createStructField(fieldName, DataType.StringType, true));
}
StructType schema = DataType.createStructType(fields);

// Convert records of the RDD (people) to Rows.
JavaRDD<Row> rowRDD = people.map(
  new Function<String, Row>() {
    public Row call(String record) throws Exception {
      String[] fields = record.split(",");
      return Row.create(fields[0], fields[1].trim());
    }
  });

// Apply the schema to the RDD.
DataFrame peopleDataFrame = sqlContext.createDataFrame(rowRDD, schema);

// Register the DataFrame as a table.
peopleDataFrame.registerTempTable("people");

// SQL can be run over RDDs that have been registered as tables.
DataFrame results = sqlContext.sql("SELECT name FROM people");

// The results of SQL queries are DataFrames and support all the normal RDD operations.
// The columns of a row in the result can be accessed by ordinal.
List<String> names = results.map(new Function<Row, String>() {
  public String call(Row row) {
    return "Name: " + row.getString(0);
  }
}).collect();
My pom file looks like:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<version>0.94.0</version>
</dependency>
</dependencies>
When I run mvn package I get this error:
cannot find symbol
[ERROR] symbol: variable StringType
[ERROR] location: class org.apache.spark.sql.types.DataType
I have gone through
https://spark.apache.org/docs/latest/api/java/org/apache/spark/sql/types/StringType.html
What is missing here?
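From the linked Javadoc it looks as if, in the Spark 1.3 Java API, the static StringType field and the createStruct* factory methods live on the DataTypes class rather than on DataType. A minimal sketch of just the schema-building step under that assumption (SchemaTest is only a scratch class name of mine) would be:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class SchemaTest {
  public static void main(String[] args) {
    // Build the schema via the DataTypes factory class (Spark 1.3+);
    // DataType itself no longer exposes StringType or createStructField.
    String schemaString = "name age";
    List<StructField> fields = new ArrayList<StructField>();
    for (String fieldName : schemaString.split(" ")) {
      fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
    }
    StructType schema = DataTypes.createStructType(fields);
    System.out.println(schema);
  }
}
```

Is switching to DataTypes (and, I assume, to org.apache.spark.sql.RowFactory.create in place of Row.create) the right fix, or is something else missing from my pom?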