t(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF()
> x.registerTempTable('x')
> y.registerTempTable('y')
> sqlContext.sql("select y.v, x.v FROM x INNER JOIN y ON x.k=y.k").collect()
>
> Out[1]: [Row(v=u'Ruud', v=u'Evert')]
>
> On
Am I overlooking something? This doesn't seem right:
x = sc.parallelize([dict(k=1, v="Evert"), dict(k=2, v="Erik")]).toDF()
y = sc.parallelize([dict(k=1, v="Ruud"), dict(k=3, v="Vincent")]).toDF()
x.registerTempTable('x')
y.registerTempTable('y')
sqlContext.sql("select y.v, x.v FROM x INNER JOIN y ON x.k=y.k").collect()
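
For reference, here is a minimal sketch of what standard inner-join semantics give on this data. It uses Python's built-in sqlite3 module in place of Spark SQL, so it only illustrates the expected result and says nothing about Spark's behaviour. The aliases y_v and x_v are hypothetical names, added so that the two v columns can be told apart in the output.

```python
import sqlite3

# In-memory SQLite database standing in for the Spark temp tables x and y.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (k INTEGER, v TEXT)")
conn.execute("CREATE TABLE y (k INTEGER, v TEXT)")
conn.executemany("INSERT INTO x VALUES (?, ?)", [(1, "Evert"), (2, "Erik")])
conn.executemany("INSERT INTO y VALUES (?, ?)", [(1, "Ruud"), (3, "Vincent")])

# Alias the two v columns (y_v and x_v are hypothetical names) so each value
# can be traced back to the table it came from.
rows = conn.execute(
    "SELECT y.v AS y_v, x.v AS x_v FROM x INNER JOIN y ON x.k = y.k"
).fetchall()
print(rows)  # [('Ruud', 'Evert')]
conn.close()
```

Only k=1 exists in both tables, so a single row comes back, with the value from y first and the value from x second. The same "AS" aliasing should also work in the sqlContext.sql query, which makes it easier to see which column ended up where.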
Yes, you can, using a HiveContext, a metastore and the thriftserver. The
metastore persists information about your SchemaRDD, and a HiveContext
initialised with the metastore's connection details can read from and write
to it. The thriftserver provides JDBC connections backed by the same metastore.
Using MySQ