Thanks for the suggestion Ayan, it has not solved my problem but I did get sqlContext to execute the SQL and return dataframe object. SQL is running fine in the pyspark interpreter but not passing to SQL note (though it works fine for a different dataset) - guess I'll take this question to the Zeppelin list. Thanks again, more tips welcome if anyone sees anything funny...
------------ %pyspark from pyspark.sql.types import Row, StructType, StructField, IntegerType, StringType, DecimalType from os import getcwd sqlContext = SQLContext(sc) datafile = sc.textFile("/Users/mitty01/data/geonames/CA.txt") geonames = datafile.map(lambda s: s.split("\t")).map(lambda s: Row( geonameid=int(s[0]), asciiname=str(s[2]), latitude=float(s[4]), longitude=float(s[5]), elevation=str(s[16]), featureclass=str(s[6]), featurecode=str(s[7]), countrycode=str(s[8]) )) gndf = sqlContext.inferSchema(geonames) gndf.registerTempTable('geonames') #print gndf.count() print "-----------" print gndf.columns print "-----------" print gndf.first() print "-----------" gndf.schema print "-----------" sqlContext.sql("SELECT * FROM geonames LIMIT 10") ==== OUTPUT ==== ----------- [u'asciiname', u'countrycode', u'elevation', u'featureclass', u'featurecode', u'geonameid', u'latitude', u'longitude'] ----------- Row(asciiname=u'Swiftsure Bank', countrycode=u'CA', elevation=u'-9999', featureclass=u'U', featurecode=u'BNKU', geonameid=4030308, latitude=48.55321, longitude=-125.02235) ----------- StructType(List(StructField(asciiname,StringType,true),StructField(countrycode,StringType,true),StructField(elevation,StringType,true),StructField(featureclass,StringType,true),StructField(featurecode,StringType,true),StructField(geonameid,LongType,true),StructField(latitude,DoubleType,true),StructField(longitude,DoubleType,true))) ----------- DataFrame[asciiname: string, countrycode: string, elevation: string, featureclass: string, featurecode: string, geonameid: bigint, latitude: double, longitude: double] ================== %sql SELECT * FROM geonames LIMIT 1 no such table List(geonames); line 2 pos 5 ________________________________ From: ayan guha <guha.a...@gmail.com> Sent: May 11, 2015 12:27 AM To: Tyler Mitchell Cc: user Subject: Re: Python -> SQL (geonames dataset) Try this Res = ssc.sql("your SQL without limit") Print red.first() Note: your SQL looks wrong as count will need a group by clause. Best Ayan On 11 May 2015 16:22, "Tyler Mitchell" <tyler.mitch...@actian.com<mailto:tyler.mitch...@actian.com>> wrote: I'm using Python to setup a dataframe, but for some reason it is not being made available to SQL. Code (from Zeppelin) below. I don't get any error when loading/prepping the data or dataframe. Any tips? (Originally I was not hardcoding the Row() structure, as my other tutorial added it by default, not sure why it didn't work here, but that might be besides the point.) Any guesses greatly appreciated as I dig my teeth in here for the first time. Thanks! ------- %pyspark from pyspark.sql.types import Row, StructType, StructField, IntegerType, StringType, DecimalType from os import getcwd sqlContext = SQLContext(sc) datafile = sc.textFile("/Users/tyler/data/geonames/CA.txt") geonames = datafile.map(lambda s: s.split("\t")).map(lambda s: Row( geonameid=int(s[0]), asciiname=str(s[2]), latitude=float(s[4]), longitude=float(s[5]), elevation=str(s[16]), featureclass=str(s[6]), featurecode=str(s[7]), countrycode=str(s[8]) )) gndf = sqlContext.inferSchema(geonames) gndf.registerAsTable("geonames") #print gndf.count() print "-----------" print gndf.columns print "-----------" print gndf.first() print "-----------" gndf.schema ============ OUTPUT ============ [u'asciiname', u'countrycode', u'elevation', u'featureclass', u'featurecode', u'geonameid', u'latitude', u'longitude'] ----------- Row(asciiname=u'100 Mile House', countrycode=u'CA', elevation=u'928', featureclass=u'P', featurecode=u'PPL', geonameid=5881639, latitude=51.64982, longitude=-121.28594) ----------- StructType(List(StructField(asciiname,StringType,true),StructField(countrycode,StringType,true),StructField(elevation,StringType,true),StructField(featureclass,StringType,true),StructField(featurecode,StringType,true),StructField(geonameid,LongType,true),StructField(latitude,DoubleType,true),StructField(longitude,DoubleType,true))) ============= %sql SELECT geonameid, count(1) value FROM geonames LIMIT 1 no such table List(geonames); line 2 pos 5