I checked with Spark 1.6.1, and it still works fine. I also checked out the latest source code from the Spark 2.0 branch, built it, and got the same issue.
Could it be because of the API change to Dataset in Spark 2.0?

Regards,
Chanh

> On Aug 5, 2016, at 9:44 AM, Chanh Le <giaosu...@gmail.com> wrote:
>
> Hi Nicholas,
> Thanks for the information.
> How did you solve the issue?
> Did you change the Parquet files by renaming the column?
> I tried changing the column name when creating the table in Hive, without
> changing the Parquet files, but it still shows NULL.
> My Parquet files are quite big, so anything I can do without rewriting
> them would be better.
>
> Regards,
> Chanh.
>
>> On Aug 5, 2016, at 2:24 AM, Nicholas Hakobian
>> <nicholas.hakob...@rallyhealth.com> wrote:
>>
>> It's due to the casing of the 'I' in userId. Your schema (from printSchema)
>> names the field "userId", while your external table definition has it as
>> "userid".
>>
>> We've run into similar issues with external Parquet tables that were defined
>> in Hive with lowercase-only column names and accessed through HiveContext.
>> You should check out this documentation, as it describes how Spark handles
>> column definitions:
>> http://spark.apache.org/docs/latest/sql-programming-guide.html#hive-metastore-parquet-table-conversion
>>
>> Nicholas Szandor Hakobian, Ph.D.
>> Data Scientist
>> Rally Health
>> nicholas.hakob...@rallyhealth.com
>>
>> On Thu, Aug 4, 2016 at 4:53 AM, Chanh Le <giaosu...@gmail.com> wrote:
>> Hi Takeshi,
>> I have already changed the column type to INT and to String, but I got the
>> same NULL values.
>> It only happens with userid, which is why it is so annoying.
>>
>> Thanks and regards,
>> Chanh
>>
>> On Aug 4, 2016 5:59 PM, "Takeshi Yamamuro" <linguin....@gmail.com> wrote:
>> Hi,
>>
>> If you change the long type to int, does the issue still happen?
>> Also, could you show a simpler query that reproduces the issue?
>>
>> // maropu
>>
>> On Thu, Aug 4, 2016 at 7:35 PM, Chanh Le <giaosu...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> I have a Parquet file that contains data, but when I query it through Spark
>> Thrift Server (STS), userid shows NULL.
>> As you can see, I can read the data with Spark Scala, but not with STS.
>>
>> <Screen Shot 2016-08-04 at 5.32.04 PM.png>
>>
>> The file schema:
>> root
>>  |-- time: string (nullable = true)
>>  |-- topic_id: integer (nullable = true)
>>  |-- interest_id: integer (nullable = true)
>>  |-- inmarket_id: integer (nullable = true)
>>  |-- os_id: integer (nullable = true)
>>  |-- browser_id: integer (nullable = true)
>>  |-- device_type: integer (nullable = true)
>>  |-- device_id: integer (nullable = true)
>>  |-- location_id: integer (nullable = true)
>>  |-- age_id: integer (nullable = true)
>>  |-- gender_id: integer (nullable = true)
>>  |-- website_id: integer (nullable = true)
>>  |-- channel_id: integer (nullable = true)
>>  |-- section_id: integer (nullable = true)
>>  |-- zone_id: integer (nullable = true)
>>  |-- placement_id: integer (nullable = true)
>>  |-- advertiser_id: integer (nullable = true)
>>  |-- campaign_id: integer (nullable = true)
>>  |-- payment_id: integer (nullable = true)
>>  |-- creative_id: integer (nullable = true)
>>  |-- audience_id: integer (nullable = true)
>>  |-- merchant_cate: integer (nullable = true)
>>  |-- ad_default: integer (nullable = true)
>>  |-- userId: long (nullable = true)
>>  |-- impression: integer (nullable = true)
>>  |-- viewable: integer (nullable = true)
>>  |-- click: integer (nullable = true)
>>  |-- click_fraud: integer (nullable = true)
>>  |-- revenue: double (nullable = true)
>>  |-- proceeds: double (nullable = true)
>>  |-- spent: double (nullable = true)
>>  |-- network_id: integer (nullable = true)
>>
>> I created the table in Spark Thrift Server with:
>>
>> CREATE EXTERNAL TABLE ad_cookie_report (time String, advertiser_id int,
>> campaign_id int, payment_id int, creative_id int, website_id int, channel_id
>> int, section_id int, zone_id int, ad_default int, placement_id int, topic_id
>> int, interest_id int, inmarket_id int, audience_id int, os_id int,
>> browser_id int, device_type int, device_id int, location_id int, age_id int,
>> gender_id int, merchant_cate int, userid bigint, impression int, viewable
>> int, click int, click_fraud int, revenue double, proceeds double, spent
>> double, network_id integer)
>> STORED AS PARQUET LOCATION 'alluxio://master2:19998/AD_COOKIE_REPORT';
>>
>> But when I query it, I get only NULL values.
>>
>> 0: jdbc:hive2://master1:10000> select userid from ad_cookie_report limit 10;
>> +---------+--+
>> | userid  |
>> +---------+--+
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> | NULL    |
>> +---------+--+
>> 10 rows selected (3.507 seconds)
>>
>> How can I solve this problem? Is it related to the field name containing an
>> uppercase letter? How can I change the field name in this situation?
>>
>> Regards,
>> Chanh
>>
>> --
>> ---
>> Takeshi Yamamuro
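Nicholas's diagnosis (the table column is "userid" while the Parquet field is "userId") can be illustrated with a small, self-contained model. This is a simplified sketch written for this thread, not Spark's or Hive's actual column-resolution code; the function names `resolve_columns` and `case_only_mismatches` and the sample row are hypothetical. It assumes that the table sees the lowercase name `userid` (the Hive metastore stores column names in lowercase) and that a requested column not found in the file under a case-sensitive lookup is read as NULL.

```python
# Simplified, hypothetical model (NOT Spark's actual code) of why a Hive external
# table over Parquet can return NULL for a column whose name differs only in case.


def resolve_columns(table_columns, parquet_row, case_sensitive=True):
    """Look up each table column in one Parquet record (a dict of field -> value).

    A column that cannot be found comes back as None, roughly mirroring how a
    column requested by the table schema but absent from the file is read as NULL.
    """
    if case_sensitive:
        lookup = dict(parquet_row)
    else:
        # Assumption: no two Parquet fields collide once lowercased.
        lookup = {name.lower(): value for name, value in parquet_row.items()}
    result = {}
    for col in table_columns:
        key = col if case_sensitive else col.lower()
        result[col] = lookup.get(key)
    return result


def case_only_mismatches(table_columns, parquet_fields):
    """Return (table_column, parquet_field) pairs that differ only by letter case."""
    exact = set(parquet_fields)
    by_lower = {field.lower(): field for field in parquet_fields}
    pairs = []
    for col in table_columns:
        if col not in exact and col.lower() in by_lower:
            pairs.append((col, by_lower[col.lower()]))
    return pairs


if __name__ == "__main__":
    # The table (via the metastore) knows the column as lowercase "userid".
    table_columns = ["time", "userid", "revenue"]
    # Hypothetical record as written in the Parquet files, with camelCase "userId".
    row = {"time": "2016-08-04 17:00", "userId": 123456789, "revenue": 1.5}

    print(resolve_columns(table_columns, row))                        # userid is None
    print(resolve_columns(table_columns, row, case_sensitive=False))  # userid found
    print(case_only_mismatches(table_columns, list(row)))
```

Run against the full `printSchema` field list and the DDL column list from this thread, `case_only_mismatches` would flag `userid`/`userId` as the only case-only difference, since every other field name is already lowercase. The model only explains the symptom; the actual fix still has to happen in Spark or in the data files.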