Thanks, Owen. I tried running it from HDFS (instead of S3) and the problem is the same. Could you please share your hive-site.xml? Which environment variables and parameters should I check?
I would happily use Structor, but I need to use EMR for this project.

Thanks,
Oleg

On Thu, Oct 26, 2017 at 12:22 AM, Owen O'Malley wrote:
> I'm not sure. Using a virtual environment with Hortonworks' version
> (2.6.1) and HDFS instead of S3, it works:
>
> hive> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC
> LOCATION 'hdfs://nn.example.com/user/vagrant/country/';
> OK
> Time taken: 4.073 seconds
> hive> Select * from Table1;
> OK
> 1 Singapore
> 2 Malaysia
> 3 India
> 4 Hong Kong
> 5 Macau
> 6 Thailand
> 7 Indonesia
> 8 Philippines
> 9 Dubai
> 10 Vietnam
> Time taken: 0.76 seconds, Fetched: 10 row(s)
>
> If you want to create a virtual environment, you can use
> https://github.com/hortonworks/structor . You can use the
> 1node-nonsecure.profile unless you want multiple nodes or security.
>
> Based on that, it is either a problem with EMR or the binding to S3.
>
> .. Owen
>
> On Wed, Oct 25, 2017 at 12:04 AM, Oleg Ruchovets wrote:
>
>> Yes, that is exactly my point. Since the file has the data (the ORC file
>> is valid), why does Hive return NULLs?
>> I tested it on S3 and HDFS, with both the Hive CLI and Beeline. The
>> behavior is the same:
>>
>> select count(*) returns 10.
>> select * returns NULLs ...
>>
>> How can I debug this problem? Is there any configuration or logging I
>> should check? I am using the EMR defaults.
>>
>> Please advise.
>> Thanks, Oleg.
>>
>> On Wed, Oct 25, 2017 at 2:30 PM, Owen O'Malley wrote:
>>
>>> The file has the data. I'm not sure what Hive is doing wrong.
>>>
>>> owen@laptop> java -jar ../tools/target/orc-tools-1.5.0-SNAPSHOT-uber.jar data ~/Downloads/Country.orc
>>> Processing data file /Users/owen/Downloads/Country.orc [length: 392]
>>> {"Id":1,"Name":"Singapore"}
>>> {"Id":2,"Name":"Malaysia"}
>>> {"Id":3,"Name":"India"}
>>> {"Id":4,"Name":"Hong Kong"}
>>> {"Id":5,"Name":"Macau"}
>>> {"Id":6,"Name":"Thailand"}
>>> {"Id":7,"Name":"Indonesia"}
>>> {"Id":8,"Name":"Philippines"}
>>> {"Id":9,"Name":"Dubai"}
>>> {"Id":10,"Name":"Vietnam"}
>>>
>>> .. Owen
>>>
>>> On Tue, Oct 24, 2017 at 11:11 PM, Oleg Ruchovets wrote:
>>>
>>>> I am creating a Hive external table stored as ORC (the ORC file is
>>>> located on S3).
>>>>
>>>> *Command*
>>>>
>>>> CREATE EXTERNAL TABLE Table1 (Id INT, Name STRING) STORED AS ORC LOCATION
>>>> 's3://bucket_name'
>>>>
>>>> *After running the query*:
>>>>
>>>> Select * from Table1;
>>>>
>>>> *Result*:
>>>>
>>>> +-------------------------------------+---------------------------------------+
>>>> | Table1.id                           | Table1.name                           |
>>>> +-------------------------------------+---------------------------------------+
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> | NULL                                | NULL                                  |
>>>> +-------------------------------------+---------------------------------------+
>>>>
>>>> Interestingly, the number of returned records (10) is correct, but all
>>>> values are NULL. What is wrong? Why does the query return only NULLs? I am
>>>> using EMR instances on AWS. Do I need to configure or check anything for
>>>> Hive to support the ORC format?
>>>>
>>>> ORC file attached.
