Re: Hive select shows null after successful data load

Sunita Arvind Wed, 19 Jun 2013 12:30:38 -0700

Thanks Stephen,
Let me explore options. I will let you all know once I am successful.


regards
Sunita


On Wed, Jun 19, 2013 at 3:08 PM, Stephen Sprague <sprag...@gmail.com> wrote:

> try_parsed_json is not trivial imho :)
>
> start with the very, very basic, for example,  { "jobs" : "foo" }.  Get
> that to work first. :)   When that works add a level of nesting and see
> what happens.  Keep building on it until you either break it (and then you
> know that last thing you added broke it and can concentrate on that) or
> you'll have worked out all the bugs and your final example will work.
> Nothing fancy here except old school trial and error.
>
> An alternative I keep bringing up when native semantics don't go one's way
> is the transform() function.  use python, perl,  ruby or whatever to parse
> the json and go nuts with the rich features of said language.  just write
> your output to stdout as a delimited serialization of what you want to
> store and that's it.  That would be another way to get your scalars, arrays
> and structs to work.
>
> Don't give up yet though on the JsonSerde! :)  Its probably something very
> easy that we just can't see.
>
>
>
> On Wed, Jun 19, 2013 at 10:00 AM, Sunita Arvind <sunitarv...@gmail.com>wrote:
>
>> Thanks for looking into it Ramki.
>> Yes I had tried these options. Here is what I get (renamed the table to
>> have a meaningful name):
>>
>> hive> select jobs.values[1].id from linkedinjobsearch;
>> ......mapreduce task details....
>> OK
>> NULL
>> Time taken: 9.586 seconds
>>
>>
>> hive> select jobs.values[0].position.title from linkedinjobsearch;
>>  Total MapReduce jobs = 1
>> Launching Job 1 out of 1
>>
>> OK
>> NULL
>> Time taken: 9.617 seconds
>>
>>
>> I am trying to connect btrace to the process to be able to trace the code
>> but cant get it to respond. Here is what I tried:
>>
>> [sunita@node01 ~]$ hive --debug, recursive=y, port=7000,mainSuspend=y,
>> childSuspend=y
>> ERROR: Cannot load this JVM TI agent twice, check your java command line
>> for duplicate jdwp options.
>> Error occurred during initialization of VM
>> agent library failed to init: jdwp
>>
>> Tried changing the port also. Any idea regarding the debuggers that can
>> be used. I also tried explain query and that does not show any issues
>> either.
>>
>> regards
>> Sunita
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Jun 19, 2013 at 12:11 PM, Ramki Palle <ramki.pa...@gmail.com>wrote:
>>
>>> Can you run some other queries from job1 table and see if any query
>>> returns some data?
>>>
>>> I am guessing your query "select jobs.values.position.title from
>>> jobs1;"  may have some issue. May be it should be as
>>>
>>> select jobs.values[0].position.title from jobs1;
>>>
>>>
>>> Regards,
>>> Ramki.
>>>
>>>
>>> On Wed, Jun 19, 2013 at 8:24 AM, Sunita Arvind <sunitarv...@gmail.com>wrote:
>>>
>>>> Thanks Stephen,
>>>>
>>>> That's just what I tried with the try_parsed table. It is exactly same
>>>> data with lesser nesting in the structure and lesser number of entries.
>>>> Do you mean to say that highly nested jsons can lead to issues? What
>>>> are typical solution to such issues? Write UDFs in hive or parse the JSON
>>>> into a delimited file?
>>>> I have heard of custom serdes also. Not sure if UDFs and custom serdes
>>>> are one and the same.
>>>>
>>>> regards
>>>> Sunita
>>>>
>>>>
>>>> On Wed, Jun 19, 2013 at 10:38 AM, Stephen Sprague 
>>>> <sprag...@gmail.com>wrote:
>>>>
>>>>> I think you might have to start small here instead of going for the
>>>>> home run on the first swing.  when all else fails start with a trivial 
>>>>> json
>>>>> object and then build up from there and see what additional step breaks
>>>>> it.   that way you know if the trivial example fails is something
>>>>> fundamental and not the complexity of your json object that's throwing
>>>>> things off.
>>>>>
>>>>>
>>>>> On Wed, Jun 19, 2013 at 4:34 AM, Sunita Arvind 
>>>>> <sunitarv...@gmail.com>wrote:
>>>>>
>>>>>> Thanks for sharing your experience Richa.
>>>>>> I do have timestamps but in the format of year : INT, day : INT,
>>>>>> month : INT.
>>>>>> As per your suggestion, I changed them all to string, but still get
>>>>>> null as the output.
>>>>>>
>>>>>> regards
>>>>>> Sunita
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 19, 2013 at 2:17 AM, Richa Sharma <
>>>>>> mailtorichasha...@gmail.com> wrote:
>>>>>>
>>>>>>> Do you have any timestamp fields in the table that might contain
>>>>>>> null value ?
>>>>>>>
>>>>>>> I faced a similar situation sometime back - changing the data type
>>>>>>> to string made it work.
>>>>>>>
>>>>>>> But I was working on delimited text files.
>>>>>>> Not sure if it applies to JSON .. but its still worth giving a try !!
>>>>>>>
>>>>>>> Richa
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 19, 2013 at 7:28 AM, Sunita Arvind <
>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Having the a column name same as the table name, is a problem due
>>>>>>>> to which I was not able to reference jobs.values.id from jobs.
>>>>>>>> Changing the table name to jobs1 resolved the semantic error.
>>>>>>>> However, the query still returns null
>>>>>>>>
>>>>>>>> hive> select jobs.values.position.title from jobs1;
>>>>>>>> Total MapReduce jobs = 1
>>>>>>>> Launching Job 1 out of 1
>>>>>>>> Number of reduce tasks is set to 0 since there's no reduce operator
>>>>>>>> Starting Job = job_201306080116_0036, Tracking URL =
>>>>>>>> http://node01.expressanalytics.net:50030/jobdetails.jsp?jobid=job_201306080116_0036
>>>>>>>> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill
>>>>>>>> job_201306080116_0036
>>>>>>>> Hadoop job information for Stage-1: number of mappers: 1; number of
>>>>>>>> reducers: 0
>>>>>>>> 2013-06-18 18:55:52,381 Stage-1 map = 0%,  reduce = 0%
>>>>>>>> 2013-06-18 18:55:56,394 Stage-1 map = 100%,  reduce = 0%,
>>>>>>>> Cumulative CPU 0.88 sec
>>>>>>>> 2013-06-18 18:55:57,400 Stage-1 map = 100%,  reduce = 0%,
>>>>>>>> Cumulative CPU 0.88 sec
>>>>>>>> 2013-06-18 18:55:58,407 Stage-1 map = 100%,  reduce = 100%,
>>>>>>>> Cumulative CPU 0.88 sec
>>>>>>>> MapReduce Total cumulative CPU time: 880 msec
>>>>>>>> Ended Job = job_201306080116_0036
>>>>>>>> MapReduce Jobs Launched:
>>>>>>>> Job 0: Map: 1   Cumulative CPU: 0.88 sec   HDFS Read: 35374 HDFS
>>>>>>>> Write: 3 SUCCESS
>>>>>>>> Total MapReduce CPU Time Spent: 880 msec
>>>>>>>> OK
>>>>>>>> null
>>>>>>>> Time taken: 9.591 seconds
>>>>>>>>
>>>>>>>>
>>>>>>>> regards
>>>>>>>> Sunita
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jun 18, 2013 at 9:35 PM, Sunita Arvind <
>>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Ok.
>>>>>>>>> The data files are quite small. Around 35 KB and 1 KB each.
>>>>>>>>>
>>>>>>>>> [sunita@node01 tables]$  hadoop fs -ls /user/sunita/tables/jobs
>>>>>>>>> Found 1 items
>>>>>>>>> -rw-r--r--   3 sunita hdfs      35172 2013-06-18 18:31
>>>>>>>>> /user/sunita/tables/jobs/jobs_noSite_parsed.json
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> [sunita@node01 tables]$ hadoop fs -text
>>>>>>>>> /user/sunita/tables/jobs/jobs_noSite_parsed.json |more
>>>>>>>>> {"jobs": {"_total": 1812, "_count": 20, "_start": 0, "values":
>>>>>>>>> [{"company": {"i
>>>>>>>>> d": 21836, "name": "CyberCoders"}, "postingDate": {"year": 2013,
>>>>>>>>> "day": 10, "mo
>>>>>>>>> nth": 6}, "descriptionSnippet": "Software Engineer-Hadoop, HDFS,
>>>>>>>>> HBase, Pig- Ve
>>>>>>>>> rtica Analytics Senior Hadoop Engineer - Skills Required - Hadoop,
>>>>>>>>> HDFS, HBase,
>>>>>>>>>  Pig, SQL, Industrial Software Development, System Integration,
>>>>>>>>> Java, high perf
>>>>>>>>> ormance, multi-threading, VerticaWe are a well known consumer
>>>>>>>>> product developme
>>>>>>>>> nt company and we are looking to add a Hadoop Engineer to our
>>>>>>>>> Engineering team.
>>>>>>>>>   You will be working with the latest ", "expirationDate":
>>>>>>>>> {"year": 2013, "day"
>>>>>>>>>
>>>>>>>>> Its a single line, so used 'more' rather than 'head'. But
>>>>>>>>> effectively, the file exists and has the data.
>>>>>>>>>
>>>>>>>>> regards
>>>>>>>>> Sunita
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jun 18, 2013 at 8:38 PM, Stephen Sprague <
>>>>>>>>> sprag...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> As Nitin alluded to its best to confirm the data is definitely in
>>>>>>>>>> hdfs using hdfs semantics rather than hive for the first step.
>>>>>>>>>>
>>>>>>>>>> 1. how big is it?   hadoop fs -ls <your hdfs dir>
>>>>>>>>>> 2. cat a bit of it and see if anything is there.   hadoop fs
>>>>>>>>>> -text <your hdfs dir>/<filename> | head -10
>>>>>>>>>>
>>>>>>>>>> do you see any data from step #2?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jun 18, 2013 at 3:58 PM, Sunita Arvind <
>>>>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> I ran some complex queries. Something to the extent of
>>>>>>>>>>>                     select jobs from jobs;
>>>>>>>>>>>  which triggers map reduce jobs but does not show errors and
>>>>>>>>>>> produces the same output "null". If I try referencing the struct 
>>>>>>>>>>> elements,
>>>>>>>>>>> I get error which seems to be the root cause.
>>>>>>>>>>>
>>>>>>>>>>> Attached are the select statement outputs with the corresponding
>>>>>>>>>>> hive logs.
>>>>>>>>>>>
>>>>>>>>>>> I have also attached my usage details of another table -
>>>>>>>>>>> try_parsed which has a subset of the same data which seems to work 
>>>>>>>>>>> fine.
>>>>>>>>>>> Also attached is the input file for this table - try_parsed.json
>>>>>>>>>>> Thanks for your help
>>>>>>>>>>>
>>>>>>>>>>> Sunita
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jun 18, 2013 at 4:35 PM, Nitin Pawar <
>>>>>>>>>>> nitinpawar...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> can you run a little more complex query
>>>>>>>>>>>>
>>>>>>>>>>>> select uniq across columns or do some maths. so we know when it
>>>>>>>>>>>> fires up a mapreduce
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Wed, Jun 19, 2013 at 1:59 AM, Sunita Arvind <
>>>>>>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for responding Nitin. Yes I am sure that serde is
>>>>>>>>>>>>> working fine and json file is being picked based on all the 
>>>>>>>>>>>>> errors that
>>>>>>>>>>>>> showed up till this stage. What sort of error are you suspecting. 
>>>>>>>>>>>>> File not
>>>>>>>>>>>>> present or serde not parsing it ?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tuesday, June 18, 2013, Nitin Pawar wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> select * from table is as good as hdfs -cat
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> are you sure there is any data in the table?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jun 18, 2013 at 11:54 PM, Sunita Arvind <
>>>>>>>>>>>>>> sunitarv...@gmail.com> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am able to parse the input JSON file and load it into
>>>>>>>>>>>>>>> hive. I do not see any errors with create table, so I am 
>>>>>>>>>>>>>>> assuming that. But
>>>>>>>>>>>>>>> when I try to read the data, I get null
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> hive> select * from jobs;
>>>>>>>>>>>>>>> OK
>>>>>>>>>>>>>>> null
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have validated the JSON with JSONLint and Notepad++ JSON
>>>>>>>>>>>>>>> plugin and it is a valid JSON. Here is my create table
>>>>>>>>>>>>>>> statement and attached is the json input file.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> create external table jobs (
>>>>>>>>>>>>>>> jobs STRUCT<
>>>>>>>>>>>>>>> values : ARRAY<STRUCT<
>>>>>>>>>>>>>>> company : STRUCT<
>>>>>>>>>>>>>>> id : STRING,
>>>>>>>>>>>>>>> name : STRING>,
>>>>>>>>>>>>>>> postingDate : STRUCT<
>>>>>>>>>>>>>>> year : INT,
>>>>>>>>>>>>>>> day : INT,
>>>>>>>>>>>>>>> month : INT>,
>>>>>>>>>>>>>>> descriptionSnippet : STRING,
>>>>>>>>>>>>>>> expirationDate : STRUCT<
>>>>>>>>>>>>>>> year : INT,
>>>>>>>>>>>>>>> day : INT,
>>>>>>>>>>>>>>> month : INT>,
>>>>>>>>>>>>>>> position : STRUCT<
>>>>>>>>>>>>>>> title : STRING,
>>>>>>>>>>>>>>> jobFunctions : ARRAY<STRUCT<
>>>>>>>>>>>>>>> code : STRING,
>>>>>>>>>>>>>>> name : STRING>>,
>>>>>>>>>>>>>>> industries : ARRAY<STRUCT<
>>>>>>>>>>>>>>> code : STRING,
>>>>>>>>>>>>>>> id : STRING,
>>>>>>>>>>>>>>> name : STRING>>,
>>>>>>>>>>>>>>> jobType : STRUCT<
>>>>>>>>>>>>>>> code : STRING,
>>>>>>>>>>>>>>> name : STRING>,
>>>>>>>>>>>>>>> experienceLevel : STRUCT<
>>>>>>>>>>>>>>> code : STRING,
>>>>>>>>>>>>>>> name : STRING>>,
>>>>>>>>>>>>>>> id : STRING,
>>>>>>>>>>>>>>> customerJobCode : STRING,
>>>>>>>>>>>>>>> skillsAndExperience : STRING,
>>>>>>>>>>>>>>> salary : STRING,
>>>>>>>>>>>>>>> jobPoster : STRUCT<
>>>>>>>>>>>>>>> id : STRING,
>>>>>>>>>>>>>>> firstName : STRING,
>>>>>>>>>>>>>>> lastName : STRING,
>>>>>>>>>>>>>>> headline : STRING>,
>>>>>>>>>>>>>>> referralBonus : STRING,
>>>>>>>>>>>>>>> locationDescription : STRING>>>
>>>>>>>>>>>>>>>  )
>>>>>>>>>>>>>>> ROW FORMAT SERDE 'com.cloudera.hive.serde.JSONSerDe'
>>>>>>>>>>>>>>> LOCATION '/user/sunita/tables/jobs';
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The table creation works fine, but when I attempt to query,
>>>>>>>>>>>>>>> I get null as the result.
>>>>>>>>>>>>>>> I tried adding Input/Output formats, Serde Properties,
>>>>>>>>>>>>>>> nothing seems to impact.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am of the opinion that the libraries cannot handle this
>>>>>>>>>>>>>>> level of nesting and I probably will have to write a custom 
>>>>>>>>>>>>>>> serde or a
>>>>>>>>>>>>>>> parser myself. Just wanted to seek guidance before I get into 
>>>>>>>>>>>>>>> that.
>>>>>>>>>>>>>>> Appreciate your help and guidance.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> regards
>>>>>>>>>>>>>>> Sunita
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Nitin Pawar
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

Re: Hive select shows null after successful data load

Reply via email to