Thank you, Bejoy. Thank you very much :-).

Yes, my network setup was with proxy authentication. If I remove the proxy for
HTTP and HTTPS then it works, but then Internet access goes down.

I set a proxy bypass for the machine and it is working. :-)
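
In case it helps anyone else, here is roughly what I did; a minimal sketch
only, assuming the proxy is picked up through the standard JVM properties (the
host pattern and the hadoop-env.sh approach are just how I set it up on my
machines):

# hadoop-env.sh on every node: keep cluster-internal HTTP traffic
# (e.g. the shuffle fetches on port 50060) off the authenticated proxy.
# http.nonProxyHosts is the standard JVM property; the pattern below is
# only an example for my 10.203.33.x machines.
export HADOOP_OPTS="$HADOOP_OPTS -Dhttp.nonProxyHosts=localhost|127.0.0.1|10.203.33.*"

I restarted the daemons after the change, and the 407 errors from the shuffle
and task-log fetches are gone.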

hip hip hurray ;-)

Thanks a Lot :-)
Yogesh Kumar
________________________________
From: Bejoy Ks [bejoy...@yahoo.com]
Sent: Tuesday, July 24, 2012 4:18 PM
To: user@hive.apache.org
Subject: Re: Hive Query

Hi Yogesh

Did you try out the suggested changes?

1. Increase the value of tasktracker.http.threads (this has to be done at the
TT level, not at the job level; restart the TTs).
2. Increase the value of mapred.reduce.parallel.copies.

You need to add the new values for these properties in mapred-site.xml, along
the lines of the snippet below, and restart all the TTs. (I'm not very sure
this will resolve your issue; it's just a suggestion.)
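
As a rough illustration only (the values here are made-up examples to tune for
your cluster, not recommendations; the 0.20 defaults are 40 and 5
respectively):

<!-- mapred-site.xml: illustrative values only -->
<property>
  <!-- HTTP worker threads on each TT serving shuffle map-output fetches -->
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>
<property>
  <!-- parallel copier threads each reduce task uses to pull map outputs -->
  <name>mapred.reduce.parallel.copies</name>
  <value>20</value>
</property>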

Also, as Nitin suggested, please work on the proxy authentication as well.


Regards
Bejoy KS

________________________________
From: "yogesh.kuma...@wipro.com" <yogesh.kuma...@wipro.com>
To: user@hive.apache.org; bejoy...@yahoo.com
Sent: Tuesday, July 24, 2012 3:54 PM
Subject: RE: Hive Query

Hello Bejoy,

I have checked the logs of the failed tasks on the TT web interface. Here is
what they show:

2012-07-24 15:38:45,415 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: 
Initializing JVM Metrics with processName=SHUFFLE, sessionId=
2012-07-24 15:38:45,554 INFO org.apache.hadoop.mapred.ReduceTask: 
ShuffleRamManager: MemoryLimit=144965632, MaxSingleShuffleLimit=36241408
2012-07-24 15:38:45,562 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Thread started: Thread for merging in 
memory files
2012-07-24 15:38:45,562 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Thread started: Thread for merging on-disk 
files
2012-07-24 15:38:45,563 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Thread waiting: Thread for merging on-disk 
files
2012-07-24 15:38:45,564 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Need another 1 map output(s) where 0 is 
already in progress
2012-07-24 15:38:45,564 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Thread started: Thread for polling Map 
Completion Events
2012-07-24 15:38:45,565 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Scheduled 0 outputs (0 slow hosts and0 dup 
hosts)
2012-07-24 15:38:45,569 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3: Got 1 new map-outputs
2012-07-24 15:38:50,565 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Scheduled 1 outputs (0 slow hosts and0 dup 
hosts)
2012-07-24 15:38:50,632 WARN org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 copy failed: 
attempt_201207241536_0002_m_000000_0 from 10.203.33.81
2012-07-24 15:38:50,634 WARN org.apache.hadoop.mapred.ReduceTask: 
java.io.IOException: Server returned HTTP response code: 407 for URL: 
http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for 
URL: 
http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    ... 4 more

2012-07-24 15:38:50,635 INFO org.apache.hadoop.mapred.ReduceTask: Task 
attempt_201207241536_0002_r_000000_3: Failed fetch #1 from 
attempt_201207241536_0002_m_000000_0
2012-07-24 15:38:50,635 WARN org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 adding host 10.203.33.81 to penalty box, 
next contact in 4 seconds
2012-07-24 15:38:50,635 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3: Got 1 map-outputs from previous failures
2012-07-24 15:38:55,635 INFO org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 Scheduled 1 outputs (0 slow hosts and0 dup 
hosts)
2012-07-24 15:38:55,689 WARN org.apache.hadoop.mapred.ReduceTask: 
attempt_201207241536_0002_r_000000_3 copy failed: 
attempt_201207241536_0002_m_000000_0 from 10.203.33.81
2012-07-24 15:38:55,689 WARN org.apache.hadoop.mapred.ReduceTask: 
java.io.IOException: Server returned HTTP response code: 407 for URL: 
http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at 
sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at 
sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for 
URL: 
http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    ... 4 more

2012-07-24 15:38:55,690 INFO org.apache.hadoop.mapred.ReduceTask: Task 
attempt_201207241536_0002_r_000000_3: Failed fetch #2 from 
attempt_201207241536_0002_m_000000_0
2012-07-24 15:38:55,690 INFO org.apache.hadoop.mapred.ReduceTask: Failed to 
fetch map-output from attempt_201207241536_0002_m_000000_0 even after 
MAX_FETCH_RETRIES_PER_MAP retries...  reporting to the JobTracker
2012-07-24 15:38:55,690 FATAL org.apache.hadoop.mapred.ReduceTask: Shuffle 
failed with too many fetch failures and insufficient progress!Killing task 
attempt_201207241536_0002_r_000000_3.

Please have a look and help.

Thanks & Regards
Yogesh Kumar

________________________________
From: Bejoy Ks [bejoy...@yahoo.com]
Sent: Tuesday, July 24, 2012 3:10 PM
To: user@hive.apache.org
Subject: Re: Hive Query

Hi Yogesh

I'm not exactly sure of the real root cause of the error. From the error log
and the nature of the occurrence, I suspect it happens when the reduce task is
not able to reach the map task's node to fetch the map output; something close
to fetch failures. Can you try out the following (and the quick check further
below) and see whether it makes some difference?
1. Increase the value of tasktracker.http.threads (this has to be done at the
TT level, not at the job level; restart the TTs).
2. Increase the value of mapred.reduce.parallel.copies.
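
Also, as a quick sanity check (assuming curl is available on that box; the URL
is copied from your earlier log), you could fetch the task log by hand and
confirm that the 407 is coming from the proxy rather than from the TT itself:

# With your proxy settings in the environment, this should reproduce the 407:
curl -v "http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193"

# Bypassing the proxy for this request should return the log page instead:
curl -v --noproxy '*' "http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193"

If the second command succeeds while the first returns 407, the proxy is
intercepting the cluster's internal HTTP traffic.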


As for the query, I just tested it out in my local environment and it works
fine, returning the desired output. So the root cause at your end looks like
some Hadoop misconfiguration, since the failures are all in the MapReduce jobs
themselves.

Regards
Bejoy KS


________________________________
From: "yogesh.kuma...@wipro.com" <yogesh.kuma...@wipro.com>
To: user@hive.apache.org; bejoy...@yahoo.com
Sent: Tuesday, July 24, 2012 2:56 PM
Subject: RE: Hive Query

Thanks Bejoy :-)

I have an error issue with:

select count(*) from table;

It throws this error:

2012-07-24 13:39:25,181 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201207231123_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207231123_0011_m_000002 (and more) from job 
job_201207231123_0011
Exception in thread "Thread-93" java.lang.RuntimeException: Error while reading 
from task log url
    at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at 
org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for 
URL: 
http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193
    at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   HDFS Read: 24 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec



And when I run this query:

SELECT count(*),sub.name FROM (Select * FROM sitealias JOIN site ON 
(sitealias.site_id = site.site_id) ) sub GROUP BY sub.name;

it seems to hang: the map phase completes, but the reduce phase stays at 0%
and the MapReduce job keeps running.

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201207231123_0018, Tracking URL = 
http://localhost:50030/jobdetails.jsp?jobid=job_201207231123_0018
Kill Command = /HADOOP/hadoop-0.20.2/bin/../bin/hadoop job  
-Dmapred.job.tracker=localhost:9001 -kill job_201207231123_0018
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2012-07-24 14:42:03,824 Stage-1 map = 0%,  reduce = 0%
2012-07-24 14:42:09,850 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:43:10,030 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:44:10,177 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:45:10,358 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:46:10,516 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:47:10,672 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:48:10,882 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:49:11,016 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:50:11,152 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:51:11,409 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:52:11,550 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:53:11,679 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:54:11,807 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:55:11,935 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:56:12,060 Stage-1 map = 100%,  reduce = 0%


It has been like this for the past 10 minutes and is still going...


Please suggest and help.

Thanks & Regards
Yogesh Kumar

________________________________
From: Bejoy Ks [bejoy...@yahoo.com]
Sent: Tuesday, July 24, 2012 2:33 PM
To: user@hive.apache.org
Subject: Re: Hive Query

Hi Yogesh

Try out this query; it should work, though it is a little expensive (it runs
as two MapReduce jobs: one for the join and one for the group by):

SELECT count(*),sub.name FROM (Select * FROM sitealias JOIN site ON 
(sitealias.site_id = site.site_id) ) sub GROUP BY sub.name;


Regards
Bejoy KS

________________________________
From: "yogesh.kuma...@wipro.com" <yogesh.kuma...@wipro.com>
To: user@hive.apache.org; bejoy...@yahoo.com
Sent: Tuesday, July 24, 2012 1:39 PM
Subject: RE: Hive Query

Hi Bejoy,

even if I perform a count(*) operation on the table it shows an error:

select count(*) from dummysite;


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201207231123_0011, Tracking URL = 
http://localhost:50030/jobdetails.jsp?jobid=job_201207231123_0011
Kill Command = /HADOOP/hadoop-0.20.2/bin/../bin/hadoop job  
-Dmapred.job.tracker=localhost:9001 -kill job_201207231123_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-07-24 13:38:18,928 Stage-1 map = 0%,  reduce = 0%
2012-07-24 13:38:21,938 Stage-1 map = 100%,  reduce = 0%
2012-07-24 13:39:22,170 Stage-1 map = 100%,  reduce = 0%
2012-07-24 13:39:25,181 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201207231123_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207231123_0011_m_000002 (and more) from job 
job_201207231123_0011
Exception in thread "Thread-93" java.lang.RuntimeException: Error while reading 
from task log url
    at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at 
org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for 
URL: 
http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193
    at 
sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at 
org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more
FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   HDFS Read: 24 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


Please suggest why this error is coming :-(

Regards
Yogesh Kumar

________________________________
From: Bejoy KS [bejoy...@yahoo.com]
Sent: Tuesday, July 24, 2012 12:52 PM
To: user@hive.apache.org
Subject: Re: Hive Query


Hi Yogesh

Can you try out this?

select count(*), site.name from sitealias join site on
(sitealias.site_id=site.site_id) group by site.name;

Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: <yogesh.kuma...@wipro.com>
Date: Tue, 24 Jul 2012 07:14:25 +0000
To: <user@hive.apache.org>
ReplyTo: user@hive.apache.org
Subject: Hive Query

Hi all,

I have two tables
1) sitealias
2) site


sitealias contains
---------------------
id        site_id
---------------------
1         15
2         12
3         12
4         15
---------------------

site contains
---------------------
site_id   name
---------------------
12        google
13        wiki
14        yahoo
15        flipcart
---------------------



I am running a query to perform an equi-join and find out how many times each
site_id repeats, along with the site's name, grouped by site_id.

The result I want from the query:

---------------------
count     name
---------------------
2         google
2         flipcart
---------------------


I performed
select sitealias.count(*), site.name from sitealias join site on 
(site_alias.site_id=site.site_id);

It shows this error: Parse Error: mismatched input '(' expecting FROM near
'count' in from clause.


Please help and suggest a query for this kind of operation.


Thanks & Regards
Yogesh Kumar