Hi

Since you are on a pseudo-distributed (single-node) environment, Hadoop 
MapReduce parallelism is limited.

You probably have only a few map slots, so map tasks may sit in the queue 
waiting for others to complete. On a larger cluster your job should run faster.
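
As a rough illustration of that queueing effect, here is a back-of-the-envelope sketch, not a measurement. The 30 GB input size comes from your message; the 64 MB split size and the 2 map slots are assumptions based on common Hadoop 1.x defaults, and the helper name map_waves is made up for this example. The real task count also depends on how many files the tables are stored in.

```python
import math

def map_waves(input_bytes, split_bytes, map_slots):
    """Return (map_tasks, waves) for input cut into fixed-size splits."""
    # One map task per split (ignores per-file boundaries).
    map_tasks = math.ceil(input_bytes / split_bytes)
    # Only `map_slots` tasks run at once; the rest wait in the queue.
    waves = math.ceil(map_tasks / map_slots)
    return map_tasks, waves

GB = 1024 ** 3
MB = 1024 ** 2

# Assumed values: 30 GB input, 64 MB splits, 2 map slots on the single node.
tasks, waves = map_waves(30 * GB, 64 * MB, 2)
print(tasks, waves)
```

Under these assumptions the job needs several hundred map tasks but can run only two at a time. With more slots spread over more nodes, the number of waves drops roughly in proportion.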

Also, certain SQL queries that utilize indexes will run faster on a SQL 
database server than in Hive.

Regards 
Bejoy KS


-----Original Message-----
From: Gobinda Paul <gobi...@live.com>
Date: Tue, 12 Mar 2013 15:09:31 
To: user@hive.apache.org<user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Getting Slow Query Performance!

I used Sqoop to import 30 GB of data (two tables: employee, approx. 21 GB, and 
salary, approx. 9 GB) into Hadoop (single node) via Hive.

I ran a sample query like this:

SELECT EMPLOYEE.ID, EMPLOYEE.NAME, EMPLOYEE.DEPT, SALARY.AMOUNT
FROM EMPLOYEE JOIN SALARY
WHERE EMPLOYEE.ID = SALARY.EMPLOYEE_ID AND SALARY.AMOUNT > 900000;

In Hive it takes approx. 15 min, whereas MySQL takes approx. 4.5 min to 
execute the same query.

CPU: Pentium(R) Dual-Core CPU E5700 @ 3.00GHz
RAM: 2 GB
HDD: 500 GB

Here is my hive-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
  </property>
  <property>
    <name>hive.hwi.listen.host</name>
    <value>0.0.0.0</value>
    <description>This is the host address the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.listen.port</name>
    <value>9999</value>
    <description>This is the port the Hive Web Interface will listen on</description>
  </property>
  <property>
    <name>hive.hwi.war.file</name>
    <value>/lib/hive-hwi-0.9.0.war</value>
    <description>This is the WAR file with the jsp content for Hive Web Interface</description>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>-1</value>
    <description>The default number of reduce tasks per job.  Typically set to a
      prime close to the number of available hosts.  Ignored when
      mapred.job.tracker is "local". Hadoop set this to 1 by default, whereas hive
      uses -1 as its default value.  By setting this property to -1, Hive will
      automatically figure out what should be the number of reducers.
    </description>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>1000000000</value>
    <description>size per reducer. The default is 1G, i.e if the input size is
      10G, it will use 10 reducers.</description>
  </property>
  <property>
    <name>hive.exec.reducers.max</name>
    <value>999</value>
    <description>max number of reducers will be used. If the one specified in
      the configuration parameter mapred.reduce.tasks is negative, hive will use
      this one as the max number of reducers when automatically determine number
      of reducers.
    </description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
</configuration>

Any ideas?
