Re: how to let hive support lzo

2013-07-22 Thread bejoy_ks
Hi, Along with the mapred.compress* properties, try setting hive.exec.compress.output to true. Regards Bejoy KS
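A minimal sketch of the session settings this points to, assuming the hadoop-lzo libraries are already installed on the cluster (the codec class name is an assumption, not taken from the thread):

    -- enable compressed query output for this hive session
    SET hive.exec.compress.output=true;
    SET mapred.output.compress=true;
    -- assumed LZO codec class; requires the hadoop-lzo native libraries
    SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;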

Re: Hive CLI

2013-07-09 Thread bejoy_ks
Hi Rahul, The same shortcuts Ctrl+A and Ctrl+E work in the hive shell for me (hive 0.9). Regards Bejoy KS

Re: Strange error in hive

2013-07-08 Thread bejoy_ks
Hi Jerome, Can you send the error log of the MapReduce task that failed? That should have some pointers to help you troubleshoot the issue. Regards Bejoy KS

Re: integration issure about hive and hbase

2013-07-08 Thread bejoy_ks
Hi, Can you try including the zookeeper quorum and port in your hive configuration as shown below:
    hive --auxpath .../hbase-handler.jar,.../hbase.jar,.../zookeeper.jar,.../guava.jar -hiveconf hbase.zookeeper.quorum=<zk server names separated by commas> -hiveconf

Re: Need help in Hive

2013-07-08 Thread bejoy_ks
Hi Maheedhar, As I understand it, you have a column with data in the MM:SS format in your input data set. AFAIK this is not the standard java.sql.Timestamp format, and it doesn't even have a date part, so you may not be able to use the Timestamp data type here. You can define it as a
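A small sketch of what the (truncated) advice points toward; the table and column names are hypothetical, and storing the value as a string is an assumption based on the context:

    -- duration kept as plain text such as '05:32', since it is not a valid TIMESTAMP
    CREATE TABLE events (
      id       INT,
      duration STRING
    );

    -- split it when needed, e.g. minutes and seconds as integers
    SELECT CAST(split(duration, ':')[0] AS INT) AS mins,
           CAST(split(duration, ':')[1] AS INT) AS secs
    FROM events;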

Re: When to use bucketed tables with/instead of partitioned tables

2013-06-17 Thread bejoy_ks
Hi Stephen, In addition to join optimization, bucketing helps a lot with sampling as well. It lets you choose the sample space (i.e. n buckets out of m). Regards Bejoy KS
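A minimal sketch of bucket sampling, assuming a hypothetical table already bucketed into 32 buckets on user_id:

    -- picks 1 out of every 8 buckets, i.e. 4 of the 32 buckets
    SELECT * FROM users_bucketed TABLESAMPLE(BUCKET 1 OUT OF 8 ON user_id);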

Re: How to delete Specific date data using hive QL?

2013-06-04 Thread bejoy_ks
Adding my two cents: if you have an unpartitioned data set/table and would like to partition it on some specific columns of the source table, use a dynamic partition insert. That would place the source data in separate partitions of a partitioned target table.
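A minimal sketch of a dynamic partition insert, with hypothetical table and column names:

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;

    -- the last selected column (dt) feeds the dynamic partition column
    INSERT OVERWRITE TABLE sales_partitioned PARTITION (dt)
    SELECT id, amount, dt FROM sales_raw;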

Re: how does hive find where is MR job tracker

2013-05-28 Thread bejoy_ks
Hive gets the JobTracker from the mapred-site.xml within your $HADOOP_HOME/conf. Does the $HADOOP_HOME/conf/mapred-site.xml on the node that runs hive have the correct value for the jobtracker? If not, changing it to the right one might resolve your issue. Regards Bejoy KS

Re: Hive on Oracle

2013-05-17 Thread bejoy_ks
Hi Raj, Which jar to use depends on the version of Oracle you are running. The jar version corresponding to each Oracle release is listed in the Oracle documentation online, and the JDBC jars should be available from the Oracle website as a free download. Regards Bejoy KS

Re: Getting Slow Query Performance!

2013-03-12 Thread bejoy_ks
Hi, Since you are on a pseudo-distributed / single-node environment, the hadoop mapreduce parallelism is limited. You might have just a few map slots, and map tasks might be queued waiting for others to complete. On a larger cluster your job should be faster. As a side note, certain SQL queries that

Re: hive issue with sub-directories

2013-03-11 Thread bejoy_ks
Hi Suresh, AFAIK as of now a partition cannot contain sub-directories, only files. You may have to move the sub-dirs out of the parent dir 'a' and create separate partitions for them. Regards Bejoy KS

Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-10 Thread bejoy_ks
Hi Sai, Local mode is just for trials; for any pre-prod/production environment you need MR jobs. Under the hood hive stores data in HDFS (mostly), and we definitely use hadoop/hive for larger data volumes, so MR needs to be there to process them. Regards Bejoy KS

Re: Accessing sub column in hive

2013-03-08 Thread bejoy_ks
Hi Sai, You can do it as: Select address.country from employees; Regards Bejoy KS

Re: Finding maximum across a row

2013-03-01 Thread bejoy_ks
Hi Sachin, You can get the detailed steps from the hive wiki itself: https://cwiki.apache.org/Hive/hiveplugins.html Regards Bejoy KS

Re: Hive queries

2013-02-25 Thread bejoy_ks
Hi Cyril, I believe you are using the derby metastore, in which case it should be an issue with the hive configs. Derby tries to create a metastore in the current dir from where you start hive. The tables exported by sqoop would be inside HIVE_HOME and hence you are not able to see the

Re: Security for Hive

2013-02-23 Thread bejoy_ks
Hi Austin, AFAIK at the moment you can control permissions gracefully only at the data level, not at the metadata level, i.e. you can play with the hdfs permissions. Regards Bejoy KS

Re: Security for Hive

2013-02-22 Thread bejoy_ks
Hi Sachin, Currently there is no such admin-user concept in hive. Regards Bejoy KS

Re: Running Hive on multi node

2013-02-21 Thread bejoy_ks
Hi, Hive uses the hadoop installation specified in HADOOP_HOME. If your hadoop home is configured for fully distributed operation, it'll utilize the cluster itself. Regards Bejoy KS

Re: Adding comment to a table for columns

2013-02-21 Thread bejoy_ks
Hi Gupta, Try out DESCRIBE EXTENDED or DESCRIBE FORMATTED on the table name; I vaguely recall an operation like this. Please check the hive wiki for the exact syntax. Regards Bejoy KS
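A short sketch of where column comments show up, using a hypothetical table:

    -- column comments are attached in the DDL
    CREATE TABLE orders (
      order_id INT COMMENT 'surrogate key',
      amount   DOUBLE COMMENT 'order total in USD'
    );

    -- either form prints the column comments along with the schema
    DESCRIBE FORMATTED orders;
    DESCRIBE EXTENDED orders;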

Re: bucketing on a column with millions of unique IDs

2013-02-20 Thread bejoy_ks
Hi Li, The major consideration is the size of each bucket. One bucket corresponds to a file in hdfs, and you should ensure that every bucket is at least a block in size, or in the worst case that at least the majority of the buckets are. So based on the data size you should derive
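A minimal sketch of the DDL and insert; the table names, columns, and bucket count are hypothetical and would follow from your own data size and block size:

    -- e.g. roughly 32 GB of data with a 128 MB block size suggests around 256 buckets
    CREATE TABLE clicks_bucketed (user_id BIGINT, url STRING)
    CLUSTERED BY (user_id) INTO 256 BUCKETS;

    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE clicks_bucketed
    SELECT user_id, url FROM clicks_raw;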

Re: Map join optimization issue

2013-02-15 Thread bejoy_ks
Hi, In later versions of hive you actually don't need a map join hint in your query. Just the following would serve the purpose: SET hive.auto.convert.join=true; Regards Bejoy KS

Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread bejoy_ks
Hi Joseph, There are differences in the following ls commands.
    [cloudera@localhost data]$ hdfs dfs -ls /715
This would list all the contents of /715 in hdfs, if it is a dir.
    Found 1 items
    -rw-r--r--   1 cloudera supergroup    7853975 2013-02-14 17:03 /715
The output clearly shows it is

Re: LOAD HDFS into Hive

2013-01-25 Thread bejoy_ks
Hi Venkataraman, You can just create an external table and give its location as the hdfs dir where the data resides. There is no need to perform an explicit LOAD operation here. Regards Bejoy KS
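A minimal sketch, assuming hypothetical column names, a tab-delimited format, and an hdfs path where the files already sit:

    CREATE EXTERNAL TABLE web_logs (
      ip  STRING,
      ts  STRING,
      url STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/data/web_logs';   -- existing hdfs dir; no LOAD DATA needed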

Re: An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread bejoy_ks
Hi David, An EXPLAIN EXTENDED would give you the exact pointer. From my understanding, this is how it could work: you have two tables, so two different map reduce jobs would be processing them. Based on the join keys, the combination of corresponding columns would be chosen as the key from mapper1

Re: An explanation of LEFT OUTER JOIN and NULL values

2013-01-24 Thread bejoy_ks
Hi David, The default partitioner used in map reduce is the hash partitioner, so based on your keys they are sent to a particular reducer. Maybe in your current data set, the keys that have no values in the table all fall into the same hash bucket and hence are being processed by the same

Re: Mapping HBase table in Hive

2013-01-13 Thread bejoy_ks
Hi Ibrahim, SQOOP is used to import data from the rdbms into hbase in your case. Please get the schema of the corresponding table from hbase and post it here; we can point out how your mapping could look. Regards Bejoy KS

Re: Map Reduce Local Task

2013-01-08 Thread bejoy_ks
Hi Santhosh, As long as the smaller table's size is in the range of a few MBs, it is a good candidate for a map join. If the smaller table is still larger than that, you can take a look at bucketed map joins. Regards Bejoy KS

Re: Mapping HBase table in Hive

2013-01-08 Thread bejoy_ks
Hi Ibrahim, The hive-hbase integration depends entirely on the hbase table schema and not on the schema of the source table in mysql. You need to provide the column family:qualifier mapping there. Get the hbase table's schema from the hbase shell. Suppose you have the schema as Id CF1.qualifier1
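A rough sketch of such a mapping; the column family CF1 and qualifier1 come from the truncated example, while the second qualifier, the hive column names, and the hbase table name are assumptions:

    CREATE EXTERNAL TABLE hbase_mapped (
      id   STRING,
      col1 STRING,
      col2 STRING
    )
    STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    WITH SERDEPROPERTIES (
      "hbase.columns.mapping" = ":key,CF1:qualifier1,CF1:qualifier2"
    )
    TBLPROPERTIES ("hbase.table.name" = "my_hbase_table");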

Re: View with map join fails

2013-01-08 Thread bejoy_ks
Looks like there is a bug with mapjoin + view. Please check the hive jira to see if there is an issue open against this; if not, file a new jira. From my understanding, when you enable map join the hive parser creates backup jobs. These backup jobs are executed only if the map join fails. In normal

Re: External table with partitions

2013-01-06 Thread bejoy_ks
Hi Oded, If you have created the directories manually, they become visible to the hive table only when the partitions/sub-dirs are added to the metadata using 'ALTER TABLE ... ADD PARTITION'. Partitions are not picked up implicitly by the hive table even if you have a proper sub-dir structure.
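A small sketch of registering such a directory, with a hypothetical table, partition column, and path:

    ALTER TABLE web_logs_part ADD PARTITION (dt='2013-01-06')
    LOCATION '/data/web_logs/dt=2013-01-06';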

Re: External table with partitions

2013-01-06 Thread bejoy_ks
Sorry, I didn't understand your query on first look-through. Like Jagat said, you may need to go with a temp table for this. Do a hadoop fs -cp ../../a.* <destn dir> and create an external table with location as 'destn dir': CREATE EXTERNAL TABLE <tmp table name> LIKE <src table name> LOCATION '' ; NB: I

Re: Map side join

2012-12-13 Thread bejoy_ks
Hi Souvik, Are your input files compressed using some non-splittable compression codec? Do you have enough free slots while this job is running? Make sure the job is not running locally. Regards Bejoy KS

Re: Map side join

2012-12-13 Thread bejoy_ks
Hi Souvik, For the new hdfs block size to take effect on already existing files, you need to re-copy them into hdfs. To play with the number of mappers you can set a smaller value, like 64 MB, for the min and max split sizes: mapred.min.split.size and mapred.max.split.size. Regards Bejoy KS
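A minimal sketch of those session settings; the 64 MB figure is the illustrative value from the thread, expressed in bytes:

    -- 64 MB = 67108864 bytes
    SET mapred.min.split.size=67108864;
    SET mapred.max.split.size=67108864;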

Re: Map side join

2012-12-12 Thread bejoy_ks
Hi Souvik, Apart from hive jobs, are normal mapreduce jobs like wordcount running fine on your cluster? If they are, then for the hive jobs are you seeing anything suspicious in the task, tasktracker, or jobtracker logs? Regards Bejoy KS

Re: Map side join

2012-12-07 Thread bejoy_ks
Hi Souvik, In earlier versions of hive you had to give the map join hint, but in later versions just set hive.auto.convert.join=true; and hive automatically selects the smaller table. It is better to give the smaller table as the first one in the join. You can use a map join if you are joining a

Re: Doubt in INSERT query in Hive?

2012-02-15 Thread bejoy_ks
Hi Bhavesh, INSERT INTO is supported in hive 0.8; an upgrade would get things rolling for you. LOAD DATA inefficient? What performance overhead were you facing there? Regards Bejoy K S

Re: Doubt in INSERT query in Hive?

2012-02-15 Thread bejoy_ks
Bhavesh, In this case, if you are not using INSERT INTO you may need a tmp table: write the query output to that, then load the data from there into your target table's data dir. You are not writing it to any file during the LOAD DATA operation; rather you are just moving the files (in
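A rough sketch of that flow; the table names, filter, and warehouse path are all hypothetical:

    -- stage the query output in a temporary table
    INSERT OVERWRITE TABLE tmp_results
    SELECT * FROM source_table WHERE dt = '2012-02-15';

    -- move the staged files under the target table (LOAD DATA moves files, it does not copy or transform them)
    LOAD DATA INPATH '/user/hive/warehouse/tmp_results' INTO TABLE target_table;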

Re: parallel inserts ?

2012-02-15 Thread bejoy_ks
Hi John, Yes, insert is parallel by default in hive. HiveQL gets transformed into mapreduce jobs and hence is definitely parallel. The only case where it is not parallel is when you have just 1 reducer. It is just reading and processing the input files in parallel using map reduce jobs

Re: external partitioned table

2012-02-08 Thread bejoy_ks
Hi Koert, As you are creating the dirs/sub-dirs using mapreduce jobs outside of hive, hive is unaware of these sub-dirs. In such cases there is no way other than an ADD PARTITION DDL to register the dir as a hive partition. If you are using oozie or shell to trigger your jobs, you can

Re: Error when Creating an UDF

2012-02-06 Thread bejoy_ks
Hi, One of your jars is not available, and maybe that one has the required UDF or related methods. Hive was not able to locate your first jar: '/scripts/hiveMd5.jar does not exist'. Just fix this with the correct location and everything should work fine. Regards Bejoy K S
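A typical registration sequence for reference; the jar path comes from the thread, while the class name, function name, and query are assumptions:

    -- the jar must exist at this path on the machine running the CLI
    ADD JAR /scripts/hiveMd5.jar;
    CREATE TEMPORARY FUNCTION md5 AS 'com.example.hive.udf.Md5';
    SELECT md5(name) FROM users LIMIT 10;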

Re: Important Question

2012-01-25 Thread bejoy_ks
Real time? Definitely not hive. Go for HBase, but don't expect HBase to be as flexible as an RDBMS; you need to choose your row key and column families wisely as per your requirements. For data mining and analytics you can mount a Hive table over the corresponding HBase table and play with SQL

Re: Question on bucketed map join

2012-01-19 Thread bejoy_ks
Corrected a few typos in the previous mail. Hi Avrila, AFAIK the bucketed map join is not the default in hive; it happens only when the configuration parameter hive.optimize.bucketmapjoin is set to true. You may be getting the same execution plan because hive.optimize.bucketmapjoin
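The relevant session settings, sketched for reference; hive.optimize.bucketmapjoin comes from the thread, while the sort-merge variant is an additional assumption:

    SET hive.optimize.bucketmapjoin=true;
    -- only applicable if both tables are bucketed and sorted on the join key
    SET hive.optimize.bucketmapjoin.sortedmerge=true;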

Re: Insert based on whether string contains

2012-01-04 Thread bejoy_ks
I agree with Matt on that aspect. The solution I proposed was based purely on the sample data provided, where there were 3-digit comma-separated values. If there is a chance of 4-digit values as well in event_list, you may need to revisit the solution. Regards Bejoy K S

Re: Schemas/Databases in Hive

2011-12-22 Thread bejoy_ks
Ranjith, Hive does support multiple databases. If you are on one of the latest versions of hive, try: CREATE DATABASE testdb; USE testdb; It should give you what you are looking for. Regards Bejoy K S

Re: Schemas/Databases in Hive

2011-12-22 Thread bejoy_ks
Also, multiple databases have proved helpful for me in organizing tables into corresponding databases when there are quite a large number of tables to manage. I believe it'd also help in providing access restrictions. Regards Bejoy K S

Re: Loading data into hive tables

2011-12-08 Thread bejoy_ks
Adithya, The answer is yes, SQOOP is the tool you are looking for. It has an import option to load data from any jdbc-compliant database into hive, and it even creates the hive table for you by referring to the source db table. Hope it helps! Regards Bejoy K S

Re: Hive query failing on group by

2011-10-19 Thread bejoy_ks
Hi Mark, What do your map reduce job logs say? Try figuring out the error from there; from the hive CLI you can hardly find the root cause of your errors. From the job tracker web UI, http://hostname:50030/jobtracker.jsp, you can easily browse to the failed tasks and get the actual exception

Re: Hive query failing on group by

2011-10-19 Thread bejoy_ks
Looks like some data problem. Were you using the GROUP BY query on the same data set? But if count(*) also throws an error then it's back to square one: an installation/configuration problem with hive or map reduce. Regards Bejoy K S

Re: upgrading hadoop package

2011-09-01 Thread bejoy_ks
Hi Li, AFAIK 0.21 is not really a stable version of hadoop, so if this upgrade is on a production cluster it'd be better to go with 0.20.203. Regards Bejoy K S

Re: Re:Re: Re: RE: Why a sql only use one map task?

2011-08-25 Thread bejoy_ks
Hi Daniel, In the hadoop ecosystem the number of map tasks is actually decided by the job, basically based on the number of input splits. Setting mapred.map.tasks doesn't guarantee that only that many map tasks are triggered. What worked out here for you is that you were specifying that

Re: Hive crashing after an upgrade - issue with existing larger tables

2011-08-18 Thread bejoy_ks
A small correction to my previous post: the CDH version is CDH u1, not u0. Sorry for the confusion. Regards Bejoy K S

Re: why need to copy when run a sql with a single map

2011-08-10 Thread bejoy_ks
Hi, Hive queries are parsed into hadoop map reduce jobs. In map reduce jobs, between the map and reduce tasks there are two phases, the copy phase and the sort phase, together known as the sort-and-shuffle phase. So the copy task indicated in the hive job here should be the copy phase of map reduce. It does the

Hive or pig for sequential iterations like those using foreach

2011-08-08 Thread bejoy_ks
Hi, I've been successful using hive for the past few projects. Now for a particular use case I'm a bit confused about what to choose, Hive or Pig. My project involves a step-by-step sequential workflow: in every step I retrieve some values based on some query, then use these values as input to new queries

Re: Hive or pig for sequential iterations like those using foreach

2011-08-08 Thread bejoy_ks
Thanks Amareshwari, the article gave me many valuable hints for deciding. But out of curiosity, does hive support stage-by-stage iterative processing? If so, how? Thank you. Regards Bejoy K S

Re: NPE with hive.cli.print.header=true;

2011-08-01 Thread bejoy_ks
Hi Ayon, AFAIK hive is supposed to behave that way. If you set hive.cli.print.header=true to enable column headers, then some commands like 'desc' are not expected to work. Not sure whether a patch has come out for this recently. Regards Bejoy K S

Re: Partition by existing field?

2011-07-08 Thread bejoy_ks
Hi Travis, From my understanding of your requirement, dynamic partitions in hive are the most suitable solution. I have written a blog post on such requirements; please refer to http://kickstarthadoop.blogspot.com/2011/06/how-to-speed-up-your-hive-queries-in.html for an understanding of the

Re: Hive create table

2011-05-25 Thread bejoy_ks
Hi Jinhang, I don't think hive supports multi-character delimiters. The hassle-free option here would be to preprocess the data using mapreduce to replace the multi-character delimiter with another permissible one that suits your data. Regards Bejoy K S

Re: Hadoop error 2 while joining two large tables

2011-03-17 Thread bejoy_ks
Try out CDH3b4; it has hive 0.7 and the latest of the other hadoop tools. When you work with open source it is definitely good practice to upgrade to the latest versions. With newer versions bugs are fewer, performance is better, and you get more functionality. Your query looks