Small files under SequenceFile table partition directories

2015-11-10 Thread reveen joe
Hi, Most of our Hive tables are SequenceFile tables, and there are currently many small files, ranging from *1-4 MB*, under the partition directories (created by insert-overwrite). I am assuming this is due to 2 reasons: 1. Some of our tables are bucketed and so individual files are created for each b

Re: Compare Query Execution duration between ORC and SequenceFile

2015-11-10 Thread reveen joe
> Hi, I understand that data retrieval against an ORC table can be much faster than a SequenceFile table when a *subset of columns* is selected. I am assuming Query Execution duration would be faster even when *all the columns* in a given partition are selected but not very sure abou

Re: Cross join/cartesian product explanation

2015-11-10 Thread Gopal Vijayaraghavan
> I'm having trouble doing a cross join between two tables that are too big for a map-side join.
The actual query would help btw. Usually what is planned as a cross-join can be optimized out into a binning query with a custom UDF. In particular with 2-D geo queries with binning, which people ten

Which user should start the local task if Hive impersonation is enabled

2015-11-10 Thread Jim Green
Hi Team, I am trying to understand what the expected behavior is when Hive impersonation is enabled. Say the HiveServer2 process is running as userA, and userB is connecting through beeline. If userB creates a table, the table file should be owned by userB because impersonation is enabled. However, if userB is

Re: Hive and HBase

2015-11-10 Thread Jörn Franke
Probably it is outdated. Hive can access HBase tables via external tables. The execution engine in Hive can be MR, Tez, or Spark. HiveQL is nowadays very similar to SQL. In fact, Hortonworks plans to make it SQL:2011 Analytics compatible. HBase can be accessed independently of Hive via SQL using P

Re: Hive and HBase

2015-11-10 Thread Ashok Kumar
Hi, It is from Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data Computing in the Apache Hadoop 2 Ecosystem (Addison-Wesley Data & Analytics Series) (Kindle Locations 735-738). Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summary, ad hoc q

Re: Hive and HBase

2015-11-10 Thread Binglin Chang
> Hive transparently translates queries into MapReduce jobs that are executed in HBase
I think this is not correct, are you sure it is from some book?
On Tue, Nov 10, 2015 at 6:56 PM, Ashok Kumar wrote:
> hi, I have read in a book about Hadoop that says Apache Hive is a data wareh

Re: Hive and HBase

2015-11-10 Thread Ipremyadav
HBase doesn't allow SQL-like queries. It's built for a use case different from Hive's. If you need a full-fledged column-based database, you can use HBase. If you have a lot of data already in files on HDFS and you want a SQL-like interface to query the data, Hive is useful there. You can

Hive and HBase

2015-11-10 Thread Ashok Kumar
hi, I have read in a book about Hadoop that says Apache Hive is a data warehouse infrastructure built on top of Hadoop for providing data summary, ad hoc queries, and the analysis of large data sets using an SQL-like language called HiveQL. Hive transparently translates queries into MapReduce j

how to close local map join process

2015-11-10 Thread Fun
Hi everyone, when I run a map join query on Hive 1.2.0, Hive always launches a local task to process the map join, even though I set hive.exec.mode.local.auto=false. Why does Hive still launch a local task, and how can I disable it? Thanks.