Hi,
Most of our Hive tables are SequenceFile tables and there are currently
many small files, ranging from *1-4 MB*, under the partition directories
(created by insert-overwrite). I am assuming this is due to 2 reasons:
1. Some of our tables are Bucketed and so individual files are created for
each b
> Hi,
>
> I understand that data retrieval against an ORC table can be much faster
> than a SequenceFile table when a *subset of columns* is selected.
>
> I am assuming Query Execution duration would be faster even when *all the
> columns* in a given partition are selected but not very sure abou
>I'm having trouble doing a cross join between two tables that are too big
>for a map-side join.
The actual query would help btw. Usually what is planned as a cross-join
can be optimized out into a binning query with a custom UDF.
In particular with 2-D geo queries with binning, which people ten
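
To illustrate the binning idea, here is a minimal Python sketch rather than HiveQL, since the actual query is not shown. It assumes the cross join is really a 2-D "within some radius" match; the function names (bin_key, binned_pairs, cross_join_pairs) and the radius predicate are hypothetical. Each point is assigned to a grid cell of side `radius`, so candidate matches only need to be looked up in the same cell and its 8 neighbours instead of across the whole other table.

```python
from collections import defaultdict
from itertools import product
from math import floor, hypot


def bin_key(x, y, cell):
    """Map a point to the integer grid cell that contains it."""
    return (floor(x / cell), floor(y / cell))


def binned_pairs(left, right, radius):
    """Return (l, r) pairs whose Euclidean distance is <= radius.

    Grid cells have side `radius`, so any match for a point lies in the
    point's own cell or in one of the 8 neighbouring cells.
    """
    # Index the right-hand points by grid cell (the "binning" step).
    grid = defaultdict(list)
    for p in right:
        grid[bin_key(p[0], p[1], radius)].append(p)

    pairs = []
    for l in left:
        cx, cy = bin_key(l[0], l[1], radius)
        # Probe only the 3x3 block of cells around the left point.
        for dx, dy in product((-1, 0, 1), repeat=2):
            for r in grid.get((cx + dx, cy + dy), ()):
                if hypot(l[0] - r[0], l[1] - r[1]) <= radius:
                    pairs.append((l, r))
    return pairs


def cross_join_pairs(left, right, radius):
    """Brute-force reference: compare every left point with every right point."""
    return [(l, r) for l in left for r in right
            if hypot(l[0] - r[0], l[1] - r[1]) <= radius]
```

In Hive terms, the cell key would presumably be computed by the custom UDF and used as an equi-join key, with the exact distance check kept as a filter; that mapping is an assumption about the intended approach, not something stated in the thread.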
Hi Team,
I am trying to understand what the expected behavior of Hive is when
impersonation is enabled.
Say HiveServer2 process is running as userA, and userB is connecting to
beeline.
If userB creates a table, the table file should be owned by userB because
impersonation is enabled.
However, if userB is
Probably it is outdated.
Hive can access HBase tables via external tables. The execution engine in Hive
can be MR, Tez, or Spark. HiveQL is nowadays very similar to SQL. In fact,
Hortonworks plans to make it SQL:2011 analytics compatible.
HBase can be accessed independently of Hive via SQL using P
Hi,
It is from Hadoop 2 Quick-Start Guide: Learn the Essentials of Big Data
Computing in the Apache Hadoop 2 Ecosystem (Addison-Wesley Data & Analytics
Series) (Kindle Locations 735-738).
Apache Hive is a data warehouse infrastructure built on top of Hadoop for
providing data summary, ad hoc q
>
> Hive transparently translates queries into MapReduce jobs that are
> executed in HBase
I think this is not correct. Are you sure it is from a book?
On Tue, Nov 10, 2015 at 6:56 PM, Ashok Kumar wrote:
> hi,
>
> I have read in a book about Hadoop that says
>
> Apache Hive is a data wareh
HBase doesn't allow SQL-like queries. It's built for a use case different from
Hive.
If you need a full-fledged column-based database, you can use HBase.
If you have a lot of data already there in files on HDFS and you want a
SQL-like interface to query the data, Hive is useful there.
You can
hi,
I have read in a book about Hadoop that says
Apache Hive is a data warehouse infrastructure built on top of Hadoop for
providing data summary, ad hoc queries, and the analysis of large data sets
using an SQL-like language called HiveQL.
Hive transparently translates queries into MapReduce j
Hi everyone,
When I run a map-join query on Hive 1.2.0, Hive always launches a local task
to process the map join, even though I set hive.exec.mode.local.auto=false.
Why does Hive still launch a local task, and how can I disable it?
Thanks