Hi,
I am facing the following problem while trying to store and use a huge
partitioned table with 1000+ columns in Hive. I would like to know how to
solve this problem, either using Hive or any other store.
Requirement:
1).There is a table with around 1000+ columns which is partitioned by date.
2).Ev
ucceed due to VERTEX_FAILURE.
failedVertices:1 killedVertices:0
Regards
Saurabh Mishra
Regards,
Saurabh
vast topic and cannot be described in full; however,
some quick pointers will be helpful.
I am currently working on:
Query vectorization and COB with ORC tables.
Thanks,
Saurabh
ase let me know if any more information is required on the same.
Thanks,
Saurabh
This is not open source but we are using Vertica and it works very nicely
for us. There is a 1TB community edition but above that it costs money.
It has really advanced SQL (analytical functions, etc.), works like an
RDBMS, has R/Java/C++ SDKs, and scales nicely. Redshift is a similar
option.
Hi,
I need some inputs on executing Hive queries in parallel. I tried doing this
using the CLI (by opening multiple SSH connections) and executed four HQLs; I
observed that the queries were getting executed sequentially. All four
queries got submitted; however, while the first one was in execution mod
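Queries launched from separate CLI sessions become independent jobs, so apparent serialization usually comes from the cluster's job scheduler rather than from Hive itself. A minimal sketch of the shell pattern for firing several HQL files concurrently from one session (the `.hql` filenames are hypothetical, and `echo` stands in for the real `hive -f` invocation):

```shell
#!/bin/sh
# Launch each HQL file as a background job, then wait for all of them.
run_query() {
  # In a real setup this line would be: hive -f "$1"
  echo "finished $1"
}

run_query q1.hql &
run_query q2.hql &
run_query q3.hql &
run_query q4.hql &
wait   # blocks until every background query has returned
```

Even with this pattern, whether the jobs actually overlap on the cluster depends on the scheduler: the default FIFO scheduler runs one job at a time, while the Fair or Capacity scheduler allows true concurrency.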
ividual text documents), but it does
> get through all the mechanics of exactly what you state you want.
>
> The meetup page also has links to video, if the slides don't give enough
> context.
>
> HTH
>
> [1]: http://www.meetup.com/Data-Science-MD/events/111081282/
Hi Nitin,
No offense taken. Thank you for your response. Part of this is also trying
to find the right tool for the job.
I am doing queries to determine the cuts of tweets that I want, then doing
some modest normalization (through a python script) and then I want to
create sequenceFiles from that
Hi,
I have a lot of tweets saved as text. I created an external table on top of
it to access it as textfile. I need to convert these to sequencefiles with
each tweet as its own record. To do this, I created another table as a
sequencefile table like so -
CREATE EXTERNAL TABLE tweetseq(
tweet ST
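The DDL above is truncated; a minimal sketch of the full conversion, assuming the source text table is called `tweets` with a single STRING column `tweet` (both names and the target path are hypothetical):

```sql
-- Target table stored as SequenceFile (completing the truncated DDL above).
CREATE EXTERNAL TABLE tweetseq (
  tweet STRING
)
STORED AS SEQUENCEFILE
LOCATION '/user/hive/tweetseq';  -- hypothetical path

-- Rewrite the text records into SequenceFile format, one tweet per record.
INSERT OVERWRITE TABLE tweetseq
SELECT tweet FROM tweets;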
Hi all,
Below are some observations based on the ongoing rank-function
discussion.
1. I executed the queries mentioned below, and only the query with "rank"
(lowercase) executed successfully; the rest threw the exception "FAILED:
SemanticException Failed to breakup Windowing invocations into Gro
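For reference, a windowing query of the kind under discussion might look like the following sketch (table and column names hypothetical). Hive's built-in function is the lowercase `rank`, which may explain the case sensitivity observed:

```sql
SELECT user_id,
       tweet_count,
       rank() OVER (PARTITION BY country ORDER BY tweet_count DESC) AS rnk
FROM user_stats;
```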
/h/tpc-h-impala/data/supplier.tbl';
I assume that "supplier.tbl" is a directory and the CSV file is present
inside it.
Let me know if it worked!
Thanks,
Saurabh
On Thu, Jul 18, 2013 at 1:55 AM, Mainak Ghosh wrote:
> Hello,
>
> I have just started using Hive and I w
ce.com
To: user@hive.apache.org
Subject: Re: Connecting to Hive from R through JDBC
Date: Wed, 8 May 2013 00:27:35 +
Hi Saurabh
The usual suspect is that the hive-server service is not running on the server
where Hive is installed. The hive-server service needs to be installed and
started. It
river through the following command:
drv <- JDBC('org.apache.hadoop.hive.jdbc.HiveDriver',
            'C:/Users/Saurabh/Documents/RWork/hive-jdbc-0.9.0-cdh4.1.2.jar')
But when I try to make the connection using the following command:
conn <- dbConnect(drv,
'jdbc:hi
Hi,
When I try to insert some data into a Hive table mapped to a specific location
in HDFS, the file which gets created has the user information as 'hive' and
permissions as '755', i.e. 'rwxr-xr-x'. Is there any way to change this so that I
can give my own username, or at least the user from where I h
insensitively
*
* @author Saurabh
*/
public class UDAFCaseInsensitiveDistinctMerge extends UDAF {
/**
* Default separator, defined and used unless overridden.
*/
private static final String DEFAULT_SEPARATOR = ";";
/**
* Nested Class to Store the Updated Set of Unique E
;
I am already using this configuration, then, but to no avail... :(
Date: Tue, 16 Oct 2012 14:17:47 +0900
Subject: Re: Hive Query Unable to distribute load evenly in reducers
From: navis@nexr.com
To: user@hive.apache.org
How about using MapJoin?
2012/10/16 Saurabh Mishra
no there is
@hive.apache.org
How about using MapJoin?
2012/10/16 Saurabh Mishra
no, there is apparently no heavy skewing. Also, another stat I wanted to point
out: the following is the approximate table content of this 4-table join query:
tableA : 170 million (actual number; I am also exploding these records, so
the
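The MapJoin suggestion from the reply above can be applied either with a query hint or via a setting. A hedged sketch, with hypothetical table aliases; the small table is loaded into memory on each mapper so no reduce phase is needed for the join:

```sql
-- Hint form: stream the large table, hold the small one in memory.
SELECT /*+ MAPJOIN(small) */ big.x, small.y
FROM tableA big
JOIN dim_table small ON big.x = small.x;

-- Or let Hive convert eligible joins automatically:
SET hive.auto.convert.join=true;
```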
Query Unable to distribute load evenly in reducers
> From: philip.j.trom...@gmail.com
> To: user@hive.apache.org
>
> Is your data heavily skewed towards certain values of a.x etc?
>
> On 15 October 2012 15:23, Saurabh Mishra
> wrote:
> > The queries are simple joins, somet
gt; To: user@hive.apache.org
>
> And your queries were?
>
> On Mon, Oct 15, 2012 at 8:09 PM, Saurabh Mishra
> wrote:
> > Hi,
> > I am firing some hive queries joining tables containing up to 30 million
> > records each. Since the load on the reducers is very significa
any way to overcome this load-distribution disparity.
Any help in this regard will be highly appreciated.
Sincerely
Saurabh Mishra
Is it possible to write Hive UDFs in Python? I googled but didn't find
anything. I would be happy with RTFM replies if you can give a link to the manual.
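Hive does not support Python UDFs directly, but the `TRANSFORM` clause streams rows through an arbitrary script over stdin/stdout, which covers most UDF use cases. A minimal sketch (the script name and column layout are hypothetical):

```python
#!/usr/bin/env python
# upper.py: reads tab-separated rows from stdin, upper-cases the
# second column, and writes tab-separated rows back to stdout.
import sys


def transform_line(line):
    """Upper-case the second tab-separated field of one input row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        fields[1] = fields[1].upper()
    return "\t".join(fields)


if __name__ == "__main__":
    for raw in sys.stdin:
        print(transform_line(raw))
```

It would then be invoked from Hive roughly as: `ADD FILE upper.py; SELECT TRANSFORM(id, name) USING 'python upper.py' AS (id, name) FROM some_table;` (table and column names again hypothetical).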
l pointer exception
> To: user@hive.apache.org
>
> Which version of Hive are you running?
>
> On Fri, Jun 1, 2012 at 3:49 PM, Saurabh S wrote:
> >
> > Well it seems that simply moving the set header statement after the 'c
ver.java:490)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
> --
>
> Any idea what's going on?
>
> Regards,
> Saurabh
>
ssorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:197)
--
Any idea what's going on?
Regards,
Saurabh
As far as I understand, there is no equivalent of MySQL's group_concat() in Hive.
This Stack Overflow question is from Sept 2010:
http://stackoverflow.com/questions/3703740/combine-multiple-rows-into-one-space-separated-string
Does anyone know any other method to create a delimited list from
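One commonly cited workaround is combining `collect_set()` with `concat_ws()`, which gives a group_concat-style result for distinct values. A sketch with hypothetical table and column names:

```sql
-- Comma-delimited list of distinct values per key, roughly equivalent
-- to MySQL's GROUP_CONCAT(DISTINCT value).
SELECT id,
       concat_ws(',', collect_set(value)) AS value_list
FROM some_table
GROUP BY id;
```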
Whew, thanks everyone! I think wrapping quotes around that did it.
Nicole, I was going to attempt that as a last resort. But the actual query is
much longer and it would be extremely undesirable to do so.
Regards,
Saurabh
> From: nicole
nd "select ${hiveconf:ref_date} from
dummytable limit 1" produces "1999".
I noticed that there is an option to "set hive.variable.substitute=false;", but
in that case, hive throws the following error:
FAILED: Parse Error: line 3:7 cannot recognize input near '$
Hi,
How do I get the current date in Hive? Specifically, I’m
looking for the equivalent of following SQL where clause:
where LOCAL_DT >= current date - 3 day
I tried using
where local_dt >= date_sub(to_date(unix_timestamp()), 3)
but this method seems to be many times slower than
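A hedged sketch of a commonly used form, assuming `local_dt` is a 'yyyy-MM-dd' string (column and table names hypothetical):

```sql
-- Rows from the last 3 days.
SELECT *
FROM some_table
WHERE local_dt >= date_sub(to_date(from_unixtime(unix_timestamp())), 3);
```

If the slowness comes from `unix_timestamp()` being non-deterministic (which can prevent partition pruning), computing the literal date outside Hive and substituting it in, e.g. via a hiveconf variable, often helps.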
I have a table with three columns, A, B, and Score, where A and B are some
items, and Score is some kind of affinity between A and B. There are N items
each of A and B, so the total number of rows in the table is N^2.
Is there a way to fetch "top 5 items in B" for each item in A
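One way to express "top 5 in B for each A", assuming a windowing-capable Hive (0.11+) and a hypothetical table name `affinity`:

```sql
SELECT a, b, score
FROM (
  SELECT a, b, score,
         rank() OVER (PARTITION BY a ORDER BY score DESC) AS rnk
  FROM affinity
) ranked
WHERE rnk <= 5;
```

Note that `rank()` can return more than 5 rows per item when scores tie; `row_number()` would cap the result at exactly 5.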
after the ‘3’ but before the tab?
Matt Tucker
From: Saurabh S [mailto:saurab...@live.com]
Sent: Wednesday, March 28, 2012 2:45 PM
To: user@hive.apache.org
Subject: RE: Help in aggregating comma separated values Thanks for the reply,
Matt. This is exactly what I'm looking for. I'll l
ues, ",")) values_tbl as value
> GROUP BY id, value
>
>
>
> Matt Tucker
>
> -Original Message-
> From: Saurabh S [mailto:saurab...@live.com]
> Sent: Wednesday, March 28, 2012 2:21 PM
> To: user@hive.apache.org
> Subject: Help in aggregating comma separate
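The approach in the quoted reply can be reconstructed as the usual explode pattern for aggregating comma-separated values (table and column names hypothetical):

```sql
-- Split the comma-separated column into rows, then aggregate per value.
SELECT id, value, count(*) AS cnt
FROM some_table
LATERAL VIEW explode(split(values_col, ',')) values_tbl AS value
GROUP BY id, value;
```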
stion rather than one specific to
Hive, but I'm at a roadblock here.
Thanks,
Saurabh
How do I get the length of an array in Hive?
Specifically, I'm looking at the following problem: I'm splitting a column
using the split() function and a pattern. However, the resulting array can have
a variable number of entries, and I want to handle each case separately.
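Hive's built-in `size()` function returns the number of elements in an array (or map), so the variable-length result of `split()` can be branched on directly (column and table names hypothetical):

```sql
-- size() gives the element count of the array produced by split().
SELECT size(split(path_col, '/')) AS n_parts
FROM some_table;
```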
ip.j.trom...@gmail.com
> To: user@hive.apache.org
>
> I guess that split(...)[1] is giving you what's inbetween the 1st and
> 2nd '/' character, which is nothing. Try split(...)[2].
>
> Phil.
>
> On 1 March 2012 21:19, Saurabh S wrote:
> > Hello,
> >
>
reason, that function on my
database is running extremely slowly.
This is my first time posting to this list. If there is anything wrong, please let me know.
Regards,
Saurabh
metastore DB.
Thanks!
Saurabh Bajaj | Senior Business Analyst | +91 9986588089 |
www.mu-sigma.com<http://www.mu-sigma.com/> |
From: Saurabh Bajaj
Sent: Tuesday, January 10, 2012 2:44 PM
To: 'user@hive.apache.org'
Subject: Error in running Hive with Postgresql as metastore DB
Hi
y this error would be occurring.
Thanks in advance!
Saurabh Bajaj
+91 9986588089