Hi all,
I have 3 tables in MySQL and I want to combine the data of the 3 tables into
one Hive table (i.e., create a small data warehouse). I created a table with
all the columns of the 3 tables, but I am unable to push the data into the
Hive table. After running a Sqoop import statement I pulled all the
records into HDFS, but at a
Hi,
Are you using the import-all-tables tool? If so, make sure that you follow
the requirements of that Sqoop tool.
See the Sqoop user guide for more information:
http://sqoop.apache.org/docs/1.4.0-incubating/SqoopUserGuide.html#id1766722
-----Original Message-----
From: iwannaplay games
Thanks, I did it by creating 3 external tables and then using this
query to update createddate from the users table for a particular userid:
insert overwrite table userinfo
select u.userid, a.createddate from users a join userinfo u on u.userid = a.userid
I can use the query option also.
I'll try that now :)
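For reference, the Sqoop "query option" mentioned above is the free-form
query import; a sketch of what it might look like (the JDBC URL, credentials,
and directories here are hypothetical, and Sqoop requires the literal
$CONDITIONS token in a free-form query):

```shell
# Hypothetical free-form import joining two of the MySQL tables in one pass.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username dbuser -P \
  --query 'SELECT u.userid, a.createddate FROM users a JOIN userinfo u ON u.userid = a.userid WHERE $CONDITIONS' \
  --split-by u.userid \
  --target-dir /user/hive/staging/userinfo
```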
The latest Eclipse BIRT release has a Hive and Hadoop connector.
Artem Ervits
Data Analyst
New York Presbyterian Hospital
From: Techy Teck [mailto:comptechge...@gmail.com]
Sent: Tuesday, July 31, 2012 08:46 PM
To: user@hive.apache.org
Subject: Best Report Generating tools for
Hi there,
I'm writing a MapReduce job to replace a Hive query and I find that my
mapper is slower than Hive's mapper. The Hive query is like:
select sum(column1) from table group by column2, column3;
My MapReduce program looks like this:
public static class HiveTableMapper extends
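As a reference point, the aggregation that query computes can be sketched in
plain Java with no Hadoop classes (the column names come from the query
above; the sample rows are invented):

```java
import java.util.Map;
import java.util.TreeMap;

public class GroupBySum {
    // Returns sum(column1) grouped by (column2, column3).
    static Map<String, Long> groupBySum(long[][] rows) {
        Map<String, Long> sums = new TreeMap<>();
        for (long[] row : rows) {
            String key = row[1] + "," + row[2]; // group by column2, column3
            sums.merge(key, row[0], Long::sum); // sum(column1)
        }
        return sums;
    }

    public static void main(String[] args) {
        // rows: {column1, column2, column3}
        long[][] rows = {{5, 1, 1}, {7, 1, 1}, {3, 2, 1}};
        System.out.println(groupBySum(rows)); // prints {1,1=12, 2,1=3}
    }
}
```

The real mapper would emit (column2, column3) as the key and column1 as the
value, with the summing done in a combiner or reducer.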
This is actually not surprising. Hive is essentially a MapReduce compiler. It
is common for regular compilers (C, C#, Fortran) to emit faster assembler code
than you would write yourself. Compilers know the tricks of their target language.
Chuck Connell
Nuance R&D Data Team
Burlington, MA
One hint would be to reduce the number of writable instances you need.
Create the object once and reuse it.
By the way, Hive does not use Writable. ;)
Bertrand
On Wed, Aug 1, 2012 at 4:35 PM, Connell, Chuck chuck.conn...@nuance.com wrote:
This is actually not surprising. Hive is essentially a
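The reuse hint above can be sketched in plain Java; MutableLong here is a
hypothetical stand-in for a Hadoop writable such as LongWritable, to keep the
example self-contained:

```java
public class ReuseDemo {
    // Stand-in for a mutable Hadoop writable (not a real Hadoop class).
    static final class MutableLong {
        private long value;
        void set(long v) { value = v; }
        long get() { return value; }
    }

    static long sumWithReuse(long[] records) {
        MutableLong holder = new MutableLong(); // allocated once, outside the loop
        long sum = 0;
        for (long r : records) {
            holder.set(r); // reuse the same object: no per-record "new"
            sum += holder.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumWithReuse(new long[]{10, 20, 30})); // prints 60
    }
}
```

The point is only the allocation pattern: with millions of records, one
object plus set() per record avoids millions of short-lived allocations.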
Hive doesn't use Writable?! Could you please give me a pointer to the Hive
code to see how it does the job?
I checked the map output records and found this for
my case:
total mapper input record: 23091348
total mapper output record: 23091348
avg mapper output bytes/record: 34.819994
total combiner output
As mentioned, if you avoid using new, by reusing objects and possibly
using buffer objects, you may be able to match or beat the speed. But in
the general case Hive saves you time by allowing you not to worry
about low-level details like this.
On Wed, Aug 1, 2012 at 10:35 AM, Connell, Chuck
I am not sure about Hive but if you look at Cascading they use a pseudo
combiner instead of the standard (I mean Hadoop's) combiner.
I guess Hive has a similar strategy.
The point is that when you use a compiler, the compiler does smart things
that you don't need to think about (like loop
Hive does not use combiners; it uses map-side aggregation. Hive does
use Writables: sometimes it uses ones from Hadoop, sometimes it uses
its own custom Writables for things like timestamps.
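For context, map-side aggregation is governed by a Hive setting; a minimal
sketch (hive.map.aggr is a real setting, on by default in Hive of this era):

```sql
-- Enable hash-based map-side aggregation for GROUP BY queries.
set hive.map.aggr=true;
```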
On Wed, Aug 1, 2012 at 11:40 AM, Bertrand Dechoux decho...@gmail.com wrote:
I am not sure about Hive but
Hey Hive gurus -
Does anyone know how the CLI handles metastore connection timeouts? It
seems if I leave a CLI session idle for more than
hive.metastore.client.socket.timeout seconds and then run "show tables",
the CLI hangs for the timeout period and then throws a SocketTimeoutException.
Restarting the CLI and
My bad. I wasn't sure; at least I know now. But other solutions may use
other 'Serialization' strategies, like Thrift (which is the only other
customization point of Hadoop).
Bertrand
On Wed, Aug 1, 2012 at 5:49 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
Hive does not use combiners it uses map
The story here is that we have a workflow based on Hive queries. It
takes several stages to get to the final data. For each stage, we have a
Hive table. And we are trying to write the whole workflow in MapReduce.
Ideally, it will remove all the intermediate processing and take two rounds
of MapReduce
Cloudera has connectors for MicroStrategy and Tableau.
Looks like Cloudera might have better working versions in the 4.x releases. Worth checking.
Datameer is another tool that also connects to Hive in their new release and
lets you analyse data and generate reports and graphs.
Thanks,
Anurag
Are you communicating with a Thrift metastore or a JDBC metastore? I
have had connections open for long periods of time and never
remember experiencing a timeout.
Edward
On Wed, Aug 1, 2012 at 12:01 PM, Travis Crawford
traviscrawf...@gmail.com wrote:
Hey Hive gurus -
Does anyone know
What is the difference between storing the data as a TextFile and as a
SequenceFile? And which will be faster for Hive queries?
I am creating a table like this-
create table quality (
  id bigint,
  total_chkout bigint,
  total_errpds bigint
)
partitioned by (ds string)
row format delimited
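To make the comparison concrete, the storage format is chosen at
table-creation time; a minimal sketch (the table names here are hypothetical):

```sql
-- Plain text rows: human-readable, line-splittable, no built-in compression support.
CREATE TABLE quality_text (id BIGINT) STORED AS TEXTFILE;
-- Binary key/value container: supports block compression, typically faster to scan.
CREATE TABLE quality_seq (id BIGINT) STORED AS SEQUENCEFILE;
```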
I'm using the Thrift metastore via TFramedTransport. What value do you
specify for hive.metastore.client.socket.timeout? I'm using 60.
If I open the CLI, run "show tables", wait the timeout period, then
run "show tables" again, the CLI hangs in:
"main" prio=10 tid=0x4151a000 nid=0x448 runnable
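For reference, that timeout is usually set in hive-site.xml; a sketch (the
value is in seconds in Hive of this era, and 600 is just an illustrative
choice):

```xml
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <!-- Raise well above any expected idle period between CLI commands. -->
  <value>600</value>
</property>
```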
I feel that that interface is very rarely used in the wild. The only
use case I can figure out for it is people with very in-depth Hive
experience who do not wish to interact with Hive through the QL
language. That being said, I would think the coverage might be a little
weak there. With the local
How can I efficiently store data in Hive and also store and retrieve
compressed data in hive?
Currently I am storing it as a TextFile.
I was going through Bejoy's article (
http://kickstarthadoop.blogspot.com/2011/10/how-to-efficiently-store-data-in-hive.html)
and I found that LZO compression will
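For context, compressed query output in Hive of this era is typically
switched on with settings along these lines (a sketch; the LZO codec class
assumes the separate hadoop-lzo package is installed on the cluster):

```sql
set hive.exec.compress.output=true;
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
```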
Oh interesting - you're saying instead of running a single
HiveMetaStore thrift service, most users use the embedded
HiveMetaStore mode and have each CLI instance connect to the DB
directly?
--travis
On Wed, Aug 1, 2012 at 11:47 AM, Edward Capriolo edlinuxg...@gmail.com wrote:
I feel that that
The two setup options are:
cli -> thrift metastore -> jdbc
cli -> jdbc (used to be called local mode)
Local mode has fewer moving parts so I prefer it.
On Wed, Aug 1, 2012 at 2:54 PM, Travis Crawford
traviscrawf...@gmail.com wrote:
Oh interesting - you're saying instead of running a single
HiveMetaStore
Interesting - this issue would certainly go away with local mode as
there's no thrift call to fail. I'd very much prefer to run HMS as a
centralized service though.
Thanks for the info - I'll have to take a look at how the thrift
client handles timeouts/reconnects/etc.
--travis
On Wed, Aug 1,
I am trying to load data into the date partition. My data got
successfully loaded for 20120709, but when I tried to load the data for
20120710, I am seeing the below exception. Can anyone suggest why it is
happening like this?
Loading data to table data_quality partition
Hi Techy, this error usually appears when the user executing the query does
not have permissions on the source or target folder. If you create a regular
table (not external), it is probable that you do not have permission to write into
/user/hive
Regarding your earlier question, I am using Snappy to compress the
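A quick way to check (and, if needed, open up) the warehouse permissions is
sketched below; the path is the common default and the commands are standard
`hadoop fs` shell commands:

```shell
# Inspect ownership and permissions on the warehouse directory.
hadoop fs -ls /user/hive/warehouse
# If the querying user lacks write access, widen it (or chown to that user).
hadoop fs -chmod -R g+w /user/hive/warehouse
```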