cebook.com/mailman/listinfo/hive-devel>] On
Behalf Of Joydeep Sen Sarma
Sent: Tuesday, October 14, 2008 12:17 AM
To: Zheng Shao; hive
Subject: Re: [hive-devel] A question about implicit type conversions
Dunno. So I guess the number type hierarchy is pretty clear.
From whatever
Can you please send the output of 'describe extended activity_test'? This will
help us understand what's happening with all the create table parameters.
Also - as a sanity check - can you please check hadoop dfs -cat /data/sample/*
(to make sure data got loaded/moved into that dir)
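Concretely, the two checks (using the names from this thread):
    hive> DESCRIBE EXTENDED activity_test;
    $ hadoop dfs -cat /data/sample/*
The first shows the table's location, serde, and create-time parameters; the second confirms the data files actually landed in that directory.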
-Origi
Hi Johann,
Create external table with the 'location' clause set to your data would be the
way to go. However - Hive has its own directory naming scheme for partitions
('key=value'). So just pointing to a directory with
subdirectories would not work.
So right now, in this case, one would have to move or copy the
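As a sketch of the convention (hypothetical table name, paths, and partition value; assumes ALTER TABLE ... ADD PARTITION is available in your build):
    hive> CREATE EXTERNAL TABLE logs (line STRING)
        > PARTITIONED BY (ds STRING)
        > LOCATION '/user/johann/logs';
    hive> ALTER TABLE logs ADD PARTITION (ds='2008-11-28');
Hive then expects that partition's data under /user/johann/logs/ds=2008-11-28/, so a pre-existing subdirectory named, say, 2008-11-28/ would have to be renamed or its data copied.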
dered?
Josh
On Nov 28, 2008, at 3:00 PM, Joydeep Sen Sarma wrote:
> Hi Johann,
>
> Create external table with the 'location' clause set to your data
> would be the way to go. However - Hive has its own directory naming
> scheme for partitions ('key=value'). So
Yes - from the jiras - bz2 is splittable in hadoop-0.19.
Hive doesn't have to do anything to support this (although we haven't tested
it). Please mark your tables as 'stored as textfile' (not sure if that's the
default). As long as the file has a bz2 extension and hadoop has the codec that
matches th
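A minimal sketch (hypothetical table and path; assumes the .bz2 file is already in HDFS):
    hive> CREATE TABLE raw_logs (line STRING) STORED AS TEXTFILE;
    hive> LOAD DATA INPATH '/data/logs/access.log.bz2' INTO TABLE raw_logs;
Hadoop matches the .bz2 extension against its registered codecs, so no Hive-side configuration should be needed.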
Hi Paradisehi
The issue is that the default file system uri obtained from the hadoop config
variable fs.default.name (from hadoop-default/site.xml) does not match the uri
that you are loading from.
As Zheng mentioned - can you please use
hdfs://namenode:x/test/shixing/log - where 'namenode:x
rking.
Josh
On Dec 2, 2008, at 10:30 AM, Joydeep Sen Sarma wrote:
Yes - from the jiras - bz2 is splittable in hadoop-0.19.
Hive doesn't have to do anything to support this (although we haven't tested
it). Please mark your tables as 'stored as textfile' (not sure if that
This is done already
Use: add file
This is the same as the -file argument in hadoop streaming. You can refer to this
file by its last component in the 'USING' clause.
list file
will show the list of currently added files
delete file
will delete it from the current session
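Putting the three commands together in one session (hypothetical script and table names):
    hive> add file /tmp/my_mapper.py;
    hive> list file;
    hive> SELECT TRANSFORM (col1) USING 'my_mapper.py' AS (out1) FROM t;
    hive> delete file /tmp/my_mapper.py;
Note that the USING clause names the file by its last path component only.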
From: Zh
Please use hive from http://svn.apache.org/repos/asf/hadoop/hive/trunk/
This should work with hadoop-0.19.
Will update UserGuide with this info ..
From: Bill Au [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 04, 2008 1:59 PM
To: hive-user@hadoop.apache.org
S
Hi Johan - so the key and value class types are RecordIO classes?
This may need some dev work. A few things:
- traditionally our SerDes have ignored the keys altogether (the row is
embedded in the value). What are the semantics for your case?
- the jute code was written for an older version of the se
/jira/browse/HIVE-126
Thanks in advance!
/Johan
Joydeep Sen Sarma wrote:
> Hi Johan - so the key and value class types are RecordIO classes?
>
> This may need some dev work. A few things:
> - traditionally our SerDes have ignored the keys altogether (the row is
> embedded in
The jobid is printed out when the session is in non-silent execution mode.
Since there's no structured interface - I had tried to have structured data
emitted as key=value in the output stream. The relevant output emitted here is
from:
console.printInfo("Starting Job = " + rj.getJobID() + ", Trackin
We use MySQL as the metastore DB server.
Prasad can give a more detailed response when he's back - but here are the
relevant entries from our hive-default.xml:
javax.jdo.option.ConnectionURL
jdbc:mysql://xxx.yyy.facebook.com/hms_during_upgrade?createDatabaseIfNotExist=true
javax.jdo.option.Conn
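Reconstructed as they would appear in the configuration XML (the ConnectionDriverName property is an assumption - the original mail is truncated after 'javax.jdo.option.Conn'):
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://xxx.yyy.facebook.com/hms_during_upgrade?createDatabaseIfNotExist=true</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>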
Hive should work with 0.18.
However - it needs to be specifically compiled with 0.18 to work. Please do
'ant -Dhadoop.version=0.18.0 package' from the source tree root to get jar
files that work with 0.18.
From: Martin Matula [mailto:matu...@gmail.com]
Sent: Sunday
We have done some preliminary work with indexing - but that's not the focus
right now and no code is available in the open source trunk for this purpose. I
think it's fair to say that hive is not optimized for online processing right
now. (and we are quite some ways off from columnar storage).
ee hive go toward hbase or katta. What is the long term vision for hive?
Josh
On Dec 14, 2008, at 1:06 PM, Joydeep Sen Sarma wrote:
We have done some preliminary work with indexing - but that's not the focus
right now and no code is available in the open source trunk for this purpose. I
erent fields of the same
rows, but it's not very clear what the best way to do that is.
Zheng
On Sun, Dec 14, 2008 at 3:51 PM, Josh Ferguson
<j...@besquared.net> wrote:
What would columnar organization look like and what are the benefits and
drawbacks to this?
Josh
O
be to implement this using a message queue
(publish/subscribe system). We could leverage ActiveMQ or something similar;
that would be a bit more heavyweight, but potentially people can develop
advanced monitoring applications around it.
Ashish
____________
We should be able to control this (specify exact mapper count) once hadoop-4565
and hive-74 are resolved (these are being worked on actively).
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Sunday, January 11, 2009 9:16 PM
To: hive-user@hadoop.apache.org
Subject
If you have a file of this type already - loading it into Hive is trivial:
- create table xxx (...) stored as sequencefile
- load data inpath 'yyy' into table xxx
assuming yyy is already in hdfs. See the wiki for additional create table
documentation: http://wiki.apache.org/hadoop
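Spelled out as a runnable sketch (hypothetical column types; 'yyy' stands for an existing HDFS path):
    hive> CREATE TABLE xxx (key INT, value STRING) STORED AS SEQUENCEFILE;
    hive> LOAD DATA INPATH 'yyy' INTO TABLE xxx;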
Please give a full uri - like hdfs://xxx.yyy.zzz:9000/user/...
Where xxx.yyy.zzz is the same namenode/hdfs instance where you are planning to
store the hive tables.
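For example (hypothetical host, port, path, and table):
    hive> LOAD DATA INPATH 'hdfs://xxx.yyy.zzz:9000/user/jeremy/input.txt' INTO TABLE t;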
From: Jeremy Chow [mailto:coderp...@gmail.com]
Sent: Monday, January 12, 2009 6:17 PM
To: hive-user@
Hey Jeremy -
Looks like this was more trouble than it should have been. Can you help us by
filing a couple of Jiras on the expected behavior:
1. should 'location ..' clause in create table force people to specify uri?
Or should it use fs.default.name from the hadoop configuration and tell the user that
i
Can you do a describe extended on the ip_locations table?
it will have a location string. It's possible that the location spec in it does
not have full uri (perhaps the table was created before the warehouse.dir was
filled in?)
some of these issues were addressed in a jira fixed by Prasad a couple of
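For reference, the check looks like:
    hive> DESCRIBE EXTENDED ip_locations;
In the output, look at the 'location:' field of the storage descriptor - a bare path there (no hdfs://host:port prefix) would be a sign of this problem.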
There was a small change to the Load command a couple of days back (to fix a
different problem) and it's triggering this.
Can you apply the attached patch and check whether it works?
There's no extra logging here - so looking at the code was the only option ..
From
Moral of the story - don't google around too much before writing code.
-Original Message-
From: Raghu Murthy [mailto:ra...@facebook.com]
Sent: Friday, January 23, 2009 4:01 PM
To: hive-user@hadoop.apache.org
Subject: Re: equijoin with multiple columns?
We could add trim to hive load, but
I would say package everything up in hadoop/lib to be sure. (Even the jetty
stuff is now required by the hive web server, I think.)
From: Prasad Chakka [mailto:pra...@facebook.com]
Sent: Sunday, January 25, 2009 10:08 AM
To: hive-user@hadoop.apache.org
Subject: Re:
Hi Josh,
Copying a large number of small map outputs can take a while. Can't say why the
tasktracker is not running more than one mapper.
We are working on this. hadoop-4565 is a jira to create splits that cross
files while preserving locality. Hive-74 will use 4565 on the hive side to control
numb
Searching my computer, I find Namit quoting: "ansi sql semantics are that the
filter is executed after the join."
So there you go ..
In the same mail he suggested putting the filter condition for the table inside
the ON clause for execution before the join. So I guess you might want to try:
SELECT
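A sketch of that rewrite with hypothetical tables a and b (the original query in this thread is truncated):
    SELECT a.key, a.val, b.val
    FROM a LEFT OUTER JOIN b
    ON (a.key = b.key AND b.ds = '2009-01-23');
With the condition inside ON, rows of b are filtered before the join rather than after it.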
Only for count(1) though. For other aggregates it still does two MR jobs.
See hive-223 - it does what Qing is asking for. Still not committed - so you can
try out the patch. One MR job with the option mentioned below. It will also do one MR
job with hive.groupby.skewindata=false for non map-side aggregates as well.
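As a sketch of that single-job path (hypothetical table; assumes the hive-223 patch is applied):
    hive> set hive.groupby.skewindata=false;
    hive> SELECT key, count(1) FROM t GROUP BY key;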
__
There are certain classes of errors (out-of-memory types) that cannot be handled
within Hive. For such cases - doing it in Hadoop would make sense. The other
case is handling errors in user scripts. This is especially tricky - and we
would need to borrow/use hadoop techniques for retry during the
Hi Min,
One possibility is to have your data sets stored in Hive - but for your map-reduce
programs - use the Hive Java APIs (to find input files for a table, to extract
rows from a table - etc.). That way at least the metadata about all data is
standardized in Hive. If you want to go down this ro
Unfortunately - #1 is not current Hive behavior.
We are in a weird in-between state where the deserializer exceptions are
ignored but execution exceptions are not. (There's a counter that keeps track
of deserializer errors.)
There's a related question of whether we should verify the schema of t
. What is
your solution then?
BTW, is Hive only run as a thrift service at Facebook?
On Mon, Feb 23, 2009 at 12:23 PM, Joydeep Sen Sarma
<jssa...@facebook.com> wrote:
Hi Min,
One possibility is to have your data sets stored in Hive - but for your map-reduce
programs - use the
We already pick up all jars from auxlib/ (both for the client side and execution).
Also modifiable via the -auxpath switch.
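A usage sketch (hypothetical jar path; check whether your build spells the flag -auxpath or --auxpath):
    $ bin/hive -auxpath /path/to/my_udfs.jar
Jars dropped into auxlib/ under the Hive install directory are picked up with no flag at all.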
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Monday, February 23, 2009 8:29 PM
To: hive-user@hadoop.apache.org
Subject: Re: how to store UDFs in
lback.)
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Monday, February 23, 2009 10:16 PM
To: hive-user@hadoop.apache.org
Subject: RE: how to store UDFs in Hive system?
We already pick up all jars from auxlib/ (both for the client side and execution).
Also modifiable via the -au
hive-user@hadoop.apache.org
Subject: Re: How to simplify our development flow under the means of using Hive?
Hi Joydeep,
What drives your batch-processing jobs? Data, a crontab script, or
your shell script?
On Tue, Feb 24, 2009 at 9:56 AM, Joydeep Sen Sarma
<jssa...@faceboo
r users to manage UDFs. If Hive takes over all
UDF registration, then it might be a pain for users to upgrade the jars
containing UDFs.
Zheng
On Mon, Feb 23, 2009 at 10:40 PM, Joydeep Sen Sarma
<jssa...@facebook.com> wrote:
My bad. Obviously this doesn't work (need to call
We can write a small example program to get the files for a table/partition, to
open a table using its deserializer, to get rows from it, etc.
This would help people write java map-reduce on hive tables.
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Tuesday, February 2
add file adds the files to the distributed cache. It's the same as the -files
option in hadoop streaming (and hadoop in general).
So you can use this option.
From: Min Zhou [coderp...@gmail.com]
Sent: Thursday, February 26, 2009 5:53 PM
To: hive-user@hadoop.apache.
Yeah - we definitely want to convert it to an MFU-type flush algorithm.
If someone wants to take a crack at it before we can get to it - that would be
awesome
From: Namit Jain [mailto:nj...@facebook.com]
Sent: Friday, February 27, 2009 1:59 PM
To: hive-user@hadoop
There's also a jira open to ignore (up to a threshold) exceptions from the
execution engine. That would be easy to implement and help fix this particular
scenario as well.
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Sunday, March 01, 2009 1:35 AM
To: hive-user@
there already a JIRA for this improvement?
On 2/27/09 2:22 PM, "Joydeep Sen Sarma" wrote:
Yeah - we definitely want to convert it to an MFU-type flush algorithm.
If someone wants to take a crack at it before we can get to it - that would be
awesome
__
Can you describe the format of the input file a bit more?
Is it a set of serialized thrift records of the same class type? The current
ThriftDeserializer expects serialized records to be embedded inside a
BytesWritable (we make sure of this during the loading process) - but probably
not the
and loading it into Hive.. I just can't figure
out how to tell hive that the input data is a bunch of serialized thrift
records (all of the records are the "struct" type) in a TFileTransport.
Hopefully this makes sense...
-Steve
____________
From:
does Hive throw an
error?
I saw the JSON function but I think that delimited maps/lists are a better
solution because we don't need nested maps/lists.
Thanks again!
Steve Corona
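For reference, a delimited map column can be declared roughly like this (hypothetical table; the delimiter characters are assumptions):
    CREATE TABLE events (attrs MAP<STRING,STRING>)
    ROW FORMAT DELIMITED
      FIELDS TERMINATED BY '\t'
      COLLECTION ITEMS TERMINATED BY ','
      MAP KEYS TERMINATED BY ':';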
________
From: Joydeep Sen Sarma [jssa...@facebook.com]
Sent: Saturday, Ma
(also been reading up on this code a bit just now)
That's weird. It seems to be using TThreadPoolServer, which seems to just
service all requests from a single connection in one thread (and uses the same
processor, I assume, which seems to initialize the session state in the interface
construct
ed to use a new thread for each connection.
From: Joydeep Sen Sarma
Reply-To:
Date: Mon, 9 Mar 2009 20:16:22 -0700
To:
Subject: RE: thread confinement session state
(also been reading up on this code a bit just now)
That's weird. It seems to be using TThread
t work here?
From: Joydeep Sen Sarma
Reply-To:
Date: Mon, 9 Mar 2009 20:44:02 -0700
To:
Subject: RE: thread confinement session state
Min is right. This seems a little screwed up.
The Thrift Interface handler is constructed just once for the lifetime of the
HiveServer. The sessio
uming he is using the same code as the MetaStore server. AFAIK,
TThreadPoolServer is supposed to use a new thread for each connection.
________
From: Joydeep Sen Sarma <jssa...@facebook.com>
Reply-To: <hive-user@hadoop.apache.org>
Date: Mon, 9 Mar
bject: Re: thread confinement session state
The server just stays stuck at the start point.
On Tue, Mar 10, 2009 at 1:36 PM, Joydeep Sen Sarma
<jssa...@facebook.com> wrote:
Attaching a small patch. Can you try and see if this works? (it compiles and
passes the hiveserver test)
It doe
high. I guess it
will cause a StackOverflowError when the number of connections reaches a certain amount.
On Tue, Mar 10, 2009 at 2:16 PM, Min Zhou
<coderp...@gmail.com> wrote:
No connections right now; the server cannot start properly.
On Tue, Mar 10, 2009 at 2:07 PM, Joydeep Sen Sarma
ht now; the server cannot start properly.
On Tue, Mar 10, 2009 at 2:07 PM, Joydeep Sen Sarma
<jssa...@facebook.com> wrote:
Hey - not able to understand - does this mean it didn't work? Can you explain in
more detail what you did (how many connect
Hey - not sure if anyone responded.
Sequencefiles are the way to go if you want parallelism on the files as well
(since gzip-compressed files cannot be split).
One simple way to do this is to start with text files, build (potentially an
external) table on them - and load them into another table th
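A hedged sketch of that flow (hypothetical names; the compression setting is the one asked about just below):
    hive> CREATE EXTERNAL TABLE raw (line STRING) STORED AS TEXTFILE LOCATION '/data/raw';
    hive> CREATE TABLE packed (line STRING) STORED AS SEQUENCEFILE;
    hive> set hive.exec.compress.output=true;
    hive> INSERT OVERWRITE TABLE packed SELECT line FROM raw;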
with the hive setting (hive.exec.compress.output=true)?
>
> Besides that, I wonder how Hive deals with the key/value records in a
> sequence file.
>
> Bob
>
> Joydeep Sen Sarma schrieb:
> > Hey - not sure if anyone responded.
> >
>
eping Data compressed
Joydeep Sen Sarma schrieb:
> Can't reproduce this. Can you run explain on the insert query and post the
> results?
>
I'll do this, but meanwhile I figured out that one doesn't need sequence
files to get compression. I just stay with textfiles:
1. hadoop p
Yeah - that's really really surprising.
The row count is reported using hadoop counters - we haven't seen any
discrepancies so far (we use hadoop-17) - but that's one possibility.
But the count(1) is the more important one to resolve - that should definitely
be correct. Are the count results no
ble:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.mapred.SequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
name: t15
Joydeep Sen Sarma schrieb:
> Can't reprod
: Friday, March 20, 2009 9:50 AM
To: hive-user@hadoop.apache.org
Subject: Re: getting different row counts on each import
Joydeep Sen Sarma schrieb:
> Yeah - that's really really surprising.
>
> The row count is reported using hadoop counters - we haven't seen any
> discrepancies
PM
To: hive-user@hadoop.apache.org
Subject: Re: getting different row counts on each import
Could this be related to Hadoop counters for compressed data being wonky?
On Fri, Mar 20, 2009 at 10:03 AM, Joydeep Sen Sarma
<jssa...@facebook.com> wrote:
Ok - is this a correct summary?:
Hey - take a look at the patch for hive-333. In general this kind of file
cannot be split by hadoop (since record boundaries are unknown). I would
suggest converting these files into sequencefiles with binary records stuffed
inside BytesWritables. Hive-333 has an example program that does this fo
This should work. What version of hive are you running? (It almost seems that
the add functionality is not implemented - which it has been forever. Hope you
aren't using hive from the contrib section of hadoop-19.)
From: Manhee Jo [mailto:j...@nttdocomo.com]
Sent:
t seems that the file name is not quoted.
We need to use either a single or double quotation mark (' or ") to quote the
whole path.
Zheng
On Wed, May 13, 2009 at 8:40 PM, Joydeep Sen Sarma
<jssa...@facebook.com> wrote:
This should work. What version of hive are you run
Hi folks,
I have put up a short tutorial on running SQL queries on EC2 against files in
S3 using Hive and Hadoop. Please find it here:
http://wiki.apache.org/hadoop/Hive/HiveAws/HivingS3nRemotely
Some example data and queries (from TPCH benchmark) are also made available in
S3.
Cc'ing core-us
Yeah - we will get the 0.20 patch committed before 0.4
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Sunday, June 14, 2009 7:20 PM
To: hive-user@hadoop.apache.org
Subject: Re: Query execution error on cast w/ lazyserde w/ join ...
There is currently no way to g
Sorry - this is also needed as part of hive-487:
In hadoop-20 - the -libjars option has to come after the jar file/class
Please try applying this patch to bin/ext/cli.sh
--- cli.sh (revision 789726)
+++ cli.sh (working copy)
@@ -10,7 +10,7 @@
exit 3;
fi
- exec $HADOOP jar $AUX_JARS_CMD_LIN
Hey - not sure there was a reply. There's likely to be a fuller stack trace in
the hive log file (whose path should be mentioned somewhere in the config
files). That info would help in debugging this further.
From: Neal Richter [nrich...@gmail.com]
Sent: S
I hate this message: 'THIS PAGE WAS MOVED TO HIVE XDOCS! DO NOT EDIT! Join
Syntax'
Why must edits to the wiki be banned if there are xdocs? Hadoop has both.
There will always be things that are not captured in xdocs. It's pretty sad to
discourage free-form edits by people who want to contribute
Lei - not sure I understand the question. I tried to document the relationship
between hive, MR and local-mode at
http://wiki.apache.org/hadoop/Hive/GettingStarted#Hive.2C_Map-Reduce_and_Local-Mode
recently. Perhaps you have already read it.
Regarding whether local mode can be run on windows or
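(For reference, the mechanism that wiki section describes for forcing local mode - version-dependent, so treat this as a sketch:
    hive> SET mapred.job.tracker=local;
after which the query runs in a single local JVM instead of being submitted to the cluster.)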