Hive Join uses a single map-reduce job.
Zheng
On Thu, Feb 19, 2009 at 12:23 AM, Qing Yan qing...@gmail.com wrote:
hive.groupby.skewindata=false That's exactly what I am looking for. Does
this parameter also apply for Join?
--
Yours,
Zheng
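For readers following the thread: the parameter is set from the Hive CLI before running the query. A minimal sketch, assuming the standard `set` command; the table and column names below are made up for illustration:

```sql
-- Enable the skew-tolerant two-stage plan for GROUP BY.
-- As noted above, this applies to group-by only; a Hive join
-- runs as a single map-reduce job.
set hive.groupby.skewindata=true;

-- 'page_views' and 'user_id' are hypothetical names.
SELECT user_id, COUNT(1)
FROM page_views
GROUP BY user_id;
```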
Say I have some bad/ill-formatted records in the input, is there a way to
configure the default Hive parser to discard those records directly (e.g.
when an integer column gets a string)?
Besides, is the new skip-bad-records feature in Hadoop 0.19 accessible from Hive?
It is quite a handy feature in the real world.
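A partial workaround that exists today, assuming the default SerDe's behavior of mapping unparseable values to NULL: when a field cannot be parsed as the declared type (e.g. a string in an INT column), Hive returns NULL for that column instead of failing, so bad rows can be filtered explicitly. Table and column names here are hypothetical:

```sql
-- 'logs' and 'bytes_sent' are made-up names; bytes_sent is declared INT.
-- Rows whose raw field was not a valid integer come back as NULL
-- (assuming the default SerDe), so they can be filtered out:
SELECT *
FROM logs
WHERE bytes_sent IS NOT NULL;
```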
Hi Qing,
That's a good idea. Can you open a jira?
There are lots of details to settle before we can add that feature to Hive. For
example, how to specify the largest amount of data corruption that can
be accepted, by absolute number or percentage, etc. What about half-corrupted
records in case we only need
The best way to answer this is that all Hadoop components work
remotely, assuming you have the proper configuration and library files
(the same ones from the remote cluster).
I attached a HiveLet (made-up term). It was my first API testing
program. It is more or less a 'one shot': run the query
Hive supports both a Thrift service and a partial JDBC interface.
Check out sample usage in service/src/test and jdbc/src/test. I can help you
set up the thrift service if you have problems.
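A minimal sketch of the JDBC route, assuming the driver class and URL scheme shipped in early Hive releases (`org.apache.hadoop.hive.jdbc.HiveDriver`, `jdbc:hive://host:port/db`); the host, port, and query table are placeholders, and actually connecting requires a running hiveserver plus the Hive JDBC jar on the classpath:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveJdbcSketch {
    /** JDBC URL scheme as used by early Hive releases (an assumption). */
    static String hiveUrl(String host, int port, String db) {
        return "jdbc:hive://" + host + ":" + port + "/" + db;
    }

    /** Needs a live hiveserver and the Hive JDBC jar on the classpath. */
    static void runQuery(String host, int port, String sql) throws Exception {
        // Driver class name from early Hive releases; verify against your build.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection con =
            DriverManager.getConnection(hiveUrl(host, port, "default"), "", "");
        Statement stmt = con.createStatement();
        ResultSet rs = stmt.executeQuery(sql); // e.g. "SELECT * FROM src LIMIT 10"
        while (rs.next()) {
            System.out.println(rs.getString(1));
        }
        con.close();
    }

    public static void main(String[] args) {
        // The connection itself needs a live server; just show the URL here.
        System.out.println(hiveUrl("localhost", 10000, "default"));
    }
}
```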
On 2/19/09 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
The best way to answer this is
I attached a HiveLet (Made up term)
That's a cool name!
Hi Guys,
That's a big head start. It looks like I need to:
1) Configure Hive to use Derby as the metastore database
2) Launch the hive thrift service with bin/hive --service hiveserver
3) Using the thrift api, I should be able to send queries from remote hosts
Am I missing anything from there?
Thanks!
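The steps above, sketched as shell commands against a Hive checkout (paths and the default port 10000 are assumptions):

```shell
# Step 1: Derby is the out-of-the-box metastore; you only need to touch
# javax.jdo.option.ConnectionURL in conf/hive-site.xml for a non-default
# location of the metastore_db directory.

# Step 2: launch the Thrift server (default port 10000 is an assumption):
bin/hive --service hiveserver

# Step 3: point a Thrift (or JDBC) client from a remote host at
# <server-host>:10000.
```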
Hi Zheng,
I have opened a JIRA (HIVE-295).
IMHO there are three levels at which errors can be handled:
1) Always fail. One bad record and the whole job fails, which is the current Hive
behavior.
2) Always succeed. Ignore bad records (saving them somewhere to allow
further analysis) and the job still succeeds.
3) Fail only past a threshold. Accept bad records up to a configured limit
(an absolute count or a percentage) and fail the job once the limit is exceeded.
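To make the threshold idea concrete, here is a plain-Java sketch of such a policy; this is NOT Hive code, and the class and method names are invented for illustration. Records that fail to parse are counted, and processing aborts once the count exceeds an absolute or a percentage limit:

```java
// Plain-Java illustration of a "fail past a threshold" bad-record policy.
// NOT Hive code -- names and the policy shape are invented for this sketch.
public class BadRecordPolicy {
    private final long maxBad;        // absolute cap on bad records
    private final double maxBadRatio; // cap as a fraction of records seen
    private long seen = 0;
    private long bad = 0;

    public BadRecordPolicy(long maxBad, double maxBadRatio) {
        this.maxBad = maxBad;
        this.maxBadRatio = maxBadRatio;
    }

    /** Consumes one raw input row; returns false if the row failed to parse. */
    public boolean record(String raw) {
        seen++;
        boolean ok;
        try {
            Long.parseLong(raw.trim()); // stand-in for real record parsing
            ok = true;
        } catch (NumberFormatException e) {
            ok = false;
            bad++;
        }
        // Abort once either limit is crossed.
        if (bad > maxBad || (double) bad / seen > maxBadRatio) {
            throw new IllegalStateException(
                "too many bad records: " + bad + " of " + seen);
        }
        return ok;
    }
}
```

A caller would feed every input row through `record()` and treat the exception as the job-level failure signal; choosing the limits (and whether the ratio check should apply before some minimum record count) is exactly the kind of detail raised earlier in the thread.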