Re: Is there a way to hint Hive the reduce key will be evenly distributed?

2009-02-19 Thread Qing Yan
hive.groupby.skewindata=false: that's exactly what I am looking for. Does this parameter also apply to joins?

Re: Is there a way to hint Hive the reduce key will be evenly distributed?

2009-02-19 Thread Zheng Shao
Hive Join uses a single map-reduce job.
Zheng
On Thu, Feb 19, 2009 at 12:23 AM, Qing Yan qing...@gmail.com wrote:
hive.groupby.skewindata=false That's exactly what I am looking for. Does this parameter also apply for Join?
--
Yours, Zheng

Error input handling in Hive

2009-02-19 Thread Qing Yan
Say I have some bad or ill-formatted records in the input. Is there a way to configure the default Hive parser to discard those records directly (e.g. when an integer column gets a string)? Besides, is the new skip-bad-records feature in Hadoop 0.19 accessible in Hive? It is quite a handy feature in the real

Re: Error input handling in Hive

2009-02-19 Thread Zheng Shao
Hi Qing, That's a good idea. Can you open a jira? There are lots of details to settle before we can add that feature to Hive. For example, how to specify the maximum amount of data corruption that can be tolerated: as an absolute number of records, as a percentage, etc. What about half-corrupted records in case we only need
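The threshold question raised here can be made concrete with a small sketch. This is plain Python, not Hive code: the function `exceeds_tolerance` and its parameters `max_bad_count` and `max_bad_fraction` are hypothetical names, used only to illustrate the choice between an absolute limit and a percentage limit.

```python
from typing import Optional


def exceeds_tolerance(bad: int, total: int,
                      max_bad_count: Optional[int] = None,
                      max_bad_fraction: Optional[float] = None) -> bool:
    """Return True if the corrupt-record count is over either limit.

    Hypothetical helper: both limits are optional, and a limit left
    as None is simply not checked.
    """
    # Absolute limit: more than N corrupt records in total.
    if max_bad_count is not None and bad > max_bad_count:
        return True
    # Relative limit: corrupt records as a fraction of all records read.
    if max_bad_fraction is not None and total > 0 and bad / total > max_bad_fraction:
        return True
    return False
```

An absolute limit is easy to reason about for small inputs, while a fraction scales with the input size; allowing both lets a job stop on whichever is hit first.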

Re: Is there a network interface?

2009-02-19 Thread Edward Capriolo
The best way to answer this is that all Hadoop components work remotely, assuming you have the proper configuration and library files (the same ones from the remote cluster). I attached a HiveLet (Made up term). It was my first API testing program. It is more or less a 'One Shot', run the query

Re: Is there a network interface?

2009-02-19 Thread Raghu Murthy
Hive supports both a Thrift service and a partial JDBC interface. Check out the sample usage in service/src/test and jdbc/src/test. I can help you set up the Thrift service if you have problems.
On 2/19/09 2:16 PM, Edward Capriolo edlinuxg...@gmail.com wrote: The best way to answer this is

Re: Is there a network interface?

2009-02-19 Thread Dhruba Borthakur
"I attached a HiveLet (Made up term)"
That's a cool name!

Re: Is there a network interface?

2009-02-19 Thread Gary Richardson
Hi Guys, That's a big head start. It looks like I need to:
1) Configure Hive to use Derby as a meta db
2) Launch the Hive Thrift service with bin/hive --service hiveserver
3) Use the Thrift API to send queries from remote hosts
Am I missing anything from there? Thanks! On

Re: Error input handling in Hive

2009-02-19 Thread Qing Yan
Hi Zheng, I have opened a Jira (HIVE-295). IMHO there are three ways errors can be handled:
1) Always fail. One bad record and the whole job fails, which is the current Hive behavior.
2) Always succeed. Bad records are ignored (and saved somewhere to allow further analysis) and the job still succeeds.
3)
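The first two options can be sketched in a few lines. This is plain Python rather than Hive code, and `parse_int_column` and its `strict` flag are hypothetical names chosen for illustration. The third option is cut off in the archive, so it is not covered here.

```python
from typing import Iterable, List, Tuple


def parse_int_column(lines: Iterable[str], strict: bool) -> Tuple[List[int], List[str]]:
    """Parse one integer per input line.

    strict=True  -> option 1: the first bad record raises and the job fails.
    strict=False -> option 2: bad records are set aside for later analysis.
    """
    good: List[int] = []
    rejected: List[str] = []
    for line in lines:
        try:
            good.append(int(line.strip()))
        except ValueError:
            if strict:
                raise  # option 1: propagate the error and abort
            rejected.append(line)  # option 2: keep the raw record, carry on
    return good, rejected
```

For example, `parse_int_column(["1", "abc", "3"], strict=False)` returns `([1, 3], ["abc"])`, while the same input with `strict=True` raises `ValueError`.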