Side channel for transform stats

2010-09-02 Thread Yun Huang Yong
I'm using a TRANSFORM mapper script to expand web logs and am wondering if there is a recommended way to capture side channel stats in an accurate manner, for example if I wanted to count the number of 404 entries. What I want to avoid is writing this data out to some side channel which, if a map
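
One possible approach to the question above, sketched under assumptions: Hadoop Streaming lets a script update job counters by writing lines of the form `reporter:counter:<group>,<counter>,<amount>` to stderr, and the framework aggregates counters from successful task attempts only. Whether Hive's TRANSFORM forwards these stderr lines as counters depends on the Hive version and is an assumption to verify. The function name `expand_logs`, the tab-separated layout with the HTTP status in the second field, and the counter names `weblog` and `status_404` are all hypothetical.

```python
import sys


def expand_logs(lines, out=sys.stdout, err=sys.stderr):
    """Pass tab-separated log rows through, counting rows whose status is 404."""
    not_found = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: the HTTP status is the second tab-separated field.
        if len(fields) > 1 and fields[1] == "404":
            not_found += 1
        # A real mapper would expand the row here; this sketch passes it through.
        out.write("\t".join(fields) + "\n")
    if not_found:
        # Hadoop Streaming counter syntax: reporter:counter:<group>,<counter>,<amount>
        err.write("reporter:counter:weblog,status_404,%d\n" % not_found)
    return not_found


if __name__ == "__main__":
    expand_logs(sys.stdin)
```

Reporting the total once at the end of the input, rather than once per row, keeps stderr traffic small; the count still travels with the task attempt instead of being written to an external store.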

Do you set hadoop.job.ugi in Hive? If so, we need to talk.

2010-09-02 Thread Carl Steinbach
Hi, In Hadoop 0.20 and older versions you can set the variable hadoop.job.ugi to basically change your username to whatever you like. This allows you to easily fudge HDFS permissions and run your mapreduce jobs as another user. This is a handy trick if a bunch of people want to share one HDFS

GroupByOperator class is confusing; it will result in out of memory

2010-09-02 Thread lei liu
I find that GroupByOperator caches the aggregation results of different keys. Please look at the code below: AggregationBuffer[] aggs = null; boolean newEntryForHashAggr = false; keyProber.hashcode = newKeys.hashCode(); // use this to probe the hashmap keyProber.keys = newKeys; //

Re: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver

2010-09-02 Thread Robert Hennig
Hello, Thanks Shrijeet for your answer. I found an exception in a task log which results from a casting error: Caused by: java.lang.ClassCastException: org.apache.hadoop.mapred.FileSplit cannot be cast to com.adconion.hadoop.hive.DataLogSplit at

Re: Having a Connections Leak with the Hive Server

2010-09-02 Thread Dave Brondsema
Scott, after re-reading your original email, I'm thinking maybe we didn't have the same problem. Hive crashed for us when it ran out of file descriptors; it didn't hang. Nonetheless, an upgrade may help. On Wed, Sep 1, 2010 at 10:24 AM, Scott Whitecross swhitecr...@gmail.com wrote: Thanks

RE: Having a Connections Leak with the Hive Server

2010-09-02 Thread Bennie Schut
We had some leaking file descriptors which ended up being a problem in Hadoop. They fixed it in 0.21 but not in older versions. There is a workaround for Hive which we successfully use. By adding this to your hive-site.xml: !-- workaround for connection leak problem fixed in HADOOP-5476 but

Re: Do you set hadoop.job.ugi in Hive? If so, we need to talk.

2010-09-02 Thread Edward Capriolo
On Thu, Sep 2, 2010 at 4:14 AM, Carl Steinbach c...@cloudera.com wrote: Hi, In Hadoop 0.20 and older versions you can set the variable hadoop.job.ugi to basically change your username to whatever you like. This allows you to easily fudge HDFS permissions and run your mapreduce jobs as another

Does Hive trunk version work with Hadoop release 0.21.0?

2010-09-02 Thread Ping Zhu
Dear all, I did not find any related statements regarding this issue either online or within the trunk documentation. Can anyone confirm this? Thanks. Ping

RE: how to call fetchN method in HiveServerHandler class by JDBC

2010-09-02 Thread Steven Wong
The Hive JDBC driver is calling fetchOne. The driver would need to be changed to call fetchN instead. Steven From: lei liu [mailto:liulei...@gmail.com] Sent: Tuesday, August 31, 2010 8:58 PM To: hive-user@hadoop.apache.org Subject: how to call fetchN method in HiveServerHandler class by JDBC

Hive launching next job in sequence - even if preceding one fails

2010-09-02 Thread Shrijeet Paliwal
Hi, Hive Version: 0.5 Hadoop Version: 0.20 Job submitted through: Hive Command Line I noticed this today after upgrading to hive 0.5 (we moved up from 0.4 very recently). A complex query is made of four map reduce jobs as indicated in the log entries below. The first job fails, but the second in

Re: Does Hive trunk version work with Hadoop release 0.21.0?

2010-09-02 Thread Neil Xu
Not yet, the trunk version can only work with Hadoop 0.17.*, 0.18.*, 0.19.*, and 0.20.* Neil 2010/9/3 Ping Zhu p...@sharethis.com Dear all, I did not find any related statements regarding this issue either online or within the trunk documentation. Can anyone confirm this? Thanks. Ping