Re: way to add custom udf jar in hadoop 2.x version

2014-12-31 Thread Binglin Chang
On the HiveServer machine, create a dir HIVE_HOME/auxlib and put all extension jars there. When HiveServer2 starts, it will automatically pick up all jars in this directory and set hive.aux.jars.path properly, so every new session will add those jars automatically and you don't need to add those jars
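For context, a minimal sketch of the kind of class such an extension jar might contain, assuming the classic org.apache.hadoop.hive.ql.exec.UDF API (the package, class, and function names below are hypothetical):

    package com.example.hive;  // hypothetical package

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // A trivial UDF; once its jar sits in HIVE_HOME/auxlib, a session can
    // register it with: CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.MyLower';
    public class MyLower extends UDF {
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().toLowerCase());
        }
    }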

Re: How to config node group in hadoop

2014-12-16 Thread Binglin Chang
Some user guide docs were added in HDFS-6261 but have not been committed yet; you can still get those docs from the patch: https://issues.apache.org/jira/browse/HDFS-6261 On Wed, Dec 17, 2014 at 3:04 PM, zhangyujian1984 wrote: > > hello: > > recent hadoop supports BlockPlacementPolicyWithNodeGroup

Re: hadoop 2.4.1 build failure on CentOS 6

2014-08-05 Thread Binglin Chang
Could you attach the entire log? On Tue, Aug 5, 2014 at 3:43 PM, Romu wrote: > Yes, cmake is installed. > > # rpm -qa|grep cmake > cmake-2.6.4-5.el6.x86_64 > > > Regards, > Romu > > > 2014-08-05 15:36 GMT+08:00 Binglin Chang : > > do you install cmak

Re: hadoop 2.4.1 build failure on CentOS 6

2014-08-05 Thread Binglin Chang
Do you have cmake installed? If not, do not use -Pnative to build hadoop :) On Tue, Aug 5, 2014 at 2:57 PM, Romu wrote: > Hi, > > I tried to build hadoop 2.4.1 in a CentOS 6 x86_64 vm but failed. Maven > 3.2.2 (installed from official bin tar ball) > > >- Maven: 3.2.2 >- Java: Oracle jdk 1.8.0

Re: question about balance info

2014-05-29 Thread Binglin Chang
There are multiple replicas of a block, so we don't need to move the block from source -> dest directly: just find a node that has the block, copy it to dest, and then delete the replica on the source. Why? Because we can choose a node that is closest to dest, or the source node may be busy (too many blocks to move). Does that make sense?

Re: why can FSDataInputStream.read() only read 2^17 bytes in hadoop2.0?

2014-03-06 Thread Binglin Chang
The semantics of read() do not guarantee that it reads as much as possible; you need to call read() multiple times or use readFully(). On Fri, Mar 7, 2014 at 1:32 PM, hequn cheng wrote: > Hi~ > First, i use FileSystem to open a file in hdfs. > FSDataInputStream m_dis = fs.open(...); > > Second, read th
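A minimal sketch of the difference, assuming a standard FSDataInputStream obtained from FileSystem.open() (error handling omitted):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;

    public class ReadExamples {
        // read() may return fewer bytes than requested (e.g. a single 128 KB packet),
        // so loop until the buffer is full or EOF is reached.
        static int readLoop(FSDataInputStream in, byte[] buf) throws IOException {
            int total = 0;
            while (total < buf.length) {
                int n = in.read(buf, total, buf.length - total);
                if (n < 0) {
                    break;  // end of file
                }
                total += n;
            }
            return total;
        }

        // readFully() loops internally and throws EOFException if the stream
        // ends before the buffer is filled.
        static void readAll(FSDataInputStream in, byte[] buf) throws IOException {
            in.readFully(buf, 0, buf.length);
        }
    }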

Re: HDFS snapshots restore

2013-11-28 Thread Binglin Chang
The snapshot restore feature is not implemented yet. Currently you can use distcp to copy the snapshot dir to your new cluster. Suppose your hive dir is /user/hive/ and the snapshot dir is /user/hive/.snapshot/sn0; you can run: distcp hdfs://oldcluster:8020/user/hive/.snapshot/sn0 hdfs://newcluster:8020/somedir O
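For a small directory, the same copy can also be done programmatically; a hedged sketch using the plain FileSystem/FileUtil API, with the cluster addresses and paths from the example above used as placeholders:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    public class CopySnapshotDir {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // A snapshot is read through its .snapshot path like any other directory.
            FileSystem src = FileSystem.get(URI.create("hdfs://oldcluster:8020"), conf);
            FileSystem dst = FileSystem.get(URI.create("hdfs://newcluster:8020"), conf);
            FileUtil.copy(src, new Path("/user/hive/.snapshot/sn0"),
                          dst, new Path("/somedir"),
                          false /* do not delete the source */, conf);
        }
    }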

Re: Is there any way to set Reducer to output to multi-places?

2013-09-02 Thread Binglin Chang
MultipleOutputFormat allows you to write multiple files from one reducer. It can't write output to HDFS and a database concurrently, but it is a good example of how you can write a customized OutputFormat to achieve this. Please note that for fault tolerance, a reducer may run multiple times, this
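A minimal sketch of the multi-file part, using the newer MultipleOutputs API from the mapreduce package rather than the older MultipleOutputFormat mentioned above (writing to a database would additionally need a custom OutputFormat/RecordWriter, plus care about reducers being re-run):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class MultiFileReducer extends Reducer<Text, Text, NullWritable, Text> {
        private MultipleOutputs<NullWritable, Text> outputs;

        @Override
        protected void setup(Context context) {
            outputs = new MultipleOutputs<NullWritable, Text>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                // Route each record to a file named after its key under the job output dir.
                outputs.write(NullWritable.get(), value, key.toString() + "/part");
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            outputs.close();  // required, otherwise the extra files may be left incomplete
        }
    }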

Re: Helper files in python

2013-08-28 Thread Binglin Chang
If you mean making those files available in map/reduce tasks: how about putting them in one directory, say "app", and specifying -file app -mapper app/xxx -reducer app/xx... ? Hadoop will pack the entire dir for you. On Thu, Aug 29, 2013 at 7:49 AM, Chengi Liu wrote: > Hi, > I have four files > mapper.

Re: Is there any possible way to use hostname variable in mapred-site.xml file

2013-08-15 Thread Binglin Chang
How about adding -Dhost.name=`hostname` to HADOOP_OPTS and referencing the variable in the config file as ${host.name}? I have not tried this, but you can give it a try. On Thu, Aug 15, 2013 at 5:26 PM, Kun Ling wrote: > Hi all, >I have a Hadoop MapReduce Cluster. In which I want to adjust the > mapred.local.dir, s
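The mechanism this relies on is Hadoop Configuration's variable expansion, which substitutes ${...} from other config properties and, failing that, from Java system properties. A small sketch of that behavior (the property value here is just for illustration):

    import org.apache.hadoop.conf.Configuration;

    public class HostNameExpansion {
        public static void main(String[] args) {
            // Simulates passing -Dhost.name=`hostname` via HADOOP_OPTS.
            System.setProperty("host.name", "worker-42.example.com");

            Configuration conf = new Configuration(false);
            // In mapred-site.xml this would be the value of mapred.local.dir.
            conf.set("mapred.local.dir", "/data/${host.name}/mapred/local");

            // get() expands ${host.name} from the system property:
            // prints /data/worker-42.example.com/mapred/local
            System.out.println(conf.get("mapred.local.dir"));
        }
    }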

Re: How to import custom Python module in MapReduce job?

2013-08-12 Thread Binglin Chang
But script failed, and from logs I see that lib.jar hasn't been unpacked. > What am I missing? > > > > > On Mon, Aug 12, 2013 at 11:33 AM, Binglin Chang wrote: > >> Hi, >> >> The problem seems to caused by symlink, hadoop uses file cache, so every &

Re: How to import custom Python module in MapReduce job?

2013-08-12 Thread Binglin Chang
Hi, the problem seems to be caused by symlinks: hadoop uses a file cache, so every file is in fact a symlink. lrwxrwxrwx 1 root root 65 Aug 12 15:22 lib.py -> /root/hadoop3/data/nodemanager/usercache/root/filecache/13/lib.py lrwxrwxrwx 1 root root 66 Aug 12 15:23 main.py -> /root/hadoop3/data/nodemanag

Re: metics v1 in hadoop-2.0.5

2013-08-05 Thread Binglin Chang
Metrics v1 is deprecated, both in 1.2 and 2.x. The existence of hadoop-metrics.properties is confusing; I think it should be removed. On Mon, Aug 5, 2013 at 6:26 PM, lei liu wrote: > There is hadoop-metrics.properties file in etc/hadoop directory. > I config the file with below content: > df

Re: 【data migrate from hdfs0.20.* to hdfs-2.0.5(HA)】

2013-07-26 Thread Binglin Chang
Have you looked at distcp over hftp? http://hadoop.apache.org/docs/r1.0.4/distcp.html#cpver On Fri, Jul 26, 2013 at 2:28 PM, Bing Jiang wrote: > > hi,all > > Have you tried to find out a way to make data transformation between two > hdfs cluster, which are the different version. > > In our envir

Re: problem with hadoop-snappy

2012-11-06 Thread Binglin Chang
library not loaded > > So, it was my understanding that snappy is not included with > hadoop-1.0.4. After removing the libs I installed, I see again the same > warning. > > Any ideas how to fix this issue? > > Thanks. > Alex. > > > > > -Original Message-

Re: problem with hadoop-snappy

2012-11-05 Thread Binglin Chang
I think hadoop-1.0.4 already has snappy included; you should not use other third-party libraries. On Tue, Nov 6, 2012 at 9:10 AM, wrote: > Hello, > > I use hadoop-1.0.4 I have followed instruction to install hadoop-snappy > at > http://code.google.com/p/hadoop-snappy/ > > When I run a map
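A quick, hedged way to check on a node whether the bundled snappy support is actually usable, relying only on NativeCodeLoader and SnappyCodec from the standard distribution (a diagnostic sketch, not a definitive test):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.SnappyCodec;
    import org.apache.hadoop.util.NativeCodeLoader;
    import org.apache.hadoop.util.ReflectionUtils;

    public class SnappyCheck {
        public static void main(String[] args) {
            // True when the native libhadoop library was found and loaded.
            System.out.println("native hadoop loaded: " + NativeCodeLoader.isNativeCodeLoaded());

            // Creating a compressor fails with a RuntimeException if the
            // snappy native library is not available.
            Configuration conf = new Configuration();
            SnappyCodec codec = ReflectionUtils.newInstance(SnappyCodec.class, conf);
            codec.createCompressor();
            System.out.println("snappy compressor created OK");
        }
    }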

Re: Tools for extracting data from hadoop logs

2012-10-29 Thread Binglin Chang
Hi, I think you want to analyze the hadoop job logs in the jobtracker history folder? These logs are already in a centralized folder, so you don't need tools like flume or scribe to gather them. I once wrote a simple python script to parse those log files and generate csv/json reports; basically you can use it to
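If you are on Hadoop 2.x, the MapReduce client libraries also ship a JobHistoryParser for the .jhist files, which is an alternative to a hand-rolled script; this is a different route than the python script described above, and the exact getter names should be checked against your Hadoop version. A rough sketch (the input path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.jobhistory.JobHistoryParser;

    public class JhistSummary {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Path to a single .jhist file from the job history directory (placeholder).
            Path jhist = new Path(args[0]);
            FileSystem fs = jhist.getFileSystem(conf);

            JobHistoryParser parser = new JobHistoryParser(fs, jhist);
            JobHistoryParser.JobInfo job = parser.parse();

            // One CSV line: jobId,submitTime,finishTime,totalMaps,totalReduces
            System.out.println(job.getJobId() + "," + job.getSubmitTime() + ","
                    + job.getFinishTime() + "," + job.getTotalMaps() + ","
                    + job.getTotalReduces());
        }
    }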