Re: Data Locality Importance

2014-03-22 Thread Chen He
Hi Mike, Data locality rests on an assumption: that local storage access (disk, SSD, etc.) is faster than transferring data over the network. Vinod has already explained the benefits. But locality in the map stage is not always beneficial. If a fat node stores a large file, it is possible that current MR frame

Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)

2012-09-25 Thread Chen He
oceed. It is our UG project > Thanking you > Dr G sudha Sadasivam > > --- On *Mon, 9/24/12, Chen He * wrote: > > > From: Chen He > Subject: Re: Hadoop and Cuda , JCuda (CPU+GPU architecture) > To: common-user@hadoop.apache.org > Date: Monday, September 24, 2012, 9:03 PM >

Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)

2012-09-24 Thread Chen He
d NVIDIA). > I mean using a CPU-only architecture I have 8-12 cores per computer (for > example). > What should I do in order to use a CPU+GPU architecture? What kind of NVIDIA > do I need for this? > > By the way, I didn't find a JCuda code example with Hadoop. :-) > > Thanks in ad

Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)

2012-09-24 Thread Chen He
his link!!! Do you have any code or example > shared on the network (GitHub for example)? > > On Mon, Sep 24, 2012 at 5:33 PM, Chen He wrote: > > > http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop > > > > On Mon, Sep 24, 2012 at 10:30 AM, Oleg Ruchovets > >wro

Re: Hadoop and Cuda , JCuda (CPU+GPU architecture)

2012-09-24 Thread Chen He
http://wiki.apache.org/hadoop/CUDA%20On%20Hadoop On Mon, Sep 24, 2012 at 10:30 AM, Oleg Ruchovets wrote: > Hi > > I am going to process video analytics using Hadoop. > I am very interested in CPU+GPU architecture, especially using CUDA ( > http://www.nvidia.com/object/cuda_home_new.html) and JC

Re: migrate cluster to different datacenter

2012-08-03 Thread Chen He
Sometimes, physically moving hard drives helps. :) On Aug 3, 2012 1:50 PM, "Patai Sangbutsarakum" wrote: > Hi Hadoopers, > > We have a plan to migrate our Hadoop cluster to a different datacenter > where we can triple the size of the cluster. > Currently, our 0.20.2 cluster has around 1PB of data.

Re: Re: HDFS block physical location

2012-07-25 Thread Chen He
; but that just gives me the hostnames, or am I overlooking something? > I actually need the filename/hard disk on the node. > > JS > > Sent: Wednesday, July 25, 2012 at 11:33 PM > From: "Chen He" > To: common-user@hadoop.apache.org > Subject: Re: HDFS block physic
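
The fsck output quoted in this thread identifies the datanodes holding a block; the file on a given node can then be looked up on that node. A minimal sketch, assuming the datanode stores each block as a file whose name begins with the block name under one of its dfs.data.dir directories, and that the command is run on the datanode itself. `find_block_file` and its arguments are hypothetical names introduced here:

```sh
# Sketch: list the on-disk file(s) for a block under a datanode data directory.
# $1 = a directory from dfs.data.dir (assumption: supplied by the caller)
# $2 = the block name as reported by fsck
find_block_file() {
    data_dir=$1
    block_name=$2
    # Prefix match: returns the block file and its companion .meta file,
    # and also any other block whose name shares the same prefix.
    find "$data_dir" -type f -name "${block_name}*"
}
```

Usage, with placeholders left for your own values: `find_block_file [your dfs.data.dir] [your block name]`. The directory part of each result shows which disk (mount point) holds the block.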

Re: HDFS block physical location

2012-07-25 Thread Chen He
>nohup hadoop fsck / -files -blocks -locations >cat nohup.out | grep [your block name] Hope this helps. On Wed, Jul 25, 2012 at 5:17 PM, <20seco...@web.de> wrote: > Hi, > > just a short question. Is there any way to figure out the physical storage > location of a given block? > I don't mean just

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Chen He
this type of auto email to our technical mail-list and say at least "Excuse me" to all people on this mail-list. On Mon, Jul 23, 2012 at 9:31 PM, Chen He wrote: > Looks like that guy is your boss, Jason. It was you who got people to forgive > him last time. Tell him, remove the grou

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Chen He
: > Guys, just be nice > > On Tue, Jul 24, 2012 at 5:59 AM, Chen He wrote: > > > Just kick this junk mail guy out of the group. > > > > On Mon, Jul 23, 2012 at 5:22 PM, Jean-Daniel Cryans > >wrote: > > > > > Fifth offense. > > > > &

Re: AUTO: Yuan Jin is out of the office. (returning 07/25/2012)

2012-07-23 Thread Chen He
Just kick this junk mail guy out of the group. On Mon, Jul 23, 2012 at 5:22 PM, Jean-Daniel Cryans wrote: > Fifth offense. > > Yuan Jin is out of the office. - I will be out of the office starting > 06/22/2012 and will not return until 06/25/2012. I am out of > Jun 21 > > Yuan Jin is out

Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?

2012-06-14 Thread Chen He
er than input size, > > it will not be the case, because the input data is compressed, so the size of > the generated data will expand to be very large > > it's just my guess, can anyone correct me? > > Best, > > Nan > > > On Thu, Jun 14, 2012 at 11:5

Re: what does "keep 10% map, 40% reduce" mean in gridmix2's README?

2012-06-14 Thread Chen He
Hi Nan, Probably the map stage will output 10% of the total input, and the reduce stage will output 40% of the intermediate results (which are 10% of the total input). For example, with 500GB of input, the data will be 50GB after the map stage and 20GB after the reduce stage. It may be similar to the loadgen in
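
The arithmetic in this reply can be checked with a few lines of shell. This is only a sketch of the interpretation offered above (which the message itself marks as probable), using whole gigabytes and integer percentages; `stage_sizes` is a hypothetical helper, not part of gridmix2:

```sh
# Sketch: data size after each stage when the map stage keeps map_pct% of
# its input and the reduce stage keeps reduce_pct% of the map output.
stage_sizes() {
    input_gb=$1; map_pct=$2; reduce_pct=$3
    map_out=$(( input_gb * map_pct / 100 ))
    reduce_out=$(( map_out * reduce_pct / 100 ))
    echo "map output: ${map_out}GB, reduce output: ${reduce_out}GB"
}

stage_sizes 500 10 40
# prints: map output: 50GB, reduce output: 20GB
```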

Re: Feedback on real world production experience with Flume

2012-04-21 Thread Chen He
Can NFS become the bottleneck? Chen On Sat, Apr 21, 2012 at 5:23 PM, Edward Capriolo wrote: > It seems pretty relevant. If you can directly log via NFS that is a > viable alternative. > > On Sat, Apr 21, 2012 at 11:42 AM, alo alt > wrote: > > We decided NO product and vendor advertising on

Re: Yuan Jin is out of the office.

2012-04-12 Thread Chen He
This is the second time. Pure junk email. Could you avoid sending email to public mail-list, Ms/Mr. Yuan Jin? On Thu, Apr 12, 2012 at 6:22 PM, Chen He wrote: > who cares? > > > On Thu, Apr 12, 2012 at 6:09 PM, Yuan Jin wrote: > >> >> I will be out of the office st

Re: Yuan Jin is out of the office.

2012-04-12 Thread Chen He
who cares? On Thu, Apr 12, 2012 at 6:09 PM, Yuan Jin wrote: > > I will be out of the office starting 04/13/2012 and will not return until > 04/16/2012. > > I am out of office, and will reply you when I am back. > > For HAMSTER related things, you can contact Jason(Deng Peng Zhou/China/IBM) > or

Re: start hadoop slave over WAN

2012-03-30 Thread Chen He
Log in to your remote datanode and start the datanode manually to see what happens. Starting HDFS over a WAN is not as easy as on a cluster; there are many issues. The datanode log should be the best way to troubleshoot. Chen On Fri, Mar 30, 2012 at 12:52 PM, Michael Segel wrote: > Probably a timeo

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Chen He
> On Fri, Mar 9, 2012 at 5:00 PM, Chen He wrote: > > > Hi Mohit > > > > " mapred.tasktracker.reduce(map).tasks.maximum " means how many > reduce(map) > > slot(s) you can have on each tasktracker. > > > > "mapred.job.reduce(maps)"

Re: mapred.tasktracker.map.tasks.maximum not working

2012-03-09 Thread Chen He
Setting "mapred.tasktracker.map.tasks.maximum" in your job has no effect, because the Hadoop MapReduce platform only reads this parameter when it starts. It is a system configuration: you need to set it in your conf/mapred-site.xml file and restart Hadoop MapReduce. On Fri, Mar 9, 20
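
For illustration, the entry in conf/mapred-site.xml might look like the fragment below. The value 4 is an assumption chosen only for the example; pick a slot count that suits the node's cores and memory.

```xml
<!-- conf/mapred-site.xml on each tasktracker node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <!-- illustrative value only (an assumption, not from the thread) -->
  <value>4</value>
</property>
```

The change takes effect only after the MapReduce daemons are restarted, as the message says.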

Re: mapred.map.tasks vs mapred.tasktracker.map.tasks.maximum

2012-03-09 Thread Chen He
Hi Mohit, "mapred.tasktracker.reduce(map).tasks.maximum" means how many reduce (map) slots you can have on each tasktracker. "mapred.job.reduce(maps)" means the default number of reduce (map) tasks your job will have. To set the number of mappers in your application, you can write something like this: *conf

Re: Incompatible namespaceIDs after formatting namenode

2012-01-15 Thread Chen He
In short, here is a script, run from your headnode, that may be useful for cleaning up the HDFS directory on the DNs: for each DN hostname do ssh root@[DN hostname] "rm [your hdfs directory]/dfs/data/current/VERSION"; done On Sun, Jan 15, 2012 at 7:22 AM, Uma Maheswara Rao G wrote: > Since you al
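
The pseudocode loop above could be written as runnable shell roughly as follows. This is a sketch under stated assumptions: the datanode hostnames are listed one per line in a file you supply, root ssh access from the headnode is already set up, and `remove_version_files` and `DRY_RUN` are hypothetical names introduced here. It prints the commands by default instead of running them, since the deletion is destructive.

```sh
# Sketch: remove the VERSION file on every datanode listed in a hosts file.
# $1 = file with one datanode hostname per line (assumption)
# $2 = the HDFS directory on the datanodes (assumption: same on every node)
# With DRY_RUN=1 (the default) the ssh commands are only printed.
remove_version_files() {
    hosts_file=$1
    hdfs_dir=$2
    while IFS= read -r dn; do
        [ -z "$dn" ] && continue   # skip blank lines
        cmd="rm ${hdfs_dir}/dfs/data/current/VERSION"
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "ssh root@${dn} \"${cmd}\""
        else
            # -n stops ssh from reading the rest of the hosts file on stdin
            ssh -n "root@${dn}" "$cmd"
        fi
    done < "$hosts_file"
}
```

Review the printed commands first, then rerun with `DRY_RUN=0` to execute them, for example `DRY_RUN=0 remove_version_files [your hosts file] [your hdfs directory]`.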

Re: dual power for hadoop in datacenter?

2012-01-09 Thread Chen He
If you configure your replica number as 3 or more, I would suggest you keep half of your nodes on each rack with dual power, especially nodes with larger or more disks, as well as your namenode, resource manager, secondaryNamenode, and all other master nodes. On Mon, Jan 9, 2012 at 10:50 AM, Robert