Re: CUDA on Hadoop

2011-02-10 Thread Lance Norskog
If you want to use Python, one of the Py+CUDA projects generates CUDA C from the Python bytecode. You don't have to write any C. I don't remember which project it is. This lets you debug the CUDA code in isolation, then run it via Hadoop streaming. On 2/9/11, Adarsh Sharma wrote: >

Are there any smart ways to give arguments to mappers & reducers from a main job?

2011-02-10 Thread Jun Young Kim
Hi all, in my job I want to pass some arguments to mappers and reducers from the main job. I googled some references on doing that with Configuration, but it's not working. code) job) Configuration conf = new Configuration(); conf.set("test", "value"); mapper) doMap() extends Mapper... { S

Re: why is it invalid to have non-alphabet characters as a result of MultipleOutputs?

2011-02-10 Thread Jun Young Kim
OK, thanks for your replies. I decided to use '00' as a delimiter. :( Junyoung Kim (juneng...@gmail.com) On 02/09/2011 01:46 AM, David Rosenstrauch wrote: On 02/08/2011 05:01 AM, Jun Young Kim wrote: Hi, MultipleOutputs supports named outputs as the result of a Hadoop job, but it has inc
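
For reference, named output names in the 0.20 (org.apache.hadoop.mapred) MultipleOutputs must be strictly alphanumeric, which is why a delimiter such as '-' or '_' is rejected. A minimal driver sketch, with "out00" as a hypothetical name that satisfies the restriction:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextOutputFormat;
    import org.apache.hadoop.mapred.lib.MultipleOutputs;

    public class NamedOutputExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(NamedOutputExample.class);
        // Named output names may only contain letters and digits, so
        // "out-00" or "out_00" would throw an IllegalArgumentException.
        MultipleOutputs.addNamedOutput(conf, "out00", TextOutputFormat.class,
            Text.class, Text.class);
        // ... set mapper/reducer, input/output paths, and submit as usual
      }
    }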

Re: Are there any smart ways to give arguments to mappers & reducers from a main job?

2011-02-10 Thread Harsh J
Your 'Job' must reference this Configuration object for it to take those values. If it does not know about it, it would not work, logically :-) For example, create your Configuration and set things into it, and only then do new Job(ConfigurationObj) to make it use your configured object for this j

Re: Are there any smart ways to give arguments to mappers & reducers from a main job?

2011-02-10 Thread li ping
correct. Just like this: Configuration conf = new Configuration(); conf.setStrings("test", "test"); Job job = new Job(conf, "job name"); On Thu, Feb 10, 2011 at 6:42 PM, Harsh J wrote: > Your 'Job' must reference this Configuration object for it to take > those values. If it does not know about
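
Putting the two replies together, a minimal end-to-end sketch (the "test" property name comes from the thread; the class and job names are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class PassArgsExample {

      public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String testValue;

        @Override
        protected void setup(Context context) {
          // Read back the value the driver put into the job's Configuration.
          testValue = context.getConfiguration().get("test");
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          context.write(new Text(testValue), value);
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("test", "value");             // 1. set values first
        Job job = new Job(conf, "pass-args");  // 2. then hand the conf to the Job
        job.setMapperClass(MyMapper.class);
        // input/output paths, reducer, and waitForCompletion() omitted
      }
    }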

Re: CUDA on Hadoop

2011-02-10 Thread Steve Loughran
On 09/02/11 17:31, He Chen wrote: Hi Sharma, I shared our slides about CUDA performance on Hadoop clusters. Feel free to modify it, but please mention the copyright! This is nice. If you stick it up online you should link to it from the Hadoop wiki pages - maybe start a hadoop+cuda page and refer

Re: CUDA on Hadoop

2011-02-10 Thread Adarsh Sharma
Steve Loughran wrote: On 09/02/11 17:31, He Chen wrote: Hi Sharma, I shared our slides about CUDA performance on Hadoop clusters. Feel free to modify it, but please mention the copyright! This is nice. If you stick it up online you should link to it from the Hadoop wiki pages - maybe start a h

some doubts about Hadoop MR

2011-02-10 Thread Matthew John
Hi all, I had some doubts regarding the functioning of Hadoop MapReduce: 1) I understand that every MapReduce job is parameterized using an XML file (with all the job configurations). So whenever I set certain parameters using my MR code (say I set the split size to 32 KB) it does get reflected

Re: some doubts about Hadoop MR

2011-02-10 Thread Harsh J
Hello, On Thu, Feb 10, 2011 at 5:16 PM, Matthew John wrote: > Hi all, > > I had some doubts regarding the functioning of Hadoop MapReduce : > > 1) I understand that every MapReduce job is parameterized using an XML file > (with all the job configurations). So whenever I set certain parameters > u
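
On the first question: anything set on the job's Configuration object before submission is serialized into the generated job.xml, on top of whatever mapred-site.xml provides. A minimal sketch, assuming the 0.20-era property name mapred.min.split.size and using the 32 KB figure from the question purely as an illustration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SplitSizeExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Overrides the site/default value and is written into this job's job.xml.
        conf.setLong("mapred.min.split.size", 32 * 1024L); // 32 KB
        Job job = new Job(conf, "split-size-example");
        // mapper, reducer, input/output paths, and submission omitted
      }
    }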

Re: Hadoop Multi user - Cluster Setup

2011-02-10 Thread Piyush Joshi
Hey Amit, please try HOD, the Hadoop On Demand tool. This should suffice for your need to create multiple users on your cluster. -Piyush On Thu, Feb 10, 2011 at 12:42 AM, Kumar, Amit H. wrote: > Dear All, > > I am trying to setup Hadoop for multiple users in a class, on our cluster. > For some reas

Re: Could not add a new data node without rebooting Hadoop system

2011-02-10 Thread 안의건
Dear Harsh, Your advice gave me insight, and I finally solved my problem. I'm not sure this is the correct way, but anyway it worked in my situation. I hope it will be helpful to someone else who has a similar problem to mine. hadoop/

hadoop 0.20 append - some clarifications

2011-02-10 Thread Gokulakannan M
Hi All, I have run the hadoop 0.20 append branch. Can someone please clarify the following behavior? A writer is writing a file but has not flushed the data and has not closed the file. Could a parallel reader read this partial file? For example, 1. a writer is writing a 10MB file (block size 2 MB

Re: hadoop 0.20 append - some clarifications

2011-02-10 Thread Ted Dunning
Correct is a strong word here. There is actually an HDFS unit test that checks to see if partially written and unflushed data is visible. The basic rule of thumb is that you need to synchronize readers and writers outside of HDFS. There is no guarantee that data is visible or invisible after wri

Re: MRUnit and Herriot

2011-02-10 Thread Edson Ramiro
Hi, I took a look around on the Internet, but I didn't find any docs about MiniDFSCluster and MiniMRCluster. Are there docs about them? It reminds me of this phrase I got from the Herriot [1] page. "As always your best source of information and knowledge about any software system is its source code" :) Do

Fwd: multiple namenode directories

2011-02-10 Thread mike anderson
-- Forwarded message -- From: mike anderson Date: Thu, Feb 10, 2011 at 11:57 AM Subject: multiple namenode directories To: core-u...@hadoop.apache.org This should be a straightforward question, but better safe than sorry. I wanted to add a second name node directory (on an NFS a

multiple namenode directories

2011-02-10 Thread mike anderson
This should be a straightforward question, but better safe than sorry. I wanted to add a second name node directory (on an NFS as a backup), so now my hdfs-site.xml contains: dfs.name.dir /mnt/hadoop/name dfs.name.dir /public/hadoop/name When I go to start DFS I'm ge

Re: CUDA on Hadoop

2011-02-10 Thread He Chen
Thank you Steve Loughran. I just created a new page on the Hadoop wiki; however, how can I create a new document page on the Hadoop Wiki? Best wishes Chen On Thu, Feb 10, 2011 at 5:38 AM, Steve Loughran wrote: > On 09/02/11 17:31, He Chen wrote: > >> Hi Sharma >> >> I shared our slides about CUDA perf

Re: multiple namenode directories

2011-02-10 Thread Harsh J
DO NOT format your NameNode. Formatting a NameNode is equivalent to formatting a FS -- you're bound to lose it all. And while messing with the NameNode, after bringing it down safely, ALWAYS take a backup of the existing dfs.name.dir contents and preferably the SNN checkpoint directory contents too (if y

Map reduce streaming unable to partition

2011-02-10 Thread Kelly Burkhart
Hi, I'm trying to get partitioning working from a streaming map/reduce job. I'm using Hadoop r0.20.2. Consider the following files, both in the same HDFS directory: f1: 01:01:01a,a,a,a,a,1 01:01:02a,a,a,a,a,2 01:02:01a,a,a,a,a,3 01:02:02a,a,a,a,a,4 02:01:01a,a,a,a,a,5 02:01:02a,a,a,a,a,6 02:02:

RE: Hadoop Multi user - Cluster Setup

2011-02-10 Thread Kumar, Amit H.
Li Ping: Disabling dfs.permissions did the trick! I have the following questions, if you can help me understand this better: 1. Not sure what the consequences are of disabling it, or even of doing chmod o+w on the entire filesystem (/). 2. Is there any need to have the permissions in place, other t

Re: multiple namenode directories

2011-02-10 Thread mike anderson
Whew, glad I asked. It might be useful for someone to update the wiki: http://wiki.apache.org/hadoop/FAQ#How_do_I_set_up_a_hadoop_node_to_use_multiple_volumes.3F -Mike On Thu, Feb 10, 2011 at 12:43 PM, Harsh J wrote: > DO NOT format your NameNode. Formatting a NameNode is equivalent to > forma

Re: multiple namenode directories

2011-02-10 Thread Harsh J
The links appeared outdated; I've updated them to reflect the current release 0.21's configurations. The configuration descriptions explain the way to set them 'right'. For 0.20 releases, only the configuration name changes: dfs.name.dir instead of dfs.namenode.name.dir, and dfs.data.d
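
For reference, dfs.name.dir takes a single comma-separated list of directories in one property rather than two separate properties; a minimal hdfs-site.xml sketch using the two paths from the question (0.20 property name; 0.21 renames it dfs.namenode.name.dir):

    <property>
      <name>dfs.name.dir</name>
      <!-- One property, comma-separated values: the NameNode keeps a full
           copy of its image and edits in every listed directory. -->
      <value>/mnt/hadoop/name,/public/hadoop/name</value>
    </property>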

Re: Hadoop Multi user - Cluster Setup

2011-02-10 Thread Harsh J
Please read the HDFS Permissions guide which explains the understanding required to have a working permissions model on the DFS: http://hadoop.apache.org/hdfs/docs/current/hdfs_permissions_guide.html On Thu, Feb 10, 2011 at 11:15 PM, Kumar, Amit H. wrote: > Li Ping: Disabling dfs.permissions did
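
As a small sketch of the usual multi-user alternative to disabling dfs.permissions (the user name below is hypothetical): the HDFS superuser creates a home directory per student and hands over ownership, rather than opening up the whole filesystem with chmod o+w:

    # run as the HDFS superuser on any node with the hadoop client configured
    hadoop fs -mkdir /user/alice
    hadoop fs -chown alice:alice /user/alice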

recommendation on HDDs

2011-02-10 Thread Shrinivas Joshi
What would be a good hard drive for a 7-node cluster which is targeted to run a mix of IO- and CPU-intensive Hadoop workloads? We are looking for around 1 TB of storage on each node distributed amongst 4 or 5 disks. So either 250GB * 4 disks or 160GB * 5 disks. Also it should be less than $100 each

Re: recommendation on HDDs

2011-02-10 Thread Ted Dunning
Get bigger disks. Data only grows and having extra is always good. You can get 2TB drives for < $100 and 1TB for < $75. As far as transfer rates are concerned, any 3Gb/s SATA drive is going to be about the same (ish). Seek times will vary a bit with rotation speed, but with Hadoop, you will be d

Re: Map reduce streaming unable to partition

2011-02-10 Thread Kelly Burkhart
OK, I think I stumbled upon the correct incantation: time hadoop jar /opt/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar \ -D map.output.key.field.separator=: \ -D mapred.text.key.partitioner.options=-k1,1 \ -D mapred.reduce.tasks=16 \ -input /tmp/krb/part \ -output /tmp/krb/
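
For readers searching for the same thing, a sketch of a streaming invocation that partitions on the first ':'-separated key field. The mapper, reducer, and output path below are placeholders, and the KeyFieldBasedPartitioner has to be named explicitly for the partitioner options to take effect:

    # jar path and input path are from the thread; output path and the
    # cat mapper/reducer are illustrative placeholders
    hadoop jar /opt/hadoop-0.20.2/contrib/streaming/hadoop-0.20.2-streaming.jar \
      -D map.output.key.field.separator=: \
      -D mapred.text.key.partitioner.options=-k1,1 \
      -D mapred.reduce.tasks=16 \
      -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
      -input /tmp/krb/part \
      -output /tmp/krb/out \
      -mapper /bin/cat \
      -reducer /bin/cat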

Re: recommendation on HDDs

2011-02-10 Thread Chris Collins
Of late we have had serious issues with Seagate drives in our Hadoop cluster. These were purchased over several purchasing cycles and we're pretty sure it wasn't just a single "bad batch". Because of this we switched to buying 2TB Hitachi drives which seem to have been considerably more reliable. Bes

Hadoop on physical machine Vs Cloud

2011-02-10 Thread praveen.peddi
Hello all, I have been using Hadoop on physical machines for some time now. But recently I tried to run the same Hadoop jobs on the Rackspace cloud and I am not yet successful. My input file has 150M transactions and all Hadoop jobs finish in less than 90 minutes on a 4-node 4GB Hadoop cluster on

Re: recommendation on HDDs

2011-02-10 Thread Shrinivas Joshi
Hi Ted, Chris, Much appreciate your quick replies. The reason we are looking at smaller-capacity drives is that we are not anticipating huge growth in data footprint, and we also read somewhere that the larger the capacity of the drive, the bigger the number of platters in it, and that could affect d

Re: recommendation on HDDs

2011-02-10 Thread Ted Dunning
We see well over 100MB/s off of commodity 2TB drives. On Thu, Feb 10, 2011 at 1:47 PM, Shrinivas Joshi wrote: > But looks like you can get 1TB drives with only 2 platters. > Large capacity drives should be OK for us as long as they perform equally > well. >

Re: MRUnit and Herriot

2011-02-10 Thread Konstantin Boudnik
On Thu, Feb 10, 2011 at 08:39, Edson Ramiro wrote: > Hi, > > I took a look around on the Internet, but I didn't find any docs about > MiniDFS > and MiniMRCluster. Is there docs about them? > > It remember me this phrase I got from the Herriot [1] page. > "As always your best source of information
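
In that spirit, a minimal test-harness sketch of the two classes being asked about. The constructor signatures below are the ones found in the 0.20-era test jars and are worth double-checking against the source of your release:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.MiniDFSCluster;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MiniMRCluster;

    public class MiniClusterSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // 1 datanode, format the mini DFS, default rack topology
        MiniDFSCluster dfs = new MiniDFSCluster(conf, 1, true, null);
        FileSystem fs = dfs.getFileSystem();
        // 1 tasktracker pointed at the mini DFS, 1 local dir
        MiniMRCluster mr = new MiniMRCluster(1, fs.getUri().toString(), 1);
        JobConf jobConf = mr.createJobConf(); // submit test jobs with this conf
        // ... run assertions against fs / jobConf here ...
        mr.shutdown();
        dfs.shutdown();
      }
    }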

RE: recommendation on HDDs

2011-02-10 Thread Michael Segel
Shrinivas, Assuming you're in the US, I'd recommend the following: Go with 2TB 7200 RPM SATA hard drives. (Not sure what type of hardware you have) What we've found is that in the data nodes, there's an optimal configuration that balances price versus performance. While your chassis may hold 8 dr

Re: hadoop 0.20 append - some clarifications

2011-02-10 Thread Konstantin Boudnik
You might also want to check the append design doc published at HDFS-265 --   Take care, Konstantin (Cos) Boudnik On Thu, Feb 10, 2011 at 07:11, Gokulakannan M wrote: > Hi All, > > I have run the hadoop 0.20 append branch . Can someone please clarify the > following behavior? > > A writer writing

Re: some doubts about Hadoop MR

2011-02-10 Thread Greg Roelofs
>> 2) Assume I am running cascading (chained) MR modules. In this case I feel >> there is a huge overhead when output of MR1 is written back to HDFS and then >> read from there as input of MR2. Can this be avoided? (maybe store it in >> some memory without hitting HDFS and the NameNode) Please let

File name which includes defined keyword

2011-02-10 Thread 안의건
File name which includes defined keyword Dear all, I have an error when I copy with Hadoop fs commands. e.g. hadoop/bin> hadoop fs -copyFromLocal abcdef:abcdef.exm /test I can't copy a file whose name includes ':' to the destination. Does anybody know what I could do? Regards, Henny Ahn (ahneui...@gmail.com)

Re: File name which includes defined keyword

2011-02-10 Thread Harsh J
There appears to be a bug filed about this, check its JIRA out here: https://issues.apache.org/jira/browse/HDFS-13 On Fri, Feb 11, 2011 at 6:09 AM, 안의건 wrote: > File name which includes defined keyword > > Dear all > > I have an error when I copy in Hadoop fs commands. > > e.g. > > hadoop/bin>
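
Until that JIRA is resolved, one straightforward workaround is to rename the file so the path no longer contains ':' before copying it in; a sketch using the file name from the question:

    # ':' is not accepted in HDFS path names, so rename locally first
    mv abcdef:abcdef.exm abcdef_abcdef.exm
    hadoop/bin/hadoop fs -copyFromLocal abcdef_abcdef.exm /test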

How do I insert a new node while running a MapReduce job in Hadoop?

2011-02-10 Thread Sandro Simas
Hi, I started using Hadoop recently and I'm doing some tests on a cluster of three machines. I want to insert a new node after the MapReduce job has started; is this possible? How do I do it?

Re: How do I insert a new node while running a MapReduce job in Hadoop?

2011-02-10 Thread li ping
Of course you can. What is the node type: datanode? job tracker? task tracker? Let's say you are trying to add a datanode. You can modify the XML files so the datanode points to the NameNode, JobTracker, TaskTracker. fs.default.name hdfs://:9000/ mapred.job.tracker
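
A sketch of the usual follow-up steps on the new node, assuming a stock 0.20 layout and that its conf/ directory has been copied from the existing cluster (so fs.default.name and mapred.job.tracker already point at the NameNode and JobTracker):

    # on the new node; no restart of the running cluster is required
    $HADOOP_HOME/bin/hadoop-daemon.sh start datanode
    $HADOOP_HOME/bin/hadoop-daemon.sh start tasktracker
    # the daemons register themselves, and new tasks can then be scheduled there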

RE: hadoop 0.20 append - some clarifications

2011-02-10 Thread Gokulakannan M
Thanks Ted for clarifying. So the sync is to just flush the current buffers to the datanode and persist the block info in the namenode once per block, isn't it? Regarding a reader being able to see the unflushed data, I faced an issue in the following scenario: 1. a writer is writing a 10MB file (block size 2

Re: hadoop 0.20 append - some clarifications

2011-02-10 Thread Ted Dunning
It is a bit confusing. SequenceFile.Writer#sync isn't really sync. There is SequenceFile.Writer#syncFs which is more like what you might expect sync to be. Then there is HADOOP-6313 which specifies hflush and hsync. Generally, if you want portable code, you have to reflect a bit to figure out what c
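
A minimal sketch of that reflection trick (the method names are the ones from HADOOP-6313 and the 0.20-append branch; the helper class itself is illustrative, not an API shipped with either release):

    import java.io.IOException;
    import java.lang.reflect.Method;
    import org.apache.hadoop.fs.FSDataOutputStream;

    public final class PortableFlush {
      // Prefer hflush() where it exists (HADOOP-6313 / 0.21+), otherwise fall
      // back to the sync() exposed by the 0.20-append branch.
      public static void flush(FSDataOutputStream out) throws IOException {
        try {
          Method hflush = out.getClass().getMethod("hflush");
          hflush.invoke(out);
        } catch (NoSuchMethodException e) {
          out.sync(); // pushes the current buffers out to the datanodes
        } catch (Exception e) {
          throw new IOException("flush failed", e);
        }
      }
    }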