Read and Write throughputs via JVM

2011-04-13 Thread Matthew John
Hi all, I wanted to figure out the Read and Write throughputs that happen in a Map task (Read - reading from the input splits, Write - writing the map output back) inside a JVM. Do we have any counters that can help me with this? Or where exactly should I focus on tweaking the code to add some
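For what it's worth, Hadoop 0.20 already maintains per-job byte counters covering both sides of this, so a code tweak may not be needed at all. A sketch of reading them from the command line (the counter group and names below are from memory and may vary between versions):

```shell
# Bytes the tasks read from HDFS (the input splits) and wrote to local disk
hadoop job -counter <job-id> FileSystemCounters HDFS_BYTES_READ
hadoop job -counter <job-id> FileSystemCounters FILE_BYTES_WRITTEN

# Uncompressed bytes emitted by the map() function itself
hadoop job -counter <job-id> org.apache.hadoop.mapred.Task\$Counter MAP_OUTPUT_BYTES
```

Dividing these by task run times from the JobTracker UI gives an approximate per-task throughput.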

cluster does not stop

2011-04-13 Thread Sumeet M Nikam
Hi All, We had set up a 5-node hadoop [0.20.2] hbase [0.90.1] cluster. The cluster was idle for many days [a week]; today we were unable to stop it, and it seems the pid files were deleted. My question is: I had set the temp folder to a different directory, so how could the pid files have been deleted? Since I have 5

Re: cluster does not stop

2011-04-13 Thread Marcos M Rubinelli
Sumeet, To create your pids in another directory, you can set HADOOP_PID_DIR in your bin/hadoop-env.sh. There's an open issue about it: https://issues.apache.org/jira/browse/HADOOP-6606 Regards, Marcos Em 13-04-2011 04:19, Sumeet M Nikam escreveu: Hi All, We had set up 5 node
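The setting Marcos mentions is a plain environment variable in hadoop-env.sh; the path below is only an example:

```shell
# conf/hadoop-env.sh
# Keep pid files out of /tmp so periodic cleaners (e.g. tmpwatch)
# cannot remove them while the cluster sits idle.
export HADOOP_PID_DIR=/var/hadoop/pids
```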

Re: cluster does not stop

2011-04-13 Thread Sumeet M Nikam
Hi Marcos, Thanks, will do this. But I am still not clear: if hadoop.tmp.dir points to a different directory than the default /tmp, how can the pid files get deleted? Are there any other threads/processes running which delete inactive pid files? Regards, Sumeet

Re: namenode format error

2011-04-13 Thread Harsh J
Hello, On Wed, Apr 13, 2011 at 5:25 AM, Jeffrey Wang jw...@palantir.com wrote: Hey all, I'm trying to format my NameNode (I've done it successfully in the past), but I'm getting a strange error: 11/04/12 16:47:32 INFO common.Storage: java.io.IOException: Input/output error        at

Re: Memory mapped resources

2011-04-13 Thread Benson Margulies
Guys, I'm not the one who said 'HDFS' unless I had a brain bubble in my original message. I asked for a distribution mechanism for code+mappable data. I appreciate the arrival of some suggestions. Ted is correct that I know quite a bit about mmap; I had a lot to do with the code in ObjectStore

hadoop cluster installation problems

2011-04-13 Thread bikash sharma
Hi, I need to install hadoop on a 16-node cluster. I have a couple of related questions: 1. I have installed hadoop in a shared directory, i.e., there is just one place where the whole hadoop installation exists and all 16 nodes use the same installation. Is that an issue, or do I need to

Re: sorting reducer input numerically in hadoop streaming

2011-04-13 Thread Dieter Plaetinck
Thank you Harsh, that works fine! (looks like the page I was looking at was the same, but for an older version of hadoop) Dieter On Fri, 1 Apr 2011 13:07:38 +0530 Harsh J qwertyman...@gmail.com wrote: You will need to supply your own Key-comparator Java class by setting an appropriate
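For anyone landing on this thread later, the recipe Harsh pointed to boils down to two streaming options selecting the key-field comparator and asking it for a numeric sort (the jar path and input/output names are illustrative):

```shell
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
  -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator \
  -D mapred.text.key.comparator.options=-n \
  -mapper cat -reducer cat \
  -input input/ -output output/
```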

Re: hadoop cluster installation problems

2011-04-13 Thread bikash sharma
p.s. Also, while starting dfs using bin/start-dfs.sh, I get the following error: 2011-04-13 09:42:31,729 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: STARTUP_MSG: / STARTUP_MSG: Starting NameNode STARTUP_MSG: host =

Re: Memory mapped resources

2011-04-13 Thread M. C. Srivas
Sorry, don't mean to say you don't know mmap or didn't do cool things in the past. But you will see why anyone would've interpreted this original post, given the title of the posting and the following wording, to mean "can I mmap files that are in hdfs" On Mon, Apr 11, 2011 at 3:57 PM, Benson

Re: Memory mapped resources

2011-04-13 Thread Benson Margulies
Point taken. On Wed, Apr 13, 2011 at 10:33 AM, M. C. Srivas mcsri...@gmail.com wrote: Sorry, don't mean to say you don't know mmap or didn't do cool things in the past. But you will see why anyone would've interpreted this original post, given the title of the posting and the following

Re: HOD exception: java.io.IOException: No valid local directories in property: mapred.local.dir

2011-04-13 Thread Boyu Zhang
Thanks a lot for the comments, but I set mapred.local.dir to /tmp, which is a dir on every local machine, and I still got the same error. When I use the same conf file with 3 nodes I don't have the problem (I have this problem when using 4 nodes). Any idea what the problem may be? Thanks a lot.

hadoop streaming and job conf settings

2011-04-13 Thread Shivani Rao
Hello, I am facing trouble using hadoop streaming to solve a simple nearest neighbor problem. Input data is in the following format: key'\t'value. The key is the imageid for which the nearest neighbor will be computed; the value is a 100-dimensional vector of floating point values separated by

Re: hadoop streaming and job conf settings

2011-04-13 Thread Mehmet Tepedelenlioglu
I am not sure what the problem is, but your approach seems incorrect unless you always want to use 1 mapper. You need to make your queries available to all mappers (cache them, although I am not sure how to do that with streaming). Then you definitely want to use a combiner to reduce over each
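The scheme Mehmet describes (every mapper sees all the queries, keeps a running best over its own split, and the reduce side merges per-split candidates) can be sketched in plain Python; the Euclidean metric and the (key, vector) layout here are assumptions, not details from the thread:

```python
def squared_distance(a, b):
    # Squared Euclidean distance; skipping the sqrt keeps the same ordering.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_match(query, candidates):
    """Return (key, distance) of the closest vector among (key, vector) pairs."""
    best_key, best_dist = None, float("inf")
    for key, vec in candidates:
        d = squared_distance(query, vec)
        if d < best_dist:
            best_key, best_dist = key, d
    return best_key, best_dist
```

Each mapper would emit one `query_id \t best_key \t distance` line per query for its split, and a combiner plus reducer would then take the minimum distance per query_id.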

RE: namenode format error

2011-04-13 Thread Jeffrey Wang
Hi, It's just in my home directory, which is an NFS mount. I moved it off NFS and it seems to work fine. Is there some reason it doesn't work with NFS? -Jeffrey -Original Message- From: Harsh J [mailto:ha...@cloudera.com] Sent: Wednesday, April 13, 2011 3:48 AM To:

hadoop streaming and job conf settings, error in textoutputreader

2011-04-13 Thread Shivani Rao
Hello, I am facing trouble using hadoop streaming in order to solve a simple nearest neighbor problem. Input data is in the following format key'\t'value key is the imageid for which nearest neighbor will be computed the value is 100 dimensional vector of floating point values separated by

Re: namenode format error

2011-04-13 Thread Raghu Angadi
Your NFS mount is not letting NameNode lock a file. On Wed, Apr 13, 2011 at 12:38 PM, Jeffrey Wang jw...@palantir.com wrote: Hi, It's just in my home directory, which is an NFS mount. I moved it off NFS and it seems to work fine. Is there some reason it doesn't work with NFS? -Jeffrey

How to change logging level for an individual job

2011-04-13 Thread David Rosenstrauch
Is it possible to change the logging level for an individual job? (As opposed to the cluster as a whole.) E.g., is there some key that I can set on the job's configuration object that would allow me to bump up the logging from info to debug just for that particular job? Thanks, DR

Re: namenode format error

2011-04-13 Thread Allen Wittenauer
On Apr 13, 2011, at 12:38 PM, Jeffrey Wang wrote: It's just in my home directory, which is an NFS mount. I moved it off NFS and it seems to work fine. Is there some reason it doesn't work with NFS? Locking on NFS--regardless of application--is a dice roll, especially when client/server are

Hive regex SerDe issue?

2011-04-13 Thread hadoopman
Is there an issue with using the regex SerDe when loading text files above 2 gigs in size into Hive? I've been experiencing out of memory errors with a select group of logs when running a hive job. I have been able to load the data if I use split to cut it in half or thirds. No problem.

What is the best way to load data with control characters into HDFS?

2011-04-13 Thread macmarcin
I have a problem where my input data has various control characters. I thought that I could load this data (100+ GB of tab-delimited files) and then run a perl streaming script to clean it up (wanted to take advantage of the parallelization of the hadoop framework). However, since some of the data has ^M
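A cleanup mapper along the lines described might look like the sketch below; it keeps tabs (the field delimiter) and drops \r (^M) and every other ASCII control character. The filtering rule is an assumption about what "clean" means for this data:

```python
import sys

def clean(line):
    # Keep tabs, drop carriage returns (^M) and all other control characters.
    return "".join(
        ch for ch in line
        if ch == "\t" or (32 <= ord(ch) and ord(ch) != 127)
    )

def main():
    # Streaming mappers read records on stdin and write results to stdout.
    for line in sys.stdin:
        sys.stdout.write(clean(line.rstrip("\n")) + "\n")

if __name__ == "__main__":
    main()
```

In a streaming job the script would be shipped with `-file clean.py` and run as `-mapper 'python clean.py'`.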

Re: Hive regex SerDe issue?

2011-04-13 Thread Lance Norskog
Usually this is a positive-integer problem somewhere. If you use a 32-bit Java this would be a problem. On Wed, Apr 13, 2011 at 3:16 PM, hadoopman hadoop...@gmail.com wrote: Is there an issue with using the regex SerDe with loading into Hive text files above 2 gigs in size?  I've been
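The 2 GB boundary Lance is hinting at is exactly where a byte offset stops fitting in a signed 32-bit int; a quick illustration (not tied to Hive's actual code):

```python
import ctypes

MAX_INT32 = 2 ** 31 - 1   # 2147483647, i.e. just under 2 GiB
TWO_GIB = 2 * 1024 ** 3   # one byte past that maximum

# Stored in a signed 32-bit int, the offset wraps around to a negative value.
wrapped = ctypes.c_int32(TWO_GIB).value
print(wrapped)  # -2147483648
```

A negative "offset" or "length" then surfaces downstream as allocation or bounds errors, which is consistent with jobs succeeding once the file is split below 2 GB.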

Re: Memory mapped resources

2011-04-13 Thread Lance Norskog
There are systems for plumbing a file system out to user processes; FUSE does this on Linux, and there is a package for hadoop. However, pretending a remote resource is local holds a place of honor in the system design antipattern hall of fame. On Wed, Apr 13, 2011 at 7:35 AM, Benson Margulies

Re: How to change logging level for an individual job

2011-04-13 Thread Lance Norskog
If it's Java, and log4j, you can set that package tree to its own logging level. On Wed, Apr 13, 2011 at 1:52 PM, David Rosenstrauch dar...@darose.net wrote: Is it possible to change the logging level for an individual job?  (As opposed to the cluster as a whole.)  E.g., is there some key that
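Lance's suggestion, in log4j terms, is a per-package level in conf/log4j.properties; note this is per package tree rather than per job, so it affects every task that logs from those classes (the package name is a placeholder):

```properties
# Raise verbosity for one package tree only
log4j.logger.com.example.myjob=DEBUG
```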

Re: Hive regex SerDe issue?

2011-04-13 Thread hadoopman
I appreciate the feedback. I'll check in the morning. We're running 64-bit Ubuntu 10.04 LTS server and I 'believe' it should all be 64-bit, but I'll verify that. Thanks !! On 04/13/2011 08:31 PM, Lance Norskog wrote: Usually this is a positive-integer problem somewhere. If you use a 32-bit

Re: Dynamic Data Sets

2011-04-13 Thread Ted Dunning
Hbase is very good at this kind of thing. Depending on your aggregation needs, OpenTSDB might be interesting, since they store and query against large amounts of time-ordered data similar to what you want to do. It isn't clear whether your data is primarily about current state or about

Issue with Job Scheduling

2011-04-13 Thread Nitin Khandelwal
Hi, I want to use the Capacity Scheduler for my Hadoop jobs. I currently have three queues defined; they are configured and working properly. I am using Hadoop 0.20.2, and in the new library we are not supposed to use JobConf, so I need to set the queue name as a property in Configuration (
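With the new API the queue is just a Configuration property; the property name below is the one the 0.20 Capacity Scheduler reads, and the queue name is a placeholder:

```shell
# In Java: conf.set("mapred.job.queue.name", "queueA");
# or from the command line via GenericOptionsParser:
hadoop jar myjob.jar MyJob -Dmapred.job.queue.name=queueA input/ output/
```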

Re: hadoop streaming and job conf settings

2011-04-13 Thread Amareshwari Sri Ramadasu
Looks like you are hitting https://issues.apache.org/jira/browse/MAPREDUCE-1621. -Amareshwari On 4/13/11 11:39 PM, Shivani Rao raoshiv...@gmail.com wrote: Hello, I am facing trouble using hadoop streaming in order to solve a simple nearest neighbor problem. Input data is in the following