jobtracker is stopping because of permissions

2013-04-19 Thread Mohit Vadhera
Can anybody help me start the jobtracker service? It is urgent. It looks like a permission issue. What permission should be given on which directory? I am pasting the log for the same. The service starts and stops. 2013-04-19 02:21:06,388 FATAL org.apache.hadoop.mapred.JobTracker:

Uploading file to HDFS

2013-04-19 Thread 超级塞亚人
I have a problem. Our cluster has 32 nodes. Each disk is 1TB. I want to upload a 2TB file to HDFS. How can I put the file on the namenode and upload it to HDFS?

Re: Best way to collect Hadoop logs across cluster

2013-04-19 Thread Roman Shaposhnik
On Thu, Apr 18, 2013 at 9:23 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote: Hi, my clusters are on EC2, and they disappear after the cluster's instances are destroyed. What is the best practice to collect the logs for later storage? EC2 does exactly that with their EMR, how do they do it?

log

2013-04-19 Thread Mohit Vadhera
Can anybody let me know the meaning of the log below, please? 'Target Replicas is 10 but found 1 replica(s)'? /var/lib/hadoop-hdfs/cache/mapred/mapred/staging/test_user/.staging/job_201302180313_0623/job.split: Under replicated BP-2091347308-172.20.3.119-1356632249303:blk_6297333561560198850_70720.

Re: Uploading file to HDFS

2013-04-19 Thread Harsh J
Can you not simply do a fs -put from the location where the 2 TB file currently resides? HDFS should be able to consume it just fine, as the client chunks them into fixed size blocks. On Fri, Apr 19, 2013 at 10:05 AM, 超级塞亚人 shel...@gmail.com wrote: I have a problem. Our cluster has 32 nodes.
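
A minimal sketch of the same copy done through the Java FileSystem API rather than the fs shell; the namenode URI and paths below are placeholders, not from the thread:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutLargeFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode URI; use the cluster's actual fs.default.name.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
        // The client streams the local file block by block, and the namenode
        // spreads those blocks across the datanodes, so no single 1 TB disk
        // needs to hold the whole 2 TB file.
        fs.copyFromLocalFile(new Path("/local/bigfile"), new Path("/user/data/bigfile"));
        fs.close();
      }
    }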

RE: Uploading file to HDFS

2013-04-19 Thread David Parks
I think the problem here is that he doesn't have Hadoop installed at this other location, so there's no Hadoop DFS client to do the put directly into HDFS; he would normally copy the file to one of the nodes in the cluster where the client files are installed. I've had the same problem recently.

Re: log

2013-04-19 Thread Mohit Vadhera
It's one (1). Output is below. ...Status: HEALTHY Total size: 903709673179 B Total dirs: 2906 Total files: 0 Total blocks (validated): 20906 (avg. block size 43227287 B) Minimally replicated blocks: 20906 (100.0 %) Over-replicated blocks: 0 (0.0 %)

Re: How to process only input files containing 100% valid rows

2013-04-19 Thread Niels Basjes
How about a different approach: If you use the multiple output option you can process the valid lines in a normal way and put the invalid lines in a special separate output file. On Apr 18, 2013 9:36 PM, Matthias Scherer matthias.sche...@1und1.de wrote: Hi all, In my mapreduce job,
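
A rough sketch of the MultipleOutputs idea, assuming the org.apache.hadoop.mapreduce API and a hypothetical isValid() check; the named output "invalid" would also need a matching MultipleOutputs.addNamedOutput(...) call in the driver:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

    public class ValidatingMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
      private MultipleOutputs<Text, NullWritable> out;

      protected void setup(Context context) {
        out = new MultipleOutputs<Text, NullWritable>(context);
      }

      protected void map(LongWritable key, Text line, Context context)
          throws IOException, InterruptedException {
        if (isValid(line)) {
          context.write(line, NullWritable.get());        // normal processing path
        } else {
          out.write("invalid", line, NullWritable.get()); // side output for bad lines
        }
      }

      protected void cleanup(Context context) throws IOException, InterruptedException {
        out.close();
      }

      private boolean isValid(Text line) {
        return !line.toString().isEmpty();                // placeholder validation rule
      }
    }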

RE: Uploading file to HDFS

2013-04-19 Thread David Parks
I just realized another trick you might try. The Hadoop dfs client can read input from STDIN, so you could use netcat to pipe the stuff across to HDFS without hitting the hard drive. I haven't tried it, but here's what I would think might work: On the Hadoop box, open a listening port and feed
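
A hedged sketch of the receiving end of that trick in Java: it simply streams stdin into an HDFS file, so netcat (or anything else) could feed this process without the data touching the local disk. Untested, as David says; the destination path is taken from the command line:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class StdinToHdfs {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Create the target HDFS file and copy stdin straight into it.
        FSDataOutputStream out = fs.create(new Path(args[0]));
        IOUtils.copyBytes(System.in, out, 4096, true);  // 'true' closes both streams
      }
    }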

when to hadoop-2.0 stable release

2013-04-19 Thread Azuryy Yu
I don't think this is easy to answer. Maybe it's not decided. If so, can you tell me what important features are still being developed, or any other reasons? Appreciate it.

Any ideas about Performance optimization inside Hadoop framework based on the NFS-like Shared FileSystem

2013-04-19 Thread Ling Kun
Dear all, I am writing to ask for: 1. Any ideas about performance optimization inside the Hadoop framework based on an NFS-like shared filesystem. 2. This mail is also meant to discuss whether HDFS should support a POSIX or NFS-like interface. The Hadoop MapReduce framework both

Re: log

2013-04-19 Thread Bejoy Ks
This basically happens while running a MapReduce job. When a MapReduce job is triggered, the job files are put in HDFS with high replication (replication is controlled by 'mapred.submit.replication', default value 10). The job files are cleaned up after the job is completed, and hence that
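
For small clusters the submit replication can simply be lowered in the job driver; a minimal fragment (the value 3 is just an example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    Configuration conf = new Configuration();
    // Job files (job.jar, job.split, job.xml) are written with this replication
    // at submit time; the default of 10 shows up as under-replicated blocks on
    // clusters with fewer than 10 datanodes.
    conf.setInt("mapred.submit.replication", 3);
    Job job = new Job(conf, "my-job");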

Re: What is the difference between URI, Home Directory, Working Directory in FileSystem.java or HDFS

2013-04-19 Thread Ling Kun
Dear Daryn Sharp, Your reply helps me a lot in reading the code of the HDFS and FileSystem interface. Thanks. yours, Ling Kun On Thu, Apr 11, 2013 at 10:53 PM, Daryn Sharp da...@yahoo-inc.com wrote: On Apr 11, 2013, at 5:33 AM, Ling Kun wrote: Dear all, I am a little confused

AW: How to process only input files containing 100% valid rows

2013-04-19 Thread Matthias Scherer
I have to add that we have 1-2 billion events per day, split into some thousands of files. So pre-reading each file in the InputFormat should be avoided. And yes, we could use MultipleOutputs and write bad records to separate files while processing each input file. But we (our Operations team) think that there is more

Re: Uploading file to HDFS

2013-04-19 Thread Wellington Chevreuil
Can't you use Flume for that? 2013/4/19 David Parks davidpark...@yahoo.com I just realized another trick you might try. The Hadoop dfs client can read input from STDIN, you could use netcat to pipe the stuff across to HDFS without hitting the hard drive, I haven't tried it, but here's

Re: AW: How to process only input files containing 100% valid rows

2013-04-19 Thread Nitin Pawar
Reject the entire file even if a single record is invalid? There has to be a real serious reason to take this approach. If not, then in any case, to check that the file has all valid lines, you are opening the files and parsing them. Why not then parse + separate incorrect lines as suggested in previous mails

Re: How to process only input files containing 100% valid rows

2013-04-19 Thread Wellington Chevreuil
How about using a combiner to mark as dirty all rows from a dirty file, for instance, putting a dirty flag as part of the key; then in the reducer you can simply ignore these rows and/or output the bad file name. It will still have to pass through the whole file, but at least it avoids the case where you

Re: Cartesian product in hadoop

2013-04-19 Thread zheyi rong
Hi Ajay Srivastava, thank you for your explanation. Regards, Zheyi Rong On Thu, Apr 18, 2013 at 5:18 PM, Ajay Srivastava ajay.srivast...@guavus.com wrote: The approach which I proposed will have m+n i/o for reading datasets, not (m + n + m*n), but further i/o due to spills and

Re: Cartesian product in hadoop

2013-04-19 Thread zheyi rong
Hi Ted Dunning, could you please tell me some keywords so that I can google it myself? Regards, Zheyi Rong On Thu, Apr 18, 2013 at 8:52 PM, Ted Dunning tdunn...@maprtech.com wrote: It is rarely practical to do exhaustive comparisons on datasets of this size. The method used is to

Re: AW: How to process only input files containing 100% valid rows

2013-04-19 Thread MARCOS MEDRADO RUBINELLI
Matthias, As far as I know, there are no guarantees on when counters will be updated during the job. One thing you can do is to write a metadata file along with your parsed events listing what files have errors and should be ignored in the next step of your ETL workflow. If you really don't

Re: Configuration clone constructor not cloning classloader

2013-04-19 Thread Tom White
Hi Amit, It is a bug, fixed by https://issues.apache.org/jira/browse/HADOOP-6103, although the fix never made it into branch-1. Can you create a branch-1 patch for this please? Thanks, Tom On Thu, Apr 18, 2013 at 4:09 AM, Amit Sela am...@infolinks.com wrote: Hi all, I was wondering if there

Re: Re: setting hdfs balancer bandwidth doesn't work

2013-04-19 Thread zhoushuaifeng
Yes, I restarted my cluster. It may be an OS problem. My cluster has 5 nodes; 4 are Red Hat, and the last added one is SUSE 11. The bandwidth setting works on the Red Hat nodes, but not on the SUSE one. Maybe it doesn't work well when the cluster is composed of different systems. Is it a bug?

Map‘s number with NLineInputFormat

2013-04-19 Thread YouPeng Yang
Hi All, I take NLineInputFormat as the text input format with the following code: NLineInputFormat.setNumLinesPerSplit(job, 10); NLineInputFormat.addInputPath(job, new Path(args[0].toString())); My input file contains 1000 rows, so I thought it would distribute 100 (1000/10) maps. However I got
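
One thing worth checking (an assumption, since the rest of the driver is not shown) is whether the input format class itself is set on the job; setNumLinesPerSplit only writes a configuration value. A minimal driver fragment:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;

    Job job = new Job(conf, "nline-example");
    job.setInputFormatClass(NLineInputFormat.class);  // otherwise the default TextInputFormat is used
    NLineInputFormat.setNumLinesPerSplit(job, 10);    // 10 lines per split -> 1000/10 = 100 map tasks
    NLineInputFormat.addInputPath(job, new Path(args[0]));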

Mapreduce

2013-04-19 Thread Adrian Acosta Mitjans
Hello: I'm working on a project, and I'm using HBase to store the data. I have this method that works great but without the performance I'm looking for, so what I want is to do the same but using MapReduce. public ArrayList<MyObject> findZ(String z) throws IOException {
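
A rough sketch of how such a lookup could be run as a map-only job over the table with TableMapReduceUtil; the table name, the 'cf:z' column, and the equality check are assumptions based on the described findZ(String z) method:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class FindZJob {
      static class FindZMapper extends TableMapper<Text, Text> {
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          String target = context.getConfiguration().get("findz.value");
          byte[] cell = value.getValue(Bytes.toBytes("cf"), Bytes.toBytes("z"));
          // Emit only the rows whose 'z' column equals the value we are looking for.
          if (cell != null && Bytes.toString(cell).equals(target)) {
            context.write(new Text(row.get()), new Text(Bytes.toString(cell)));
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = new Job(HBaseConfiguration.create(), "findZ");
        job.setJarByClass(FindZJob.class);
        job.getConfiguration().set("findz.value", args[1]);
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("z"));  // scan only the column we need
        TableMapReduceUtil.initTableMapperJob("mytable", scan, FindZMapper.class,
            Text.class, Text.class, job);
        job.setNumReduceTasks(0);                                 // map-only job
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }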

Inconsistent performance numbers with increased nodes

2013-04-19 Thread Alex O'Ree
Hi, I'm running a 10 data node cluster and was experimenting with adding additional nodes to it. I've done some performance benchmarking with 10 nodes and have compared them to 12 nodes, and I've found some rather interesting and inconsistent results. The behavior I'm seeing is that during some of

Collision with Hadoop (1.0.4) libs?

2013-04-19 Thread Vjeran Marcinko
Hi, I created a fat jar to run my M/R driver application, and this fat jar contains, besides other libs: slf4j-api-1.7.5.jar, slf4j-simple-1.7.5.jar, and, to delegate all commons-logging calls to slf4j, jcl-over-slf4j-1.7.5.jar. Unfortunately, when I start my application using jar

error while running TestDFSIO

2013-04-19 Thread kaveh minooie
Hi everyone, I am getting this error when I run TestDFSIO. The job actually finishes successfully (according to the jobtracker at least), but this is what I get on the console: crawler@d1r2n2:/hadoop$ bin/hadoop jar hadoop-test-1.1.1.jar TestDFSIO -write -nrFiles 10 -fileSize 1000

Re: Map‘s number with NLineInputFormat

2013-04-19 Thread 姚吉龙
The number of maps is decided by the block size and your raw data. — Sent from Mailbox for iPhone On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi All, I take NLineInputFormat as the text input format with the following code:

Re: error while running TestDFSIO

2013-04-19 Thread Ling Kun
I have got the same problem and also need help. Ling Kun On Sat, Apr 20, 2013 at 8:35 AM, kaveh minooie ka...@plutoz.com wrote: Hi everyone, I am getting this error when I run TestDFSIO. The job actually finishes successfully (according to the jobtracker at least) but this is what I get on

Re: Map‘s number with NLineInputFormat

2013-04-19 Thread yypvsxf19870706
Hi, I thought it would be different when adopting NLineInputFormat. So here is my conclusion: the map distribution has nothing to do with NLineInputFormat. NLineInputFormat can decide the number of rows given to each map, while the maps are generated according to the split size. Am I

Re: Reading and Writing Sequencefile using Hadoop 2.0 Apis

2013-04-19 Thread sumit ghosh
Hi, Looks like it still points to the old API. The following worked for me - http://stackoverflow.com/questions/16070587/reading-and-writing-sequencefile-using-hadoop-2-0-apis String uri = args[0]; Configuration conf = new Configuration(); Path path = new Path(uri);
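
For completeness, a minimal writer sketch using the option-based SequenceFile.createWriter API from Hadoop 2.0 (the key/value types and the record loop are just examples):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SequenceFileWriteDemo {
      public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]);
        SequenceFile.Writer writer = null;
        try {
          // The newer factory method takes Writer.Option varargs instead of a FileSystem.
          writer = SequenceFile.createWriter(conf,
              SequenceFile.Writer.file(path),
              SequenceFile.Writer.keyClass(IntWritable.class),
              SequenceFile.Writer.valueClass(Text.class));
          for (int i = 0; i < 10; i++) {
            writer.append(new IntWritable(i), new Text("record-" + i));
          }
        } finally {
          if (writer != null) {
            writer.close();
          }
        }
      }
    }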