Can anybody help me to start the jobtracker service? It is urgent. It looks
like a permission issue.
What permission should I give on which directory? I am pasting the log for the same.
The service starts and then stops.
2013-04-19 02:21:06,388 FATAL org.apache.hadoop.mapred.JobTracker:
I have a problem. Our cluster has 32 nodes and each disk is 1 TB. I want to
upload a 2 TB file to HDFS. How can I put the file on the namenode and upload
it to HDFS?
On Thu, Apr 18, 2013 at 9:23 PM, Mark Kerzner mark.kerz...@shmsoft.com wrote:
Hi,
my clusters are on EC2, and they disappear after the cluster's instances are
destroyed. What is the best practice for collecting the logs for later storage?
EC2 does exactly that with their EMR; how do they do it?
Can anybody let me know the meaning of the log below, please: "Target Replicas
is 10 but found 1 replica(s)"?
/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/test_user/.staging/job_201302180313_0623/job.split:
Under replicated
BP-2091347308-172.20.3.119-1356632249303:blk_6297333561560198850_70720.
Can you not simply do a fs -put from the location where the 2 TB file
currently resides? HDFS should be able to consume it just fine, as the
client chunks it into fixed-size blocks.
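For illustration, the same thing can be done programmatically with the FileSystem API instead of the hadoop fs -put shell command. This is only a minimal sketch; the paths are made up, and it assumes the cluster configuration files are on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutBigFile {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster settings (fs.defaultFS etc.) from the config on the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Hypothetical paths; the client streams the file and HDFS splits it into blocks,
        // so the 2 TB file never has to fit on any single datanode's disk.
        Path local = new Path("/data/local/bigfile.dat");
        Path remote = new Path("/user/hadoop/bigfile.dat");
        fs.copyFromLocalFile(local, remote);
        fs.close();
    }
}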
On Fri, Apr 19, 2013 at 10:05 AM, 超级塞亚人 shel...@gmail.com wrote:
I have a problem. Our cluster has 32 nodes.
I think the problem here is that he doesn't have Hadoop installed at this
other location, so there is no Hadoop DFS client to do the put directly into
HDFS. He would normally copy the file to one of the nodes in the cluster
where the client files are installed. I've had the same problem recently.
It's one (1). Output is below.
...Status: HEALTHY
Total size: 903709673179 B
Total dirs: 2906
Total files: 0
Total blocks (validated): 20906 (avg. block size 43227287 B)
Minimally replicated blocks: 20906 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
How about a different approach:
If you use the multiple output option you can process the valid lines in a
normal way and put the invalid lines in a special separate output file.
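Something along these lines might work with the new-API MultipleOutputs (a rough sketch only; the named output "invalid", the class names, and the validity check are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class SplitBadLines {

  // Hypothetical mapper: valid lines go to the normal output, invalid ones to a named output.
  public static class BadLineMapper extends Mapper<LongWritable, Text, Text, Text> {
    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
      mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      if (isValid(value)) {
        context.write(new Text("valid"), value);          // normal processing path
      } else {
        mos.write("invalid", NullWritable.get(), value);  // goes to a separate output file
      }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
      mos.close();
    }

    private boolean isValid(Text line) {
      return !line.toString().isEmpty();                  // placeholder validity check
    }
  }

  // Driver side: register the named output for the rejected lines.
  public static void configure(Job job) {
    MultipleOutputs.addNamedOutput(job, "invalid", TextOutputFormat.class,
        NullWritable.class, Text.class);
  }
}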
On Apr 18, 2013 9:36 PM, Matthias Scherer matthias.sche...@1und1.de
wrote:
Hi all,
In my mapreduce job,
I just realized another trick you might try. The Hadoop dfs client can
read input from STDIN, so you could use netcat to pipe the stuff across to HDFS
without hitting the hard drive. I haven't tried it, but here's what I
would think might work:
On the Hadoop box, open a listening port and feed
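As a purely hypothetical illustration of the stdin-to-HDFS idea (untested, like the suggestion above), a tiny Java program that streams its standard input straight into an HDFS file; netcat could feed its stdin, and the target path is made up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class StdinToHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical target path; everything read from stdin is streamed into this
    // HDFS file without being written to the local disk first.
    FSDataOutputStream out = fs.create(new Path("/user/hadoop/bigfile.dat"));
    IOUtils.copyBytes(System.in, out, 4096, true);  // closes both streams when done
  }
}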
I don't think this is easy to answer.
Maybe it's not decided yet. If so, can you tell me what important features are
still being developed, or any other reasons?
Appreciated.
Dear all,
I am writing to ask for:
1. Any ideas about performance optimization inside the Hadoop framework
based on an NFS-like shared filesystem.
2. And this mail is also meant to discuss whether HDFS should
support a POSIX or NFS-like interface.
Hadoop MapReduce Framework both
This basically happens while running a MapReduce job. When a MapReduce job
is triggered, the job files are put in HDFS with high replication
(replication is controlled by 'mapred.submit.replication'; the default value
is 10).
The job files are cleaned up after the job is completed, and hence that
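If you want to lower that replication for a job, something like the following snippet should do it (the value 3 here is just an example for a small cluster, and the job name is made up):

// Sets the replication used for the job's staging files (job.split, job.jar, ...).
// The default is 10, as mentioned above.
Configuration conf = new Configuration();
conf.setInt("mapred.submit.replication", 3);
Job job = new Job(conf, "my-job");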
Dear Daryn Sharp,
Your reply helps me a lot with reading the code of HDFS and the FileSystem
interface.
Thanks.
yours,
Ling Kun
On Thu, Apr 11, 2013 at 10:53 PM, Daryn Sharp da...@yahoo-inc.com wrote:
On Apr 11, 2013, at 5:33 AM, Ling Kun wrote:
Dear all,
I am a little confused
I have to add that we have 1-2 billion events per day, split into some
thousands of files. So pre-reading each file in the InputFormat should be
avoided.
And yes, we could use MultipleOutputs and write out the bad lines while processing each input
file. But we (our Operations team) think that there is more
Can't you use Flume for that?
2013/4/19 David Parks davidpark...@yahoo.com
I just realized another trick you might try. The Hadoop dfs client can
read input from STDIN, so you could use netcat to pipe the stuff across to
HDFS without hitting the hard drive. I haven't tried it, but here's
Reject the entire file even if a single record is invalid? There has to be
a really serious reason to take this approach.
If not, then in any case, to check that a file has all valid lines you are opening
the files and parsing them anyway. Why not then parse and separate the incorrect lines,
as suggested in previous mails?
How about using a combiner to mark as dirty all rows from a dirty file, for
instance by putting a dirty flag as part of the key; then in the reducer you
can simply ignore these rows and/or output the bad file name.
It will still have to pass through the whole file, but at least it avoids the
case where you
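A rough sketch of that reducer side, assuming the mapper has already prefixed each key with a hypothetical "D|" (row from a dirty file) or "C|" (clean) flag, e.g. based on the input split's file name:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DropDirtyReducer extends Reducer<Text, Text, Text, Text> {
  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    if (key.toString().startsWith("D|")) {
      // Count the skipped groups; one could also emit the bad file name here instead.
      context.getCounter("quality", "dirtyKeysDropped").increment(1);
      return;                                 // ignore rows coming from dirty files
    }
    for (Text value : values) {
      context.write(key, value);              // clean rows pass through unchanged
    }
  }
}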
Hi Ajay Srivastava,
thank you for your explanation.
Regards,
Zheyi Rong
On Thu, Apr 18, 2013 at 5:18 PM, Ajay Srivastava ajay.srivast...@guavus.com
wrote:
The approach which I proposed will have m+n I/O for reading the datasets, not
(m + n + m*n), but there is further I/O due to spills and
Hi Ted Dunning,
could you please tell me some keywords so that I can google it myself?
Regards,
Zheyi Rong
On Thu, Apr 18, 2013 at 8:52 PM, Ted Dunning tdunn...@maprtech.com wrote:
It is rarely practical to do exhaustive comparisons on datasets of this
size.
The method used is to
Matthias,
As far as I know, there are no guarantees on when counters will be updated
during the job. One thing you can do is to write a metadata file along with
your parsed events listing what files have errors and should be ignored in the
next step of your ETL workflow.
If you really don't
Hi Amit,
It is a bug, fixed by
https://issues.apache.org/jira/browse/HADOOP-6103, although the fix
never made it into branch-1. Can you create a branch-1 patch for this
please?
Thanks,
Tom
On Thu, Apr 18, 2013 at 4:09 AM, Amit Sela am...@infolinks.com wrote:
Hi all,
I was wondering if there
Yes, I restarted my cluster.
It may be an OS problem. My cluster has 5 nodes; 4 are Red Hat, and the last
added one is SUSE 11. The bandwidth setting works on the Red Hat nodes, but not
on the SUSE one. Maybe it doesn't work well when the cluster is composed of
different systems.
Is it a bug?
Hi All
I use NLineInputFormat as the text input format with the following code:
NLineInputFormat.setNumLinesPerSplit(job, 10);
NLineInputFormat.addInputPath(job, new Path(args[0].toString()));
My input file contains 1000 rows, so I thought it would launch
100 (1000/10) maps. However I got
Hello:
I'm working on a project, and I'm using HBase to store the data. I have this
method that works great, but without the performance I'm looking for, so what I want
is to do the same thing using MapReduce.
public ArrayList<MyObject> findZ(String z) throws IOException {
Hi, I'm running a 10-datanode cluster and was experimenting with
adding additional nodes to it. I've done some performance benchmarking
with 10 nodes and compared it to 12 nodes, and I've
found some rather interesting and inconsistent results. The behavior
I'm seeing is that during some of
Hi,
I created a fat jar to run my M/R driver application, and this fat jar
contains, besides other libs:
slf4j-api-1.7.5.jar
slf4j-simple-1.7.5.jar
and, to delegate all commons-logging calls to slf4j:
jcl-over-slf4j-1.7.5.jar
Unfortunately, when I start my application using the jar
Hi everyone
I am getting this error when I run TestDFSIO. The job actually finishes
successfully (according to the jobtracker at least), but this is what I
get on the console:
crawler@d1r2n2:/hadoop$ bin/hadoop jar hadoop-test-1.1.1.jar TestDFSIO
-write -nrFiles 10 -fileSize 1000
The number of maps is decided by the block size and your raw data.
—
Sent from Mailbox for iPhone
On Sat, Apr 20, 2013 at 12:30 AM, YouPeng Yang yypvsxf19870...@gmail.com
wrote:
Hi All
I take NLineInputFormat as the Text Input Format with the following code
:
I have got the same problem and also need help.
Ling Kun
On Sat, Apr 20, 2013 at 8:35 AM, kaveh minooie ka...@plutoz.com wrote:
Hi everyone
I am getting this error when i run TestDFSIO. the job actually finishes
successfully. ( according to jobtracker at least ) but this is what i get
on
Hi
I thought it would be different when adopting NLineInputFormat.
So here is my conclusion: the map distribution has nothing to do with
NLineInputFormat. NLineInputFormat decides the number of rows given to each map,
while the maps themselves are generated according to the split size.
And I
Hi,
Looks like it still points to the old API. The following worked for me -
http://stackoverflow.com/questions/16070587/reading-and-writing-sequencefile-using-hadoop-2-0-apis
String uri = args[0];
Configuration conf = new Configuration();
Path path = new Path(uri);
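Continuing that snippet, a minimal sketch (not from the original mail) of reading the file back with the Hadoop 2.0-style SequenceFile.Reader options; it assumes the usual org.apache.hadoop.io and org.apache.hadoop.util imports and whatever key/value classes the file was written with:

// conf and path come from the snippet above.
SequenceFile.Reader reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
try {
  // Instantiate key/value objects of the classes the file was written with.
  Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
  Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
  while (reader.next(key, value)) {
    System.out.println(key + "\t" + value);
  }
} finally {
  reader.close();
}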