Hi all,
I am getting frequent NamespaceID exceptions. I am running Hadoop 0.20.0
on a cluster of 8 machines. The datanodes work properly for some time and
then stop, logging a NamespaceID mismatch exception. As a workaround I have
been manually deleting the datanode data, formatting the namenode, and
restarting
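For what it's worth, with a NamespaceID mismatch you usually don't have to reformat and lose data: you can rewrite the datanode's stored ID to match the namenode's. A minimal sketch on a mock VERSION file (on a real node, DATA_DIR would be your dfs.data.dir and the ID would be copied from the namenode's own current/VERSION; both values below are made up):

```shell
# Demo on a throwaway directory; point DATA_DIR at your real dfs.data.dir.
DATA_DIR=$(mktemp -d)
mkdir -p "${DATA_DIR}/current"
echo "namespaceID=987654321" > "${DATA_DIR}/current/VERSION"  # stale datanode ID

# Stop the datanode first, then rewrite the ID to the namenode's value.
NN_NAMESPACE_ID=123456789  # copy this from the namenode's current/VERSION
sed -i "s/^namespaceID=.*/namespaceID=${NN_NAMESPACE_ID}/" \
    "${DATA_DIR}/current/VERSION"
cat "${DATA_DIR}/current/VERSION"  # namespaceID=123456789
```

After restarting the datanode it should register cleanly instead of shutting itself down.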
On 3/6/10 10:41 PM, "jiang licht" wrote:
> I can feel that pain; Kerberos makes you pull more hair out of your head :) I
> worked on it a while back and now only remember a bit of it.
The only other real choice is PKI. CRLs? Blech. I'd much rather tie the
grid into my pre-existing Active Directory
2010/3/8 William Kang
> Hi guys,
> Thanks for your replies. I did not put anything in /tmp. It's just that
>
default setting of dfs.name.dir/dfs.data.dir is set to the subdir in /tmp
every time when I restart the hadoop, the localhost:50070 does not show up.
> The localhost:50030 is fine. Unles
Hi guys,
Thanks for your replies. I did not put anything in /tmp. It's just that
every time I restart Hadoop, localhost:50070 does not show up.
localhost:50030 is fine. Unless I reformat the namenode, I won't be able to
see the HDFS web page at 50070. It did not clean /tmp automatically
Hi William,
Can you provide a snapshot of the log/hadoop-hadoop-namenode.log file from
when the service fails to start on reboot of the machine? Also, what does
your configuration look like?
Thanks,
Sagar
-Original Message-
From: William Kang [mailto:weliam.cl...@gmail.com]
Sent: Mon
Yeah. Don't put things in /tmp. That's unpleasant in the long run.
On Sun, Mar 7, 2010 at 9:36 PM, Eason.Lee wrote:
> Your /tmp directory is cleaned automatically?
>
> Try to set dfs.name.dir/dfs.data.dir to a safe dir~~
>
> 2010/3/8 William Kang
>
>> Hi all,
>> I am running HDFS in Pseudo-distrib
Your /tmp directory is cleaned automatically?
Try to set dfs.name.dir/dfs.data.dir to a safe dir~~
2010/3/8 William Kang
> Hi all,
> I am running HDFS in Pseudo-distributed mode. Every time after I restart
> the machine, I have to format the namenode, otherwise localhost:50070
> won't show up
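To make that concrete, a minimal hdfs-site.xml sketch; the /var/hadoop paths are just example locations, anything outside /tmp that survives reboots will do:

```xml
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/var/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/var/hadoop/dfs/data</value>
  </property>
</configuration>
```

Stop HDFS, move the old name/data directories over (or reformat one last time), then restart.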
Hi all,
I am running HDFS in Pseudo-distributed mode. Every time after I restart
the machine, I have to format the namenode, otherwise localhost:50070
won't show up. It is quite annoying to do so since all the data is
lost. Does anybody know why this happens? And how should I fix this problem?
Lowering mapred.job.shuffle.input.buffer.percent would be one option.
Maybe GC wasn't releasing memory fast enough for in-memory shuffling.
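For reference, that property goes in mapred-site.xml; 0.50 below is just an illustrative lower value (the 0.20-era default is 0.70, if I remember correctly):

```xml
<property>
  <name>mapred.job.shuffle.input.buffer.percent</name>
  <value>0.50</value>
  <!-- fraction of the reducer heap used to buffer map outputs during shuffle -->
</property>
```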
On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins wrote:
>
> Thanks Ted. Very helpful. You are correct that I misunderstood the code
> at ReduceTask
Hi
The HDFS shell commands support standard I/O. You can use this if you want
to avoid a temporary save to the local file system.
For example:
wget https://web-server/file-path -O - | hadoop fs -put -
hdfs://nn.example.com/hadoop/hadoopfile
Refer to this URL
http://hadoop.apache.org/common/docs/curren
Thanks Ted. Very helpful. You are correct that I misunderstood the code at
ReduceTask.java:1535. I missed the fact that it's in an IOException catch
block. My mistake. That's what I get for being in a rush.
For what it's worth I did re-run the job with mapred.reduce.parallel.copies
set
My observation is based on this call chain:
MapOutputCopier.run() calling copyOutput() calling getMapOutput() calling
ramManager.canFitInMemory(decompressedLength)
Basically ramManager.canFitInMemory() makes its decision without considering
the number of MapOutputCopiers that are running. Thus 1.25 *
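A toy back-of-the-envelope sketch of the over-commit (the numbers are made up; the point is that the per-copy check alone doesn't bound the total):

```shell
# Each copy is admitted if it fits under the per-copy cap (0.25 * limit),
# but nothing caps the sum across concurrent copiers.
MEMORY_LIMIT=1000                       # total in-memory shuffle budget
SINGLE_LIMIT=$((MEMORY_LIMIT / 4))      # per-copy cap: 250
COPIERS=20                              # mapred.reduce.parallel.copies
WORST_CASE=$((COPIERS * SINGLE_LIMIT))  # every copier admitted at the cap
echo "cap per copy: ${SINGLE_LIMIT}, worst-case total: ${WORST_CASE} / ${MEMORY_LIMIT}"
```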
Ted,
I'm trying to follow the logic in your mail and I'm not sure I understand.
If you wouldn't mind helping me understand, I would appreciate it.
Looking at the code maxSingleShuffleLimit is only used in determining if the
copy _can_ fit into memory:
boolean canFitInMemory(long
Ted,
Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
some time to write a patch later this week.
Jacob Rideout
On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu wrote:
> I think there is mismatch (in ReduceTask.java) between:
> this.numCopiers = conf.getInt("mapred.reduce.parall
distcp seems to copy between clusters.
http://hadoop.apache.org/common/docs/current/distcp.html
zenMonkey wrote:
>
> I want to write a script that pulls data (flat files) from a remote
> machine and pushes that into its hadoop cluster
Phil,
what you are describing is close to what Nutch already does. You can
look at it; all this coding is non-trivial, and you can save yourself a lot
of work and debugging.
Mark
On Sun, Mar 7, 2010 at 8:30 AM, Zak Stone wrote:
> Hi Phil,
>
> If you treat each HTTP request as a Hadoop tas
Hi Phil,
If you treat each HTTP request as a Hadoop task and the individual
HTTP responses are small, you may find that the latency of the web
service leaves most of your Hadoop processes idle most of the time.
To avoid this problem, you can let each mapper make many HTTP requests
in parallel, ei
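One way to sketch that fan-out inside a streaming mapper is xargs -P; the URLs below are placeholders, and the demo substitutes `echo fetched` for a real `curl -s` so it runs offline:

```shell
# Read URLs (one per line, e.g. from the mapper's stdin) and process up to
# 4 of them concurrently; swap `echo fetched` for `curl -s` in real use.
printf '%s\n' \
  http://api.example.com/item/1 \
  http://api.example.com/item/2 \
  http://api.example.com/item/3 |
  xargs -P 4 -n 1 echo fetched
```

With -P the completion order is nondeterministic, so emit a key with each result if downstream steps need to match responses back to requests.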
I have compiled the program without errors.
This is what my .jar file looks like:
Its name is Election.jar
Directory: had...@varun:~/hadoop-0.20.1
Inside the jar these are the files:
Election.class
Election$Reduce.class
Election$Map.class
META-INF/
Manifest-Version: 1.0
Created-By: 1.6.0_0 (Sun Mic
Thanks to Mridul; here is an approach he suggested based on Pig,
which works fine for me:
input_lines = load 'my_s3_list_file' as (location_line:chararray);
grp_op = GROUP input_lines BY location_line PARALLEL $NUM_MAPPERS_REQUIRED;
actual_result = FOREACH grp_op GENERATE MY_S3_UDF(group);
I
Hi,
I'm new to Hadoop, and I'm trying to figure out the best way to use it
to parallelize a large number of calls to a web API, and then process
and store the results.
The calls will be regular HTTP requests, and the URLs follow a known
format, so can be generated easily. I'd like to understand h
*I just ran a simple program, and got the error below:*
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGFPE (0x8) at pc=0x0030bda07927, pid=22891, tid=1076017504
#
# JRE version: 6.0_18-b07
# Java VM: Java HotSpot(TM) 64-Bit Server VM (16.0-b13 mixed mode
linux-amd