I am running a single-node Hadoop. If I try to debug
org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles,
the following exception says that
I haven't put webapps on the classpath. I have in fact put src/webapps on
the classpath.
So I am wondering what is wrong.
2008-09-06 20:
We did something similar with the ARC format, where each record (webpage)
is gzipped and then appended. It is not exactly the same, but it may
help. Take a look at the following classes; they are in the Nutch trunk:
org.apache.nutch.tools.arc.ArcInputFormat
org.apache.nutch.tools.arc.ArcRecordReader
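For reference, a minimal sketch of the general pattern those classes follow,
treating a whole (unsplittable) file as a single record so that a multi-line
regex can run against the full content. This uses the old
org.apache.hadoop.mapred API; the class name is illustrative, not the actual
Nutch code, and you would pair it with an InputFormat whose isSplitable()
returns false:

  import java.io.IOException;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.BytesWritable;
  import org.apache.hadoop.io.NullWritable;
  import org.apache.hadoop.mapred.FileSplit;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.RecordReader;

  // Illustrative: hands the entire file to the mapper as one record.
  public class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    public WholeFileRecordReader(FileSplit split, JobConf conf) {
      this.split = split;
      this.conf = conf;
    }

    public boolean next(NullWritable key, BytesWritable value) throws IOException {
      if (processed) return false;
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = fs.open(file);
      try {
        in.readFully(0, contents);  // read the whole file into memory
      } finally {
        in.close();
      }
      value.set(contents, 0, contents.length);
      processed = true;
      return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return processed ? split.getLength() : 0; }
    public float getProgress() { return processed ? 1.0f : 0.0f; }
    public void close() throws IOException { }
  }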
Is it possible to set a multi-line text input in streaming to be used as
a single record? For example, say I wanted to scan a webpage for a
specific regex that is multi-line; is this possible in streaming?
Dennis
In your hadoop-site.xml file in config, set the following property:

  <property>
    <name>dfs.datanode.dns.interface</name>
    <value>default</value>
    <description>The name of the Network Interface from which a data node
    should report its IP address.</description>
  </property>

Change "default" to the name of your interface (network card), usually
eth0, eth1, etc.
D
Any detailed error message?
2008/9/5, Abdul Qadeer <[EMAIL PROTECTED]>:
>
> I want to debug the test case
> org.apache.hadoop.streaming.TestMultipleCachefiles.testMultipleCachefiles<
> http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3151/testReport/org.apache.hadoop.streaming/TestMultipleCac
Paths are URIs. Without the authority explicitly specified in the path
or without an overriding definition in hadoop-site.xml,
fs.default.name will be "file:///" from hadoop-default.xml (which
should be why you're writing to local disk instead of HDFS). If you're
running on a single node, f
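For example, to point at a local HDFS instance instead, hadoop-site.xml
would carry an override like the following (host and port are illustrative):

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>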
Kevin,
I think specifying datanode.dns.interface alone for dfs and mapred is enough
(not sure). You only have to set it to eth0 or eth1, etc
J-D
On Sat, Sep 6, 2008 at 7:18 PM, Kevin <[EMAIL PROTECTED]> wrote:
> Hi J-D,
>
> I could not try it right now as I am not familiar with setting up DNS
>
Hi,
I have configured HDFS on Windows and am running it using Cygwin.
I am interested in programmatically accessing the files and folders in HDFS
(I mean, I can read/write files in HDFS using Java code).
I used this example http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample .
Code is running f
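For reference, the core of that read/write pattern as a minimal sketch
(the path and file content here are illustrative):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataInputStream;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsReadWrite {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();  // picks up hadoop-site.xml from the classpath
      FileSystem fs = FileSystem.get(conf);
      Path file = new Path("/user/demo/hello.txt");  // illustrative path

      FSDataOutputStream out = fs.create(file);      // write
      out.writeUTF("hello from HDFS");
      out.close();

      FSDataInputStream in = fs.open(file);          // read back
      System.out.println(in.readUTF());
      in.close();
    }
  }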
Hi J-D,
I could not try it right now as I am not familiar with setting up a DNS
server (I assume the parameters you mentioned are those specifying a DNS
server). It actually becomes more interesting: why does specifying the IP
not suffice? Do you mean that Hadoop will decide the right IP of
a node b
From the stack trace you provided, your OOM is probably due to
HADOOP-3931, which is fixed in 0.17.2. It occurs when the deserialized
key in an outputted record exactly fills the serialization buffer that
collects map outputs, causing an allocation as large as the size of
that buffer. It ca
FWIW: HADOOP-3940 is merged into the 0.18 branch and should be part of
0.18.1. -C
On Sep 4, 2008, at 6:33 AM, Devaraj Das wrote:
I started a profile of the reduce-task. I've attached the profiling output.
It seems from the samples that ramManager.waitForDataToMerge() doesn't
actually wait
Hi,
I have thousands of webpages, each represented as a serialized tree object,
compressed (ZLIB) together (file sizes varying from 2.5 GB to 4.5 GB).
I have to do some heavy text processing on these pages.
What is the best way to read/access these pages?
Method1
***
1) Write Custom
On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote:
I have a question regarding multiple output files that get produced as
a result of using multiple reduce tasks for a job (as opposed to only
one). If I'm using a custom writable and thus writing to a sequence
output, am I guaranteed that all of t
This clears up my concerns. Thanks!
Ryan
On Sep 6, 2008, at 2:17 PM, Owen O'Malley <[EMAIL PROTECTED]> wrote:
On Sep 6, 2008, at 9:35 AM, Ryan LeCompte wrote:
I have a question regarding multiple output files that get produced
as
a result of using multiple reduce tasks for a job (as opp
You can give a comma separated list of files and directories to the
FileInputFormats, such as TextInputFormat. Directories are expanded
one level, so dir1 becomes dir1/*, but not dir1/*/*.
-- Owen
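For instance, with the 0.18-era API (the class and file names here are
hypothetical):

  import org.apache.hadoop.mapred.FileInputFormat;
  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(MyJob.class);
  // Comma-separated list; each directory is expanded one level (dir1 -> dir1/*)
  FileInputFormat.setInputPaths(conf, "file000,file001,file002,dir1");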
Hello,
> I'm currently developing a map/reduce program that emits a fair amount
> of maps per input record (around 50 - 100), and I'm getting OutOfMemory
> errors:
Sorry for the noise, I found out I had to set the mapred.child.java.opts
JobConf parameter to "-Xmx512m" to make 512MB of heap space available.
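A minimal way to set that when building the job (the job class name is
hypothetical):

  import org.apache.hadoop.mapred.JobConf;

  JobConf conf = new JobConf(MyPipesJob.class);
  // Give each child task JVM 512MB of heap
  conf.set("mapred.child.java.opts", "-Xmx512m");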
Hello,
I have a question regarding multiple output files that get produced as
a result of using multiple reduce tasks for a job (as opposed to only
one). If I'm using a custom writable and thus writing to a sequence
output, am I guaranteed that all of the data for a particular key will
appear in a single output file?
Hello,
I'm currently developing a map/reduce program that emits a fair amount of maps
per input record (around 50 - 100), and I'm getting OutOfMemory errors:
2008-09-06 15:28:08,993 ERROR org.apache.hadoop.mapred.pipes.BinaryProtocol:
java.lang.OutOfMemoryError: Java heap space
at
org.
Yes, I agree that that page is confusing. There was a thread named
"Confusing NameNodeFailover page in Hadoop Wiki" in August and some stuff
was done (like the failover page was removed) but my guess is that there is
still work to do.
Since this is an open wiki, anyone can edit it (smile).
J-D
O
These exceptions are apparently coming from the dfs side of things. Could
someone from the dfs side please look at these?
On 9/5/08 3:04 PM, "Espen Amble Kolstad" <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Thanks!
> The patch applies without change to hadoop-0.18.0, and should be
> included in a 0.18
I know the NameNode is a single point of failure for the HDFS cluster: when
the metadata in the NameNode is lost, all data in the filesystem is destroyed,
absent any backup of the metadata.
Question: would it be valuable to implement retrieving the metadata from the
block reports from the slaves?
just someth
Actually, I have read that the term "secondary name-node" is somewhat
misleading. It is not a name-node in the sense that data-nodes cannot
connect to the secondary name-node, and in no event can it replace the
primary name-node in case of its failure.
But today, I read another article in the
Hi Sayali,
Yes, you can submit a collection of files from HDFS as input to the
job. Please take a look at the WordCount example in the Map/Reduce
tutorial for an example:
http://hadoop.apache.org/core/docs/r0.18.0/mapred_tutorial.html#Example%3A+WordCount+v1.0
Ryan
On Sat, Sep 6, 2008 at 9:03
Hello,
When starting a hadoop job, I need to specify an input file and an output file.
Can I instead specify a list of input files?
For example, I have the input distributed in the files:
file000,
file001,
file002,
file003,
...
So can I specify the input files as file*? I can add all my files to HDFS.
Hi,
See http://wiki.apache.org/hadoop/FAQ#7 and
http://hadoop.apache.org/core/docs/r0.17.2/hdfs_user_guide.html#Secondary+Namenode
Regards,
J-D
On Sat, Sep 6, 2008 at 5:26 AM, 叶双明 <[EMAIL PROTECTED]> wrote:
> Hi all!
>
> The NameNode is a Single Point of Failure for the HDFS Cluster. There
> i
Hi all!
The NameNode is a Single Point of Failure for the HDFS Cluster. There
is support for NameNodeFailover, with a SecondaryNameNode hosted on a
separate machine being able to stand in for the original NameNode if
it goes down.
Is that right? Is the SecondaryNameNode there to support the NameNode?
S
It is enough for you to know which directory in HDFS contains your index
data, rather than which datanode.
2008/9/6, Jean-Daniel Cryans <[EMAIL PROTECTED]>:
>
> Hi,
>
> I suggest that you read how data is stored in HDFS, see
> http://hadoop.apache.org/core/docs/r0.18.0/hdfs_design.html
>
> J-D