Hi Allen,
Once a node goes down, the dfs health dashboard takes 40 to 45 minutes
to refresh the status from live to dead.
But I see that the 'Last Contact' field on the live 'dfsnodelist' page
is updated every 30 seconds.
Allen Wittenauer wrote:
On 11/1/09 10:11 PM, "V Satish Kumar"
On Mon, Nov 2, 2009 at 4:39 PM, Vipul Sharma wrote:
> Okay I think I was not clear in my first post about the question. Let me
> try
> again.
>
> I have an application that gets large number of xml files every minute
> which
> are copied over to hdfs. Each file is around 1Mb each and contains sev
Mark,
were you able to concatenate both the xml files together? What did you do to
keep the resulting xml well formed?
Regards,
Vipul Sharma,
Cell: 281-217-0761
Okay I think I was not clear in my first post about the question. Let me try
again.
I have an application that gets a large number of xml files every minute,
which are copied over to HDFS. Each file is around 1 MB and contains several
records. The files are well-formed xml files with a starting tag
On the other hand, a NetBeans plug-in has been and is being super-
maintained, so take a look at http://www.hadoopstudio.org if you're
not completely wedded to Eclipse.
:)
Martin
On Nov 2, 2009, at 1:12 PM, Philip Zeyliger wrote:
Hi Le,
Unfortunately as of late the Eclipse plug-in has be
Are the XMLs in flat files or stored in HBase?
1. If they are in flat files, you can use the StreamXmlRecordReader if that
works for you.
2. Or you can read the xml into a single string and process it however you
want. (This can be done if it's in a flat file or stored in an HBase table;
a rough sketch follows this message.)
I have
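A minimal sketch of option 2, assuming each map input value already carries one
complete, well-formed XML document (e.g. delivered by a whole-file input format or
read from an HBase cell; that delivery mechanism is not shown) and that records use
a hypothetical <record> tag with an id attribute. The mapper parses the document
with the JDK DOM parser and emits one output pair per record.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XmlRecordMapper extends Mapper<LongWritable, Text, Text, Text> {

    private DocumentBuilder builder;

    @Override
    protected void setup(Context context) throws IOException {
        try {
            builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IOException("could not create XML parser: " + e.getMessage());
        }
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        try {
            // Assumption: the whole XML document arrives as the map value.
            Document doc = builder.parse(
                    new ByteArrayInputStream(value.getBytes(), 0, value.getLength()));
            NodeList records = doc.getElementsByTagName("record");   // hypothetical tag
            for (int i = 0; i < records.getLength(); i++) {
                Element record = (Element) records.item(i);
                // One output pair per record; a real job might build an HBase Put here.
                context.write(new Text(record.getAttribute("id")),
                              new Text(record.getTextContent()));
            }
        } catch (SAXException e) {
            context.getCounter("xml", "malformed").increment(1);     // skip bad documents
        }
    }
}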
I am working on a mapreduce application that will take input from lots of
small XML files rather than one big XML file. Each XML file has some records
that I want to parse and load into an HBase table. How should I go about
parsing the XML files and feeding the input to the map functions? Should I have one mapper p
I am getting ready to set up a Hadoop cluster, starting small but planning
to add nodes pretty quickly. I am planning on running the following on the
cluster: Hadoop, HDFS, HBase, Nutch, and Mahout.
So far I have two Dell SC1425s: dual processor (2.8 GHz), 4 GB RAM, two
1.5 TB SATA drives, on a gigabit
Thanks Guys! That's helpful.
On Mon, Nov 2, 2009 at 1:22 PM, Todd Lipcon wrote:
> We generally recommend sticking with whatever Linux is already common
> inside
> your organization. Hadoop itself should run equally well on CentOS 5, RHEL
> 5, or any reasonably recent Ubuntu/Debian. It will proba
I am new to Hadoop and still learning most of the details. I am working on an
application that will take input from lots of small XML files. Each XML
file has some records that I want to parse and load into an HBase table.
How should I go about parsing the XML files and feeding the input to the map functions? Shou
We generally recommend sticking with whatever Linux is already common inside
your organization. Hadoop itself should run equally well on CentOS 5, RHEL
5, or any reasonably recent Ubuntu/Debian. It will probably be OK on any
other variety of Linux as well (eg SLES), though they are less commonly
us
Hi Le,
Unfortunately, as of late the Eclipse plug-in has been under-maintained. I've
anecdotally heard that applying the fix in
http://issues.apache.org/jira/browse/HADOOP-3744, and building the plugin
("ant -Declipse.home=... binary" should build it), will make it work. Do
comment on that JIRA
Hi Satish,
This doesn't solve your current problem, but from 0.20 (after HADOOP-4029),
"4. List of live/dead nodes is moved to separate page. "
Koji
On 11/1/09 11:11 PM, "V Satish Kumar" wrote:
> Hi,
>
> I have noticed that the dfs health dashboard(Running on port 50070)
> takes a long tim
Based on what I've seen on the list, larger installations tend to use
Red Hat Enterprise Linux or one of its clones like CentOS.
On Mon, Nov 2, 2009 at 2:16 PM, Praveen Yarlagadda wrote:
> Hi all,
>
> I have been running Hadoop on Ubuntu for a while now in distributed mode (4
> node cluster). Just
Hi all,
I have been running Hadoop on Ubuntu for a while now in distributed mode (4
node cluster). Just playing around with it.
Going forward, I am planning to add more nodes to the cluster. I just
want to know which Linux flavor is best for running Hadoop. Please let me know.
Regards,
Pra
Hi All (& sorry for possible double posting),
Does anybody know whether the hadoop eclipse plugin is still supported?
I've tried using the 0.18.0 and 0.18.3 plugins to talk to the hadoop
0.18.0 virtual machine, or an installed hadoop 0.18.3. All trials have
been unsuccessful.
Btw, I closel
Ok, thank you very much Amogh, I will redesign my program.
-Original Message-
From: Amogh Vasekar [mailto:am...@yahoo-inc.com]
Sent: Monday, November 02, 2009 11:45 AM
To: common-user@hadoop.apache.org
Subject: Re: Multiple Input Paths
Mark,
Set-up for a mapred job consumes a considerabl
Mark,
Set-up for a mapred job consumes a considerable amount of time and resources,
so if possible a single job is preferred.
You can add multiple paths to your job, and if you need different processing
logic depending upon the input being consumed, you can use the parameter
map.input.file in yo
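A minimal sketch of that advice (old mapred API; the "formatA"/"formatB" path naming
convention is made up): both inputs are added to one job, and the mapper reads
map.input.file, which the framework sets to the file the current split came from,
to choose its parsing logic.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class CombinedLogMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private boolean firstFormat;

    @Override
    public void configure(JobConf job) {
        // map.input.file is set per task to the file the current split was read from.
        String inputFile = job.get("map.input.file", "");
        firstFormat = inputFile.contains("formatA");   // hypothetical naming convention
    }

    public void map(LongWritable key, Text line,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        if (firstFormat) {
            // parse format A and collect output here
        } else {
            // parse format B and collect output here
        }
    }
}

// Driver side: both paths go into a single job instead of two separate submissions.
//   JobConf conf = new JobConf(CombinedLogMapper.class);
//   conf.setMapperClass(CombinedLogMapper.class);
//   FileInputFormat.addInputPath(conf, new Path("/logs/formatA"));   // hypothetical paths
//   FileInputFormat.addInputPath(conf, new Path("/logs/formatB"));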
On 11/1/09 10:11 PM, "V Satish Kumar" wrote:
> I have noticed that the dfs health dashboard(Running on port 50070)
> takes a long time to refresh the number of live nodes and dead nodes. Is
> there a config parameter in hadoop that can be changed to make the
> dashboard shows these changes mor
Yes, the structure is similar. They're both XML log files documenting the same
set of data, just in different ways.
That's a really cool idea though, to combine them. How exactly would I go about
doing that?
-Original Message-
From: L [mailto:archit...@galatea.com]
Sent: Monday, Novemb
Mark,
Is the structure of both files the same? It makes even more sense to
combine the files, if you can, as I have seen a considerable speed up
when I've done that (at least when I've had small files to deal with).
Lajos
Mark Vigeant wrote:
Hey, quick question:
I'm writing a program that
Hey, quick question:
I'm writing a program that parses data from 2 different files and puts the data
into a table. Currently I have 2 different map functions and so I submit 2
separate jobs to the job client. Would it be more efficient to add both paths
to the same mapper and only submit one jo
Nominally, when the map is done, close() is fired, all framework-opened output
files are flushed, and the task waits for all of the acks from the
block-hosting datanodes; then the output committer stages files into the
task output directory.
It sounds like there may be an issue with the clos
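A minimal sketch of where close() sits in that sequence (old mapred API; the
side-file name and the work done in map() are placeholders): any stream the task
opened itself has to be closed in close(), otherwise the task can sit at 100% map
progress waiting on datanode acks before the committer ever promotes its output.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class SideFileMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private FSDataOutputStream sideFile;   // opened by the task itself, not by the framework

    @Override
    public void configure(JobConf job) {
        try {
            FileSystem fs = FileSystem.get(job);
            // getWorkOutputPath points at the task's temporary output directory;
            // the output committer promotes its contents only after a clean finish.
            Path work = FileOutputFormat.getWorkOutputPath(job);
            sideFile = fs.create(new Path(work, "side-" + job.get("mapred.task.id")));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException {
        sideFile.write(value.getBytes(), 0, value.getLength());   // stand-in for real work
        reporter.progress();
    }

    @Override
    public void close() throws IOException {
        // Runs after the last map() call; blocks until the datanodes ack the last block.
        sideFile.close();
    }
}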
inline
On Mon, Nov 2, 2009 at 3:15 AM, Zhang Bingjun (Eddy) wrote:
> Dear Khurana,
>
> We didn't use MapRunnable. Instead, we directly used the class
> org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
> normal Mapper class to it using its setMapperClass() interface. We se
Dear Khurana,
We didn't use MapRunnable. Instead, we directly used the class
org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper and passed our
normal Mapper class to it using its setMapperClass() interface. We set the
number of threads using its setNumberOfThreads(). Is this the correct wa
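For reference, a minimal sketch of that wiring (class names, thread count, and paths
are made up): the job's mapper is MultithreadedMapper, the real mapper is handed to it
with setMapperClass(), the thread count is set with setNumberOfThreads(), and no
reducer is configured, matching the map-only crawl job in this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultithreadedCrawlJob {

    // Stand-in for the "normal Mapper class" mentioned above.
    public static class FetchMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text url, Context context)
                throws java.io.IOException, InterruptedException {
            // fetch the page for this url and write it out (crawl logic omitted)
            context.write(url, new Text("fetched"));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "multithreaded-crawl");
        job.setJarByClass(MultithreadedCrawlJob.class);

        // The framework runs MultithreadedMapper, which fans records out
        // to N concurrent copies of the real mapper.
        job.setMapperClass(MultithreadedMapper.class);
        MultithreadedMapper.setMapperClass(job, FetchMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 10);

        job.setNumReduceTasks(0);                 // mappers only, as in the thread
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}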
Hi all,
An important observation: the 100% mappers that never complete all have
temporary files of exactly 64 MB, which means the output of the mapper is cut
off at the block boundary. However, we do have some successfully completed
mappers with output files larger than 64 MB, and we also have less t
On Mon, Nov 2, 2009 at 2:40 AM, Zhang Bingjun (Eddy) wrote:
> Hi Pallavi, Khurana, and Vasekar,
>
> Thanks a lot for your reply. To make up, the mapper we are using is the
> multithreaded mapper.
>
How are you doing this? Did you write your own MapRunnable?
>
> To answer your questions:
>
> Pallavi,
Hi Pallavi, Khurana, and Vasekar,
Thanks a lot for your reply. To make up, the mapper we are using is the
multithreaded mapper.
To answer your questions:
Pallavi, Khurana: I have checked the logs. The key it got stuck on is the
last key it reads in. Since the progress is 100% I suppose the key i
Hi,
Quick questions...
Are you creating too many small files?
Are there any task side files being created?
Does the NN heap have enough space for the metadata? Any details on its
general health will probably be helpful to people on the list.
Amogh
On 11/2/09 2:02 PM, "Zhang Bingjun (Eddy)
Hi Eddy,
I faced a similar issue when I used a Pig script to fetch webpages for
certain URLs. I could see the map phase showing 100% while it was still
running. As I was logging the page it was currently fetching, I
could see the process hadn't yet finished. It might be the same issue.
So, you ca
Did you try adding any logging to see which keys they are getting stuck on,
or what the last key processed was? Do the same number of mappers get stuck
every time?
Not having reducers is not a problem; it's pretty normal to do that.
On Mon, Nov 2, 2009 at 12:32 AM, Zhang Bingjun (Eddy) wrote:
> D
Dear hadoop fellows,
We have been using Hadoop-0.20.1 MapReduce to crawl some web data. In this
case, we only have mappers to crawl data and save it into HDFS in a
distributed way. No reducers are specified in the job conf.
The problem is that for every job we have about one third of the mappers stuck