I have one 500 GB plain-text file in HDFS, and I want to copy it locally, zip it, and put it on another machine's local disk. The problem is that I don't have enough space on the local disk where HDFS is to zip it and then transfer it to another host.
Can I split the file into small files to be
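The question is cut off here; for what it's worth, below is a minimal sketch of one possible approach, assuming the goal is to stream the file out of HDFS and compress it into fixed-size local chunks so the full 500 GB is never staged uncompressed on the local disk. The paths, namenode address, and chunk size are hypothetical.
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsToGzipChunks {
    public static void main(String[] args) throws Exception {
        // Hypothetical locations; adjust to your cluster and local scratch space.
        Path src = new Path("hdfs://namenode:8020/data/big-500gb.txt");
        String localPrefix = "/local/scratch/chunk-";
        long chunkSize = 2L * 1024 * 1024 * 1024;   // ~2 GB of uncompressed data per chunk

        FileSystem fs = FileSystem.get(src.toUri(), new Configuration());
        byte[] buf = new byte[1 << 20];
        try (FSDataInputStream in = fs.open(src)) {
            int part = 0;
            long written = 0;
            OutputStream out = new GZIPOutputStream(new FileOutputStream(localPrefix + part + ".gz"));
            int n;
            while ((n = in.read(buf)) > 0) {
                if (written >= chunkSize) {          // roll over to the next compressed chunk
                    out.close();
                    part++;
                    written = 0;
                    out = new GZIPOutputStream(new FileOutputStream(localPrefix + part + ".gz"));
                }
                out.write(buf, 0, n);
                written += n;
            }
            out.close();
        }
    }
}
Each chunk can be shipped to the other machine as soon as it is produced, so only one chunk's worth of compressed data ever needs to fit on the local disk at a time.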
Hey Sam,
Thanks for sharing your results. I'm definitely curious about what's
causing the difference.
A couple observations:
It looks like you've got yarn.nodemanager.resource.memory-mb in there twice
with two different values.
Your max JVM memory of 1000 MB is (dangerously?) close to the
On 06/06/2013 08:09 PM, Shahab Yunus wrote:
It is trying to read the JSON4J.jar from local/home/hadoop. Does that
jar exist at this path on the client from which you are invoking it?
Is this jar in the current dir from which you are kicking off the job?
Yes and yes. In fact, the job goes
Is a counter like a static variable? If so, is it persisted on the name node or the data
node?
Any input, please.
Thanks
Sai
Is it possible to define the number of mappers to run for a job?
What are the conditions we need to be aware of when defining such a thing?
Please help.
Thanks
Sai
1. Can we think of a job pool as similar to a queue?
2. Is it possible to configure a slot? If so, how?
Please help.
Thanks
Sai
Hi,
LinuxResourceCalculatorPlugin and ProcfsBasedProcessTree get info about
memory and CPU, but who uses these parameters and why?
I want to try to make an analog for FreeBSD.
Hi all
I find that some of my DNs go dead, and the datanode log shows as in [1]:
I got java.lang.OutOfMemoryError: Java heap space.
I wonder how this could come about.
[1]:
1496 at file
Sai,
This is regarding all your recent emails and questions. I suggest that you
read Hadoop: The Definitive Guide by Tom White (
http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520) as
it goes through all of your queries in detail and with examples. The
questions that you are
There are kind of two parts to this. The semantics of MapReduce promise that
all tuples sharing the same key value are sent to the same reducer, so that you
can write useful MR applications that do things like “count words” or
“summarize by date”. In order to accomplish that, the shuffle
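The explanation is cut off here, but to make the "count words" case concrete, a minimal sketch of the reducer side follows (illustrative class name, standard Hadoop MapReduce API): because the shuffle delivers every occurrence of a given word to the same reduce() call, a simple sum produces the complete count.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// The shuffle hands all values for one key to a single reduce() call,
// so summing here yields the complete count for that word.
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable total = new IntWritable();

    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) {
            sum += c.get();
        }
        total.set(sum);
        context.write(word, total);
    }
}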
Hi all
I find that when the datanode starts up, it runs BlockPoolSliceScanner to
verify the BP.
What is the interval of the BlockPoolSliceScanner, and when does the
BlockPoolSliceScanner begin to work?
And
Hi All
I have found that the DirectoryScanner gets an error: Error compiling
report, because of java.lang.OutOfMemoryError: Java heap space.
The log details are as in [1]:
How does the error come about, and how can I solve this exception?
[1]
2013-06-07 22:20:28,199 INFO
Hello,
I was doing some sort of prototyping on top of YARN. I was able to launch
the AM, and the AM in turn was able to spawn a few containers and do a certain
job. The YARN application terminated successfully.
My question is about the history server. I think the history server is an
offering from YARN
There are practical applications for defining your own partitioner as well:
1) Controlling database concurrency. For instance, let's say you have a
distributed datastore like HBase or even your own MySQL sharding scheme.
Using the default HashPartitioner, keys will get for the most part
randomly
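The example is cut off here; a minimal sketch of such a custom partitioner might look like the following, assuming a hypothetical key layout of "<shardId>|<rest of key>" so that each reducer only ever talks to one backend shard:
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes all keys belonging to one shard to the same reducer, so each
// reducer opens connections to a single backend shard instead of all of them.
public class ShardPartitioner extends Partitioner<Text, Text> {
    @Override
    public int getPartition(Text key, Text value, int numPartitions) {
        // Hypothetical key layout: "<shardId>|<rest of key>", e.g. "07|user123"
        String s = key.toString();
        int shardId = Integer.parseInt(s.substring(0, s.indexOf('|')));
        return shardId % numPartitions;
    }
}
It would be wired into a job with job.setPartitionerClass(ShardPartitioner.class).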
Hi Rahul,
The job history server is currently specific to MapReduce.
-Sandy
On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee rahul.rec@gmail.com
wrote:
Hello,
I was doing some sort of prototyping on top of YARN. I was able to launch
AM and then AM in turn was able to spawn a few
Thanks Sandy.
On Fri, Jun 7, 2013 at 9:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote:
Hi Rahul,
The job history server is currently specific to MapReduce.
-Sandy
On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
Hello,
I was doing some sort of
Regarding TotalLoad, what would be normal operating tolerances per node for
this metric? When should one become concerned? Thanks again to everyone
participating in this community. :)
Nick
From: Suresh Srinivas sur...@hortonworks.com
Reply-To:
Please see https://issues.apache.org/jira/browse/HDFS-4461. You may
have to raise your heap for DN if you've accumulated a lot of blocks
per DN.
On Fri, Jun 7, 2013 at 8:33 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote:
Hi All
I have found that the DirectoryScanner gets error: Error
Why not also ask yourself, what if you do not send all keys to the
same reducer? Would you get the results you desire that way? :)
On Fri, Jun 7, 2013 at 4:47 PM, Sai Sai saigr...@yahoo.in wrote:
I always get confused why we should partition and what is the use of it.
Why would one want to send
A side point for Hadoop experts: a comparator is used for sorting in the
shuffle. If a comparator always returns -1 for unequal objects, then
sorting will take longer than it should because there will be a certain
number of items that are compared more than once.
Is this true?
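To make the question concrete, here is a sketch of the pattern being asked about. Beyond the extra comparisons, a comparator that reports -1 for all unequal keys breaks the contract sign(compare(a, b)) == -sign(compare(b, a)), so the sort in the shuffle cannot even order keys consistently.
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Illustrative only: any pair of unequal keys claims a < b, which violates
// the comparator contract and confuses the sort done during the shuffle.
public class SuspectComparator extends WritableComparator {
    protected SuspectComparator() {
        super(Text.class, true);
    }

    @Override
    public int compare(WritableComparable a, WritableComparable b) {
        return a.equals(b) ? 0 : -1;
    }
}
A correct comparator would delegate to the keys' own compareTo(), which is what the default WritableComparator already does when a class like this is registered via job.setSortComparatorClass(...).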
On 06/05/2013
On Fri, Jun 7, 2013 at 9:10 AM, Nick Niemeyer nnieme...@riotgames.comwrote:
Regarding TotalLoad, what would be normal operating tolerances per node
for this metric? When should one become concerned? Thanks again to
everyone participating in this community. :)
Why do you want to be
Totally agree with Shahab.
Just a quick answer, but the details are your homework.
Can we think of a job pool similar to a queue?
I do think so: partition the slot resources into different chunk sizes.
FS: internally, you can choose between FIFO or FAIR.
Queue: it's FIFO.
The cool thing about queues in YARN is
Hi again.
I am attempting to compare the strings as JSON objects using hash codes, with
the ultimate goal of removing duplicates.
I have implemented the following solution:
1. I parse the input line into a JsonElement using the Google JSON parser
(Gson),
2. I take the hash code of the resulting
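The message is truncated here, but a minimal sketch of steps 1 and 2 as described might look like this (illustrative class name and input; the caveat worth noting is that equal hash codes do not guarantee equal objects, so an equals() check should still back up the hash before a line is dropped as a duplicate):
import com.google.gson.JsonElement;
import com.google.gson.JsonParser;
import java.util.HashMap;
import java.util.Map;

public class JsonDedupSketch {
    public static void main(String[] args) {
        String[] lines = {
            "{\"id\": 1, \"name\": \"a\"}",
            "{\"name\": \"a\", \"id\": 1}",   // same object, different field order
            "{\"id\": 2, \"name\": \"b\"}"
        };

        JsonParser parser = new JsonParser();
        // Keep the parsed element, not just its hash code: two different
        // objects can share a hash code, so equals() must break the tie.
        Map<JsonElement, String> seen = new HashMap<>();

        for (String line : lines) {
            JsonElement parsed = parser.parse(line);   // step 1
            int hash = parsed.hashCode();              // step 2
            if (seen.containsKey(parsed)) {
                System.out.println("duplicate (hash " + hash + "): " + line);
            } else {
                seen.put(parsed, line);
            }
        }
    }
}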
Dear All,
I recently moved from Hadoop 0.20.2 to 2.0.4, and I am trying to find the
old job history files (they used to be in HDFS, under output/_logs/history);
they record detailed time information for every task attempt.
But now they are not on HDFS anymore. I copied the entire / from HDFS to my
local dir,
See this;
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201302.mbox/%3c1360184802.61630.yahoomail...@web141205.mail.bf1.yahoo.com%3E
Regards,
Shahab
On Fri, Jun 7, 2013 at 4:33 PM, Boyu Zhang boyuzhan...@gmail.com wrote:
Dear All,
I recently moved from Hadoop0.20.2 to 2.0.4,
Thanks Shahab,
I saw the link, but it is not the case for me. I copied everything from
HDFS ($HADOOP_HOME/bin/hdfs dfs -copyToLocal / $local_dir), but I did not see
the logs.
Did it work for you?
Thanks,
Boyu
On Fri, Jun 7, 2013 at 1:52 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
See this;
How can we add and remove datanodes dynamically?
I mean, given a namenode and some datanodes already running, how can we add
more datanodes to that cluster?
--
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270
Hi Shahab,
How old were they?
They are new; I did the copy automatically right after the job completed,
in a script.
I am assuming they were from the jobs run on the older version, right?
I ran the job using Hadoop version 2.0.4, if this is what you mean.
Or are you looking for new jobs'
What value do you have for the hadoop.log.dir property?
On Fri, Jun 7, 2013 at 5:20 PM, Boyu Zhang boyuzhan...@gmail.com wrote:
Hi Shahab,
How old were they?
They are new, I did the copy automatically right after the job completed,
in a script.
I am assuming they were from the jobs run on
I used a directory that is local to every slave node: export
HADOOP_LOG_DIR=/scratch/$USER/$PBS_JOBID/hadoop-$USER/log.
I did not change hadoop.job.history.user.location; I thought that if I
don't change this property, the job history would be stored in HDFS
under the output/_logs dir.
Then
Reference: Hadoop: The Definitive Guide (3rd edition, May 2012), p. 359.
2013/6/8 Mohammad Mustaqeem 3m.mustaq...@gmail.com
How can we add and remove datanodes dynamically?
means that there is a namenode and some datanodes running, in that cluster
how can we add more datanodes?
--
*With regards ---*
Hi,
I have created a sample program to write contents into the HDFS file system.
The file gets created successfully, but unfortunately it is getting created
on the local file system instead of HDFS.
Here is the source code of the sample program:
int main(int argc, char **argv) {
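The source is truncated above. The usual cause of this symptom is that the client's configuration resolves the default filesystem to file:// rather than to the namenode, so the write silently lands on the local disk. As a point of comparison (not the original C program), a minimal Java sketch that pins the filesystem URI explicitly, with a hypothetical namenode address and output path:
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pin the filesystem explicitly; if the default resolves to file://,
        // the file lands on the local disk instead of HDFS.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path out = new Path("/user/hadoop/sample.txt");   // hypothetical path
        try (FSDataOutputStream os = fs.create(out, true)) {
            os.writeBytes("hello from the HDFS client\n");
        }
        System.out.println("Wrote " + out + " on " + fs.getUri());
    }
}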