split big files into small ones to later copy

2013-06-07 Thread Pedro Sá da Costa
I have one 500 GB plain-text file in HDFS, and I want to copy it locally, zip it, and put it on another machine's local disk. The problem is that I don't have enough space on the local disk where HDFS is to zip it and then transfer it to another host. Can I split the file into small files to be
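One way around the local-disk limit is to compress and chunk in a single pipeline rather than materialize the file first. A sketch, with a small local file standing in for the HDFS stream (paths and sizes are made up; on a real cluster the first command would be `hdfs dfs -cat /path/to/big.txt`):

```shell
# Local stand-in for the HDFS stream; replace with: hdfs dfs -cat /path/to/big.txt
printf 'line1\nline2\nline3\n' > big.txt

# Compress and cut into fixed-size pieces in one pipeline, so the full
# uncompressed file never has to fit on the local disk at once.
cat big.txt | gzip -c | split -b 16 - big.txt.gz.part-

# On the destination host, concatenating the pieces restores the archive.
cat big.txt.gz.part-* | gunzip -c > restored.txt
cmp big.txt restored.txt
```

Because the pieces are byte slices of one gzip stream, plain `cat` reassembles them; on a real cluster you would pick a piece size (e.g. `-b 1G`) that fits your local free space, shipping pieces off as they are produced.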

Re: Why my tests shows Yarn is worse than MRv1 for terasort?

2013-06-07 Thread Sandy Ryza
Hey Sam, Thanks for sharing your results. I'm definitely curious about what's causing the difference. A couple observations: It looks like you've got yarn.nodemanager.resource.memory-mb in there twice with two different values. Your max JVM memory of 1000 MB is (dangerously?) close to the

Re: Issue with -libjars option in cluster in Hadoop 1.0

2013-06-07 Thread Thilo Goetz
On 06/06/2013 08:09 PM, Shahab Yunus wrote: It is trying to read the JSON4J.jar from local/home/hadoop. Does that jar exist at this path on the client from which you are invoking it? Is this jar in the current dir from which you are kicking off the job? Yes and yes. In fact, the job goes

Re: Is counter a static var

2013-06-07 Thread Sai Sai
Is a counter like a static var? If so, is it persisted on the name node or a data node? Any input please. Thanks Sai
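A counter is not a static variable, and it is not persisted on the name node or data nodes: each task keeps its own values, and the framework aggregates them per job. An illustrative Python simulation (not Hadoop code; names are made up):

```python
from collections import Counter

# Each task increments its own local counters in its own task JVM.
def run_task(records):
    local = Counter()  # per-task counters
    for r in records:
        if not r:
            local["BAD_RECORDS"] += 1
    return local

# The framework (JobTracker / MR AppMaster) sums task counters per job:
job_total = Counter()
for task_input in [["a", "", "b"], ["", ""]]:
    job_total.update(run_task(task_input))
```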

Re: Is it possible to define num of mappers to run for a job

2013-06-07 Thread Sai Sai
Is it possible to define the number of mappers to run for a job? What are the conditions we need to be aware of when defining such a thing? Please help. Thanks Sai
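The mapper count is mostly derived from input splits rather than set directly. A rough Python sketch of FileInputFormat-style split counting; the 128 MB block size and the 10% slop on the last split are assumptions from my reading of the Hadoop source, not authoritative values:

```python
# splitSize = max(minSize, min(maxSize, blockSize)); mappers ~= file / splitSize
def num_splits(file_size, block_size=128 * 1024 * 1024,
               min_size=1, max_size=float("inf")):
    split_size = max(min_size, min(max_size, block_size))
    splits, remaining = 0, file_size
    # The last split may be up to 10% larger than split_size.
    while remaining / split_size > 1.1:
        splits += 1
        remaining -= split_size
    return splits + (1 if remaining > 0 else 0)
```

So you influence (not fix) the mapper count by adjusting the min/max split size or the HDFS block size; the framework still decides the final number.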

Re: Pool slot questions

2013-06-07 Thread Sai Sai
1. Can we think of a job pool as similar to a queue? 2. Is it possible to configure a slot, and if so, how? Please help. Thanks Sai

question about LinuxResourceCalculatorPlugin

2013-06-07 Thread Alexey Babutin
Hi, LinuxResourceCalculatorPlugin and ProcfsBasedProcessTree get info about memory and CPU, but who uses these parameters, and why? I want to try to make an analog for FreeBSD.

protobuf.ServiceException: OutOfMemoryError

2013-06-07 Thread YouPeng Yang
Hi all, I find that some of my DNs go dead, and the datanode log shows as [1]: I got java.lang.OutOfMemoryError: Java heap space. I wonder how this could come about. [1]: 1496 at file

Re: Pool slot questions

2013-06-07 Thread Shahab Yunus
Sai, This is regarding all your recent emails and questions. I suggest that you read Hadoop: The Definitive Guide by Tom White ( http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520) as it goes through all of your queries in detail and with examples. The questions that you are

RE: Why/When partitioner is used.

2013-06-07 Thread John Lilley
There are kind of two parts to this. The semantics of MapReduce promise that all tuples sharing the same key value are sent to the same reducer, so that you can write useful MR applications that do things like “count words” or “summarize by date”. In order to accomplish that, the shuffle
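The same-key-to-same-reducer guarantee can be sketched outside Hadoop. A minimal Python stand-in for the default hash partitioning (keys and reducer count are made up; Hadoop itself uses the deterministic `key.hashCode()` in `HashPartitioner`):

```python
# partition = (hash & Integer.MAX_VALUE) % numReduceTasks
def partition(key, num_reducers):
    return (hash(key) & 0x7FFFFFFF) % num_reducers

tuples = [("cat", 1), ("dog", 1), ("cat", 1), ("bird", 1), ("dog", 1)]
buckets = {}
for k, v in tuples:
    buckets.setdefault(partition(k, 4), []).append((k, v))
# Every ("cat", ...) tuple now sits in one bucket, likewise "dog",
# which is what makes per-key work like "count words" possible.
```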

what is the Interval of BlockPoolSliceScanner.

2013-06-07 Thread YouPeng Yang
Hi all, I find that when a datanode starts up, it runs BlockPoolSliceScanner to verify the BP. What is the interval of BlockPoolSliceScanner, and when does the BlockPoolSliceScanner begin to work? And
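If I remember correctly, the verification interval is controlled by a single hdfs-site.xml property; the three-week figure below is from memory, so verify it against your version's hdfs-default.xml:

```xml
<!-- hdfs-site.xml: how often each block is re-verified by the scanner -->
<property>
  <name>dfs.datanode.scan.period.hours</name>
  <value>504</value> <!-- 504 h = 3 weeks; believed to be the default -->
</property>
```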

DirectoryScanner's OutOfMemoryError

2013-06-07 Thread YouPeng Yang
Hi All, I have found that the DirectoryScanner gets an error: Error compiling report, because of java.lang.OutOfMemoryError: Java heap space. The log details are as in [1]. How does this error come about, and how do I solve it? [1] 2013-06-07 22:20:28,199 INFO

History server - Yarn

2013-06-07 Thread Rahul Bhattacharjee
Hello, I was doing some prototyping on top of YARN. I was able to launch an AM, and the AM in turn was able to spawn a few containers and do a certain job. The YARN application terminated successfully. My question is about the history server. I think the history server is an offering from YARN

Re: Why/When partitioner is used.

2013-06-07 Thread Bryan Beaudreault
There are practical applications for defining your own partitioner as well: 1) Controlling database concurrency. For instance, let's say you have a distributed datastore like HBase, or even your own MySQL sharding scheme. Using the default HashPartitioner, keys will get for the most part randomly

Re: History server - Yarn

2013-06-07 Thread Sandy Ryza
Hi Rahul, The job history server is currently specific to MapReduce. -Sandy On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hello, I was doing some sort of prototyping on top of YARN. I was able to launch AM and then AM in turn was able to spawn a few

Re: History server - Yarn

2013-06-07 Thread Rahul Bhattacharjee
Thanks Sandy. On Fri, Jun 7, 2013 at 9:29 PM, Sandy Ryza sandy.r...@cloudera.com wrote: Hi Rahul, The job history server is currently specific to MapReduce. -Sandy On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: Hello, I was doing some sort of

Re: Please explain FSNamesystemState TotalLoad

2013-06-07 Thread Nick Niemeyer
Regarding TotalLoad, what would be normal operating tolerances per node for this metric? When should one become concerned? Thanks again to everyone participating in this community. :) Nick From: Suresh Srinivas sur...@hortonworks.com Reply-To:

Re: DirectoryScanner's OutOfMemoryError

2013-06-07 Thread Harsh J
Please see https://issues.apache.org/jira/browse/HDFS-4461. You may have to raise your heap for DN if you've accumulated a lot of blocks per DN. On Fri, Jun 7, 2013 at 8:33 PM, YouPeng Yang yypvsxf19870...@gmail.com wrote: Hi All I have found that the DirectoryScanner gets error: Error
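For the "raise your heap" part, a hadoop-env.sh fragment (the 2 GB figure is only an example; size it to your block count):

```shell
# hadoop-env.sh: give the DataNode JVM more heap
export HADOOP_DATANODE_OPTS="-Xmx2048m $HADOOP_DATANODE_OPTS"
```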

Re: Why/When partitioner is used.

2013-06-07 Thread Harsh J
Why not also ask yourself, what if you do not send all keys to the same reducer? Would you get the results you desire that way? :) On Fri, Jun 7, 2013 at 4:47 PM, Sai Sai saigr...@yahoo.in wrote: I always get confused why we should partition and what is the use of it. Why would one want to send

Re: Mapreduce using JSONObjects

2013-06-07 Thread Lance Norskog
A side point for Hadoop experts: a comparator is used for sorting in the shuffle. If a comparator always returns -1 for unequal objects, then sorting will take longer than it should, because a certain number of items will be compared more than once. Is this true? On 06/05/2013
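The contract problem can be poked at with a small sketch. Here is a Python stand-in (via `functools.cmp_to_key`) for a comparator that reports every unequal pair as -1: it claims both a < b and b < a, so no total order exists and the "sorted" output depends on the algorithm's comparison order:

```python
from functools import cmp_to_key

def bad_cmp(a, b):
    return 0 if a == b else -1   # violates the total-order contract

def good_cmp(a, b):
    return (a > b) - (a < b)     # consistent three-way comparison

print(sorted([3, 1, 2], key=cmp_to_key(good_cmp)))  # [1, 2, 3]
print(sorted([3, 1, 2], key=cmp_to_key(bad_cmp)))   # arbitrary order
```

Whether it also makes the sort measurably slower depends on the algorithm; the certain damage is that the order is undefined, so equal keys may not end up adjacent, which would break grouping at the reducer.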

Re: Please explain FSNamesystemState TotalLoad

2013-06-07 Thread Suresh Srinivas
On Fri, Jun 7, 2013 at 9:10 AM, Nick Niemeyer nnieme...@riotgames.com wrote: Regarding TotalLoad, what would be normal operating tolerances per node for this metric? When should one become concerned? Thanks again to everyone participating in this community. :) Why do you want to be

Re: Pool slot questions

2013-06-07 Thread Patai Sangbutsarakum
Totally agree with Shahab; just a quick answer, the details are your homework. Can we think of a job pool similar to a queue? I do think so: a pool partitions the slot resources into chunks of different sizes. In the Fair Scheduler, scheduling inside a pool can be chosen between FIFO and FAIR; in a queue, it's FIFO. A cool thing about queues in YARN is
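On the MRv1 Fair Scheduler side, pools are declared in an allocations file pointed to by mapred.fairscheduler.allocation.file; a sketch (pool name and numbers are made up):

```xml
<!-- fair-scheduler.xml -->
<allocations>
  <pool name="analytics">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
  </pool>
</allocations>
```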

Re: Mapreduce using JSONObjects

2013-06-07 Thread Max Lebedev
Hi again. I am attempting to compare the strings as JSON objects using hash codes, with the ultimate goal of removing duplicates. I have implemented the following solution. 1. I parse the input line into a JsonElement using the Google JSON parser (Gson). 2. I take the hash code of the resulting
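A Python sketch of the same idea, deduplicating by semantic JSON equality: re-serialize each parsed line with sorted keys so field order doesn't matter. Function names are illustrative, not from the original code:

```python
import json

# Canonical form: {"b":2,"a":1} and {"a":1,"b":2} collapse to one string.
def canonical(line):
    return json.dumps(json.loads(line), sort_keys=True, separators=(",", ":"))

def dedup(lines):
    seen, out = set(), []
    for line in lines:
        key = canonical(line)
        if key not in seen:   # first occurrence of this JSON value wins
            seen.add(key)
            out.append(line)
    return out
```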

Job History files location of 2.0.4

2013-06-07 Thread Boyu Zhang
Dear All, I recently moved from Hadoop 0.20.2 to 2.0.4, and I am trying to find the old job history files (they used to be in HDFS, under output/_logs/history); they record detailed time information for every task attempt. But now they are not on HDFS anymore. I copied the entire / from hdfs to my local dir,

Re: Job History files location of 2.0.4

2013-06-07 Thread Shahab Yunus
See this; http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201302.mbox/%3c1360184802.61630.yahoomail...@web141205.mail.bf1.yahoo.com%3E Regards, Shahab On Fri, Jun 7, 2013 at 4:33 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Dear All, I recently moved from Hadoop0.20.2 to 2.0.4,

Re: Job History files location of 2.0.4

2013-06-07 Thread Boyu Zhang
Thanks Shahab, I saw the link, but it is not the case for me. I copied everything from hdfs ($HADOOP_HOME/bin/hdfs dfs -copyToLocal / $local_dir). But did not see the logs. Did it work for you? Thanks, Boyu On Fri, Jun 7, 2013 at 1:52 PM, Shahab Yunus shahab.yu...@gmail.com wrote: See this;

How to add and remove datanode dynamically?

2013-06-07 Thread Mohammad Mustaqeem
How can we add and remove datanodes dynamically? That is, with a namenode and some datanodes already running, how can we add more datanodes to that cluster? -- *With regards ---* *Mohammad Mustaqeem*, M.Tech (CSE) MNNIT Allahabad 9026604270

Re: Job History files location of 2.0.4

2013-06-07 Thread Boyu Zhang
Hi Shahab, How old were they? They are new, I did the copy automatically right after the job completed, in a script. I am assuming they were from the jobs run on the older version, right? I run the job using the hadoop version 2.0.4 if this is what you mean. Or are you looking for new jobs's

Re: Job History files location of 2.0.4

2013-06-07 Thread Shahab Yunus
What value do you have for hadoop.log.dir property? On Fri, Jun 7, 2013 at 5:20 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Hi Shahab, How old were they? They are new, I did the copy automatically right after the job completed, in a script. I am assuming they were from the jobs run on

Re: Job History files location of 2.0.4

2013-06-07 Thread Boyu Zhang
I used a directory that is local to every slave node: export HADOOP_LOG_DIR=/scratch/$USER/$PBS_JOBID/hadoop-$USER/log. I did not change hadoop.job.history.user.location; I thought that if I don't change this property, the job history would be stored in HDFS under the output/_logs dir. Then
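For reference, in Hadoop 2.x the MapReduce history files no longer land under output/_logs; as far as I recall they go to the job history server's directories, configurable in mapred-site.xml (defaults shown from memory; check your version's mapred-default.xml):

```xml
<property>
  <name>mapreduce.jobhistory.intermediate-done-dir</name>
  <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
</property>
<property>
  <name>mapreduce.jobhistory.done-dir</name>
  <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
</property>
```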

Re: How to add and remove datanode dynamically?

2013-06-07 Thread 王洪军
Reference: Hadoop: The Definitive Guide (3rd ed., May 2012), p. 359. 2013/6/8 Mohammad Mustaqeem 3m.mustaq...@gmail.com How can we add and remove datanodes dynamically? means that there is a namenode and some datanodes running, in that cluster how can we add more datanodes? -- *With regards ---*
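The mechanism the book describes boils down to include/exclude files that the namenode re-reads on demand, so nodes can join or leave without a restart; a sketch (file paths hypothetical):

```xml
<!-- hdfs-site.xml -->
<property>
  <name>dfs.hosts</name>
  <value>/etc/hadoop/conf/dfs.include</value>
</property>
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

To add a node: list it in the include file (and in slaves), start the datanode daemon on it, then run `hadoop dfsadmin -refreshNodes`. To remove one: list it in the exclude file, refresh, and wait for decommissioning to finish before stopping the daemon.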

hdfsConnect/hdfsWrite API writes contents of file to local system instead of HDFS system

2013-06-07 Thread Venkivolu, Dayakar Reddy
Hi, I have created a sample program to write contents into the HDFS file system. The file gets created successfully, but unfortunately it is created on the local filesystem instead of HDFS. Here is the source code of the sample program: int main(int argc, char **argv) {