Re: Auto clean DistCache?

2013-03-26 Thread Vinod Kumar Vavilapalli
You can control the limit of these cache files; the default is 10GB (a value of 10737418240L). Try changing local.cache.size or mapreduce.tasktracker.cache.local.size in mapred-site.xml. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Mar 25, 2013, at 5:16 PM, Jean
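As a minimal sketch (the 20 GB value is hypothetical), the override goes inside the <configuration> block of mapred-site.xml:

  <property>
    <name>local.cache.size</name>
    <!-- bytes; raises the distributed-cache limit from the 10 GB default to 20 GB -->
    <value>21474836480</value>
  </property>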

Re: Auto clean DistCache?

2013-03-26 Thread Vinod Kumar Vavilapalli
Not all of the files are opened at the same time, ever, so you shouldn't see any "# of open files exceeded" errors. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Mar 26, 2013, at 12:53 PM, Abdelrhman Shettia wrote: > Hi JM , > > Actually thes

Re: Umbilical interface

2013-04-22 Thread Vinod Kumar Vavilapalli
It is the same as all other communication in Hadoop: Hadoop RPC which in turn is java RPC over a TPC connection. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Apr 22, 2013, at 8:27 AM, Rahul Bhattacharjee wrote: > Hi, > > Tasktracer runs each of the ta

Re: Automatically mapping a job submitted by a particular user to a specific hadoop map-reduce queue

2013-04-25 Thread Vinod Kumar Vavilapalli
The 'standard' way to do this is to use queue ACLs to restrict a particular user to submitting jobs to a subset of queues, and then let the user decide which of that subset of queues he wishes to submit a job to. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonwork
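A hedged sketch of such an ACL in Hadoop 1.x terms (mapred-queue-acls.xml; the queue name 'etl' and the user/group names are hypothetical, and mapred.acls.enabled must also be set to true in mapred-site.xml for ACLs to be enforced):

  <property>
    <name>mapred.queue.etl.acl-submit-job</name>
    <!-- comma-separated users, then a space, then comma-separated groups -->
    <value>alice,bob etl-team</value>
  </property>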

Re: Relationship between HDFS_BYTE_READ and Map input bytes

2013-04-29 Thread Vinod Kumar Vavilapalli
They can be different if maps read HDFS files directly, instead of (or on top of) getting key-value pairs via the map interface. HDFS_BYTES_READ will always be greater than or equal to map-input-bytes. Thanks, +Vinod On Apr 29, 2013, at 1:50 AM, Pralabh Kumar wrote: > Hi > > What's the relationsh

Re: MapReduce - FileInputFormat and Locality

2013-05-08 Thread Vinod Kumar Vavilapalli
I think you misread it. If a given split has only one block, it uses all the locations of that block. If it so happens that a given split has multiple blocks, it uses all the locations of the first block. HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On May 8, 2013

Re: Distribution of native executables and data for YARN-based execution

2013-05-16 Thread Vinod Kumar Vavilapalli
es to work, the corresponding files need to be public on HDFS too. Also if the remote files on HDFS are updated, these local files will be uploaded afresh again on each node where your containers run. HTH Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On May 16, 201

Re: Distribution of native executables and data for YARN-based execution

2013-05-17 Thread Vinod Kumar Vavilapalli
acked files and we cannot depend on the underlying OS or the (un)packing tool to retain permissions. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On May 17, 2013, at 6:35 AM, John Lilley wrote: > Thanks! This sounds exactly like what I need. PUBLIC is right. >

Re: Distribution of native executables and data for YARN-based execution

2013-05-17 Thread Vinod Kumar Vavilapalli
and shrink elastically Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On May 17, 2013, at 7:20 AM, Tim St Clair wrote: > Hi John - > > If you are doing extensive levels of non-MR C-style batch, you may be better > served to look at myriad universes

Re: Is FileSystem thread-safe?

2013-05-17 Thread Vinod Kumar Vavilapalli
As of today, there is no atomic append, so no, what you say isn't possible. FWIU, it is one appender at a time - achieved through a lease per file, and multiple concurrent leases aren't allowed for any given file. Thanks, +Vinod Kumar Vavilapalli On May 17, 2013, at 6:40 AM, John Li

Re: Memory Leak while using LocalJobRunner

2013-05-17 Thread Vinod Kumar Vavilapalli
It's inconvenient, but you can try DefaultMetricsSystem.shutDown(). Please see if that works and file a ticket to make LocalJobRunner automatically do this on finish. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On May 17, 2013, at 7:39 AM, Subroto wrote:

Re: Is FileSystem thread-safe?

2013-05-17 Thread Vinod Kumar Vavilapalli
I see. The lots-of-part-files pattern is what most of us end up using. Thanks, +Vinod Kumar Vavilapalli On May 17, 2013, at 10:16 AM, John Lilley wrote: > Vinod, > Thanks, I was mostly asking in the context of attempting to unify the output > of multiple tasks. I’ve seen that in m

Re: What else can be built on top of YARN.

2013-05-29 Thread Vinod Kumar Vavilapalli
mitations. HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On May 29, 2013, at 7:34 AM, Rahul Bhattacharjee wrote: > Hi all, > > I was going through the motivation behind Yarn. Splitting the responsibility > of JT is the major concern.Ultimately the base (Yarn) wa

Re: Compatibility of Hadoop 0.20.x and hadoop 1.0.3

2013-06-12 Thread Vinod Kumar Vavilapalli
eases in 1.x line. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Jun 12, 2013, at 8:34 PM, Lin Yang wrote: > Hi, all, > > I was wondering could an application written with hadoop 0.20.3 API run on a > hadoop 1.0.3 cluster? > > If not, is th

Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli
They are the running metrics. While the task is running, they will tell you how much pmem/vmem it is using at that point of time. Obviously at the end of job, it will be the last snapshot. Thanks, +Vinod On Jul 12, 2013, at 6:47 AM, Shahab Yunus wrote: > I think they are cumulative but per tas

Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli
No, every so often (3 seconds IIRC) it captures pmem and vmem, which correspond to the usage of the process and its children at *that* specific point in time. Cumulative = cumulative across the process and its children. Thanks, +Vinod On Jul 12, 2013, at 1:47 PM, hadoop qi wrote: > Thanks for

Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Vinod Kumar Vavilapalli
usage. Is that means at the end of > the program it still hold several G memory? I am still confused. > > Regards, > Qi > > > On Fri, Jul 12, 2013 at 1:51 PM, Shahab Yunus wrote: > As Vinod Kumar Vavilapalli they are indeed snapshots in point and time. So > they

Re: Only log.index

2013-07-23 Thread Vinod Kumar Vavilapalli
It could either mean that all those task-attempts are crashing before the process itself is getting spawned (check TT logs) or that those logs are getting deleted after the fact. I suspect the former. Thanks, +Vinod On Jul 23, 2013, at 9:33 AM, Ajay Srivastava wrote: > Hi, > > I see that most of t

Re: Only log.index

2013-07-23 Thread Vinod Kumar Vavilapalli
ing some kind of optimization ? What is purpose of log.index ? > > > Regards, > Ajay Srivastava > > > On 24-Jul-2013, at 11:09 AM, Vinod Kumar Vavilapalli wrote: > >> >> It could either mean that all those task-attempts are crashing before the >> process it

Re: Machine hangs from time to time

2013-08-14 Thread Vinod Kumar Vavilapalli
How many map/reduce slots are you running per TT? How much memory is available per node? Did you enable memory management? - See http://hadoop.apache.org/docs/stable/cluster_setup.html#Memory+monitoring Thanks, +Vinod On Aug 14, 2013, at 6:34 PM, Chun-fan Ivan Liao wrote: > Hi, > > We are u
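For reference, the per-TaskTracker slot counts being asked about are configured in mapred-site.xml (the values below are purely illustrative):

  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>   <!-- concurrent map slots on this TaskTracker -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>   <!-- concurrent reduce slots on this TaskTracker -->
  </property>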

Re: yarn-site.xml and aux-services

2013-08-22 Thread Vinod Kumar Vavilapalli
Auxiliary services are essentially administrator-configured services. So, they have to be set up at install time - before the NM is started. +Vinod On Thu, Aug 22, 2013 at 1:38 PM, John Lilley wrote: > Following up on this, how exactly does one *install* the jar(s) for > auxiliary service? Can it be
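A minimal yarn-site.xml sketch of what such an install-time setup looks like; the service name is mapreduce_shuffle in recent 2.x releases (earlier alphas used mapreduce.shuffle, so check your version), and the jar implementing the service must already be on the NodeManager's classpath when it starts:

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>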

Re: [yarn] job is not getting assigned

2013-08-29 Thread Vinod Kumar Vavilapalli
This usually means there are no available resources as seen by the ResourceManager. Do you see "Active Nodes" on the RM web UI first page? If not, you'll have to check the NodeManager logs to see if they crashed for some reason. Thanks, +Vinod Kumar Vavilapalli Horton

Re: Hadoop Yarn

2013-08-29 Thread Vinod Kumar Vavilapalli
You'll have to change the MapReduce code. What options are you exactly looking for and why should they be only applied on some nodes? Some kind of sampling? More details can help us help you. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Aug 29, 2013, at 1:

Re: Job status shows 0's for counters

2013-09-03 Thread Vinod Kumar Vavilapalli
We've observed this internally too. Shinichi, tx for the patch. Will follow up on JIRA to get it committed. Thanks, +Vinod On Sep 3, 2013, at 11:35 AM, Shinichi Yamashita wrote: > Hi, > I reported this issue in MAPREDUCE-5376 > (https://issues.apache.org/jira/browse/MAPREDUCE-5376) and attache

Re: Setting user in yarn in 2.1.0

2013-09-11 Thread Vinod Kumar Vavilapalli
uns as the user running YARN. In secure case, it will run as the app-submitter. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 11, 2013, at 10:17 AM, Albert Shau wrote: > In 2.1.0, the method to set user in the ApplicationSubmissionContext and > ContainerL

Re: assign tasks to specific nodes

2013-09-11 Thread Vinod Kumar Vavilapalli
dling issues. In Hadoop 2 YARN, the platform does expose this functionality. But MapReduce framework doesn't yet expose this functionality to the end users. What exactly is your use case? Why are some nodes of higher priority than others? Thanks, +Vinod Kumar Vavilapalli Hortonworks

Re: chaining (the output of) jobs/ reducers

2013-09-13 Thread Vinod Kumar Vavilapalli
Other than the short term solutions that others have proposed, Apache Tez solves this exact problem. It can run M-M-R-R-R chains, multi-way mappers and reducers, and your own custom processors - all without persisting the intermediate outputs to HDFS. It works on top of YARN, though the first r

Re: Resource limits with Hadoop and JVM

2013-09-16 Thread Vinod Kumar Vavilapalli
resource requirements, and TTs enforce those limits. HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote: > We recently experienced a couple of situations that brought one or more > Hadoop nodes down (unresponsive). O

Re: Tasktracker Permission Issue?

2013-09-18 Thread Vinod Kumar Vavilapalli
all of /a, /a/b, /a/b/c etc. need to be executable by everyone - execute permission is needed on a Linux directory for someone to be able to create files/dirs in some of the sub-directories. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 18, 2013, at 7:26 AM

Re: Hook for Mapper kill

2013-10-18 Thread Vinod Kumar Vavilapalli
There isn't anything in the API as such. You could register your own JVM shut-down hook which does it. OTOH, if you are running this on Linux and a setsid binary is available, Hadoop itself will take care of killing these additional processes - it kills the whole session in this case. Thanks,

Re: only one map or reduce job per time on one node

2013-11-05 Thread Vinod Kumar Vavilapalli
Why do you want to do this? +Vinod On Nov 5, 2013, at 9:17 AM, John wrote: > Is it possible to force the jobtracker executing only 2 map jobs or 1 reduce > job per time?

Re: Error while running Hadoop Source Code

2013-11-05 Thread Vinod Kumar Vavilapalli
It seems like your pipes mapper is exiting before consuming all the input. Did you check the task-logs on the web UI? Thanks, +Vinod On Nov 5, 2013, at 7:25 AM, Basu,Indrashish wrote: > > Hi, > > Can anyone kindly assist on this ? > > Regards, > Indrashish > > > On Mon, 04 Nov 2013 10:23:2

Re: Error while running Hadoop Source Code

2013-11-06 Thread Vinod Kumar Vavilapalli
_3 >> 2013-11-06 06:40:14,920 INFO org.apache.hadoop.mapred.TaskTracker: Received >> KillTaskAction for task: attempt_201311060636_0001_m_01_3 >> 2013-11-06 06:40:15,161 INFO org.apache.hadoop.mapred.JvmManager: In >> JvmRunner constructed JVM ID: jvm_201311060636_0001

Re: Hadoop 2.2.0: Cannot run PI in under YARN

2013-11-08 Thread Vinod Kumar Vavilapalli
This is just a symptom not the root cause. Please check the YARN web UI at 8088 on ResourceManager machine and browse to the application page. It should give you more details. Thanks, +Vinod On Nov 8, 2013, at 8:57 AM, Ping Luo wrote: > java.io.FileNotFoundException: File does not exist --

Re: Time taken for starting AMRMClientAsync

2013-11-17 Thread Vinod Kumar Vavilapalli
It is just creating a connection to RM and shouldn't take that long. Can you please file a ticket so that we can look at it? JVM class loading overhead is one possibility but 1 sec is a bit too much. Thanks, +Vinod On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: > Hi, > I am see

Re: Client mapred tries to renew a token with renewer specified as nobody

2013-12-04 Thread Vinod Kumar Vavilapalli
It is clearly saying that the renewer is wrong (the renewer is marked as 'nobody' but mapred is trying to renew the token); you may want to check this. Thanks, +Vinod On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote: > 2013-12-02 15:57:08,541 ERROR > org.apache.hadoop.security.UserGroupInforma

Re: issue about the MR JOB local dir

2013-12-04 Thread Vinod Kumar Vavilapalli
These are the directories where NodeManager (as configured) will store its local files. Local files includes scripts, jars, libraries - all files sent to nodes via DistributedCache. Thanks, +Vinod On Dec 3, 2013, at 5:26 PM, ch huang wrote: > hi,maillist: > i see three dirs on my l

Re: issue about capacity scheduler

2013-12-04 Thread Vinod Kumar Vavilapalli
If both the jobs in the MR queue are from the same user, CapacityScheduler will only try to run them one after another. If possible, run them as different users. At which point, you will see sharing across jobs because they are from different users. Thanks, +Vinod On Dec 4, 2013, at 1:33 AM,

Re: Writing to remote HDFS using C# on Windows

2013-12-05 Thread Vinod Kumar Vavilapalli
You can try using WebHDFS. Thanks, +Vinod On Thu, Dec 5, 2013 at 6:04 PM, Fengyun RAO wrote: > Hi, All > > Is there a way to write files into remote HDFS on Linux using C# on > Windows? We want to use HDFS as data storage. > > We know there is HDFS java API, but not C#. We tried SAMBA for file
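WebHDFS has to be enabled on the cluster first; after that, any plain HTTP client (including .NET's HttpClient on Windows) can talk to the NameNode's web port (50070 by default) using URLs of the form /webhdfs/v1/<path>?op=CREATE. A minimal hdfs-site.xml sketch:

  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>   <!-- exposes the WebHDFS REST API on the NameNode/DataNode web ports -->
  </property>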

Re: Container [pid=22885,containerID=container_1386156666044_0001_01_000013] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 332.5 GB of 8 GB virtual memo

2013-12-05 Thread Vinod Kumar Vavilapalli
Something looks really bad on your cluster. The JVM's heap size is 200MB but its virtual memory has ballooned to a monstrous 332GB. Does that ring any bell? Can you run regular java applications on this node? This doesn't seem related to YARN per-se. +Vinod Hortonworks Inc. http://hortonworks.com/

Re: unsubscribe

2012-08-23 Thread Vinod Kumar Vavilapalli
Please see http://hadoop.apache.org/common/mailing_lists.html. You should send an email to user-unsubscr...@hadoop.apache.org to unsubscribe. HTH, +Vinod On Aug 23, 2012, at 5:43 AM, sathyavageeswaran wrote: > Once in hadoop, no free exit > > From: msridha...@inautix.co.in [mailto:msridha...

Re: Delays in worker node jobs

2012-08-29 Thread Vinod Kumar Vavilapalli
Do you know if you have enough job-load on the system? One way to look at this is to look for running map/reduce tasks on the JT UI at the same time you are looking at the node's cpu usage. Collecting hadoop metrics via a metrics collection system say ganglia will let you match up the timestam

Re: Questions with regard to scheduling of map and reduce tasks

2012-08-30 Thread Vinod Kumar Vavilapalli
n MR ApplicationMaster is smart and the scheduling isn't random. It runs maps first, and slowly ramps up reduces as maps finish. HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/

Re: Questions with regard to scheduling of map and reduce tasks

2012-08-30 Thread Vinod Kumar Vavilapalli
o and capacity-schedulers which got fixed (not sure of fixed-version). We've tested Capacity-scheduler a lot more if you pick up the latest version - 0.23.2/branch-0.23 HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ signature.asc Description: Message signed with OpenPGP using GPGMail

Re: Questions with regard to scheduling of map and reduce tasks

2012-08-31 Thread Vinod Kumar Vavilapalli
MRAppMaster code: JobImpl.InitTransition for how TAs are created with host information. HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Aug 31, 2012, at 4:17 AM, Vasco Visser wrote: > Thanks again for the reply, it is becoming clear. > > While on the subject of

Re: Yarn defaults for local directories

2012-09-04 Thread Vinod Kumar Vavilapalli
> . I don't seem to be able to add you as a CC, so feel free to add > yourself. Added. Thanks, +Vinod

Re: Hadoop-Windows-Installation

2012-09-05 Thread Vinod Kumar Vavilapalli
This is really useful! If you are okay with contributing this to apache, we can add it as part of the site documentation. A ticket can be created at https://issues.apache.org/jira/browse/HADOOP and we can take it from there. Thanks, +Vinod On Sep 5, 2012, at 12:10 AM, Visioner Sadak wrote: >

Re: build failure - trying to build hadoop trunk checkout

2012-09-05 Thread Vinod Kumar Vavilapalli
This is happening because SecurityUtil.getServerPrincipal(..) eventually is converting host-name to lowercase. Please file a bug here: https://issues.apache.org/jira/browse/HADOOP The shorter-term workaround is to change your hostname to all lower-case, if that is possible. HTH +Vinod Kumar

Re: build failure - trying to build hadoop trunk checkout

2012-09-06 Thread Vinod Kumar Vavilapalli
Never mind filing. I recalled that we debugged this issue long time back and cornered this down to problems with kerberos. See https://issues.apache.org/jira/browse/HADOOP-7988. Given that, Tony, changing your hostname seems to be the only option. Thanks, +Vinod On Sep 6, 2012, at 4:24 AM, Ste

Re: Is there a way to get notificaiton when the job is failed?

2012-09-06 Thread Vinod Kumar Vavilapalli
We can reopen it if we wish it to be fixed in 1.0. Thanks, +Vinod On Sep 6, 2012, at 4:49 AM, Rajiv Chittajallu wrote: > Notifications are sequential and doesn't have timeouts - MAPREDUCE-1688 Not > sure why its closed as dupe of an yarn feature. > > > (hey hemanth, welcome back..) > >> ___

Re: build failure - trying to build hadoop trunk checkout

2012-09-10 Thread Vinod Kumar Vavilapalli
They must resolve to the same hostname on the client and server by their corresponding DNS. And by convention, hostnames are case insensitive. What we observed is that the Kerberos client and server disagree when it comes to hostnames with upper case alphabet. HTH Thanks, +Vinod Kumar Va

Re: Can't run PI example on hadoop 0.23.1

2012-09-10 Thread Vinod Kumar Vavilapalli
The AM corresponding to your MR job is failing continuously. Can you check the container logs for your AM ? They should be in ${yarn.nodemanager.log-dirs}/${application-id}/container_[0-9]*_0001_01_01/stderr Thanks, +Vinod On Sep 10, 2012, at 3:19 PM, Smarty Juice wrote: > Hello Champions

Re: Run mr example wordcount error on hadoop-2.0.1 alpha HA

2012-09-14 Thread Vinod Kumar Vavilapalli
The console output as well as the AM logs show that the TaskAttempts are failing because they are out of heap-space. See your configuration below; you should bump the heap sizes up. HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 14, 2012, at 6:15 AM, Tao wrote
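A hedged example of the bump-up in mapred-site.xml (heap sizes are illustrative; mapred.child.java.opts is the older single knob that covers both map and reduce tasks):

  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx1024m</value>   <!-- heap for each map task JVM -->
  </property>
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx2048m</value>   <!-- heap for each reduce task JVM -->
  </property>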

Re: possible resource leak in capacity scheduler

2012-10-15 Thread Vinod Kumar Vavilapalli
Which version are you running? Can you enable debug logging for the RM and see what's happening? Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Oct 15, 2012, at 2:44 AM, Radim Kolar wrote: > i have simple 2 node cluster. one node with 2 GB and second with

Re: Suitability of HDFS for live file store

2012-10-15 Thread Vinod Kumar Vavilapalli
For your original use case, HDFS indeed sounded like an overkill. But once you start thinking of thumbnail generation, PDFs etc, MapReduce obviously fits the bill. If you wish to do stuff like streaming the stored digital films, clearly, you may want to move your serving somewhere else that wo

Re: DFS respond very slow

2012-10-15 Thread Vinod Kumar Vavilapalli
Try picking up a single operation say "hadoop dfs -ls" and start profiling. - Time the client JVM is taking to start. Enable debug logging on the client side by exporting HADOOP_ROOT_LOGGER=DEBUG,CONSOLE - Time between the client starting and the namenode audit logs showing the read request. Al

Re: DFS respond very slow

2012-10-15 Thread Vinod Kumar Vavilapalli
+Vinod On Oct 15, 2012, at 5:22 PM, Vinod Kumar Vavilapalli wrote: > Try picking up a single operation say "hadoop dfs -ls" and start profiling. > - Time the client JVM is taking to start. Enable debug logging on the client > side by exporting HADOOP_ROOT_LOGGER=DEBUG,CONSOLE

Re: GroupingComparator

2012-10-16 Thread Vinod Kumar Vavilapalli
On Oct 15, 2012, at 12:27 PM, Dave Beech wrote: > This only happens in the new "mapreduce" API - in the older "mapred" > API you get the first key, and it appears to stay the same during the > loop. > > It's sometimes useful behaviour, but it's confusing how the two APIs > don't act the same. Y

Re: speculative execution in yarn

2012-10-17 Thread Vinod Kumar Vavilapalli
r and leave the responsibility of cross-job fairness to the RM and the scheduler. In sum, to answer your question, speculative execution will be triggered only based on task run times, and irrespective of other jobs. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On O

Re: Job stuck in attempt loop on LocalJobRunner, produces no errors

2012-10-22 Thread Vinod Kumar Vavilapalli
I don't see multiple attempts for a single task in your log, it is one attempt for each task. You should check how many maps your job is resulting into. Multiple attempts have IDs like attempt_local_0001_m_04_1. Thanks, +Vinod On Oct 22, 2012, at 7:05 AM, Bai Shen wrote: > attempt_local_

Re: Java heap space error

2012-10-22 Thread Vinod Kumar Vavilapalli
Did this job ever run successfully for you? With 200m heap size? Seems like your maps are failing. Can you paste your settings for the following: - io.sort.factor - io.sort.mb - mapreduce.map.sort.spill.percent Thanks, +Vinod On Oct 21, 2012, at 6:18 AM, Subash D'Souza wrote: > I'm running

Re: task trackers

2012-10-22 Thread Vinod Kumar Vavilapalli
Even JT accepts exclude/include hosts files for the slaves. If they aren't populated, JT by default accepts TTs that successfully register with it at runtime. Thanks, +Vinod On Oct 22, 2012, at 7:55 AM, Kartashov, Andy wrote: > Gentlemen, > > Can you please explain which property is responsib
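For illustration, the Hadoop 1.x properties in mapred-site.xml that point the JobTracker at include/exclude host files (the file paths are hypothetical):

  <property>
    <name>mapred.hosts</name>
    <value>/etc/hadoop/conf/mapred.include</value>   <!-- TaskTrackers allowed to register -->
  </property>
  <property>
    <name>mapred.hosts.exclude</name>
    <value>/etc/hadoop/conf/mapred.exclude</value>   <!-- TaskTrackers to exclude/decommission -->
  </property>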

Re: Job running on YARN gets automatically killed after 10-12 minutes

2012-11-05 Thread Vinod Kumar Vavilapalli
. This you should do irrespective of whether you have any new container requests or not. The default liveliness interval is 10 mins, so you are seeing that your app is getting killed roughly after that much time. HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Nov 5, 2012

Re: debugging hadoop streaming programs (first code)

2012-11-20 Thread Vinod Kumar Vavilapalli
The mapreduce webUI gives you all the information you need for debugging your code. Depending on where your JobTracker is, you should go hit $JT_HOST_NAME:50030. And check the job link as well as the task, taskattempt and logs pages. HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http

Re: Start time, end time, and task tracker of individual tasks of a job

2012-11-20 Thread Vinod Kumar Vavilapalli
Most of this information is already available in the JobHistory files. And there are parsers to read from these files. HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Nov 20, 2012, at 8:13 AM, Jeff LI wrote: > Hello, > > Is there a way to obtain the infor

Re: hadoop multinode with 2 nodes error starting datanode on slave

2012-11-26 Thread Vinod Kumar Vavilapalli
Look at your TaskTracker and DataNode logs. Going by the instructions on the page you linked, they should be in /usr/local/hadoop/logs. +Vinod On Nov 26, 2012, at 5:07 AM, Kshitij Jhamb wrote: > hey hadoop users i need some help for running multi node hadoop > I'm following > > http://www.mic

Re: hadoop task assigner

2012-12-07 Thread Vinod Kumar Vavilapalli
In 1.0 line, you should look at JobTracker side, specifically JobInProgress.java. Thanks, +Vinod On Dec 7, 2012, at 8:28 AM, Jay Vyas wrote: > Hi : Where are the hooks in hadoop for the implementation of locality? I > assume that Mappers are blind to locality - they read directly via their

Re: "attempt*" directories in user logs

2012-12-10 Thread Vinod Kumar Vavilapalli
-default.xml - you should explicitly set it to zero if you don't want reducers. By master, I suppose you mean the JobTracker. The JobTracker doesn't show all the attempts for a given Task; you should navigate to the per-task page to see that. Thanks, +Vinod Kumar Vavilapalli Hortonworks
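For illustration, the explicit zero-reducer setting mentioned above (Hadoop 1.x property name; it can sit in the job configuration or be passed per job as -D mapred.reduce.tasks=0):

  <property>
    <name>mapred.reduce.tasks</name>
    <value>0</value>   <!-- map-only job: map output is written straight to the output path -->
  </property>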

Re: Stop at CallObjectMethod when daemon running

2012-12-10 Thread Vinod Kumar Vavilapalli
Not familiar with your apr stuff, but you should capture getJobStatus() method instead of getAllJobs(). getJobStatus() is what is called for individual jobs, getAllJobs() is called only when you try to list jobs. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Dec

Re: Yarn question: All tasks fail without useful diagnose info when running Yarn

2012-12-17 Thread Vinod Kumar Vavilapalli
Can you check AM logs at http://centos-3.localdomain:8088/proxy/application_1354775970476_0001/ ? The ApplicationMaster's syslog/stderr can tell you what happened. Most likely a setup issue. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Dec 13, 2012, at 1:

Re: Selecting a task for the tasktracker

2012-12-27 Thread Vinod Kumar Vavilapalli
On top of that, the message indicates that you need to have your scheduler class in the mapred package. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Dec 27, 2012, at 7:38 AM, Hemanth Yamijala wrote: > Hi, > > Firstly, I am talking about Hadoop 1.0. Pl

Re: more reduce tasks

2013-01-03 Thread Vinod Kumar Vavilapalli
Is it that you want the parallelism but a single final output? Assuming your first job's reducers generate a small output, another stage is the way to go. If not, second stage won't help. What exactly are your objectives? Thanks, +Vinod On Jan 3, 2013, at 1:11 PM, Pavel Hančar wrote: > Hell

Re: new to hadoop and first question

2013-01-07 Thread Vinod Kumar Vavilapalli
You are pointing your JobClient to the Namenode. Check your mapred.job.tracker address and make sure it points to the correct JobTracker node. Also 0.14 is very very old. Please use 1.* releases. HTH, +Vinod On Mon, Jan 7, 2013 at 1:11 AM, Prabu wrote: > Jim the Standing Bear writes: > > > >
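For illustration, the client-side setting that must point at the JobTracker rather than the NameNode (hostname and port here are hypothetical) lives in mapred-site.xml:

  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>   <!-- host:port of the JobTracker RPC address -->
  </property>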

Re: Capacity Scheduler questions

2013-01-07 Thread Vinod Kumar Vavilapalli
> We would like to configure the equivalent of Fair Scheduler > userMaxJobsDefault = 1 (i.e. we would like to limit a user to a single job > in the cluster). > > > > · By default the Capacity Scheduler allows multiple jobs from a > single user to run concurrently. > > > > · From > h

Re: getExitStatus() in org.apache.hadoop.yarn.api.records

2013-01-09 Thread Vinod Kumar Vavilapalli
I suppose you mean ContainerStatus.getExitStatus(). This is the exit status of the process corresponding to the container if the process actually ran and exited. In case the container didn't even run physically, it could refer to special exit codes like you listed. In sum, - Normal exit status = e

Re: JobCache directory cleanup

2013-01-10 Thread Vinod Kumar Vavilapalli
Can you check the job configuration for these ~100 jobs? Do they have keep.failed.task.files set to true? If so, these files won't be deleted. If it doesn't, it could be a bug. Sharing your configs for these jobs will definitely help. Thanks, +Vinod On Wed, Jan 9, 2013 at 6:41 AM, Ivan Tretyako

Re: Issue in Apache Hadoop Documentation

2013-01-10 Thread Vinod Kumar Vavilapalli
Oh, and user@ is the correct mailing list. +Vinod On Thu, Jan 10, 2013 at 9:32 AM, Vinod Kumar Vavilapalli < vino...@hortonworks.com> wrote: > Great catch, it's a shame it still exists in 1.* stable releases! Can you > please file a ticket and fix it, thanks! > > +Vin

Re: core-site.xml file is being ignored by new Configuration()

2013-01-28 Thread Vinod Kumar Vavilapalli
Can you try prepending the path with file:// or pass in a URL and use the method which takes in a URL? I remember class loading issues with paths without the file: scheme info from a different scenario (while playing with log4j config files). HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http

Re: number of mapper tasks

2013-01-28 Thread Vinod Kumar Vavilapalli
directly. W.r.t your custom inputformat, are you sure your job is using this InputFormat and not the default one? HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Jan 28, 2013, at 12:56 PM, Marcelo Elias Del Valle wrote: > Just to complement the last question, I h

Re: warnings from log4j -- where to look to resolve?

2013-01-28 Thread Vinod Kumar Vavilapalli
Try passing -Dlog4j.debug via mapred.child.java.opts. That will clearly show you the problem as part of logs itself. Are these errors appearing in stdout/stderr or syslog? +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Jan 28, 2013, at 2:36 AM, Calvin wrote: >
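A minimal sketch of passing that flag through the task JVM options in mapred-site.xml (the heap size is just a stand-in for whatever options are already set there):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx200m -Dlog4j.debug</value>   <!-- log4j will print which configuration file it actually loaded -->
  </property>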

Re: jobTracker Blacklisted Nodes

2013-01-28 Thread Vinod Kumar Vavilapalli
failed for whatever reason. It is used to make sure that more tasks of this job don't get launched on the same node. Cluster level blacklisting accounts for cumulative failures on nodes across all jobs. Used to heuristically determine generic node issues independent of the jobs. Thanks, +Vinod Ku

Re: number of mapper tasks

2013-01-29 Thread Vinod Kumar Vavilapalli
e these other parameters and get >> the splits by the number of lines. The amount of lines per map can be >> controlled by the same parameter used in NLineInputFormat: >> >> public static final String LINES_PER_MAP = >> "mapreduce.input.lineinputformat.linespermap"; >> However, i

Re: Initial Permission Settings

2013-01-29 Thread Vinod Kumar Vavilapalli
Please check your dfs umask (dfs.umask configuration property). HTH, +Vinod On Tue, Jan 29, 2013 at 12:02 PM, Serge Blazhiyevskyy < serge.blazhiyevs...@nice.com> wrote: > Hi all, > > Quick question about hadoop dfs -put local_file hdfs_file command > > > It seems that regardless of permissions
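A hedged sketch for older releases that use the dfs.umask name (newer releases replace it with fs.permissions.umask-mode; check how your release parses the value before relying on it). It goes in hdfs-site.xml:

  <property>
    <name>dfs.umask</name>
    <value>022</value>   <!-- umask applied to newly created files and directories -->
  </property>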

Re: Issue with Reduce Side join using datajoin package

2013-01-29 Thread Vinod Kumar Vavilapalli
Seems like a bug in your code, can you share the source here? +Vinod On Tue, Jan 29, 2013 at 4:00 AM, Vikas Jadhav wrote: > I am using Hadoop 1.0.3 > > I am getting following Error > > > 13/01/29 06:55:19 INFO mapred.JobClient: Task Id : > attempt_201301290120_0006_r_00_0, Status : FAILED >

Re: Hadoop-Yarn-MR reading InputSplits and processing them by the RecordReader, architecture/design question.

2013-02-01 Thread Vinod Kumar Vavilapalli
You got that mostly right. And it doesn't differ much in Hadoop 1.* either. With the MR AM doing the work that was earlier done in the JobTracker, the JobClient and the task side don't change much. FileInputFormat.getSplits() is called by the client itself, so you should look for logs on the client machine

Re: SequenceFile.createWriter - throws FileNotFoundException

2013-02-01 Thread Vinod Kumar Vavilapalli
As it clearly says, check the file permissions of your input directory (/hama/input/center/). Also check whether you want the input on the local file-system or DFS. +Vinod On Fri, Feb 1, 2013 at 12:59 AM, Anbarasan Murthy wrote: > The following line in KMeansBSP.java throws the FileNotFoundException **

Re: How does Kerberos work with Hadoop ?

2013-02-21 Thread Vinod Kumar Vavilapalli
You should read the hadoop security design doc which you can find at https://issues.apache.org/jira/browse/HADOOP-4487 HTH, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Feb 21, 2013, at 11:02 AM, rohit sarewar wrote: > > I am looking for an explanation of Ke

Re: Regarding: Merging two hadoop clusters

2013-03-13 Thread Vinod Kumar Vavilapalli
Copy data into one of the clusters using distcp *without* downtime (assuming you have enough capacity) and then merge the clusters? Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Mar 13, 2013, at 9:38 PM, Shashank Agarwal wrote: > Hey Guys, > > I

Re: Why hadoop is spawning two maps over file size 1.5 KB?

2013-03-16 Thread Vinod Kumar Vavilapalli
What's your input-format? Thanks, +Vinod Kumar Vavilapalli On Mar 12, 2013, at 3:27 AM, samir das mohapatra wrote: > Hi All, > I have very fundamental doubt, I have file having size 1.5KB and block size > is default block size, But i could see two mapper it got creted dur

Re: Unsubscribe Please

2013-12-12 Thread Vinod Kumar Vavilapalli
You should send an email to user-unsubscr...@hadoop.apache.org. Thanks, +Vinod On Dec 12, 2013, at 8:36 AM, K. M. Rakibul Islam wrote: > Unsubscribe Please! > > Thanks.

Re: Yarn -- one of the daemons getting killed

2013-12-12 Thread Vinod Kumar Vavilapalli
Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri wrote: > Hi, > I am running a small application on YARN (2.2.0) in a loop of 500 times, > and while doing so one of the daemons, node manager, resource manager, or > data node is getting k

Re: Yarn -- one of the daemons getting killed

2013-12-13 Thread Vinod Kumar Vavilapalli
cate > some excessive memory usage by them or something like that, that is causing > them die? If so, how can we resolve this kind of issue? > > Thanks, > Kishore > > > On Fri, Dec 13, 2013 at 10:16 AM, Krishna Kishore Bonagiri > wrote: > No, I am running on 2 n

Re: pipes on hadoop 2.2.0 crashes

2013-12-13 Thread Vinod Kumar Vavilapalli
Could it just be LocalJobRunner? Can you try it on a cluster? We've tested pipes on clusters, so will be surprised if it doesn't work there. Thanks, +Vinod On Dec 13, 2013, at 7:44 AM, Mauro Del Rio wrote: > Hi, I tried to run a simple test with pipes, but it crashes. > > java.lang.Exception

Re: how to create symbolic link in hdfs with c++ code or webhdfs interface?

2013-12-13 Thread Vinod Kumar Vavilapalli
What version of Hadoop? Thanks, +Vinod On Dec 13, 2013, at 1:57 AM, Xiaobin She wrote: > I'm writting an c++ programme, and I need to deal with hdfs. > > What I need is to create some file in hdfs and read the status of these files. > > And I need to be able to create sym link in hdfs and nee

Re: issue about no class find in running MR job

2013-12-13 Thread Vinod Kumar Vavilapalli
That is not the correct usage. You should do "hadoop jar <jar-file> <main-class> [args]". Or if you are adventurous, directly invoke your class using java and set the appropriate classpath. Thanks, +Vinod On Dec 12, 2013, at 6:11 PM, ch huang wrote: > hadoop ../test/WordCount

Re: Pluggable distribute cache impl

2013-12-16 Thread Vinod Kumar Vavilapalli
If the files are already on an NFS mount, you don't need to spread them around via the distributed cache. BTW, running jobs on NFS mounts isn't going to scale after a while. Thanks, +Vinod On Dec 15, 2013, at 1:15 PM, Jay Vyas wrote: > are there any ways to plug in an alternate distributed cache imp

Re: pipes on hadoop 2.2.0 crashes

2013-12-16 Thread Vinod Kumar Vavilapalli
You should navigate to the ResourceManager UI following the link and see what is happening on the ResourceManager as well as the application-master. Check if any nodes are active first. Then look at ResourceManager and NodeManager logs. +Vinod On Dec 16, 2013, at 10:29 AM, Mauro Del Rio wrote:

Re: Yarn -- one of the daemons getting killed

2013-12-17 Thread Vinod Kumar Vavilapalli
ve "*" as the node name. > > One other thing I suspected was the allowed number of user processes, I > increased that to 31000 from 1024 but that also didn't help. > > Thanks, > Kishore > > > On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapal

Re: Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2013-12-23 Thread Vinod Kumar Vavilapalli
Seems like the hadoop common jar is missing, can you check if one of the directories listed in the CLASSPATH has the hadoop-common jar? Thanks, +Vinod On Dec 22, 2013, at 10:27 PM, Hadoop Dev wrote: > Hi All, > I am trying to execute first ever program (Word Count) in hadoop2.2.0 on > Windows

Re: Unable to change the virtual memory to be more than the default 2.1 GB

2014-01-02 Thread Vinod Kumar Vavilapalli
You need to change the application configuration itself to tell YARN that each task needs more than the default. I see that this is a mapreduce app, so you have to change the per-application configuration: mapreduce.map.memory.mb and mapreduce.reduce.memory.mb in either mapred-site.xml or via th
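For illustration, the per-application settings named above can go in mapred-site.xml or be passed with -D on the command line (the sizes below are hypothetical):

  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>   <!-- container size requested for each map task -->
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>4096</value>   <!-- container size requested for each reduce task -->
  </property>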
