Re: consistency of yarn exclude file

2023-01-04 Thread Vinod Kumar Vavilapalli
You can do this by pushing the same file to all ResourceManagers at the same time. This is done either by (1) admins / ops via something like scp / rsync, with the source file kept in something like git, or (2) an installer application that keeps the source in a DB and pushes to all the nodes.
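
A minimal sketch of how the pushed copies could be checked for drift, assuming the file contents from each ResourceManager have already been collected (for example via scp). The function names and host labels below are hypothetical and are not part of YARN.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Return the SHA-256 hex digest of an exclude file's bytes."""
    return hashlib.sha256(content).hexdigest()


def out_of_sync_hosts(source: bytes, copies: dict) -> list:
    """List the hosts whose copy differs from the source of truth."""
    expected = file_digest(source)
    return sorted(host for host, content in copies.items()
                  if file_digest(content) != expected)


# Hypothetical data: the source file kept in git and two pushed copies.
source = b"node17.example.com\nnode42.example.com\n"
copies = {"rm1": source, "rm2": b"node17.example.com\n"}
print(out_of_sync_hosts(source, copies))
```

Comparing digests rather than full contents keeps the check cheap when the same file has to be verified on many hosts.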

Re: Communicating between yarn and tasks after delegation token renewal

2022-10-08 Thread Vinod Kumar Vavilapalli
There’s no way to do that. Once YARN launches containers, it doesn’t communicate with them for anything after that. The tasks / containers can obviously always reach out to YARN services. But even that in this case is not helpful because YARN never exposes through APIs what it is doing with

Re: How can we access multiple Kerberos-enabled Hadoop with different users in single JVM process

2019-12-23 Thread Vinod Kumar Vavilapalli
You are looking for the proxy-users pattern. See here: https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Superusers.html Thanks +Vinod > On Dec 24, 2019, at 9:49 AM, tobe wrote: > > Currently Hadoop relies on Kerberos to do authentication and authorization. > For single

Re: [ANNOUNCE] Apache Hadoop 3.2.1 release

2019-09-25 Thread Vinod Kumar Vavilapalli
Done: https://twitter.com/hadoop/status/1176787511865008128. If you have tweetdeck, any of the PMC members can do this. BTW, it looks like we haven't published any releases since Nov 2018. Let's get back to doing this going forward! Thanks +Vinod > On Sep 25, 2019, at 2:44 PM, Rohith Sharma K S

Re: Finding the average over a set of values that are created and deleted

2019-08-08 Thread Vinod Kumar Vavilapalli
How big are your images? Depending on that, one of the following could be better solutions (1) Put both images and the image meta-data in HBase (2) Put the images on HDFS and track the image meta-data in HBase. Thanks +Vinod > On Aug 9, 2019, at 7:33 AM, Daniel Santos wrote: > > Hello, > >

Re: Any thoughts making Submarine a separate Apache project?

2019-07-29 Thread Vinod Kumar Vavilapalli
Looks like there's a meaningful push behind this. Given the desire is to fork off Apache Hadoop, you'd want to make sure this enthusiasm turns into building a real, independent but more importantly a sustainable community. Given that there were two official releases off the Apache Hadoop

Re: Right to be forgotten and HDFS

2019-04-15 Thread Vinod Kumar Vavilapalli
If one uses HDFS as raw file storage where a single file intermingles data from all users, it's not easy to achieve what you are trying to do. Instead, using systems (e.g. HBase, Hive) that support updates and deletes to individual records is the only way to go. +Vinod > On Apr 15, 2019, at

Re: Recommendation for Resourcemanager GC configuration

2017-08-23 Thread Vinod Kumar Vavilapalli
What is the ResourceManager JVM’s heap size? What is the value for the configuration yarn.resourcemanager.max-completed-applications? +Vinod > On Aug 23, 2017, at 9:23 AM, Ravuri, Venkata Puneet wrote: > > Hello, > > I wanted to know if there is any recommendation for

Re: 2.7.3 shipped without Snappy support

2016-09-30 Thread Vinod Kumar Vavilapalli
The way we build the bits as part of the release process changed quite a bit during that release so there were some hiccups. This seems like an oversight, though I tried to build them as close as possible to the releases before 2.7.3. We can fix this for the next releases. +Vinod > On Sep 30,

Re: YARN re-locate container

2016-03-31 Thread Vinod Kumar Vavilapalli
criteria for needing to move > containers? If so, it could be done automatically / programatically. > > Thanks, > -Eric > > > From: Zoltán Zvara <zoltan.zv...@gmail.com <mailto:zoltan.zv...@gmail.com>> > To: Vinod Kumar Vavilapalli <vino...@apache.org <mailto:vi

Re: YARN re-locate container

2016-03-29 Thread Vinod Kumar Vavilapalli
Containers can be restarted on other machines already today - YARN just leaves it up to the applications to do so. Are you looking for anything more specifically? +Vinod > On Mar 29, 2016, at 9:45 AM, Zoltán Zvara wrote: > > Dear Hadoop Community, > > Is there any

Re: yarn memory settings in heterogeneous cluster

2015-08-28 Thread Vinod Kumar Vavilapalli
Hi Matt, Replies inline. I'm using the Capacity Scheduler and deploy mapred-site.xml and yarn-site.xml configuration files with various memory settings that are tailored to the resources for a particular machine. The master node, and the two slave node classes each get a different

Re: Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-08 Thread Vinod Kumar Vavilapalli
, Subru On Wed, Jun 3, 2015 at 10:12 AM, Vinod Kumar Vavilapalli vino...@apache.org <mailto:vino...@apache.org> wrote: Hi all, We had a blast of a BOF session on Hadoop YARN at last year's Hadoop Summit. We had lots of fruitful discussions led by many developers about various features

Re: Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-08 Thread Vinod Kumar Vavilapalli
through. On Wed, Jun 3, 2015 at 10:12 AM, Vinod Kumar Vavilapalli vino...@apache.org <mailto:vino...@apache.org> wrote: Hi all, We had a blast of a BOF session on Hadoop YARN at last year's Hadoop Summit. We had lots of fruitful discussions led by many developers about various features

Hadoop YARN Birds of a Feather (BOF) Session at Hadoop Summit San Jose 2015

2015-06-03 Thread Vinod Kumar Vavilapalli
Hi all, We had a blast of a BOF session on Hadoop YARN at last year's Hadoop Summit. We had lots of fruitful discussions led by many developers about various features, their contributions, it was a great session overall. I am coordinating this year's BOF as well and garnering topics of

Re: in YARN/MR2, can I still submit multiple jobs to one MR application master?

2015-04-27 Thread Vinod Kumar Vavilapalli
The MapReduce ApplicationMaster supports only one job. You can say that (YARN ResourceManager + a bunch of MR ApplicationMasters (one per job) = JobTracker). Tez does have a notion of multiple DAGs per YARN app. For your specific use-case, you can force that user to a queue and limit how much

Re: Will Hadoop 2.6.1 be released soon?

2015-04-27 Thread Vinod Kumar Vavilapalli
, that our main issue is HDFS-7443. 2015-04-24 1:34 GMT+05:00 Sean Busbey bus...@cloudera.com <mailto:bus...@cloudera.com>: I'd love to see a 2.6.1 release with * HADOOP-11674 * HADOOP-11710 On Thu, Apr 23, 2015 at 12:00 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com <mailto:vino

Re: Will Hadoop 2.6.1 be released soon?

2015-04-23 Thread Vinod Kumar Vavilapalli
I was going to start a thread on dev lists for this, will do so today. Can you list down the specific HDFS issues you want in 2.6.1? Thanks +Vinod On Apr 23, 2015, at 3:21 AM, Казаков Сергей Сергеевич skaza...@skbkontur.ru <mailto:skaza...@skbkontur.ru> wrote: Hi! We see some serious issues in

Re: YARN HA Active ResourceManager failover when machine is stopped

2015-04-23 Thread Vinod Kumar Vavilapalli
I have run into this offline with someone else too but couldn't root-cause it. Will you be able to share your active/standby ResourceManager logs via pastebin or something? +Vinod On Apr 23, 2015, at 9:41 AM, Matt Narrell matt.narr...@gmail.com <mailto:matt.narr...@gmail.com> wrote: I’m using

Re: Deadlock in RM

2015-03-12 Thread Vinod Kumar Vavilapalli
Wangda Tan commented on the JIRA saying that this is same as YARN-3251 that is already fixed. But it's not part of any release yet. +Vinod On Mar 12, 2015, at 5:04 PM, Suma Shivaprasad sumasai.shivapra...@gmail.com wrote: We are observing a repetitive issue with hadoop 2.6.0/HDP 2.2 with RM

Re: How reduce tasks know which partition they should read?

2015-03-09 Thread Vinod Kumar Vavilapalli
The reducers (Fetcher.java) simply ask the Shuffle Service (ShuffleHandler.java) to give them output corresponding to a specific map. The partitioning detail is hidden from the reducers. Thanks, +Vinod On Mar 9, 2015, at 7:56 AM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote: Hi,
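
A toy model of that division of labor, offered as a sketch rather than the actual Fetcher / ShuffleHandler code. It assumes a hash-style partitioner on the map side; `crc32` stands in for the key hash only so the example is deterministic, and all function names are hypothetical.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Map-side partitioning: hash the key, mask to non-negative, take modulo."""
    return (zlib.crc32(key.encode("utf-8")) & 0x7FFFFFFF) % num_reducers


def run_map(records, num_reducers):
    """Map side: bucket (key, value) pairs by partition number."""
    buckets = {p: [] for p in range(num_reducers)}
    for key, value in records:
        buckets[partition(key, num_reducers)].append((key, value))
    return buckets


def fetch(map_outputs, map_id, reducer_id):
    """Shuffle side: return one map's output slice for the asking reducer."""
    return map_outputs[map_id][reducer_id]


# Two maps, two reducers. A reducer only names a map and itself;
# it never computes a partition.
map_outputs = {
    "map_0": run_map([("apple", 1), ("pear", 1)], 2),
    "map_1": run_map([("apple", 1), ("plum", 1)], 2),
}
for reducer_id in range(2):
    fetched = [fetch(map_outputs, m, reducer_id) for m in sorted(map_outputs)]
    print(reducer_id, fetched)
```

The reducer loop never calls `partition`, which is the sense in which the partitioning detail stays hidden from it.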

Re: 1 job with Input data from 2 HDFS?

2015-02-27 Thread Vinod Kumar Vavilapalli
It is entirely possible. You should treat one of them as the primary inputs through the InputFormat/Mapper and read the other as a side-input directly by creating a client. +Vinod On Feb 27, 2015, at 7:22 AM, xeonmailinglist xeonmailingl...@gmail.com wrote: Hi, I would like to have a

Re: How to set AM attempt interval?

2015-02-27 Thread Vinod Kumar Vavilapalli
That's an old JIRA. The right solution is not an AM-retry interval but launching the AM somewhere else. Why is your AM failing in the first place? If it is due to full-disk, the situation should be better with YARN-1781 - can you use the configuration

Re: adding node(s) to Hadoop cluster

2014-12-11 Thread Vinod Kumar Vavilapalli
I may be mistaken, but let me try again with an example to see if we are on the same page Principals - NameNode: nn/nn-h...@cluster.com - DataNode: dn/_h...@cluster.com Auth to local mappings - nn/nn-h...@cluster.com - hdfs - dn/.*@cluster.com - hdfs The combination of the above lets you

Re: Question about container recovery

2014-12-10 Thread Vinod Kumar Vavilapalli
Replies inline Here is my question: is there a mechanisms that when one container exit abnormally, yarn will prefer to dispatch the container on other NM? Acting on container exit is a responsibility left to ApplicationMasters. For e.g. MapReduce ApplicationMaster explicitly tells YARN to

Re: Question about container recovery

2014-12-10 Thread Vinod Kumar Vavilapalli
Is this MapReduce application? MR has a concept of blacklisting nodes where a lot of tasks fail. The configs that control it are - yarn.app.mapreduce.am.job.node-blacklisting.enable: True by default - mapreduce.job.maxtaskfailures.per.tracker: Default is 3, meaning a node is blacklisted if it

Re: adding node(s) to Hadoop cluster

2014-12-10 Thread Vinod Kumar Vavilapalli
I am aware that one can add names to dfs.hosts and run dfsadmin -refreshNodes, but with Kerberos I have the additional problem that the new hosts' principals have to be added to hadoop.security.auth_to_local (I do not have the luxury of an easy albeit secure pattern for host names). Alas,

Re: When schedulers consider x% of resources what do they mean?

2014-12-05 Thread Vinod Kumar Vavilapalli
Resources can mean memory-only (by default) or memory + CPU etc across the _entire_ cluster. So 70% of cluster resources for a queue means that 70% of the total memory set for Hadoop in the cluster is available for all applications in that queue. Heap sizes are part of the memory
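
As a worked example of the arithmetic, here is a short sketch. The node count and per-node memory are made-up numbers, and the function name is hypothetical.

```python
def queue_memory_mb(node_memory_mb, queue_percent):
    """Memory share of a queue: a percentage of the cluster-wide total."""
    total_mb = sum(node_memory_mb)
    return total_mb * queue_percent / 100


# Assumed cluster: 10 nodes, each offering 8192 MB to Hadoop.
nodes = [8192] * 10
print(queue_memory_mb(nodes, 70))  # 57344.0
```

The percentage applies to the sum across all nodes, not to each node separately, so a single application in the queue may end up with containers spread unevenly over the machines.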

Re: datanodes not connecting

2014-11-23 Thread Vinod Kumar Vavilapalli
Can you see the slave logs to find out what is happening there? For e.g., /home/hadoop/logs/hadoop-hadoop-datanode-hadoop3.log and /home/hadoop/logs/yarn-hadoop-nodemanager-hadoop-hadoop3.log. +Vinod On Sun, Nov 23, 2014 at 10:24 AM, Tim Dunphy bluethu...@gmail.com wrote: Hey all, OK thanks

Re: Does io.sort.mb count in the records or just the keys?

2014-11-09 Thread Vinod Kumar Vavilapalli
It accounts for both keys and values. +Vinod Hortonworks Inc. http://hortonworks.com/ On Sun, Nov 9, 2014 at 11:54 AM, Muhuan Huang mhhu...@cs.ucla.edu wrote: Hello everyone, I have a question about the io.sort.mb property. The document says that io.sort.mb is the total amount of buffer

Re: Containers of required size are not being allocated.

2014-11-03 Thread Vinod Kumar Vavilapalli
I bet you are not setting different Resource-Request-priorities on your requests. It is a current limitation (https://issues.apache.org/jira/browse/YARN-314) of not being able to support resources of different sizes against a single priority. +Vinod On Nov 3, 2014, at 1:23 AM, Smita

Re: Map Reduce Job is reported as complete on history server while on console it shows as only half way thru

2014-09-17 Thread Vinod Kumar Vavilapalli
Is it possible that the client JVM is somehow getting killed while the YARN application finishes as usual on the cluster in the background? +Vinod On Wed, Sep 17, 2014 at 9:29 AM, S.L simpleliving...@gmail.com wrote: Hi All, I am running a MRV1 job on Hadoop YARN 2.3.0 cluster , the

Re: YARN Logs

2014-07-15 Thread Vinod Kumar Vavilapalli
Adam is right. yarn logs command only works when log-aggregation is enabled. It's not easy but possible to make it work when aggregation is disabled. +Vinod Hortonworks Inc. http://hortonworks.com/ On Tue, Jul 15, 2014 at 10:03 AM, Brian C. Huffman bhuff...@etinternational.com wrote:

Re: Muliple map writing into same hdfs file

2014-07-10 Thread Vinod Kumar Vavilapalli
Concurrent writes to a single file in HDFS are not possible today. You may want to write a per-task file and use that entire directory as your output. +Vinod Hortonworks Inc. http://hortonworks.com/ On Wed, Jul 9, 2014 at 10:42 PM, rab ra rab...@gmail.com wrote: hello I have one use-case
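
The per-task-file pattern can be sketched as follows. This uses the local filesystem as a stand-in for HDFS, the function names are hypothetical, and the `part-m-NNNNN` naming simply mirrors the usual MapReduce convention for map output files.

```python
import os
import tempfile


def write_task_output(output_dir, task_id, lines):
    """Each task writes only its own file, so no two writers share a file."""
    path = os.path.join(output_dir, "part-m-%05d" % task_id)
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return path


def read_output_dir(output_dir):
    """A downstream consumer treats the whole directory as one data set."""
    result = []
    for name in sorted(os.listdir(output_dir)):
        with open(os.path.join(output_dir, name), encoding="utf-8") as f:
            result.extend(f.read().splitlines())
    return result


with tempfile.TemporaryDirectory() as out_dir:
    write_task_output(out_dir, 0, ["record from task 0"])
    write_task_output(out_dir, 1, ["record from task 1"])
    print(read_output_dir(out_dir))
```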

Re: Partitioning and setup errors

2014-06-28 Thread Vinod Kumar Vavilapalli
What is happening is the client is not able to pick up the right jar to push to the cluster. It looks in the class-path for the jar that contains the class ParallelGeneticAlignment. How are you packaging your code? How are you running your job - paste the command line? +Vinod On Jun 27,

Re: add to example programs

2014-06-26 Thread Vinod Kumar Vavilapalli
You cannot dynamically add jobs. You will have to implement a new example and modify ExampleDriver.java to also include the new example and recompile. +Vinod On Jun 26, 2014, at 3:23 AM, John Hancock jhancock1...@gmail.com wrote: I would like to re-use the framework for example programs in

Re: priority in the container request

2014-06-09 Thread Vinod Kumar Vavilapalli
Yes, priorities are assigned to ResourceRequests and you can ask for multiple containers at the same priority level. You may not get all the containers together as today's scheduler lacks gang functionality. +Vinod On Jun 9, 2014, at 12:08 AM, Krishna Kishore Bonagiri write2kish...@gmail.com

Re: Hadoop usage in uploading downloading big data

2014-06-06 Thread Vinod Kumar Vavilapalli
Can you give more details on the data that you are storing in the release data management? And also on how it is accessed - read, and modified? +vinod On Jun 2, 2014, at 5:33 AM, rahul.soa rahul@googlemail.com wrote: Hello All, I'm newbie to Hadoop and interested to know if hadoop

Re: question about NM heapsize

2014-05-22 Thread Vinod Kumar Vavilapalli
Not in addition to that. You should only use the memory-mb configuration. Giving 15GB to NodeManager itself will eat into the total memory available for containers. Vinod On May 22, 2014, at 8:25 PM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote: hi, In addition to that, you need to change

Re: What codes to chmod 755 to yarn.nodemanager.log-dirs?

2014-04-28 Thread Vinod Kumar Vavilapalli
. Is it possible that DefaultContainerExecutor change the permission of existing nodemanager log-dir to 755? 2014-04-25 0:54 GMT+08:00 Vinod Kumar Vavilapalli vino...@apache.org: Which version of Hadoop are you using? This part of code changed a little, so asking. Also, is this in secure or non

Re: What codes to chmod 755 to yarn.nodemanager.log-dirs?

2014-04-24 Thread Vinod Kumar Vavilapalli
Which version of Hadoop are you using? This part of code changed a little, so asking. Also, is this in secure or non-secure mode (DefaultContainerExecutor vs LinuxContainerExecutor)? Both of those classes do some more permission magic and you may be running into those. +Vinod Hortonworks

Re: map execute twice

2014-04-24 Thread Vinod Kumar Vavilapalli
This can happen when maps are marked as failed *after* they have successfully completed the map operation. One common reason this can happen is reducers failing to fetch the map-outputs due to the node that ran the mapper going down, the machine freezing up etc. +Vinod Hortonworks Inc.

Re: Yarn hangs @Scheduled

2014-04-24 Thread Vinod Kumar Vavilapalli
How much memory do you see as available on the RM web page? And what are the memory requirements for this app? And this is a MR job? +Vinod Hortonworks Inc. http://hortonworks.com/ On Thu, Apr 24, 2014 at 1:23 PM, Jay Vyas jayunit...@gmail.com wrote: Hi folks : My yarn jobs seem to be

Re: Submit a Hadoop 1.1.1 job remotely to a Hadoop 2 cluster

2014-04-16 Thread Vinod Kumar Vavilapalli
You cannot run JobTracker/TaskTracker in Hadoop 2. It's neither supported nor even possible. +Vinod On Apr 16, 2014, at 2:27 PM, Kim Chew kchew...@gmail.com wrote: I have a cluster running Hadoop 2 but it is not running YARN, i.e. mapreduce.framework.name is set to classic therefore the

Re: yarn application still running but dissapear from UI

2014-03-26 Thread Vinod Kumar Vavilapalli
Sounds like https://issues.apache.org/jira/browse/YARN-1810. +Vinod On Mar 26, 2014, at 7:44 PM, Henry Hung ythu...@winbond.com wrote: Hi Hadoop Users, I’m using hadoop-2.2.0 with YARN. Today I stumble upon a problem with YARN management UI, when I look into cluster/apps, there is one

Re: Data Locality Importance

2014-03-22 Thread Vinod Kumar Vavilapalli
Like you said, it depends both on the kind of network you have and the type of your workload. Given your point about S3, I'd guess your input files/blocks are not large enough that moving code to data trumps moving data itself to the code. When that balance tilts a lot, especially when moving

Re: Yarn MapReduce Job Issue - AM Container launch error in Hadoop 2.3.0

2014-03-22 Thread Vinod Kumar Vavilapalli
What is 614 here? The other relevant thing to check is the MapReduce specific config mapreduce.application.classpath. +Vinod On Mar 22, 2014, at 9:03 AM, Tony Mullins tonymullins...@gmail.com wrote: Hi, I have setup a 2 node cluster of Hadoop 2.3.0. Its working fine and I can

Re: Yarn MapReduce Job Issue - AM Container launch error in Hadoop 2.3.0

2014-03-22 Thread Vinod Kumar Vavilapalli
, Tony On Sat, Mar 22, 2014 at 11:11 PM, Vinod Kumar Vavilapalli vino...@apache.org wrote: What is 614 here? The other relevant thing to check is the MapReduce specific config mapreduce.application.classpath. +Vinod On Mar 22, 2014, at 9:03 AM, Tony Mullins tonymullins...@gmail.com

Re: are the job and task tracker monitor webpages gone now in hadoop v2.3.0

2014-03-06 Thread Vinod Kumar Vavilapalli
Yes. JobTracker and TaskTracker are gone from all the 2.x release lines. MapReduce is an application on top of YARN. It runs per job: it launches, starts, and finishes once it is done with its work. Once it is done, you can go look at it in the MapReduce-specific JobHistoryServer. +Vinod On

Re: Node manager or Resource Manager crash

2014-03-04 Thread Vinod Kumar Vavilapalli
I remember you asking this question before. Check if your OS' OOM killer is killing it. +Vinod On Mar 4, 2014, at 6:53 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running an application on a 2-node cluster, which tries to acquire all the containers that are

Re: Hiveserver2 + OpenLdap Authentication issue

2014-02-24 Thread Vinod Kumar Vavilapalli
This is on the wrong mailing list, hence the non-activity. +user@hive bcc:user@hadoop Thanks +Vinod On Feb 23, 2014, at 10:16 PM, orahad bigdata oracle...@gmail.com wrote: Can somebody help me please? Thanks On Sun, Feb 23, 2014 at 3:27 AM, orahad bigdata oracle...@gmail.com wrote:

Re: history server for 2 clusters

2014-02-20 Thread Vinod Kumar Vavilapalli
Interesting use-case and setup. We never had this use-case in mind so far - we so far assumed a history-server per YARN cluster. You may be running into some issues where this assumption is not valid. Why do you need two separate YARN clusters for the same underlying data on HDFS? And if that

Re: Capacity Scheduler capacity vs. maximum-capacity

2014-02-20 Thread Vinod Kumar Vavilapalli
Yes, it does take those extra resources away back to queue B. How quickly it takes them away depends on whether preemption is enabled or not. If preemption is not enabled, it 'takes away' as and when containers from queue A start finishing. +Vinod On Feb 19, 2014, at 5:35 PM, Alex Nastetsky

Re: what happens to a client attempting to get a new app when the resource manager is already down

2014-02-05 Thread Vinod Kumar Vavilapalli
Is this on trunk or a released version? I think the default behavior (when RM HA is not enabled) shouldn't have client loop forever. Let me know and we can see if this needs fixing. Thanks, +vinod On Jan 31, 2014, at 7:52 AM, REYANE OUKPEDJO r.oukpe...@yahoo.com wrote: Hi there, I am

Re: kerberos principals per node necessary?

2014-02-05 Thread Vinod Kumar Vavilapalli
To help manage this, Hadoop lets you specify principals of the format hdfs/_HOST@SOME-REALM. Here _HOST is a special string that Hadoop interprets and replaces with the local hostname. You need to create principals per host though. +Vinod On Feb 2, 2014, at 3:14 PM, Koert Kuipers
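
The substitution can be illustrated with a small sketch. This is not Hadoop's implementation; the function name is hypothetical, and falling back to `socket.getfqdn()` for the local hostname is an assumption made for the example.

```python
import socket


def resolve_principal(pattern, hostname=None):
    """Replace the _HOST placeholder with a concrete host name."""
    if hostname is None:
        # Assumption for this sketch: use the local fully qualified name.
        hostname = socket.getfqdn()
    return pattern.replace("_HOST", hostname)


# The same configured value yields a different principal on each host.
for host in ("dn1.example.com", "dn2.example.com"):
    print(resolve_principal("hdfs/_HOST@SOME-REALM", host))
```

This is why one shared configuration file can serve every node, while the KDC still needs a separate principal created for each host.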

Re: Does all reducer take input from all NodeManager/Tasktrackers of Map tasks

2014-01-27 Thread Vinod Kumar Vavilapalli
On Jan 27, 2014, at 4:17 AM, Amit Mittal amitmitt...@gmail.com wrote: Question 1: I believe the TaskTracker and then JobTracker/AppMaster will receive the updates through call to Task.statusUpdate(TaskUmbilicalProtocol obj). By which the JobTracker/AM will know the location of the map's

Re: Invalide URI in job start

2014-01-27 Thread Vinod Kumar Vavilapalli
the method createApplicationResource. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Jan 27, 2014, at 2:05 AM, Lukas Kairies lukas.xtree...@googlemail.com wrote: Hello, I try to use XtreemFS as an alternative file system for Hadoop 2.x. There is an existing

Re: Ambari upgrade 1.4.1 to 1.4.2

2014-01-24 Thread Vinod Kumar Vavilapalli
+user@ambari -user@hadoop Please post ambari related questions to the ambari user mailing list. Thanks +Vinod Hortonworks Inc. http://hortonworks.com/ On Fri, Jan 24, 2014 at 9:15 AM, Kokkula, Sada sadanandam.kokk...@bnymellon.com wrote: Ambari-Server upgrade from 1.4.1 to 1.4.2 wipes out

Re: HDFS data transfer is faster than SCP based transfer?

2014-01-24 Thread Vinod Kumar Vavilapalli
Is it a single file? Lots of files? How big are the files? Is the copy on a single node or are you running some kind of a MapReduce program? +Vinod Hortonworks Inc. http://hortonworks.com/ On Fri, Jan 24, 2014 at 7:21 AM, rab ra rab...@gmail.com wrote: Hi Can anyone please answer my query?

Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Vinod Kumar Vavilapalli
Is your data in any given file a bunch of key-value pairs? If that isn't the case, I'm wondering how writing a single large key-value into a sequence file helps. It won't. Maybe you can give an example of your input data? If indeed they are a bunch of smaller sized key-value pairs, you can write

Re: No space left on device during merge.

2014-01-24 Thread Vinod Kumar Vavilapalli
That's a lot of data to process for a single reducer. You should try increasing the number of reducers to achieve more parallelism and also try modifying your logic to avoid significant skew in the reducers. Unfortunately this means rethinking about your app, but that's the only way about it. It

Re: Memory problems with BytesWritable and huge binary files

2014-01-24 Thread Vinod Kumar Vavilapalli
Okay. Assuming you don't need a whole file (video) in memory for your processing, you can simply write an InputFormat/RecordReader implementation that streams through any given file and processes it. +Vinod On Jan 24, 2014, at 12:44 PM, Adam Retter adam.ret...@googlemail.com wrote: Is your
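
The streaming idea, independent of the Hadoop InputFormat/RecordReader API, can be sketched like this. The function name and the chunk size are assumptions; a real RecordReader would emit whatever record boundaries the file format defines.

```python
import io


def stream_records(fileobj, chunk_size=64 * 1024):
    """Yield a large binary file as (offset, chunk) records, one at a time."""
    offset = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield offset, chunk
        offset += len(chunk)


# Only one chunk is held in memory at any moment, whatever the file size.
video = io.BytesIO(b"x" * 200_000)
for offset, chunk in stream_records(video):
    print(offset, len(chunk))
```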

Re: Container's completion issue

2014-01-21 Thread Vinod Kumar Vavilapalli
It means that the first process in the container is either crashing due to some reason or explicitly killed by an external entity. You can look at the logs for the container on the web-UI. Also look at ResourceManager logs to trace what is happening with this container. Which application is this?

Re: DistributedCache is empty

2014-01-17 Thread Vinod Kumar Vavilapalli
What is the version of Hadoop that you are using? +Vinod On Jan 16, 2014, at 2:41 PM, Keith Wiley kwi...@keithwiley.com wrote: My driver is implemented around Tool and so should be wrapping GenericOptionsParser internally. Nevertheless, neither -files nor DistributedCache methods seem to

Re: How to make AM terminate if client crashes?

2014-01-13 Thread Vinod Kumar Vavilapalli
The architecture is built around detachable clients. So, no, it doesn't happen automatically. Even if we were to add that feature, it'd be fraught with edge cases - network issues causing app-termination even though client is still alive etc. Any more details on why this is desired? +Vinod

Re: A question about Hadoop 1 job user id used for group mapping, which could lead to performance degradatioin

2014-01-08 Thread Vinod Kumar Vavilapalli
It just seems like lazy code. You can see that, later, there is this:
{code}
for (Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
  childUGI.addToken(token);
}
{code}
So eventually the JobToken is getting added to the UGI which runs task-code.

Re: Ways to manage user accounts on hadoop cluster when using kerberos security

2014-01-08 Thread Vinod Kumar Vavilapalli
On Jan 7, 2014, at 2:55 PM, Manoj Samel manoj.sa...@gmail.com wrote: I am assuming that if the users are in a LDAP, can using the PAM for LDAP solve the issue. That's how I've seen this issue addressed. +Vinod -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the

Re: Why none AppMaster node seeks IPCServer on itself.

2014-01-08 Thread Vinod Kumar Vavilapalli
Checked the firewall rules? +Vinod On Jan 8, 2014, at 3:22 AM, Saeed Adel Mehraban s.ade...@gmail.com wrote: Hi all. I have an installation on Hadoop on 3 nodes, namely master, slave1 and slave2. When I try to run a job, assuming appmaster be on slave1, every map and reduce tasks which

Re: Understanding MapReduce source code : Flush operations

2014-01-06 Thread Vinod Kumar Vavilapalli
want to change the logic a bit which suits my convenience. On Tue, Jan 7, 2014 at 12:41 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Assuming your output is going to HDFS, you want to look at DFSClient. Reducer uses FileSystem to write the output. You need to start looking

Re: Unable to change the virtual memory to be more than the default 2.1 GB

2014-01-02 Thread Vinod Kumar Vavilapalli
You need to change the application configuration itself to tell YARN that each task needs more than the default. I see that this is a mapreduce app, so you have to change the per-application configuration: mapreduce.map.memory.mb and mapreduce.reduce.memory.mb in either mapred-site.xml or via
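
For context on the 2.1 GB figure in the subject, here is a sketch of the arithmetic. It assumes the virtual-memory limit is the container's physical memory multiplied by the NodeManager setting yarn.nodemanager.vmem-pmem-ratio (default 2.1), which is not named in the reply above; the function name is hypothetical.

```python
def vmem_limit_mb(container_mb, vmem_pmem_ratio=2.1):
    """Virtual-memory ceiling for a container of the given physical size."""
    return container_mb * vmem_pmem_ratio


# A 1024 MB task container gives roughly the 2.1 GB ceiling from the subject;
# raising mapreduce.map.memory.mb to 2048 raises the ceiling with it.
for container_mb in (1024, 2048):
    print(container_mb, round(vmem_limit_mb(container_mb), 1))
```

Under this assumption, asking for a larger container through the per-application memory settings lifts the virtual-memory ceiling proportionally.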

Re: What are the methods to share dynamic data among mappers/reducers?

2014-01-02 Thread Vinod Kumar Vavilapalli
There isn't anything natively supported for that in the framework, but you can do that yourselves by using a shared service (for e.g via HDFS files, ZooKeeper nodes) that mappers/reducers all have access to. More details on your usecase? In any case, once you start making mappers and reducers

Re: Map succeeds but reduce hangs

2014-01-02 Thread Vinod Kumar Vavilapalli
Check the TaskTracker configuration in mapred-site.xml: mapred.task.tracker.report.address. You may be setting it to 127.0.0.1:0 or localhost:0. Change it to 0.0.0.0:0 and restart the daemons. Thanks, +Vinod On Jan 1, 2014, at 2:14 PM, navaz navaz@gmail.com wrote: I dont know y it is

Re: Could not find the main class: org.apache.hadoop.mapreduce.v2.app.MRAppMaster

2013-12-23 Thread Vinod Kumar Vavilapalli
Seems like the hadoop common jar is missing, can you check if one of the directories listed in the CLASSPATH has the hadoop-common jar? Thanks, +Vinod On Dec 22, 2013, at 10:27 PM, Hadoop Dev hadoopeco@gmail.com wrote: Hi All, I am trying to execute first ever program (Word Count) in

Re: Yarn -- one of the daemons getting killed

2013-12-17 Thread Vinod Kumar Vavilapalli
On Fri, Dec 13, 2013 at 11:51 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Yes, that is what I suspect. That is why I asked if everything is on a single node. If you are running linux, linux OOM killer may be shooting things down. When it happens, you will see something like 'killed

Re: Pluggable distribute cache impl

2013-12-16 Thread Vinod Kumar Vavilapalli
If the files are already on a NFS mount, you don't need to spread files around distributed-cache? BTW, running jobs on NFS mounts isn't going to scale after a while. Thanks, +Vinod On Dec 15, 2013, at 1:15 PM, Jay Vyas jayunit...@gmail.com wrote: are there any ways to plug in an alternate

Re: pipes on hadoop 2.2.0 crashes

2013-12-16 Thread Vinod Kumar Vavilapalli
You should navigate to the ResourceManager UI following the link and see what is happening on the ResourceManager as well as the application-master. Check if any nodes are active first. Then look at ResourceManager and NodeManager logs. +Vinod On Dec 16, 2013, at 10:29 AM, Mauro Del Rio

Re: Yarn -- one of the daemons getting killed

2013-12-13 Thread Vinod Kumar Vavilapalli
at 1:52 AM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times

Re: pipes on hadoop 2.2.0 crashes

2013-12-13 Thread Vinod Kumar Vavilapalli
Could it just be LocalJobRunner? Can you try it on a cluster? We've tested pipes on clusters, so will be surprised if it doesn't work there. Thanks, +Vinod On Dec 13, 2013, at 7:44 AM, Mauro Del Rio mdrio1...@gmail.com wrote: Hi, I tried to run a simple test with pipes, but it crashes.

Re: how to create symbolic link in hdfs with c++ code or webhdfs interface?

2013-12-13 Thread Vinod Kumar Vavilapalli
What version of Hadoop? Thanks, +Vinod On Dec 13, 2013, at 1:57 AM, Xiaobin She xiaobin...@gmail.com wrote: I'm writting an c++ programme, and I need to deal with hdfs. What I need is to create some file in hdfs and read the status of these files. And I need to be able to create sym link

Re: issue about no class find in running MR job

2013-12-13 Thread Vinod Kumar Vavilapalli
That is not the correct usage. You should do hadoop jar your-jar-name main-class-name. Or if you are adventurous, directly invoke your class using java and setting appropriate classpath. Thanks, +Vinod On Dec 12, 2013, at 6:11 PM, ch huang justlo...@gmail.com wrote: hadoop ../test/WordCount

Re: Unsubscribe Please

2013-12-12 Thread Vinod Kumar Vavilapalli
You should send an email to user-unsubscr...@hadoop.apache.org. Thanks, +Vinod On Dec 12, 2013, at 8:36 AM, K. M. Rakibul Islam rakib1...@gmail.com wrote: Unsubscribe Please! Thanks.

Re: Yarn -- one of the daemons getting killed

2013-12-12 Thread Vinod Kumar Vavilapalli
Is all of this on a single node? Thanks, +Vinod On Dec 12, 2013, at 3:26 AM, Krishna Kishore Bonagiri write2kish...@gmail.com wrote: Hi, I am running a small application on YARN (2.2.0) in a loop of 500 times, and while doing so one of the daemons, node manager, resource manager, or

Re: Writing to remote HDFS using C# on Windows

2013-12-05 Thread Vinod Kumar Vavilapalli
You can try using WebHDFS. Thanks, +Vinod On Thu, Dec 5, 2013 at 6:04 PM, Fengyun RAO raofeng...@gmail.com wrote: Hi, All Is there a way to write files into remote HDFS on Linux using C# on Windows? We want to use HDFS as data storage. We know there is HDFS java API, but not C#. We tried
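
WebHDFS is a REST API over HTTP, so any language with an HTTP client can use it. A minimal sketch of building a request URL follows; the host name, port, and path are placeholders, and the helper function is hypothetical.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS REST URL of the form /webhdfs/v1/<path>?op=<OP>."""
    query = urlencode({"op": op, **params})
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), query)


# Placeholder NameNode address; adjust to the cluster's HTTP address.
print(webhdfs_url("namenode.example.com", 50070,
                  "/user/alice/data.txt", "CREATE", overwrite="true"))
```

File creation is a two-step exchange: a PUT to the NameNode with `op=CREATE` and no body returns a redirect to a DataNode, and the file data is then sent with a second PUT to that redirect location.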

Re: Container [pid=22885,containerID=container_1386156666044_0001_01_000013] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 332.5 GB of 8 GB virtual memo

2013-12-05 Thread Vinod Kumar Vavilapalli
Something looks really bad on your cluster. The JVM's heap size is 200MB but its virtual memory has ballooned to a monstrous 332GB. Does that ring any bell? Can you run regular java applications on this node? This doesn't seem related to YARN per se. +Vinod Hortonworks Inc.

Re: Client mapred tries to renew a token with renewer specified as nobody

2013-12-04 Thread Vinod Kumar Vavilapalli
The error clearly says that the renewer is wrong (the token's renewer is marked as 'nobody', but mapred is trying to renew the token); you may want to check this. Thanks, +Vinod On Dec 2, 2013, at 8:25 AM, Rainer Toebbicke wrote: 2013-12-02 15:57:08,541 ERROR

Re: issue about the MR JOB local dir

2013-12-04 Thread Vinod Kumar Vavilapalli
These are the directories where the NodeManager (as configured) will store its local files. Local files include scripts, jars, libraries - all files sent to nodes via DistributedCache. Thanks, +Vinod On Dec 3, 2013, at 5:26 PM, ch huang wrote: hi,maillist: i see three dirs on my

Re: issue about capacity scheduler

2013-12-04 Thread Vinod Kumar Vavilapalli
If both the jobs in the MR queue are from the same user, CapacityScheduler will only try to run them one after another. If possible, run them as different users; at that point, you will see sharing across jobs because they are from different users. Thanks, +Vinod On Dec 4, 2013, at 1:33 AM,

Re: Time taken for starting AMRMClientAsync

2013-11-17 Thread Vinod Kumar Vavilapalli
It is just creating a connection to RM and shouldn't take that long. Can you please file a ticket so that we can look at it? JVM class loading overhead is one possibility but 1 sec is a bit too much. Thanks, +Vinod On Oct 21, 2013, at 7:16 AM, Krishna Kishore Bonagiri wrote: Hi, I am

Re: Hadoop 2.2.0: Cannot run PI in under YARN

2013-11-08 Thread Vinod Kumar Vavilapalli
This is just a symptom, not the root cause. Please check the YARN web UI on port 8088 of the ResourceManager machine and browse to the application page. It should give you more details. Thanks, +Vinod On Nov 8, 2013, at 8:57 AM, Ping Luo wrote: java.io.FileNotFoundException: File does not exist --

Re: Error while running Hadoop Source Code

2013-11-06 Thread Vinod Kumar Vavilapalli
Runner jvm_201311060636_0001_m_164532908 spawned. 2013-11-06 06:40:17,216 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201311060636_0001_m_164532908 given task: attempt_201311060636_0001_m_02_0 Regards, Indrashish On Tue, 5 Nov 2013 10:09:36 -0800, Vinod Kumar

Re: only one map or reduce job per time on one node

2013-11-05 Thread Vinod Kumar Vavilapalli
Why do you want to do this? +Vinod On Nov 5, 2013, at 9:17 AM, John wrote: Is it possible to force the jobtracker executing only 2 map jobs or 1 reduce job per time?

Re: Error while running Hadoop Source Code

2013-11-05 Thread Vinod Kumar Vavilapalli
It seems like your pipes mapper is exiting before consuming all the input. Did you check the task-logs on the web UI? Thanks, +Vinod On Nov 5, 2013, at 7:25 AM, Basu,Indrashish wrote: Hi, Can anyone kindly assist on this ? Regards, Indrashish On Mon, 04 Nov 2013 10:23:23 -0500,

Re: Tasktracker Permission Issue?

2013-09-18 Thread Vinod Kumar Vavilapalli
, then all of /a, /a/b, /a/b/c etc. need to be executable by everyone - execute permission is needed on a Linux directory for someone to be able to create files or directories in any of its sub-directories. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 18, 2013, at 7:26 AM
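The requirement that every directory along the path be executable by everyone can be checked with a short sketch like the one below. It assumes a POSIX filesystem; the function name is hypothetical, and it looks only at the world-execute bit (`o+x`), not at ACLs or group membership.

```python
import stat
from pathlib import Path


def ancestors_missing_world_execute(path):
    """List directories on the way to `path` that lack the o+x bit.

    Checks `path` itself (if it is a directory) and each of its parents
    up to the filesystem root.
    """
    resolved = Path(path).resolve()
    missing = []
    for directory in [resolved, *resolved.parents]:
        if not directory.is_dir():
            continue
        mode = directory.stat().st_mode
        if not mode & stat.S_IXOTH:
            # "Others" cannot traverse this directory.
            missing.append(str(directory))
    return missing


if __name__ == "__main__":
    # Prints every directory from the current one up to / that
    # is not traversable by everyone.
    for directory in ancestors_missing_world_execute("."):
        print(directory)
```

Any directory it reports is a place where a task running as a different user would be stopped before reaching the sub-directory it needs.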

Re: Resource limits with Hadoop and JVM

2013-09-16 Thread Vinod Kumar Vavilapalli
their resource requirements, and TTs enforce those limits. HTH +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 16, 2013, at 1:35 PM, Forrest Aldrich wrote: We recently experienced a couple of situations that brought one or more Hadoop nodes down (unresponsive). One

Re: chaining (the output of) jobs/ reducers

2013-09-13 Thread Vinod Kumar Vavilapalli
Other than the short-term solutions that others have proposed, Apache Tez solves this exact problem. It can run M-M-R-R-R chains, multi-way mappers and reducers, and your own custom processors - all without persisting the intermediate outputs to HDFS. It works on top of YARN, though the first

Re: Setting user in yarn in 2.1.0

2013-09-11 Thread Vinod Kumar Vavilapalli
running YARN. In secure case, it will run as the app-submitter. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Sep 11, 2013, at 10:17 AM, Albert Shau wrote: In 2.1.0, the method to set user in the ApplicationSubmissionContext and ContainerLaunchContext has been

Re: assign tasks to specific nodes

2013-09-11 Thread Vinod Kumar Vavilapalli
issues. In Hadoop 2 YARN, the platform does expose this functionality. But MapReduce framework doesn't yet expose this functionality to the end users. What exactly is your use case? Why are some nodes of higher priority than others? Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http

Re: Job status shows 0's for counters

2013-09-03 Thread Vinod Kumar Vavilapalli
We've observed this internally too. Shinichi, tx for the patch. Will follow up on JIRA to get it committed. Thanks, +Vinod On Sep 3, 2013, at 11:35 AM, Shinichi Yamashita wrote: Hi, I reported this issue in MAPREDUCE-5376 (https://issues.apache.org/jira/browse/MAPREDUCE-5376) and attached

Re: [yarn] job is not getting assigned

2013-08-29 Thread Vinod Kumar Vavilapalli
This usually means there are no available resources as seen by the ResourceManager. Do you see Active Nodes on the RM web UI first page? If not, you'll have to check the NodeManager logs to see if they crashed for some reason. Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http
