HIVE+MAPREDUCE

2014-01-21 Thread Ranjini Rathinam
Hi, Need to load the data into hive table using mapreduce, using java. Please suggest the code related to hive +mapreduce. Thanks in advance Ranjini R

Re: HIVE+MAPREDUCE

2014-01-21 Thread Jeff Zhang
You just need to run a MapReduce job to generate the data you want and then load that data into a Hive table (create the table first if it does not exist). These two steps are completely separate. On Tue, Jan 21, 2014 at 4:21 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, Need to load the data
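
A minimal sketch of the second step described in this reply, assuming the MapReduce job has already written tab-separated text to an HDFS directory. The table name `word_counts`, its two columns, and the path `/user/hive/staging/word_counts` are hypothetical; only the `CREATE TABLE IF NOT EXISTS` and `LOAD DATA INPATH` statement forms are standard HiveQL. The helper only builds the statements as strings; executing them (for example through the Hive CLI or a JDBC connection) is left out.

```java
// Hypothetical helper: builds the HiveQL needed to load MapReduce output into a table.
class HiveLoadStatements {

    // Table definition for tab-separated text output (key <TAB> value).
    // The column names and types are assumptions made for illustration.
    static String createTable(String table) {
        return "CREATE TABLE IF NOT EXISTS " + table
                + " (word STRING, cnt INT)"
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'";
    }

    // LOAD DATA INPATH moves the files under hdfsDir into the table's storage location.
    static String loadData(String hdfsDir, String table) {
        return "LOAD DATA INPATH '" + hdfsDir + "' INTO TABLE " + table;
    }

    public static void main(String[] args) {
        String table = "word_counts";                    // hypothetical table name
        String dir = "/user/hive/staging/word_counts";   // hypothetical job output directory
        System.out.println(createTable(table) + ";");
        System.out.println(loadData(dir, table) + ";");
    }
}
```

Keeping the statements separate from the job mirrors the point of the reply: the MapReduce job knows nothing about Hive, and the load is an independent step run after the job succeeds.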

Re: HIVE+MAPREDUCE

2014-01-21 Thread unmesha sreeveni
The "Programming in Hive" textbook contains what you want, in Chapter 4. Hope that helps. On Tue, Jan 21, 2014 at 1:51 PM, Ranjini Rathinam ranjinibe...@gmail.com wrote: Hi, Need to load the data into hive table using mapreduce, using java. Please suggest the code related to hive +mapreduce.

How to start historyserver on all nodes automatically?

2014-01-21 Thread Saeed Adel Mehraban
Hi all, I have enabled log aggregation and want to track task logs on HDFS. I need to start the history server via mr-jobhistory-daemon.sh start historyserver on all nodes. Is there any way to start the history server automatically when YARN starts?

Re: Hadoop 2 Namenode HA not working properly

2014-01-21 Thread Bruno Andrade
Hey, this is my hdfs-site.xml - http://pastebin.com/qpELkwH8 this is my core-site.xml: <configuration> <property> <name>fs.defaultFS</name> <value>hdfs://blabla-hadoop</value> </property> <property> <name>hadoop.tmp.dir</name> <value>/opt/hadoop/hadoop/tmp</value>

Shutdown hook for FileSystems

2014-01-21 Thread Lukas Kairies
Hey, I use Hadoop with XtreemFS (with a corresponding FileSystem implementation). The XtreemFS client uses several non-daemon threads, e.g. for communication. Therefore the shutdown hooks do not start after a mapper/reducer has finished, and the Child processes do not terminate. My question: Is
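
A small sketch of the mechanism behind this question, using only the JDK. The JVM begins its shutdown sequence, and therefore runs shutdown hooks, only once the last non-daemon thread has ended or `System.exit` is called, so a client that leaves non-daemon threads running keeps the process alive. `BackgroundClient` is a hypothetical stand-in and not the XtreemFS API; it shows the two usual remedies, marking the thread as a daemon or stopping it explicitly in `close()`.

```java
// Hypothetical client, not the XtreemFS API: it owns one background thread.
class BackgroundClient implements AutoCloseable {
    private final Thread worker;

    BackgroundClient(boolean daemon) {
        worker = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);   // stands in for a communication loop
            } catch (InterruptedException e) {
                // interrupted by close(): fall through and let the thread end
            }
        }, "hypothetical-client-worker");
        // A daemon thread never keeps the JVM alive; a non-daemon thread does.
        worker.setDaemon(daemon);
        worker.start();
    }

    boolean isWorkerDaemon() { return worker.isDaemon(); }

    boolean isWorkerAlive() { return worker.isAlive(); }

    // Explicit shutdown: stop the worker so it cannot block JVM exit.
    @Override
    public void close() {
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        BackgroundClient client = new BackgroundClient(false);
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> System.out.println("shutdown hook ran")));
        // Without this call, main would return but the JVM would keep running,
        // and the hook above would never get a chance to run.
        client.close();
    }
}
```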

Re: How to start historyserver on all nodes automatically?

2014-01-21 Thread Harsh J
Hi, You do not need to run an MR HistoryServer on all nodes. If you want start-yarn.sh to cover the history server startup you can also inject the command at its end. On Tue, Jan 21, 2014 at 2:43 PM, Saeed Adel Mehraban s.ade...@gmail.com wrote: Hi all, I have enabled log aggregation and want

Re: HIVE+MAPREDUCE

2014-01-21 Thread Chris Mawata
If you put the sentence "Need to load the data into hive table using mapreduce, using java" into your Google search box you will get tons of information. On 1/21/2014 3:21 AM, Ranjini Rathinam wrote: Need to load the data into hive table using mapreduce, using java

Set number of mappers

2014-01-21 Thread xeon
Hi, I want to set the number of map tasks in the Wordcount example. Is it possible to set this variable in MRv2? Thanks,

Re: Set number of mappers

2014-01-21 Thread Shekhar Sharma
The number of map tasks is determined by the number of input splits. You can change the number of map tasks by changing the input split size, but you can set the number of reduce tasks explicitly. On 21 Jan 2014 20:25, xeon xeonmailingl...@gmail.com wrote: Hi, I want to set the number of map tasks in the Wordcount
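
A rough sketch of the arithmetic behind this answer, for a single input file read through a FileInputFormat-based format such as the one Wordcount uses. The split size rule `max(minSize, min(maxSize, blockSize))` is the one used by `FileInputFormat.computeSplitSize` in the new MapReduce API; the minimum and maximum are, to my knowledge, configured in Hadoop 2 through `mapreduce.input.fileinputformat.split.minsize` and `mapreduce.input.fileinputformat.split.maxsize`. The class name and the file and block sizes are made up for illustration, and the estimate ignores the small slack the real code allows on the last split.

```java
// Illustrative arithmetic only; this class is not part of Hadoop.
class SplitEstimate {

    // Split size rule: max(minSize, min(maxSize, blockSize)).
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Rough split count for one file (ceiling division); one map task runs per split.
    static long estimateSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;    // assumed 1 GB input file
        long blockSize = 128 * mb;    // assumed HDFS block size

        // With no explicit limits, the split size equals the block size.
        long unlimited = splitSize(blockSize, 1, Long.MAX_VALUE);
        // Capping the maximum split size at 32 MB yields more, smaller splits.
        long capped = splitSize(blockSize, 1, 32 * mb);

        System.out.println("splits without a cap: " + estimateSplits(fileSize, unlimited));
        System.out.println("splits with a 32 MB cap: " + estimateSplits(fileSize, capped));
    }
}
```

Lowering the maximum split size raises the number of map tasks, and raising the minimum above the block size lowers it, which is the indirect control the reply refers to.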

Re: Shutdown hook for FileSystems

2014-01-21 Thread Jay Vyas
What happens when you remove the shutdown hook? Is that supposed to trigger an exception -

Re: Set number of mappers

2014-01-21 Thread sudhakara st
Hey, sorry for my previous answer; I thought the question was about reducers. We can't set the number of mappers for a job; it is determined by the number of input splits, as Shekhar said. On Tue, Jan 21, 2014 at 9:44 PM, Shekhar Sharma shekhar2...@gmail.com wrote: The number of map tasks is determined by the number of input splits. You can change

Re: Container's completion issue

2014-01-21 Thread Vinod Kumar Vavilapalli
It means that the first process in the container is either crashing due to some reason or explicitly killed by an external entity. You can look at the logs for the container on the web-UI. Also look at ResourceManager logs to trace what is happening with this container. Which application is this?

Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread Keith Wiley
I am running a job that takes no input from the mapper-input key/value interface. Each job reads the same small file from the distributed cache and processes it independently (to generate Monte Carlo sampling of the problem space). I am using MR purely to parallelize the otherwise redundant

RE: Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread java8964
Can you not use Hadoop's NLineInputFormat? If you generate a text file of 100 lines, then by default each line will trigger one mapper task. As long as you have 100 task slots available, you will get 100 mappers running concurrently. You want perfect control over mapper num? NLineInputFormat is designed for
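
A sketch of the grouping this reply describes, modelled with plain Java lists rather than the Hadoop API. NLineInputFormat hands each map task a fixed number of input lines; as far as I know the count is set through `mapreduce.input.lineinputformat.linespermap` in Hadoop 2 and defaults to 1. The class name and the `seed-` lines are hypothetical, and the real input format works with byte offsets in the file rather than in-memory lists.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only; this class is not part of Hadoop.
class LineGrouping {

    // Group consecutive lines into chunks of at most linesPerSplit lines.
    // Each chunk models the input of one map task.
    static List<List<String>> group(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be at least 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += linesPerSplit) {
            int end = Math.min(i + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(i, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 100-line control file, e.g. one random seed per line (hypothetical content).
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            lines.add("seed-" + i);
        }
        System.out.println("one line per mapper: " + group(lines, 1).size() + " map tasks");
        System.out.println("eight lines per mapper: " + group(lines, 8).size() + " map tasks");
    }
}
```

For a job that takes no real input, the control file only exists to fix the number of map tasks, so its line count is the knob that matters.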

Re: Container's completion issue

2014-01-21 Thread REYANE OUKPEDJO
It is my own custom application. But looking at the ResourceManager's logs, the container completed normally with an exit code of 0. This is really weird to me. On Tuesday, January 21, 2014 1:17 PM, Vinod Kumar Vavilapalli vino...@hortonworks.com wrote: It means that the first process

Re: Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread Keith Wiley
I'll look it up. Thanks. On Jan 21, 2014, at 11:43, java8964 wrote: Can you not use Hadoop's NLineInputFormat? If you generate a text file of 100 lines, then by default each line will trigger one mapper task. As long as you have 100 task slots available, you will get 100 mappers running

Re: Hadoop 1.2.1 or 2.2.0 on Windows - XP-SP2 Not using Cygwin

2014-01-21 Thread Arpit Agarwal
Anand, Instructions to build Hadoop 2.2 on Windows are at https://wiki.apache.org/hadoop/Hadoop2OnWindows Chuck Lam's book is great but out of date wrt Windows support. Windows XP is not a supported platform. Windows Server 2008 or later is recommended and Windows Vista is also likely to work.

Re: Building Hadoop 2.2.0 On Windows 7 64-bit

2014-01-21 Thread Arpit Agarwal
Folks, please refer to the wiki page https://wiki.apache.org/hadoop/Hadoop2OnWindows and also BUILDING.txt in the source tree. We believe we captured all the prerequisites in BUILDING.txt so let us know if anything is missing. On Fri, Jan 17, 2014 at 8:16 AM, Steve Lewis lordjoe2...@gmail.com

Re: Is perfect control over mapper num AND split distribution possible?

2014-01-21 Thread Keith Wiley
Seems to work well. Thank you very much! On Jan 21, 2014, at 12:42, Keith Wiley wrote: I'll look it up. Thanks. On Jan 21, 2014, at 11:43, java8964 wrote: Can you not use Hadoop's NLineInputFormat? If you generate a text file of 100 lines, then by default each line will trigger one mapper

Re: Shutdown hook for FileSystems

2014-01-21 Thread Oleg Zhurakousky
No, all I do is have my own shutdown hook in main which closes the FSDataOutputStream. Before I did that it would throw an ugly exception when I hit Ctrl+C, telling me that the stream is already closed, because of this shutdown hook (bad design on Hadoop's part), so removing it keeps it open
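
A minimal sketch of the pattern described in this reply, written against `java.io.Closeable` so it runs without Hadoop on the classpath. `CloseOnExit` is a hypothetical helper; in the poster's case the resource would be the FSDataOutputStream. JVM shutdown hooks run in no guaranteed order, which is consistent with the "already closed" exception mentioned here when a second hook closes the same resource first. Hadoop also has, to my knowledge, an `fs.automatic.close` setting that governs its own close-at-exit behaviour; whether changing it is appropriate depends on the application.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical helper: closes a resource when the JVM shuts down (e.g. on Ctrl+C).
class CloseOnExit {

    // Build the hook thread without registering it, so it can be exercised directly.
    static Thread hookFor(Closeable resource) {
        return new Thread(() -> {
            try {
                resource.close();
            } catch (IOException e) {
                // Nothing useful can be done this late; report and carry on.
                System.err.println("close failed during shutdown: " + e.getMessage());
            }
        }, "close-on-exit");
    }

    // Register the hook with the JVM and return it, so the caller can remove it again
    // with Runtime.getRuntime().removeShutdownHook(hook) after a normal close.
    static Thread register(Closeable resource) {
        Thread hook = hookFor(resource);
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        Closeable stream = () -> System.out.println("stream closed");
        register(stream);
        System.out.println("working; the hook closes the stream when the JVM exits");
    }
}
```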

optimizing HDFS writes with replication=1

2014-01-21 Thread John Lilley
I am writing temp files to HDFS with replication=1, so I expect the blocks to be stored on the writing node. Are there any tips, in general, for optimizing write performance to HDFS? I use 128K buffers in the write() calls. Are there any parameters that can be set on the connection or in

Re: Shutdown hook for FileSystems

2014-01-21 Thread Oleg Zhurakousky
I am not sure either; you would have to ask the Hadoop developers, but it was giving me a hard time so I found a way around it. On Tue, Jan 21, 2014 at 6:05 PM, Jay Vyas jayunit...@gmail.com wrote: I guess I'm not sure what the ShutdownHook is actually there for. That's the real question I'm asking. On

kerberos for outside threads

2014-01-21 Thread Koert Kuipers
I understand Kerberos is used on Hadoop to provide security in a multi-user environment, and I can totally see its use for a shared cluster within a company, to make sure sensitive data for one department is safe from the prying eyes of another department. But for a Hadoop cluster that sits behind a

Re: kerberos for outside threads

2014-01-21 Thread Haohui Mai
Hi Koert, I'm wondering what end-to-end goal you want to achieve. You can disable security in Hadoop, in which case the cluster does not perform additional authentication. Obviously you can go without Kerberos in this case and protect your clusters with the other measures you've mentioned.

[YARN] TestDistributedShell failed in eclipse but successful in maven

2014-01-21 Thread Jeff Zhang
Hi all, TestDistributedShell is a unit test for DistributedShell. I can run it successfully in Maven, but when I run it in Eclipse, it fails. Do I need any extra settings to make it run in Eclipse? Here's the error message: 2014-01-22 09:38:20,733 INFO [AsyncDispatcher event handler]

How to learn hadoop follow Tom White

2014-01-21 Thread Cooleaf
Hi folks, I am new to Hadoop and I am trying to learn Hadoop by following the book (Hadoop: The Definitive Guide, 2nd edition). I found that the sample code targets 0.20; should I learn and practice with it under Hadoop 1.0? I have installed Hadoop 2.2, which is another branch. Thanks,

Re: How to learn hadoop follow Tom White

2014-01-21 Thread Harsh J
The book's contents should still be very relevant, as the APIs haven't changed. On Wed, Jan 22, 2014 at 11:23 AM, Cooleaf cool...@gmail.com wrote: Hi folks, I am new to Hadoop and I am trying to learn Hadoop by following the book (Hadoop: The Definitive Guide, 2nd edition). I found the