Re: Why is Hadoop always running just 4 tasks?

2013-12-11 Thread Rahul Bhattacharjee
Not sure if I understand the question correctly. The reducers would start only after all the mappers are complete. -Rahul On Wed, Dec 11, 2013 at 7:59 AM, Dror, Ittay wrote: > I have a cluster of 4 machines with 24 cores and 7 disks each. > > On each node I copied from local a file of 500G.

Re: Folder not created using Hadoop Mapreduce code

2013-11-13 Thread Rahul Bhattacharjee
It might be getting created within the user's home directory in HDFS. Try creating something starting with a forward slash. Thanks, Rahul On Wed, Nov 13, 2013 at 10:40 PM, Amr Shahin wrote: > Do you get an exception or it just fails silently ? > > > On Thu, Nov 14, 2013 at 10:27 AM, unmesha

Re: How to write the contents from mapper into file

2013-11-13 Thread Rahul Bhattacharjee
If you have a map-only job, then the output of the mappers would be written out by Hadoop itself. Thanks, Rahul On Wed, Nov 13, 2013 at 9:50 AM, Sahil Agarwal wrote: > I’m not sure if it’s best way, but if all you’re looking for is the > contents produced from the mapper, you could have a reduce

Re: How to Set No of Mappers in program

2013-11-12 Thread Rahul Bhattacharjee
The number of mappers depends on the number of DFS blocks your file spans, assuming the standard file input format. This is unlike reducers, where you configure how many reducers you want for your job. Thanks, Rahul On Tue, Nov 12, 2013 at 9:00 PM, unmesha sreeveni wrote: > For my 5000 recorded
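The split-versus-reducer distinction in the reply above can be shown with the MR1-era job configuration: only the reducer count is set explicitly, while the mapper count follows the input splits. A hedged sketch (the value is illustrative):

```xml
<!-- per-job conf or mapred-site.xml: reducers are configured directly;
     the number of mappers cannot be set this way -->
<property>
  <name>mapred.reduce.tasks</name>
  <value>4</value>
</property>
```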

Re: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Rahul Bhattacharjee
Are sequence files as language-neutral as Avro? Yes, but I am not sure about the support in other languages' libraries for processing sequence files. Thanks, Rahul On Mon, Sep 30, 2013 at 11:10 PM, Peyman Mohajerian wrote: > It is not recommended to keep the data at rest in sequences format, > because it is Java

Re: How to best decide mapper output/reducer input for a huge string?

2013-09-22 Thread Rahul Bhattacharjee
One mapper is spawned per HBase table region. You can use the admin UI of HBase to find the number of regions per table. It might happen that all the data is sitting in a single region, so a single mapper is spawned and you are not getting enough parallel work done. If that is the case th

Re: Help writing a YARN application

2013-09-20 Thread Rahul Bhattacharjee
I tried something; see if this helps. It's incomplete though. https://github.com/brahul/singular Thanks, Rahul On Fri, Sep 20, 2013 at 11:54 PM, Pradeep Gollakota wrote: > Hi All, > > I've been trying to write a Yarn application and I'm completely lost. I'm > using Hadoop 2.0.0-cdh4.4.0 (Cloude

Re: Bzip2 vs Gzip

2013-09-17 Thread Rahul Bhattacharjee
Yes, bzip2 is splittable. As for tradeoffs, I have not done much experimentation with codecs. Thanks, Rahul On Wed, Sep 18, 2013 at 2:07 AM, Amit Sela wrote: > Hi all, > I'm using hadoop 1.0.4 and using gzip to keep the logs processed by hadoop > (logs are gzipped into block size files). > I read t

Re: MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread Rahul Bhattacharjee
Shahab, One question - You mentioned - "In the normal configuration, the issue here is that Reducers can start before all the Maps have finished so it is not possible to get the number (or make sense of it even if you are able to,)" I think reducers would start copying the data from the complet

Re: Map JVMs not being terminated after MAP jobs complettion.

2013-09-13 Thread Rahul Bhattacharjee
After the map task finishes, the JVM terminates and all its resources are reclaimable. Reducers might start pulling data from completed map tasks well before all the maps are finished. However, there is a property with which you can delay the reduce step until all the maps are completed. Rahul On Fri
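The delay property the reply alludes to is the reduce slow-start threshold; a sketch in its MR1 form (the MR2 name is mapreduce.job.reduce.slowstart.completedmaps):

```xml
<!-- 1.0 = do not launch the reduce copy phase until every map has completed -->
<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>1.0</value>
</property>
```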

Re: How to ignore empty file comming out from hive map side join

2013-09-13 Thread Rahul Bhattacharjee
See if this makes sense: you mean that in the output directory there are some empty files as well? Those might have come from maps which didn't emit any pair. Not sure if there is anything like a lazy output formatter which would not create the files until required. Rahul On Fri, Sep 13, 2013 at 5:1

Re: Question related to resource allocation in Yarn!

2013-09-05 Thread Rahul Bhattacharjee
how you are requesting for containers…so that > we can help you better. > > Thanks > > Devaraj k > > *From:* Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] > *Sent:* 06 September 2013 09:43 > *To:* user@hadoop.apache.org >

Re: Question related to resource allocation in Yarn!

2013-09-05 Thread Rahul Bhattacharjee
minimumCapability {, memory: 1024, virtual_cores: 1, }, maximumCapability {, memory: 8192, virtual_cores: 32, }, Thanks, Rahul On Thu, Sep 5, 2013 at 8:33 PM, Rahul Bhattacharjee wrote: > Hi, > > I am trying to make a small poc on top of yarn. > > Within the launched application master ,

Question related to resource allocation in Yarn!

2013-09-05 Thread Rahul Bhattacharjee
Hi, I am trying to build a small PoC on top of YARN. Within the launched application master, I am trying to request 50 containers and launch the same task on each allocated container. My config: AM registration response minimumCapability {, memory: 1024, virtual_cores: 1, }, maximumCapabil

Re: Multidata center support

2013-09-03 Thread Rahul Bhattacharjee
Under-replicated blocks are also consistent from a consumer's point of view. Care to explain the relation of weak consistency to Hadoop? Thanks, Rahul On Wed, Sep 4, 2013 at 9:56 AM, Rahul Bhattacharjee wrote: > Adam's response makes more sense to me to offline replicate generated data &g

Re: Multidata center support

2013-09-03 Thread Rahul Bhattacharjee
6:26:54 PM > *Subject: *Re: Multidata center support > > > Nothing has changed. DR best practice is still one (or more) clusters per > site and replication is handled via distributed copy or some variation of > it. A cluster spanning multiple data centers is a poor idea right now. >

Re: Is there any way to set Reducer to output to multi-places?

2013-09-02 Thread Rahul Bhattacharjee
This might help http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html Thanks, Rahul On Mon, Sep 2, 2013 at 2:38 PM, Francis.Hu wrote: > hi, All > > Is there any way to set Reducer to output to multi-places ? For example: > a reducer's

Re: Multidata center support

2013-08-29 Thread Rahul Bhattacharjee
My take on this: why does Hadoop have to know about the data center at all? I think it can be installed across multiple data centers; however, topology configuration would be required to tell which node belongs to which data center and switch, for block placement. Thanks, Rahul On Fri, Aug 30, 2013 at 12:4

Re: How Yarn execute MRv1 job?

2013-06-19 Thread Rahul Bhattacharjee
Thanks Arun and Devraj , good to know. On Wed, Jun 19, 2013 at 11:24 AM, Arun C Murthy wrote: > Not true, the CapacityScheduler has support for both CPU & Memory now. > > On Jun 18, 2013, at 10:41 PM, Rahul Bhattacharjee > wrote: > > Hi Devaraj, > > As for the c

Re: How Yarn execute MRv1 job?

2013-06-18 Thread Rahul Bhattacharjee
By "please correct", I meant: please correct me if my statement is wrong. On Wed, Jun 19, 2013 at 11:11 AM, Rahul Bhattacharjee < rahul.rec@gmail.com> wrote: > Hi Devaraj, > > As for the container request request for yarn container , currently only > memory is considered

Re: How Yarn execute MRv1 job?

2013-06-18 Thread Rahul Bhattacharjee
Hi Devaraj, As for the container request for a YARN container, currently only memory is considered as a resource, not CPU. Please correct me if I am wrong. Thanks, Rahul On Wed, Jun 19, 2013 at 11:05 AM, Devaraj k wrote: > Hi Sam, > > Please find the answers for your queries. > > > >- Yarn c

Re: Error in command: bin/hadoop fs -put conf input

2013-06-18 Thread Rahul Bhattacharjee
There are no datanodes in the cluster; check the cluster web portal. Thanks, Rahul On Sun, Jun 16, 2013 at 2:38 AM, sumit piparsania wrote: > Hi, > > I am getting the below error while executing the command. Kindly assist me > in resolving this issue. > > > $ bin/hadoop fs -put conf input > bin/hadoop: line 3

Re: hprof profiler output location

2013-06-18 Thread Rahul Bhattacharjee
In the same directory from which the job has been triggered. Thanks, Rahul On Sun, Jun 16, 2013 at 3:33 PM, YouPeng Yang wrote: > > Hi All > > I want to profile a fraction of the tasks in a job,so I configured my > job as [1]. > However I could not get the hprof profiler output on the ho

Re: Environment variable representing classpath for AM launch

2013-06-17 Thread Rahul Bhattacharjee
N_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/* > > > > > ** ** > > Thanks > > De

Environment variable representing classpath for AM launch

2013-06-17 Thread Rahul Bhattacharjee
Hi, Is there any environment variable (available on all nodes of a YARN cluster) which represents a Java classpath containing all the core jars of YARN? I was thinking of using that variable to set up the environment in which to run the application master. Thanks, Rahul

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-14 Thread Rahul Bhattacharjee
n 2013 16:39:02 +0530 > Subject: Re: Application errors with one disk on datanode getting filled > up to 100% > From: mail2may...@gmail.com > To: user@hadoop.apache.org > > > No, as of this moment we've no ideas about the reasons for that behavior. > > > On Fri, Jun 14, 2013 at

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-14 Thread Rahul Bhattacharjee
I wasn't aware of the datanode-level balancing procedure; I was thinking about the HDFS balancer. http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F Thanks, Rahul On Fri, Jun 14, 2013 at 5:50 PM, Rahul Bhattacharjee < rahul.

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-14 Thread Rahul Bhattacharjee
filled > up to 100% > From: mail2may...@gmail.com > To: user@hadoop.apache.org > > > No, as of this moment we've no ideas about the reasons for that behavior. > > > On Fri, Jun 14, 2013 at 4:04 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-14 Thread Rahul Bhattacharjee
e_disk.3F) > and also reserved 30 GB of space for non dfs usage via > dfs.datanode.du.reserved and restarted our apps. > > Things have been going fine till now. > > Keeping fingers crossed :) > > > On Wed, Jun 12, 2013 at 12:58 PM, Rahul Bhattacharjee < > rahul.rec.

Re: Assigning the same partition number to the mapper output

2013-06-14 Thread Rahul Bhattacharjee
There is some flexibility when it comes to changing the name of the output; check out MultipleOutputs. I have never used it with a map-only job, though. Thanks, Rahul On Thu, Jun 13, 2013 at 8:33 AM, Maysam Yabandeh wrote: > Hi, > > I was wondering if it is possible in hadoop to assign the same partition > nu

Re: Reducer not getting called

2013-06-13 Thread Rahul Bhattacharjee
The programming error was already mentioned: you are actually not overriding the base class's method, but rather creating a new one. Thanks, Rahul On Thu, Jun 13, 2013 at 11:12 AM, Omkar Joshi wrote: > Ok but that link is broken - can you provide a working one? > > Regards, > Omkar Joshi > > > -O

Re: Now give .gz file as input to the MAP

2013-06-12 Thread Rahul Bhattacharjee
les are splittable but compress-decompress is fast) > > Thanks > Sanjay > > From: Rahul Bhattacharjee > Reply-To: "user@hadoop.apache.org" > Date: Tuesday, June 11, 2013 9:53 PM > To: "user@hadoop.apache.org" > Subject: Re: Now give .gz file as i

Re: Application errors with one disk on datanode getting filled up to 100%

2013-06-12 Thread Rahul Bhattacharjee
I have a few points to make; these may not be very helpful for the said problem. + The "All datanodes are bad" exception doesn't really point to a disk-space-full problem. + hadoop.tmp.dir acts as the base location for other Hadoop-related properties; not sure if any particular directory is

Re: Now give .gz file as input to the MAP

2013-06-11 Thread Rahul Bhattacharjee
Nothing special is required to process .gz files using MR. However, as Sanjay mentioned, verify the codecs configured in core-site. Another thing to note is that these files are not splittable. You might want to use bzip2; those files are splittable. Thanks, Rahul On Wed, Jun 12, 2013 at 10:14
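As a sketch of the codec check mentioned above, core-site.xml typically lists the available codecs; the class names below are the stock Hadoop ones, and the exact list on a given cluster may differ:

```xml
<!-- core-site.xml: codecs MR can use to read inputs; gzip is not splittable, bzip2 is -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec</value>
</property>
```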

Re: History server - Yarn

2013-06-07 Thread Rahul Bhattacharjee
Thanks Sandy. On Fri, Jun 7, 2013 at 9:29 PM, Sandy Ryza wrote: > Hi Rahul, > > The job history server is currently specific to MapReduce. > > -Sandy > > > On Fri, Jun 7, 2013 at 8:56 AM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >

History server - Yarn

2013-06-07 Thread Rahul Bhattacharjee
Hello, I was doing some prototyping on top of YARN. I was able to launch the AM, and the AM in turn was able to spawn a few containers and do a certain job. The YARN application terminated successfully. My question is about the history server. I think the history server is an offering from yarn

Resource manager question - Yarn

2013-06-07 Thread Rahul Bhattacharjee
Hello, I have a basic question related to the RM of YARN. Why doesn't the allocate-container request to the RM always return with containers? The call could actually block and return when containers are available. Scenario: I launch the AM. The AM requests 5 containers. The response returns without

Re: Mapreduce using JSONObjects

2013-06-04 Thread Rahul Bhattacharjee
I agree with Shahab; you have to ensure that the keys are WritableComparable and the values are Writable in order to be used in MR. You can have a WritableComparable implementation wrapping the actual JSON object. Thanks, Rahul On Wed, Jun 5, 2013 at 5:09 AM, Mischa Tuffield wrote: > Hello, > > O
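A minimal sketch of that wrapping idea, assuming the JSON is carried in a canonical string form (the class name and that canonicalization assumption are illustrative, not from the thread; it needs the Hadoop jars on the classpath):

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

// Wraps a JSON document (as text) so it can serve as an MR key.
public class JsonWritable implements WritableComparable<JsonWritable> {
    private String json = "";   // assumed pre-canonicalized (e.g. keys sorted)

    public JsonWritable() {}                       // required no-arg constructor
    public JsonWritable(String json) { this.json = json; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(json);                        // serialized for the shuffle
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        json = in.readUTF();
    }

    @Override
    public int compareTo(JsonWritable o) {         // drives the sort phase
        return json.compareTo(o.json);
    }

    @Override
    public int hashCode() { return json.hashCode(); }  // used by HashPartitioner

    @Override
    public boolean equals(Object o) {
        return o instanceof JsonWritable && json.equals(((JsonWritable) o).json);
    }
}
```

Because compareTo works on the raw string, two semantically equal JSON objects with different field orders would compare as different unless canonicalized first.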

Re: yarn-site.xml and aux-services

2013-06-04 Thread Rahul Bhattacharjee
Going by what I have read, I think it's a general-purpose hook in the YARN architecture to run any service in the node managers. Hadoop uses this for the shuffle service; other YARN-based applications might use it as well. Thanks, Rahul On Wed, Jun 5, 2013 at 4:00 AM, John Lilley wrote: > I notice the yarn-sit

Re: how to locate the replicas of a file in HDFS?

2013-06-03 Thread Rahul Bhattacharjee
hadoop fsck mytext.txt -files -locations -blocks Thanks, Rahul On Tue, Jun 4, 2013 at 10:19 AM, 一凡 李 wrote: > Hi, > > Could you tell me how to locate where store each replica of a file in HDFS? > > Correctly speaking, if I create a file in HDFS(replicate factor:3),how to > find the DataNodes

Re: How to get the intermediate mapper output file name

2013-06-03 Thread Rahul Bhattacharjee
Thanks Dino , good to know this. On Mon, Jun 3, 2013 at 3:12 PM, Dino Kečo wrote: > Hi Samir, > > File naming is defined in FileOutputFormat class and there is property > mapreduce.output.basename > which you can use to tweak things with file naming. > > Please check this code > http://grepcod

Re: How to get the intermediate mapper output file name

2013-06-03 Thread Rahul Bhattacharjee
I think the format of the mapper and reducer output file names is hard-wired into the Hadoop code; however, you can prepend something to the beginning of the filename, or even a directory, using MultipleOutputFormat. Thanks, Rahul On Mon, Jun 3, 2013 at 3:04 PM, samir das mohapatra wrote: > Hi all, >

Re: size of input files

2013-06-02 Thread Rahul Bhattacharjee
Counters can help. The input to MR is a directory; the counters can report the number of bytes read from that FS directory. Rahul On Sun, Jun 2, 2013 at 11:22 PM, Siddharth Tiwari wrote: > Hi Friends, > > Is there a way to find out what was the size of the input file to each of > the jobs from t

Re: MR2 submit job help

2013-06-01 Thread Rahul Bhattacharjee
file name file.txt or file that you are trying to upload? Have you >> made sure about that? Is any other command working? Have you tried >> copyFromLocal? >> >> Regards, >> Shahab >> >> >> On Sat, Jun 1, 2013 at 4:05 AM, Rahul Bhattacharjee <

Re: MR2 submit job help

2013-06-01 Thread Rahul Bhattacharjee
You should be able to use hadoop fs -put with a file in the directory where you are running the command. On Sat, Jun 1, 2013 at 5:31 AM, Shashidhar Rao wrote: > Hi Users, > > Please help me with some documentation on how to submit job in YARN and > upload files in HDFS. Can I still use the MR1 comm
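The command in question, sketched with an illustrative file name and run from the directory containing the file:

```
hadoop fs -put file.txt .        # copies file.txt into the user's HDFS home
hadoop fs -ls .                  # verify it arrived
```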

Re: What else can be built on top of YARN.

2013-06-01 Thread Rahul Bhattacharjee
n on to YARN instead of making it work >> in the Map Reduce framework. The main purpose of me using YARN is to >> exploit the resource management capabilities of YARN. >> >> Thanks, >> Kishore >> >> >> On Wed, May 29, 2013 at 11:00 PM, Rahul Bhattach

Re: Reduce side question on MR

2013-06-01 Thread Rahul Bhattacharjee
Thanks Harsh for the response. It very much answers what I was looking for. Regards, Rahul On Wed, May 29, 2013 at 8:10 PM, Rahul Bhattacharjee < rahul.rec@gmail.com> wrote: > Hi, > > I have one question related to the reduce phase of MR jobs. > > The intermediate out

Re: MapReduce on Local FileSystem

2013-05-31 Thread Rahul Bhattacharjee
; NFS-mount. With NFS-mount it is possible to do it. > > ** ** > > Thanks & Regards, > > Nikhil**** > > ** ** > > *From:* Rahul Bhattacharjee [mailto:rahul.rec@gmail.com] > *Sent:* Friday, May 31, 2013 12:33 PM > *To:* user@hadoop.apache.org > *Subj

Re: MapReduce on Local FileSystem

2013-05-31 Thread Rahul Bhattacharjee
Just a hunch: you can have a filer directory mounted on all the DNs, and then file:/// should be usable in a distributed fashion. (Just a guess.) Thanks, Rahul On Fri, May 31, 2013 at 12:07 PM, Agarwal, Nikhil wrote: > Hi, > > Is it possible to run MapReduce on *multiple nodes* using

Re: Reading json format input

2013-05-29 Thread Rahul Bhattacharjee
Whatever you have mentioned should work, Jamal; you can debug this. Thanks, Rahul On Thu, May 30, 2013 at 5:14 AM, jamal sasha wrote: > Hi, > For some reason, this have to be in java :( > I am trying to use org.json library, something like (in mapper) > JSONObject jsn = new JSONObject(value.to

Re: What else can be built on top of YARN.

2013-05-29 Thread Rahul Bhattacharjee
being able to run it on a dynamically > selected resources whichever are available at the time of running the > application. > > Thanks, > Kishore > > > On Wed, May 29, 2013 at 8:04 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> Hi all, >

Reduce side question on MR

2013-05-29 Thread Rahul Bhattacharjee
Hi, I have one question related to the reduce phase of MR jobs. The intermediate outputs of the map tasks are pulled from the nodes which ran the map tasks to the node where the reducer is going to run, and that intermediate data is written to the reducer's local FS. My question is: if there is a job

What else can be built on top of YARN.

2013-05-29 Thread Rahul Bhattacharjee
Hi all, I was going through the motivation behind YARN. Splitting the responsibility of the JT is the major concern. Ultimately the base (YARN) was built in a generic way for building other generic distributed applications too. I am not able to think of any other parallel processing use case that woul

Re: issue launching mapreduce job with kerberos secured hadoop

2013-05-28 Thread Rahul Bhattacharjee
The error looks a little low-level, at the network level: the HTTP server for some reason couldn't bind to the port. It might have nothing to do with Kerberos. Thanks, Rahul On Tue, May 28, 2013 at 6:36 PM, Neeraj Chaplot wrote: > Hi All, > > When hadoop started with Kerberos authentication hadoop fs

Re: splittable vs seekable compressed formats

2013-05-24 Thread Rahul Bhattacharjee
Yeah , I think John meant seeking to record boundaries. Thanks, Rahul On Fri, May 24, 2013 at 12:22 PM, Harsh J wrote: > SequenceFiles should be seekable provided you know/manage their sync > points during writes I think. With LZO this may be non-trivial. > > On Thu, May 23, 2013 at 11:01 PM,

Re: splittable vs seekable compressed formats

2013-05-23 Thread Rahul Bhattacharjee
I think seekability is a property of the FS, so any file stored in HDFS is seekable. The input stream is seekable and the output stream isn't; the FileSystem API supports Seekable. Thanks, Rahul On Thu, May 23, 2013 at 11:01 PM, John Lilley wrote: > I’ve read about splittable compressed formats in Hadoop. Are any

Re: Shuffle phase replication factor

2013-05-22 Thread Rahul Bhattacharjee
There are configuration properties to control the number of copier threads, e.g. tasktracker.http.threads=40. Thanks, Rahul On Wed, May 22, 2013 at 8:16 PM, John Lilley wrote: > This brings up another nagging question I’ve had for some time. Between > HDFS and shuffle, there seems to be the p

Re: Hadoop Development on cloud in a secure and economical way.

2013-05-21 Thread Rahul Bhattacharjee
Amazon Elastic Compute Cloud (EC2); pay per use. Thanks, Rahul On Wed, May 22, 2013 at 11:41 AM, Sai Sai wrote: > > Is it possible to do Hadoop development on cloud in a secure and > economical way without worrying about our source being taken away. We > would like to have Hadoop and eclipse install

Re: Low latency data access Vs High throughput of data

2013-05-21 Thread Rahul Bhattacharjee
Wow! I know what latency is and what throughput is, but when someone asked me this question I was never able to answer it to my satisfaction. Now I can. Thanks a lot! On Wed, May 22, 2013 at 12:21 AM, Jens Scheidtmann < jens.scheidtm...@gmail.com> wrote: > Hi Chris, hi Raj, > > in relational

Re: Viewing snappy compressed files

2013-05-21 Thread Rahul Bhattacharjee
I haven't tried this with snappy , but you can try using hadoop fs -text On Tue, May 21, 2013 at 8:28 PM, Robert Rapplean < robert.rappl...@trueffect.com> wrote: > Hey, there. My Google skills have failed me, and I hope someone here can > point me in the right direction. > > ** ** > > We’r

Re: Project ideas

2013-05-21 Thread Rahul Bhattacharjee
Do you want to add a simple new feature to Hadoop, or develop an application using Hadoop? Some time back another university student wanted to add encryption to HDFS; that's just a pointer. Pick a problem which might interest your university. Talk to the IT dept of NSU and collect as much server log

Re: Keep Kerberos credentials valid after logging out

2013-05-21 Thread Rahul Bhattacharjee
I think you can have a keytab file for the user and use that for authentication; it would renew the credentials when they expire. On Tue, May 21, 2013 at 4:01 PM, zheyi rong wrote: > Hi all, > > I would like to run my hadoop job in a bash file for several times, e.g. > #!/usr/bin/env bash > for
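A sketch of the keytab approach for the bash loop in question; the principal, keytab path, jar, and driver names are placeholders, not from the thread:

```
# obtain a ticket from the keytab before the runs, so no interactive login is needed
kinit -kt /etc/security/keytabs/user.keytab user@EXAMPLE.COM
for i in 1 2 3; do hadoop jar myjob.jar MyDriver /input /output$i; done
```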

Re: HDFS write failures!

2013-05-18 Thread Rahul Bhattacharjee
.DFSOutputStream.java.html > > I believe the assumption here is that the NN should independently discover > the failed node. Also, some failures might not be worthy of being reported > because the DN is expected to recover from them. > > Ravi. > > > -----

HDFS write failures!

2013-05-17 Thread Rahul Bhattacharjee
Hi, I was going through some documents about the HDFS write path. It looks like the write pipeline is closed when an error is encountered, the faulty node is taken out of the pipeline, and the write continues. A few other intermediate steps are to move the un-acked packets from the ack queue to the data

Re: Question about writing HDFS files

2013-05-16 Thread Rahul Bhattacharjee
Hi Harsh, I think what John meant by writing to the local disk is writing first to the same datanode which initiated the write call. John can clarify further. On Fri, May 17, 2013 at 4:23 AM, Harsh J wrote: > That is not true. HDFS writes are not staged to a local disk first > before being w

Re: no _SUCCESS file in MR output directory.

2013-05-15 Thread Rahul Bhattacharjee
tion to disable that: > > > ... > > > > mapreduce.fileoutputcommitter.marksuccessfuljobs > false > > ... > > ... > > > > -- > *From:* Rahul Bh

Re: Hadoop schedulers!

2013-05-13 Thread Rahul Bhattacharjee
b-1 which requires 30 map slot to finish. But the same time, another > job-2 require only 2 map slots to finish - Here slots will be provided to > job-2 to get finished quickly while job-1 will be keep running. > > > > On Tue, May 14, 2013 at 12:02 AM, Rahul Bhattacharjee < >

Re: Hadoop schedulers!

2013-05-13 Thread Rahul Bhattacharjee
ution or that still waits for the first job to finish? Thanks, Rahul On Sat, May 11, 2013 at 8:31 PM, Rahul Bhattacharjee < rahul.rec@gmail.com> wrote: > Hi, > > I was going through the job schedulers of Hadoop and could not see any > major operational difference between the

Re: Number of records in an HDFS file

2013-05-13 Thread Rahul Bhattacharjee
out manual intervention > > > On Mon, May 13, 2013 at 11:13 AM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> How about the second approach , get the application/job id which the pig >> creates and submits to cluster and then find the job output cou

Re: Number of records in an HDFS file

2013-05-13 Thread Rahul Bhattacharjee
d to copy file from HDFS and then use wc, and > this may take time. Is there a way without copying file from HDFS to local > directory? > > Thanks > > > On Mon, May 13, 2013 at 11:04 AM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> few pointer

Re: Number of records in an HDFS file

2013-05-13 Thread Rahul Bhattacharjee
A few pointers: what kind of files are we talking about? For text you can use wc; for Avro data files you can use avro-tools. Or get the job that Pig generates and submits, and fetch the counters for that job from the JT of your Hadoop cluster. Thanks, Rahul On Mon, May 13, 2013 at 11:21 PM, Mix Nin wrote:
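The wc suggestion can be done without copying the file out of HDFS by streaming it (the path is illustrative):

```
hadoop fs -cat /data/output/part-00000 | wc -l   # line count for a text file
```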

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
just a single map task would be fired? As per what I have read, a mapper is launched for a complete file or a set of files; it doesn't operate at the block level, so there is no parallelism even if the file resides in HDFS. Thanks, Rahul On Sun, May 12, 2013 at 6:28 PM, Rahul Bhattacharjee < r

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
machine. So no multiple TTs. > > Please comment if you think I am wring somewhere. > > Warm Regards, > Tariq > cloudfront.blogspot.com > > > On Sun, May 12, 2013 at 6:15 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> Yes , it'

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
pot.com > > > On Sun, May 12, 2013 at 6:00 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> Thanks to both of you! >> >> Rahul >> >> >> On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar wrote: >> >>> you can do that u

Re: Need help about task slots

2013-05-12 Thread Rahul Bhattacharjee
Sorry for my blunder as well; my previous post was meant for Tariq, in the wrong thread. Thanks. Rahul On Sun, May 12, 2013 at 6:03 PM, Rahul Bhattacharjee < rahul.rec@gmail.com> wrote: > Oh! I though distcp works on complete files rather then mappers per > datablock. > So I guess pa

Re: Need help about task slots

2013-05-12 Thread Rahul Bhattacharjee
on your requirement >> you could choose how many mappers and reducers you want to use. With 18 MR >> slots, you could have 9 mappers and 9 reducers or 12 mappers and 9 reducers >> or whatever you think is OK with you. >> >> I don't know if it ,makes much sense, but it

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
Thanks to both of you! Rahul On Sun, May 12, 2013 at 5:36 PM, Nitin Pawar wrote: > you can do that using file:/// > > example: > > hadoop distcp hdfs://localhost:8020/somefile file:///Users/myhome/Desktop/ > > > > > On Sun, May 12, 2013 at 5:23 PM, Rah

Re: Hadoop noob question

2013-05-12 Thread Rahul Bhattacharjee
7;r welcome :) > > Warm Regards, > Tariq > cloudfront.blogspot.com > > > On Sat, May 11, 2013 at 10:46 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> Thanks Tariq! >> >> >> On Sat, May 11, 2013 at 10:34 PM, Mohammad Tariq wrot

Re: Hadoop noob question

2013-05-11 Thread Rahul Bhattacharjee
1, 2013 at 9:20 PM, Nitin Pawar >>>> wrote: >>>> >>>>> NN would still be in picture because it will be writing a lot of meta >>>>> data for each individual file. so you will need a NN capable enough which >>>>> can store the

Re: Hadoop noob question

2013-05-11 Thread Rahul Bhattacharjee
se it will be writing a lot of meta >>> data for each individual file. so you will need a NN capable enough which >>> can store the metadata for your entire dataset. Data will never go to NN >>> but lot of metadata about data will be on NN so its always good idea to >&

Re: Need help on coverting Audio files to text

2013-05-11 Thread Rahul Bhattacharjee
. Just like you have OCR or Face Recognition. > > > On Sat, May 11, 2013 at 11:35 AM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> I could not understand the question.Care to elaborate ? >> >> Audio is in binary and so what do you mean

Re: Hadoop noob question

2013-05-11 Thread Rahul Bhattacharjee
@Nitin, parallel DFS writes to HDFS are great, but I could not understand the meaning of a "capable NN". As I know, the NN is not a part of the actual data write pipeline, meaning that the data does not travel through the NN; the DFS client contacts the NN from time to time to get locations of D

Re: Need help on coverting Audio files to text

2013-05-11 Thread Rahul Bhattacharjee
I could not understand the question; care to elaborate? Audio is binary, so what do you mean by converting it to text? If you mean just the representation, then you can use base64 to convert any binary to printable characters. Hadoop can be used to parallelize the process of convertin

Re: Need help about task slots

2013-05-11 Thread Rahul Bhattacharjee
Hi, I am also new to the Hadoop world; here is my take on your question, and if something is missing then others will surely correct it. For pre-YARN, the slots are fixed and computed based on the crunching capacity of the datanode hardware; once the slots per datanode are ascertained, they

Hadoop schedulers!

2013-05-11 Thread Rahul Bhattacharjee
Hi, I was going through the job schedulers of Hadoop and could not see any major operational difference between the capacity scheduler and the fair scheduler, apart from the fact that the fair scheduler supports preemption and the capacity scheduler doesn't. Another thing is that the former creates

Re: get recent changed files in hadoop

2013-05-07 Thread Rahul Bhattacharjee
Is any such option available in other posix shells? On Wednesday, May 8, 2013, Winston Lin wrote: > Any idea to get recent changed file in hadoop? e.g. files created > yesterday? > > fs -ls will only give us all the files. > > Thanks > Winston > -- Sent from Gmail Mobile

Re: no _SUCCESS file in MR output directory.

2013-05-07 Thread Rahul Bhattacharjee
> > > mapreduce.fileoutputcommitter.marksuccessfuljobs > > false > > > > ... > > > > ... > > > > > > > > > > > > From: Rahul Bhattacharjee > > To: "user@hadoop.apach

Re: no _SUCCESS file in MR output directory.

2013-05-06 Thread Rahul Bhattacharjee
>> >> >> mapreduce.fileoutputcommitter.marksuccessfuljobs >> false >> >> ... >> >> ... >> >> >> >> >> >> From: Rahul Bhattacharjee &

Re: Uber Job!

2013-05-06 Thread Rahul Bhattacharjee
l at least create two containers , one for app master and > the other for the map , if uber mode is enabled with the yarn , yarn will > only create 1 container for both app master and the map. > > Sent from my iPhone > > On 2013-5-6, at 22:45, Rahul Bhattacharjee > 'rahul.rec@gm

Re: Uber Job!

2013-05-06 Thread Rahul Bhattacharjee
rm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Mon, May 6, 2013 at 8:15 PM, Rahul Bhattacharjee < > rahul.rec@gmail.com 'rahul.rec@gmail.com');>> wrote: > >> Hi, >> >> I was going through the definition

Uber Job!

2013-05-06 Thread Rahul Bhattacharjee
Hi, I was going through the definition of an uber job in Hadoop. A job is considered uber when it has 10 or fewer maps, one reducer, and the complete data is less than one DFS block in size. I have some doubts here: splits are created as per the DFS block size. Creating 10 mappers is possible from one

Re: no _SUCCESS file in MR output directory.

2013-05-06 Thread Rahul Bhattacharjee
file output committer's configuration (mapreduce.fileoutputcommitter.marksuccessfuljobs) to true. It generates the success file. I wanted to confirm whether Oozie does the disabling of success-file creation. Thanks, Rahul On Mon, May 6, 2013 at 12:34 PM, Rahul Bhattacharjee < rahul.rec@gma

Re: no _SUCCESS file in MR output directory.

2013-05-06 Thread Rahul Bhattacharjee
plicitly behaving in a certain way, it > may have a good reason to do so thats worth investigating before > toggling. > > On Mon, May 6, 2013 at 12:34 PM, Rahul Bhattacharjee > wrote: > > Oozie is being used for triggering the MR job. Looks like oozie disables > the > >

Re: no _SUCCESS file in MR output directory.

2013-05-06 Thread Rahul Bhattacharjee
Oozie is being used for triggering the MR job. It looks like Oozie disables the success-file creation using the configuration that you mentioned for FileOutputCommitter. I have enabled it by setting this property in the conf. Rahul On Mon, May 6, 2013 at 9:38 AM, Rahul Bhattacharjee wrote
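For reference, a minimal snippet of the form that property takes in the job configuration:

```xml
<!-- when true, FileOutputCommitter writes a _SUCCESS marker on job completion -->
<property>
  <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
  <value>true</value>
</property>
```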

Re: no _SUCCESS file in MR output directory.

2013-05-05 Thread Rahul Bhattacharjee
th something that > doesn't do success marking. > 3. Job specifically asked to not create such files, via config > mapreduce.fileoutputcommitter.marksuccessfuljobs or so, set to false. > > On Sun, May 5, 2013 at 9:54 PM, Rahul Bhattacharjee > wrote: > > Hi, > > >

Re: Hardware Selection for Hadoop

2013-05-05 Thread Rahul Bhattacharjee
Thanks Mohit and Ted! On Mon, May 6, 2013 at 9:11 AM, Rahul Bhattacharjee wrote: > OK. I do not know if I understand the spindle / core thing. I will dig > more into that. > > Thanks for the info. > > One more thing , whats the significance of multiple NIC. > > Than

Re: Hardware Selection for Hadoop

2013-05-05 Thread Rahul Bhattacharjee
very weak on disk > relative to network speed. The worst problem, however, is likely to be > small memory. This will likely require us to decrease the number of slots > by half or more making it impossible to even use the 6 disks that we have > and making the network even more outra

Re: M/R job optimization

2013-05-05 Thread Rahul Bhattacharjee
I do not think the hint of a skewed reducer is the problem here, as Han mentioned that he has to wait for 5 minutes after the job shows progress as 100% map and 100% reduce. It may have something to do with the output committer; FileOutputCommitter needs to be looked at to see what it is doing for those 5 minutes.

Re: Hardware Selection for Hadoop

2013-05-05 Thread Rahul Bhattacharjee
IMHO, 64 GB looks a bit high for a DN; 24 should be good enough. On Tue, Apr 30, 2013 at 12:19 AM, Patai Sangbutsarakum < patai.sangbutsara...@turn.com> wrote: > 2 x Quad cores Intel > 2-3 TB x 6 SATA > 64GB mem > 2 NICs teaming > > my 2 cents > > > On Apr 29, 2013, at 9:24 AM, Raj Hadoop

Re: How can I add a new hard disk in an existing HDFS cluster?

2013-05-05 Thread Rahul Bhattacharjee
I think the question here is how to add a new HDD volume to an already existing, formatted HDFS cluster. I am not sure whether just adding the directory to dfs.data.dir would help. On Fri, May 3, 2013 at 3:28 PM, Håvard Wahl Kongsgård < haavard.kongsga...@gmail.com> wrote: > go for ext3 or ext4 > > >
