Re: I am struggling with random job failures with no log traces

2014-05-29 Thread Harsh J
=true&taskid=attempt_201405161445_0053_r_03_0&filter=stderr > 14/05/26 10:12:55 INFO mapred.JobClient: Task Id : > attempt_201405161445_0053_m_000101_1, Status : FAILED > java.io.IOException: Task process exit with nonzero status of 1. > at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418) > > -- Harsh J

Re: A couple of Questions on InputFormat

2013-09-23 Thread Harsh J
y saving any hyphenated words > from the last line (ignoring hyphenated words that > cross a split boundary) as long as LineRecordReader guarantees that each > line in the split is sent to the same mapper in the order read. > This seems to be the case - right? > > > On Mon, Sep 23

Re: A couple of Questions on InputFormat

2013-09-23 Thread Harsh J
rd is at the end of a split) If you speak of the LineRecordReader, each map() will simply read a line, i.e. until \n. It is not language-aware to understand meaning of hyphens, etc.. You can implement a custom reader to do this however - there should be no problems so long as your logic covers the
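
Not part of the archived reply, but a minimal sketch of the mapper-side idea discussed above (the class name HyphenJoinMapper is hypothetical): since LineRecordReader hands each split's lines to the same map task in order, the mapper can hold back a trailing hyphenated fragment and prepend it to the next line it receives. A fragment that crosses a split boundary is still ignored, as noted in the thread.

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical sketch: rejoins a word hyphenated across two consecutive
    // lines of the same split before emitting the repaired line.
    public class HyphenJoinMapper extends Mapper<LongWritable, Text, LongWritable, Text> {
      private String pending = null;  // trailing "-" fragment saved from the previous line

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        String line = value.toString();
        if (pending != null) {
          line = pending + line;        // glue the saved fragment onto this line
          pending = null;
        }
        if (line.endsWith("-")) {
          int lastSpace = line.lastIndexOf(' ');
          pending = line.substring(lastSpace + 1, line.length() - 1);  // drop the hyphen
          line = line.substring(0, Math.max(lastSpace, 0));            // emit the rest now
        }
        context.write(key, new Text(line));
      }
    }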

Re: Increasing Java Heap Space in Slave Nodes

2013-09-07 Thread Harsh J
mapred.child.java.opts > -Xmx2000m > > > However, I don't want to request the administrator to change settings as it > is a long process. > > Is there a way I can ask Hadoop to use more Heap Space in the Slave nodes > without changing the conf files via some command line parameter? > > Thanks & regards > Arko -- Harsh J
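
The reply to this question is truncated above; for reference, the usual Hadoop 1.x answer is sketched below (the driver class name is hypothetical): if the driver goes through ToolRunner, per-job properties such as mapred.child.java.opts can be supplied on the command line with -D, with no change to the cluster's conf files.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // Hypothetical driver: ToolRunner/GenericOptionsParser fold -D options into getConf().
    public class HeapDemoDriver extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "heap-demo");  // getConf() already carries the -D overrides
        job.setJarByClass(HeapDemoDriver.class);
        // ... set mapper, reducer, input and output paths here ...
        return job.waitForCompletion(true) ? 0 : 1;
      }
      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new HeapDemoDriver(), args));
      }
    }

A run such as "hadoop jar myjob.jar HeapDemoDriver -Dmapred.child.java.opts=-Xmx2000m in out" would then raise the task heap for that one job, assuming the cluster does not mark the property as final.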

Re: MultipleInputs with MongoDb and file

2013-08-04 Thread Harsh J
w > can I implement the join? Does MultipleInputs can handle this scenario or > what else i need to do to implement the desired join. > > > -- > Regards > Akhtar Muhammad Din -- Harsh J

Re: distcp in Hadoop 2.0.4 over http?

2013-06-13 Thread Harsh J
FS: > $ cat etc/hadoop/core-site.xml > > fs.default.name = hdfs://zk1.host:9000 > hadoop.proxyuser.myuser.hosts = zk1.host > hadoop.proxyuser.myuser.groups = * > > $ cat etc/hadoop/httpfs-env.sh > #!/bin/bash > export HTTPFS_HTTP_PORT=3888 > export HTTPFS_HTTP_HOSTNAME=`hostname -f` > > -- > Best regards, > -- Harsh J

Re: how launch mapred in hadoop 2.0.4?

2013-06-13 Thread Harsh J
.sh start nodemanager) is the same thing? > > -- > Best regards, -- Harsh J

Re: Mapreduce queues

2013-05-27 Thread Harsh J
clusters? > > -- > Best regards, -- Harsh J

Re: Combine data from different HDFS FS

2013-04-08 Thread Harsh J
--v > Hdfs data in Cluster2 -> this job reads the data from Cluster1, 2 > > > Thanks, > -- > Best regards, -- Harsh J

Re: FSDataOutputStream hangs in out.close()

2013-03-27 Thread Harsh J
The error that I got >> is: >> >> java.io.IOException: Got error for OP_READ_BLOCK, >> self=/XXX.XXX.XXX.123:44734, >> >> remote=ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010, >> for file >> >> ip-XXX-XXX-XXX-123.eu-west-1.compute

Re: FSDataOutputStream hangs in out.close()

2013-03-27 Thread Harsh J
new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION), > splitVersion, > info); > } > > 1 - The FSDataOutputStream hangs in the out.close() instruction. Why it > hangs? What should I do to solve this? > > > -- > Best regards, -- Harsh J

Re: Hbase instalation

2013-03-22 Thread Harsh J
mmand ,it shows that they are starting once again. > When i check the logs there are no errors in that. > > Could you guys please help me in this Do you mind sharing the .log and .out files both of the HBase Master alone? -- Harsh J

Re: JAVA heap error for the tasks in mapreduce

2013-03-21 Thread Harsh J
6M); > > It still doesn't help. Should we do something else? If I enter the > HADOOP_HEAPSIZE beyond this, it doesn't run the hadoop command and fails to > instantiate a JVM. > > Any comments would be appreciated! > > Thank you! > > With Regards, > Abhishek S -- Harsh J

Re: Bug in LocalJobRunner?

2013-03-21 Thread Harsh J
etClassLoader(classLoader); > } > > I.e. we need to set classloader for job configuration so that it can load > classes from the jar. > > If the above makes sense I will file JIRA with patch, otherwise, what am I > missing? > > > Thank you, > Alex Baranau -- Harsh J

Re: unable to get simple hadoop streaming example to run

2013-03-07 Thread Harsh J
t 9:37 PM, Pj wrote: > Hey there > > were you able to find a resolution to this problem? > > -- > > > -- Harsh J

Re: how to find top N values using map-reduce ?

2013-02-02 Thread Harsh J
t; Otherway is to just sort all of them in 1 reducer and then do the cat of > top-N. > > Wondering if there is any better approach to do this ? > > Regards > Praveenesh -- Harsh J
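
The rest of the reply is truncated; a common improvement over a full sort (a sketch, with hypothetical class name) is to keep only a local top N inside each mapper and let a single reducer merge the candidates, so at most N values travel out of each map task.

    import java.io.IOException;
    import java.util.TreeMap;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Hypothetical sketch: assumes one numeric value per input line; ties collapse
    // because the TreeMap is keyed by the value itself.
    public class TopNMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
      private static final int N = 10;
      private final TreeMap<Long, Long> top = new TreeMap<Long, Long>();

      @Override
      protected void map(LongWritable key, Text value, Context context) {
        long v = Long.parseLong(value.toString().trim());
        top.put(v, v);
        if (top.size() > N) {
          top.remove(top.firstKey());   // evict the current smallest
        }
      }

      @Override
      protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Long v : top.values()) {
          context.write(NullWritable.get(), new LongWritable(v));  // at most N per mapper
        }
      }
    }

A single reducer (job.setNumReduceTasks(1)) then applies the same TreeMap trick over all the mappers' candidates to produce the global top N.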

Re: How does mapper process partial records?

2013-01-25 Thread Harsh J
Apache Hadoop CDH4 (95%) > http://www.thecloudavenue.com/ > http://stackoverflow.com/users/614157/praveen-sripati > > If you aren’t taking advantage of big data, then you don’t have big data, > you have just a pile of data. > > > On Fri, Jan 25, 2013 at 12:52 AM, Harsh J wrot

Re: How does mapper process partial records?

2013-01-24 Thread Harsh J
at the first record is > incomplete and should process starting from the second record in the block > (b2)? > > Thanks, > Praveen -- Harsh J

Re: Save configuration data in job configuration file.

2013-01-20 Thread Harsh J
an 20, 2013 at 4:41 PM, Pedro Sá da Costa wrote: > This does not save in the xml file. I think this just keep the > variable in memory. > > On 19 January 2013 18:48, Arun C Murthy wrote: > > jobConf.set(String, String)? > > > > > -- > Best regards, > -- Harsh J
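
An aside, not part of the archived reply: set() indeed only updates the in-memory Configuration object. If an actual XML file on disk is wanted, the current state can be serialized with Configuration.writeXml(), roughly as in this sketch (the file name and property are made up):

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.conf.Configuration;

    // Hypothetical sketch: dump the in-memory configuration, including values
    // added with set(), to a local XML file.
    public class DumpConf {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("my.custom.key", "my-value");     // only in memory at this point
        OutputStream out = new FileOutputStream("saved-conf.xml");
        conf.writeXml(out);                        // serializes all properties as XML
        out.close();
      }
    }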

Re: MPI and hadoop on same cluster

2013-01-16 Thread Harsh J
The patch has not been contributed yet. Upstream at open-mpi there does seem to be a branch that makes some reference to Hadoop, but I think the features are yet to be made available there too. Apparently waiting on some form of a product release first? That's all I could gather from some sleuthing

RE: Limitation of key-value pairs for a particular key.

2013-01-16 Thread Harsh J
We don't sort values (only keys) nor apply any manual limits in MR. Can you post a reproducible test case to support your suspicion? On Jan 16, 2013 4:34 PM, "Utkarsh Gupta" wrote: > Hi, > > Thanks for the response. There was some issues with my code. I have > checked that in detail.

Re: Map tasks allocation in reduce slots?

2012-12-29 Thread Harsh J
at 6:51 PM, Pedro Sá da Costa wrote: > MapReduce framework has map and reduce slots, that are used to track which > tasks are running. When map tasks are just running, the reduce slots that > the job have will be filled by map tasks? > > -- > Best regards, -- Harsh J

Re: muti-thread mapreduce

2012-12-26 Thread Harsh J
oreFileScanner.java:104) > at > org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:77) > at > org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1408) > at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner > -- Harsh J

Re: Map output files and partitions.

2012-12-13 Thread Harsh J
a map > output file? Is there an API for that? > > -- > Best regards, -- Harsh J

Re: muti-thread mapreduce

2012-12-13 Thread Harsh J
tilize the CPU >> while IO is going on >> >> >> On Wed, Dec 12, 2012 at 10:47 AM, Harsh J wrote: >>> >>> Exactly - A job is already designed to be properly parallel w.r.t. its >>> input, and this would just add additional overheads of job setup and

Re: muti-thread mapreduce

2012-12-12 Thread Harsh J
); >> } >> FileOutputFormat.setOutputPath(job, new Path(x3)); >> >> try { >>job.submit(); >> } catch (IOException e) { >>// TODO Auto-generated catch block >>e.printStackTrace(); >> } catch (InterruptedException e) { >>// TODO Auto-generated catch block >>e.printStackTrace(); >> } catch (ClassNotFoundException e) { >>// TODO Auto-generated catch block >>e.printStackTrace(); >> } >> >> } >> >> >> public static void main(String args[]){ >> LogProcessApp lpa1=new LogProcessApp(args[0],args[1],args[3]); >> LogProcessApp lpa2=new LogProcessApp(args[4],args[5],args[6]); >> LogProcessApp lpa3=new LogProcessApp(args[7],args[8],args[9]); >> lpa1.start(); >> lpa2.start(); >> lpa3.start(); >> } >> } > > -- Harsh J

Re: Running from a client machine does not work under 1.03

2012-12-07 Thread Harsh J
client system rather than opening an ssh >> channel to the cluster? >> >> >> String hdfshost = "hdfs://MyCluster:9000"; >> conf.set("fs.default.name", hdfshost); >> String jobTracker = "MyCluster:9001"; >> conf.set("mapred.job.tracker", jobTracker); >> >> On the cluster in hdfs >> >> -- >> Steven M. Lewis PhD >> 4221 105th Ave NE >> Kirkland, WA 98033 >> 206-384-1340 (cell) >> Skype lordjoe_com >> >> > > > > -- > Harsh J -- Harsh J

Re: Running from a client machine does not work under 1.03

2012-12-07 Thread Harsh J
", hdfshost); > String jobTracker = "MyCluster:9001"; > conf.set("mapred.job.tracker", jobTracker); > > On the cluster in hdfs > > -- > Steven M. Lewis PhD > 4221 105th Ave NE > Kirkland, WA 98033 > 206-384-1340 (cell) > Skype lordjoe_com > > -- Harsh J

Re: Get JobInProgress given jobId

2012-11-28 Thread Harsh J
ARN org.apache.hadoop.ipc.Server: Incorrect >> header or version mismatch from 127.0.0.1:60089 got version 4 expected >> version 3 >> >> >> >> On 28 November 2012 18:28, Pedro Sá da Costa wrote: >>> On 28 November 2012 18:12, Harsh J wrote: >>>

Re: Get JobInProgress given jobId

2012-11-28 Thread Harsh J
ssion.run(JobProgression.java:98) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) > at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79) > at org.apache.hadoop.tools.JobProgression.main(JobProgression.java:69) > > > > On 28 Novemb

Re: Get JobInProgress given jobId

2012-11-28 Thread Harsh J
; wrote: >>> >>> I'm building a Java class and given a JobID, how can I get the >>> JobInProgress? Can anyone give me an example? >>> >>> -- >>> Best regards, >> >> > > > > -- > Best regards, -- Harsh J

Re: Get JobInProgress given jobId

2012-11-28 Thread Harsh J
v 28, 2012 at 10:41 PM, Pedro Sá da Costa wrote: > I'm building a Java class and given a JobID, how can I get the > JobInProgress? Can anyone give me an example? > > -- > Best regards, -- Harsh J

Re: Job progress in bash

2012-11-28 Thread Harsh J
hat > they start, ended, the duration of the shuffle. I want as much > information as the "hadoop job history all" command can give it, but I > want as the job progress. > > > > On 28 November 2012 11:32, Harsh J wrote: >> hadoop job -status > > > > -- > Best regards, -- Harsh J

Re: Job progress in bash

2012-11-28 Thread Harsh J
m to print the progress in the terminal? > > Thanks, > > -- > Best regards, -- Harsh J

Re: Hadoopn1.03 There is insufficient memory for the Java Runtime Environment to continue.

2012-10-06 Thread Harsh J
trivial word count process. It is >> true I am generating a jar for a larger job but only running a version of >> wordcount that worked well under 0.2 >> Any bright ideas??? >> This is a new 1.03 installation and nothing is known to work >> >> Steven M. Lewis PhD >> 4221 105th Ave NE >> Kirkland, WA 98033 >> cell 206-384-1340 >> skype lordjoe_com -- Harsh J

Re: Cannot run program "autoreconf"

2012-09-25 Thread Harsh J
FAILED > /home/xeon/Projects/hadoop-1.0.3/build.xml:618: Execute failed: > java.io.IOException: Cannot run program "autoreconf" (in directory > "/home/xeon/Projects/hadoop-1.0.3/src/native"): java.io.IOException: > error=2, No such file or directory > > What this error means? > > > -- > Best regards, > -- Harsh J

Re: splits and maps

2012-09-19 Thread Harsh J
-boundary fetch happen at the bytes level :) On Wed, Sep 19, 2012 at 10:03 PM, Tim Robertson wrote: > Thanks for the explanation HJ - I always meant to look into that bit of code > to work out how it did it. > > Tim > > > > > On Wed, Sep 19, 2012 at 6:24 PM, Harsh J wrote:

Re: splits and maps

2012-09-19 Thread Harsh J
;> If I've an input file of 640MB in size, and a split size of 64Mb, this >> file will be partitioned in 10 splits, and each split will be processed by a >> map task, right? >> >> -- >> Best regards, >> > -- Harsh J

Re: configure hadoop-0.22 fairscheduler

2012-09-07 Thread Harsh J
; And the pools.xml in $HADOOP_HOME/conf 's content: > > > > >72 >16 >20 >3.0 >60 > > > > 9 >2 >10 >2.0 >60 > > > >9 >2 >10 >1.0 >60 > > > 60 > 60 > > > Can someone help me? > > > Focused on Mysql,MSSQL,Oracle,Hadoop -- Harsh J

Re: How to debug

2012-08-26 Thread Harsh J
ication" run configuration in eclipse. >> The "Conection Type" should be Standard (Socket Listen), and the port >> should be 9987. >> >> Happy debugging! >> Yaron >> >> >> >> >> On Sat, Aug 25, 2012 at 10:48 AM, Manoj Babu wrote: >>> >>> Hi All, >>> >>> how to debug mapreduce programs in pseudo mode? >>> >>> Thanks in Advance. >> >> > -- Harsh J

Re: submit a job in a remote jobtracker

2012-08-14 Thread Harsh J
:57 PM, Pedro Sá da Costa wrote: > But this solution implies that a user must access the remote machine before > submit the job. This is not what I want. I want to submit a job in my local > machine, and it will be forwarded to the remote JobTracker. > > > On 14 August 2012 14:15,

Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
oop activity classes and its > supporting libraries to all the nodes since i can't create two jar's. > Is there anyway to do it optimized? > > > Cheers! > Manoj. > > > > On Mon, Aug 13, 2012 at 5:20 PM, Harsh J wrote: >> >> Sure, you may separate

Re: Locks in M/R framework

2012-08-13 Thread Harsh J
h is to query the jobtracker for running jobs and go over > all the input files, in the job XML to know if The swap should block until > the input path is no longer in any current executed input path job. > > > > -- Harsh J

Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
main program i am doing so many > activities(Reading/writing/updating non hadoop activities) before invoking > JobClient.runJob(conf); > Is it anyway to separate the process flow by programmatic instead of going > for any workflow engine? > > Cheers! > Manoj. > > &g

Re: doubt on Hadoop job submission process

2012-08-13 Thread Harsh J
7;s jar is the one we submitted to hadoop or hadoop will build based > on the job configuration object? It is the former, as explained above. -- Harsh J

Re: processing multiple blocks by single JVM

2012-08-12 Thread Harsh J
VM, it is called setup() again? > > I discovered that my setup() method needs 15 seconds to execute. -- Harsh J
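
The archived reply is cut off; for context, a hedged sketch of the related Hadoop 1.x knob (an assumption about the poster's setup, not the original answer): the child JVM can be reused across tasks of the same job, although setup() still runs once per task, so truly one-time work is better kept in static state.

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical helper (Hadoop 1.x): allow one child JVM to run any number of
    // tasks of this job instead of a fresh JVM per task.
    public class JvmReuseConf {
      public static Configuration withJvmReuse(Configuration conf) {
        conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);  // -1 means no limit
        return conf;
      }
    }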

Re: Webuser id: no such user

2012-08-08 Thread Harsh J
> 2012-08-08 10:22:52,515 WARN > org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying > to get groups for user webuser > org.apache.hadoop.util.Shell$ExitCodeException: id: webuser: No such user > > where do I change the webuser? > > > -- > Best regards, > -- Harsh J

Re: Keeping Map-Tasks alive

2012-08-05 Thread Harsh J
enter. Only then, after the > reducers finish their work, the stand-by mappers get back to life and > perform their work. > > > On Sun, Aug 5, 2012 at 7:49 PM, Harsh J wrote: > >> Sure you can, as we provide pluggable code points via the API. Just write >> a cust

Re: Keeping Map-Tasks alive

2012-08-05 Thread Harsh J
ext round of map-tasks on the same node will save a lot of communication > cost. > > Thanks, > Yaron > -- Harsh J

Re: Newest version of Hadoop?

2012-08-03 Thread Harsh J
uce? What number version is > it? > > Andrew Botelho > > EMC Corporation > > 55 Constitution Blvd., Franklin, MA > > andrew.bote...@emc.com > > Mobile: 508-813-2026 > -- Harsh J

Re: task jvm bootstrapping via distributed cache

2012-08-03 Thread Harsh J
need the path in order to configure '-javaagent'. > > > > Is this currently possible with the distributed cache? If not, is the > > use case appealing enough to open a jira ticket? > > > > Thanks, > > > > stan > > > > > > -- > > Arun C. Murthy > > Hortonworks Inc. > > http://hortonworks.com/ > > > > > -- Harsh J

Re: Reading fields from a Text line

2012-08-03 Thread Harsh J
Thanks a lot. > > But if IdentityMapper is being used shouldn't the job.xml reflect that? But > Job.xml always shows mapper as our CustomMapper. > > Regards > Bejoy KS > > Sent from handheld, please excuse typos. > > -Original Message- > From: Ha

Re: Reading fields from a Text line

2012-08-03 Thread Harsh J
ustom mapper provided > with the configuration. > > This seems like a bug to me . Filed a jira to track this issue > https://issues.apache.org/jira/browse/MAPREDUCE-4507 > > > Regards > Bejoy KS -- Harsh J

Re: All reducers are not being utilized

2012-08-02 Thread Harsh J
> Saurabh > > From: Harsh J [mailto:ha...@cloudera.com] > Sent: Thursday, August 02, 2012 4:05 PM > To: mapreduce-user@hadoop.apache.org > Subject: Re: All reducers are not being utilized > > Saurabh, > > &g

Re: All reducers are not being utilized

2012-08-02 Thread Harsh J
ensure that its electronic > communications are free from viruses. However, given Internet > accessibility, the Company cannot accept liability for any virus introduced > by this e-mail or any attachment and you are advised to use up-to-date > virus checking software. > -- Harsh J

Re: Reading fields from a Text line

2012-08-01 Thread Harsh J
t; But it seems I am not doing things in correct way. Need some > guidance. Many thanks. > > Regards, > Mohammad Tariq -- Harsh J

Re: Deserialization issue.

2012-07-30 Thread Harsh J
Btw, do speak to Gora folks on fixing or at least documenting this flaw. I can imagine others hitting the same issue :) On Mon, Jul 30, 2012 at 9:22 PM, Harsh J wrote: > I've mostly done it with logging, but this JIRA may interest you if > you still wish to attach a remote debugg

Re: Deserialization issue.

2012-07-30 Thread Harsh J
28, 2012 at 6:20 AM, Sriram Ramachandrasekaran > wrote: >> >> aah! I always thought about setting io.serializations at the job level. I >> never thought about this. will try this site wide thing. thanks again. >> >> On 28 Jul 2012 06:16, "Harsh J" wrote:

Re: how to set huge memory for reducer in streaming

2012-07-29 Thread Harsh J
ent in first place and not cerate >> this kind of dictionary, but still can I finish this job with giving more >> memory in jar command ? >> >> >> Thanks, >> JJ >> -- Harsh J

Re: how to set huge memory for reducer in streaming

2012-07-29 Thread Harsh J
bigger value for reducers > to succeed ? > > I know we shuold not have this requirement in first place and not cerate > this kind of dictionary, but still can I finish this job with giving more > memory in jar command ? > > > Thanks, > JJ > -- Harsh J

Re: Deserialization issue.

2012-07-27 Thread Harsh J
28, 2012 at 6:04 AM, Sriram Ramachandrasekaran wrote: > okay. But this issue didn't present itself when run in standalone mode. :) > > On 28 Jul 2012 06:02, "Harsh J" wrote: >> >> I find it easier to run jobs via MRUnit (http://mrunit.apache.org, >> TD

Re: Deserialization issue.

2012-07-27 Thread Harsh J
(). > > OTOH, i've not tried the job.xml thing. I should give it a try n I shall > keep the loop posted. > > I would also like to hear about standard practices for debugging distributed > MR tasks. > > - > reply from a hh device. Pl excuse typos n lack of formatt

Re: Deserialization issue.

2012-07-27 Thread Harsh J
t work. > > I understand that, when SerializationFactory tries to deSerialize > 'something', it does not find an appropriate unmarshaller and so it fails. > But, I would like to know a way to find that 'something' and I would like to > get some idea on how (pseudo) distributed MR jobs should be generally > debugged. I tried searching, did not find anything useful. > > Any help/pointers would be greatly useful. > > Thanks! > > -- > It's just about how deep your longing is! > -- Harsh J

Re: Fail to start mapreduce tasks across nodes

2012-07-23 Thread Harsh J
;t attempt to send ANY jobs to my second node > > Any clues? > > -steve > > > On Fri, Jul 20, 2012 at 11:52 PM, Harsh J wrote: >> >> A 2-node cluster is a fully-distributed cluster and cannot use a >> file:/// FileSystem as thats not a distributed filesystem (u

Re: Fail to start mapreduce tasks across nodes

2012-07-20 Thread Harsh J
at > org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3142) > at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2440) > at java.lang.Thread.run(Thread.java:636) > > On both systems, ownership of all files directories under /tmp/hadoop-hadoop > is the user/group hadoop/hadoop. > > > Any ideas? > > Thanks > > > -- > Steve Sonnenberg > -- Harsh J

Re: Comparing input hdfs file to a distributed cache files

2012-07-20 Thread Harsh J
gt; > conf.setInputFormat(TextInputFormat.class); > conf.setOutputFormat(TextOutputFormat.class); > > FileInputFormat.setInputPaths(conf, new Path(args[0])); > FileOutputFormat.setOutputPath(conf, new Path(args[1])); > > > > DistributedCache.addCacheFile(new > Path("/user/ss/cacheFiles/File1.txt").toUri(), conf); > > JobClient.runJob(conf); > > }// end of main > > } // end of class > > > And I put my File1.txt and File2.txt in hdfs as follows: > > $HADOOP_HOME/bin/hadoop fs -mkdir input > $HADOOP_HOME/bin/hadoop fs -mkdir cacheFiles > $HADOOP_HOME/bin/hadoop fs -put /u/home/File2.txt input > $HADOOP_HOME/bin/hadoop fs -put /u/home/File1.txt cacheFiles > > My problem is that my code compiles fine, but it would just not proceed from > map 0% reduce 0% stage. > > What I am doing wrong? > > Any suggestion would be of great help. > > Best, > SS -- Harsh J
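
The archived reply is not shown; as a companion to the quoted job setup, here is a rough sketch of the mapper side (old API to match the JobConf code above; class and field names are hypothetical): the cached File1.txt is read once from the task's local disk in configure(), and each input record is then compared against it.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.HashSet;
    import java.util.Set;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Hypothetical mapper: loads the cached file once, then tags every input line.
    public class CacheCompareMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private final Set<String> cached = new HashSet<String>();

      @Override
      public void configure(JobConf job) {
        try {
          Path[] files = DistributedCache.getLocalCacheFiles(job);  // localized copies
          BufferedReader in = new BufferedReader(new FileReader(files[0].toString()));
          String line;
          while ((line = in.readLine()) != null) {
            cached.add(line.trim());
          }
          in.close();
        } catch (IOException e) {
          throw new RuntimeException("Could not read the distributed cache file", e);
        }
      }

      public void map(LongWritable key, Text value, OutputCollector<Text, Text> out,
          Reporter reporter) throws IOException {
        String line = value.toString().trim();
        out.collect(new Text(line), new Text(cached.contains(line) ? "MATCH" : "NO_MATCH"));
      }
    }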

Re: Distributing Keys across Reducers

2012-07-20 Thread Harsh J
doing, the function should > probably only depend on the key. > > Good luck. > > The information contained in this email message is considered confidential > and proprietary to the sender and is intended solely for review and use by > the named recipient. Any unauthorized review, use or distribution is strictly > prohibited. If you have received this message in error, please advise the > sender by reply email and delete the message. -- Harsh J

Re: use S3 as input to MR job

2012-07-19 Thread Harsh J
().setOutputKeyClass(ImmutableBytesWritable.class); > this.getHadoopJob().setOutputValueClass(Put.class); > } > > > > > * > > Dan Yi | Software Engineer, Analytics Engineering > Medio Systems Inc | 701 Pike St. #1500 Seattle, WA 98101 > Predictive Analytics for a Connected World > * > > -- Harsh J

Re: location of Java heap dumps

2012-07-19 Thread Harsh J
ssage in my mapred process: >> >> java.lang.OutOfMemoryError: Java heap space >> Dumping heap to java_pid10687.hprof ... >> Heap dump file created [1385031743 bytes in 30.259 secs] >> >> Where do I locate those dumps? Cause I can't find them anywhere. >> >> >> Thanks, >> Marek M. >> -- Harsh J

Re: OutputFormat Theory Question

2012-07-19 Thread Harsh J
fact that each file can only be opened for writing once, it > is very important in this use case to know if the records arrive at the > OutputFormat in-order so I know it is safe to close file A when I encounter > a record that belongs in B. > > > > Sincerely, > > Matthew Berry -- Harsh J

Re: Basic Question : How to increase Reducer Task in HBASE Map Reduce

2012-07-18 Thread Harsh J
m > . > > I had increased in configuration mapred.tasktracker.reduce.tasks.maximum = > 2 . Is there a way where we can increase in Program? > > > > Thanks and Regards, > S SYED ABDUL KATHER > -- Harsh J
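
The archived answer is truncated; for reference, the per-TaskTracker maximum mentioned above only caps concurrent reduce slots on each node. The number of reduce tasks for a particular job is set in the program itself, roughly as in this hedged driver fragment (the job name and count are made up):

    // Hypothetical driver fragment: request 4 reduce tasks for this one job.
    Job job = new Job(conf, "hbase-scan-job");
    // ... TableMapReduceUtil.initTableMapperJob(...) / initTableReducerJob(...) as before ...
    job.setNumReduceTasks(4);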

Re: Basic question on how reducer works

2012-07-14 Thread Harsh J
uling, so not streaming/java specific). -- Harsh J

Re: Basic question on how reducer works

2012-07-13 Thread Harsh J
;> >>> I have some questions related to basic functionality in Hadoop. >>> >>> 1. When a Mapper process the intermediate output data, how it knows how >>> many partitions to do(how many reducers will be) and how much data to go >>> in >>> each partition for each reducer ? >>> >>> 2. A JobTracker when assigns a task to a reducer, it will also specify >>> the >>> locations of intermediate output data where it should retrieve it right ? >>> But how a reducer will know from each remote location with intermediate >>> output what portion it has to retrieve only ? >>> >>> >>> To add to Harsh's comment. Essentially the TT *knows* where the output of >>> a given map-id/reduce-id pair is present via an output-file/index-file >>> combination. >>> >>> Arun >>> >>> -- >>> Arun C. Murthy >>> Hortonworks Inc. >>> http://hortonworks.com/ >>> >>> >>> >>> >>> >>> -- >>> Arun C. Murthy >>> Hortonworks Inc. >>> http://hortonworks.com/ >>> >>> >>> >>> >>> >> -- Harsh J
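
An illustrative aside, not from the thread: the answer to the first question above lives in the Partitioner. The framework passes the job's configured reducer count into every getPartition() call as numPartitions, so the map side needs no other source for that number. A minimal partitioner sketch:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Sketch in the spirit of the default HashPartitioner: numPartitions is simply
    // the number of reduce tasks configured for the job.
    public class ExamplePartitioner extends Partitioner<Text, Text> {
      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
      }
    }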

Re: suggest Best way to upload xml files to HDFS

2012-07-12 Thread Harsh J
ogram to read all the files from local folder and writing it to HDFS as a > single file. Is this a right way? > If there any best practices or optimized way to achieve this Kindly let me > know. > > Thanks in advance! > > Cheers! > Manoj. > -- Harsh J

Re: Jobs randomly not starting

2012-07-12 Thread Harsh J
and resubmit it and generally it works. > > A couple of times I have seen similar problems with reduce tasks that get > stuck while 'initializing'. > > Any ideas? > -- Harsh J

Re: StreamXMLRecordReader

2012-07-12 Thread Harsh J
> Is it good go for Mahout XMLInputFormat API ? > > Thanks, > Siv -- Harsh J

Re: Extra output files from mapper ?

2012-07-12 Thread Harsh J
Since my goal is to run an existing Python program, as is, > under MR, it looks like I need the os.system(copy-local-to-hdfs) technique. > > Chuck > > > > -Original Message- > From: Harsh J [mailto:ha...@cloudera.com] > Sent: Thursday, July 12, 2012 1:15 PM > To:

Re: Extra output files from mapper ?

2012-07-12 Thread Harsh J
Python open statement, when running within MR, like this? It does not > seem to work as intended. > > outfile1 = open("hdfs://localhost/tmp/out1.txt", 'w') > > Chuck > > > > -Original Message- > From: Harsh J [mailto:ha...@cloudera.com] &g

Re: How to use CombineFileInputFormat in Hadoop?

2012-07-12 Thread Harsh J
mentioned in Tom White's Hadoop Definitive Guide but he has not shown > how to do it. Instead, he moves on to Sequence Files. > > I am pretty confused on what is the meaning of processed variable in a > record reader. Any code example would be of tremendous help. > > Thanks in advance.. > > Cheers! > Manoj. > -- Harsh J
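
The archived reply is truncated; the "processed" variable asked about is the usual flag in a whole-file record reader: such a reader produces exactly one record, and the boolean remembers whether that record has already been handed out. A rough, hypothetical sketch of the pattern (new API):

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Hypothetical whole-file reader: the entire split is one record, and
    // 'processed' records whether that single record was already returned.
    public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
      private FileSplit split;
      private TaskAttemptContext context;
      private final BytesWritable value = new BytesWritable();
      private boolean processed = false;

      public void initialize(InputSplit split, TaskAttemptContext context) {
        this.split = (FileSplit) split;
        this.context = context;
      }

      public boolean nextKeyValue() throws IOException {
        if (processed) {
          return false;                              // only one record per split
        }
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        FSDataInputStream in = fs.open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      public NullWritable getCurrentKey() { return NullWritable.get(); }
      public BytesWritable getCurrentValue() { return value; }
      public float getProgress() { return processed ? 1.0f : 0.0f; }
      public void close() { }
    }

When plugged into CombineFileInputFormat, the per-file reader typically follows the same idea, though its constructor then receives a CombineFileSplit and a file index rather than a plain FileSplit.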

Re: Extra output files from mapper ?

2012-07-12 Thread Harsh J
ject: Extra output files from mapper ? > > > > I am using MapReduce streaming with Python code. It works fine, for basic > for stdin and stdout. > > > > But I have a mapper-only application that also emits some other output > files. So in addition to stdout, the program also creates files named > output1.txt and output2.txt. My code seems to be running correctly, and I > suspect the proper output files are being created somewhere, but I cannot > find them after the job finishes. > > > > I tried using the –files option to create a link to the location I want the > file, but no luck. I tried using some of the –jobconf options to change the > various working directories, but no luck. > > > > Thank you. > > > > Chuck Connell > > Nuance R&D Data Team > > Burlington, MA > > -- Harsh J

Re: descentralized write operation in HDFS

2012-07-11 Thread Harsh J
answers. On Thu, Jul 12, 2012 at 2:37 AM, Grandl Robert wrote: > Hi, > > It is possible to write to a HDFS datanode w/o relying on Namenode, i.e. to > find the location of Datanodes from somewhere else ? > > Thanks, > Robert -- Harsh J

Re: equivalent of "-file" option in the programmatic call, (access jobID before submit())

2012-07-10 Thread Harsh J
DH3u3) > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > guojun_...@freddiemac.com > Financial Engineering > Freddie Mac -- Harsh J

Re: Writting to HBASE in map phase

2012-07-10 Thread Harsh J
into the table, but I am not sure it is > a good idea. > > > > Thanks, > > Pablo -- Harsh J

Re: WholeFileInputFormat format

2012-07-10 Thread Harsh J
can be handled using WholeFileInputFormat format??I mean, if the file >>> is very big, then is it feasible to use WholeFileInputFormat as the >>> entire load will go to one mapper??Many thanks. >>> >>> Regards, >>> Mohammad Tariq >> >> >> >> -- >> Harsh J -- Harsh J

Re: WholeFileInputFormat format

2012-07-10 Thread Harsh J
apper??Many thanks. > > Regards, > Mohammad Tariq -- Harsh J

Re: Emitting Java Collection as mapper output

2012-07-10 Thread Harsh J
) > at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212) > 12/07/10 16:41:47 INFO mapred.JobClient: map 0% reduce 0% > 12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001 > 12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0 > > Need some guidance from the experts. Please let me know where I am > going wrong. Many thanks. > > Regards, > Mohammad Tariq -- Harsh J

Re: How to change name node storage directory?

2012-07-10 Thread Harsh J
gt; Also How to clean data node? > > Thanks in Advance! > > Cheers! > Manoj. > > > > On Tue, Jul 10, 2012 at 11:58 AM, Harsh J wrote: >> >> Manoj, >> >> If you change your dfs.name.dir (Which is the right property for >> 0.20.x/1.x) or dfs.n

Re: How to change name node storage directory?

2012-07-09 Thread Harsh J
hange name node storage directory?[i tried > hadoop.tmp.dir,hadoop.name.dir but it leads to other issue but reverting > back works fine] > > 2,Can we provide the path of my home directory since my /user and /var > directories where less in space? > > > > Cheers! > Manoj. > -- Harsh J

Re: Basic question on how reducer works

2012-07-09 Thread Harsh J
hould retrieve it right ? >> But how a reducer will know from each remote location with intermediate >> output what portion it has to retrieve only ? >> >> >> To add to Harsh's comment. Essentially the TT *knows* where the output of >> a given map-id/reduce-id pair is present via an output-file/index-file >> combination. >> >> Arun >> >> -- >> Arun C. Murthy >> Hortonworks Inc. >> http://hortonworks.com/ >> >> > -- Harsh J

Re: Basic question on how reducer works

2012-07-08 Thread Harsh J
t;> >> I see. I was looking into tasktracker log :). >> >> Thanks a lot, >> Robert >> >> >> From: Harsh J >> To: Grandl Robert ; mapreduce-user >> >> Sent: Sunday, July 8, 2012 9:16 PM >> >> Subject: R

Re: mapred yarn kill job/application

2012-07-08 Thread Harsh J
ob_1341398677537_0020 > Could not find job job_1341398677537_0020 > > I tried the application id, but it is invalid. > > I'm using CDH4. > > ++ > benoit -- Harsh J

Re: Basic usage of hadoop job -list

2012-07-08 Thread Harsh J
; > hadoop job -fs hdfs://nn01:8020/ -list > > 0 jobs currently running > JobId State StartTime UserName Priority > SchedulingInfo > > > > -- Harsh J

Re: Basic question on how reducer works

2012-07-08 Thread Harsh J
called and which not. Even more in ReduceTask.java. > > Do you have any ideas ? > > Thanks a lot for your answer, > Robert > > > From: Harsh J > To: mapreduce-user@hadoop.apache.org; Grandl Robert > Sent: Sunday, July 8, 2012 1:34 AM &g

Re: Basic question on how reducer works

2012-07-07 Thread Harsh J
k ID is also its partition ID, so it merely has to ask the data for its own task ID # and the TT serves, over HTTP, the right parts of the intermediate data to it. Feel free to ping back if you need some more clarification! :) -- Harsh J

Re: sequence_no generation in pig

2012-07-06 Thread Harsh J
> there any possible solution??? -- Harsh J

Re: Operations after killing a job

2012-06-30 Thread Harsh J
> > Thanks in advance > Subbu -- Harsh J

Re: Map Reduce Theory Question, getting OutOfMemoryError while reducing

2012-06-29 Thread Harsh J
upposing I do as you suggest, am I in danger of having their > list consume all the memory if a user decides to log 2x or 3x as much as > they did this time? > > ~Matt > > -Original Message- > From: Harsh J [mailto:ha...@cloudera.com] > Sent: Friday, June 29, 2012 6:52

Re: Map Reduce Theory Question, getting OutOfMemoryError while reducing

2012-06-29 Thread Harsh J
xxx..x.xx.reduce(x.java:26) >        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176) >        at > org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566) >        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408) >        at > org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216) > > P.S. Already used jmap to dump the heap and trim each object down to its bare > minimum and to also confirm there are no slow memory leaks. -- Harsh J

Re: Folder as Input and update the folder while the task is running

2012-06-27 Thread Harsh J
put: FileInputFormat.setInputPaths(job, new > Path("/folder")); > > What happens when the task is running and I write new files in the folder? > The task receive the new files or not? > > Thanks -- Harsh J

Re: Hadoop yarn tasklevel logging with log4j

2012-06-27 Thread Harsh J
e missed in the configurations to make it work > properly. I think I should eventually get these messages in the > nodemanager.out log file? > > Thanks in advance, > > Sherif -- Harsh J
