=true&taskid=attempt_201405161445_0053_r_03_0&filter=stderr
> 14/05/26 10:12:55 INFO mapred.JobClient: Task Id :
> attempt_201405161445_0053_m_000101_1, Status : FAILED
> java.io.IOException: Task process exit with nonzero status of 1.
> at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>
>
--
Harsh J
y saving any hyphenated words
> from the last line (ignoring hyphenated words that
> cross a split boundary) as long as LineRecordReader guarantees that each
> line in the split is sent to the same mapper in the order read.
> This seems to be the case - right?
>
>
> On Mon, Sep 23
rd is at the end of a split)
If you speak of the LineRecordReader, each map() will simply read a
line, i.e. until \n. It is not language-aware and does not understand the
meaning of hyphens, etc.
You can implement a custom reader to do this, however; there should be
no problems so long as your logic covers the
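A hedged sketch of such a reader, wrapping the new-API LineRecordReader; the class name and the trailing-hyphen heuristic are illustrative only, and it inherits the stated limitation of ignoring hyphenated words that cross a split boundary:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;

    // Illustrative: merges a line ending in '-' with the following line(s)
    // before handing the joined record to map().
    public class DehyphenatingRecordReader extends RecordReader<LongWritable, Text> {
      private final LineRecordReader lines = new LineRecordReader();
      private final LongWritable key = new LongWritable();
      private final Text value = new Text();

      @Override
      public void initialize(InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        lines.initialize(split, context);
      }

      @Override
      public boolean nextKeyValue() throws IOException, InterruptedException {
        if (!lines.nextKeyValue()) {
          return false;
        }
        key.set(lines.getCurrentKey().get()); // offset of the first physical line
        StringBuilder merged = new StringBuilder(lines.getCurrentValue().toString());
        // While the logical record ends in a hyphen, splice the next line on.
        while (merged.length() > 0
            && merged.charAt(merged.length() - 1) == '-'
            && lines.nextKeyValue()) {
          merged.deleteCharAt(merged.length() - 1); // drop the trailing hyphen
          merged.append(lines.getCurrentValue().toString());
        }
        value.set(merged.toString());
        return true;
      }

      @Override public LongWritable getCurrentKey() { return key; }
      @Override public Text getCurrentValue() { return value; }
      @Override public float getProgress() throws IOException { return lines.getProgress(); }
      @Override public void close() throws IOException { lines.close(); }
    }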
mapred.child.java.opts
> -Xmx2000m
>
>
> However, I don't want to request the administrator to change settings as it
> is a long process.
>
> Is there a way I can ask Hadoop to use more Heap Space in the Slave nodes
> without changing the conf files via some command line parameter?
>
> Thanks & regards
> Arko
--
Harsh J
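For the archive: per-job overrides do not require conf file changes. If the driver goes through ToolRunner/GenericOptionsParser, the child heap can be raised from the command line; a hedged example where the jar, class and paths are hypothetical:

    hadoop jar myjob.jar com.example.MyDriver \
        -D mapred.child.java.opts=-Xmx2000m \
        /input /output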
w
> can I implement the join? Can MultipleInputs handle this scenario, or
> what else do I need to do to implement the desired join?
>
>
> --
> Regards
> Akhtar Muhammad Din
--
Harsh J
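MultipleInputs does cover this: each source path gets its own mapper (and input format, if needed), with one reducer performing the join on the common key. A minimal sketch against the old mapred API; the driver, mapper and reducer class names and the paths are hypothetical:

    // Old-API classes: org.apache.hadoop.mapred.* and
    // org.apache.hadoop.mapred.lib.MultipleInputs
    JobConf conf = new JobConf(JoinDriver.class); // JoinDriver is hypothetical
    conf.setJobName("reduce-side-join");

    // Each source gets its own mapper; both emit (join key, tagged value).
    MultipleInputs.addInputPath(conf, new Path("/data/customers"),
        TextInputFormat.class, CustomerMapper.class);
    MultipleInputs.addInputPath(conf, new Path("/data/orders"),
        TextInputFormat.class, OrderMapper.class);

    conf.setReducerClass(JoinReducer.class); // merges tagged values per key
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    FileOutputFormat.setOutputPath(conf, new Path("/data/joined"));

    JobClient.runJob(conf);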
FS:
> $ cat etc/hadoop/core-site.xml
>
>   <configuration>
>     <property>
>       <name>fs.default.name</name>
>       <value>hdfs://zk1.host:9000</value>
>     </property>
>     <property>
>       <name>hadoop.proxyuser.myuser.hosts</name>
>       <value>zk1.host</value>
>     </property>
>     <property>
>       <name>hadoop.proxyuser.myuser.groups</name>
>       <value>*</value>
>     </property>
>   </configuration>
>
> $ cat etc/hadoop/httpfs-env.sh
> #!/bin/bash
> export HTTPFS_HTTP_PORT=3888
> export HTTPFS_HTTP_HOSTNAME=`hostname -f`
>
>
> --
> Best regards,
>
--
Harsh J
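Assuming the proxyuser entries above are loaded and HttpFS is listening on 3888, a quick smoke test is a WebHDFS-style REST call through the gateway; a hedged example reusing the host and port from the quoted config (the user.name value and auth mechanism depend on the setup):

    curl "http://zk1.host:3888/webhdfs/v1/?op=LISTSTATUS&user.name=myuser"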
.sh start nodemanager) is the same thing?
>
> --
> Best regards,
--
Harsh J
clusters?
>
> --
> Best regards,
--
Harsh J
--v
> Hdfs data in Cluster2 -> this job reads the data from Cluster1, 2
>
>
> Thanks,
> --
> Best regards,
--
Harsh J
The error that I got
>> is:
>>
>> java.io.IOException: Got error for OP_READ_BLOCK,
>> self=/XXX.XXX.XXX.123:44734,
>>
>> remote=ip-XXX-XXX-XXX-123.eu-west-1.compute.internal/XXX.XXX.XXX.123:50010,
>> for file
>>
>> ip-XXX-XXX-XXX-123.eu-west-1.compute
new FsPermission(JobSubmissionFiles.JOB_FILE_PERMISSION),
> splitVersion,
> info);
> }
>
> 1 - The FSDataOutputStream hangs in the out.close() instruction. Why does it
> hang? What should I do to solve this?
>
>
> --
> Best regards,
--
Harsh J
mmand, it shows that they are starting once again.
> When I check the logs there are no errors there.
>
> Could you guys please help me in this
Do you mind sharing the .log and .out files both of the HBase Master alone?
--
Harsh J
6M);
>
> It still doesn't help. Should we do something else? If I set
> HADOOP_HEAPSIZE beyond this, the hadoop command fails to run and cannot
> instantiate a JVM.
>
> Any comments would be appreciated!
>
> Thank you!
>
> With Regards,
> Abhishek S
--
Harsh J
etClassLoader(classLoader);
> }
>
> I.e. we need to set the classloader on the job configuration so that it can
> load classes from the jar.
>
> If the above makes sense I will file a JIRA with a patch; otherwise, what am I
> missing?
>
>
> Thank you,
> Alex Baranau
--
Harsh J
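For readers of the archive, the call under discussion is Configuration.setClassLoader(); a minimal sketch of pointing a job Configuration at an external jar, where the jar path and the config key are hypothetical:

    import java.io.File;
    import java.net.URL;
    import java.net.URLClassLoader;
    import org.apache.hadoop.conf.Configuration;

    URL[] urls = { new File("/path/to/job.jar").toURI().toURL() }; // hypothetical jar
    ClassLoader loader = new URLClassLoader(urls,
        Thread.currentThread().getContextClassLoader());

    Configuration conf = new Configuration();
    conf.setClassLoader(loader); // getClass()/getClassByName() now consult this loader
    Class<?> mapperClass = conf.getClass("my.mapper.class.key", null); // hypothetical key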
t 9:37 PM, Pj wrote:
> Hey there
>
> were you able to find a resolution to this problem?
>
> --
>
>
>
--
Harsh J
> Another way is to just sort all of them in 1 reducer and then do the cat of
> top-N.
>
> Wondering if there is any better approach to do this ?
>
> Regards
> Praveenesh
--
Harsh J
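A common middle ground between the two: each mapper keeps only its local top-N and emits it from cleanup(), so the single reducer merges (#mappers x N) candidates instead of the full dataset. A hedged sketch assuming one numeric value per input line:

    import java.io.IOException;
    import java.util.TreeSet;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Each mapper tracks its own top N; one reducer then merges the small result.
    public class TopNMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
      private static final int N = 10; // illustrative
      private final TreeSet<Long> top = new TreeSet<Long>();

      @Override
      protected void map(LongWritable key, Text value, Context context) {
        top.add(Long.parseLong(value.toString().trim()));
        if (top.size() > N) {
          top.pollFirst(); // evict the current smallest
        }
      }

      @Override
      protected void cleanup(Context context) throws IOException, InterruptedException {
        for (Long v : top) {
          context.write(NullWritable.get(), new LongWritable(v));
        }
      }
    }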
Apache Hadoop CDH4 (95%)
> http://www.thecloudavenue.com/
> http://stackoverflow.com/users/614157/praveen-sripati
>
> If you aren’t taking advantage of big data, then you don’t have big data,
> you have just a pile of data.
>
>
> On Fri, Jan 25, 2013 at 12:52 AM, Harsh J wrot
at the first record is
> incomplete and should process starting from the second record in the block
> (b2)?
>
> Thanks,
> Praveen
--
Harsh J
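The convention that resolves this (paraphrased from the 1.x LineRecordReader, not its verbatim source): every split except the first discards the bytes up to and including its first newline, and every split reads one full line past its own end. Together, a record straddling b1/b2 is processed exactly once, by the reader for b1. A condensed sketch of the idea:

    // Paraphrase of the split-initialization convention (names approximate):
    if (start != 0) {
      // Not the first split: the previous split's reader owns our first
      // (possibly partial) line, so skip ahead past the first '\n'.
      start += in.readLine(new Text(), 0, maxBytesToConsume(start));
    }
    // The read loop then returns lines while pos <= end, letting the last
    // record of this split run past 'end' into the next block (b2).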
an 20, 2013 at 4:41 PM, Pedro Sá da Costa wrote:
> This does not save to the xml file. I think this just keeps the
> variable in memory.
>
> On 19 January 2013 18:48, Arun C Murthy wrote:
> > jobConf.set(String, String)?
>
>
>
>
> --
> Best regards,
>
--
Harsh J
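Right: set() only mutates the in-memory Configuration, and nothing is written back to the .xml files automatically. If a persisted copy is needed, it must be dumped explicitly; a minimal sketch with a hypothetical output path:

    import java.io.FileOutputStream;
    import java.io.OutputStream;
    import org.apache.hadoop.mapred.JobConf;

    JobConf jobConf = new JobConf();
    jobConf.set("my.custom.key", "my-value"); // in-memory only

    // Configuration.writeXml serializes every set property as XML.
    OutputStream out = new FileOutputStream("/tmp/job-conf-dump.xml");
    jobConf.writeXml(out);
    out.close();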
The patch has not been contributed yet. Upstream at open-mpi there does
seem to be a branch that makes some reference to Hadoop, but I think the
features are yet to be made available there too. Apparently waiting on some
form of a product release first? That's all I could gather from some
sleuthing
We don't sort values (only keys) nor apply any manual limits in MR. Can
you post a reproducible test case to support your suspicion?
On Jan 16, 2013 4:34 PM, "Utkarsh Gupta" wrote:
> Hi,
>
> Thanks for the response. There were some issues with my code. I have
> checked that in detail.
at 6:51 PM, Pedro Sá da Costa wrote:
> The MapReduce framework has map and reduce slots that are used to track which
> tasks are running. While only map tasks are running, will the reduce slots
> that the job has be filled by map tasks?
>
> --
> Best regards,
--
Harsh J
oreFileScanner.java:104)
> at
> org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77)
> at
> org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1408)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner
>
--
Harsh J
a map
> output file? Is there an API for that?
>
> --
> Best regards,
--
Harsh J
tilize the CPU
>> while IO is going on
>>
>>
>> On Wed, Dec 12, 2012 at 10:47 AM, Harsh J wrote:
>>>
>>> Exactly - A job is already designed to be properly parallel w.r.t. its
>>> input, and this would just add additional overheads of job setup and
);
>> }
>> FileOutputFormat.setOutputPath(job, new Path(x3));
>>
>> try {
>>   job.submit();
>> } catch (IOException e) {
>>   e.printStackTrace();
>> } catch (InterruptedException e) {
>>   e.printStackTrace();
>> } catch (ClassNotFoundException e) {
>>   e.printStackTrace();
>> }
>>
>> }
>>
>>
>> public static void main(String args[]){
>> LogProcessApp lpa1=new LogProcessApp(args[0],args[1],args[2]);
>> LogProcessApp lpa2=new LogProcessApp(args[4],args[5],args[6]);
>> LogProcessApp lpa3=new LogProcessApp(args[7],args[8],args[9]);
>> lpa1.start();
>> lpa2.start();
>> lpa3.start();
>> }
>> }
>
>
--
Harsh J
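As an aside on the quoted driver: separate threads are usually unnecessary, because Job.submit() in the new API returns as soon as the job is queued. A hedged sketch, with the per-job setup and checked exceptions elided:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.mapreduce.Job;

    List<Job> jobs = new ArrayList<Job>();
    for (int i = 0; i < 3; i++) {
      Job job = new Job();   // configure jar/mapper/reducer/paths here
      job.submit();          // non-blocking
      jobs.add(job);
    }
    for (Job job : jobs) {
      while (!job.isComplete()) { // poll until each job finishes
        Thread.sleep(5000);
      }
    }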
client system rather than opening an ssh
>> channel to the cluster?
>>
>>
>> String hdfshost = "hdfs://MyCluster:9000";
>> conf.set("fs.default.name", hdfshost);
>> String jobTracker = "MyCluster:9001";
>> conf.set("mapred.job.tracker", jobTracker);
>>
>> On the cluster in hdfs
>>
>> --
>> Steven M. Lewis PhD
>> 4221 105th Ave NE
>> Kirkland, WA 98033
>> 206-384-1340 (cell)
>> Skype lordjoe_com
>>
>>
>
>
>
> --
> Harsh J
--
Harsh J
", hdfshost);
> String jobTracker = "MyCluster:9001";
> conf.set("mapred.job.tracker", jobTracker);
>
> On the cluster in hdfs
>
> --
> Steven M. Lewis PhD
> 4221 105th Ave NE
> Kirkland, WA 98033
> 206-384-1340 (cell)
> Skype lordjoe_com
>
>
--
Harsh J
ARN org.apache.hadoop.ipc.Server: Incorrect
>> header or version mismatch from 127.0.0.1:60089 got version 4 expected
>> version 3
>>
>>
>>
>> On 28 November 2012 18:28, Pedro Sá da Costa wrote:
>>> On 28 November 2012 18:12, Harsh J wrote:
>>>
ssion.run(JobProgression.java:98)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> at org.apache.hadoop.tools.JobProgression.main(JobProgression.java:69)
>
>
>
> On 28 Novemb
; wrote:
>>>
>>> I'm building a Java class and given a JobID, how can I get the
>>> JobInProgress? Can anyone give me an example?
>>>
>>> --
>>> Best regards,
>>
>>
>
>
>
> --
> Best regards,
--
Harsh J
v 28, 2012 at 10:41 PM, Pedro Sá da Costa wrote:
> I'm building a Java class and given a JobID, how can I get the
> JobInProgress? Can anyone give me an example?
>
> --
> Best regards,
--
Harsh J
hat
> they start and end, and the duration of the shuffle. I want as much
> information as the "hadoop job -history all" command can give, but I
> want it as the job progresses.
>
>
>
> On 28 November 2012 11:32, Harsh J wrote:
>> hadoop job -status
>
>
>
> --
> Best regards,
--
Harsh J
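For the archive, the 1.x commands under discussion; the job ID and output path here are hypothetical:

    hadoop job -status job_201301010000_0001      # progress and counters while running
    hadoop job -history all /path/to/job/output   # per-task start/finish/shuffle times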
m to print the progress in the terminal?
>
> Thanks,
>
> --
> Best regards,
--
Harsh J
trivial word count process. It is
>> true I am generating a jar for a larger job but only running a version of
>> wordcount that worked well under 0.2
>> Any bright ideas???
>> This is a new 1.0.3 installation and nothing is known to work
>>
>> Steven M. Lewis PhD
>> 4221 105th Ave NE
>> Kirkland, WA 98033
>> cell 206-384-1340
>> skype lordjoe_com
--
Harsh J
FAILED
> /home/xeon/Projects/hadoop-1.0.3/build.xml:618: Execute failed:
> java.io.IOException: Cannot run program "autoreconf" (in directory
> "/home/xeon/Projects/hadoop-1.0.3/src/native"): java.io.IOException:
> error=2, No such file or directory
>
> What does this error mean?
>
>
> --
> Best regards,
>
--
Harsh J
-boundary fetch happen
at the bytes level :)
On Wed, Sep 19, 2012 at 10:03 PM, Tim Robertson
wrote:
> Thanks for the explanation HJ - I always meant to look into that bit of code
> to work out how it did it.
>
> Tim
>
>
>
>
> On Wed, Sep 19, 2012 at 6:24 PM, Harsh J wrote:
>> If I've an input file of 640MB in size, and a split size of 64MB, this
>> file will be partitioned in 10 splits, and each split will be processed by a
>> map task, right?
>>
>> --
>> Best regards,
>>
>
--
Harsh J
> And the pools.xml in $HADOOP_HOME/conf's content:
>
>
>
>
>72
>16
>20
>3.0
>60
>
>
>
> 9
>2
>10
>2.0
>60
>
>
>
>9
>2
>10
>1.0
>60
>
>
> 60
> 60
>
>
> Can someone help me?
>
>
> Focused on MySQL, MSSQL, Oracle, Hadoop
--
Harsh J
ication" run configuration in eclipse.
>> The "Conection Type" should be Standard (Socket Listen), and the port
>> should be 9987.
>>
>> Happy debugging!
>> Yaron
>>
>>
>>
>>
>> On Sat, Aug 25, 2012 at 10:48 AM, Manoj Babu wrote:
>>>
>>> Hi All,
>>>
>>> how to debug mapreduce programs in pseudo mode?
>>>
>>> Thanks in Advance.
>>
>>
>
--
Harsh J
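The matching job-side setting for that Eclipse listener is the child JVM options; a hedged sketch (with "Socket Listen" on the IDE side the task JVM dials out, hence server=n; keep the job to a single task while debugging):

    // Assuming 'conf' is the job's JobConf/Configuration:
    conf.set("mapred.child.java.opts",
        "-agentlib:jdwp=transport=dt_socket,server=n,suspend=y,address=localhost:9987");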
:57 PM, Pedro Sá da Costa wrote:
> But this solution implies that a user must access the remote machine before
> submitting the job. This is not what I want. I want to submit a job from my
> local machine and have it forwarded to the remote JobTracker.
>
>
> On 14 August 2012 14:15,
oop activity classes and its
> supporting libraries to all the nodes, since I can't create two jars.
> Is there any way to do this in an optimized manner?
>
>
> Cheers!
> Manoj.
>
>
>
> On Mon, Aug 13, 2012 at 5:20 PM, Harsh J wrote:
>>
>> Sure, you may separate
h is to query the jobtracker for running jobs and go over
> all the input files in the job XML, so the swap can block until
> the input path is no longer part of any currently executing job.
>
>
>
>
--
Harsh J
main program I am doing so many
> activities (reading/writing/updating non-Hadoop activities) before invoking
> JobClient.runJob(conf);
> Is there any way to separate the process flow programmatically instead of going
> for a workflow engine?
>
> Cheers!
> Manoj.
>
>
&g
's jar is the one we submitted to hadoop, or will hadoop build one based
> on the job configuration object?
It is the former, as explained above.
--
Harsh J
VM, is setup() called again?
>
> I discovered that my setup() method needs 15 seconds to execute.
--
Harsh J
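If a 15-second setup() dominates task runtime, 1.x task-JVM reuse at least removes JVM startup cost between tasks of the same job; note that setup() itself still runs once per task even in a reused JVM. Hedged example:

    // -1 = reuse the task JVM for an unlimited number of tasks of the same job.
    conf.setInt("mapred.job.reuse.jvm.num.tasks", -1);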
> 2012-08-08 10:22:52,515 WARN
> org.apache.hadoop.security.ShellBasedUnixGroupsMapping: got exception trying
> to get groups for user webuser
> org.apache.hadoop.util.Shell$ExitCodeException: id: webuser: No such user
>
> where do I change the webuser?
>
>
> --
> Best regards,
>
--
Harsh J
enter. Only then, after the
> reducers finish their work, the stand-by mappers get back to life and
> perform their work.
>
>
> On Sun, Aug 5, 2012 at 7:49 PM, Harsh J wrote:
>
>> Sure you can, as we provide pluggable code points via the API. Just write
>> a cust
ext round of map-tasks on the same node will save a lot of communication
> cost.
>
> Thanks,
> Yaron
>
--
Harsh J
uce? What number version is
> it?
>
>
> *Andrew Botelho*
>
> EMC Corporation
>
> 55 Constitution Blvd., Franklin, MA
>
> andrew.bote...@emc.com
>
> Mobile: 508-813-2026
>
>
--
Harsh J
need the path in order to configure '-javaagent'.
> >
> > Is this currently possible with the distributed cache? If not, is the
> > use case appealing enough to open a jira ticket?
> >
> > Thanks,
> >
> > stan
> >
> >
> > --
> > Arun C. Murthy
> > Hortonworks Inc.
> > http://hortonworks.com/
> >
> >
>
--
Harsh J
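One pattern that sidesteps needing the localized path at all: let the DistributedCache symlink the jar into the task working directory and reference it by relative name. A hedged sketch; the jar path and link name are hypothetical:

    import java.net.URI;
    import org.apache.hadoop.filecache.DistributedCache;

    // Ship the agent jar and expose it as "agent.jar" in each task's cwd.
    DistributedCache.addCacheFile(new URI("/libs/myagent.jar#agent.jar"), conf);
    DistributedCache.createSymlink(conf);

    // The child JVM can then use a path relative to its working directory.
    conf.set("mapred.child.java.opts", "-javaagent:agent.jar");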
Thanks a lot.
>
> But if IdentityMapper is being used shouldn't the job.xml reflect that? But
> Job.xml always shows mapper as our CustomMapper.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -Original Message-
> From: Ha
ustom mapper provided
> with the configuration.
>
> This seems like a bug to me. Filed a JIRA to track this issue:
> https://issues.apache.org/jira/browse/MAPREDUCE-4507
>
>
> Regards
> Bejoy KS
--
Harsh J
>
> Saurabh
>
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Thursday, August 02, 2012 4:05 PM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: All reducers are not being utilized
>
> Saurabh,
>
>
>
--
Harsh J
> But it seems I am not doing things in the correct way. Need some
> guidance. Many thanks.
>
> Regards,
> Mohammad Tariq
--
Harsh J
Btw, do speak to Gora folks on fixing or at least documenting this
flaw. I can imagine others hitting the same issue :)
On Mon, Jul 30, 2012 at 9:22 PM, Harsh J wrote:
> I've mostly done it with logging, but this JIRA may interest you if
> you still wish to attach a remote debugg
28, 2012 at 6:20 AM, Sriram Ramachandrasekaran
> wrote:
>>
>> aah! I always thought about setting io.serializations at the job level. I
>> never thought about this. Will try this site-wide thing. Thanks again.
>>
>> On 28 Jul 2012 06:16, "Harsh J" wrote:
ent in the first place and not create
>> this kind of dictionary, but still can I finish this job by giving more
>> memory in the jar command?
>>
>>
>> Thanks,
>> JJ
>>
--
Harsh J
bigger value for reducers
> to succeed?
>
> I know we should not have this requirement in the first place and not create
> this kind of dictionary, but still can I finish this job by giving more
> memory in the jar command?
>
>
> Thanks,
> JJ
>
--
Harsh J
28, 2012 at 6:04 AM, Sriram Ramachandrasekaran
wrote:
> okay. But this issue didn't present itself when run in standalone mode. :)
>
> On 28 Jul 2012 06:02, "Harsh J" wrote:
>>
>> I find it easier to run jobs via MRUnit (http://mrunit.apache.org,
>> TD
().
>
> OTOH, I've not tried the job.xml thing. I should give it a try and I shall
> keep the loop posted.
>
> I would also like to hear about standard practices for debugging distributed
> MR tasks.
>
> -
> reply from a hh device. Pl excuse typos n lack of formatt
t work.
>
> I understand that, when SerializationFactory tries to deserialize
> 'something', it does not find an appropriate unmarshaller and so it fails.
> But, I would like to know a way to find that 'something' and I would like to
> get some idea on how (pseudo) distributed MR jobs should be generally
> debugged. I tried searching, did not find anything useful.
>
> Any help/pointers would be greatly useful.
>
> Thanks!
>
> --
> It's just about how deep your longing is!
>
--
Harsh J
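For readers hitting the same SerializationFactory failure: the setting in question is io.serializations, which lists the serializers consulted in order. A hedged example; the second entry stands in for whatever custom implementation is registered:

    conf.setStrings("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization",
        "com.example.MySerialization"); // hypothetical custom serializer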
;t attempt to send ANY jobs to my second node
>
> Any clues?
>
> -steve
>
>
> On Fri, Jul 20, 2012 at 11:52 PM, Harsh J wrote:
>>
>> A 2-node cluster is a fully-distributed cluster and cannot use a
>> file:/// FileSystem as thats not a distributed filesystem (u
at
> org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:3142)
> at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2440)
> at java.lang.Thread.run(Thread.java:636)
>
> On both systems, ownership of all files and directories under /tmp/hadoop-hadoop
> is the user/group hadoop/hadoop.
>
>
> Any ideas?
>
> Thanks
>
>
> --
> Steve Sonnenberg
>
--
Harsh J
>
> conf.setInputFormat(TextInputFormat.class);
> conf.setOutputFormat(TextOutputFormat.class);
>
> FileInputFormat.setInputPaths(conf, new Path(args[0]));
> FileOutputFormat.setOutputPath(conf, new Path(args[1]));
>
>
>
> DistributedCache.addCacheFile(new
> Path("/user/ss/cacheFiles/File1.txt").toUri(), conf);
>
> JobClient.runJob(conf);
>
> }// end of main
>
> } // end of class
>
>
> And I put my File1.txt and File2.txt in hdfs as follows:
>
> $HADOOP_HOME/bin/hadoop fs -mkdir input
> $HADOOP_HOME/bin/hadoop fs -mkdir cacheFiles
> $HADOOP_HOME/bin/hadoop fs -put /u/home/File2.txt input
> $HADOOP_HOME/bin/hadoop fs -put /u/home/File1.txt cacheFiles
>
> My problem is that my code compiles fine, but it would just not proceed from
> map 0% reduce 0% stage.
>
> What I am doing wrong?
>
> Any suggestion would be of great help.
>
> Best,
> SS
--
Harsh J
doing, the function should
> probably only depend on the key.
>
> Good luck.
>
--
Harsh J
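A custom Partitioner makes the "depend only on the key" rule concrete; a minimal sketch against the new API, assuming Text keys and values:

    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    // Must be a pure function of (key, numPartitions): the same key must
    // always map to the same partition, or grouping breaks.
    public class FirstCharPartitioner extends Partitioner<Text, Text> {
      @Override
      public int getPartition(Text key, Text value, int numPartitions) {
        if (key.getLength() == 0) {
          return 0;
        }
        return (key.toString().charAt(0) & Integer.MAX_VALUE) % numPartitions;
      }
    }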
().setOutputKeyClass(ImmutableBytesWritable.class);
> this.getHadoopJob().setOutputValueClass(Put.class);
> }
>
>
>
>
>
> Dan Yi | Software Engineer, Analytics Engineering
> Medio Systems Inc | 701 Pike St. #1500 Seattle, WA 98101
> Predictive Analytics for a Connected World
>
>
--
Harsh J
ssage in my mapred process:
>>
>> java.lang.OutOfMemoryError: Java heap space
>> Dumping heap to java_pid10687.hprof ...
>> Heap dump file created [1385031743 bytes in 30.259 secs]
>>
>> Where do I locate those dumps? Cause I can't find them anywhere.
>>
>>
>> Thanks,
>> Marek M.
>>
--
Harsh J
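Those .hprof files land in the dumping JVM's working directory, which for a task is the attempt's scratch directory and is cleaned up afterwards; the usual workaround is to pin dumps to a durable location through the child options. Hedged example:

    conf.set("mapred.child.java.opts",
        "-Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/task-dumps");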
fact that each file can only be opened for writing once, it
> is very important in this use case to know if the records arrive at the
> OutputFormat in-order so I know it is safe to close file A when I encounter
> a record that belongs in B.
>
>
>
> Sincerely,
>
> Matthew Berry
--
Harsh J
m
> .
>
> I had increased mapred.tasktracker.reduce.tasks.maximum = 2 in the
> configuration. Is there a way we can increase it in the program?
>
>
>
> Thanks and Regards,
> S SYED ABDUL KATHER
>
--
Harsh J
uling, so not streaming/java specific).
--
Harsh J
>>
>>> I have some questions related to basic functionality in Hadoop.
>>>
>>> 1. When a Mapper processes the intermediate output data, how does it know
>>> how many partitions to create (i.e., how many reducers there will be) and
>>> how much data should go in each partition for each reducer?
>>>
>>> 2. When the JobTracker assigns a task to a reducer, will it also specify
>>> the locations of the intermediate output data it should retrieve from, right?
>>> But how will a reducer know which portion it has to retrieve from each
>>> remote location holding intermediate output?
>>>
>>>
>>> To add to Harsh's comment. Essentially the TT *knows* where the output of
>>> a given map-id/reduce-id pair is present via an output-file/index-file
>>> combination.
>>>
>>> Arun
>>>
>>> --
>>> Arun C. Murthy
>>> Hortonworks Inc.
>>> http://hortonworks.com/
>>>
>>
--
Harsh J
ogram to read all the files from a local folder and write them to HDFS as a
> single file. Is this the right way?
> If there are any best practices or a more optimized way to achieve this, kindly
> let me know.
>
> Thanks in advance!
>
> Cheers!
> Manoj.
>
--
Harsh J
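A plain FileSystem-API loop achieves the same merge without a MapReduce program; a minimal sketch, with both paths hypothetical:

    import java.io.File;
    import java.io.FileInputStream;
    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.create(new Path("/data/merged.txt"));
    for (File f : new File("/local/input").listFiles()) {
      InputStream in = new FileInputStream(f);
      IOUtils.copyBytes(in, out, conf, false); // false: keep 'out' open across files
      in.close();
    }
    out.close();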
and resubmit it and generally it works.
>
> A couple of times I have seen similar problems with reduce tasks that get
> stuck while 'initializing'.
>
> Any ideas?
>
--
Harsh J
> Is it good to go with the Mahout XMLInputFormat API?
>
> Thanks,
> Siv
--
Harsh J
Since my goal is to run an existing Python program, as is,
> under MR, it looks like I need the os.system(copy-local-to-hdfs) technique.
>
> Chuck
>
>
>
> -Original Message-
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Thursday, July 12, 2012 1:15 PM
> To:
Python open statement, when running within MR, like this? It does not
> seem to work as intended.
>
> outfile1 = open("hdfs://localhost/tmp/out1.txt", 'w')
>
> Chuck
>
>
>
> -Original Message-
> From: Harsh J [mailto:ha...@cloudera.com]
&g
mentioned in Tom White's Hadoop Definitive Guide but he has not shown
> how to do it. Instead, he moves on to Sequence Files.
>
> I am pretty confused about the meaning of the processed variable in a
> record reader. Any code example would be of tremendous help.
>
> Thanks in advance..
>
> Cheers!
> Manoj.
>
--
Harsh J
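The processed variable is just a one-shot latch: the reader emits the whole file as a single record, then reports end-of-input. A condensed sketch of the pattern the book describes (not its verbatim code):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
      private FileSplit split;
      private Configuration conf;
      private final BytesWritable value = new BytesWritable();
      private boolean processed = false; // true once the single record is emitted

      @Override
      public void initialize(InputSplit s, TaskAttemptContext context) {
        this.split = (FileSplit) s;
        this.conf = context.getConfiguration();
      }

      @Override
      public boolean nextKeyValue() throws IOException {
        if (processed) {
          return false; // the whole file was already delivered
        }
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FSDataInputStream in = file.getFileSystem(conf).open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
          value.set(contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        processed = true;
        return true;
      }

      @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
      @Override public BytesWritable getCurrentValue() { return value; }
      @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
      @Override public void close() { }
    }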
ject: Extra output files from mapper ?
>
>
>
> I am using MapReduce streaming with Python code. It works fine for basic
> stdin and stdout.
>
>
>
> But I have a mapper-only application that also emits some other output
> files. So in addition to stdout, the program also creates files named
> output1.txt and output2.txt. My code seems to be running correctly, and I
> suspect the proper output files are being created somewhere, but I cannot
> find them after the job finishes.
>
>
>
> I tried using the -files option to create a link to the location I want the
> file, but no luck. I tried using some of the -jobconf options to change the
> various working directories, but no luck.
>
>
>
> Thank you.
>
>
>
> Chuck Connell
>
> Nuance R&D Data Team
>
> Burlington, MA
>
>
--
Harsh J
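One hedged approach for streaming side files: write them into the task's work-output directory so the OutputCommitter promotes them with the normal output on success. Streaming exports job properties to the script's environment with dots turned into underscores, so mapred.work.output.dir appears as mapred_work_output_dir (1.x behavior, worth verifying on your release):

    # at the end of the mapper script: the work-output dir is an HDFS URI,
    # so use the fs shell rather than cp
    hadoop fs -put output1.txt output2.txt "$mapred_work_output_dir/"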
answers.
On Thu, Jul 12, 2012 at 2:37 AM, Grandl Robert wrote:
> Hi,
>
> Is it possible to write to an HDFS datanode w/o relying on the Namenode, i.e. to
> find the locations of Datanodes from somewhere else?
>
> Thanks,
> Robert
--
Harsh J
DH3u3)
>
> Zhu, Guojun
> Modeling Sr Graduate
> 571-3824370
> guojun_...@freddiemac.com
> Financial Engineering
> Freddie Mac
--
Harsh J
into the table, but I am not sure it is
> a good idea.
>
>
>
> Thanks,
>
> Pablo
--
Harsh J
can be handled using the WholeFileInputFormat?? I mean, if the file
>>> is very big, then is it feasible to use WholeFileInputFormat, as the
>>> entire load will go to one mapper?? Many thanks.
>>>
>>> Regards,
>>> Mohammad Tariq
>>
>>
>>
>> --
>> Harsh J
--
Harsh J
apper?? Many thanks.
>
> Regards,
> Mohammad Tariq
--
Harsh J
)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
> 12/07/10 16:41:47 INFO mapred.JobClient: map 0% reduce 0%
> 12/07/10 16:41:47 INFO mapred.JobClient: Job complete: job_local_0001
> 12/07/10 16:41:47 INFO mapred.JobClient: Counters: 0
>
> Need some guidance from the experts. Please let me know where I am
> going wrong. Many thanks.
>
> Regards,
> Mohammad Tariq
--
Harsh J
> Also, how do I clean a data node?
>
> Thanks in Advance!
>
> Cheers!
> Manoj.
>
>
>
> On Tue, Jul 10, 2012 at 11:58 AM, Harsh J wrote:
>>
>> Manoj,
>>
>> If you change your dfs.name.dir (Which is the right property for
>> 0.20.x/1.x) or dfs.n
hange name node storage directory? [I tried
> hadoop.tmp.dir and hadoop.name.dir, but it leads to other issues, though
> reverting back works fine]
>
> 2. Can we provide a path under my home directory, since my /user and /var
> directories were low on space?
>
>
>
> Cheers!
> Manoj.
>
--
Harsh J
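Continuing the reply above, the 0.20.x/1.x property belongs in hdfs-site.xml; a hedged example with a hypothetical path (the new directory must carry a copy of the existing name metadata before the NameNode is restarted):

    <property>
      <name>dfs.name.dir</name>
      <value>/home/hadoop/dfs/name</value>
    </property>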
hould retrieve it, right?
>> But how will a reducer know which portion it has to retrieve from each
>> remote location holding intermediate output?
>>
>>
>> To add to Harsh's comment. Essentially the TT *knows* where the output of
>> a given map-id/reduce-id pair is present via an output-file/index-file
>> combination.
>>
>> Arun
>>
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>>
>>
>
--
Harsh J
>>
>> I see. I was looking into tasktracker log :).
>>
>> Thanks a lot,
>> Robert
>>
>>
>> From: Harsh J
>> To: Grandl Robert ; mapreduce-user
>>
>> Sent: Sunday, July 8, 2012 9:16 PM
>>
>> Subject: R
ob_1341398677537_0020
> Could not find job job_1341398677537_0020
>
> I tried the application id, but it is invalid.
>
> I'm using CDH4.
>
> ++
> benoit
--
Harsh J
;
> hadoop job -fs hdfs://nn01:8020/ -list
>
> 0 jobs currently running
> JobId State StartTime UserName Priority
> SchedulingInfo
>
>
>
>
--
Harsh J
called and which not. Even more in ReduceTask.java.
>
> Do you have any ideas ?
>
> Thanks a lot for your answer,
> Robert
>
>
> From: Harsh J
> To: mapreduce-user@hadoop.apache.org; Grandl Robert
> Sent: Sunday, July 8, 2012 1:34 AM
&g
k ID is also its
partition ID, so it merely has to ask for the data matching its own task ID,
and the TT serves, over HTTP, the right parts of the intermediate data
to it.
Feel free to ping back if you need some more clarification! :)
--
Harsh J
> there any possible solution???
--
Harsh J
>
> Thanks in advance
> Subbu
--
Harsh J
upposing I do as you suggest, am I in danger of having their
> list consume all the memory if a user decides to log 2x or 3x as much as
> they did this time?
>
> ~Matt
>
> -Original Message-
> From: Harsh J [mailto:ha...@cloudera.com]
> Sent: Friday, June 29, 2012 6:52
xxx..x.xx.reduce(x.java:26)
> at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
> at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:566)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:408)
> at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:216)
>
> P.S. Already used jmap to dump the heap and trim each object down to its bare
> minimum and to also confirm there are no slow memory leaks.
--
Harsh J
put: FileInputFormat.setInputPaths(job, new
> Path("/folder"));
>
> What happens when the task is running and I write new files into the folder?
> Does the task receive the new files or not?
>
> Thanks
--
Harsh J
e missed in the configurations to make it work
> properly. I think I should eventually get these messages in the
> nodemanager.out log file?
>
> Thanks in advance,
>
> Sherif
--
Harsh J