of the -file option. But it feels very hacky. It would be nice if Hadoop
provided a more formal way to handle this. Thanks.
BTW: I am using 0.20.2 (CDH3u3)
Zhu, Guojun
Modeling Sr Graduate
571-3824370
guojun_...@freddiemac.com
Financial Engineering
Freddie Mac
--
Harsh J
(Hadoop20Shims.java:68)
at
org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:451)
... 12 more
--
Harsh J
comment. Essentially the TT *knows* where the output of
a given map-id/reduce-id pair is present via an output-file/index-file
combination.
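To make the idea concrete, here is a minimal sketch of such an index lookup. This is not Hadoop's actual SpillRecord code; it assumes a bare index file of fixed-size three-long records, one per reduce partition:

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    public class IndexLookup {
        // One fixed-size record per reduce partition:
        // startOffset, rawLength, partLength (three longs).
        static final int RECORD_SIZE = 3 * 8;

        // Returns {startOffset, rawLength, partLength} for the given
        // reducer, i.e. which byte range of the map's output file the
        // TT should serve to that fetcher.
        public static long[] locate(String indexFile, int reduceId)
                throws IOException {
            try (DataInputStream in =
                    new DataInputStream(new FileInputStream(indexFile))) {
                long toSkip = (long) reduceId * RECORD_SIZE;
                if (in.skip(toSkip) != toSkip) {
                    throw new IOException("index file too short");
                }
                return new long[] { in.readLong(), in.readLong(), in.readLong() };
            }
        }
    }

The TT would then serve bytes [startOffset, startOffset + partLength) of the map's output file to the fetching reducer.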
Arun
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
--
Harsh J
in ReduceTask.java.
Do you have any ideas ?
Thanks a lot for your answer,
Robert
From: Harsh J ha...@cloudera.com
To: mapreduce-user@hadoop.apache.org; Grandl Robert rgra...@yahoo.com
Sent: Sunday, July 8, 2012 1:34 AM
Subject: Re: Basic question on how
:8020/ -list
0 jobs currently running
JobId  State  StartTime  UserName  Priority  SchedulingInfo
--
Harsh J
Could not find job job_1341398677537_0020
I tried the application id, but it is invalid.
I'm using CDH4.
++
benoit
--
Harsh J
into tasktracker log :).
Thanks a lot,
Robert
From: Harsh J ha...@cloudera.com
To: Grandl Robert rgra...@yahoo.com; mapreduce-user
mapreduce-user@hadoop.apache.org
Sent: Sunday, July 8, 2012 9:16 PM
Subject: Re: Basic question on how reducer works
The changes
taking a very long time to move data into trash. Can you please help me
stop this process of deleting and restart the process with skipTrash?
--
Harsh J
Vyas
MMSB/UCHC
--
Harsh J
it merely has to ask for the data belonging to its own task ID,
and the TT serves, over HTTP, the right parts of the intermediate data
to it.
Feel free to ping back if you need some more clarification! :)
--
Harsh J
.
Hadoop 1.0.3
Hive 0.9.0
Flume 1.2.0
HBase 0.92.1
Sqoop 1.4.1
My questions are:
1. Are the above tools compatible with all these versions?
2. Does any tool need a version change?
3. Please list all the tools with their compatible versions.
Please advise.
--
Harsh J
code. Is there any possible solution?
--
Harsh J
...@gmail.com wrote:
I have one MBP with 10.7.4 and one laptop with Ubuntu 12.04. Is it possible
to set up a Hadoop cluster in such a mixed environment?
Best Regards,
--
Welcome to my ET Blog http://www.jdxyw.com
--
Harsh J
:
Thanks, I'll look at that tool.
I still wish to iterate over the blocks from the Java interface since I want
to look at their metadata. I'll look at the source code of the command-line
tools you mentioned.
Thanks again.
On Jul 6, 2012 9:07 PM, Harsh J ha...@cloudera.com wrote:
Does HDFS's
then is this client decompressing the file for me?
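If you are reading a compressed file through the plain FileSystem API, decompression is up to you; a minimal sketch using CompressionCodecFactory (the path argument is illustrative):

    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ReadCompressed {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(args[0]); // e.g. /data/part-00000.gz
            // The factory picks a codec from the file extension;
            // null means the file is not compressed (or not recognized).
            CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(path);
            InputStream in = (codec == null)
                    ? fs.open(path)
                    : codec.createInputStream(fs.open(path));
            IOUtils.copyBytes(in, System.out, conf, true); // true = close stream
        }
    }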
--
Harsh J
are not sequence files, just a custom binary file.
--
Kai Voigt
k...@123.org
--
Harsh J
of these blocks must be in a different storage point, is there a way to
specify this?
Thanks
--
Harsh J
not interested in the replicas for the moment.
Thanks
Giulio
2012/7/4 Harsh J ha...@cloudera.com
With the default replication/block placement policy, all replicas are
anyway kept in separate nodes. Is that what you want?
Or are you looking to specify a list of locations you want your
to run one by one, and have to wait until at least 4 slots are available.
Is it possible to force Hadoop to do this?
Thanks!
yang
--
Harsh J
but at some point we add a new field4 in the input. What's the best way to
deal with such scenarios? Keep a timestamped catalog of changes?
--
Harsh J
to drop to INFO in log4j while the namenode is running; I set it to WARN at
http://NAMENODE:50070/logLevel but the file is still showing INFO messages.
Thanks,
Tom Hall
--
Harsh J
reading, is there a way to read a K,V entry using the DataInputStream
obtained from getMetaBlock()?
Thanks
DaRe
--
Harsh J
Subbu
--
Harsh J
the memory if a user decides to log 2x or 3x as much as
they did this time?
~Matt
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Friday, June 29, 2012 6:52 AM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Map Reduce Theory Question, getting OutOfMemoryError while
.
Regards,
Prajakta
--
Harsh J
-8264 Direct
444 Route 35 South
Building B
Eatontown, NJ 07724 USA
hank.co...@altior.com
www.altior.com
--
Harsh J
eventually get these messages in the
nodemanager.out log file?
Thanks in advance,
Sherif
--
Harsh J
as input: FileInputFormat.setInputPaths(job, new Path("/folder"));
What happens when the task is running and I write new files into the folder?
Does the task receive the new files or not?
Thanks
--
Harsh J
to record the
position. Is there a better way?
--
Harsh J
?
--
Harsh J
-emptier thread).
--
Harsh J
, at 10:00 AM, Harsh J wrote:
Hi,
On Tue, Jun 26, 2012 at 8:21 PM, Michael Segel
michael_se...@hotmail.com wrote:
Hi,
Yes you can remove a file while there is a node or node(s) being
decommissioned.
I wonder if there's a way to manually clear out the .trash which may also
give you more
??
--
Thanks,
sandeep
--
Harsh J
tasktracker and by the
time they get picked, the current tasktracker has no free slots. Wouldn't a
shortest-job-first scheduling algorithm make a lot more sense w.r.t.
throughput and latency?
Best,
Subramanian
--
Harsh J
job has a _logs dir that has a history dir taking 1 block.
Is it safe to delete such _logs directories, as we have a lot of them?
JJ
--
Harsh J
:132)
I tried this both on 0.20.2 and 1.0.0. Both of them exit with an exception
like the above.
Can anyone help me on this?
Thanks!!
Sheng
--
Harsh J
configuration settings I should change? I'm using all the defaults for the
indexing phase, and I'm not using any custom plugins either.
Thanks,
Safdar
--
Harsh J
, 2097152);
FileInputFormat.setMinInputSplitSize(job, 1048576);
now you can read each file and use the split function:
String[] fields = line.split(",");
On Thu, Jun 14, 2012 at 10:56 AM, Harsh J ha...@cloudera.com wrote:
You may use TextInputFormat with textinputformat.record.delimiter
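A minimal sketch of wiring that up (assuming a release that supports the textinputformat.record.delimiter property and Job.getInstance; the "\n\n" delimiter is just an example value):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class DelimiterExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Treat blank lines as the record separator; "\n\n" is just
            // an example value for the delimiter.
            conf.set("textinputformat.record.delimiter", "\n\n");
            Job job = Job.getInstance(conf, "custom-delimiter");
            job.setInputFormatClass(TextInputFormat.class);
        }
    }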
))? It seems like in this usage of MapReduce the keyout would only be used
when chaining jobs.
-Kevin
--
Harsh J
call with the same parameters, everything goes all right.
But creating an instance of Reader for each reduce() call creates a big
slowdown.
Do you have any idea what I am doing wrong?
Thanks
Ondrej Klimpera
--
Harsh J
that split brain does not occur in Hadoop?
Or am I missing any other situation where split brain happens and the
namenode directory is not locked, thus allowing the standby namenode also
to start up?
Has anybody encountered this?
Any help is really appreciated.
Harshad
--
Harsh J
: word unexpected (expecting ))
$
this long list of errors seems to say that JAVA_HOME isn't set properly.
Can someone help as to what the error is? Please :)
--
Harsh J
to query the list of datanodes' IPs and ports.
I am aware the default port is 50075, but in this scenario I might have two
versions of Hadoop with datanodes running on non-default ports.
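One way, sketched below, is to ask the namenode itself via DistributedFileSystem#getDataNodeStats rather than assuming ports (the hostnames and ports printed are whatever the cluster reports):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DistributedFileSystem;
    import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

    public class ListDatanodes {
        public static void main(String[] args) throws Exception {
            DistributedFileSystem dfs =
                    (DistributedFileSystem) FileSystem.get(new Configuration());
            for (DatanodeInfo dn : dfs.getDataNodeStats()) {
                // getInfoPort() is the datanode's HTTP port
                // (50075 only by default).
                System.out.println(dn.getHostName() + ":" + dn.getInfoPort());
            }
        }
    }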
Thanks,
Keren
--
Harsh J
and
org.apache.hadoop.mapred.jobcontrol.Job
What's the difference between these?
Which should be used and when? Any suggestions?
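Roughly: org.apache.hadoop.mapreduce.Job describes and submits a single job, while the jobcontrol classes wrap jobs so you can chain them with dependencies. A sketch using the new-API equivalents (ControlledJob/JobControl, present from 0.21/2.x onward; the job names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
    import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;

    public class ChainExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            ControlledJob first =
                    new ControlledJob(Job.getInstance(conf, "first"), null);
            ControlledJob second =
                    new ControlledJob(Job.getInstance(conf, "second"), null);
            second.addDependingJob(first); // second runs only after first succeeds

            JobControl control = new JobControl("chain");
            control.addJob(first);
            control.addJob(second);
            new Thread(control).start();   // JobControl itself is a Runnable
            while (!control.allFinished()) {
                Thread.sleep(1000);
            }
            control.stop();
        }
    }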
Regards
Girish
Ph: +91-9916212114
--
Harsh J
?
Can anyone give me an example of why to use streaming in MapReduce?
Thanks,
Pedro
--
Harsh J
back. - Piet Hein
(via Tom White)
--
Harsh J
?
Thanks for the advice.
Ondrej Klimpera
Regards
Bejoy KS
Sent from handheld, please excuse typos.
--
Harsh J
at
http://www.tid.es/ES/PAGINAS/disclaimer.aspx
--
Harsh J
19:57:27 WARN mapred.JobClient: Error reading task output:
http://node1:50060/tasklog?plaintext=true&attemptid=attempt_201206141136_0002_m_01_0&filter=stdout
Thank you,
--Shamshad
--
Harsh J
,
how do I assemble them into one multi-node cluster?
When I search for documentation, I've only found configuration for
Hadoop 0.20.x. Would you mind assisting me?
--
Harsh J
problem.
Greetings,
Mat
2012/6/13 Harsh J ha...@cloudera.com
That is a per-job property and you can raise it when submitting the job
itself.
You can pass it via -D args (see
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/util/Tool.html
)
or via a conf.set("name", "value") from
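A minimal Tool skeleton for reference, so that -D key=value arguments are honored (the class name is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJob extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains any -D key=value overrides,
            // because ToolRunner ran GenericOptionsParser for us.
            Configuration conf = getConf();
            // ... build and submit the actual job here ...
            return 0;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new MyJob(), args));
        }
    }

Invoked as, e.g., hadoop jar myjob.jar MyJob -D mapred.reduce.tasks=10 in out.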
to Hadoop, or is it just for the user's benefit to set this variable?
Thanks for the reply.
On 06/14/2012 07:46 AM, Harsh J wrote:
Hi Ondřej,
Due to a new packaging format, Apache Hadoop 1.x has deprecated
the HADOOP_HOME env-var in favor of a new env-var called
'HADOOP_PREFIX'. You can set
but it is difficult to debug the code.
Each session I have to rebuild the jar file into lib and restart the Hadoop
cluster. How do I write a new patch in contrib and debug it?
--
Harsh J
we are trying to achieve is to get more than 20GB to
be processed on this cluster.
Is there a way to distribute the data across the cluster?
There is also one shared NFS storage disk with 1TB of available space, which
is now unused.
Thanks for your reply.
Ondrej Klimpera
--
Harsh J
with each having a 20 GB hard disk, as replicas will be evenly
distributed across the cluster, right?
Regards,
Praveenesh
On Thu, Jun 14, 2012 at 7:08 PM, Harsh J ha...@cloudera.com wrote:
Ondřej,
If by processing you mean trying to write out (map outputs) 20 GB of
data per map task, that may
klimp...@fit.cvut.cz wrote:
Hello,
you're right. That's exactly what I meant. And your answer is exactly what I
thought. I was just wondering if Hadoop can distribute the data to other
nodes' local storage if its own local space is full.
Thanks
On 06/14/2012 03:38 PM, Harsh J wrote:
Ondřej
slave nodes?
Ondrej
On 06/14/2012 04:03 PM, Harsh J wrote:
Ondřej,
That isn't currently possible with local storage FS. Your 1 TB NFS
point can help but I suspect it may act as a slow-down point if nodes
use it in parallel. Perhaps mount it only on 3-4 machines (or less),
instead of all
to configure in browser? Thanks!
--
Best wishes!
My Friend~
--
Harsh J
/browse/MAPREDUCE-4327
On Wed, Jun 6, 2012 at 10:38 PM, Keith Wiley kwi...@keithwiley.com wrote:
On Jun 6, 2012, at 03:42 , Harsh J wrote:
I think mapred.tasktracker.map.tasks.maximum sets the number of map
tasks and not slots.
This is incorrect. The property does configure slots. Please also
that there is a mapred.input.dir key that contains the path
passed as a command line argument and assigned to inputDir in my main
method, but the processed filename within that path is still
inaccessible. Any ideas how to get this?
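One common approach is to downcast the input split inside the mapper; a sketch with the new API (the old API exposed the same path via the map.input.file key):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class FileNameMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // For file-based input formats the split is a FileSplit,
            // which carries the path of the file being processed.
            FileSplit split = (FileSplit) context.getInputSplit();
            String fileName = split.getPath().getName();
            context.write(new Text(fileName), value);
        }
    }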
Thanks,
Mike
--
Harsh J
another one with a slightly different configuration because it needs more
memory to finish.
We are using CDH3.
Greetings,
Mat
--
Harsh J
streaming job, whose mapper is simply bash -c '
ulimit -a ',
I found that it has a 1 GB virtual memory limit.
Why is this?
looks like a Cloudera bug
we are using CDH3U3
Thanks
Yang
--
Harsh J
custom loader for my mapreduce jobs, but even then is it possible to reach
out across HDFS nodes if the files are not aligned with record boundaries?
Thanks,
Prasenjit
--
Sent from my mobile device
--
Harsh J
THE FUTURE, CONNECTED TO THE REVOLUTION
http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci
--
Thanks Regards,
Anil Gupta
--
Harsh J
, 2012 at 11:11 AM, Ondřej Klimpera klimp...@fit.cvut.cz wrote:
Hello, why, when running Hadoop, is the HADOOP_HOME shell variable always
reported as deprecated? How do I set the installation directory on cluster
nodes, and which variable is correct?
Thanks
Ondrej Klimpera
--
Harsh J
dir
base /
I am running the tests on SuSE 11;
can anyone please tell me what the problem could be?
Thanks and Regards
Amith
--
Harsh J
common-user@ and CC'd you.
On Mon, Jun 11, 2012 at 9:46 PM, abhishek dodda
abhishek.dod...@gmail.com wrote:
hi all,
How do I do a map-side join with the distributed cache? Can anyone help
me with this?
Regards
Abhishek.
--
Harsh J
this in the UI of cloudera. When I set
it in the config files directly (mapred-site.xml) the value is not used
after a restart.
Does anyone have any experience with how those values can be overridden?
Thanks
Markus
--
Harsh J
that. How can we do this so that we can achieve parallelism in a
Hadoop cluster? Is there any way to generate multiple input splits from a
single input file?
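One way, sketched below under the new API, is to cap the maximum split size so one large, splittable file yields several splits (the 16 MB figure is arbitrary):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "many-splits");
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            // Cap each split at 16 MB so a single large, splittable file
            // yields many map tasks; the figure is arbitrary.
            FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
        }
    }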
Thanks
Sharat
--
Harsh J
for this or what can I do to
track the root cause.
Regards,
Arpit Wanchoo
--
Harsh J
.
--
Harsh J
manages open
files during WAL-recovery using lightweight recoverLease APIs that
were added for its benefit, so it doesn't need to wait for an hour for
WALs to close and recover data.
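For the curious, a sketch of that pattern using DistributedFileSystem#recoverLease (the boolean-returning signature of recent releases is assumed; the path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class LeaseRecovery {
        // Ask the NN to start lease recovery right away instead of waiting
        // out the hard lease limit; poll until the file is closed.
        public static void recover(Path wal) throws Exception {
            DistributedFileSystem dfs =
                    (DistributedFileSystem) FileSystem.get(new Configuration());
            while (!dfs.recoverLease(wal)) {
                Thread.sleep(1000); // recovery completes asynchronously
            }
        }
    }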
--
Harsh J
there
might be a bigger problem at play here. Anyone have any thoughts on the
matter?
Thanks,
DR
--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
--
Harsh J
data in other nodes?
--
Harsh J
about 2 GB, which is equal to my memory allocation
(mapred.child.java.opts=-Xmx2048m). So it is using twice as much memory as
I expected! Why is that?
Thanks,
--
Harsh J
. Is this possible?
Thanks.
Tony Dean
SAS Institute Inc.
Senior Software Developer
919-531-6704
--
Harsh J
)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
Thank you !
Best,
Huanchen
2012-06-08
huanchen.zhang
--
Harsh J
other distributions out there, the patch at
https://issues.apache.org/jira/browse/HDFS-1560 already makes this a
default value for security.
--
Harsh J
such that it doesn't overload NameNode.
--
Harsh J
for the shuffling that is done in the
reduce task?
http://search-hadoop.com/c/Hadoop:/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/task/reduce/Shuffle.java%7C%7Cshuffle+sort
-sb
-Original Message-
From: Harsh J
...@intel.com wrote:
So I'm assuming that there is a push side also? Is it part of the map output?
-sb
-Original Message-
From: Harsh J [mailto:ha...@cloudera.com]
Sent: Wednesday, June 06, 2012 9:33 AM
To: common-user@hadoop.apache.org
Subject: Re: Shuffle/sort
Sean,
Yes, that's the one
intended us to forgo their use.
-- Galileo Galilei
--
https://github.com/zinnia-phatak-dev/Nectar
--
Harsh J
to know the details and see whether my intuition is right. If I want to
find that in the source code, where should I start?
I saw this question online and no one replied to it. Does anyone know where
to go to study the source code for the shuffle and sort?
-sean
--
Harsh J
,
waqas
--
Harsh J
FSDataOutputStream#sync which is
actually hflush semantically (data not durable in case of data center
wide power outage). hsync implementation is not yet in 2.0. HDFS-744
just brought hsync in trunk.
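To make the semantics concrete, a sketch against a 2.x-line client, where FSDataOutputStream exposes both calls (the path is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushDemo {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/tmp/flushdemo"));
            out.writeBytes("record 1\n");
            out.hflush(); // visible to new readers, but not forced to disk
            out.hsync();  // also forced to disk, where the release implements it
            out.close();
        }
    }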
__Luke
On Fri, May 25, 2012 at 9:30 AM, Harsh J ha...@cloudera.com wrote:
Mohit,
Not if you
to use Dynamic Priority Scheduler
--
Harsh J
the inputs and everything. Anyone know what the problem might be?
--
Thanks in advance,
Rohit
--
Harsh J
?
The only thing I worry about is what happens if the server crashes before I
am able to cleanly close the file. Would I lose all previous data?
--
Harsh J
Thanks for following up here Subroto. Lets continue the discussion on the JIRA.
On Wed, May 23, 2012 at 5:07 PM, Subroto ssan...@datameer.com wrote:
Thanks Harsh….
Filed MAPREDUCE-4280 for the same….
Cheers,
Subroto Sanyal
On May 23, 2012, at 1:18 PM, Harsh J wrote:
This is related
What is the idiomatic way to create a Job in Hadoop? And why have the Job
constructors been deprecated?
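The replacement idiom is the static factory; a minimal sketch (the job name is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class JobFactoryExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Job.getInstance copies the conf, so later edits to `conf`
            // don't silently leak into the submitted job -- one reason
            // the plain constructors were retired.
            Job job = Job.getInstance(conf, "my-job"); // vs. new Job(conf, "my-job")
            job.setJarByClass(JobFactoryExample.class);
        }
    }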
--
Jay Vyas
MMSB/UCHC
--
Harsh J
the algorithm depends on lots of third-party DLLs, so I was wondering:
could I call the DLLs written in C++ from the Hadoop version of MapReduce
using JNI? Thanks.
--
YANG, Lin
--
Harsh J
://hdmaster:54310/tmp/temp-1842686846/tmp-2027515206,
It confuses me because if it had some issue it would not work, yet it does
work. So I need some help; thank you for your help!
Best Regards
Malone
2012-05-23
--
Harsh J
, May 24, 2012 at 1:17 AM, samir das mohapatra
samir.help...@gmail.com wrote:
Hi All,
How do I compare two input files in an M/R job?
Say log file A is around 30 GB
and log file B is around 60 GB.
I wanted to know how I would define K,V inside the mapper.
Thanks
samir.
--
Harsh
variable in the jobtracker setup to reduce this
allocation time?
Thanks to all!
Best regards,
Andrés Durán
--
Harsh J
heartbeats, which
should do well on your single 32-task machine.
Do give it a try and let us know!
On Tue, May 22, 2012 at 5:51 PM, Harsh J ha...@cloudera.com wrote:
Hi,
This may be because, depending on your scheduler, only one Reducer may
be allocated per TT heartbeat. A reasoning of why
)
at
org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:172)
... 7 more
--
Harsh J
.
--
Harsh J