On a side note, it is 'foul', not 'fowl', which is a whole different animal
(pun intended ;)
Regards,
Shahab
On Wed, Dec 2, 2015 at 4:12 PM, Ted Yu wrote:
> For #1, please see:
> https://issues.apache.org/jira/browse/INFRA-10725
>
> Unfortunately, as of yesterday this
Are you properly implementing the Tool interface?
https://hadoopi.wordpress.com/2013/06/05/hadoop-implementing-the-tool-interface-for-mapreduce-driver/
Also, there needs to be a space between -D and the param name.
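For reference, a rough sketch of a driver implementing Tool with ToolRunner (new API; the class and property names are illustrative, not from this thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    Configuration conf = getConf(); // already contains any -D overrides
    Job job = Job.getInstance(conf, "my job");
    job.setJarByClass(MyDriver.class);
    // ... set mapper/reducer/input/output here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // e.g. hadoop jar myjob.jar MyDriver -D mapreduce.job.reduces=2 <in> <out>
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}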
Regards,
Shahab
On Tue, Oct 6, 2015 at 9:22 AM, Istabrak Abdul-Fatah
What is the difference between the mappers? Is the input data supposed to go
to all mappers, or does it depend on the source data?
Regards,
Shahab
On Fri, Aug 21, 2015 at 1:35 PM, ☼ R Nair (रविशंकर नायर)
ravishankar.n...@gmail.com wrote:
All,
I have three mappers, followed by a reducer. I
I am confused. The link posted above tells you exactly how to
interact with HDFS to do various tasks, with examples.
What else are you looking for?
Regards,
Shahab
On Aug 14, 2015 12:14 AM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
That’s a
This seems to be a Java issue rather than a Hadoop one?
Have you seen the links below, regarding the intricacies involved in reading a
resource file from a Java jar?
http://javarevisited.blogspot.com/2014/07/how-to-load-resources-from-classpath-in-java-example.html
Ravikant,
How does the output that you sent in the email map to the one you are
printing in the code (using SOP statements)?
Where do you see reducer being called again for the same key? Maybe, I am
missing something but the output statements in the code look different.
Regards,
Shahab
On
You asked a similar question earlier as well, so I will copy here the comments
I replied with then:
http://hadoop-common.472056.n3.nabble.com/how-to-assign-unique-ID-Long-Value-in-mapper-td4078062.html
Basically, to summarize, you shouldn't incorporate common sharable state
among reducers. You
I see 2 issues here which go kind of against the architecture and idea of
M/R (or distributed and parallel programming models.)
1- The map and reduce tasks are supposed to be shared-nothing and
independent tasks. If you add functionality like this where you need to make
sure that some data is
I think the poster wanted to unsubscribe from the mailing list?
Gopy, if that is the case, then please see this:
https://hadoop.apache.org/mailing_lists.html
Regards,
Shahab
On Mon, May 18, 2015 at 9:42 AM, xeonmailinglist-gmail
xeonmailingl...@gmail.com wrote:
Why Remove?
On
? Do these both serve the same purpose or something
else?
Thanks,
On Sat, May 16, 2015 at 8:48 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
You can either pass them on as command-line arguments using the -D option.
Assuming your job implements the standard Tool interface:
https
You can either pass them on as command-line arguments using the -D option.
Assuming your job implements the standard Tool interface:
https://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html
Or you can set them in the code using the various 'set' methods to set
key/value pairs.
Here are some examples of how to use custom counters:
http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-using-custom-java-counters/
Regards,
Shahab
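For reference, a rough sketch of the counters approach (the enum and method names are illustrative, not from this thread):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CounterExample {
  // each enum value becomes a custom counter
  public enum MyCounters { SKIPPED_RECORDS }

  public static class MyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      if (value.getLength() == 0) {
        // increment the counter instead of mutating a static variable
        context.getCounter(MyCounters.SKIPPED_RECORDS).increment(1);
        return;
      }
      // ... normal map logic ...
    }
  }

  // in the driver, after job.waitForCompletion(true):
  static long skippedRecords(Job job) throws IOException {
    return job.getCounters().findCounter(MyCounters.SKIPPED_RECORDS).getValue();
  }
}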
On May 12, 2015 1:29 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
Better options than using a static variable are, imo:
One option is to use
Better options than using a static variable are, imo:
One option is to use Counters. Check that API. We are using them for values
that are numeric and that we need in the driver once the job finishes. You
can create your custom counters too.
The other option is (if you need more than just one value or
getLocalCacheFiles is deprecated and can only access files that were
downloaded locally to the node running the task.
Use of getCacheFiles is encouraged now which downloads using a URI.
Have you seen this?
();
threshold = conf.getInt("threshold", -1);
}
Best,
Peter
On 11.05.2015 19:26, Shahab Yunus wrote:
What is the type of the threshold variable? sum I believe is a Java int.
Regards,
Shahab
On Mon, May 11, 2015 at 1:08 PM, Peter Ruch
What is the type of the threshold variable? sum I believe is a Java int.
Regards,
Shahab
On Mon, May 11, 2015 at 1:08 PM, Peter Ruch rutschifen...@gmail.com wrote:
Hi,
I am currently playing around with Hadoop and have some problems when
trying to filter in the Reducer.
I extended the
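For reference, a rough sketch of the kind of threshold filter being discussed, reading the threshold from the job Configuration in setup() (the property name and types are illustrative):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FilterReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private int threshold;

  @Override
  protected void setup(Context context) {
    Configuration conf = context.getConfiguration();
    // threshold passed in by the driver, e.g. via -D threshold=100
    threshold = conf.getInt("threshold", -1);
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable value : values) {
      sum += value.get();
    }
    // only emit keys whose total meets the threshold (both are Java ints,
    // so the comparison is straightforward)
    if (sum >= threshold) {
      context.write(key, new IntWritable(sum));
    }
  }
}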
What version are you using?
Have you seen this?
Regards,
Shahab
On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote:
Hello,
I'm new to Hadoop and I'm having a problem reading from a sequence file
that I add to distributed cache.
I didn't have problems when I ran it in
Have you tried Storm's mailing list? They would perhaps be able to guide
you better.
Regards,
Shahab
On May 9, 2015 2:36 AM, mani kandan mankand...@gmail.com wrote:
Hi
I'm new to Storm, and I would like to create a Storm topology to stream
tweets, do analysis and store on hdfs.
Is there a
The reason is that the JSON parsing code is in a 3rd-party library which is
not included in the default MapReduce/Hadoop distribution. You have to
add it to your classpath at *runtime*. There are multiple ways to do it
(which also depend on how you plan to run and package/deploy your code.)
Can you try sudo?
https://www.linux.com/learn/tutorials/306766:linux-101-introduction-to-sudo
Regards,
Shahab
On Wed, Apr 22, 2015 at 8:26 AM, Anand Murali anand_vi...@yahoo.com wrote:
Dear Sandeep:
many thanks. I did find hosts, but I do not have write privileges, even
though I am
You can kill it by using the following yarn command:
yarn application -kill <application id>
https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
Or use old hadoop job command
http://stackoverflow.com/questions/11458519/how-to-kill-hadoop-jobs
Regards,
Shahab
On
Your package seems different.
Have you tried the following package and class?
org.apache.hadoop.io.compress.BZip2Codec
Regards,
Shahab
On Sun, Apr 5, 2015 at 9:45 AM, xeonmailinglist-gmail
xeonmailingl...@gmail.com wrote:
Hi,
I have run the command [1] to create compressed data from my
I hope I understood your requirement correctly.
Your requirement is to write into multiple folders from the reducers AND,
in each folder, append the data to the file in that folder, right?
Reducer-output=
folder1/file1
folder2/file2
This can be done with standard MultipleOutputFormat and
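For reference, a rough sketch using MultipleOutputs from the new API (the folder/file names are illustrative; the older MultipleOutputFormat works similarly in the old API):

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class FolderReducer extends Reducer<Text, Text, Text, Text> {
  private MultipleOutputs<Text, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, Text>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text value : values) {
      // the third argument is a base output path relative to the job's output
      // directory, e.g. "folder1/file1"; records written by this reducer with
      // the same path end up in the same file
      mos.write(key, value, key.toString() + "/part");
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();
  }
}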
On Wed, Apr 1, 2015 at 11:03 AM, Shahab Yunus shahab.yu...@gmail.com
wrote:
As the error tells you, you cannot use a class as a Partitioner if it does
not satisfy the interface requirements of the partitioning mechanism. You
need to set as the Partitioner a class which extends or implements
As the error tells you, you cannot use a class as a Partitioner if it does
not satisfy the interface requirements of the partitioning mechanism. You
need to set as the Partitioner a class which extends or implements the
Partitioner contract.
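For reference, a minimal sketch of a class that satisfies the Partitioner contract (new API; the key/value types and logic are illustrative):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class MyPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // must return a value in [0, numPartitions)
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

// in the driver: job.setPartitionerClass(MyPartitioner.class);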
Regards,
Shahab
On Wed, Apr 1, 2015 at 10:54 AM,
What is the reason for using the queue?
job.getConfiguration().set("mapred.job.queue.name", "exp_dsa");
Is your mapper or reducer even being called?
Try adding the override annotation to the map/reduce methods as below:
@Override
public void map(Object key, Text value, Context context) throws
As far as I understand, cleanup is called per task, i.e. in your case
per map task. To get an overall count or measure, you need to aggregate
it yourself after the job is done.
One way to do that is to use counters and then merge them programmatically
at the end of the job.
Regards,
Shahab
On
There seems to be some work done on this here:
https://issues.apache.org/jira/browse/HADOOP-9209
3rd party tool:
https://github.com/rdsr/hdfs-checksum
Regards,
Shahab
On Fri, Feb 20, 2015 at 12:39 PM, xeonmailinglist xeonmailingl...@gmail.com
wrote:
Hi,
Is it possible to use SHA-256, or MD5
Nope. You can use the Standalone setup too to test things. Details here:
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html#Standalone_Operation
Regards,
Shahab
On Fri, Feb 20, 2015 at 12:40 AM, Jonathan Aquilina jaquil...@eagleeyet.net
wrote:
Hey
First try:
You should use the @Override annotation on the map and reduce methods so they
are actually called.
Like this:
*@Override*
public void map(LongWritable k,Text v,Context con)throws
IOException,InterruptedException
{...
Do the same for the 'reduce' method.
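For reference, a fuller sketch of the same point (new API; the key/value types and the word-count style logic are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class OverrideExample {
  public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    public void map(LongWritable k, Text v, Context con)
        throws IOException, InterruptedException {
      // @Override makes the compiler check that this signature matches the
      // framework's, so the method really is invoked instead of the identity map
      con.write(new Text(v.toString()), new IntWritable(1));
    }
  }

  public static class MyReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text k, Iterable<IntWritable> vals, Context con)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : vals) {
        sum += v.get();
      }
      con.write(k, new IntWritable(sum));
    }
  }
}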
Regards,
Shahab
On Wed, Jan 7,
.
Regards,
Shahab
On Wed, Jan 7, 2015 at 8:18 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
First try:
You should use the @Override annotation on the map and reduce methods so they
are actually called.
Like this:
*@Override*
public void map(LongWritable k,Text v,Context con)throws
IOException
Distributed Cache has been deprecated for a while. You can use the new
mechanism, which is functionally the same thing, discussed here in this
thread:
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
Regards,
Shahab
On Mon, Jan 5, 2015
You should not use DistributedCache. It is deprecated.
See this:
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
Regards,
Shahab
On Mon, Dec 22, 2014 at 6:22 AM, Marko Dinic marko.di...@nissatech.com
wrote:
Thanks a lot, it works!
Look at this thread. It has alternatives to DistributedCache.
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
Basically you can use the new method job.addCacheFile to pass files on to
the individual tasks.
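For reference, a rough sketch of that API (the file path and class names are illustrative):

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheFileExample {
  // driver side: register the file before submitting the job
  public static void addLookupFile(Job job) throws Exception {
    job.addCacheFile(new URI("hdfs:///user/me/lookup.txt"));
  }

  public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
      URI[] cacheFiles = context.getCacheFiles(); // replaces getLocalCacheFiles
      if (cacheFiles != null && cacheFiles.length > 0) {
        FileSystem fs = FileSystem.get(context.getConfiguration());
        FSDataInputStream in = fs.open(new Path(cacheFiles[0]));
        // ... read the lookup data here, then in.close() ...
      }
    }
  }
}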
Regards,
Shahab
On Thu, Dec
Check this out:
http://ofirm.wordpress.com/2014/02/01/exploring-the-hdfs-default-value-behaviour/
It seems that the value of *dfs.block.size* is dictated directly by the
client, regardless of the cluster setting. If a value is not specified, the
client just picks the default value. This finding is
Are you asking about the type for the numberOfRuns variable which you are
declaring as a Java primitive int?
If yes, then you can use the IntWritable class in Hadoop to define an integer
variable which will work with M/R
Regards,
Shahab
On Tue, Dec 9, 2014 at 3:47 AM, steven commercial...@yahoo.de
Jakub, you are saying that we can't change the number of mappers per job
through the script, right? Because otherwise, if invoking through the command
line or code, then we can, I think. We do have the property mapreduce.job.maps.
Regards,
Shahab
On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky
What aspects of Tez and Spark are you comparing? They have different
purposes and are thus not directly comparable, as far as I understand.
Regards,
Shahab
On Fri, Oct 17, 2014 at 2:06 PM, Adaryl Bob Wakefield, MBA
adaryl.wakefi...@hotmail.com wrote:
Does anybody have any performance figures on
It depends on the memory settings as well, i.e. how much of the resources you
want to assign to each container. Then YARN will run as many mappers in
parallel as possible.
See this:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
On Wednesday 15 October 2014 05:45 PM, Shahab Yunus wrote:
It depends on the memory settings as well, i.e. how much of the resources you
want to assign to each container. Then YARN will run as many mappers in
parallel as possible.
See this:
http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0
There are properties such as:
mapreduce.map.memory.mb = 2*1024 MB
mapreduce.reduce.memory.mb = 2 * 2 = 4*1024 MB
What are these properties, mapreduce.map.memory.mb and
mapreduce.reduce.memory.mb?
On Wednesday 15 October 2014 06:17 PM, Shahab Yunus wrote:
It cannot run more mappers
this property
On Wednesday 15 October 2014 07:06 PM, Shahab Yunus wrote:
Explanation here.
http://stackoverflow.com/questions/24070557/what-is-the-relation-between-mapreduce-map-memory-mb-and-mapred-map-child-jav
https://support.pivotal.io/hc/en-us/articles/201462036-Mapreduce-YARN-Memory
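Roughly, the two kinds of settings relate like this (values below are illustrative): the *.memory.mb properties are the container sizes YARN schedules against, and the java.opts heap must fit inside them.

Configuration conf = new Configuration();
conf.set("mapreduce.map.memory.mb", "2048");      // YARN container size per map task
conf.set("mapreduce.map.java.opts", "-Xmx1638m"); // JVM heap inside that container
conf.set("mapreduce.reduce.memory.mb", "4096");
conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");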
Your write will not succeed. You will get an exception like could
only be replicated to 0 nodes, instead of 1
More details here:
http://www.bigdataplanet.info/2013/10/Hadoop-Tutorial-Part-4-Write-Operations-in-HDFS.html
Interesting. I thought that the write would fail if the # of nodes
down is greater than the min-replication property. So in reality we only get a
warning while writing (and an info message through fsck.)
Regards,
Shahab
On Fri, Sep 19, 2014 at 9:26 AM, Abirami V abiramipand...@gmail.com wrote:
. Is there an automatic way to do it?
Or should I write a parser myself?
And regarding the tests on windows, any experience?
Thanks again!!
Best regards,
Blanca
*From:* Shahab Yunus [mailto:shahab.yu...@gmail.com]
*Sent:* Wednesday, 17 September 2014 17:20
*To:* user
Can you provide the driver code for this job?
Regards,
Shahab
On Wed, Sep 17, 2014 at 10:28 AM, Blanca Hernandez
blanca.hernan...@willhaben.at wrote:
Hi again, I replaced the String objects with org.apache.hadoop.io.Text
objects (why is String not accepted?), and now I get another exception,
.
}
}
Best regards,
Blanca
*From:* Shahab Yunus [mailto:shahab.yu...@gmail.com]
*Sent:* Wednesday, 17 September 2014 16:37
*To:* user@hadoop.apache.org
*Subject:* Re: ClassCastException on running map-reduce jobs + tests on
Windows (mongo-hadoop)
Can you provide
How did you fix it? And what is your question now?
Regards,
Shahab
On Mon, Sep 15, 2014 at 9:18 AM, YIMEN YIMGA Gael
gael.yimen-yi...@sgcib.com wrote:
Hello Dear Hadoopers,
Just to let you know that, finally, I succeeded in fixing my issue this morning.
Now, I would like to have more
Hi
Have you already looked at the existing documentation?
For Apache:
http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SecureMode.html
For Cloudera:
http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.6.0/CDH4-Security-Guide/cdh4sg_topic_3.html
Some
*hdfs://latdevweb02:9000/home/hadoop/hadoop/input*
Is this a valid path on HDFS? Can you access this path outside of the
program, for example using the hadoop fs -ls command? Also, were this path and
the files in it created by a different user?
The exception seems to say that it does not exist or the
Examples (the top ones are related to streaming jobs):
http://www.infoq.com/articles/HadoopOutputFormat
http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/
Perhaps the following? I get the application logs from here after job
completion. This is a path on HDFS.
yarn.nodemanager.remote-app-log-dir
Regards,
Shahab
On Tue, Sep 2, 2014 at 4:02 PM, John Lilley john.lil...@redpoint.net
wrote:
We have a YARN task that is core-dumping, and the JVM error
Hello.
I am trying to access custom counters that I have created in a mapreduce
job on Yarn.
After the job.waitForCompletion(true) call, I try to do job.getCounters() but I
get a null.
This only happens if I run a heavy job, meaning a) a lot of data and b) a lot
of reducers.
E.g. for 10million
.
One minor thing is that now the job history UI does not show the history, with
the error message that the max counter limit was exceeded.
Regards,
Shahab
On Fri, Aug 22, 2014 at 7:59 AM, Shahab Yunus shahab.yu...@gmail.com
wrote:
Hello.
I am trying to access custom counters that I have created in an mapreduce
Have you looked at the WholeFileInputFormat implementations? There are
quite a few if you search for them...
http://hadoop-sandy.blogspot.com/2013/02/wholefileinputformat-in-java-hadoop.html
https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java
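For reference, a condensed sketch along the lines of those implementations (new API; the key is NullWritable and the value is the whole file's bytes):

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(JobContext context, Path file) {
    return false; // never split, so one map task sees the whole file
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> createRecordReader(
      InputSplit split, TaskAttemptContext context) {
    return new WholeFileRecordReader();
  }

  // emits exactly one record per file: the entire contents as a BytesWritable
  public static class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
    private FileSplit split;
    private TaskAttemptContext context;
    private final BytesWritable value = new BytesWritable();
    private boolean processed = false;

    @Override
    public void initialize(InputSplit split, TaskAttemptContext context) {
      this.split = (FileSplit) split;
      this.context = context;
    }

    @Override
    public boolean nextKeyValue() throws IOException {
      if (processed) {
        return false;
      }
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(context.getConfiguration());
      FSDataInputStream in = fs.open(file);
      try {
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }

    @Override
    public NullWritable getCurrentKey() { return NullWritable.get(); }

    @Override
    public BytesWritable getCurrentValue() { return value; }

    @Override
    public float getProgress() { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() { }
  }
}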
Regards,
I couldn't decide whether this is an HBase question or a Hadoop/Yarn one.
In the utility class for MR jobs integrated with HBase,
*org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil, *
in the method:
*public static void initTableReducerJob(String table,*
*Class<? extends TableReducer
The reason being that when you write something in HDFS, it guarantees that
it will be written to the specified number of replicas. So if your
replication factor is 2 and one of your nodes (out of 2) is down, then it
cannot guarantee the 'write'.
The way to handle this is to have a cluster of more
The '-bin' file does not have the source code (bin for binaries) while the
other does. You can check and see the major difference in the 'src' folders
under the top-level directory after unzipping/untarring.
Regards,
Shahab
On Mon, Jul 21, 2014 at 3:54 AM, Vimal Jain vkj...@gmail.com wrote:
Why isn't it appropriate to discuss too many vendor-specific topics on a
vendor-neutral Apache mailing list? Check out this thread:
http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201309.mbox/%3ccaj1nbzcocw1rsncf3h-ikjkk4uqxqxt7avsj-6nahq_e4dx...@mail.gmail.com%3E
You can always
I am assuming you meant the batch jobs that are/were used in the old world for
data cleansing.
As far as I understand, there is no hard and fast rule for it; it depends on the
functional and system requirements of the usecase.
It is also dependent on the technology being used and how it manages
It is not advisable to have many small files in HDFS, as it can put memory
load on the Namenode, which maintains the metadata, to highlight one major issue.
Off the top of my head, some basic ideas... You can either combine invoices
into a bigger text file containing a collection of records where each
The data itself is eventually stored in the form of files. Each block of the
file and its replicas are stored in files and directories on different
nodes. The Namenode keeps and maintains the information about each
file and where its blocks (and replicated blocks) exist in the cluster.
As for
On Thu, Jul 17, 2014 at 11:23 AM, Bertrand Dechoux decho...@gmail.com
wrote:
No reason why not. And a permission explains why there is an error :
missing access rights
Bertrand Dechoux
On Thu, Jul 17, 2014 at 4:58 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
In MRv2 or Yarn
Adding to what Jungi Jeong said, if you can get your hands on the book*
Hadoop: The Definitive Guide *by Tom White, then that would help as well,
as it explains this in significant detail.
Regards,
Shahab
On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong jgje...@calab.kaist.ac.kr
wrote:
As far as
Not exactly. There are of course major implementation differences and then
some subtle and high level ones too.
My 2-cents:
Spark is in-memory M/R and it simulates streaming or real-time distributed
processing of large datasets by micro-batching. The gain in speed and
performance as opposed to
My personal thoughts on this.
I approach this problem in a different way. Map/Reduce is not a framework
or a technology. It is a paradigm for distributed and parallel processing
which can be implemented in different frameworks and styles. So given that,
I don't think there is as such any harm in
To get rid of empty part files while using MultipleOutputs in the new API,
the LazyOutputFormat class's static method should be used to set the output
format.
Details are in the official Java docs for MultipleOutputs:
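For reference, a minimal sketch of that call (new API; TextOutputFormat is illustrative):

import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// in the driver, instead of job.setOutputFormatClass(TextOutputFormat.class):
LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
// part files are then only created when something is actually written to them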
I think it takes the entire file as input. Otherwise it won't be any
different from the normal line/record-based input format.
Regards,
Shahab
On Jun 28, 2014 3:28 AM, unmesha sreeveni unmeshab...@gmail.com wrote:
Hi
A small clarification:
WholeFileInputFormat takes the entire input file
For Machine Learning based applications of Hadoop you can check out the Mahout
framework.
Regards,
Shahab
On Mon, Apr 28, 2014 at 10:02 PM, Mohan Radhakrishnan
radhakrishnan.mo...@gmail.com wrote:
Hi,
I have been reading the definitive guide and taking online courses.
Now I would like
Assuming you are talking about basic stuff...
Michael Noll has some good Hadoop (pre-Yarn) tutorials
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
Then definitely go through the book Hadoop- The Definitive Guide by Tom
White.
You can pass hadoop conf properties through the -D option. Have you seen
this?
http://stackoverflow.com/questions/15490090/how-to-specify-system-property-in-hadoop-except-modify-hadoop-env-sh
This is not for system properties. The assumption is that you want to
specify a hadoop conf property
Question: M/R jobs are supposed to run for a long time. They are
essentially batch processes. Do you plan to keep the Web UI blocked for
that while? Or are you looking for asynchronous invocation of the M/R job?
Or are you thinking about building sort of an Admin UI (e.g. PigLipstick)
What exactly
On Fri, Apr 18, 2014 at 11:28 AM, Shahab Yunus shahab.yu...@gmail.comwrote:
You can pass hadoop conf properties through the -D option. Have you seen
this?
http://stackoverflow.com/questions/15490090/how-to-specify-system-property-in-hadoop-except-modify-hadoop-env-sh
,
Girish
On Saturday, April 19, 2014 12:34 AM, Shahab Yunus
shahab.yu...@gmail.com wrote:
Question: M/R jobs are supposed to run for a long time. They are
essentially batch processes. Do you plan to keep the Web UI blocked for
that while? Or are you looking for asynchronous invocation
You can use ILLUSTRATE and EXPLAIN commands to see the execution plan, if
you mean that by 'under the hood algorithm'
http://pig.apache.org/docs/r0.11.1/test.html
Regards,
Shahab
On Fri, Mar 28, 2014 at 5:51 PM, Spark Storm using.had...@gmail.com wrote:
hello experts,
am really new to
@ados1984, HDFS is a file system and HBase is a data store on top of that.
You cannot create tables (in the conventional meaning of the word table in
database/store) directly on HDFS without HBase.
Regards,
Shahab
On Mon, Mar 24, 2014 at 4:11 PM, Geoffry Roberts threadedb...@gmail.comwrote:
If this parameter is at the job level (i.e. for the whole run) then
you can set this value in the Configuration object to pass it on to the
mappers.
http://www.thecloudavenue.com/2011/11/passing-parameters-to-mappers-and.html
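For reference, a rough sketch of that approach (the property name is illustrative):

// in the driver, before creating the Job:
Configuration conf = new Configuration();
conf.set("my.custom.param", "some-value");
Job job = Job.getInstance(conf, "my job");

// in the mapper (or reducer):
@Override
protected void setup(Context context) {
  String param = context.getConfiguration().get("my.custom.param");
}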
Regards,
Shahab
On Fri, Mar 21, 2014 at 7:08 AM, Ranjini
There is some explanation here as well (in case you haven't checked that out
yet):
http://stackoverflow.com/questions/16634294/understanding-the-hadoop-file-system-counters
Regards,
Shahab
On Fri, Mar 14, 2014 at 5:32 AM, Vinayakumar B vinayakuma...@huawei.comwrote:
Its simple,
bytes read
I would suggest that given the level of detail that you are looking for
and the fundamental nature of your questions, you should get hold of books or
online documentation. Basically some reading/research.
Latest edition of
http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 is
, 2014 at 3:11 PM, Shahab Yunus shahab.yu...@gmail.comwrote:
I would suggest that given the level of detail that you are looking for
and the fundamental nature of your questions, you should get hold of books or
online documentation. Basically some reading/research.
Latest edition of
http
Yes it can. It is a configurable property. The exact name might differ
depending on the version though.
Read the details here:
https://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
You can specify the HDFS path as follows:
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
where the Path object is of course the location of your output dir.
See this for details
http://www.rohitmenon.com/index.php/introducing-mapreduce-part-i/
Regards,
Shahab
On Thu, Oct 24, 2013 at
I am assuming that you are talking about user logs? See the following links
for some pointers:
http://grepalex.com/2012/11/12/hadoop-logging/
http://blog.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/
http://hadoop.apache.org/docs/r1.0.4/mapred-default.html (*userlog*
properties)
Have you tried setting *mapred.reduce.tasks *property?
Regards,
Shahab
On Wed, Sep 25, 2013 at 6:01 PM, xeon xeonmailingl...@gmail.com wrote:
is it possible to set the number of reduce tasks in the wordcount example
when I launch the job by command line?
Thanks
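For what it's worth, the bundled examples parse the generic options, so something like the following should work (jar name and paths are illustrative; mapreduce.job.reduces is the newer name for mapred.reduce.tasks):
hadoop jar hadoop-mapreduce-examples.jar wordcount -D mapreduce.job.reduces=4 <input> <output>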
this?
On 09/25/2013 11:16 PM, Shahab Yunus wrote:
Have you tried setting *mapred.reduce.tasks *property?
Regards,
Shahab
On Wed, Sep 25, 2013 at 6:01 PM, xeon xeonmailingl...@gmail.com wrote:
is it possible to set the number of reduce tasks in the wordcount example
when I launch the job by command
the reduces can still be executed in a single wave. Ignored when
mapreduce.jobtracker.address is local.
On Sep 25, 2013, at 3:17 PM, xeon xeonmailingl...@gmail.com wrote:
In yarn 2.0.5, where I set this?
On 09/25/2013 11:16 PM, Shahab Yunus wrote:
Have you tried setting *mapred.reduce.tasks
*mapred.jobtracker.restart.recover *is the old API property, while the other one
is for the new API. It is used to specify whether the job should try to resume
when recovering and restarting. If you don't want to use it then the
default value of false is used (specified in the already packaged/bundled
*mapred.jobtracker.restart.recover *is the old API property, while the other one
is for the new API. It is used to specify whether the job should try to resume
when recovering and restarting. If you don't want to use it then the
default value of false is used (specified in the already packaged/bundled
In the normal configuration, the issue here is that Reducers can start
before all the Maps have finished so it is not possible to get the number
(or make sense of it even if you are able to).
Having said that, you can specifically make sure that Reducers don't start
until all your maps have
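For reference, the relevant knob is the reduce slow-start fraction; a rough sketch:

// 1.0 means reducers are not scheduled until every map has completed
conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 1.0f);
// the older property name is mapred.reduce.slowstart.completed.maps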
Just a thought, I don't know how much sense it makes: why not run that
program as the user who is allowed to read that directory, and then
allow that user to write to whichever directory you want to forward your
logs to?
Regards,
Shahab
On Sat, Sep 14, 2013 at 6:47 PM, Prashant Kommireddi
In my opinion, it is a wrong idea because:
1- Many of the participants here are employees of the very companies
that are under discussion. This puts these respective employees in a very
difficult position. It is very hard to come up with a correct response.
Comments can be misconstrued
The temporary file solution will work in a single node configuration, but
I'm not sure about an MPP config.
Let's say Job A runs on nodes 0 and 1 and job B runs on nodes 2 and 3 or
both jobs run on all 4 nodes - will HDFS be able to redistribute
automagically the records between nodes or does
such -D fs.local.block.size is supported in Hadoop 1.1. or not?
Thank you!
Jun
On Tue, Sep 10, 2013 at 11:38 AM, Shahab Yunus shahab.yu...@gmail.comwrote:
can be set at the time I load the file to the HDFS (that is, it is the
client side setting)?
I don't think you can do this while reading
can be set at the time I load the file to the HDFS (that is, it is the
client side setting)?
I don't think you can do this while reading. These are done at the time of
writing.
You can do it like this (the example is for CLI as evident):
hadoop fs -D fs.local.block.size=134217728 -put
I think he means the 'masters' file found only at the master node(s) at
conf/masters.
Details here:
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#masters-vs-slaves
Regards,
Shahab
On Mon, Sep 9, 2013 at 10:22 AM, Jay Vyas jayunit...@gmail.com wrote:
Check out the 'Map Reduce' section in this link (assuming MRv1):
http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html
Regards,
Shahab
On Fri, Sep 6, 2013 at 4:34 AM, kun yan yankunhad...@gmail.com wrote:
I'm from the client to connect to a Hadoop cluster, my
Basically pig.jar has hadoop within itself while the other one, as is evident
from the name, does not include hadoop.
Details here:
http://hadoopified.wordpress.com/2013/04/07/pig-startup-script-behavior/
Regards,
Shahab
On Fri, Sep 6, 2013 at 11:33 AM, Viswanathan J
Identity Mapper and Reducer are just like the concept of the identity function in
mathematics, i.e. they do not transform the input and return it as-is in the
output. The Identity Mapper takes the input key/value pair and spits it
out without any processing.
The case of the identity reducer is a bit different. It
Keep in mind that there are 2 flavors of Hadoop: the older one without HA
and the new one with it. Anyway, have you seen the following?
http://wiki.apache.org/hadoop/NameNodeFailover
http://www.youtube.com/watch?v=Ln1GMkQvP9w
that is.
Cheers,
Adi
On Sat, Aug 31, 2013 at 3:42 AM, Shahab Yunus shahab.yu...@gmail.comwrote:
What I meant was that you might have to split or redesign your logic or
your usecase (which we don't know about)?
Regards,
Shahab
On Fri, Aug 30, 2013 at 10:31 PM, Adrian CAPDEFIER
chivas314