Re: 2 items - RE: Unsubscribe & A better ListSrv for beginners.

2015-12-02 Thread Shahab Yunus
On a side note, it is 'foul', not 'fowl', which is a whole different animal (pun intended ;) Regards, Shahab On Wed, Dec 2, 2015 at 4:12 PM, Ted Yu wrote: > For #1, please see: > https://issues.apache.org/jira/browse/INFRA-10725 > > Unfortunately, as of yesterday this

Re: Passing Args to Mappers and Reducers

2015-10-06 Thread Shahab Yunus
Are you properly implementing the Tool interface? https://hadoopi.wordpress.com/2013/06/05/hadoop-implementing-the-tool-interface-for-mapreduce-driver/ Also, there needs to be a space between -D and the param name. Regards, Shahab On Tue, Oct 6, 2015 at 9:22 AM, Istabrak Abdul-Fatah
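
A minimal driver sketch of that pattern (class and job names are hypothetical), showing how ToolRunner strips the generic options such as -D before run() is called:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
        @Override
        public int run(String[] args) throws Exception {
            // getConf() already contains anything passed as -D key=value
            Job job = Job.getInstance(getConf(), "my-job");
            job.setJarByClass(MyDriver.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner parses the generic options (-D, -files, -libjars, ...)
            System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
        }
    }

It would then be invoked as, e.g., hadoop jar myjob.jar MyDriver -D my.param=value /in /out (note the space between -D and the property name, unlike the JVM's own -Dkey=value form).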

Re: Chaining MapReduce

2015-08-21 Thread Shahab Yunus
What is the difference between the mappers? Is the input data supposed to go to all mappers, or is it dependent on the source data? Regards, Shahab On Fri, Aug 21, 2015 at 1:35 PM, ☼ R Nair (रविशंकर नायर) ravishankar.n...@gmail.com wrote: All, I have three mappers, followed by a reducer. I

Re: hdfs commands tutorial

2015-08-13 Thread Shahab Yunus
I am confused. The link posted above tells you exactly how to interact with hdfs to perform various tasks, with examples. What else are you looking for? Regards, Shahab On Aug 14, 2015 12:14 AM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote: That’s a

Re: Fwd: Properties file not loaded with hadoop jar command

2015-07-17 Thread Shahab Yunus
This seems to be a Java issue rather than a Hadoop one. Have you seen the links below, regarding the intricacies involved in reading a resource file from a Java jar? http://javarevisited.blogspot.com/2014/07/how-to-load-resources-from-classpath-in-java-example.html

Re: Reducer called twice for same key

2015-06-29 Thread Shahab Yunus
Ravikant, how does the output that you sent in the email map to the one you are printing in the code (using SOP statements)? Where do you see the reducer being called again for the same key? Maybe I am missing something, but the output statements in the code look different. Regards, Shahab On

Re: Static variable in reducer

2015-06-28 Thread Shahab Yunus
You asked a similar question earlier, so I will copy here what I replied then: http://hadoop-common.472056.n3.nabble.com/how-to-assign-unique-ID-Long-Value-in-mapper-td4078062.html Basically, to summarize, you shouldn't incorporate common sharable state among reducers. You

Re: how to assign unique ID (Long Value) in mapper

2015-06-26 Thread Shahab Yunus
I see 2 issues here which go somewhat against the architecture and idea of M/R (or distributed and parallel programming models.) 1- The map and reduce tasks are supposed to be shared-nothing and independent tasks. If you add functionality like this where you need to make sure that some data is

Re: can't submit remote job

2015-05-18 Thread Shahab Yunus
I think the poster wanted to unsubscribe from the mailing list? Gopy, if that is the case then please see this: https://hadoop.apache.org/mailing_lists.html Regards, Shahab On Mon, May 18, 2015 at 9:42 AM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote: Why Remove? On

Re: How to set mapreduce.input.fileinputformat.split.maxsize for a specific job

2015-05-17 Thread Shahab Yunus
Do these both solve the same purpose, or something else? Thanks, On Sat, May 16, 2015 at 8:48 PM, Shahab Yunus shahab.yu...@gmail.com wrote: You can either pass them on as command line arguments using the -D option. Assuming your job is implementing the standard Tool interface: https

Re: How to set mapreduce.input.fileinputformat.split.maxsize for a specific job

2015-05-16 Thread Shahab Yunus
You can either pass them on as command line arguments using the -D option. Assuming your job is implementing the standard Tool interface: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/util/Tool.html Or you can set them in the code using the various 'set' methods to set key/value pairs
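
A sketch of both approaches for this particular property (assuming a Tool-based driver so that getConf() is available; the 64 MB figure is just an example):

    // In code, before submitting the job:
    Configuration conf = getConf();
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 64L * 1024 * 1024);
    Job job = Job.getInstance(conf, "my-job");

    // Or on the command line, for this job only:
    //   hadoop jar myjob.jar MyDriver \
    //       -D mapreduce.input.fileinputformat.split.maxsize=67108864 /in /out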

Re: How to access value of variable in Driver class which has been declared and modified inside Mapper class?

2015-05-12 Thread Shahab Yunus
Here are some examples of how to use custom counters: http://www.ashishpaliwal.com/blog/2012/05/hadoop-recipe-using-custom-java-counters/ Regards, Shahab On May 12, 2015 1:29 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Better options than using a static variable are, imo: One option is to use

Re: How to access value of variable in Driver class which has been declared and modified inside Mapper class?

2015-05-12 Thread Shahab Yunus
Better options than using a static variable are, imo: One option is to use Counters. Check that API. We are using them for values that are numeric and that we need in the driver once the job finishes. You can create your custom counters too. The other option is (if you need more than just one value or
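
A minimal sketch of the Counter option (the enum and counter names are made up for illustration):

    // Shared enum:
    public enum MyCounters { RECORDS_SKIPPED }

    // Inside the mapper or reducer:
    context.getCounter(MyCounters.RECORDS_SKIPPED).increment(1);

    // In the driver, after job.waitForCompletion(true) returns:
    long skipped = job.getCounters()
                      .findCounter(MyCounters.RECORDS_SKIPPED)
                      .getValue();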

Re: Reading a sequence file from distributed cache

2015-05-12 Thread Shahab Yunus
getLocalCacheFiles is deprecated and can only access files that were downloaded locally to the node running the task. Use of getCacheFiles is now encouraged; it downloads using a URI. Have you seen this?
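
A minimal sketch of the newer API (the file path is hypothetical):

    // Driver: register the file before job submission
    job.addCacheFile(new java.net.URI("/user/me/lookup.seq"));

    // Task side, e.g. in Mapper.setup(): retrieve the registered URIs
    java.net.URI[] cached = context.getCacheFiles();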

Re: Re: Re: Filtering by value in Reducer

2015-05-12 Thread Shahab Yunus
(); threshold = conf.getInt("threshold", -1); } Best, Peter On 11.05.2015 19:26, Shahab Yunus wrote: What is the type of the threshold variable? sum, I believe, is a Java int. Regards, Shahab On Mon, May 11, 2015 at 1:08 PM, Peter Ruch

Re: Filtering by value in Reducer

2015-05-11 Thread Shahab Yunus
What is the type of the threshold variable? sum, I believe, is a Java int. Regards, Shahab On Mon, May 11, 2015 at 1:08 PM, Peter Ruch rutschifen...@gmail.com wrote: Hi, I am currently playing around with Hadoop and have some problems when trying to filter in the Reducer. I extended the
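
For reference, passing such a threshold through the job Configuration usually looks like this sketch (the property and variable names are assumed); conf.getInt returns a Java int, so comparing it against an int sum is type-safe:

    // Driver:
    conf.setInt("threshold", 5);

    // Reducer:
    private int threshold;

    @Override
    protected void setup(Context context) {
        threshold = context.getConfiguration().getInt("threshold", -1);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        if (sum > threshold) {            // filter by value
            context.write(key, new IntWritable(sum));
        }
    }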

Re: Reading a sequence file from distributed cache

2015-05-11 Thread Shahab Yunus
What version are you using? Have you seen this? Regards, Shahab On Mon, May 11, 2015 at 5:25 PM, marko.di...@nissatech.com wrote: Hello, I'm new to Hadoop and I'm having a problem reading from a sequence file that I add to distributed cache. I didn't have problems when I ran it in

Re: Help with implementing a Storm topology to stream tweets

2015-05-09 Thread Shahab Yunus
Have you tried Storm's mailing list? They would perhaps be able to guide you better. Regards, Shahab On May 9, 2015 2:36 AM, mani kandan mankand...@gmail.com wrote: Hi I'm new to Storm, and I would like to create a Storm topology to stream tweets, do analysis and store on hdfs. Is there a

Re: Json Parsing in map reduce.

2015-04-30 Thread Shahab Yunus
The reason is that the Json parsing code is in a 3rd party library which is not included in the default map reduce/hadoop distribution. You have to add it to your classpath at *runtime*. There are multiple ways to do it (which also depend upon how you plan to run and package/deploy your code.)
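
Two of those ways, sketched under the assumption of a Tool-based driver (the jar name is hypothetical):

    // a) On the command line, via the generic -libjars option:
    //      hadoop jar myjob.jar MyDriver -libjars json-simple-1.1.jar /in /out

    // b) In the driver, from a jar already uploaded to HDFS:
    job.addFileToClassPath(new Path("/libs/json-simple-1.1.jar"));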

Re: Connection Refused error on Hadoop-2.6.0 on Ubuntu 14.10 desktop running Pseudo Mode

2015-04-22 Thread Shahab Yunus
Can you try sudo? https://www.linux.com/learn/tutorials/306766:linux-101-introduction-to-sudo Regards, Shahab On Wed, Apr 22, 2015 at 8:26 AM, Anand Murali anand_vi...@yahoo.com wrote: Dear Sandeep: many thanks. I did find hosts, but I do not have write privileges, even though I am

Re: How to stop a mapreduce job from terminal running on Hadoop Cluster?

2015-04-12 Thread Shahab Yunus
You can kill it by using the following yarn command: yarn application -kill <application id> https://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/YarnCommands.html Or use the old hadoop job command http://stackoverflow.com/questions/11458519/how-to-kill-hadoop-jobs Regards, Shahab On

Re: compress data in hadoop

2015-04-05 Thread Shahab Yunus
Your package seems different. Have you tried the following package and class? org.apache.hadoop.io.compress.BZip2Codec Regards, Shahab On Sun, Apr 5, 2015 at 9:45 AM, xeonmailinglist-gmail xeonmailingl...@gmail.com wrote: Hi, I have run the command [1] to create compressed data from my
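
If the goal is compressed job output with that codec, a minimal driver sketch would be:

    import org.apache.hadoop.io.compress.BZip2Codec;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);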

Re: How to append the contents to a output file

2015-04-02 Thread Shahab Yunus
I hope I understood your requirement correctly. Your requirement is to write into multiple folders from the reducers AND, in each folder, append the data to the file in that folder, right? Reducer output = folder1/file1, folder2/file2. This can be done with standard MultipleOutputFormat and
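
A sketch of the new-API equivalent, MultipleOutputs (the names are illustrative); a baseOutputPath containing a '/' is what routes files into subfolders of the job output directory:

    // Driver:
    MultipleOutputs.addNamedOutput(job, "out",
        TextOutputFormat.class, Text.class, IntWritable.class);

    // Reducer:
    private MultipleOutputs<Text, IntWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, IntWritable>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // "folder1/file1" becomes folder1/file1-r-00000 under the output dir
        mos.write("out", key, new IntWritable(1), "folder1/file1");
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }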

Re: can't set partition class to the configuration

2015-04-01 Thread Shahab Yunus
On Wed, Apr 1, 2015 at 11:03 AM, Shahab Yunus shahab.yu...@gmail.com wrote: As the error tells you, you cannot use a class as a Partitioner if it does not satisfy the interface requirements of the partitioning mechanism. You need to set as the Partitioner a class which extends or implements

Re: can't set partition class to the configuration

2015-04-01 Thread Shahab Yunus
As the error tells you, you cannot use a class as a Partitioner if it does not satisfy the interface requirements of the partitioning mechanism. You need to set as the Partitioner a class which extends or implements the Partitioner contract. Regards, Shahab On Wed, Apr 1, 2015 at 10:54 AM,
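
A minimal conforming partitioner sketch (the class name is hypothetical):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Partitioner;

    public class MyPartitioner extends Partitioner<Text, IntWritable> {
        @Override
        public int getPartition(Text key, IntWritable value, int numPartitions) {
            // must return a value in [0, numPartitions)
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    // Driver:
    job.setPartitionerClass(MyPartitioner.class);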

Re: Simple MapReduce logic using Java API

2015-03-31 Thread Shahab Yunus
What is the reason for using the queue? job.getConfiguration().set("mapred.job.queue.name", "exp_dsa"); Are your mapper and reducer even being called? Try adding the @Override annotation to the map/reduce methods as below: @Override public void map(Object key, Text value, Context context) throws

Re: cleanup() in hadoop results in aggregation of whole file/not

2015-02-28 Thread Shahab Yunus
As far as I understand, cleanup is called per task, i.e. in your case per map task. To get an overall count or measure, you need to aggregate it yourself after the job is done. One way to do that is to use counters and then merge them programmatically at the end of the job. Regards, Shahab On

Re: secure checksum in HDFS

2015-02-20 Thread Shahab Yunus
There seems to be some work done on this here: https://issues.apache.org/jira/browse/HADOOP-9209 3rd party tool: https://github.com/rdsr/hdfs-checksum Regards, Shahab On Fri, Feb 20, 2015 at 12:39 PM, xeonmailinglist xeonmailingl...@gmail.com wrote: Hi, Is it possible to use SHA-256, or MD5

Re: writing mappers and reducers question

2015-02-19 Thread Shahab Yunus
Nope. You can use the Standalone setup too to test things. Details here: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/SingleNodeSetup.html#Standalone_Operation Regards, Shahab On Fri, Feb 20, 2015 at 12:40 AM, Jonathan Aquilina jaquil...@eagleeyet.net wrote: Hey

Re: Creation of an empty output directory in hadoop filesystem

2015-01-07 Thread Shahab Yunus
First try: you should use the @Override annotation before the map and reduce methods so they are actually called. Like this: *@Override* public void map(LongWritable k, Text v, Context con) throws IOException, InterruptedException {... Do the same for the 'reduce' method. Regards, Shahab On Wed, Jan 7,

Re: Creation of an empty output directory in hadoop filesystem

2015-01-07 Thread Shahab Yunus
. Regards, Shahab On Wed, Jan 7, 2015 at 8:18 AM, Shahab Yunus shahab.yu...@gmail.com wrote: First try: you should use the @Override annotation before the map and reduce methods so they are actually called. Like this: *@Override* public void map(LongWritable k, Text v, Context con) throws IOException

Re: Write and Read file through map reduce

2015-01-06 Thread Shahab Yunus
Distributed Cache has been deprecated for a while. You can use the new mechanism, which is functionally the same thing, discussed here in this thread: http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api Regards, Shahab On Mon, Jan 5, 2015

Re: FileNotFoundException in distributed mode

2014-12-22 Thread Shahab Yunus
You should not use DistrubutedCache. It is deprecated. See this: http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api Regards, Shahab On Mon, Dec 22, 2014 at 6:22 AM, Marko Dinic marko.di...@nissatech.com wrote: Thanks a lot, it works!

Re: DistributedCache

2014-12-11 Thread Shahab Yunus
Look at this thread. It has alternatives to DistributedCache. http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api Basically you can use the new method job.addCacheFile to pass on stuff to the individual tasks. Regards, Shahab On Thu, Dec

Re: HDFS block size question

2014-12-10 Thread Shahab Yunus
Check this out: http://ofirm.wordpress.com/2014/02/01/exploring-the-hdfs-default-value-behaviour/ It seems that the value of *dfs.block.size* is dictated directly by the client, regardless of the cluster setting. If a value is not specified, the client just picks the default value. This finding is
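
So a client can pick a per-file block size at write time; a sketch (dfs.blocksize is the Hadoop 2 spelling of the property, and the sizes here are just examples):

    Configuration conf = new Configuration();
    conf.setLong("dfs.blocksize", 256L * 1024 * 1024); // for files this client writes
    FileSystem fs = FileSystem.get(conf);

    // Or per individual file, via the create() overload that takes a block size:
    FSDataOutputStream out = fs.create(
        new Path("/data/big.bin"), true, 4096, (short) 3, 256L * 1024 * 1024);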

Re: hadoop data structures

2014-12-09 Thread Shahab Yunus
Are you asking about the type for the numberOfRuns variable which you are declaring as a Java primitive int? If yes, then you can use the IntWritable class in Hadoop to define an integer variable which will work with M/R. Regards, Shahab On Tue, Dec 9, 2014 at 3:47 AM, steven commercial...@yahoo.de

Re: How to limit the number of containers requested by a pig script?

2014-10-21 Thread Shahab Yunus
Jakub, are you saying that we can't change the number of mappers per job through the script? Because otherwise, if invoking through the command line or code, I think we can. We do have the property mapreduce.job.maps. Regards, Shahab On Tue, Oct 21, 2014 at 2:42 AM, Jakub Stransky

Re: Spark vs Tez

2014-10-17 Thread Shahab Yunus
What aspects of Tez and Spark are you comparing? They have different purposes and thus are not directly comparable, as far as I understand. Regards, Shahab On Fri, Oct 17, 2014 at 2:06 PM, Adaryl Bob Wakefield, MBA adaryl.wakefi...@hotmail.com wrote: Does anybody have any performance figures on

Re: number of mappers allowed in a container in hadoop2

2014-10-15 Thread Shahab Yunus
It depends on memory settings as well, i.e. how many resources you want to assign to each container. Then yarn will run as many mappers in parallel as possible. See this: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0/
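
For reference, the per-job knobs typically look like this sketch (the values are illustrative, not recommendations):

    // Container sizes requested for each task, set before job submission:
    conf.setInt("mapreduce.map.memory.mb", 2048);
    conf.setInt("mapreduce.reduce.memory.mb", 4096);
    // The JVM heap inside each container should be somewhat smaller:
    conf.set("mapreduce.map.java.opts", "-Xmx1638m");
    conf.set("mapreduce.reduce.java.opts", "-Xmx3276m");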

Re: number of mappers allowed in a container in hadoop2

2014-10-15 Thread Shahab Yunus
On Wednesday 15 October 2014 05:45 PM, Shahab Yunus wrote: It depends on memory settings as well, i.e. how many resources you want to assign to each container. Then yarn will run as many mappers in parallel as possible. See this: http://hortonworks.com/blog/how-to-plan-and-configure-yarn-in-hdp-2-0

Re: number of mappers allowed in a container in hadoop2

2014-10-15 Thread Shahab Yunus
there is one property as mapreduce.map.memory.mb = 2*1024 MB, mapreduce.reduce.memory.mb = 2 * 2 = 4*1024 MB. What are these properties mapreduce.map.memory.mb and mapreduce.reduce.memory.mb? On Wednesday 15 October 2014 06:17 PM, Shahab Yunus wrote: It cannot run more mappers

Re: number of mappers allowed in a container in hadoop2

2014-10-15 Thread Shahab Yunus
this property On Wednesday 15 October 2014 07:06 PM, Shahab Yunus wrote: Explanation here. http://stackoverflow.com/questions/24070557/what-is-the-relation-between-mapreduce-map-memory-mb-and-mapred-map-child-jav https://support.pivotal.io/hc/en-us/articles/201462036-Mapreduce-YARN-Memory

Re: Query regarding the replication factor in hadoop

2014-09-19 Thread Shahab Yunus
Your write will not succeed. You will get an exception like "could only be replicated to 0 nodes, instead of 1". More details here: http://www.bigdataplanet.info/2013/10/Hadoop-Tutorial-Part-4-Write-Operations-in-HDFS.html

Re: Query regarding the replication factor in hadoop

2014-09-19 Thread Shahab Yunus
Interesting. I thought that the write would fail in case the # of nodes down is greater than the min-replication property. So in reality we only get a warning while writing (and an info message through fsck.) Regards, Shahab On Fri, Sep 19, 2014 at 9:26 AM, Abirami V abiramipand...@gmail.com wrote:

Re: ClassCastException on running map-reduce jobs + tests on Windows (mongo-hadoop)

2014-09-18 Thread Shahab Yunus
. Is there an automatic way to do it? Or should I write a parser myself? And regarding the tests on Windows, any experience? Thanks again!! Best regards, Blanca *From:* Shahab Yunus [mailto:shahab.yu...@gmail.com] *Sent:* Wednesday, 17 September 2014 17:20 *To:* user

Re: ClassCastException on running map-reduce jobs + tests on Windows (mongo-hadoop)

2014-09-17 Thread Shahab Yunus
Can you provide the driver code for this job? Regards, Shahab On Wed, Sep 17, 2014 at 10:28 AM, Blanca Hernandez blanca.hernan...@willhaben.at wrote: Hi again, I replaced the String objects with org.apache.hadoop.io.Text objects (why is String not accepted?), and now I get another exception,

Re: ClassCastException on running map-reduce jobs + tests on Windows (mongo-hadoop)

2014-09-17 Thread Shahab Yunus
. } } Best regards, Blanca *From:* Shahab Yunus [mailto:shahab.yu...@gmail.com] *Sent:* Wednesday, 17 September 2014 16:37 *To:* user@hadoop.apache.org *Subject:* Re: ClassCastException on running map-reduce jobs + tests on Windows (mongo-hadoop) Can you provide

Re: Explanation according to the output of a successful execution

2014-09-15 Thread Shahab Yunus
How did you fix it? And what is your question now? Regards, Shahab On Mon, Sep 15, 2014 at 9:18 AM, YIMEN YIMGA Gael gael.yimen-yi...@sgcib.com wrote: Hello dear Hadoopers, just to let you know that I finally succeeded in fixing my issue this morning. Now, I would like to have more

Re: how to setup Kerberozed Hadoop ?

2014-09-15 Thread Shahab Yunus
Hi, have you already looked at the existing documentation? For Apache: http://hadoop.apache.org/docs/r2.3.0/hadoop-project-dist/hadoop-common/SecureMode.html For Cloudera: http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.6.0/CDH4-Security-Guide/cdh4sg_topic_3.html Some

Re: Error when executing a WordCount Program

2014-09-10 Thread Shahab Yunus
*hdfs://latdevweb02:9000/home/hadoop/hadoop/input*: is this a valid path on hdfs? Can you access this path outside of the program, for example using the hadoop fs -ls command? Also, were this path and the files in it created by a different user? The exception seems to say that it does not exist or the

Re: Writing output from streaming task without dealing with key/value

2014-09-10 Thread Shahab Yunus
Examples (the top ones are related to streaming jobs): http://www.infoq.com/articles/HadoopOutputFormat http://research.neustar.biz/2011/08/30/custom-inputoutput-formats-in-hadoop-streaming/

Re: YARN userapp cache lifetime: can't find core dump

2014-09-02 Thread Shahab Yunus
Perhaps the following? I get the application logs from there after job completion. This is a path on hdfs: yarn.nodemanager.remote-app-log-dir Regards, Shahab On Tue, Sep 2, 2014 at 4:02 PM, John Lilley john.lil...@redpoint.net wrote: We have a YARN task that is core-dumping, and the JVM error

job.getCounters returns null in Yarn-based job

2014-08-22 Thread Shahab Yunus
Hello. I am trying to access custom counters that I have created in a mapreduce job on Yarn. After the job.waitForCompletion(true) call, I try to do job.getCounters() but I get a null. This only happens if I run a heavy job, meaning a) a lot of data and b) a lot of reducers. E.g. for 10million

Re: job.getCounters returns null in Yarn-based job

2014-08-22 Thread Shahab Yunus
. One minor thing: the job history UI now does not show the history, with an error message that the max counter limit was exceeded. Regards, Shahab On Fri, Aug 22, 2014 at 7:59 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Hello. I am trying to access custom counters that I have created in a mapreduce

Re: Hadoop InputFormat - Processing large number of small files

2014-08-20 Thread Shahab Yunus
Have you looked at the WholeFileInputFormat implementations? There are quite a few if you search for them... http://hadoop-sandy.blogspot.com/2013/02/wholefileinputformat-in-java-hadoop.html https://github.com/tomwhite/hadoop-book/blob/master/ch07/src/main/java/WholeFileInputFormat.java Regards,
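
The core of those implementations is compact enough to sketch here, along the lines of the linked Tom White version: declare files non-splittable and have the record reader emit exactly one record per file:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.JobContext;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {
        @Override
        protected boolean isSplitable(JobContext context, Path file) {
            return false; // one split, and hence one map record, per file
        }

        @Override
        public RecordReader<NullWritable, BytesWritable> createRecordReader(
                InputSplit split, TaskAttemptContext context) {
            return new RecordReader<NullWritable, BytesWritable>() {
                private FileSplit fileSplit;
                private Configuration conf;
                private final BytesWritable value = new BytesWritable();
                private boolean processed = false;

                @Override
                public void initialize(InputSplit s, TaskAttemptContext ctx) {
                    fileSplit = (FileSplit) s;
                    conf = ctx.getConfiguration();
                }

                @Override
                public boolean nextKeyValue() throws IOException {
                    if (processed) return false;
                    byte[] contents = new byte[(int) fileSplit.getLength()];
                    Path file = fileSplit.getPath();
                    FileSystem fs = file.getFileSystem(conf);
                    FSDataInputStream in = fs.open(file);
                    try {
                        // read the whole file into a single value
                        IOUtils.readFully(in, contents, 0, contents.length);
                        value.set(contents, 0, contents.length);
                    } finally {
                        IOUtils.closeStream(in);
                    }
                    processed = true;
                    return true;
                }

                @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
                @Override public BytesWritable getCurrentValue() { return value; }
                @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
                @Override public void close() { }
            };
        }
    }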

Relationship between number of reducers and number of regions in the table

2014-08-14 Thread Shahab Yunus
I couldn't decide whether it is an HBase question or a Hadoop/Yarn one. In the utility class for MR jobs integrated with HBase, *org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil*, in the method: *public static void initTableReducerJob(String table, Class<? extends TableReducer

Re: One datanode is down then write/read starts failing

2014-07-28 Thread Shahab Yunus
The reason is that when you write something in HDFS, it guarantees that it will be written to the specified number of replicas. So if your replication factor is 2 and one of your nodes (out of 2) is down, then it cannot guarantee the 'write'. The way to handle this is to have a cluster of more

Re: Difference between different tar

2014-07-21 Thread Shahab Yunus
The '-bin' file does not have the source code (bin for binaries) while the other does. You can check and see the major difference in the 'src' folders under the top-level directory after unzipping/untarring. Regards, Shahab On Mon, Jul 21, 2014 at 3:54 AM, Vimal Jain vkj...@gmail.com wrote:

Re: Merging small files

2014-07-20 Thread Shahab Yunus
For why it isn't appropriate to discuss vendor-specific topics too much on a vendor-neutral apache mailing list, check out this thread: http://mail-archives.apache.org/mod_mbox/hadoop-mapreduce-user/201309.mbox/%3ccaj1nbzcocw1rsncf3h-ikjkk4uqxqxt7avsj-6nahq_e4dx...@mail.gmail.com%3E You can always

Re: Data cleansing in modern data architecture

2014-07-20 Thread Shahab Yunus
I am assuming you mean the batch jobs that are/were used in the old world for data cleansing. As far as I understand, there is no hard and fast rule for it; it depends on the functional and system requirements of the usecase. It is also dependent on the technology being used and how it manages

Re: Merging small files

2014-07-19 Thread Shahab Yunus
It is not advisable to have many small files in hdfs, as it can put memory load on the Namenode, which maintains the metadata, to highlight one major issue. Off the top of my head, some basic ideas... You can either combine invoices into a bigger text file containing a collection of records, where each

Re: what exactly does data in HDFS look like?

2014-07-18 Thread Shahab Yunus
The data itself is eventually stored in the form of files. The blocks of each file and their replicas are stored in files and directories on different nodes. The Namenode keeps and maintains the information about each file and where its blocks (and replica blocks) exist in the cluster. As for

Re: Providing a file instead of a directory to a M/R job

2014-07-17 Thread Shahab Yunus
On Thu, Jul 17, 2014 at 11:23 AM, Bertrand Dechoux decho...@gmail.com wrote: No reason why not. And a permission explains why there is an error: missing access rights. Bertrand Dechoux On Thu, Jul 17, 2014 at 4:58 PM, Shahab Yunus shahab.yu...@gmail.com wrote: In MRv2 or Yarn

Re: How to recover reducer task data on a different data node?

2014-07-03 Thread Shahab Yunus
Adding to what Jungi Jeong said, if you can get your hands on the book *Hadoop: The Definitive Guide* by Tom White, that would help as well, as it explains this in significant detail. Regards, Shahab On Thu, Jul 3, 2014 at 6:29 AM, Jungi Jeong jgje...@calab.kaist.ac.kr wrote: As far as

Re: Spark vs. Storm

2014-07-02 Thread Shahab Yunus
Not exactly. There are of course major implementation differences, and then some subtle and high-level ones too. My 2 cents: Spark is in-memory M/R and it simulates streaming or real-time distributed processing of large datasets by micro-batching. The gain in speed and performance as opposed to

Re: The future of MapReduce

2014-07-02 Thread Shahab Yunus
My personal thoughts on this: I approach this problem in a different way. Map/Reduce is not a framework or a technology. It is a paradigm for distributed and parallel processing which can be implemented in different frameworks and styles. So given that, I don't think there is as such any harm in

Re: job.setOutputFormatClass(NullOutputFormat.class);

2014-07-01 Thread Shahab Yunus
To get rid of empty part-* files while using MultipleOutputs in the new API, the LazyOutputFormat class' static method should be used to set the output format. Details are here, in the official Java docs for MultipleOutputs:
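
In the driver that amounts to (a sketch, assuming text output):

    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    // Instead of job.setOutputFormatClass(TextOutputFormat.class):
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    // The default part-r-NNNNN files are now only created when something
    // is actually written to them; MultipleOutputs output is unaffected.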

Re: WholeFileInputFormat in hadoop

2014-06-28 Thread Shahab Yunus
I think it takes the entire file as input. Otherwise it wouldn't be any different from the normal line/record-based input format. Regards, Shahab On Jun 28, 2014 3:28 AM, unmesha sreeveni unmeshab...@gmail.com wrote: Hi, a small clarification: WholeFileInputFormat takes the entire input file

Re: Practical examples

2014-04-28 Thread Shahab Yunus
For Machine Learning based applications of Hadoop you can check out the Mahout framework. Regards, Shahab On Mon, Apr 28, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote: Hi, I have been reading the definitive guide and taking online courses. Now I would like

Re: How do I get started with hadoop

2014-04-25 Thread Shahab Yunus
Assuming you are talking about basic stuff... Michael Noll has some good Hadoop (pre-Yarn) tutorials: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/ Then definitely go through the book Hadoop: The Definitive Guide by Tom White.

Re: JVM option

2014-04-18 Thread Shahab Yunus
You can pass hadoop conf properties through the -D option. Have you seen this? http://stackoverflow.com/questions/15490090/how-to-specify-system-property-in-hadoop-except-modify-hadoop-env-sh This is not for system properties; the assumption is that you want to specify a hadoop conf property

Re: calling mapreduce from webservice

2014-04-18 Thread Shahab Yunus
Question: M/R jobs are supposed to run for a long time; they are essentially batch processes. Do you plan to keep the Web UI blocked for that while? Or are you looking for asynchronous invocation of the M/R job? Or are you thinking about building a sort of Admin UI (e.g. Pig's Lipstick)? What exactly

Re: JVM option

2014-04-18 Thread Shahab Yunus
On Fri, Apr 18, 2014 at 11:28 AM, Shahab Yunus shahab.yu...@gmail.com wrote: You can pass hadoop conf properties through the -D option. Have you seen this? http://stackoverflow.com/questions/15490090/how-to-specify-system-property-in-hadoop-except-modify-hadoop-env-sh

Re: calling mapreduce from webservice

2014-04-18 Thread Shahab Yunus
, Girish On Saturday, April 19, 2014 12:34 AM, Shahab Yunus shahab.yu...@gmail.com wrote: Question: M/R jobs are supposed to run for a long time. They are essentially batch processes. Do you plan to keep the Web UI blocked for that while? Or are you looking for asynchronous invocation

Re: How to find generated mapreduce code for pig/hive query

2014-03-28 Thread Shahab Yunus
You can use the ILLUSTRATE and EXPLAIN commands to see the execution plan, if that is what you mean by 'under the hood algorithm': http://pig.apache.org/docs/r0.11.1/test.html Regards, Shahab On Fri, Mar 28, 2014 at 5:51 PM, Spark Storm using.had...@gmail.com wrote: hello experts, am really new to

Re: Architecture question on Injesting Data into Hadoop

2014-03-24 Thread Shahab Yunus
@ados1984, HDFS is a file system and HBase is a data store on top of it. You cannot create tables (in the conventional meaning of the word table in a database/store) directly on HDFS without HBase. Regards, Shahab On Mon, Mar 24, 2014 at 4:11 PM, Geoffry Roberts threadedb...@gmail.com wrote:

Re: Need FileName with Content

2014-03-21 Thread Shahab Yunus
If this parameter is at the job level (i.e. for the whole run), then you can set this value in the Configuration object to pass it on to the mappers. http://www.thecloudavenue.com/2011/11/passing-parameters-to-mappers-and.html Regards, Shahab On Fri, Mar 21, 2014 at 7:08 AM, Ranjini

Re: File_bytes_read vs hdfs_bytes_read

2014-03-14 Thread Shahab Yunus
There is some explanation here as well (in case you haven't checked that out yet): http://stackoverflow.com/questions/16634294/understanding-the-hadoop-file-system-counters Regards, Shahab On Fri, Mar 14, 2014 at 5:32 AM, Vinayakumar B vinayakuma...@huawei.com wrote: It's simple, bytes read

Re: Use Cases for Structured Data

2014-03-12 Thread Shahab Yunus
I would suggest that given the level of details that you are looking for and fundamental nature of your questions, you should get hold of books or online documentation. Basically some reading/research. Latest edition of http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520 is

Re: Use Cases for Structured Data

2014-03-12 Thread Shahab Yunus
, 2014 at 3:11 PM, Shahab Yunus shahab.yu...@gmail.com wrote: I would suggest that given the level of details that you are looking for and fundamental nature of your questions, you should get hold of books or online documentation. Basically some reading/research. Latest edition of http

Re: Block size

2014-01-03 Thread Shahab Yunus
Yes it can. It is a configurable property. The exact name might differ depending on the version though. Read the details here: https://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

Re: Mapreduce outputs to a different cluster?

2013-10-25 Thread Shahab Yunus
You can specify the HDFS path as follows: FileOutputFormat.setOutputPath(conf, new Path(args[1])); where the Path object is of course the location of your output dir. See this for details: http://www.rohitmenon.com/index.php/introducing-mapreduce-part-i/ Regards, Shahab On Thu, Oct 24, 2013 at
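
For a different cluster specifically, the Path can carry the full HDFS URI of the target NameNode; a sketch with a hypothetical host and port:

    FileOutputFormat.setOutputPath(job,
        new Path("hdfs://other-namenode:8020/user/me/output"));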

Re: Log file size limiting and log file rotation configurations in hadoop

2013-10-03 Thread Shahab Yunus
I am assuming that you are talking about user logs? See the following links for some pointers: http://grepalex.com/2012/11/12/hadoop-logging/ http://blog.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/ http://hadoop.apache.org/docs/r1.0.4/mapred-default.html (*userlog* properties)

Re: set the number of reduce tasks in the wordcount by command line

2013-09-25 Thread Shahab Yunus
Have you tried setting the *mapred.reduce.tasks* property? Regards, Shahab On Wed, Sep 25, 2013 at 6:01 PM, xeon xeonmailingl...@gmail.com wrote: Is it possible to set the number of reduce tasks in the wordcount example when I launch the job by command line? Thanks
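
A sketch of both ways (assuming the wordcount driver supports the generic options; mapreduce.job.reduces is the newer name for the same property):

    // Command line:
    //   hadoop jar hadoop-examples.jar wordcount -D mapred.reduce.tasks=4 /in /out

    // In driver code:
    job.setNumReduceTasks(4);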

Re: set the number of reduce tasks in the wordcount by command line

2013-09-25 Thread Shahab Yunus
this? On 09/25/2013 11:16 PM, Shahab Yunus wrote: Have you tried setting the *mapred.reduce.tasks* property? Regards, Shahab On Wed, Sep 25, 2013 at 6:01 PM, xeon xeonmailingl...@gmail.com wrote: Is it possible to set the number of reduce tasks in the wordcount example when I launch the job by command

Re: set the number of reduce tasks in the wordcount by command line

2013-09-25 Thread Shahab Yunus
the reduces can still be executed in a single wave. Ignored when mapreduce.jobtracker.address is local. On Sep 25, 2013, at 3:17 PM, xeon xeonmailingl...@gmail.com wrote: In yarn 2.0.5, where do I set this? On 09/25/2013 11:16 PM, Shahab Yunus wrote: Have you tried setting the *mapred.reduce.tasks

Re: Mapreduce jobtracker recover property

2013-09-23 Thread Shahab Yunus
*mapred.jobtracker.restart.recover* is the old API name, while the other one is for the new API. It is used to specify whether the job should try to resume at recovery time, when restarting. If you don't want to use it then the default value of false is used (specified in the already packaged/bundled

Re: MAP_INPUT_RECORDS counter in the reducer

2013-09-17 Thread Shahab Yunus
In the normal configuration, the issue here is that Reducers can start before all the Maps have finished, so it is not possible to get the number (or make sense of it even if you are able to). Having said that, you can specifically make sure that Reducers don't start until all your maps have

Re: Yarn log directory perms

2013-09-14 Thread Shahab Yunus
Just a thought, I don't know how much sense it makes: why not run that program as the user who is allowed to read that directory, and then allow that user to write to whichever directory you want to forward your logs to? Regards, Shahab On Sat, Sep 14, 2013 at 6:47 PM, Prashant Kommireddi

Re: Cloudera Vs Hortonworks Vs MapR

2013-09-13 Thread Shahab Yunus
I think, in my opinion, it is a wrong idea because: 1- Many of the participants here are employees of these very companies that are under discussion. This puts the respective employees in a very difficult position. It is very hard to come up with a correct response. Comments can be misconstrued

Re: chaining (the output of) jobs/ reducers

2013-09-12 Thread Shahab Yunus
The temporary file solution will work in a single node configuration, but I'm not sure about an MPP config. Let's say Job A runs on nodes 0 and 1 and job B runs on nodes 2 and 3 or both jobs run on all 4 nodes - will HDFS be able to redistribute automagically the records between nodes or does

Re: can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-11 Thread Shahab Yunus
such -D fs.local.block.size is supported in Hadoop 1.1 or not? Thank you! Jun On Tue, Sep 10, 2013 at 11:38 AM, Shahab Yunus shahab.yu...@gmail.com wrote: can be set at the time I load the file to the HDFS (that is, it is the client side setting)? I don't think you can do this while reading

Re: can the parameters dfs.block.size and dfs.replication be different from one file to the other

2013-09-10 Thread Shahab Yunus
can be set at the time I load the file to the HDFS (that is, it is the client side setting)? I don't think you can do this while reading. These are done at the time of writing. You can do it like this (the example is for CLI as evident): hadoop fs -D fs.local.block.size=134217728 -put

Re: hadoop cares about /etc/hosts ?

2013-09-09 Thread Shahab Yunus
I think he means the 'masters' file found only at the master node(s) at conf/masters. Details here: http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/#masters-vs-slaves Regards, Shahab On Mon, Sep 9, 2013 at 10:22 AM, Jay Vyas jayunit...@gmail.com wrote:

Re: about permission to perform the mapreduce task

2013-09-06 Thread Shahab Yunus
Check out the 'Map Reduce' section in this link (assuming MRv1): http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html Regards, Shahab On Fri, Sep 6, 2013 at 4:34 AM, kun yan yankunhad...@gmail.com wrote: I'm from the client to connect to a Hadoop cluster, my

Re: Pig jars

2013-09-06 Thread Shahab Yunus
Basically pig.jar has hadoop within itself, while the other one, as evident from the name, does not include hadoop. Details here: http://hadoopified.wordpress.com/2013/04/07/pig-startup-script-behavior/ Regards, Shahab On Fri, Sep 6, 2013 at 11:33 AM, Viswanathan J

Re: what is the difference between mapper and identity mapper, reducer and identity reducer?

2013-09-05 Thread Shahab Yunus
Identity Mapper and Reducer are just like the concept of the identity function in mathematics, i.e. they do not transform the input and return it as-is in the output. The Identity Mapper takes the input key/value pair and spits it out without any processing. The case of the identity reducer is a bit different. It
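
In the new API the base classes themselves behave as identities, so a sketch of using them directly:

    // The stock Mapper.map() writes each (key, value) through unchanged, and
    // the stock Reducer.reduce() writes (key, v) for every value v of a key:
    job.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
    job.setReducerClass(org.apache.hadoop.mapreduce.Reducer.class);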

Re: so the master just died... now what?

2013-09-03 Thread Shahab Yunus
Keep in mind that there are 2 flavors of Hadoop: the older one without HA and the new one with it. Anyway, have you seen the following? http://wiki.apache.org/hadoop/NameNodeFailover http://www.youtube.com/watch?v=Ln1GMkQvP9w

Re: Job config before read fields

2013-08-31 Thread Shahab Yunus
that is. Cheers, Adi On Sat, Aug 31, 2013 at 3:42 AM, Shahab Yunus shahab.yu...@gmail.com wrote: What I meant was that you might have to split or redesign your logic or your usecase (which we don't know about)? Regards, Shahab On Fri, Aug 30, 2013 at 10:31 PM, Adrian CAPDEFIER chivas314
