Re: Job config before read fields

2013-08-30 Thread Shahab Yunus
I think you have to override/extend the Comparator to achieve that, something like what is done in Secondary Sort? Regards, Shahab On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.comwrote: Howdy, I apologise for the lack of code in this message, but the code is fairly
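For reference, a minimal sketch of the comparator-override approach mentioned above, in the style of a secondary-sort grouping comparator (CompositeKey and getNaturalKey() are hypothetical stand-ins for an actual composite key class):

    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.io.WritableComparator;

    // Group reducer input by the natural key only; CompositeKey and
    // getNaturalKey() are hypothetical stand-ins for a real composite key type.
    public class NaturalKeyGroupingComparator extends WritableComparator {

        protected NaturalKeyGroupingComparator() {
            super(CompositeKey.class, true); // true: instantiate keys for compare()
        }

        @Override
        public int compare(WritableComparable a, WritableComparable b) {
            CompositeKey k1 = (CompositeKey) a;
            CompositeKey k2 = (CompositeKey) b;
            return k1.getNaturalKey().compareTo(k2.getNaturalKey());
        }
    }

    // Driver wiring:
    // job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class);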

Re: Job config before read fields

2013-08-30 Thread Shahab Yunus
at 2:38 AM, Shahab Yunus shahab.yu...@gmail.comwrote: I think you have to override/extend the Comparator to achieve that, something like what is done in Secondary Sort? Regards, Shahab On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote: Howdy, I apologise

Re: How to pass parameter to mappers

2013-08-28 Thread Shahab Yunus
See here: http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Configuration Regards, Shahab On Wed, Aug 28, 2013 at 7:59 AM, rab ra rab...@gmail.com wrote: Hello Any hint on how to pass parameters to mappers in 1.2.1 hadoop release?
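The linked tutorial boils down to setting values on the job Configuration before submission and reading them back in the mapper. A minimal sketch (the property name my.custom.param is an arbitrary example):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class ParamPassingExample {

        public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
            private String param;

            @Override
            protected void setup(Context context) {
                // Read the parameter back on the mapper side.
                param = context.getConfiguration().get("my.custom.param", "default-value");
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("my.custom.param", "some-value"); // must be set before submission
            Job job = new Job(conf, "param-passing-example");
            job.setMapperClass(MyMapper.class);
            // ... input/output paths and formats set here as usual
        }
    }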

Re: Writing data to hbase from reducer

2013-08-28 Thread Shahab Yunus
There are many ways to do it. You can write your own M/R job in Java to use the provided output formatters and input formatters. Or you can use Pig to store it in HBase using HBaseStorage. There are many ways (and resources available on the web) and the question that you have asked is very high
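For the M/R-in-Java route, a sketch of a reducer writing Puts through HBase's TableOutputFormat (table, column family, and qualifier names are placeholders):

    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.mapreduce.TableReducer;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;

    public class HBaseWritingReducer extends TableReducer<Text, Text, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                Put put = new Put(Bytes.toBytes(key.toString()));   // row key
                put.add(Bytes.toBytes("cf"), Bytes.toBytes("col"),  // family:qualifier
                        Bytes.toBytes(value.toString()));
                context.write(NullWritable.get(), put);             // key is ignored
            }
        }
    }

    // Driver wiring (binds the reducer to the target table):
    // TableMapReduceUtil.initTableReducerJob("my_table", HBaseWritingReducer.class, job);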

Re: Writing data to hbase from reducer

2013-08-28 Thread Shahab Yunus
. Where can I find some examples using M/R code On Wed, Aug 28, 2013 at 9:53 AM, Shahab Yunus shahab.yu...@gmail.comwrote: There are many ways to do it. You can write your own M/R job in Java to use the provided output formatters and input formatters. Or you can use Pig to store it in HBase using

Re: Helper files in python

2013-08-28 Thread Shahab Yunus
What you want to do with these? What do you mean by specify? Do you want to run them as a streaming job? Have you seen the streaming tutorial for hadoop? Regards, Shahab On Aug 28, 2013 7:50 PM, Chengi Liu chengi.liu...@gmail.com wrote: Hi, I have four files mapper.py mapper_helper.py

Re: reg hadoop streaming

2013-08-27 Thread Shahab Yunus
Yes, I think so. The TaskTracker that launched the mapper and reducer in the child JVM (which in turn invoked the streaming process) can (and does) communicate with the JobTracker. Regards, Shahab On Tue, Aug 27, 2013 at 8:34 AM, Manoj Babu manoj...@gmail.com wrote: Team, Does streaming

Re: Simplifying MapReduce API

2013-08-27 Thread Shahab Yunus
For starters (experts might have more complex reasons), what if your respective map and reduce logic becomes complex enough to demand separate classes? Why tie clients to implementing both by moving these into one Job interface? In the current design you can always implement both (map and reduce)

Re: Jar issue

2013-08-27 Thread Shahab Yunus
One idea is, you can use the exclusion feature of Maven (provided you are using that to build your application) while including Hadoop dependencies: exclude the slf4j that is coming within Hadoop and then include your own slf4j as a separate dependency. Something like this: dependency
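The truncated snippet was presumably along these lines; a sketch of the exclusion pattern (artifact ids and versions are illustrative, check what your Hadoop dependency actually pulls in):

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-core</artifactId>
      <version>1.2.1</version>
      <exclusions>
        <exclusion>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-api</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
    <!-- then pull in the slf4j version you actually want -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.5</version>
    </dependency>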

Re: MapReduce Tutorial tweak

2013-08-27 Thread Shahab Yunus
As far as I understand, StringTokenizer.nextToken returns a Java String object, which does not implement the Writable and Comparable interfaces needed for Hadoop MapReduce serialization and transport. The Text class does implement them and is compatible, and thus that is why it is being used to

Re: running map tasks in remote node

2013-08-23 Thread Shahab Yunus
You say: Each map process gets a line. The map process will then do a file transfer and process it. What file, from where to where, is being transferred in the map? Are you sure that the mappers are not complaining about 'this' file access? Because this seems to be separate from the initial data

Re: Partitioner vs GroupComparator

2013-08-23 Thread Shahab Yunus
@Jan, why not skip sending the 'hidden' part of the key as a value? Why not then pass the value as null or with some other value part. Then on the reducer side there is no duplication and you can extract the 'hidden' part of the key yourself (which should be possible as you will be encapsulating it in a

Re: Partitioner vs GroupComparator

2013-08-23 Thread Shahab Yunus
Original message Subject: Re: Partitioner vs GroupComparator From: Shahab Yunus shahab.yu...@gmail.com To: user@hadoop.apache.org user@hadoop.apache.org CC: @Jan, why not, not send the 'hidden' part of the key as a value? Why not then pass value as null or with some other

Re: Getting HBaseStorage() to work in Pig

2013-08-23 Thread Shahab Yunus
You are here running multiple UNIX commands and the end result or the end command is to run hbase-YOUR VERSION.jar using hadoop's *jar* command. So basically you add HBase jars to the classpath of your Hadoop environment and then execute hbase tools using hadoop. If you get the message as

Re: read a changing hdfs file

2013-08-20 Thread Shahab Yunus
As far as I understand (and experts can correct me), the file being written will be visible once one HDFS block size worth of data is written. This applies to subsequent writes as well. Basically, a block size worth of data is the unit of coherency, the size/unit of data for which data durability

Re: Things to keep in mind when writing to a db

2013-08-19 Thread Shahab Yunus
There are quite a few examples on the web, if you search for them. A quick search yielded the following e.g.: http://jaganadhg.freeflux.net/blog/archive/tag/example/ http://techannotation.wordpress.com/2012/09/10/hadoop-in-practice/ Regards, Shahab On Mon, Aug 19, 2013 at 5:54 AM, Devaraj k

Re: hdfs write files in streaming fashion

2013-08-19 Thread Shahab Yunus
For 1, yes, the data is written in chunks to HDFS if you are using the File System API. The whole file is not first stored in memory. For 2, I think you shouldn't rely on an exception or on 'not closing' the writer to clean up the partially written file. It is not a safe and
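A sketch of the File System API pattern being described, writing in chunks rather than buffering the whole file (the path is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class StreamingWrite {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/tmp/streamed-file.bin"));
            try {
                byte[] chunk = new byte[4096];
                // ... fill `chunk` from your source and call write() repeatedly;
                // data flows out incrementally, not held whole in memory
                out.write(chunk);
            } finally {
                out.close(); // close explicitly; don't rely on exceptions for cleanup
            }
        }
    }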

Re: Reduce Task Clarification

2013-08-14 Thread Shahab Yunus
Also Sam, following is a link giving an example of how to implement secondary sort and what it is... http://codingjunkie.net/secondary-sort/ Regards, Shahab On Tue, Aug 13, 2013 at 3:51 PM, Sam Garrett s...@actionx.com wrote: I am working on a MapReduce job where I would like to have the

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Shahab Yunus
You need to configure your namenode and jobtracker information in the configuration files within your application. Only set the relevant properties in the copy of the files that you are bundling in your job. For the rest, the default values would be used from the default configuration files

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Shahab Yunus
You should not use LocalJobRunner. Make sure that the mapred.job.tracker property does not point to 'local' and instead to your job-tracker host and port. *But before that* as Sandy said, your client machine (from where you will be kicking off your jobs and apps) should be using config files which
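A sketch of the client-side override in mapred-site.xml (host and port are placeholders for your actual JobTracker):

    <property>
      <name>mapred.job.tracker</name>
      <value>jobtracker.example.com:8021</value>
    </property>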

Re: Maven Cloudera Configuration problem

2013-08-13 Thread Shahab Yunus
On Tue, Aug 13, 2013 at 6:07 PM, Brad Cox bradj...@gmail.com wrote: That link got my hopes up. But Cloudera Manager (what I'm running; on CDH4) does not offer an Export Client Config option. What am I missing? On Aug 13, 2013, at 4:04 PM, Shahab Yunus shahab.yu...@gmail.com wrote: You should

Re: Mapreduce for beginner

2013-08-08 Thread Shahab Yunus
Given that your questions are very broad and at a high level, I would suggest that you pick up a book or such to go through them. Hadoop: The Definitive Guide by Tom White is a great book to start with. Meanwhile some links to start with:

Re: How to construct path to a HDFS dir in local host

2013-08-03 Thread Shahab Yunus
I think you need to have a user folder (for whatever user you are running hadoop as) and put your files there. In fact that folder should be already created there. The hdfs data dir property is for setting the path to the data files. Regards, Shahab On Fri, Aug 2, 2013 at 9:38 PM, Huy Pham

Re: Change the output of Reduce function

2013-07-25 Thread Shahab Yunus
I think you can use NullWritable as key. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/NullWritable.html Regards, Shahab On Thu, Jul 25, 2013 at 2:58 PM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: I did a MapReduce program to execute a Grep function. I know there
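A sketch of what that looks like in the reducer (types other than NullWritable are illustrative):

    import java.io.IOException;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class ValueOnlyReducer extends Reducer<Text, Text, NullWritable, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(NullWritable.get(), value); // no key in the output
            }
        }
    }

    // Driver: job.setOutputKeyClass(NullWritable.class);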

Re: Change the output of Reduce function

2013-07-25 Thread Shahab Yunus
, Shahab Yunus shahab.yu...@gmail.comwrote: I think you can use NullWritable as key. http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/NullWritable.html Regards, Shahab On Thu, Jul 25, 2013 at 2:58 PM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: I did a MapReduce

Re: Change the output of Reduce function

2013-07-25 Thread Shahab Yunus
cloudfront.blogspot.com On Fri, Jul 26, 2013 at 12:48 AM, Felipe Gutierrez felipe.o.gutier...@gmail.com wrote: Sorry, I think I didn't understand. Does NullWritable go to replace MyWritable? But this is my value. My key is a Text. Regards, Felipe On Thu, Jul 25, 2013 at 4:07 PM, Shahab Yunus

Re: ERROR orm.ClassWriter: Cannot resolve SQL type 1111

2013-07-23 Thread Shahab Yunus
I think you will have to write custom code to handle this. Regards, Shahab On Tue, Jul 23, 2013 at 3:50 AM, Fatih Haltas fatih.hal...@nyu.edu wrote: At those columns, I am using uint type. I tried to cast them via sqoop option still it gave the same error. For other columns having type

Re: Get the tree structure of a HDFS dir, similar to dir/files

2013-07-23 Thread Shahab Yunus
See this https://sites.google.com/site/hadoopandhive/home/how-to-read-all-files-in-a-directory-in-hdfs-using-hadoop-filesystem-api and http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#isDirectory(org.apache.hadoop.fs.Path) Basically you can write your own function,
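A sketch of such a function using the FileSystem API from the links above (recursion on isDir(); isDirectory() in newer APIs):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsTree {
        // Recursively print an HDFS directory tree, indenting by depth.
        static void printTree(FileSystem fs, Path dir, String indent) throws IOException {
            for (FileStatus status : fs.listStatus(dir)) {
                System.out.println(indent + status.getPath().getName());
                if (status.isDir()) {
                    printTree(fs, status.getPath(), indent + "  ");
                }
            }
        }

        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            printTree(fs, new Path(args[0]), "");
        }
    }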

Re: Loading 1st 100 records

2013-07-22 Thread Shahab Yunus
Any error messages or details or logs would be helpful in advising. Plus, you are saying you're loading FROM Teradata. Where are you loading TO? How do HDFS (and the 100 files on it) come into the picture? Regards, Shahab On Mon, Jul 22, 2013 at 12:06 PM, suneel hadoop

Re: java.io.IOException: error=2, No such file or directory

2013-07-16 Thread Shahab Yunus
The error is: Please set $HBASE_HOME to the root of your HBase installation. Have you checked whether it is set or not? Have you verified your HBase or Hadoop installation? Similarly, the following: Cannot run program psql: java.io.IOException: error=2, No such file or directory Also

Re: java.io.IOException: error=2, No such file or directory

2013-07-16 Thread Shahab Yunus
Great. Can you please share, if possible, what the problem was and how you solved it? Thanks. Regards, Shahab On Tue, Jul 16, 2013 at 9:58 AM, Fatih Haltas fatih.hal...@nyu.edu wrote: Thanks Shahab, I solved my problem, in anyother way,

Re: How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-12 Thread Shahab Yunus
of a process in a particular time point (looked like a snapshot of the status of the process). If these numbers are added, the sum would be much more than memory allocated to the program. On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.comwrote: I think they are cumulative

Re: New Distributed Cache

2013-07-10 Thread Shahab Yunus
Also, once you have the array of URIs after calling getCacheFiles you can iterate over them using File class or Path ( http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/Path.html#Path(java.net.URI) ) Regards, Shahab On Wed, Jul 10, 2013 at 5:08 PM, Omkar Joshi
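A sketch of that iteration in a mapper's setup(), using the new-API getCacheFiles():

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void setup(Context context) throws IOException {
            URI[] cacheFiles = context.getCacheFiles(); // new-API distributed cache
            if (cacheFiles != null) {
                for (URI uri : cacheFiles) {
                    Path path = new Path(uri); // Path's URI constructor, linked above
                    // ... open via FileSystem.get(context.getConfiguration()) and read
                }
            }
        }
    }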

Re: Can not follow Single Node Setup example.

2013-06-26 Thread Shahab Yunus
Have you verified that the 'input' folder that your job needs exists on the hdfs (single node setup)? Regards, Shahab On Wed, Jun 26, 2013 at 10:53 AM, Peng Yu pengyu...@gmail.com wrote: Hi, http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html I followed the above instructions.

Re: HDFS to a different location other than HADOOP HOME

2013-06-11 Thread Shahab Yunus
http://hadoop.apache.org/docs/stable/hdfs-default.html dfs.data.dir (default: ${hadoop.tmp.dir}/dfs/data): Determines where on the local filesystem a DFS data node should store its blocks. If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on
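For example, in hdfs-site.xml (paths are placeholders), a comma-delimited value spreads block storage across disks:

    <property>
      <name>dfs.data.dir</name>
      <value>/disk1/hdfs/data,/disk2/hdfs/data</value>
    </property>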

Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1

2013-06-10 Thread Shahab Yunus
Yeah, Kai is right. You can read more details for your understanding at: http://hadoop.apache.org/docs/stable/hdfs_design.html#Data+Replication and right from the horse's mouth (Pgs 70-75):

Re: Job History files location of 2.0.4

2013-06-10 Thread Shahab Yunus
, the job history is going to be stored in hdfs under output/_logs dir. Then after the job completes, I copied back the logs to the server. Thanks a lot, Boyu On Fri, Jun 7, 2013 at 2:32 PM, Shahab Yunus shahab.yu...@gmail.comwrote: What value do you have for hadoop.log.dir property

Re: hdfsConnect/hdfsWrite API writes conetnts of file to local system instead of HDFS system

2013-06-08 Thread Shahab Yunus
I notice that for fs.default.name, you have in core-site.xml hdfs://127.0.0.1:8020 This is the localhost address of your local machine, so hdfs is there. Where are you expecting it to be created, and where is it being created? Regards, Shahab On Fri, Jun 7, 2013 at 10:49 PM, Venkivolu, Dayakar

Re: Pool slot questions

2013-06-07 Thread Shahab Yunus
Sai, This is regarding all your recent emails and questions. I suggest that you read Hadoop: The Definitive Guide by Tom White ( http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520) as it goes through all of your queries in detail and with examples. The questions that you are

Re: Job History files location of 2.0.4

2013-06-07 Thread Shahab Yunus
See this: http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201302.mbox/%3c1360184802.61630.yahoomail...@web141205.mail.bf1.yahoo.com%3E Regards, Shahab On Fri, Jun 7, 2013 at 4:33 PM, Boyu Zhang boyuzhan...@gmail.com wrote: Dear All, I recently moved from Hadoop0.20.2 to 2.0.4,

Re: Job History files location of 2.0.4

2013-06-07 Thread Shahab Yunus
saw the link, but it is not the case for me. I copied everything from hdfs ($HADOOP_HOME/bin/hdfs dfs -copyToLocal / $local_dir). But did not see the logs. Did it work for you? Thanks, Boyu On Fri, Jun 7, 2013 at 1:52 PM, Shahab Yunus shahab.yu...@gmail.comwrote: See this; http://mail

Re: Issue with -libjars option in cluster in Hadoop 1.0

2013-06-06 Thread Shahab Yunus
06:58 PM, Shahab Yunus wrote: Are you following the guidelines as mentioned here: http://grepalex.com/2013/02/25/hadoop-libjars/ Now I am, so thanks for that :-) Still doesn't work though. Following the hint in that post I looked at the job

Re: Reducer to output only json

2013-06-04 Thread Shahab Yunus
Chengi, You can also see this for pointers: http://java.dzone.com/articles/hadoop-practice Regards, Shahab On Tue, Jun 4, 2013 at 4:15 PM, Mohammad Tariq donta...@gmail.com wrote: Yes...This should do the trick. Warm Regards, Tariq cloudfront.blogspot.com On Wed, Jun 5, 2013 at 1:38

Re: Exposing hadoop web interface

2013-06-03 Thread Shahab Yunus
Are you able to access the main JobTracker UI page? http://foobar:50030/jobtracker.jsp If not then it can be a firewall/proxy issue (or correct usage of the foobar domain; maybe you need to use the IP or FQDN.) I am assuming that your jobs are completing successfully behind the scenes. Regards,

Re: copyToLocal Failed inside the cleanup(.........) of Map task

2013-06-03 Thread Shahab Yunus
Have you taken a look into extending the FileOutputFormat class and overriding the OutputCommitter API functionality? Regards, Shahab On Mon, Jun 3, 2013 at 5:11 PM, samir das mohapatra samir.help...@gmail.com wrote: Dear All, Is there any way to copy the intermediate output file of

Re: MR2 submit job help

2013-06-01 Thread Shahab Yunus
Is the file you are trying to upload named file.txt or just file? Have you made sure about that? Is any other command working? Have you tried copyFromLocal? Regards, Shahab On Sat, Jun 1, 2013 at 4:05 AM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: you should be able to use hadoop fs

Re:

2013-06-01 Thread Shahab Yunus
It seems to me that it is failing when you try to run with security turned on because data cannot be written to the disk due to permissions (as you have security turned on), and when you run it without security then possibly no such checks are performed and you can write data? I

Re:

2013-06-01 Thread Shahab Yunus
Hi Harsh, Quick question though: why do you think it only happens if the OP 'uses security' as he mentioned? Regards, Shahab On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote: Does smell like a bug as that number you get is simply Long.MAX_VALUE, or 8 exbibytes. Looking at

Re: MR2 submit job help

2013-06-01 Thread Shahab Yunus
classes where applicable put: `sample.txt': No such file or directory Native-hadoop library is not loaded what do I need to do? Thanks shashidhar On Sat, Jun 1, 2013 at 6:41 PM, Shahab Yunus shahab.yu...@gmail.comwrote: Is the file you are trying to upload named file.txt or just file? Have

Re:

2013-06-01 Thread Shahab Yunus
), and the issue here doesn't have anything to do with security really. Azurry - Lets discuss the code issues on the JIRA (instead of here) or on the mapreduce-dev lists. On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com wrote: HI Harsh, Quick question though: why do you think

Re: File Reloading

2013-05-31 Thread Shahab Yunus
I might not have understood your usecase properly so I apologize for that. But what I think here you need is something outside of Hadoop/HDFS. I am presuming that you need to read the whole updated file when you are going to process it with your never-ending job, right? You don't expect to read

Re: File Reloading

2013-05-31 Thread Shahab Yunus
31, 2013 at 5:30 PM, Shahab Yunus shahab.yu...@gmail.comwrote: I might not have understood your usecase properly so I apologize for that. But what I think here you need is something outside of Hadoop/HDFS. I am presuming that you need to read the whole updated file when you are going

Re: Reading json format input

2013-05-30 Thread Shahab Yunus
For starters, you can specify them through the -libjars parameter when you kick off your M/R job. This way the jars will be copied to all TTs. Regards, Shahab On Thu, May 30, 2013 at 2:43 PM, jamal sasha jamalsha...@gmail.com wrote: Hi Thanks guys. I figured out the issue. Hence i have
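For reference, a sketch of the invocation (jar names and paths are placeholders; note that -libjars is honored only when the driver goes through GenericOptionsParser, e.g. via ToolRunner):

    hadoop jar myjob.jar com.example.MyDriver \
        -libjars /local/path/json-lib.jar,/local/path/other-dep.jar \
        /input/path /output/path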

Re: Pulling data from secured hadoop cluster to another hadoop cluster

2013-05-28 Thread Shahab Yunus
Also Samir, when you say 'secured', by any chance that cluster is secured with Kerberos (rather than ssh)? -Shahab On Tue, May 28, 2013 at 8:29 AM, Nitin Pawar nitinpawar...@gmail.comwrote: hadoop daemons do not use ssh to communicate. if your distcp job could not connect to remote server

Re: issue launching mapreduce job with kerberos secured hadoop

2013-05-28 Thread Shahab Yunus
Have you verified that the kerberos settings are configured properly in mapred-site.xml too just as in hdfs-site.xml (assuming you are using MRv1)? -Shahab On Tue, May 28, 2013 at 9:06 AM, Neeraj Chaplot geek...@gmail.com wrote: Hi All, When hadoop started with Kerberos authentication

Re: Problem in uploading file in WebHDFS

2013-05-26 Thread Shahab Yunus
If possible can you share what was the root cause? Thanks, -Shahab On Saturday, May 25, 2013, Mohammad Mustaqeem 3m.mustaq...@gmail.com wrote: I have fixed that problem. There were other problems. Anyways, thanks for your reply. On Sat, May 25, 2013 at 9:20 PM, Sanjay Subramanian

Re: ETL Tools

2013-05-21 Thread Shahab Yunus
For batch imports, I would also suggest Sqoop. Very easy to use, especially if you have MySQL in the mix. I have not used Sqoop 2, but that is supposed to add enterprise-level robustness and admin support as well. -Shahab On Tue, May 21, 2013 at 12:17 PM, Peyman Mohajerian
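A sketch of a typical Sqoop batch import from MySQL (connection string, table, and paths are placeholders):

    sqoop import \
      --connect jdbc:mysql://db.example.com/mydb \
      --username dbuser -P \
      --table orders \
      --target-dir /user/hadoop/orders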

Re: JobClient: Error reading task output - after instituting a DNS server

2013-05-14 Thread Shahab Yunus
Hi David. Can you explain in a bit more detail what the issue was? Thanks. Shahab On Tue, May 14, 2013 at 2:29 AM, David Parks davidpark...@yahoo.com wrote: I just hate it when I figure out a problem right after asking for help. Finding the task logs via the task tracker

Re: Hadoop noob question

2013-05-11 Thread Shahab Yunus
@Thoihen. If the data that you are trying to load is not streaming or the data loading is not real-time in nature then why don't you use Sqoop? Relatively easy to use with not much learning curve. Regards, Shahab On Sat, May 11, 2013 at 12:03 PM, Mohammad Tariq donta...@gmail.com wrote: Sorry

Re: question about combiner

2013-05-10 Thread Shahab Yunus
@Kishore, Agreed but but shouldn't 'Reduce shuffle bytes' count decrease with the use of Combiners? Regards, Shahab On Fri, May 10, 2013 at 2:00 PM, Kishore alajangikish...@gmail.com wrote: Combiner will be used between mapper and reduce, so the mapper output for both with used combiner and

Re: Rack Aware Hadoop cluster

2013-05-08 Thread Shahab Yunus
core-site.xml http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml On Wed, May 8, 2013 at 9:43 AM, Mohammad Mustaqeem 3m.mustaq...@gmail.comwrote: Hello everyone, I was searching for how to make the hadoop cluster rack-aware and I find out from here
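The relevant setting in that list is the topology script; a sketch of the core-site.xml entry (the Hadoop 1.x property name is assumed here, and the script path is a placeholder):

    <property>
      <name>topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology.sh</value>
    </property>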

Re: unable to create table in Hbase

2013-05-06 Thread Shahab Yunus
Tariq donta...@gmail.com wrote: @Shahab : IMHO, he doesn't have to worry about ZK as he is running in stand alone mode. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Mon, May 6, 2013 at 11:54 PM, Shahab Yunus shahab.yu...@gmail.comwrote: I see this in the logs

VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Hello, This might be something very obvious that I am missing, but it has been bugging me and I am unable to find what I am missing. I have hadoop and hbase installed on a Linux machine. Versions 2.0.0-cdh4.1.2 and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke hbase shell and

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
coding class path, can you try specifying `hbase classpath` ? Cheers On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus shahab.yu...@gmail.comwrote: Hello, This might be something very obvious that I am missing but this has been bugging me and I am unable to find what am I missing? I have

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
Unknown 13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown I did print `hbase classpath` on the console itself and it does print paths to various libs and jars. Regards, Shahab On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus shahab.yu...@gmail.comwrote: Ted, Sorry I didn't

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
:12 PM, Shahab Yunus shahab.yu...@gmail.com wrote: Okay, I think I know what you mean. Those were back ticks! So I tried the following: java -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo and I still get: 13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown 13

Re: VersionInfoAnnotation Unknown for Hadoop/HBase

2013-04-29 Thread Shahab Yunus
our projects for 1.5 JVMs, and especially not the GCJ (1.5 didn't have annotations either IIRC? We depend on that here). Try with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved. On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus shahab.yu...@gmail.com wrote: The output of java
