I think you have to override/extend the Comparator to achieve that,
something like what is done in Secondary Sort?
Regards,
Shahab
On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.com wrote:
Howdy,
I apologise for the lack of code in this message, but the code is fairly
at 2:38 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
I think you have to override/extend the Comparator to achieve that,
something like what is done in Secondary Sort?
Regards,
Shahab
On Fri, Aug 30, 2013 at 9:01 PM, Adrian CAPDEFIER chivas314...@gmail.com
wrote:
Howdy,
I apologise
See here:
http://hadoop.apache.org/docs/stable/mapred_tutorial.html#Job+Configuration
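For example, a minimal sketch of reading such a parameter inside a mapper (the
property name "my.custom.param" and the class names are only illustrative; the
driver would call conf.set("my.custom.param", "some-value") on the job
Configuration before submitting):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ParamMapper extends Mapper<LongWritable, Text, Text, Text> {
    private String param;

    @Override
    protected void setup(Context context) {
        // Read the value that the driver placed into the job Configuration.
        param = context.getConfiguration().get("my.custom.param", "default-value");
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Use the parameter however the job needs; here it simply tags each record.
        context.write(new Text(param), value);
    }
}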
Regards,
Shahab
On Wed, Aug 28, 2013 at 7:59 AM, rab ra rab...@gmail.com wrote:
Hello
Any hint on how to pass parameters to mappers in 1.2.1 hadoop release?
There are many ways to do it.
You can write your own M/R job in Java using the provided output formats
and input formats.
Or you can use Pig to store it in HBase using HBaseStorage.
There are many ways (and resources available on the web) and the question
that you have asked is very high level.
Where can I find some examples using M/R code?
On Wed, Aug 28, 2013 at 9:53 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
There are many ways to do it.
You can write your own M/R job in Java using the provided
output formats and input formats.
Or you can use Pig to store it in HBase using
What do you want to do with these? What do you mean by specify? Do you want to
run them as a streaming job?
Have you seen the streaming tutorial for hadoop?
Regards,
Shahab
On Aug 28, 2013 7:50 PM, Chengi Liu chengi.liu...@gmail.com wrote:
Hi,
I have four files
mapper.py
mapper_helper.py
Yes, I think so. The TaskTracker that launched the mapper and reducer in
the child JVM (which in turn invoked the streaming process) can (and does)
communicate with the JobTracker.
Regards,
Shahab
On Tue, Aug 27, 2013 at 8:34 AM, Manoj Babu manoj...@gmail.com wrote:
Team,
Does streaming
For starters (experts might have more complex reasons), what if your
respective map and reduce logic becomes complex enough to demand separate
classes? Why tie the clients to implementing both by moving these into one Job
interface? In the current design you can always implement both (map and
reduce)
One idea is, you can use the exclusion property of maven (provided you are
using that to build your application) while including hadoop dependencies
and exclude the slf4j binding that comes bundled with hadoop, and then include your own
slf4j as a separate dependency. Something like this:
dependency
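(The XML above got cut off in the archive; a rough reconstruction of such an
exclusion, where the artifactId and version are placeholders rather than values
from the original message, could look like this:)

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>YOUR-HADOOP-VERSION</version>
  <exclusions>
    <!-- keep Hadoop's bundled slf4j binding out so your own slf4j choice wins -->
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>

<!-- then declare the slf4j version you actually want as a separate dependency -->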
As far as I understand, StringTokenizer.nextToken returns a Java String,
which does not implement the Writable and Comparable
interfaces needed for Hadoop MapReduce serialization and transport. The Text
class does implement them and is thus compatible, which is why it is used
to
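For illustration, the usual word-count style pattern where the String returned by
the tokenizer is wrapped in a reusable Text object (class and field names below
are just a sketch, not taken from the thread):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();   // reusable Writable wrapper

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            // nextToken() returns a plain java.lang.String; wrapping it in Text
            // gives the framework something it can serialize and compare.
            word.set(tokenizer.nextToken());
            context.write(word, ONE);
        }
    }
}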
You say:
Each map process gets a line. The map process will then do a file transfer
and process it.
What file is being transferred in the map, and from where to where? Are you
sure that the mappers are not complaining about access to 'this' file? Because
this seems to be separate from the initial data
@Jan, why not simply not send the 'hidden' part of the key as a value? Why not
then pass the value as null or with only the other value part? Then on the reducer
side there is no duplication and you can extract the 'hidden' part of the
key yourself (which should be possible as you will be encapsulating it in a
Original message
Subject: Re: Partitioner vs GroupComparator
From: Shahab Yunus shahab.yu...@gmail.com
To: user@hadoop.apache.org user@hadoop.apache.org
CC:
@Jan, why not simply not send the 'hidden' part of the key as a value? Why not
then pass the value as null or with some other
Here you are running multiple UNIX commands, and the end result (or the final
command) is to run hbase-YOUR VERSION.jar using hadoop's 'jar' command. So
basically you add the HBase jars to the classpath of your Hadoop environment
and then execute HBase tools using hadoop. If you get the message as
As far as I understand (and experts can correct me), the file being written
will be visible once one HDFS block size worth of data is written. This
applies to subsequent writing as well. Basically a block size worth of data
is the level of coherency, the size/unit of data for which data durability
There are quite a few examples on the web, if you search for them. A quick
search yielded the following e.g.:
http://jaganadhg.freeflux.net/blog/archive/tag/example/
http://techannotation.wordpress.com/2012/09/10/hadoop-in-practice/
Regards,
Shahab
On Mon, Aug 19, 2013 at 5:54 AM, Devaraj k
For 1, yes, the data is written in chunks to HDFS if you are using the FileSystem
API. The whole file is not first stored in memory.
For 2, I think you shouldn't rely on an exception or
'not-closing' the writer for cleaning up the partially written file. It is
not a safe and
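A minimal sketch of explicit cleanup with the FileSystem API (the path handling
and error policy here are assumptions for illustration, not from the original
thread):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeHdfsWrite {
    // Streams data to HDFS and explicitly removes the partial file if the write fails.
    public static void write(FileSystem fs, Path target, byte[] data) throws IOException {
        FSDataOutputStream out = fs.create(target);
        boolean ok = false;
        try {
            out.write(data);   // written out in chunks; the whole file is never buffered in memory
            ok = true;
        } finally {
            out.close();
            if (!ok) {
                fs.delete(target, false);   // clean up the partially written file ourselves
            }
        }
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        write(fs, new Path(args[0]), "example payload".getBytes("UTF-8"));
    }
}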
Also Sam, following is a link giving an example of how to implement
secondary sort and what it is...
http://codingjunkie.net/secondary-sort/
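For reference, a rough sketch of the pieces usually involved in a secondary sort
(the CompositeKey layout and all names below are illustrative, not taken from the
linked post):

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Partitioner;

// Composite key: the natural (grouping) key plus a secondary field used only for sorting.
class CompositeKey implements WritableComparable<CompositeKey> {
    final Text group = new Text();
    final IntWritable order = new IntWritable();

    public void write(DataOutput out) throws IOException {
        group.write(out);
        order.write(out);
    }

    public void readFields(DataInput in) throws IOException {
        group.readFields(in);
        order.readFields(in);
    }

    // Full sort order: by group first, then by the secondary field.
    public int compareTo(CompositeKey other) {
        int cmp = group.compareTo(other.group);
        return cmp != 0 ? cmp : order.compareTo(other.order);
    }
}

// Grouping comparator: compares only the group part, so one reduce() call
// sees all values for a natural key, already sorted by the secondary field.
class GroupComparator extends WritableComparator {
    protected GroupComparator() {
        super(CompositeKey.class, true);
    }

    @SuppressWarnings("rawtypes")
    public int compare(WritableComparable a, WritableComparable b) {
        return ((CompositeKey) a).group.compareTo(((CompositeKey) b).group);
    }
}

// Partitioner: routes every record with the same natural key to the same reducer.
class NaturalKeyPartitioner extends Partitioner<CompositeKey, Text> {
    public int getPartition(CompositeKey key, Text value, int numPartitions) {
        return (key.group.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}

// Wired up in the driver with job.setPartitionerClass(NaturalKeyPartitioner.class)
// and job.setGroupingComparatorClass(GroupComparator.class).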
Regards,
Shahab
On Tue, Aug 13, 2013 at 3:51 PM, Sam Garrett s...@actionx.com wrote:
I am working on a MapReduce job where I would like to have the
You need to configure your namenode and jobtracker information in the
configuration files within your application. Only set the relevant
properties in the copy of the files that you are bundling in your job. For
the rest, the default values would be used from the default configuration files
You should not use LocalJobRunner. Make sure that the mapred.job.tracker
property does not point to 'local' and instead to your job-tracker host and
port.
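As a rough illustration (the hostnames and ports are placeholders for your own
cluster, using the classic MRv1 property names):

<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://your-namenode-host:8020</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapred.job.tracker</name>
  <value>your-jobtracker-host:8021</value>
</property>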
But before that, as Sandy said, your client machine (from where you will
be kicking off your jobs and apps) should be using config files which
On Tue, Aug 13, 2013 at 6:07 PM, Brad Cox bradj...@gmail.com wrote:
That link got my hopes up. But Cloudera Manager (what I'm running; on
CDH4) does not offer an Export Client Config option. What am I missing?
On Aug 13, 2013, at 4:04 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
You should
Given that your questions are very broad and at a high level, I would suggest
that you pick up a book or similar resource to work through them. Hadoop: The
Definitive Guide by Tom White is a great book to start with.
Meanwhile some links to start with:
I think you need to have a user folder (for whatever user you are running
hadoop as) and put your files there. In fact that folder should already be
created there. The hdfs data dir property is for setting the path to the
data files.
Regards,
Shahab
On Fri, Aug 2, 2013 at 9:38 PM, Huy Pham
I think you can use NullWritable as the key.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/NullWritable.html
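For instance, a small sketch of emitting NullWritable as the key (the mapper types
here are just an example):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NullKeyMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // NullWritable is a singleton placeholder; use it when the key carries no information.
        context.write(NullWritable.get(), value);
    }
}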
Regards,
Shahab
On Thu, Jul 25, 2013 at 2:58 PM, Felipe Gutierrez
felipe.o.gutier...@gmail.com wrote:
I did a MapReduce program to execute a Grep function. I know there
, Shahab Yunus shahab.yu...@gmail.com wrote:
I think you can use NullWritable as the key.
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/NullWritable.html
Regards,
Shahab
On Thu, Jul 25, 2013 at 2:58 PM, Felipe Gutierrez
felipe.o.gutier...@gmail.com wrote:
I did a MapReduce
cloudfront.blogspot.com
On Fri, Jul 26, 2013 at 12:48 AM, Felipe Gutierrez
felipe.o.gutier...@gmail.com wrote:
Sorry, I think I didn't understand.
Does NullWritable replace MyWritable? But this is my value. My
key is a Text.
Regards,
Felipe
On Thu, Jul 25, 2013 at 4:07 PM, Shahab Yunus
I think you will have to write custom code to handle this.
Regards,
Shahab
On Tue, Jul 23, 2013 at 3:50 AM, Fatih Haltas fatih.hal...@nyu.edu wrote:
For those columns, I am using the uint type. I tried to cast them via a sqoop
option but it still gave the same error.
For other columns having type
See this
https://sites.google.com/site/hadoopandhive/home/how-to-read-all-files-in-a-directory-in-hdfs-using-hadoop-filesystem-api
and
http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/fs/FileSystem.html#isDirectory(org.apache.hadoop.fs.Path)
Basically you can write your own function,
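A rough sketch of such a function using the FileSystem API (class and method names
below are illustrative):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsLister {
    // Recursively collects all file paths under the given directory.
    public static void listFiles(FileSystem fs, Path dir, List<Path> out) throws IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDir()) {                  // isDirectory() in newer releases
                listFiles(fs, status.getPath(), out);
            } else {
                out.add(status.getPath());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        List<Path> files = new ArrayList<Path>();
        listFiles(fs, new Path(args[0]), files);
        for (Path p : files) {
            System.out.println(p);
        }
    }
}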
Any error messages, details, or logs would be helpful in advising.
Plus, you are saying you're loading FROM Teradata. Where are you loading TO?
How does HDFS (and the 100 files on it) come into the picture?
Regards,
Shahab
On Mon, Jul 22, 2013 at 12:06 PM, suneel hadoop
The error is:
Please set $HBASE_HOME to the root of your HBase installation.
Have you checked whether it is set or not? Have you verified your HBase or
Hadoop installation?
Similarly, the following:
Cannot run program psql: java.io.IOException: error=2, No such file or
directory
Also
Great. Can you please share, if possible, what was the problem and how you
solved it? Thanks.
Regards,
Shahab
On Tue, Jul 16, 2013 at 9:58 AM, Fatih Haltas fatih.hal...@nyu.edu wrote:
Thanks Shahab, I solved my problem in another way,
of a process at a particular point in time (looking like a
snapshot of the status of the process). If these numbers are added, the sum
would be much more than the memory allocated to the program.
On Fri, Jul 12, 2013 at 6:47 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
I think they are cumulative
Also, once you have the array of URIs after calling getCacheFiles you can
iterate over them using File class or Path (
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/Path.html#Path(java.net.URI)
)
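For example (this sketch assumes the Hadoop 2.x mapreduce API where the task
context exposes getCacheFiles(); with older releases, DistributedCache.getCacheFiles(conf)
returns the same URI array):

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        URI[] cacheFiles = context.getCacheFiles();
        if (cacheFiles != null) {
            for (URI uri : cacheFiles) {
                Path p = new Path(uri);   // wrap each URI in a Path for FileSystem access
                // open p with FileSystem.get(context.getConfiguration()) as needed
            }
        }
    }
}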
Regards,
Shahab
On Wed, Jul 10, 2013 at 5:08 PM, Omkar Joshi
Have you verified that the 'input' folder that your job needs exists on hdfs
(single node setup)?
Regards,
Shahab
On Wed, Jun 26, 2013 at 10:53 AM, Peng Yu pengyu...@gmail.com wrote:
Hi,
http://hadoop.apache.org/docs/r1.1.2/single_node_setup.html
I followed the above instructions.
http://hadoop.apache.org/docs/stable/hdfs-default.html
dfs.data.dir (default: ${hadoop.tmp.dir}/dfs/data): Determines where on the local
filesystem a DFS data node should store its blocks. If this is a
comma-delimited list of directories, then data will be stored in all named
directories, typically on
Yeah, Kai is right.
You can read more details for your understanding at:
http://hadoop.apache.org/docs/stable/hdfs_design.html#Data+Replication
and right from the horse's mouth (Pgs 70-75):
, the job history is going to be stored in hdfs
under output/_logs dir.
Then after the job completes, I copied back the logs to the server.
Thanks a lot,
Boyu
On Fri, Jun 7, 2013 at 2:32 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
What value do you have for the hadoop.log.dir property?
I notice that for fs.default.name, you have in core-site.xml
hdfs://127.0.0.1:8020
This is the localhost address of your local machine, so hdfs is there. Where
are you expecting it to be created, and where is it being created?
Regards,
Shahab
On Fri, Jun 7, 2013 at 10:49 PM, Venkivolu, Dayakar
Sai,
This is regarding all your recent emails and questions. I suggest that you
read Hadoop: The Definitive Guide by Tom White (
http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/1449311520) as
it goes through all of your queries in detail and with examples. The
questions that you are
See this:
http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201302.mbox/%3c1360184802.61630.yahoomail...@web141205.mail.bf1.yahoo.com%3E
Regards,
Shahab
On Fri, Jun 7, 2013 at 4:33 PM, Boyu Zhang boyuzhan...@gmail.com wrote:
Dear All,
I recently moved from Hadoop 0.20.2 to 2.0.4,
saw the link, but it is not the case for me. I copied everything from
hdfs ($HADOOP_HOME/bin/hdfs dfs -copyToLocal / $local_dir). But did not see
the logs.
Did it work for you?
Thanks,
Boyu
On Fri, Jun 7, 2013 at 1:52 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
See this:
http://mail
06:58 PM, Shahab Yunus wrote:
Are you following the guidelines as mentioned here:
http://grepalex.com/2013/02/25/hadoop-libjars/
Now I am, so thanks for that :-)
Still doesn't work though. Following the hint in that
post I looked at the job
Chengi,
You can also see this for pointers:
http://java.dzone.com/articles/hadoop-practice
Regards,
Shahab
On Tue, Jun 4, 2013 at 4:15 PM, Mohammad Tariq donta...@gmail.com wrote:
Yes...This should do the trick.
Warm Regards,
Tariq
cloudfront.blogspot.com
On Wed, Jun 5, 2013 at 1:38
Are you able to access the main JobTracker UI page?
http://foobar:50030/jobtracker.jsp
If not then it can be a firewall/proxy issue (or an issue with the
foobar hostname; maybe you need to use the IP or fqdn). I am assuming that
your jobs are completing successfully behind the scenes.
Regards,
Have you taken a look into extending the FileOutputFormat class and
overriding the OutputCommitter API functionality?
Regards,
Shahab
On Mon, Jun 3, 2013 at 5:11 PM, samir das mohapatra samir.help...@gmail.com
wrote:
Dear All,
Is there any way to copy the intermediate output file of
Is the file that you are trying to upload named file.txt or just file? Have you
made sure about that? Is any other command working? Have you tried
copyFromLocal?
Regards,
Shahab
On Sat, Jun 1, 2013 at 4:05 AM, Rahul Bhattacharjee rahul.rec@gmail.com
wrote:
you should be able to use hadoop fs
It seems to me that, since it is failing when you try to run with security
turned on, in those cases data cannot be written to the disk due to
permissions (as you have security turned on), and when you run it without
security then possibly no such checks are performed and you can write data?
I
Hi Harsh,
Quick question though: why do you think it only happens if the OP 'uses
security' as he mentioned?
Regards,
Shahab
On Sat, Jun 1, 2013 at 11:49 AM, Harsh J ha...@cloudera.com wrote:
Does smell like a bug as that number you get is simply Long.MAX_VALUE,
or 8 exbibytes.
Looking at
classes where
applicable
put: `sample.txt': No such file or directory
Native-hadoop library is not loaded
What do I need to do?
Thanks
shashidhar
On Sat, Jun 1, 2013 at 6:41 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
Is the file that you are trying to upload named file.txt or just file? Have
), and the issue here
doesn't have anything to do with security really.
Azurry - Let's discuss the code issues on the JIRA (instead of here) or
on the mapreduce-dev lists.
On Sat, Jun 1, 2013 at 10:05 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
Hi Harsh,
Quick question though: why do you think
I might not have understood your use case properly, so I apologize for that.
But I think what you need here is something outside of Hadoop/HDFS. I am
presuming that you need to read the whole updated file when you are going
to process it with your never-ending job, right? You don't expect to read
31, 2013 at 5:30 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
I might not have understood your use case properly, so I apologize for
that.
But I think what you need here is something outside of Hadoop/HDFS. I am
presuming that you need to read the whole updated file when you are going
For starters, you can specify them through the -libjars parameter when you
kick off your M/R job. This way the jars will be copied to all TTs.
Regards,
Shahab
On Thu, May 30, 2013 at 2:43 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi Thanks guys.
I figured out the issue. Hence I have
Also Samir, when you say 'secured', by any chance that cluster is secured
with Kerberos (rather than ssh)?
-Shahab
On Tue, May 28, 2013 at 8:29 AM, Nitin Pawar nitinpawar...@gmail.com wrote:
hadoop daemons do not use ssh to communicate.
if your distcp job could not connect to remote server
Have you verified that the kerberos settings are configured properly in
mapred-site.xml too just as in hdfs-site.xml (assuming you are using MRv1)?
-Shahab
On Tue, May 28, 2013 at 9:06 AM, Neeraj Chaplot geek...@gmail.com wrote:
Hi All,
When hadoop started with Kerberos authentication
If possible can you share what was the root cause? Thanks,
-Shahab
On Saturday, May 25, 2013, Mohammad Mustaqeem 3m.mustaq...@gmail.com
wrote:
I have fixed that problem.
There were other problems.
Anyways, thanks for your reply.
On Sat, May 25, 2013 at 9:20 PM, Sanjay Subramanian
For batch imports, I would also suggest Sqoop. Very easy to use, especially
if you have MySQL in the picture. I have not used Sqoop 2 but that is
supposed to add enterprise-level robustness and admin support as well.
-Shahab
On Tue, May 21, 2013 at 12:17 PM, Peyman Mohajerian
Hi David. Can you explain in a bit more detail what the issue was? Thanks.
Shahab
On Tue, May 14, 2013 at 2:29 AM, David Parks davidpark...@yahoo.com wrote:
I just hate it when I figure out a problem right after asking for help.
Finding the task logs via the task tracker
@Thoihen. If the data that you are trying to load is not streaming or the
data loading is not real-time in nature then why don't you use
Sqoop? It is relatively easy to use with not much of a learning curve.
Regards,
Shahab
On Sat, May 11, 2013 at 12:03 PM, Mohammad Tariq donta...@gmail.com wrote:
Sorry
@Kishore, Agreed, but shouldn't the 'Reduce shuffle bytes' count decrease
with the use of Combiners?
Regards,
Shahab
On Fri, May 10, 2013 at 2:00 PM, Kishore alajangikish...@gmail.com wrote:
The Combiner will be used between the mapper and the reducer, so the mapper output for
both with used combiner and
core-site.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
On Wed, May 8, 2013 at 9:43 AM, Mohammad Mustaqeem
3m.mustaq...@gmail.com wrote:
Hello everyone,
I was searching for how to make the hadoop cluster rack-aware and I
found out from here
Tariq donta...@gmail.com wrote:
@Shahab : IMHO, he doesn't have to worry about ZK as he is running in
stand alone mode.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Mon, May 6, 2013 at 11:54 PM, Shahab Yunus shahab.yu...@gmail.com wrote:
I see this in the logs
Hello,
This might be something very obvious that I am missing but this has been
bugging me and I am unable to find what am I missing?
I have hadoop and hbase installed on Linux machine. Version 2.0.0-cdh4.1.2
and 0.92.1-cdh4.1.2 respectively. They are working and I can invoke hbase
shell and
coding class path, can you try specifying `hbase
classpath` ?
Cheers
On Mon, Apr 29, 2013 at 5:52 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
Hello,
This might be something very obvious that I am missing but this has been
bugging me and I am unable to find what am I missing?
I have
Unknown
13/04/29 09:40:31 INFO util.VersionInfo: Compiled by Unknown on Unknown
I did print `hbase classpath` on the console itself and it does print paths
to various libs and jars.
Regards,
Shahab
On Mon, Apr 29, 2013 at 10:39 AM, Shahab Yunus shahab.yu...@gmail.com wrote:
Ted, Sorry I didn't
:12 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
Okay, I think I know what you mean. Those were back ticks!
So I tried the following:
java -cp `hbase classpath` org.apache.hadoop.hbase.util.VersionInfo
and I still get:
13/04/29 09:40:31 INFO util.VersionInfo: HBase Unknown
13
our projects for 1.5 JVMs, and especially not the GCJ
(1.5 didn't have annotations either IIRC? We depend on that here). Try
with a Sun/Oracle/OpenJDK 1.6 or higher and your problem is solved.
On Mon, Apr 29, 2013 at 8:24 PM, Shahab Yunus shahab.yu...@gmail.com
wrote:
The output of java