mappers-node relationship

2013-01-24 Thread jamal sasha
Hi. A very, very lame question: does the number of mappers depend on the number of nodes I have? Here is how I imagine map-reduce works. For example, in the word count example, I have a bunch of slave nodes. The documents are distributed across these slave nodes. Now, depending on how big the data is, it will sprea
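For what it's worth, the map task count is driven by the number of input splits (with default FileInputFormat splitting, roughly one per HDFS block), not by the node count; nodes only bound how many tasks run at once. A back-of-envelope sketch, assuming the old 64 MB default block size and at-least-one-split-per-file rounding (real split counts also depend on the InputFormat and min/max split-size settings):

```python
import math

# Rough estimate: one map task per input split, and (with default
# FileInputFormat splitting) roughly one split per HDFS block.
# Each non-empty file contributes at least one split.
def estimate_map_tasks(file_sizes_bytes, block_size=64 * 1024 * 1024):
    return sum(max(1, math.ceil(size / block_size))
               for size in file_sizes_bytes)

# A 128 MB file plus a tiny file: 2 splits + 1 split = 3 map tasks,
# regardless of how many slave nodes the cluster has.
print(estimate_map_tasks([128 * 1024 * 1024, 1024]))
```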

Re: How to Backup HDFS data ?

2013-01-24 Thread Mahesh Balija
Hi Steve, On top of Harsh's answer: other than backup, there is a feature called Snapshot offered by some third-party vendors like MapR. Though it's not really a backup, it is just a point to which you can revert at any time. Best, Mahesh Balija, CalsoftLabs. On

Re: Copy files from remote folder to HDFS

2013-01-24 Thread Nitin Pawar
If this is a one-time activity, then just download the Hadoop binaries from Apache, replace hdfs-site.xml and core-site.xml with the ones you have on the Hadoop cluster, and allow this machine to connect to the Hadoop cluster; then you can just do it with the Hadoop command-line scripts. On Fri, Jan 25, 2013 at 1:0

Re: How to Backup HDFS data ?

2013-01-24 Thread Ted Dunning
Incremental backups are nice to avoid copying all your data again. You can code these at the application layer if you have nice partitioning and keep track correctly. You can also use platform-level capabilities such as those provided by the MapR distribution. On Fri, Jan 25, 2013 at 3:23 PM, Hars
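A minimal sketch of the application-layer idea described here, assuming you can enumerate files with their modification times and persist a timestamp between runs (the function and its arguments are illustrative, not part of any Hadoop API):

```python
def incremental_backup_set(file_mtimes, last_backup_ts):
    """Pick only the paths modified since the previous backup run.

    file_mtimes: dict mapping path -> modification timestamp
    last_backup_ts: timestamp recorded at the end of the previous run
    """
    return sorted(path for path, mtime in file_mtimes.items()
                  if mtime > last_backup_ts)

# Only the file touched after the last backup is selected for copying.
files = {"/logs/2013-01-23": 100, "/logs/2013-01-24": 200}
print(incremental_backup_set(files, last_backup_ts=150))
```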

Re: Copy files from remote folder to HDFS

2013-01-24 Thread Mahesh Balija
Hi Panshul, I am also working on a similar requirement. One approach is: mount your remote folder on your Hadoop master node, and simply write a shell script to copy the files to HDFS using crontab. I believe Flume is literally the wrong choice, as Flume is a da

Re: How to Backup HDFS data ?

2013-01-24 Thread Harsh J
You need some form of space capacity on the backup cluster that can withstand it. Lower replication (<3) may also be an option there to save yourself some disks/nodes? On Fri, Jan 25, 2013 at 5:04 AM, Steve Edison wrote: > Backup to disks is what we do right now. Distcp would copy across HDFS > c

Re: Filesystem closed exception

2013-01-24 Thread Harsh J
It is pretty much the same in 0.20.x as well, IIRC. Your two points are also correct (for a fix to this). Also see: https://issues.apache.org/jira/browse/HADOOP-7973. On Fri, Jan 25, 2013 at 6:56 AM, Hemanth Yamijala wrote: > Hi, > > We are noticing a problem where we get a filesystem closed exce

Re: UTF16

2013-01-24 Thread Nitin Pawar
Hive by default supports UTF-8; I am not sure about UTF-16. You can refer to this: https://issues.apache.org/jira/browse/HIVE-2859 On Fri, Jan 25, 2013 at 10:23 AM, Koert Kuipers wrote: > is it safe to upload UTF16 encoded (unicode) text files to hadoop for > processing by map-reduce, hive, pig,

MapFileOutputFormat class is not found in hadoop-core 1.1.1

2013-01-24 Thread feng lu
Hi all, I want to migrate the Nutch WebGraph class to the new MapReduce API, https://issues.apache.org/jira/browse/NUTCH-1223 . But I found that the MapFileOutputFormat class is not in the org.apache.hadoop.mapreduce.lib.output package in hadoop-core-1.1.1. I don't know why this class was deleted in hadoop-cor

Re: Copy files from remote folder to HDFS

2013-01-24 Thread Mohammad Tariq
Hello Panshul, You might find flume useful. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper wrote: > Hello, > > I am trying to copy files, Json files from a remote folder - (a folder on > my l

UTF16

2013-01-24 Thread Koert Kuipers
Is it safe to upload UTF-16 encoded (Unicode) text files to Hadoop for processing by map-reduce, Hive, Pig, etc.? Thanks! Koert
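One way to sidestep the question is to re-encode to UTF-8 before upload, since the standard line-oriented input formats split on newline bytes in a way that works cleanly with UTF-8. A minimal conversion sketch (error handling omitted; the function name is illustrative):

```python
def utf16_to_utf8(data: bytes) -> bytes:
    # bytes.decode("utf-16") honours a leading BOM to determine the
    # byte order, then we re-encode the text as UTF-8.
    return data.decode("utf-16").encode("utf-8")

# Round-trip check: the UTF-8 output carries the same text.
sample = "héllo wörld\n".encode("utf-16")  # BOM + UTF-16 payload
assert utf16_to_utf8(sample) == "héllo wörld\n".encode("utf-8")
```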

Re: Problems

2013-01-24 Thread ke yuan
Could this have anything to do with hardware? I used a ThinkPad T430 and this problem occurs, but on the roughly 100 machines I have used it does not happen at all. All the machines run Red Hat 6.0, and the JDKs range from 1.5 to 1.6, so I think it has something to do with the hardware. Any idea? 2013/1/22 Jean-Marc Spag

Re: Spring for hadoop

2013-01-24 Thread Radim Kolar
On 23.1.2013 22:55, Panshul Whisper wrote: Hello Radim, Your solution sounds interesting. Is it possible for me to try the solution before I buy it? I do not ship a demo version since it would be identical to the production version. I do on-site presentations with a live demo, and you will get all

Re: hdfs du periodicity and hdfs not respond at that time

2013-01-24 Thread Xibin Liu
I'm using ext3, and using df instead of du is a good way to solve this problem. Thank you all. 2013/1/24 Harsh J > I missed the periodicity part of your question. Unfortunately the "du" > refresh interval is hard-coded today, although the "df" interval is > configurable. Perhaps this is a bug - I f

Re: Moving data in hadoop

2013-01-24 Thread Raj hadoop
Thank you. I am looking into it now. On Fri, Jan 25, 2013 at 7:27 AM, Mohit Anchlia wrote: > Have you looked at distcp? > > > On Thu, Jan 24, 2013 at 5:55 PM, Raj hadoop wrote: > >> Hi, >> >> Can you please suggest me what is the good way to move 1 peta byte of >> data from one cluster to another

unsubscribe

2013-01-24 Thread 徐永睿
2013-01-25

Any streaming options that specify data type of key value pairs?

2013-01-24 Thread Yuncong Chen
Hi, What would be the best way in Hadoop streaming to send a binary object (i.e. a Python dict or array) as the value in key/value pairs? I know I can dump the object to a string with pickle.dumps() and encode it to eliminate unintended '\t' and '\n' before sending it to stdout, but I wonder if there are nati
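I'm not aware of a built-in typed-value option in this vintage of streaming (some versions offer typed-bytes support; check your distribution), but the pickle-plus-encoding route mentioned above is a workable fallback. Base64 guarantees the payload contains no tab or newline bytes, so it is safe in a streaming value field. A sketch, with illustrative function names:

```python
import base64
import pickle

def encode_value(obj) -> str:
    # Pickle, then Base64: the result is pure ASCII with no '\t'
    # or '\n', so it cannot break streaming's field/record framing.
    return base64.b64encode(pickle.dumps(obj)).decode("ascii")

def decode_value(text: str):
    # Inverse of encode_value: Base64-decode, then unpickle.
    return pickle.loads(base64.b64decode(text))

record = {"counts": [1, 2, 3], "tag": "a\tb"}
encoded = encode_value(record)
assert "\t" not in encoded and "\n" not in encoded
assert decode_value(encoded) == record
```

In a streaming job the mapper would emit `key + "\t" + encode_value(obj)`, and the reducer would call `decode_value` on the value field after splitting each input line on the first tab.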

Re: Moving data in hadoop

2013-01-24 Thread Mohit Anchlia
Have you looked at distcp? On Thu, Jan 24, 2013 at 5:55 PM, Raj hadoop wrote: > Hi, > > Can you please suggest me what is the good way to move 1 peta byte of data > from one cluster to another cluster? > > Thanks > Raj >

Moving data in hadoop

2013-01-24 Thread Raj hadoop
Hi, Can you please suggest a good way to move 1 petabyte of data from one cluster to another? Thanks Raj

Re: Help with DataDrivenDBInputFormat: splits are created properly but zero records are sent to the mappers

2013-01-24 Thread Stephen Boesch
It turns out to be an apparent problem in one of the two methods of DataDrivenDBInputFormat.setInput(). The version I used does not work as shown: it needs to have a primary-key column set somehow, but there is no information/documentation on how to set the pkcol that I could find. So I converted to using the

Filesystem closed exception

2013-01-24 Thread Hemanth Yamijala
Hi, We are noticing a problem where we get a "filesystem closed" exception when a map task is done and is finishing execution. By map task, I literally mean the MapTask class of the map-reduce code. Debugging this, we found that the mapper is getting a handle to the filesystem object and itself calli

Copy files from remote folder to HDFS

2013-01-24 Thread Panshul Whisper
Hello, I am trying to copy files (JSON files) from a remote folder - a folder on my local system, a Cloudfiles folder, or a folder on an S3 server - to the HDFS of a cluster running at a remote location. The job-submitting application is based on Spring Hadoop. Can someone please suggest or point me in
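One plain approach (also suggested elsewhere in this thread) is a small script around the `hadoop fs -put` CLI, run from crontab. A hedged sketch, assuming the `hadoop` binary is on PATH and the source folder is mounted locally; the copy command is injectable so the logic can be exercised without a cluster:

```python
import pathlib
import subprocess

def upload_json_files(src_dir, dest_dir, put_cmd=("hadoop", "fs", "-put")):
    """Copy every *.json file under src_dir to dest_dir using an
    external copy command (the hadoop CLI by default).

    Returns the names of the files handed to the copy command.
    """
    uploaded = []
    for path in sorted(pathlib.Path(src_dir).glob("*.json")):
        # Shell out once per file; check=True raises if the copy fails.
        subprocess.run([*put_cmd, str(path), str(dest_dir)], check=True)
        uploaded.append(path.name)
    return uploaded
```

For testing, `put_cmd=("cp",)` substitutes a local copy for the HDFS upload; in production the default invokes the Hadoop CLI, which needs the cluster's core-site.xml/hdfs-site.xml on this machine.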

Re: How to Backup HDFS data ?

2013-01-24 Thread Steve Edison
Backup to disks is what we do right now. Distcp would copy across HDFS clusters, meaning I will have to build another 12-node cluster? Is that correct? On Thu, Jan 24, 2013 at 3:32 PM, Mathias Herberts < mathias.herbe...@gmail.com> wrote: > Backup on tape or on disk? > > On disk, have anoth

Re: How to Backup HDFS data ?

2013-01-24 Thread Mathias Herberts
Backup on tape or on disk? On disk, have another Hadoop cluster and do regular distcp. On tape, make sure you have a backup program which can back up streams so you don't have to materialize your TB files outside of your Hadoop cluster first... (I know Simpana can't do that :-(). On Fri, Jan 25,

How to Backup HDFS data ?

2013-01-24 Thread Steve Edison
Folks, It's been a year and my HDFS/Solr/Hive setup has been working flawlessly. The data logs which were meaningless to my business all of a sudden became precious, to the extent that our management wants to back up this data. I am talking about 20 TB of active HDFS data with an incremental of 2 TB/mon

Error after installing Hadoop-1.0.4

2013-01-24 Thread Deepti Garg
Hi, I installed Hadoop 1.0.4 from this link on my Windows 7 machine using cygwin: http://hadoop.apache.org/docs/r1.0.4/single_node_setup.html I am getting an error on running bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+' Error: cd: pattern "--" not found in "c:

unsubscribe

2013-01-24 Thread Luiz Fernando Figueiredo

Re: Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread bejoy . hadoop
Hi Amit, Apart from the Hadoop jars, do you have the same config files ($HADOOP_HOME/conf) that are in the cluster on your analytics server as well? If you have the default config files on the analytics server, then your MR job would be running locally and not on the cluster. Regards Bejoy K

Re: Support of RHEL version

2013-01-24 Thread Alexander Alten-Lorenz
Do you mean Transparent Huge Page Defrag (https://bugzilla.redhat.com/show_bug.cgi?id=805593)? Do echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag - Alex On Jan 24, 2013, at 6:46 PM, Dheeren Bebortha wrote: > I do not think this is a cdh specific issues. If this is a hbase compa

RE: Support of RHEL version

2013-01-24 Thread Dheeren Bebortha
I do not think this is a CDH-specific issue. If this is an HBase compaction issue, it would be all-pervading as long as it is RHEL 6.2 and above! Am I reading it right? -Dheeren -Original Message- From: Alexander Alten-Lorenz [mailto:wget.n...@gmail.com] Sent: Thursday, January 24, 2013 9:4

Re: Support of RHEL version

2013-01-24 Thread Alexander Alten-Lorenz
Hi, Moving the post to cdh-u...@cloudera.org (https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!forum/cdh-user) as it is CDH4 you are specifically asking about. BCC'd the user@hadoop lists; let's carry the discussion forward on the CDH lists. My response below. RHEL 5.x and 6.x will be supp

Support of RHEL version

2013-01-24 Thread Nilesh_Sangani
Hi, We are working on implementing Cloudera's distribution of Hadoop (CDH 4.x) in our environment. The Cloudera website talks about supporting RHEL 6.1, with challenges/issues on the newer version, though it also provides a workaround. Wanted to hear from the community on the supported v

Re: Modifying Hadoop For join Operation

2013-01-24 Thread Praveen Sripati
Vikas, Check the below paper on the different ways on performing joins in MR http://lintool.github.com/MapReduceAlgorithms/index.html Also, `Hadoop - The Definitive Guide` has a section on the different approaches and when to use them. Thanks, Praveen Cloudera Certified Developer for Apache H

Re: Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread Amit Sela
Hi Harsh, I'm using Job.waitForCompletion() method to run the job but I can't see it in the webapp and it doesn't seem to finish... I get: *org.apache.hadoop.mapred.JobClient - Running job: job_local_0001* *INFO org.apache.hadoop.util.ProcessTree

Re: HDFS File/Folder permission control with POSIX standard

2013-01-24 Thread Harsh J
Hi, Please give the HDFS Permissions Guide a read, it should answer your questions: http://hadoop.apache.org/docs/stable/hdfs_permissions_guide.html On Thu, Jan 24, 2013 at 9:17 PM, Dhanasekaran Anbalagan wrote: > Hi Guys, > > In our scenario we have two hdfs user, research and development > > n

Re: Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread Harsh J
The Job class itself has a blocking and non-blocking submitter that is similar to JobConf's runJob method you discovered. See http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/Job.html#submit() and its following method waitForCompletion(). These seem to be what you're looking fo

Re: Join Operation Using Hadoop MapReduce

2013-01-24 Thread Harsh J
The Hadoop: The Definitive Guide (and also other books) has a detailed topic on Joins and types of Joins in MR, in its MapReduce chapters. Looking the word up in the index would probably help you find some good things to read on this topic. On Thu, Jan 24, 2013 at 6:48 PM, Vikas Jadhav wrote: > H

Re: Modifying Hadoop For join Operation

2013-01-24 Thread Harsh J
Hi, Can you also define 'efficient way' and the idea you have in mind to implement that isn't already doable today? On Thu, Jan 24, 2013 at 6:51 PM, Vikas Jadhav wrote: > Anyone has idea about how should i modify Hadoop Code for > Performing Join operation in efficient Way. > Thanks. > > -- > >

Re: hdfs du periodicity and hdfs not respond at that time

2013-01-24 Thread Harsh J
I missed the periodicity part of your question. Unfortunately the "du" refresh interval is hard-coded today, although the "df" interval is configurable. Perhaps this is a bug - I filed https://issues.apache.org/jira/browse/HADOOP-9241 to make it configurable. Also, your problem reminded me of a si

Re: hdfs du periodicity and hdfs not respond at that time

2013-01-24 Thread Chris Embree
What type of FS are you using under HDFS? XFS, ext3, ext4? The type and configuration of the underlying FS will impact performance. Most notably, ext3 has a lock-up effect when flushing disk cache. On Thu, Jan 24, 2013 at 2:54 AM, Xibin Liu wrote: > Thanks, http://search-hadoop.com/m/LLBgUiH0

Submitting MapReduce job from remote server using JobClient

2013-01-24 Thread Amit Sela
Hi all, I want to run a MapReduce job using the Hadoop Java api from my analytics server. It is not the master or even a data node but it has the same Hadoop installation as all the nodes in the cluster. I tried using JobClient.runJob() but it accepts JobConf as argument and when using JobConf it

Re: Hadoop Nutch Mkdirs failed to create file

2013-01-24 Thread samir das mohapatra
Just try applying $> chmod 755 -R /home/wj/apps/apache-nutch-1.6 and then try again. On Wed, Jan 23, 2013 at 9:23 PM, 吴靖 wrote: > hi, everyone! > I want use the nutch to crawl the web pages, but problem comes as the > log like, I think it maybe some permissions problem,but i am not sure. > A

RE: Hadoop Cluster

2013-01-24 Thread Henjarappa, Savitha
Thank you so much for your quick response. From: Mohammad Tariq [mailto:donta...@gmail.com] Sent: Tuesday, January 22, 2013 11:51 PM To: user@hadoop.apache.org; Bejoy Ks Subject: Re: Hadoop Cluster The most significant difference between the two, as per my view, is that HA eliminates the problem