Sending the entire file content as value to the mapper

2013-07-11 Thread Kasi Subrahmanyam
Hi Team, I have a file which has semi structured text data with no definite start and end points. How can i send the entire content of the file at once as key or value to the mapper instead of line by line. Thanks, Subbu

RE: Sending the entire file content as value to the mapper

2013-07-11 Thread Charles Baker
Hi Subbu. Sounds like you'll have to implement a custom non-splittable InputFormat which instantiates a custom RecordReader which in turn consumes the entire file when its next(K,V) method is called. Once implemented, you specify the input format to the JobConf object:
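
(A minimal sketch of that idea against the old org.apache.hadoop.mapred API; the class names are illustrative, not from the original mail:)

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.mapred.*;

    // Non-splittable InputFormat: each input file becomes exactly one record.
    public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {
      @Override
      protected boolean isSplitable(FileSystem fs, Path file) { return false; }

      @Override
      public RecordReader<NullWritable, BytesWritable> getRecordReader(
          InputSplit split, JobConf conf, Reporter reporter) throws IOException {
        return new WholeFileRecordReader((FileSplit) split, conf);
      }
    }

    class WholeFileRecordReader implements RecordReader<NullWritable, BytesWritable> {
      private final FileSplit split;
      private final Configuration conf;
      private boolean processed = false;

      WholeFileRecordReader(FileSplit split, Configuration conf) {
        this.split = split;
        this.conf = conf;
      }

      public boolean next(NullWritable key, BytesWritable value) throws IOException {
        if (processed) return false;               // only one record per file
        byte[] contents = new byte[(int) split.getLength()];
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(conf);
        FSDataInputStream in = fs.open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        value.set(contents, 0, contents.length);
        processed = true;
        return true;
      }

      public NullWritable createKey() { return NullWritable.get(); }
      public BytesWritable createValue() { return new BytesWritable(); }
      public long getPos() { return processed ? split.getLength() : 0; }
      public float getProgress() { return processed ? 1.0f : 0.0f; }
      public void close() throws IOException { }
    }

Wiring it up is then just conf.setInputFormat(WholeFileInputFormat.class) on the JobConf.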

RE: Sending the entire file content as value to the mapper

2013-07-11 Thread Devaraj k
Hi, You could send the file meta info to the map function as key/value through the split, and then you can read the entire file in your map function. Thanks Devaraj k -Original Message- From: Kasi Subrahmanyam [mailto:kasisubbu...@gmail.com] Sent: 11 July 2013 13:38 To:
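
(A minimal sketch of that approach with the new mapreduce API; the class name and types are mine, and it assumes one split per file, e.g. small files or a non-splittable input format:)

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileSplit;

    // Sketch: pull the file path out of the split and read the whole file once.
    // Assumes the file is not processed by more than one mapper.
    public class WholeFileMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        FileSplit split = (FileSplit) context.getInputSplit();
        Path file = split.getPath();
        FileSystem fs = file.getFileSystem(context.getConfiguration());
        byte[] contents = new byte[(int) fs.getFileStatus(file).getLen()];
        FSDataInputStream in = fs.open(file);
        try {
          IOUtils.readFully(in, contents, 0, contents.length);
        } finally {
          IOUtils.closeStream(in);
        }
        // parse 'contents' as one semi-structured document here
      }

      @Override
      protected void map(LongWritable key, Text value, Context context) {
        // per-line records can be ignored when the whole file is handled in setup()
      }
    }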

Re: Task failure in slave node

2013-07-11 Thread devara...@huawei.com
Hi, It seems mahout-examples-0.7-job.jar depends on other jars/classes. While running the job tasks it is not able to find those classes in the classpath, so those tasks fail. You need to provide the dependent jar files when submitting/running the job. Thanks Devaraj k
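
(One common way to do that is the -libjars generic option, which works when the driver goes through ToolRunner/GenericOptionsParser; a hedged example where jar paths and the driver class are placeholders:)

    # Placeholders for the driver class and dependency paths.
    hadoop jar mahout-examples-0.7-job.jar some.driver.Class \
        -libjars /path/to/dep1.jar,/path/to/dep2.jar <job arguments>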

Datanodes using public ip, why?

2013-07-11 Thread Ben Kim
Hello Hadoop Community! I've set up datanodes on a private network by adding the private hostnames to the slaves file, but it looks like when I look at the web UI the datanodes are registered with public hostnames. Are they actually communicating over the public network? all datanodes have eth0 with public

Re: Datanodes using public ip, why?

2013-07-11 Thread Thanh Do
Have you tried playing with the config parameter dfs.datanode.dns.interface? On Thu, Jul 11, 2013 at 4:20 AM, Ben Kim benkimkim...@gmail.com wrote: Hello Hadoop Community! I've setup datanodes with private network by adding private hostname's to the slaves file. but it looks like when
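
(For reference, a hedged hdfs-site.xml snippet; "eth1" is only an example of whichever NIC carries the private network:)

    <!-- hdfs-site.xml -->
    <property>
      <name>dfs.datanode.dns.interface</name>
      <value>eth1</value>   <!-- example: the interface on the private network -->
    </property>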

Re: Datanodes using public ip, why?

2013-07-11 Thread Alex Levin
Make sure that your hostnames resolve (via DNS and/or hosts files) to private IPs. If you have records like "public-IP hostname" in the nodes' hosts files, remove (or comment) them. Alex On Jul 11, 2013 2:21 AM, Ben Kim benkimkim...@gmail.com wrote: Hello Hadoop Community! I've setup datanodes

Re: ConnectionException in container, happens only sometimes

2013-07-11 Thread Andrei
Here are logs of RM and 2 NMs: RM (master-host): http://pastebin.com/q4qJP8Ld NM where AM ran (slave-1-host): http://pastebin.com/vSsz7mjG NM where slave container ran (slave-2-host): http://pastebin.com/NMFi6gRp The only related error I've found in them is the following (from RM logs): ...

Task failure in slave node

2013-07-11 Thread Margusja
Hi I have two nodes: n1 (master, slave) and n2 (slave). After setup I ran the wordcount example and it worked fine: [hduser@n1 ~]$ hadoop jar /usr/local/hadoop/hadoop-examples-1.0.4.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output 13/07/11 15:30:44 INFO input.FileInputFormat:

copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread Hao Ren
Hi, I am running HDFS on Amazon EC2. Say I have an FTP server that stores some data. I just want to copy this data directly to HDFS in a parallel way (which may be more efficient). I think hadoop distcp is what I need. But $ bin/hadoop distcp
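
(For reference, distcp can read from an ftp:// URI through Hadoop's FTPFileSystem; a hedged example where the user, host and paths are placeholders:)

    hadoop distcp ftp://ftpuser:password@ftp.example.com/data/ hdfs://namenode:8020/user/hadoop/data/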

Cloudera links and Document

2013-07-11 Thread Sathish Kumar
Hi All, Can anyone point me to a link or document that explains the below? How does Cloudera Manager work and handle the clusters (Agent and Master Server)? How does the Cloudera Manager process flow work? Where can I locate the Cloudera configuration files, with a brief explanation? Regards Sathish

Re: Task failure in slave node

2013-07-11 Thread Azuryy Yu
sorry for typo, mahout, not mahou. sent from mobile On Jul 11, 2013 9:40 PM, Azuryy Yu azury...@gmail.com wrote: hi, put all mahou jars under hadoop_home/lib, then restart cluster. On Jul 11, 2013 8:45 PM, Margusja mar...@roo.ee wrote: Hi I have tow nodes: n1 (master, salve) and n2

Re: Cloudera links and Document

2013-07-11 Thread Ram
Hi, Go through the links. http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Cloudera-Manager-Managing-Clusters/cmmc_CM_architecture.html

Re: Task failure in slave node

2013-07-11 Thread Margusja
Thank you, it resolved the problem. Funny, I don't remember copying the mahout libs to n1's hadoop, but there they are. Tervitades, Margus (Margusja) Roo +372 51 48 780 http://margus.roo.ee skype: margusja

RE: New Distributed Cache

2013-07-11 Thread Botelho, Andrew
So in my driver code, I try to store the file in the cache with this line of code: job.addCacheFile(new URI(file location)); Then in my Mapper code, I do this to try and access the cached file: URI[] localPaths = context.getCacheFiles(); File f = new File(localPaths[0]); However, I get a
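
(A hedged sketch of the new-API distributed cache pattern; "lookup.txt" and the HDFS path are placeholders, and it relies on the cached file being symlinked into the task's working directory, which is the usual MRv2 behaviour:)

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Driver side -- the URI fragment (#lookup.txt) controls the symlink name:
    //   job.addCacheFile(new URI("hdfs:///user/andrew/lookup.txt#lookup.txt"));
    public class CacheAwareMapper extends Mapper<LongWritable, Text, Text, Text> {
      @Override
      protected void setup(Context context) throws IOException, InterruptedException {
        // context.getCacheFiles() returns the *original* (hdfs://...) URIs, not local
        // paths, so new File(uri) fails for a non-file scheme. The localized copy is
        // symlinked into the task working directory under the fragment name, so it
        // can simply be opened by that name:
        try (BufferedReader reader = new BufferedReader(new FileReader("lookup.txt"))) {
          String line;
          while ((line = reader.readLine()) != null) {
            // build an in-memory lookup structure here
          }
        }
      }
    }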

Re: Cloudera links and Document

2013-07-11 Thread Suresh Srinivas
Sathish, this mailing list is for Apache Hadoop related questions. Please post questions related to other distributions to the appropriate vendor's mailing list. On Thu, Jul 11, 2013 at 6:28 AM, Sathish Kumar sa848...@gmail.com wrote: Hi All, Can anyone help me the link or document that explain

How are 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' calculated?

2013-07-11 Thread hadoop qi
Hello, I am wondering how the memory counters 'PHYSICAL_MEMORY_BYTES' and 'VIRTUAL_MEMORY_BYTES' are calculated. Are they peaks of memory usage or cumulative usage? Thanks for the help,

Re: Cloudera links and Document

2013-07-11 Thread Alejandro Abdelnur
Satish, the right alias for Cloudera Manager questions is scm-us...@cloudera.org Thanks On Thu, Jul 11, 2013 at 9:20 AM, Suresh Srinivas sur...@hortonworks.com wrote: Sathish, this mailing list for Apache Hadoop related questions. Please post questions related to other distributions to

Re: New Distributed Cache

2013-07-11 Thread Omkar Joshi
Yeah Andrew, there seems to be some problem with the context.getCacheFiles() API, which is returning null. Path[] cachedFilePaths = context.getLocalCacheFiles(); // I am checking why it is deprecated... for (Path cachedFilePath : cachedFilePaths) { File cachedFile = new

Re: copy files from ftp to hdfs in parallel, distcp failed

2013-07-11 Thread பாலாஜி நாராயணன்
On 11 July 2013 06:27, Hao Ren h@claravista.fr wrote: Hi, I am running a hdfs on Amazon EC2 Say, I have a ftp server where stores some data. I just want to copy these data directly to hdfs in a parallel way (which maybe more efficient). I think hadoop distcp is what I need.

Re: CompositeInputFormat

2013-07-11 Thread Jay Vyas
Map Side joins will use the CompositeInputFormat. They will only really be worth doing if one data set is small, and the other is large. This is a good example : http://www.congiu.com/joins-in-hadoop-using-compositeinputformat/ the trick is to google for CompositeInputFormat.compose() :)

RE: CompositeInputFormat

2013-07-11 Thread Botelho, Andrew
Sorry I should've specified that I need an example of CompositeInputFormat that uses the new API. The example linked below uses old API objects like JobConf. Any known examples of CompositeInputFormat using the new API? Thanks in advance, Andrew From: Jay Vyas [mailto:jayunit...@gmail.com]
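
(For what it's worth, a hedged sketch assuming your Hadoop 2 release ships the new-API port of the join framework under org.apache.hadoop.mapreduce.lib.join and that its compose() signatures mirror the old API; the paths are placeholders, and both inputs must already be sorted and identically partitioned on the join key:)

    Job job = Job.getInstance(new Configuration(), "map-side join");
    job.setInputFormatClass(CompositeInputFormat.class);
    // property name in the new API (it was mapred.join.expr in the old one)
    job.getConfiguration().set("mapreduce.join.expr",
        CompositeInputFormat.compose("inner", KeyValueTextInputFormat.class,
            new Path("/data/left"), new Path("/data/right")));
    // The mapper then receives a Text key plus a TupleWritable holding one value per source.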

Staging directory ENOTDIR error.

2013-07-11 Thread Jay Vyas
Hi, I'm getting an ungoogleable exception; I've never seen this before. This is on a Hadoop 1.1 cluster... It appears to be permissions related... Any thoughts as to how this could crop up? I assume it's a bug in my filesystem, but I'm not sure. 13/07/11 18:39:43 ERROR security.UserGroupInformation:

Re: Issues Running Hadoop 1.1.2 on multi-node cluster

2013-07-11 Thread siddharth mathur
I figured out the issue! The problem was in the permissions to run Hadoop scripts as the root user. I created a dedicated hadoop user to run the hadoop cluster, but at one point I accidentally started hadoop from root. Hence, some of the permissions of the hadoop scripts changed. The solution is to again
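
(A hedged illustration of that kind of ownership cleanup; the user, group and paths are placeholders for whatever the dedicated hadoop user and install/data directories actually are:)

    sudo chown -R hduser:hadoop /usr/local/hadoop
    sudo chown -R hduser:hadoop /app/hadoop/tmp    # data/tmp dirs, if they were touched as root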

RE: CompositeInputFormat

2013-07-11 Thread Devaraj k
Hi Andrew, You could make use of the Hadoop data join classes to perform the join, or you can refer to these classes to get a better idea of how to perform the join. http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-datajoin Thanks Devaraj k From: Botelho, Andrew [mailto:andrew.bote...@emc.com]

RE: Staging directory ENOTDIR error.

2013-07-11 Thread Devaraj k
Hi Jay, Here the client is trying to create a staging directory in the local file system, when it actually should be created in HDFS. Could you check whether you have configured fs.defaultFS in the client to point to HDFS? Thanks Devaraj k From: Jay Vyas [mailto:jayunit...@gmail.com] Sent:
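
(A hedged core-site.xml snippet for reference; the host and port are placeholders:)

    <!-- core-site.xml on the client; on Hadoop 1.x the older property name is fs.default.name -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://namenode.example.com:8020</value>
    </property>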