RE: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-10-15 Thread Guillaume Polaert
Hi Ivan, Did you solve your problem? I have the same issue. I can run Hadoop commands after a kinit with a local principal (@CLUSTER.HADOOP.DEV), but it doesn't work with an AD user (@AD.HADOOP.DEV). Could you help me? Thanks, Guillaume Polaert | Cyrès Conseil -Original Message- From:

wait at the end of job

2012-10-15 Thread Harun Raşit Er
I have a Windows Hadoop cluster consisting of 8 slave nodes and 1 master node. My Hadoop program is a collection of recursive jobs. I create 14 map and 14 reduce tasks in each job. My files are up to 10 MB. My problem is that all jobs are waiting at the end of the job; Map 100% Reduce 100% is shown on the command prompt

RE: Problem setting up Hadoop security with active directory using one-way cross-realm configuration

2012-10-15 Thread Guillaume Polaert
It's working. I hadn't configured the property <name>hadoop.security.auth_to_local</name> for the AD realm. Guillaume Polaert | Cyrès Conseil -Original Message- From: Guillaume Polaert [mailto:gpola...@cyres.fr] Sent: Monday, October 15, 2012 12:08 To: ivan.fr...@gmail.com;
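For readers hitting the same wall: below is a hedged sketch of what such a core-site.xml rule set could look like, assuming the local realm CLUSTER.HADOOP.DEV and the AD realm AD.HADOOP.DEV from this thread; the exact rules depend on how your principals are named.

```xml
<!-- core-site.xml (sketch): map AD principals to local short names.
     Realm names are taken from the thread; adjust rules to your principal format. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@AD\.HADOOP\.DEV)s/@.*//
    RULE:[2:$1@$0](.*@AD\.HADOOP\.DEV)s/@.*//
    DEFAULT
  </value>
</property>
```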

Re: 33%, 66%, and 100% *reducer* optimization

2012-10-15 Thread Vinod Kumar Vavilapalli
Reduce has three phases: shuffle, sort, and reduce. So, 33% marks the end of the shuffle phase, and 66% the end of the sort phase. Thanks, +Vinod On Oct 15, 2012, at 2:32 PM, Jay Vyas wrote: Hi guys! We all know that there are major milestones in reducers (33%, 66%) In

Re: Logistic regression package on Hadoop

2012-10-15 Thread Rajesh Nikam
Hi Harsh, Thanks for giving the link to SGD from Mahout. I have asked a question about an issue with using SGD. Below is a description of it. Ted Dunning has mentioned there may be some issue with the data encoding. However, I am not able to pinpoint the issue. Could you please let me know whether the issue is its format or

Re: Logistic regression package on Hadoop

2012-10-15 Thread Bertrand Dechoux
Hi Rajesh, You may want to use the Mahout mailing list for Mahout-related questions. http://mahout.apache.org/mailinglists.html Regards, Bertrand On Mon, Oct 15, 2012 at 2:34 PM, Rajesh Nikam rajeshni...@gmail.com wrote: Hi Harsh, Thanks for giving link for sgd from mahout. I have asked

hadoop 1.0.3 configurations

2012-10-15 Thread Adrian Acosta Mitjans
Hi, I want to know how I can configure Hadoop if I want to use different partitions on the different datanodes. 10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSITY OF INFORMATICS SCIENCES... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION http://www.uci.cu

Re: hadoop 1.0.3 configurations

2012-10-15 Thread Mohammad Tariq
Try Google. Or you can go here : http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html Regards, Mohammad Tariq On Mon, Oct 15, 2012 at 1:39 PM, Adrian Acosta Mitjans amitj...@estudiantes.uci.cu wrote: hi, I want to know how I can configure hadoop if I want to use in the
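For the question above, the usual knob in Hadoop 1.0.3 is dfs.data.dir in hdfs-site.xml: it takes a comma-separated list of directories, and it can differ per datanode because each node reads its own copy of the file. A sketch, with hypothetical mount points:

```xml
<!-- hdfs-site.xml on one datanode; the paths are illustrative
     and can be different on each node's local copy of this file. -->
<property>
  <name>dfs.data.dir</name>
  <value>/data1/hdfs/data,/data2/hdfs/data</value>
</property>
```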

Hadoop - the irony

2012-10-15 Thread Kartashov, Andy
Gents, Let’s not forget about fun. This is an awesome parody clip on Hadoop. Funny, yet quite informative: http://www.youtube.com/watch?v=hEqQMLSXQlY Rgds, AK NOTICE: This e-mail message and any attachments are confidential, subject to copyright and may be privileged. Any unauthorized use,

Re: Hadoop - the irony

2012-10-15 Thread Joseph Chiu
That hit too close to home... On Mon, Oct 15, 2012 at 8:48 AM, Kartashov, Andy andy.kartas...@mpac.ca wrote: Gents, Let's not forget about fun. This is an awesome parody clip on Hadoop. Funny, yet quite informative: http://www.youtube.com/watch?v=hEqQMLSXQlY Rgds, AK

Re: Hadoop - the irony

2012-10-15 Thread Patai Sangbutsarakum
Thanks for sharing; it brings a big smile to my face on a horrible Monday morning. On Mon, Oct 15, 2012 at 8:56 AM, Joseph Chiu joec...@joechiu.com wrote: That hit too close to home... On Mon, Oct 15, 2012 at 8:48 AM, Kartashov, Andy andy.kartas...@mpac.ca wrote: Gents, Let's not forget

Re: Hadoop - the irony

2012-10-15 Thread Matthew John
Truly awesome! Thanks for sharing. :) On Mon, Oct 15, 2012 at 9:48 PM, Patai Sangbutsarakum silvianhad...@gmail.com wrote: Thanks for sharing; it brings big smile on my face in horrible monday morning. On Mon, Oct 15, 2012 at 8:56 AM, Joseph Chiu joec...@joechiu.com wrote: That hit too

PriorityQueueWritable

2012-10-15 Thread Aseem Anand
Hi, Is anyone familiar with a PriorityQueueWritable to be used to pass data from mapper to reducers ? Regards, Aseem

problem using s3 instead of hdfs

2012-10-15 Thread Parth Savani
Hello, I am trying to run Hadoop on S3 in distributed mode. However, I am having issues running my job successfully on it, and I get the following error. I followed the instructions provided in this article: http://wiki.apache.org/hadoop/AmazonS3 I replaced the fs.default.name value in my

Re: Reading Sequence File from Hadoop Distributed Cache ..

2012-10-15 Thread Mark Olimpiati
I'll try that, thanks for the suggestion Steve! Mark On Fri, Oct 12, 2012 at 11:27 AM, Steve Loughran ste...@hortonworks.com wrote: On 11 October 2012 20:53, Mark Olimpiati markq2...@gmail.com wrote: Thanks for the reply Harsh, but as I said I tried locally too by using the following:

Re: PriorityQueueWritable

2012-10-15 Thread Chris Nauroth
I think it would work, but I'm wondering if it would be easier for your application to restructure the keys emitted from the mapper tasks so that you can take advantage of the sorting inherently done during the shuffle. For each reduce task, your reducer code will receive keys emitted from

Re: PriorityQueueWritable

2012-10-15 Thread Chris Nauroth
Also, another advantage in trying to make use of the shuffle/sort is that your sorted list can grow beyond the size of memory. A risk in trying to pack this data into a sorted ArrayWritable is that the list would grow too large to fit in memory. Thanks, --Chris On Mon, Oct 15, 2012 at 11:37 AM,
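For reference, the PriorityQueueWritable the original question asks about would mostly boil down to serializing the queue contents in write()/readFields(). A minimal round-trip sketch using only java.io (class and method names are hypothetical, and it holds ints only for brevity; a real Writable would implement org.apache.hadoop.io.Writable instead):

```java
import java.io.*;
import java.util.PriorityQueue;

// Sketch of the serialization logic a PriorityQueueWritable would need:
// write the element count, then each element; rebuild the heap on read.
public class PriorityQueueWritableSketch {
    public static byte[] serialize(PriorityQueue<Integer> q) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(q.size());
        // Iteration order over a PriorityQueue is unspecified;
        // that is fine because the heap invariant is re-established on read.
        for (int v : q) {
            out.writeInt(v);
        }
        out.flush();
        return bytes.toByteArray();
    }

    public static PriorityQueue<Integer> deserialize(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        PriorityQueue<Integer> q = new PriorityQueue<>();
        for (int i = 0; i < n; i++) {
            q.add(in.readInt());
        }
        return q;
    }

    public static void main(String[] args) throws IOException {
        PriorityQueue<Integer> q = new PriorityQueue<>();
        q.add(5); q.add(1); q.add(3);
        PriorityQueue<Integer> copy = deserialize(serialize(q));
        // poll() drains the queue smallest-first
        System.out.println(copy.poll() + "," + copy.poll() + "," + copy.poll());
    }
}
```

As Chris notes, though, anything that has to fit such a queue in memory loses to letting the shuffle do the sorting.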

GroupingComparator

2012-10-15 Thread Alberto Cordioli
Hi all, a very strange thing is happening with my Hadoop program. My map simply emits tuples with a custom object as key (which implements WritableComparable). The object is made of 2 fields, and I implemented my partitioner and grouping class in such a way that only the first field is taken into

Re: final the dfs.replication and fsck

2012-10-15 Thread Chris Nauroth
Hello Patai, Has your configuration file change been copied to all nodes in the cluster? Are there applications connecting from outside of the cluster? If so, then those clients could have separate configuration files or code setting dfs.replication (and other configuration properties). These

Re: final the dfs.replication and fsck

2012-10-15 Thread Harsh J
Hey Chris, The dfs.replication param is an exception to the final config feature. If one uses the FileSystem API, one can pass in any short value they want the replication to be. This bypasses the configuration, and the configuration (being per-file) is also client-side. The right way for an
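The cluster-side guard Harsh alludes to is dfs.replication.max in hdfs-site.xml, which the namenode enforces regardless of what a client requests. A sketch, assuming you want to cap replication at 3:

```xml
<!-- hdfs-site.xml on the namenode: reject create/setReplication
     requests above this factor, whatever the client asks for. -->
<property>
  <name>dfs.replication.max</name>
  <value>3</value>
</property>
```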

RE: Issue when clicking on BrowseFileSystem

2012-10-15 Thread Kartashov, Andy
Andy, My /etc/hosts does say: 127.0.0.1 localhost.localdomain localhost Shall I delete this entry? The only reference to localhost is in: Core-site: <property> <name>fs.default.name</name> <value>hdfs://localhost:8020</value> </property> Mapred-site: <property>

Suitability of HDFS for live file store

2012-10-15 Thread Matt Painter
Hi, I am a new Hadoop user, and would really appreciate your opinions on whether Hadoop is the right tool for what I'm thinking of using it for. I am investigating options for scaling an archive of around 100 TB of image data. These images are typically TIFF files of around 50-100 MB each and need

Re: Issue when clicking on BrowseFileSystem

2012-10-15 Thread Andy Isaacson
Oh, you've *configured* localhost as your hostname in the hadoop *.xml files. Yes, that'll result in the behavior you're seeing. I was assuming you were using a hostname that other machines can resolve. For example running on my laptop I use adit420 (which is what the laptop calls itself). When

RE: Issue when clicking on BrowseFileSystem

2012-10-15 Thread Kartashov, Andy
Andy, Thanks. Glad I asked. I run Hadoop in pseudo-distributed mode on an Amazon instance in the cloud. Shall I change localhost in both core-site and mapred-site to my-host-name? P.S. I can see the Namenode status through the web interface at http://my-host-name:50070, however I cannot access

Re: Suitability of HDFS for live file store

2012-10-15 Thread Brock Noland
Hi, Generally I do not see a problem with your plan of using HDFS to store these files, assuming they are updated rarely if ever. Hadoop is traditionally a batch system and MapReduce largely remains a batch system. I'd argue this because minimum job latencies are in the seconds range. HDFS,

Re: Suitability of HDFS for live file store

2012-10-15 Thread Harsh J
Hey Matt, What do you mean by 'real-time' though? While HDFS has pretty good contiguous data read speeds (and you get N x replicas to read from), if you're looking to cache frequently accessed files into memory then HDFS does not natively have support for that. Otherwise, I agree with Brock,

Re: Suitability of HDFS for live file store

2012-10-15 Thread Matt Painter
Thanks guys; really appreciated. I was deliberately vague about the notion of real-time because I didn't know which metrics make Hadoop considered a batch system - if that makes sense! Essentially, the speed of access to the files stored in HDFS needs to be comparable to files

Re: Suitability of HDFS for live file store

2012-10-15 Thread Jay Vyas
Seems like a heavyweight solution unless you are actually processing the images? Wow, no MapReduce, no streaming writes, and relatively small files. I'm surprised that you are considering Hadoop at all. I'm surprised there isn't a simpler solution that uses redundancy without all the daemons and

Example of secondary sort using Avro data.

2012-10-15 Thread Ravi P
Hello Group, Is there any sample code/documentation available on writing map-reduce jobs with secondary sort using Avro data? -- Thanks, Ravi

Re: Example of secondary sort using Avro data.

2012-10-15 Thread Harsh J
Hi Ravi, Avro questions are best asked at user@avro lists. I've moved your question there. Take a look at Jacob's responses at http://search-hadoop.com/m/woY9Gz8Qyz1 for a detailed take on how to setup the comparators. On Tue, Oct 16, 2012 at 1:54 AM, Ravi P hadoo...@outlook.com wrote: Hello

Re: GroupingComparator

2012-10-15 Thread Alberto Cordioli
Hi Dave, thanks for your reply. Now it's clearer; in fact the code that I wrote was inspired by the old API, where the behavior is different. So, how can I achieve the same behavior as the old API? I need the second field of the first key object to stay the same across the iterations, in order to

RE: Example of secondary sort using Avro data.

2012-10-15 Thread Ravi P
Harsh - thanks for the link, but this question is related to Hadoop secondary sort, so I emailed it to the Hadoop user group. I think other people in the Hadoop user group who have faced the same issue will be able to answer this. I appreciate your promptness. From: ha...@cloudera.com Date: Tue, 16 Oct

Re: final the dfs.replication and fsck

2012-10-15 Thread Patai Sangbutsarakum
Thanks Harsh, dfs.replication.max does do the magic!! On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth cnaur...@hortonworks.com wrote: Thank you, Harsh. I did not know about dfs.replication.max. On Mon, Oct 15, 2012 at 12:23 PM, Harsh J ha...@cloudera.com wrote: Hey Chris, The

Re: Fair scheduler.

2012-10-15 Thread Patai Sangbutsarakum
Thanks for the input, I am reading the document; I forgot to mention that I am on CDH3u4. If you point your poolname property to mapred.job.queue.name, then you can leverage the per-queue ACLs. Does that mean that if I plan on 3 fair scheduler pools, I have to configure 3 queues of the capacity scheduler?

Re: Suitability of HDFS for live file store

2012-10-15 Thread Goldstone, Robin J.
If the goal is simply an alternative to SAN for cost-effective storage of large files you might want to take a look at Gluster. It is an open source scale-out distributed filesystem that can utilize local storage. Also, it has distributed metadata and a POSIX interface and can be accessed

Re: possible resource leak in capacity scheduler

2012-10-15 Thread Radim Kolar
Updating to trunk fixed this, or at least I cannot reproduce it anymore.

Re: Suitability of HDFS for live file store

2012-10-15 Thread Ted Dunning
If you are going to mention commercial distros, you should include MapR as well. Hadoop compatible, very scalable, and handles very large numbers of files in a POSIX-ish environment. On Mon, Oct 15, 2012 at 1:35 PM, Brian Bockelman bbock...@cse.unl.edu wrote: Hi, We use HDFS to process data

Re: Fair scheduler.

2012-10-15 Thread Harsh J
Hi Patai, Reply inline. On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum silvianhad...@gmail.com wrote: Thanks for input, I am reading the document; i forget to mention that i am on cdh3u4. That version should have the support for all of this. If you point your poolname property to
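A hedged sketch of the mapred-site.xml wiring being discussed, using the CDH3-era fair scheduler property names; the queue names here are illustrative, not from the thread:

```xml
<!-- Derive the fair scheduler pool name from the job's queue,
     so the queue-level ACLs apply to the pool. -->
<property>
  <name>mapred.fairscheduler.poolnameproperty</name>
  <value>mapred.job.queue.name</value>
</property>
<!-- One queue per intended pool (names are hypothetical). -->
<property>
  <name>mapred.queue.names</name>
  <value>default,analytics,etl</value>
</property>
```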

Re: possible resource leak in capacity scheduler

2012-10-15 Thread Vinod Kumar Vavilapalli
Which version are you running? Can you enable debug logging for the RM and see what's happening? Thanks, +Vinod Kumar Vavilapalli Hortonworks Inc. http://hortonworks.com/ On Oct 15, 2012, at 2:44 AM, Radim Kolar wrote: i have simple 2 node cluster. one node with 2 GB and second with 1 GB RAM

Re: Issue when clicking on BrowseFileSystem

2012-10-15 Thread Andy Isaacson
On Mon, Oct 15, 2012 at 1:00 PM, Kartashov, Andy andy.kartas...@mpac.ca wrote: I run Hadoop in pseudo-distrib on amazon instance in the cloud. Shall I change localhost in both core-site and mapred-site to my my-host-name? It depends on the desired behavior, but generally it's easiest if you
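A sketch of the change under discussion, assuming a hostname other machines can resolve (my-host-name here, matching the placeholder used in the thread) and the default ports mentioned earlier in it:

```xml
<!-- core-site.xml -->
<property>
  <name>fs.default.name</name>
  <value>hdfs://my-host-name:8020</value>
</property>
<!-- mapred-site.xml (port 8021 is illustrative; use your jobtracker port) -->
<property>
  <name>mapred.job.tracker</name>
  <value>my-host-name:8021</value>
</property>
```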

Re: Suitability of HDFS for live file store

2012-10-15 Thread Vinod Kumar Vavilapalli
For your original use case, HDFS indeed sounded like an overkill. But once you start thinking of thumbnail generation, PDFs etc, MapReduce obviously fits the bill. If you wish to do stuff like streaming the stored digital films, clearly, you may want to move your serving somewhere else that

Re: final the dfs.replication and fsck

2012-10-15 Thread Patai Sangbutsarakum
Just want to share; check if this makes sense. A job failed to run after I restarted the namenode, and the cluster stopped complaining about under-replication. This is what I found in the log file: Requested replication 10 exceeds maximum 2 java.io.IOException: file

Re: DFS respond very slow

2012-10-15 Thread Vinod Kumar Vavilapalli
Try picking up a single operation say hadoop dfs -ls and start profiling. - Time the client JVM is taking to start. Enable debug logging on the client side by exporting HADOOP_ROOT_LOGGER=DEBUG,CONSOLE - Time between the client starting and the namenode audit logs showing the read request.
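Vinod's first two steps can be sketched as shell commands; they assume a working hadoop client on the PATH and a running cluster, so treat this as illustrative:

```shell
# Enable client-side debug logging for subsequent commands
export HADOOP_ROOT_LOGGER=DEBUG,console

# Time a single listing to separate JVM startup cost from RPC latency;
# compare the timestamps against the namenode audit log afterwards.
time hadoop dfs -ls /
```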

Re: Question about namenode HA

2012-10-15 Thread Todd Lipcon
Hi Liang, Answers inline below. On Sun, Oct 14, 2012 at 8:01 PM, 谢良 xieli...@xiaomi.com wrote: Hi Todd and other HA experts, I have two questions: 1) why is the zkfc a separate process; I mean, what's the primary design consideration that we didn't integrate the zkfc features into the namenode itself

Re: DFS respond very slow

2012-10-15 Thread Andy Isaacson
Also, note that JVM startup overhead, etc, means your -ls time is not completely unreasonable. Using OpenJDK on a cluster of VMs, my hdfs dfs -ls takes 1.88 seconds according to time (and 1.59 seconds of user CPU time). I'd be much more concerned about your slow transfer times. On the same

Re: DFS respond very slow

2012-10-15 Thread Ted Dunning
Uhhh... Alexey, did you really mean that you are running 100 megabit-per-second network links? That is going to make Hadoop run *really* slowly. Also, putting RAID under any DFS, be it Hadoop or MapR, is not a good recipe for performance. Not that it matters if you only have 10 megabytes per

Re: DFS respond very slow

2012-10-15 Thread Vinod Kumar Vavilapalli
I just realized one more thing. You mentioned the disk is 700 GB RAID. How many disks overall? What RAID configuration? Usually we advocate JBOD with Hadoop to avoid the performance hits of RAID, and let HDFS itself take care of replication. Maybe you are running into this? Thanks, +Vinod On Oct

RE: Secure hadoop and group permission on HDFS

2012-10-15 Thread Zheng, Kai
Hi Koert, Harsh, Regarding LdapGroupsMapping, I have questions: 1. Is it possible to use ShellBasedUnixGroupsMapping for Hadoop service principals/users, and LdapGroupsMapping for end-user accounts? In our environment, normal end users (along with their group info) for the Hadoop cluster

Re: final the dfs.replication and fsck

2012-10-15 Thread Harsh J
Patai, My bad - that was on my mind but I missed noting it down in my earlier reply. Yes, you'd have to control that as well; 2 should be fine for smaller clusters. On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum silvianhad...@gmail.com wrote: Just want to share check if this is make
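The job-client knob Harsh appears to be referring to is mapred.submit.replication, the replication factor for submitted job files (it defaults to 10, which matches the "Requested replication 10" in the error). A hedged mapred-site.xml sketch matching his suggestion of 2:

```xml
<!-- Keep job-file replication at or below dfs.replication.max
     so job submission does not trip the namenode's cap. -->
<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
```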