Hi Ivan,
Did you solve your problem?
I have the same issue. I can run Hadoop commands after a kinit with a local
principal (@CLUSTER.HADOOP.DEV), but it doesn't work with an AD user
(@AD.HADOOP.DEV).
Could you help me?
Thanks
Guillaume Polaert | Cyrès Conseil
-Original Message-
From:
I have a Windows Hadoop cluster consisting of 8 slave nodes and 1 master node. My
Hadoop program is a collection of recursive jobs. I create 14 map and 14
reduce tasks in each job. My files are up to 10 MB.
My problem is that all jobs hang at the end: Map 100% Reduce
100% is shown on the command prompt.
It's working now. I hadn't configured the property
<name>hadoop.security.auth_to_local</name>
for the AD realm.
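(For anyone hitting the same thing - a minimal sketch of such a mapping in
core-site.xml, assuming the AD.HADOOP.DEV realm from this thread:)

  <property>
    <name>hadoop.security.auth_to_local</name>
    <value>
      RULE:[1:$1@$0](.*@AD\.HADOOP\.DEV)s/@.*//
      RULE:[2:$1@$0](.*@AD\.HADOOP\.DEV)s/@.*//
      DEFAULT
    </value>
  </property>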
Guillaume Polaert | Cyrès Conseil
-Original Message-
From: Guillaume Polaert [mailto:gpola...@cyres.fr]
Sent: Monday, October 15, 2012 12:08
To: ivan.fr...@gmail.com;
Reduce has three phases: shuffle, sort, and reduce.
So, 33% implies the end of the shuffle phase, and 66% refers to the end of the
sort phase.
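(In other words, the reported percentage weights the three phases equally. A
hedged sketch of the arithmetic, not the actual framework code:)

  // Each phase's progress is in [0, 1]; the three are weighted equally.
  public class ReduceProgress {
      static float progress(float shuffle, float sort, float reduce) {
          return (shuffle + sort + reduce) / 3.0f; // 1/3 = shuffle done, 2/3 = sort done
      }
  }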
Thanks,
+Vinod
On Oct 15, 2012, at 2:32 PM, Jay Vyas wrote:
Hi guys!
We all know that there are major milestones in reducers (33%, 66%)
In
Hi Harsh,
Thanks for the link to SGD from Mahout.
I have asked a question about an issue with using SGD; a description is below.
Ted Dunning mentioned there may be some issue with the data encoding.
However, I am not able to pinpoint the issue. Could you please let me know whether the
issue is its format or
Hi Rajesh,
You may want to use the Mahout mailing list for Mahout-related questions.
http://mahout.apache.org/mailinglists.html
Regards
Bertrand
On Mon, Oct 15, 2012 at 2:34 PM, Rajesh Nikam rajeshni...@gmail.com wrote:
Hi Harsh,
Thanks for the link to SGD from Mahout.
I have asked
Hi, I want to know how I can configure Hadoop to use different partitions
on different datanodes.
Try Google.
Or you can go here:
http://cloudfront.blogspot.in/2012/07/how-to-configure-hadoop.html
Regards,
Mohammad Tariq
On Mon, Oct 15, 2012 at 1:39 PM, Adrian Acosta Mitjans
amitj...@estudiantes.uci.cu wrote:
Hi, I want to know how I can configure Hadoop to use different partitions
Gents,
Let’s not forget about fun. This is an awesome parody clip on Hadoop. Funny,
yet quite informative:
http://www.youtube.com/watch?v=hEqQMLSXQlY
Rgds,
AK
That hit too close to home...
On Mon, Oct 15, 2012 at 8:48 AM, Kartashov, Andy andy.kartas...@mpac.ca wrote:
Gents,
Let’s not forget about fun. This is an awesome parody clip on Hadoop.
Funny, yet quite informative:
http://www.youtube.com/watch?v=hEqQMLSXQlY
Rgds,
AK
Thanks for sharing; it brought a big smile to my face on a horrible Monday morning.
On Mon, Oct 15, 2012 at 8:56 AM, Joseph Chiu joec...@joechiu.com wrote:
That hit too close to home...
On Mon, Oct 15, 2012 at 8:48 AM, Kartashov, Andy andy.kartas...@mpac.ca
wrote:
Gents,
Let’s not forget
Truly awesome! Thanks for sharing. :)
On Mon, Oct 15, 2012 at 9:48 PM, Patai Sangbutsarakum
silvianhad...@gmail.com wrote:
Thanks for sharing; it brought a big smile to my face on a horrible Monday
morning.
On Mon, Oct 15, 2012 at 8:56 AM, Joseph Chiu joec...@joechiu.com wrote:
That hit too
Hi,
Is anyone familiar with a PriorityQueueWritable that can be used to pass data
from mappers to reducers?
Regards,
Aseem
Hello,
I am trying to run Hadoop on S3 in distributed mode. However, I am
having trouble running my job successfully on it. I get the following error.
I followed the instructions provided in this article -
http://wiki.apache.org/hadoop/AmazonS3
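(For context, the S3 block-store setup from that wiki page looks roughly like
this; the bucket name and keys are placeholders:)

  <property>
    <name>fs.default.name</name>
    <value>s3://YOUR-BUCKET</value>
  </property>
  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>YOUR_ACCESS_KEY_ID</value>
  </property>
  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>YOUR_SECRET_ACCESS_KEY</value>
  </property>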
I replaced the fs.default.name value in my
I'll try that. Thanks for the suggestion, Steve!
Mark
On Fri, Oct 12, 2012 at 11:27 AM, Steve Loughran ste...@hortonworks.com wrote:
On 11 October 2012 20:53, Mark Olimpiati markq2...@gmail.com wrote:
Thanks for the reply, Harsh, but as I said I tried locally too by using
the following:
I think it would work, but I'm wondering if it would be easier for your
application to restructure the keys emitted from the mapper tasks so that
you can take advantage of the sorting inherently done during the shuffle.
For each reduce task, your reducer code will receive keys emitted from
Also, another advantage of making use of the shuffle/sort is that
your sorted list can grow beyond the size of memory. A risk of trying to
pack this data into a sorted ArrayWritable is that the list could grow too
large to fit in memory (see the sketch below).
Thanks,
--Chris
On Mon, Oct 15, 2012 at 11:37 AM,
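(A hedged sketch of the composite-key idea Chris describes; the class and
field names are hypothetical, not from this thread:)

  import java.io.DataInput;
  import java.io.DataOutput;
  import java.io.IOException;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.WritableComparable;

  // Emit the field you want sorted as part of the key and let the shuffle
  // sort it; a partitioner/grouping comparator then re-groups by natural key.
  public class CompositeKey implements WritableComparable<CompositeKey> {
      private final Text naturalKey = new Text();                 // grouped on
      private final LongWritable sortField = new LongWritable();  // sorted on

      public void write(DataOutput out) throws IOException {
          naturalKey.write(out);
          sortField.write(out);
      }

      public void readFields(DataInput in) throws IOException {
          naturalKey.readFields(in);
          sortField.readFields(in);
      }

      public int compareTo(CompositeKey other) {
          int cmp = naturalKey.compareTo(other.naturalKey);
          return cmp != 0 ? cmp : sortField.compareTo(other.sortField);
      }
  }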
Hi all,
a very strange thing is happening with my Hadoop program.
My map simply emits tuples with a custom object as key (which
implements WritableComparable).
The object is made of 2 fields, and I implement my partitioner and
grouping class in such a way that only the first field is taken into
Hello Patai,
Has your configuration file change been copied to all nodes in the cluster?
Are there applications connecting from outside of the cluster? If so, then
those clients could have separate configuration files or code setting
dfs.replication (and other configuration properties). These
Hey Chris,
The dfs.replication param is an exception to the final config
feature. If one uses the FileSystem API, one can pass in any short
value they want the replication to be. This bypasses the
configuration, and the configuration (being per-file) is also
client-side.
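(A hedged illustration of that per-file, client-side override; the path is
hypothetical:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ReplicationOverride {
      public static void main(String[] args) throws Exception {
          FileSystem fs = FileSystem.get(new Configuration());
          // Any client may request its own replication factor per file,
          // regardless of the cluster's dfs.replication setting:
          fs.setReplication(new Path("/user/data/part-00000"), (short) 10);
      }
  }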
The right way for an
Andy,
My /etc/hosts does say: 127.0.0.1 localhost.localdomain localhost
Shall I delete this entry?
The only reference to localhost is in:
Core-site:
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:8020</value>
</property>
Mapred-site:
<property>
Hi,
I am a new Hadoop user, and would really appreciate your opinions on
whether Hadoop is the right tool for what I'm thinking of using it for.
I am investigating options for scaling an archive of around 100 TB of image
data. These images are typically TIFF files of around 50-100 MB each and
need
Oh, you've *configured* localhost as your hostname in the hadoop
*.xml files. Yes, that'll result in the behavior you're seeing.
I was assuming you were using a hostname that other machines can
resolve. For example, running on my laptop I use adit420 (which is
what the laptop calls itself). When
Andy,
Thanks. Glad I asked.
I run Hadoop in pseudo-distributed mode on an Amazon instance in the cloud. Shall I change
localhost in both core-site and mapred-site to my my-host-name?
p.s. I can see the NameNode status through the web interface using the URL
http://my-host-name:50070, however I cannot access
Hi,
Generally I do not see a problem with your plan of using HDFS to store
these files, assuming they are updated rarely if ever. Hadoop is
traditionally a batch system and MapReduce largely remains a batch
system. I'd argue this because minimum job latencies are in the
seconds range. HDFS,
Hey Matt,
What do you mean by 'real-time' though? While HDFS has pretty good
contiguous data read speeds (and you get N x replicas to read from),
if you're looking to cache frequently accessed files into memory
then HDFS does not natively have support for that. Otherwise, I agree
with Brock,
Thanks guys; really appreciated.
I was deliberately vague about the notion of real-time because I didn't
know what the metrics are that made Hadoop be considered a batch system -
if that makes sense!
Essentially, the speed of access to the files stored in HDFS needs to be
comparable to files
Seems like a heavyweight solution unless you are actually processing the
images?
Wow, no MapReduce, no streaming writes, and relatively small files. I'm
surprised that you are considering Hadoop at all?
I'm surprised there isn't a simpler solution that uses redundancy without all
the
daemons and
Hello Group, Are there any sample code/documentation available on writing
MapReduce jobs with secondary sort using Avro data? --Thanks, Ravi
Hi Ravi,
Avro questions are best asked on the user@avro list. I've moved your
question there.
Take a look at Jacob's responses at
http://search-hadoop.com/m/woY9Gz8Qyz1 for a detailed take on how to
setup the comparators.
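(In non-Avro terms, the wiring usually looks like this; the partitioner and
comparator classes here are hypothetical stand-ins for your own:)

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class SecondarySortSetup {
      public static Job configure(Configuration conf) throws Exception {
          Job job = new Job(conf);
          job.setPartitionerClass(NaturalKeyPartitioner.class);   // route by natural key
          job.setSortComparatorClass(FullKeyComparator.class);    // sort by full composite key
          job.setGroupingComparatorClass(NaturalKeyGroupingComparator.class); // group by natural key
          return job;
      }
  }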
On Tue, Oct 16, 2012 at 1:54 AM, Ravi P hadoo...@outlook.com wrote:
Hello
Hi Dave,
thanks for your reply. Now it's clearer; in fact the code that I
wrote was inspired by the old API, where the behavior is different.
So, how can I achieve the same behavior as with the old API? I need the
second field of the first key object to stay the same across the
iterations, in order to
Harsh - thanks for the link, but this question is related to Hadoop secondary
sort, so I emailed it to the Hadoop user group. I think other people who have faced
the same issue in the Hadoop user group will be able to answer this. I appreciate your
promptness.
From: ha...@cloudera.com
Date: Tue, 16 Oct
Thanks Harsh, dfs.replication.max does do the magic!!
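(For anyone searching later, the cap goes in hdfs-site.xml. A sketch; the
value here is just illustrative:)

  <property>
    <name>dfs.replication.max</name>
    <value>2</value>
  </property>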
On Mon, Oct 15, 2012 at 1:19 PM, Chris Nauroth cnaur...@hortonworks.com wrote:
Thank you, Harsh. I did not know about dfs.replication.max.
On Mon, Oct 15, 2012 at 12:23 PM, Harsh J ha...@cloudera.com wrote:
Hey Chris,
The
Thanks for the input.
I am reading the document; I forgot to mention that I am on CDH3u4.
If you point your poolname property to mapred.job.queue.name, then you
can leverage the Per-Queue ACLs
Does that mean that if I plan to have 3 pools in the fair scheduler, I have to
configure 3 queues in the capacity scheduler?
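(For reference, that pointer is a one-line change in mapred-site.xml. A
sketch using the CDH3-era fair scheduler property name:)

  <property>
    <name>mapred.fairscheduler.poolnameproperty</name>
    <value>mapred.job.queue.name</value>
  </property>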
If the goal is simply an alternative to SAN for cost-effective storage of large
files you might want to take a look at Gluster. It is an open source scale-out
distributed filesystem that can utilize local storage. Also, it has distributed
metadata and a POSIX interface and can be accessed
Updating to trunk fixed this, or at least I cannot reproduce it anymore.
If you are going to mention commercial distros, you should include MapR as
well. Hadoop-compatible, very scalable, and handles very large numbers of
files in a POSIX-ish environment.
On Mon, Oct 15, 2012 at 1:35 PM, Brian Bockelman bbock...@cse.unl.edu wrote:
Hi,
We use HDFS to process data
Hi Patai,
Reply inline.
On Tue, Oct 16, 2012 at 2:57 AM, Patai Sangbutsarakum
silvianhad...@gmail.com wrote:
Thanks for the input.
I am reading the document; I forgot to mention that I am on CDH3u4.
That version should have the support for all of this.
If you point your poolname property to
Which version are you running? Can you enable debug logging for the RM and see
what's happening?
Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On Oct 15, 2012, at 2:44 AM, Radim Kolar wrote:
I have a simple 2-node cluster: one node with 2 GB and a second with 1 GB of RAM.
On Mon, Oct 15, 2012 at 1:00 PM, Kartashov, Andy andy.kartas...@mpac.ca wrote:
I run Hadoop in pseudo-distributed mode on an Amazon instance in the cloud. Shall I
change localhost in both core-site and mapred-site to my my-host-name?
It depends on the desired behavior, but generally it's easiest if you
For your original use case, HDFS indeed sounded like overkill. But once you
start thinking of thumbnail generation, PDFs, etc., MapReduce obviously fits the
bill.
If you wish to do stuff like streaming the stored digital films, clearly, you
may want to move your serving somewhere else that
Just want to share; check if this makes sense.
The job failed to run after I restarted the namenode, and the cluster
stopped complaining about under-replication.
This is what I found in the log file:
Requested replication 10 exceeds maximum 2
java.io.IOException: file
Try picking a single operation, say hadoop dfs -ls, and start profiling.
- Time the client JVM takes to start. Enable debug logging on the client
side by exporting HADOOP_ROOT_LOGGER=DEBUG,CONSOLE
- Time between the client starting and the namenode audit logs showing the
read request.
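(A hedged example of the first measurement from a shell:)

  export HADOOP_ROOT_LOGGER=DEBUG,CONSOLE
  time hadoop dfs -ls /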
Hi Liang,
Answers inline below.
On Sun, Oct 14, 2012 at 8:01 PM, 谢良 xieli...@xiaomi.com wrote:
Hi Todd and other HA experts,
I have two questions:
1) Why is the ZKFC a separate process? I mean, what's the primary design
consideration for not integrating the ZKFC features into the namenode itself
Also, note that JVM startup overhead, etc., means your -ls time is not
completely unreasonable. Using OpenJDK on a cluster of VMs, my hdfs
dfs -ls takes 1.88 seconds according to time (and 1.59 seconds of
user CPU time).
I'd be much more concerned about your slow transfer times. On the
same
Uhhh... Alexey, did you really mean that you are running 100 megabit per
second network links?
That is going to make Hadoop run *really* slowly.
Also, putting RAID under any DFS, be it Hadoop or MapR, is not a good recipe
for performance. Not that it matters if you only have 10 megabytes per
I just realized one more thing. You mentioned the disk is 700 GB RAID. How many
disks overall? What RAID configuration? Usually we advocate JBOD with Hadoop to
avoid performance hits from RAID, and let HDFS itself take care of replication.
Maybe you are running into this?
Thanks,
+Vinod
On Oct
Hi Koert and Harsh,
Regarding LdapGroupsMapping, I have questions:
1. Is it possible to use ShellBasedUnixGroupsMapping for Hadoop service
principals/users, and LdapGroupsMapping for end user accounts?
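(For context: the mapping implementation is chosen cluster-wide by a single
property in core-site.xml, shown below; whether it can be split per class of
user is exactly the open question here:)

  <property>
    <name>hadoop.security.group.mapping</name>
    <value>org.apache.hadoop.security.LdapGroupsMapping</value>
  </property>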
In our environment, normal end users (along with their groups info) for Hadoop
cluster
Patai,
My bad - that was on my mind but I missed noting it down in my earlier
reply. Yes, you'd have to control that as well. 2 should be fine for
smaller clusters.
On Tue, Oct 16, 2012 at 5:32 AM, Patai Sangbutsarakum
silvianhad...@gmail.com wrote:
Just want to share; check if this makes