You can create a list of files for each type and use MultipleInputs[1].
https://hadoop.apache.org/docs/r2.6.3/api/org/apache/hadoop/mapreduce/lib/input/MultipleInputs.html
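For anyone who finds this thread later, a minimal driver sketch of that idea
(the paths and the two mapper classes are illustrative, not from the original
mail; both mappers must emit the same map output types):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MixedInputDriver {

  // Illustrative mapper for the plain-text inputs.
  public static class TextRecordsMapper
      extends Mapper<LongWritable, Text, Text, NullWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(value, NullWritable.get());
    }
  }

  // Illustrative mapper for sequence files assumed to hold Text/Text pairs.
  public static class SeqRecordsMapper
      extends Mapper<Text, Text, Text, NullWritable> {
    @Override
    protected void map(Text key, Text value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(key, NullWritable.get());
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "mixed-input");
    job.setJarByClass(MixedInputDriver.class);
    // One input path per file type, each with its own InputFormat and Mapper.
    MultipleInputs.addInputPath(job, new Path("/data/text"),
        TextInputFormat.class, TextRecordsMapper.class);
    MultipleInputs.addInputPath(job, new Path("/data/seq"),
        SequenceFileInputFormat.class, SeqRecordsMapper.class);
    FileOutputFormat.setOutputPath(job, new Path("/data/out"));
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}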
On Thu, Jun 22, 2017 at 10:30 PM, vivek wrote:
> Thanks!
Thanks!
On Jun 22, 2017 20:15, "Erik Krogen" wrote:
> You would need to write a custom InputFormat which would return an
> appropriate RecordReader based on the file format involved in each
> InputSplit. You can have InputFormat#getSplits load InputSplits for both
> file
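To make the quoted suggestion concrete, here is a rough sketch of an
InputFormat that picks a reader per split by file name. It assumes the two
formats are plain text and sequence files whose records are LongWritable/Text
pairs, so both readers share one key/value type; adjust to your actual formats:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

public class MixedFormatInputFormat extends FileInputFormat<LongWritable, Text> {

  // getSplits() is inherited from FileInputFormat and already produces splits
  // for every input file; only the reader choice differs per split.
  @Override
  public RecordReader<LongWritable, Text> createRecordReader(
      InputSplit split, TaskAttemptContext context)
      throws IOException, InterruptedException {
    String name = ((FileSplit) split).getPath().getName();
    if (name.endsWith(".seq")) {
      // Delegate to the stock sequence-file reader for .seq inputs.
      return new SequenceFileInputFormat<LongWritable, Text>()
          .createRecordReader(split, context);
    }
    // Everything else is treated as plain text: one line per record.
    return new LineRecordReader();
  }
}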
I've described our use case here on Microsoft's forums for SQL Server. I'm
hoping someone out there has used this technology:
+common-user
On Mar 7, 2016, at 3:42 PM, Hitesh Shah wrote:
>
> On Mar 7, 2016, at 1:50 PM, José Luis Larroque wrote:
>
>> Hi again guys, I could finally find what the issue was!!!
>>
>> mapreduce.map.java.opts
>> 256
I'm not sure what they are trying to say with persistent session. A
session in zookeeper has a timeout associated with it. If the server
doesn't hear from the client within the timeout period the session is
expired, and all ephemeral nodes associated with the session are
deleted. This is what
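For readers who want to see the ephemeral-node behaviour described above, a
minimal sketch using the ZooKeeper Java client (the host, port and path are
made up; the session timeout is the value the liveness guarantee hangs on):

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class EphemeralExample {
  public static void main(String[] args) throws Exception {
    // 10 second session timeout: if the server hears nothing from this client
    // for that long, the session expires and the ephemeral node is deleted.
    ZooKeeper zk = new ZooKeeper("zk-host:2181", 10000, new Watcher() {
      public void process(WatchedEvent event) { }
    });
    zk.create("/worker-1", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
    // ... do work; the node disappears automatically if this process dies
    // or stops heartbeating, which is what makes it useful for liveness.
    zk.close();
  }
}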
Replying to my own thread here.
While we got a good handle on the IP based hadoop cluster by using the
settings mentioned above, we are now upgrading the Cloudera 5.1.0 packages
and Yarn.
So far most everything seemed to work well, except that for some reason Yarn
insists on making use of DNS,
harsh, those are just javadocs. i'm talking about the full documentation
(see original post).
On Tue, Jul 29, 2014 at 2:17 PM, Harsh J ha...@cloudera.com wrote:
Precompiled docs are available in the archived tarballs of these
releases, which you can find on:
Jane,
The tarball includes generated release documentation pages as well.
Did you download and look inside?
~ tar tf hadoop-0.22.0.tar.gz | grep cluster_setup | grep html
hadoop-0.22.0/common/docs/cluster_setup.html
On Wed, Jul 30, 2014 at 11:24 PM, Jane Wayne jane.wayne2...@gmail.com wrote:
Precompiled docs are available in the archived tarballs of these
releases, which you can find on:
https://archive.apache.org/dist/hadoop/common/
On Tue, Jul 29, 2014 at 1:36 AM, Jane Wayne jane.wayne2...@gmail.com wrote:
where can i get the old hadoop documentation (e.g. cluster setup, xml
I think your best bet might be to check out a particular release tag for the 0.22
release and look at the docs there. Perhaps you might want to run 'ant
docs' or whatever the target used to be back then.
Cos
On Mon, Jul 28, 2014 at 04:06PM, Jane Wayne wrote:
where can i get the old hadoop
nevermind, i resolved it. the cause was bad instructions on the hadoop
site, or at least unclear/misleading instructions.
this is NOT the way to start slave datanode daemons (NOTICE THE SINGULAR
DAEMON).
$HADOOP_PREFIX/sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR --script
hdfs start datanode
this is
Can you check your dmesg | tail output to see if there are any error
messages from the HDFS fuse client?
On Sat, May 3, 2014 at 11:44 PM, Preetham Kukillaya pkuki...@gmail.com wrote:
Hi,
I'm also getting the same error, i.e. ?- ? ? ? ?? hdfs
after mounting the hadoop file
Like you said, it depends both on the kind of network you have and the type of
your workload.
Given your point about S3, I'd guess your input files/blocks are not large
enough that moving code to data trumps moving data itself to the code. When
that balance tilts a lot, especially when moving
AM
To: common-user@hadoop.apache.org
Subject: Re: Data Locality Importance
Like you said, it depends both on the kind of network you have and the type
of your workload.
Given your point about S3, I'd guess your input files/blocks are not large
enough that moving code to data trumps moving
somehow looks like hive is not able to find hadoop libs
On Fri, Mar 7, 2014 at 11:48 PM, Manish manishbh...@rocketmail.com wrote:
Please look into the below issue help.
Original Message
Subject:Error when connecting Hive
Date: Fri, 07 Mar 2014 20:51:25 +0530
Yes. JobTracker and TaskTracker are gone from all the 2.x release lines.
MapReduce is an application on top of YARN. That is per job - each job launches
its own application, which runs and finishes once its work is done. After that,
you can go look it up in the MapReduce-specific JobHistoryServer.
+Vinod
On
when i go to the job history server
http://hadoop-cluster:19888/jobhistory
i see no map reduce job there. i ran 3 simple mr jobs successfully. i
verified by the console output and hdfs output directory.
all i see on the UI is: No data available in table.
any ideas?
unless there is a
ok, the reason the hadoop jobs were not showing up was that i had not
enabled mapreduce to run as a yarn application.
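for reference, the knob involved is mapreduce.framework.name; on hadoop 2.x a
typical mapred-site.xml entry looks like this (assuming an otherwise default
setup):

<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>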
On Thu, Mar 6, 2014 at 11:45 PM, Jane Wayne jane.wayne2...@gmail.comwrote:
when i go to the job history server
http://hadoop-cluster:19888/jobhistory
i see no map
Hi!
At least I'm not alone with this issue. I'd like to create this ticket, as I
incidentally ran into this again today with a few nodes. :(
On which hadoop version did you run into this issue? I guess it's not
version related.
It would strengthen the ticket if it does not only affect the old hadoop
Hi Gary,
It looks like port 8080 is already taken on your machine by XDB.
You should shut XDB down to free up port 8080 and re-launch the Sandbox VM.
Then you should be able to log in to Ambari using ambari/ambari.
Yusaku
On Sat, Jan 4, 2014 at 3:19 PM, Ted Yu yuzhih...@gmail.com wrote
Hi Sandy,
Thank you so much for the immediate response.
Is there a way to make it happen? Any suggestions will be greatly
appreciated.
Also, can you tell me how any communication happens in the cluster, be it
between RM and nodes or any scenario?
Thanks,
Santosh
PhD Candidate
USF, Tampa, FL
I believe that you could do that through Puppet, or any tool that can
remotely execute some command (e.g. pssh).
2013/12/8 Jay Vyas jayunit...@gmail.com
I want to put a file on all nodes of my cluster, that is locally readable
(not in HDFS).
Assuming that i can't guarantee a FUSE mount or
We ran into this issue as well on our cluster.
+1 for JIRA for that
Alexander, could you please create a JIRA in
https://issues.apache.org/jira/browse/HDFS for that (it is your
observation, so that you should get credit ;). Otherwise, I can do that.
2013/2/12 Alexander Fahlke
Yup , we figured it out eventually.
The artifacts now use the test-jar directive which creates a jar file that you
can reference in mvn using the type tag in your dependencies.
However, fyi, I haven't been able to successfully google for the quintessential
classes in the hadoop test libs like
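For anyone trying to reproduce the test-jar reference, a dependency of this
shape pulls the HDFS test utilities (e.g. MiniDFSCluster) out of the
hadoop-hdfs test jar; the version shown is only illustrative:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.6.0</version>
  <type>test-jar</type>
  <scope>test</scope>
</dependency>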
[Cc bigtop-dev@]
We have stack tests as a part of the Bigtop project. We don't do fault injection
tests like you describe just yet, but that would be a great contribution to the
project.
Cos
On Wed, Oct 16, 2013 at 02:12PM, hdev ml wrote:
Hi all,
Are there automated tests available for testing
Thanks Ravi. The number of nodes isn't a lot but the size is rather large.
Each data node has about 14-16T (560-640T).
For the datanode block scanner, how can I increase its current scan rate
limit (KBps)?
On Sun, Oct 6, 2013 at 11:09 PM, Ravi Prakash ravi...@ymail.com wrote:
Please look at
From: Rita rmorgan...@gmail.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org;
Ravi Prakash ravi...@ymail.com
Sent: Monday, October 7, 2013 5:55 AM
Subject: Re: datanode tuning
Thanks Ravi. The number of nodes isn't a lot but the size is rather large
I got hbase working. The trick was to properly configure the fs.defaultFS and
hbase.rootdir. Other than that hbase does not seem to care about hostname vs
ip address.
Note that I use python to fill my templates, hence the %(hadoop.dfs.master)s
syntax. Here hadoop.dfs.master is an ip address and
Please look at dfs.heartbeat.interval and
dfs.namenode.heartbeat.recheck-interval
40 datanodes is not a large cluster IMHO and the Namenode is capable of
managing 100 times more datanodes.
From: Rita rmorgan...@gmail.com
To: common-user@hadoop.apache.org
The only security is the one provided by the slave/master whitelists (more
dumb proof than attack proof, but still useful to avoid clusters talking to
each other accidentally).
I want to automate the deployment of hadoop clusters through Glu (from
LinkedIn) since we already use it to do single
I've checked it out and it works like that. The problem is, if the two racks
don't have the same capacity, one will have its disk space filled up much
faster than the other (that's what I'm seeing).
If one rack (rack A) has 2 servers of 8 cores with 4 reduce slots each and
the other rack (rack B) has
Marc,
The rack aware script is an artificial concept. Meaning you can tell which
machine is in which rack and that may or may not reflect where the machine is
actually located.
The idea is to balance the number of nodes in the racks, at least on paper. So
you can have 14 machines in rack 1,
Doing that will balance the block writing, but I think here you lose the
concept of physical rack awareness.
Let's say you have 2 physical racks, one with 2 servers and one with 4. If
you artificially tell hadoop that one rack has 3 servers and the other 3, you
are losing the concept of rack
And that's the rub.
Rack awareness is an artificial construct.
If you want to fix it and match the real world, you need to balance the racks
physically.
Otherwise you need to rewrite load balancing to take into consideration the
number and power of the nodes in the rack.
The short answer, it's
.
Thanks,
Junping
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: common-user@hadoop.apache.org
Cc: hadoop-u...@lucene.apache.org
Sent: Thursday, October 3, 2013 8:23:58 PM
Subject: Re: rack awarness unexpected behaviour
Marc,
The rack aware script is an artificial
Is security on? I'm not entirely sure (and I think it might be illuminating to
the rest of us when you work this out, so please email back when you do), but I
am guessing that a code change may be required. I think I remember someone
telling me that hostnames are reverse-lookup'd to verify
Hey Nikhil,
Just tried what you asked for and yes, there are files and folders in
c:/Hadoop/name (folders: current, image, previous.checkpoint, in_use.lock),
and I also tried with the firewall disabled.
One more thing: on the JobTracker UI, when I
click on '0' under
Hi Nikhil,
Appreciate your quick response on this, but the issue still continues. I
believe I have covered all the pointers you have mentioned. Still I am
pasting the portions of the documents so that you can verify.
1. /etc/hosts file, localhost should not be commented, and add ip address.
The
Hi,
I am facing an issue where the map job is stuck at map 0% reduce 0%.
I have installed Hadoop version 1.2.1 and am trying to run on my windows 8
machine using cygwin in pseudo distribution mode. I have followed the
instruction at: http://hadoop.apache.org/docs/stable/single_node_setup.html
Jobs run on the whole cluster. After rebalancing everything is properly
allocated. Then I start running jobs using all the slots of the 2 racks and
the problem starts to happen.
Maybe I'm missing something. When using rack awareness, do you have to
tell the jobs to run in slots from both
When you rebalance, the block is fully written, so the writer locality does
not have to be taken into account (there is no writer anymore), hence it
can rebalance across the racks. That's why jobs asymmetry was the easy
guess. What's your hadoop version by the way? I remember a bug around rack
I'm on cdh3u4 (0.20.2), gonna try to read a bit on this bug
--
View this message in context:
http://lucene.472066.n3.nabble.com/rack-awareness-unexpected-behaviour-tp4086029p4086049.html
Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
Hi Rohit
Did you succeed in running R script from Oozie action?
If so can you share you action configuration?
I am trying to figure out how to run a R script from Oozie
--
View this message in context:
I'm not aware of a bug in 0.20.2 that would not honor the Rack
Awareness, but have you done the two below checks as well?
1. Ensuring JT has the same rack awareness scripts and configuration
so it can use it for scheduling, and,
2. Checking if the map and reduce tasks are being evenly spread
Rack aware is an artificial concept.
Meaning you can define where a node is regardless of its real position in the
rack.
Going from memory, and it's probably been changed in later versions of the
code...
Isn't the replication... Copy on node 1, copy on same rack, third copy on
different rack?
, August 22, 2013 6:57:15 PM
Subject: Re: rack awarness unexpected behaviour
Rack aware is an artificial concept.
Meaning you can define where a node is regardless of its real position in the
rack.
Going from memory, and it's probably been changed in later versions of the
code...
Isn't
thanks, i also tried using HADOOP_PREFIX but that didn't work. I still get
the same error: Could not find or load main class
org.apache.hadoop.hdfs.server.namenode.NameNode
btw, how do we install hadoop-common and hadoop-hdfs?
also, according to this link,
I don't think you ought to be using HADOOP_HOME anymore.
Try unset HADOOP_HOME and then export HADOOP_PREFIX=/opt/hadoop
and retry the NN command.
On Sun, Aug 11, 2013 at 8:50 AM, Jane Wayne jane.wayne2...@gmail.com wrote:
hi,
i have downloaded and untarred hadoop v0.23.9. i am trying to set
any particular reason the 1.1.2 releases were pulled from the mirrors (so
quickly)?
On Aug 4, 2013, at 2:08 PM, Matt Foley ma...@apache.org wrote:
I'm happy to announce that Hadoop version 1.2.1 has passed its release vote
and is now available. It has 18 bug fixes and patches over the
It's still available in archive at
http://archive.apache.org/dist/hadoop/core/. I can put it back on the main
download site if desired, but the model is that the main download site is
for stuff we actively want people to download. Here is the relevant quote
from
regardless of what was written in a wiki somewhere, it is a bit aggressive I
think.
there are a fair number of automated things that link to the former stable
releases that are now broken as they weren't given a grace period to cut over.
not the end of the world or anything. just a bit of a
Chris, there is a stable link for exactly this purpose:
http://www.apache.org/dist/hadoop/core/stable/
--Matt
On Mon, Aug 5, 2013 at 11:43 AM, Chris K Wensel ch...@wensel.net wrote:
regardless of what was written in a wiki somewhere, it is a bit aggressive
I think.
there are a fair number
which will include Windows native compatibility.
My apologies, this was incorrect. Windows has only been integrated to
trunk and branch-2.1.
Thanks,
--Matt
On Sun, Aug 4, 2013 at 2:08 PM, Matt Foley ma...@apache.org wrote:
I'm happy to announce that Hadoop version 1.2.1 has passed its
Hi Manish,
Can you check how many datanode processes are actually running on the machine
using the 'jps' or 'ps' command?
Thanks
Devaraj k
-Original Message-
From: Manish Bhoge [mailto:manishbh...@rocketmail.com]
Sent: 25 July 2013 12:29
To: common-user@hadoop.apache.org
Subject:
in hdfs-site.xml?
From: Devaraj k devara...@huawei.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Sent: Thursday, 25 July 2013 12:41 PM
Subject: RE: Multiple data node and namenode ?
Hi Manish,
Can you check how many data node processes
datanode' shell command to know how many
datanode processes are running at this moment.
Thanks
Devaraj k
-Original Message-
From: Manish Bhoge [mailto:manishbh...@rocketmail.com]
Sent: 25 July 2013 12:56
To: common-user@hadoop.apache.org
Subject: Re: Multiple data node and namenode
To: common-user@hadoop.apache.org
Subject: Re: Multiple data node and namenode ?
Yes, I have changed the hostname and restarted the datanode
Sent via Rocket from my HTC
- Reply message -
From: Devaraj k devara...@huawei.com
To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Subject
We can have it if you are able to process
Sent from my iPhone
On Jul 24, 2013, at 8:12 PM, Kasi Subrahmanyam kasisubbu...@gmail.com wrote:
Hi,
I found that we have a definite API for processing images in hadoop
using HIPI.
Why don't we have the same for videos?
Thanks,
Subbu
Hi,
You could send the file meta info to the map function as key/value through
the split, and then you can read the entire file in your map function.
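A minimal sketch of the "read the whole file in map()" part, assuming a
FileInputFormat-based input that delivers one record per file (and files
small enough to buffer in memory); the names are illustrative:

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

public class WholeFileMapper extends Mapper<LongWritable, Text, Text, NullWritable> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // The split tells us which file this task is reading; open it in full.
    Path path = ((FileSplit) context.getInputSplit()).getPath();
    FileSystem fs = path.getFileSystem(context.getConfiguration());
    byte[] contents = new byte[(int) fs.getFileStatus(path).getLen()];
    FSDataInputStream in = fs.open(path);
    try {
      in.readFully(0, contents);   // the entire file, now process it as a unit
    } finally {
      in.close();
    }
    context.write(new Text(path.getName()), NullWritable.get());
  }
}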
Thanks
Devaraj k
-Original Message-
From: Kasi Subrahmanyam [mailto:kasisubbu...@gmail.com]
Sent: 11 July 2013 13:38
To:
Hi,
It seems mahout-examples-0.7-job.jar depends on other jars/classes.
While running the job's tasks, it is not able to find those classes in the
classpath, and those tasks fail.
You need to provide the dependent jar files while submitting/running the job.
Thanks
Devaraj k
--
View this
Hi Kishore,
As per the given exception, the Node Manager is getting excluded. It might be
that you have listed the Node Manager in the exclude file referenced by
this configuration in the Resource Manager.
Could you check this configuration in RM, is it configured with any file
and that file
Hi Kasi,
I think MapR mailing list is the better place to ask this question.
Thanks
Devaraj k
From: Kasi Subrahmanyam [mailto:kasisubbu...@gmail.com]
Sent: 04 July 2013 08:49
To: common-user@hadoop.apache.org; mapreduce-u...@hadoop.apache.org
Subject: Output Directory not getting created
Hi,
Hi Harsh,
Awesome!! It worked. Thank you so much. Actually, I went through that
ticket before, but the above option was not mentioned there.
Additional Info for others: In the run configuration add
-Djava.security.krb5.realm=yourrealm -Djava.security.krb5.kdc=yourkdc in
VM arguments.
Thanks,
Anil,
Please try the options provided at
https://issues.apache.org/jira/browse/HADOOP-7489.
Essentially, pass JVM system properties (In Eclipse you'll edit the
Run Configuration for this) and add
-Djava.security.krb5.realm=yourrealm
-Djava.security.krb5.kdc=yourkdc and also ensure your Mac's
My reading on Capacity Scheduling is that it controls the number of jobs
scheduled at the level of the cluster.
My issue is not sharing at the level of the cluster - usually my job is the
only one running - but rather sharing at the level of
the individual machine.
Some of my jobs require more memory and
Yes, you're correct that the end-result is not going to be as static
as you expect it to be. FWIW, per node limit configs have been
discussed before (and even implemented + removed):
https://issues.apache.org/jira/browse/HADOOP-5170
On Fri, May 24, 2013 at 1:47 PM, Steve Lewis
Your problem seems to surround available memory and over-subscription. If
you're using a 0.20.x or 1.x version of Apache Hadoop, you probably want to
use the CapacityScheduler to address this for you.
I once detailed how-to, on a similar question here:
http://search-hadoop.com/m/gnFs91yIg1e
On
I found it works with the following.
You need to wrap the call in a PrivilegedExceptionAction:
final String user = "My Identity";
UserGroupInformation uig = UserGroupInformation.createRemoteUser(user);
try {
  return uig.doAs(new PrivilegedExceptionAction<ReturnType>() {
Hi Steve,
You can use fs.permissions.umask-mode to set the appropriate umask.
From: Steve Lewis lordjoe2...@gmail.com
To: common-user common-user@hadoop.apache.org
Sent: Monday, May 20, 2013 9:33 AM
Subject: Default permissions for Hadoop output files
I am
Then you have a problem where the solution is more one of people management
than a technical one.
All of your servers should be using NTP. At a minimum, you have one server
that gets the time from a national (government) time server, and then have all
of the machines in that Data Center use that
Here is the issue -
1 - I am running a Java client on a machine unknown to the cluster - my
default name on this pc is
HYPERCHICKEN\local_admin - the name known to the cluster is slewis
2 - The listed code:
String connectString = "hdfs://" + host + ":" + port + "/";
Configuration config =
Am not sure I'm getting your problem yet, but mind sharing the error
you see specifically? That'd give me more clues.
On Fri, May 17, 2013 at 2:39 PM, Steve Lewis lordjoe2...@gmail.com wrote:
Here is the issue -
1 - I am running a Java client on a machine unknown to the cluster - my
default
What is meant by 'cluster time', and what do you want to achieve?
let me try to clarify. i have a hadoop cluster (e.g. name node, data nodes,
job tracker, task trackers, etc...). all the nodes in this hadoop cluster
use ntp to sync time.
i have another computer (which i have referred to as a
You are searching for a solution in the Hadoop API (where this does not
exist)
thanks, that's all i needed to know.
cheers.
On Fri, May 17, 2013 at 9:17 AM, Niels Basjes ni...@basjes.nl wrote:
Hi,
i have another computer (which i have referred to as a server, since it
is
running
For hadoop, 'cluster time' is the local OS time. You might want to get the
time of the namenode machine but indeed if NTP is correctly used, the local
OS time from your server machine will be the best estimation. If you
request the time from the namenode machine, you will be penalized by the
delay
if NTP is correctly used
that's the key statement. in several of our clusters, NTP setup is kludgy.
note that the professionals administering the cluster are different from
us, the engineers. so, there's a lot of red tape to go through to get
something, trivial or not, fixed. we have noticed that
and please remember, i stated that although the hadoop cluster uses NTP,
the server (the machine that is not a part of the hadoop cluster) cannot
be assumed to be using NTP (and in fact, doesn't).
On Fri, May 17, 2013 at 10:10 AM, Jane Wayne jane.wayne2...@gmail.comwrote:
if NTP is correctly used
yes, but that gets the current time on the server, not the hadoop cluster.
i need to be able to probe the date/time of the hadoop cluster.
On Tue, May 14, 2013 at 5:09 PM, Niels Basjes ni...@basjes.nl wrote:
I made a typo. I meant API (instead of SPI).
Have a look at this for more
If you have all nodes using NTP then you can simply use the native Java SPI
to get the current system time.
On Tue, May 14, 2013 at 4:41 PM, Jane Wayne jane.wayne2...@gmail.comwrote:
hi all,
is there a way to get the current time of a hadoop cluster via the
api? in particular, getting the
niels,
i'm not familiar with the native java spi. spi = service provider
interface? could you let me know if this spi is part of the hadoop
api? if so, which package/class?
but yes, all nodes on the cluster are using NTP to synchronize time.
however, the server (which is not a part of the hadoop
I made a typo. I meant API (instead of SPI).
Have a look at this for more information:
http://stackoverflow.com/questions/833768/java-code-for-getting-current-time
If you have a client that is not under NTP then that should be the way to
fix your issue.
Once you have that getting the current
Hi Steve,
A normally-written client program would work normally on both
permissions and no-permissions clusters. There is no concept of a
password for users in Apache Hadoop as of yet, unless you're dealing
with a specific cluster that has custom-implemented it.
Setting a specific user is not
Hi
I am new to Hadoop world. Can you please let me know what is a hadoop stack?
Thanks,
Burberry
On Mon, Apr 22, 2013 at 10:19 AM, Keith Wiley kwi...@keithwiley.com wrote:
Simple question: When I issue a hadoop fs -du command and/or when I view
the namenode web UI to see HDFS disk
Hi Keith,
The fs -du computes length of files, and would not report replicated
on-disk size. HDFS disk utilization OTOH, is the current, simple
report of used/free disk space, which would certainly include
replicated data.
On Mon, Apr 22, 2013 at 10:49 PM, Keith Wiley kwi...@keithwiley.com
To update on this, it was just pointed out to me by matt farrallee
that the auto fix of permissions is a failsafe
in case of a race condition, and not meant to mend bad permissions in all cases:
https://github.com/apache/hadoop-common/commit/f25dc04795a0e9836e3f237c802bfc1fe8a243ad
you can set the property fs.trash.interval
From: Artem Ervits
Date: 2013-04-02 05:04
To: common-user@hadoop.apache.org
Subject: Protect from accidental deletes
Hello all,
I'd like to know what users are doing to protect themselves from accidental
deletes of files and directories in HDFS?
Hi Artem,
right now HDFS has a trash functionality that moves files removed with
'hadoop dfs -rm' to an intermediate directory (/trash). You can configure how
much time a file spends in that directory before it's actually removed from the
filesystem. Look for 'fs.trash.interval' on your
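For example, a core-site.xml entry like this keeps deleted files in the trash
for 24 hours before they are permanently removed (the value is in minutes;
0 disables the trash):

<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>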
From your email header:
List-Unsubscribe: mailto:common-user-unsubscr...@hadoop.apache.org
On Wed, Mar 13, 2013 at 10:42 AM, Alex Luya alexander.l...@gmail.com wrote:
can't find a way to unsubscribe from this list.
--
Harsh J
any thought?
On Wed, Mar 13, 2013 at 7:17 PM, Rita rmorgan...@gmail.com wrote:
i am planning to build an hdfs cluster primarily for streaming large files
(10g avg size). I was wondering if anyone can recommend a good hardware
vendor.
--
--- Get your facts first, then you can distort them as
The problem is resolved in the next release of hadoop (2.0.3-alpha cf.
MAPREDUCE-1700)
For hadoop 1.x based releases/distributions, put
-Dmapreduce.user.classpath.first=true on the hadoop command line and/or
client config
On Tue, Mar 12, 2013 at 6:49 AM, Jane Wayne
check dfs.include in your namenode. Entries in there should resolve to new
addresses.
On Feb 19, 2013, at 18:23, Henry JunYoung KIM henry.jy...@gmail.com wrote:
hi, hadoopers.
Recently, we've moved our clusters to another IDC.
We keep the same host-names, but they now have
Here is a possible solution.
To add a root directory structure such as the following
as the InputPath, do the following:
For the output to mirror the input and to build based on Harsh J's response
You might be able to skip the reducer and use MultipleOutputs from the
mapper directly.
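A minimal map-only sketch of that idea (set the number of reduce tasks to 0
in the driver; the way the output sub-directory is derived here is just one
option and assumes a FileInputFormat-based input):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class MirrorMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
  private MultipleOutputs<NullWritable, Text> out;
  private String subdir;

  @Override
  protected void setup(Context context) {
    out = new MultipleOutputs<NullWritable, Text>(context);
    // Mirror the immediate parent directory of the input file in the output.
    subdir = ((FileSplit) context.getInputSplit()).getPath().getParent().getName();
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Writes under <job output dir>/<subdir>/part-m-xxxxx instead of the default.
    out.write(NullWritable.get(), value, subdir + "/part");
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    out.close();
  }
}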
Michael,
So, as you said, do you want the upstream to encrypt the data before sending
it to HDFS?
Regards
Abhishek
On Feb 15, 2013, at 8:47 AM, Michael Segel michael_se...@hotmail.com wrote:
Simple, have your app encrypt the field prior to writing to HDFS.
Also consider HBase.
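For readers after a concrete starting point, field-level encryption before the
write is plain javax.crypto usage rather than anything Hadoop-specific. A
sketch with a modern JDK (AES-GCM; key management and distribution are the
hard part and are not shown):

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class FieldEncryption {

  // Encrypts one sensitive field; the random IV is prepended to the ciphertext
  // so the field can be decrypted later with the same key.
  public static byte[] encryptField(String field, SecretKey key) throws Exception {
    byte[] iv = new byte[12];
    new SecureRandom().nextBytes(iv);
    Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
    cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
    byte[] ciphertext = cipher.doFinal(field.getBytes(StandardCharsets.UTF_8));
    byte[] out = new byte[iv.length + ciphertext.length];
    System.arraycopy(iv, 0, out, 0, iv.length);
    System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
    return out;
  }

  public static void main(String[] args) throws Exception {
    KeyGenerator kg = KeyGenerator.getInstance("AES");
    kg.init(128);
    SecretKey key = kg.generateKey();
    byte[] encrypted = encryptField("123-45-6789", key);
    System.out.println("encrypted field length: " + encrypted.length);
    // Write 'encrypted' (e.g. Base64-encoded) to HDFS instead of the raw value.
  }
}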
On Feb 14,
too.
De: Michael Segel michael_se...@hotmail.com
Para: common-user@hadoop.apache.org
CC: cdh-u...@cloudera.org
Enviados: Viernes, 15 de Febrero 2013 8:47:16
Asunto: Re: How to handle sensitive data
Simple, have your app encrypt the field prior to writing to HDFS.
Also consider HBase
application.
I recommend using HBase too.
De: Michael Segel michael_se...@hotmail.com
Para: common-user@hadoop.apache.org
CC: cdh-u...@cloudera.org
Enviados: Viernes, 15 de Febrero 2013 8:47:16
Asunto: Re: How to handle sensitive data
Simple, have your app encrypt the field prior
Harsh,
Can we load the file into HDFS with a replication factor of one and lock the file?
Regards
Abhishek
On Feb 22, 2013, at 1:03 AM, Harsh J ha...@cloudera.com wrote:
HDFS does not have such a client-side feature, but your applications
can use Apache Zookeeper to coordinate and implement this on
Hi Abhishek,
I fail to understand what you mean by that; but HDFS generally has no
client-exposed file locking on reads. There's leases for preventing
multiple writers to a single file, but nothing on the read side.
Replication of the blocks under a file is a different concept and is
completely