Is it possible to set FS permissions (e.g. 755) in hdfs-site.xml?
--
Best regards,
Can FSDataOutputStream write to a file on a remote host?
--
Best regards,
I suppose you mean the default FS permissions. You can use dfs.umask for that.
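As a rough sketch of doing the same thing from client code (paths and values here are
made up; in recent releases the property is called fs.permissions.umask-mode):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // default umask for files/dirs created by this client; 022 yields 755 dirs / 644 files
        conf.set("fs.permissions.umask-mode", "022");

        FileSystem fs = FileSystem.get(conf);
        // explicitly set 755 on an existing (hypothetical) path
        fs.setPermission(new Path("/user/pedro/data"), new FsPermission((short) 0755));
        fs.close();
    }
}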
Thanks,
+Vinod Kumar Vavilapalli
Hortonworks Inc.
http://hortonworks.com/
On Mar 28, 2013, at 5:23 AM, Pedro Sá da Costa wrote:
Is it possible to set FS permissions (e.g. 755) in hdfs-site.xml?
--
Best
I'm moving off AWS MapReduce to our own cluster, I'm installing Hadoop on
Ubuntu Server 12.10.
I see a .deb installer and installed that, but it seems like files are all
over the place: `/usr/share/Hadoop`, `/etc/hadoop`, `/usr/bin/hadoop`. And
the documentation is a bit harder to follow:
Apache Bigtop has builds done for Ubuntu;
you can check them on the Jenkins server mentioned at bigtop.apache.org
On Thu, Mar 28, 2013 at 11:37 AM, David Parks davidpark...@yahoo.comwrote:
I’m moving off AWS MapReduce to our own cluster, I’m installing Hadoop on
Ubuntu Server 12.10.
I see
The DistributedCache is cleaned automatically and no user intervention
(aside from size limitation changes, which may be an administrative
requirement) is generally required to delete the older distributed
cache files.
This is observable in the code and is also noted in TDG, 2nd ed., by
Tom White:
The
The EMR distributions have special versions of the s3 file system. They
might be helpful here.
Of course, you likely aren't running those if you are seeing 5MB/s.
An extreme alternative would be to light up an EMR cluster, copy to it,
then to S3.
On Thu, Mar 28, 2013 at 4:54 AM, Himanish
Another Ted piping in.
For Hadoop use, it is dangerous to use anything but a static class for your
mapper and reducer functions, since you may accidentally think that you can
access a closed-over variable from the parent. A static class cannot reference
those values, so you know that you haven't made
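A hedged illustration of that point (class and field names made up):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountJob {
    // driver-side state; an inner (non-static) mapper could silently close over this
    private int driverOnlyValue = 42;

    // static nested class: it cannot reference instance fields of WordCountJob,
    // so the compiler stops you from assuming driver state is visible in the tasks
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}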
Also, Canonical just announced that MapR is available in the Partner repos.
On Thu, Mar 28, 2013 at 7:22 AM, Nitin Pawar nitinpawar...@gmail.comwrote:
Apache Bigtop has builds done for Ubuntu;
you can check them on the Jenkins server mentioned at bigtop.apache.org
On Thu, Mar 28, 2013 at 11:37 AM,
When a client writes data, if there are three replicas, the sync method
latency formula should be:
sync method latency = first datanode receive time + second
datanode receive time + third datanode receive time.
If each of the three datanodes' receive times is 2
Hi all,
The Fair Scheduler link is not included on the documentation index page for hadoop 2.x
the way the Capacity Scheduler link is, e.g.
http://hadoop.apache.org/docs/r2.0.3-alpha/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
Should we add it if it can be used experimentally?
Regards,
Kai
Hi everyone,
how can I know which keys are associated with a particular reducer in
the setup method?
Let's assume that in the setup method I read from a file where each line
is a string that will become a key emitted by the mappers.
For each of these lines I would like to know whether the string will be a
Hi,
Not sure if I am answering your question, but this is the background. Every
MapReduce job has a partitioner associated with it. The default partitioner
is a HashPartitioner. You can, as a user, write your own partitioner as well
and plug it into the job. The partitioner is responsible for
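As a sketch of what a custom partitioner looks like (the class and routing rule
here are made up):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// routes keys by their first character instead of the default hashCode()
public class FirstCharPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numPartitions) {
        if (key.getLength() == 0) {
            return 0;
        }
        return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
    }
}

It is plugged into the job with job.setPartitionerClass(FirstCharPartitioner.class).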
Hi Dave,
Thanks for your reply. Our hadoop instance is inside our corporate
LAN. Could you please provide some details on how I could use the s3distcp
tool from Amazon to transfer data from our on-premises hadoop to Amazon S3?
Wouldn't some kind of VPN be needed between the Amazon EMR instance and our
Hi
Does someone know something about the EMC distribution for Big Data which integrates
Hadoop and other tools?
Thanks
Hi Hemanth,
thanks for your reply.
Yes, this partially answered my question. I know how the hash
partitioner works and I guessed something similar.
The piece that I missed was that mapred.task.partition returns the
partition number of the reducer.
So, putting all the pieces together I understand
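Putting it into code, a sketch of the setup-side filtering might look like this
(the keys file location is made up, and I'm assuming the default HashPartitioner):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class FilteringReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final Set<String> myKeys = new HashSet<String>();

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        // which reduce partition this task is
        int partition = context.getConfiguration().getInt("mapred.task.partition", 0);
        int numReducers = context.getNumReduceTasks();

        // keep only the keys that HashPartitioner would send to this reducer
        FileSystem fs = FileSystem.get(context.getConfiguration());
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/tmp/keys.txt"))));
        String line;
        while ((line = reader.readLine()) != null) {
            if ((new Text(line).hashCode() & Integer.MAX_VALUE) % numReducers == partition) {
                myKeys.add(line);
            }
        }
        reader.close();
    }
}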
Hmm. That feels like a join. Can't you read the input file on the map side
and output those keys along with the original map output keys? That way
the reducer would automatically get both together.
On Thu, Mar 28, 2013 at 5:20 PM, Alberto Cordioli
cordioli.albe...@gmail.com wrote:
Hi
Yes, that is a possible solution.
But since the MR job has another scope, the mappers already read other
files (very large) and output tuples.
You cannot control the number of mappers, and hence the risk is that a
lot of mappers will be created, and each of them would also read the other
file instead of
I solved the problem using the Capacity Scheduler, because I'm using 1.0.4.
It is a known issue, solved in version 1.2.0 (
https://issues.apache.org/jira/browse/MAPREDUCE-4398).
On Thu, Mar 28, 2013 at 11:08 AM, Bertrand Dechoux decho...@gmail.comwrote:
Permission denied: user=*realtime*,
You can get detail information from the Greenplum website:
http://www.greenplum.com/products/pivotal-hd
2013/3/28 oualid ait wafli oualid.aitwa...@gmail.com
Hi
Does someone know something about the EMC distribution for Big Data which integrates
Hadoop and other tools?
Thanks
First, when a client wants to write data to HDFS, it creates a
DFSOutputStream.
Then the client writes data to this output stream, and the stream
transfers the data to all DataNodes along the constructed pipeline by means
of Packets, each 64KB in size.
These two operations are concurrent, so the
You can try to add some probes to the source code and recompile it.
If you want to know the keys and values at each step, you can add
print statements to the map() function of your Mapper class and the reduce()
function of your Reducer class.
The shortcoming is that you will produce a lot of log output, which may fill the
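A minimal sketch of that idea (names made up), just to show where the print goes:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TracingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // ends up in the task's stdout log, viewable from the JobTracker web UI
        System.out.println("map input: key=" + key + " value=" + value);
        context.write(value, key);
    }
}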
Thanks Harsh. My issue was not related to the number of files/folders
but related to the total size of the DistributedCache. The directory
where it's stored only has 7GB available... So I will set the limit
to 5GB with local.cache.size, or move it to the drives where I have
the dfs files stored.
you could use WebHDFS/HttpFS and the APPEND operation.
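A rough sketch of the two-step APPEND call from Java (host, port, path and user
are placeholders; the NameNode answers the first POST with a redirect to a DataNode):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class WebHdfsAppend {
    public static void appendChunk(byte[] chunk) throws Exception {
        // step 1: ask the NameNode where to send the data (307 redirect, no body)
        URL nn = new URL("http://namenode:50070/webhdfs/v1/user/test/data.bin?op=APPEND&user.name=hdfs");
        HttpURLConnection c1 = (HttpURLConnection) nn.openConnection();
        c1.setRequestMethod("POST");
        c1.setInstanceFollowRedirects(false);           // we want the Location header
        String dataNodeUrl = c1.getHeaderField("Location");
        c1.disconnect();

        // step 2: POST the chunk to the DataNode URL from the redirect
        HttpURLConnection c2 = (HttpURLConnection) new URL(dataNodeUrl).openConnection();
        c2.setRequestMethod("POST");
        c2.setDoOutput(true);
        OutputStream out = c2.getOutputStream();
        out.write(chunk);
        out.close();
        System.out.println("append response: " + c2.getResponseCode());
        c2.disconnect();
    }
}

Calling appendChunk() once per chunk gives you the by-chunks upload; curl can issue the
same two requests.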
thx
On Wed, Mar 27, 2013 at 1:25 AM, 小学园PHP xxy-...@qq.com wrote:
I want to put a file to HDFS with curl, and moreover I need to put it in chunks.
So, does somebody know if curl can upload a file in chunks?
Or, has anyone worked with WebHDFS by
Hi,
I am trying to run a hadoop streaming job, where I want to specify a
mapper script residing on HDFS. Currently it's trying to locate the
script on the local FS only. Is there an option available through which I
can tell hadoop streaming to look for the mapper script on HDFS,
not on the local FS?
In BigTop's wiki, you can find this:
https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop+0.5.0#HowtoinstallHadoopdistributionfromBigtop0.5.0-Ubuntu%2864bit%2Clucid%2Cprecise%2Cquantal%29
2013/3/28 Ted Dunning tdunn...@maprtech.com
Also, Canonical
Hello,
I've been running a virtualized CDH 4.2 cluster. I now want to migrate all
my data to another (this time physical) set of slaves and then stop using
the virtualized slaves.
I added the new physical slaves in the cluster, and marked all the old
virtualized slaves as decommissioned using
Felix,
After changing hdfs-site.xml, did you run hadoop dfsadmin -refreshNodes? That
should have been enough, but you can try increasing the replication factor of
these files, waiting for them to be replicated to the new nodes, and then setting it
back to its original value.
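In code the replication bump could look roughly like this (path and factors are
made up; hadoop fs -setrep does the same from the shell):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BumpReplication {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/data/important-file");
        fs.setReplication(file, (short) 5);  // raise it so the new nodes receive copies
        // ... wait for re-replication to finish (check with fsck), then restore it:
        fs.setReplication(file, (short) 3);
        fs.close();
    }
}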
Cheers,
Marcos
In
Yes, I didn't specify how I was testing my changes, but basically, here's
what I did:
My hdfs-site.xml file was modified to include a reference to a file
containing a list of all datanodes (via dfs.hosts) and a reference to a
file containing decommissioned nodes (via dfs.hosts.exclude). After
Did you check if you have any disk that is read-only on the nodes that have
the missing blocks? If you know which blocks they are, you can manually copy
the blocks and the corresponding '.meta' files to another node. Hadoop will
re-read those blocks and replicate them.
-
On Mar 28, 2013,
Which hadoop version did you use?
On Mar 29, 2013 5:24 AM, Felix GV fe...@mate1inc.com wrote:
Yes, I didn't specify how I was testing my changes, but basically, here's
what I did:
My hdfs-site.xml file was modified to include a reference to a file
containing a list of all datanodes (via
Hi,
I am facing a weird problem.
My python scripts were working just fine.
I made a few modifications,
then tested via:
cat input.txt | python mapper.py | sort | python reducer.py
which runs just fine.
I also ran it on my local machine (pseudo-distributed mode).
That also runs just fine.
Then I deployed it on the cluster..
Now,
I'm using the version of hadoop in CDH 4.2, which is a version of Hadoop
2.0 with a bunch of patches on top...
I've tried copying one block and its .meta file to one of my new DN, then
restarted the DN service, and it did pick up the missing block and
replicate it properly within the new slaves.
Very much like this:
http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file
On Thu, Mar 28, 2013 at 5:10 PM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
I am facing a weird problem.
My python scripts were working just fine.
I
On Mar 28, 2013, at 7:13 PM, Felix GV fe...@mate1inc.com wrote:
I'm using the version of hadoop in CDH 4.2, which is a version of Hadoop 2.0
with a bunch of patches on top...
I've tried copying one block and its .meta file to one of my new DN, then
restarted the DN service, and it did
oops never mind guys..
figured out the issue.
sorry for spamming.
On Thu, Mar 28, 2013 at 5:15 PM, jamal sasha jamalsha...@gmail.com wrote:
Very much like this:
http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file
On Thu, Mar
Thanks Yanbo for your reply.
My test code is:
FSDataOutputStream outputStream = fs.create(path);
Random r = new Random();
long totalBytes = 0;
String str = new String(new byte[1024]);
while (totalBytes < 1024 * 1024 * 500) {
byte[] bytes =
Hi, I'm using hadoop 1.0.4.
Today I wanted to delete a file in hdfs, but after a while, the file reappears
again.
I used both types of remove command: hadoop fs -rm and hadoop fs -rmr, but the
file still reappears after a while.
I inspected the namenode log and saw repetition of block/dir/removing lease
The write method writes data to the client's memory, and the sync method sends
packets to the pipeline. I think you made a mistake in understanding the write
procedure of HDFS.
It's right that the write method writes data to the client's memory; however,
the data in the client memory is sent to the DataNodes at the
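Roughly, in client code the distinction looks like this (a sketch; sync() is
called hflush() in newer APIs):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WriteVsSync {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataOutputStream out = fs.create(new Path("/tmp/sync-demo"));
        byte[] data = new byte[1024];
        out.write(data);   // buffered on the client; not yet guaranteed to be at the DataNodes
        out.sync();        // pushes the buffered packets down the DataNode pipeline
        out.close();
        fs.close();
    }
}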
Hi,
The way I understand your requirement - you have a file that contains a set
of keys. You want to read this file on every reducer and take only those
entries of the set, whose keys correspond to the current reducer.
If the above summary is correct, can I assume that you are potentially
Sorry!
Todd has already reviewed it.
On Fri, Mar 29, 2013 at 11:40 AM, Azuryy Yu azury...@gmail.com wrote:
hi,
who can review this one:
https://issues.apache.org/jira/browse/HDFS-4631
thanks.
Sorry, please ignore my question. It appears that the problem is in the program
that uploads files into hadoop.
I was too quick to assume that the problem lies inside hadoop.
Sorry for being a noob at hadoop.
Best regards,
Henry Hung.
From: MA11 YTHung1
Sent: Friday, March 29, 2013 11:21 AM
To:
None of that complexity, they distribute the jar publicly (not the source,
but the jar). You can just add this to your libjars:
s3n://region.elasticmapreduce/libs/s3distcp/latest/s3distcp.jar
No VPN or anything, if you can access the internet you can get to S3.
Follow their docs here: