I'm once again finding myself in this same situation, but still have no
solution. I have 4 task trackers that were blacklisted at the job
level. I've since fixed the issue that got them blacklisted. But the
job still isn't assigning them tasks. Is there any way to clear the
blacklist at
Simple: have your app encrypt the field prior to writing to HDFS.
Also consider HBase.
On Feb 14, 2013, at 10:35 AM, abhishek abhishek.dod...@gmail.com wrote:
Hi all,
We have some sensitive data in some particular fields (columns). Can I
know how to handle sensitive data in Hadoop?
Regards, abhishek.
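(Not from the original thread.) A minimal sketch of what "encrypt the field prior to writing to HDFS" could look like on the application side, assuming a symmetric AES key managed by the app; the class name, field layout, and key handling are illustrative only:

import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

// Hypothetical helper: encrypt a single sensitive column value before the
// record containing it is written out to HDFS.
public class FieldEncryptor {
    private final SecretKey key;

    public FieldEncryptor(SecretKey key) {
        this.key = key;
    }

    public String encryptField(String plaintext) throws Exception {
        // Kept short for the sketch; a real deployment should use an
        // authenticated mode (e.g. AES/GCM) and a properly managed key.
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] enc = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
        return Base64.getEncoder().encodeToString(enc);
    }

    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey();
        FieldEncryptor fe = new FieldEncryptor(key);
        // Only the sensitive column is encrypted; the rest of the record stays readable.
        System.out.println("user123," + fe.encryptField("4111-1111-1111-1111"));
    }
}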
I agree with Michael. You can encrypt your incoming data from your
application.
I recommend using HBase too.
- Original Message -
From: Michael Segel michael_se...@hotmail.com
To: common-user@hadoop.apache.org
CC: cdh-u...@cloudera.org
Sent: Friday, February 15,
Hi, I am a CS undergraduate working with Hadoop. I wrote a library to process
logs; my input directory has the following structure:
logs_hourly
├── dt=2013-02-15
│ ├── ts=1360887451
│ │ └── syslog-2013-02-15-1360887451.gz
│ └── ts=1360891051
│ └── syslog-2013-02-15-1360891051.gz
├──
You should be able to use
http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
to achieve this. It supports subdirectory creation (under the main job
output directory). However, the special chars may be an issue (e.g. -,
=, etc.), for which you'll either need
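(Illustrative sketch, not from the thread.) Using the new-API org.apache.hadoop.mapreduce.lib.output.MultipleOutputs, a mapper can write under dt=/ts= subdirectories of the job output directory roughly like this; the partition values are hard-coded only for the example, and the special-character caveat above still applies:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Writes each record under a dt=/ts= subdirectory of the job output directory.
public class PartitionedLogMapper
    extends Mapper<LongWritable, Text, NullWritable, Text> {

  private MultipleOutputs<NullWritable, Text> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<NullWritable, Text>(context);
  }

  @Override
  protected void map(LongWritable key, Text line, Context context)
      throws IOException, InterruptedException {
    // In a real job these values would be derived from the input split / record.
    String basePath = "dt=2013-02-15/ts=1360887451/part";
    mos.write(NullWritable.get(), line, basePath);
  }

  @Override
  protected void cleanup(Context context)
      throws IOException, InterruptedException {
    mos.close();
  }
}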
Hello Charles,
Thanks a lot for your reply and help!
Yes, the NN data (image, edit files) is kept separate from the data files.
dfs.name.dir=/hadoop/hdfs/name whereas dfs.data.dir=/hadoop/hdfs/data.
Their contents match the description you specified.
Can I safely go ahead and delete all
Hey Arko. It should be safe to delete then.
-Chuck
-Original Message-
From: Arko Provo Mukherjee [mailto:arkoprovomukher...@gmail.com]
Sent: Friday, February 15, 2013 11:56 AM
To: hdfs-user@hadoop.apache.org
Subject: Re: Managing space in Master Node
Hello Charles,
Thanks a lot for
I am trying to upload (using the -archives option of ToolRunner) a .zip archive
so that it is unzipped locally. I have tried all possible combinations of
command line options, getResource()/DistributedCache.getSmthng() with no luck.
I have found a post suggesting that in Yarn this is all
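(Sketch under assumptions, not from the thread.) With the classic behaviour of -archives, passing a fragment alias such as -archives myconf.zip#confdir makes the framework unpack the zip on each node and create a "confdir" symlink in the task's working directory, which the task can then read as a plain local directory; the names below are hypothetical:

import java.io.File;

// Hypothetical check run inside a task (e.g. in Mapper.setup()), assuming the
// job was submitted with "-archives myconf.zip#confdir" via ToolRunner.
public class ArchivePeek {
  public static void listLocalizedArchive() {
    File unpacked = new File("confdir");   // the #fragment alias from -archives
    if (unpacked.isDirectory()) {
      for (File f : unpacked.listFiles()) {
        System.out.println("localized file: " + f.getPath());
      }
    }
  }
}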
Hello,
I am new to Hadoop. I have installed Ubuntu i386 on my VM and want to debug
Hadoop. Can you please guide me as to which stable release I should download
and exactly how I should use the ANT debugger to debug Hadoop.
Also, I would like to know how MapReduce interacts with other
Hi,
I have been struggling for many days to install Oozie 3.3.1 on Hadoop 1.1.1. The Oozie
documentation is very poorly written and I am not able to figure it out. While
installing I got an error saying it doesn't support Hadoop v1.1.1. Please
help me out.
--
Regards
Ananda Prakash Verma
Hi,
I am new to Hadoop and I set up the Hadoop cluster with the help of Michael
Noll's multi-node setup (
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/).
When I set up the single-node Hadoop, everything works fine.
But in the multi-node setup I found that
Hi Janesh,
I think your SNN may be starting up with the wrong IP; surely the machine
parameter should say 192.168.0.101?
http://namenode:50070/getimage?putimage=1
http://namenode:50070/getimage?putimage=1&port=50090&machine=0.0.0.0&token=
-32:1989419481:0:136084943:1360849122845
Are you trying to run a Map/Reduce job or are you writing a new YARN
application? If it is a MR job, then it should work mostly the same as
before (on 1.x). If you are writing a new YARN application then there is a
separate Map in the ContainerLaunchContext that you need to fill in.
--Bobby
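(Rough sketch, not from the thread; the resource name and path are made up.) For a custom YARN application, one of the maps to fill in on the ContainerLaunchContext is localResources; registering an archive there makes the NodeManager unpack it into the container's working directory:

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;
import org.apache.hadoop.yarn.util.Records;

public class ArchiveLocalizer {
  // Registers an HDFS .zip as an ARCHIVE resource so the NodeManager
  // unpacks it under "myarchive" in the container's working directory.
  public static ContainerLaunchContext buildContext(Configuration conf, Path archiveOnHdfs)
      throws Exception {
    FileSystem fs = archiveOnHdfs.getFileSystem(conf);
    FileStatus status = fs.getFileStatus(archiveOnHdfs);

    LocalResource archive = Records.newRecord(LocalResource.class);
    archive.setResource(ConverterUtils.getYarnUrlFromPath(archiveOnHdfs));
    archive.setSize(status.getLen());
    archive.setTimestamp(status.getModificationTime());
    archive.setType(LocalResourceType.ARCHIVE);
    archive.setVisibility(LocalResourceVisibility.APPLICATION);

    Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
    localResources.put("myarchive", archive);  // name = link created in the container dir

    ContainerLaunchContext clc = Records.newRecord(ContainerLaunchContext.class);
    clc.setLocalResources(localResources);
    return clc;
  }
}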
It seems you have an extra ':' before the first '/' in your URIs.
thx
Alejandro
(phone typing)
On Feb 15, 2013, at 8:22 AM, Dhanasekaran Anbalagan bugcy...@gmail.com wrote:
Hi Guys,
We have two clusters running CDH4.0.1. I am trying to copy data from one cluster
to another cluster.
It says
I would suggest you read Hadoop: The Definitive Guide, 2nd Edition by Tom
White. I too started a few weeks back and am still learning it. :) Hope you like it
too.
From: SrinivasaRao Kongar [mailto:ksrinu...@gmail.com]
Sent: Thursday, February 14, 2013 11:38 PM
To: user@hadoop.apache.org
Subject:
I might contact them but we are specifically avoiding EMR for this project. We
have already successfully deployed EMR but we want more precise control over
the cluster, namely the ability to persist and reawaken it on demand. We
really want a direct Hadoop installation instead of an EMR-based
Hi Marcos and Keith,
Thanks for bringing this to our attention. Saurabh is currently OOF, so I’ll
pass this along to the EMR team.
Jeff;
From: Marcos Ortiz Valmaseda [mailto:mlor...@uci.cu]
Sent: Thursday, February 14, 2013 7:10 PM
To: user@hadoop.apache.org
Cc: Saurabh Baji; Barr, Jeffrey
I don't think you can do an embarrassingly parallel sort of a randomly
ordered file without merging results.
However, if you know that the file is pseudo-ordered:
1123
1232
1000
19991019
20200222
30111
3000
Then you can (maybe) sort the individual blocks in mappers using
This book is actually a good way to start. But I would suggest you
go for the 3rd edition; the 2nd edition covers the old API.
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Fri, Feb 15, 2013 at 10:26 PM, Shah, Rahul1 rahul1.s...@intel.com wrote:
I would suggest
HI,
In one of our test clusters that has NameNode HA using QJM + YARN + HBase 0.94,
the NameNode came down with the following logs. I am trying to root-cause the issue.
Any help is appreciated.
=
2013-02-13 10:18:27,521 INFO hdfs.StateChange - BLOCK*
NameSystem.fsync: file
This link might help :
http://www.cloudera.com/content/cloudera/en/developer-community/developer-admin-resources/new-to-hadoop.html#books
On Fri, Feb 15, 2013 at 1:17 PM, Mohammad Tariq donta...@gmail.com wrote:
This book is actually a good way to start. But I would suggest you
go
Yes, I know, Keith. I know that you want more control over your Hadoop cluster,
so I recommend three things:
- You can use Whirr to manage your Hadoop cluster installations on EC2 [1]
- You can create your own Hadoop-focused AMI based on your requirements (my
favorite choice here)
- Or
I don't see a crash log in your snippets. Mind pastebinning the NN
crash log up somewhere? Did both NNs go down?
In any case, the log below is due to a client attempting to connect
with an older HDFS library. This would log such warnings (and also
indicate the client IP and attempt port, as you noticed),
Regards, Dheeren. It seems that you are using an incompatible version of HDFS
with this
version of HBase. Can you provide the exact version of your HBase package?
- Original Message -
From: Dheeren Bebortha dbebor...@salesforce.com
To: user@hadoop.apache.org
Sent: Friday, 15
Why not?
Who said you had to parallelize anything?
On Feb 15, 2013, at 12:09 PM, Jay Vyas jayunit...@gmail.com wrote:
I don't think you can do an embarrassingly parallel sort of a randomly
ordered file without merging results.
However, if you know that the file is pseudo-ordered:
Well... OK... I guess you could have a 1TB block do an in-place sort on the
file, write it to a tmp directory, and then spill the records in order or
something. At that point you might as well not use Hadoop.
Why do you need a 1TB block?
On Feb 15, 2013, at 1:29 PM, Jay Vyas jayunit...@gmail.com wrote:
Well... OK... I guess you could have a 1TB block do an in-place sort on the
file, write it to a tmp directory, and then spill the records in order or
something. At that point you might as well not
Maybe I'm mistaken about what is meant by map-only. Does a map-only job
still result in the standard shuffle-sort? Or does that get cut short?
Hmmm, I think I see what you mean; I guess a map-only sort is possible as
long as you use a custom partitioner and you let the shuffle/sort run to
A map-only job does not result in the standard shuffle-sort. Map outputs
are written directly to HDFS.
-Sandy
On Fri, Feb 15, 2013 at 12:23 PM, Jay Vyas jayunit...@gmail.com wrote:
Maybe I'm mistaken about what is meant by map-only. Does a map-only job
still result in the standard shuffle-sort?
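(Illustrative sketch, not from the thread.) On the driver side, a map-only job is simply one with zero reduce tasks, so map output goes straight through the configured OutputFormat to HDFS with no shuffle or sort:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "map-only example");
    job.setJarByClass(MapOnlyDriver.class);
    // Zero reducers: no partitioning, no shuffle, no sort phase.
    job.setNumReduceTasks(0);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}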
Hi,
I am trying to do some work with an in-memory join MapReduce implementation.
It can be summarized as a join between two data sets, R and S; one of them is
too large to fit into memory, the other one can fit into memory reasonably
well (size of R >> size of S). The typical implementation
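(Sketch under assumptions, not from the thread.) The usual in-memory (map-side) join pattern being described loads the small set S once per task and streams the large set R through the mappers; this assumes S has already been shipped to each task (for example via the distributed cache) as a local file named small_set.txt, and that both inputs are simple key,value text:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Joins the large set R (the job input) against the small set S held in memory.
public class InMemoryJoinMapper extends Mapper<LongWritable, Text, Text, Text> {
  private final Map<String, String> small = new HashMap<String, String>();

  @Override
  protected void setup(Context context) throws IOException {
    // Load S once per task; it is assumed to fit comfortably in memory.
    BufferedReader reader = new BufferedReader(new FileReader("small_set.txt"));
    String line;
    while ((line = reader.readLine()) != null) {
      String[] kv = line.split(",", 2);
      small.put(kv[0], kv[1]);
    }
    reader.close();
  }

  @Override
  protected void map(LongWritable offset, Text record, Context context)
      throws IOException, InterruptedException {
    String[] kv = record.toString().split(",", 2);
    String match = small.get(kv[0]);
    if (match != null) {
      // Emit the joined record; no reduce phase is required for this join.
      context.write(new Text(kv[0]), new Text(kv[1] + "," + match));
    }
  }
}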
Why not look at HIVE? It already implements the JOIN that you are looking
for and has features to do a MAPJOIN, i.e., load the small file into memory.
On Fri, Feb 15, 2013 at 1:25 PM, Yunming Zhang
zhangyunming1...@gmail.com wrote:
Hi,
I am trying to do some work with an in-memory join MapReduce
This is a typical total sort using map/reduce. It can be done with both
map and reduce.
On Fri, Feb 15, 2013 at 10:39 PM, Arun Vasu arun...@gmail.com wrote:
Hi,
Is it possible to sort a huge text file lexicographically using a
mapreduce job which has only map tasks and zero reduce tasks?
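(Rough sketch, not from the thread.) One common way to run such a total sort with both map and reduce is TotalOrderPartitioner plus a sampled partition file; this assumes Text keys in a SequenceFile input, and all paths and class names are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;

public class TotalSortDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "total sort");
    job.setJarByClass(TotalSortDriver.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapperClass(Mapper.class);      // identity map
    job.setReducerClass(Reducer.class);    // identity reduce
    job.setNumReduceTasks(4);              // each reducer gets a contiguous key range
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    // Sample the input keys to choose split points, then partition by total order.
    TotalOrderPartitioner.setPartitionFile(job.getConfiguration(),
        new Path(args[1] + "_partitions"));
    job.setPartitionerClass(TotalOrderPartitioner.class);
    InputSampler.writePartitionFile(job,
        new InputSampler.RandomSampler<Text, Text>(0.01, 1000));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}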
Use PIG; it has specific directives for in-memory joins of small
data sets. The whole thing might require half a dozen lines
of code.
On 2/15/2013 4:25 PM, Yunming Zhang wrote:
Hi,
I am trying to do some work with an in-memory join MapReduce implementation,
it can be summarized as a join
Specifically, replicated join -
http://pig.apache.org/docs/r0.10.0/perf.html#replicated-joins
On Fri, Feb 15, 2013 at 6:22 PM, David Boyd db...@lorenzresearch.com wrote:
Use PIG; it has specific directives for in-memory joins of small
data sets. The whole thing might require half a dozen
Hi,
I can see that in pom.xml the supported Hadoop version is
<hadoop.version>1.0.1</hadoop.version>
You can try to build it yourself with the version you want to see if it works.
Also try to ask your question on oozie mailing list.
Regards,
Jagat Singh
On Sat, Feb 16, 2013 at 12:45 PM, Hemanth
Hi
I tried to use short-circuit read to improve my HBase cluster MR scan
performance.
I have the following settings in hdfs-site.xml:
dfs.client.read.shortcircuit set to true
dfs.block.local-path-access.user set to the MR job runner.
The cluster is 1+4 nodes
Hi Raymond,
did you enable the security feature in your cluster? There will be no obvious
benefit to be found if so.
Regards,
Liang
___
From: Liu, Raymond [raymond@intel.com]
Sent: February 16, 2013 11:10
To: user@hadoop.apache.org
Subject: why my test result on dfs short
HELLO GOOD MORNING HADOOP KINGS
On Thu, Feb 14, 2013 at 1:25 AM, Yusaku Sako yus...@hortonworks.com wrote:
Hello Seema,
Yes, you can use Apache Ambari to set up and manage a single node cluster.
Yusaku
On Wed, Feb 13, 2013 at 11:48 AM, Hadoop seemami...@gmail.com wrote:
Hi All, Good to
Hi Liang
Did you mean setting dfs.permissions to false?
Is that all I need to do to disable the security feature? Because it seems to me that
without changing dfs.block.local-path-access.user, dfs.permissions alone doesn't
work; HBase still falls back to going through the datanode to read data.
Hi Raymond,
I'm not very clear about your scenario, just a kind reminder: if security is
on, the feature can be used only for a user that has Kerberos credentials at the
client, therefore MapReduce tasks cannot benefit from it in general; see
HDFS-2246's release note for more info.
If you didn't enable
Sorry all, maybe I have the answer because I didn't read the doc carefully.
clustera is for n1, n2
clusterb is for n3, n4
So would the following configuration answer my question, is that okay?
<property>
  <name>dfs.nameservices</name>
  <value>clustera, clusterb</value>
</property>
<property>
If you want HBase to leverage the shortcircuit, the DN config
dfs.block.local-path-access.user should be set to the user running
HBase (i.e. hbase, for example), and the hbase-site.xml should have
dfs.client.read.shortcircuit defined in all its RegionServers. Doing
this wrong could result in
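(Hedged sketch, not from the thread.) In configuration terms the two knobs discussed above look roughly like this; note that dfs.block.local-path-access.user is a DataNode-side property, so it has to be present in the DataNodes' hdfs-site.xml rather than only in the client/RegionServer configuration:

import org.apache.hadoop.conf.Configuration;

// Client/RegionServer-side view of the (CDH4-era) short-circuit read settings.
public class ShortCircuitClientConf {
  public static Configuration build() {
    Configuration conf = new Configuration();
    // Enable local (short-circuit) reads in the DFS client used by HBase.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    // The matching DataNode-side property, set in each DataNode's hdfs-site.xml
    // (shown only as a comment here), must name the user running HBase, e.g.:
    //   dfs.block.local-path-access.user = hbase
    return conf;
  }
}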
Hi Harsh
Yes, I did set both of these, though not in hbase-site.xml but in hdfs-site.xml.
And I have double-confirmed that local reads are performed, since there are no
errors in the datanode logs, and by watching the lo network IO.
If you want HBase to leverage the shortcircuit, the DN config
Hi Arpit Gupta
Yes, this way also confirms that short-circuit read is enabled on my cluster.
13/02/16 14:07:34 DEBUG hdfs.DFSClient: Short circuit read is true
13/02/16 14:07:34 DEBUG hdfs.DFSClient: New BlockReaderLocal for file
It seems to me that, with short-circuit read enabled, BlockReaderLocal reads
data in 512/4096-byte units (checksum check enabled/skipped),
while when it goes through the datanode, BlockSender.sendChunks will read and
send data in 64KB units?
Is that true? And if so, won't it explain that