Hello folks,
I have a lot of input splits (10k-50k; 128 MB blocks) containing text
files. I need to process them line by line, then copy the result into
roughly equal-sized "shards".
So I generate a random key (from the range [0:numberOfShards]) which is
used to route the map output to d
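In code, the mapper side of that idea might look roughly like this (a minimal sketch; the class name and the "shards.count" property are made up, and the driver is assumed to set the number of reducers to the shard count so each reducer writes one shard):

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RandomShardMapper extends Mapper<LongWritable, Text, IntWritable, Text> {
  private final Random random = new Random();
  private final IntWritable shard = new IntWritable();
  private int numberOfShards;

  @Override
  protected void setup(Context context) {
    // Assumed job property; the driver would set it with conf.setInt("shards.count", n).
    numberOfShards = context.getConfiguration().getInt("shards.count", 1);
  }

  @Override
  protected void map(LongWritable key, Text line, Context context)
      throws IOException, InterruptedException {
    // Route each line to a random shard id; the shuffle then groups lines by shard.
    shard.set(random.nextInt(numberOfShards));
    context.write(shard, line);
  }
}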
Is only the same IP printed in all such messages? Can you check the DN
log on that machine to see if it reports any form of issues?
Also, did your jobs fail, or did they keep going despite these hiccups? I notice
you're threading your clients though (?), but I can't tell if that may
cause this without furth
Deepak is right here. The line-reading technique is explained in
further detail at http://wiki.apache.org/hadoop/HadoopMapReduce.
On Fri, Apr 27, 2012 at 2:37 AM, Deepak Nettem wrote:
> HDFS doesn't care about the contents of the file. The file gets divided
> into 64MB Blocks.
>
> For example, If
I had 20 mappers in parallel reading 20 gz files, each around 30-40 MB of
data, over 5 Hadoop nodes, and then writing to the analytics
database. Almost midway through it started to get this error:
2012-04-26 16:13:53,723 [Thread-8] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in createBlockOutputS
HDFS doesn't care about the contents of the file. The file gets divided
into 64MB blocks.
For example, if your input file contains data in a custom format (like
paragraphs) and you want the file to split along paragraph boundaries, HDFS isn't
responsible - and rightly so.
The application developer needs to
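As an illustration of that application-side responsibility, here is a sketch of one way to treat blank-line-separated paragraphs as records, assuming a Hadoop version whose TextInputFormat honours the "textinputformat.record.delimiter" property (the class name is made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class ParagraphInputDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // One record per blank-line-separated paragraph instead of one record per line.
    conf.set("textinputformat.record.delimiter", "\n\n");
    Job job = new Job(conf, "paragraph-input");
    job.setInputFormatClass(TextInputFormat.class);
    // ... set mapper, reducer, and input/output paths as usual, then submit ...
  }
}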
I guess what I meant to say was: how does Hadoop make 64M blocks without
cutting off parts of words at the end of each block? Does it only make blocks
at whitespace?
-SB
-Original Message-
From: Michael Segel [mailto:michael_se...@hotmail.com]
Sent: Thursday, April 26, 2012 1:56 PM
To:
Not sure of your question.
The Java child heap size is independent of how files are split on HDFS.
I suggest you look at Tom White's book on HDFS and how files are split into
blocks.
Blocks are split at set sizes, 64MB by default.
Your record boundaries are not necessarily on file block bounda
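To illustrate the idea (a simplified sketch, not Hadoop's actual LineRecordReader code): each split's reader skips the partial first line, unless it owns the start of the file, and keeps reading until the line that straddles the end of its split is fully consumed, so no words are lost even though the block cut itself is byte-oriented.

import java.io.IOException;
import java.io.RandomAccessFile;

// Simplified illustration of how a line-oriented reader copes with records that
// straddle block/split boundaries; edge cases (e.g. a record starting exactly on
// the boundary) are glossed over.
public class SplitLineReaderSketch {

  static void readSplit(RandomAccessFile file, long start, long length) throws IOException {
    long end = start + length;
    file.seek(start);
    // Every split except the first skips its (possibly partial) first line;
    // the previous split's reader already consumed it.
    if (start != 0) {
      file.readLine();
    }
    String line;
    // Read every line that *starts* inside this split. The last such line is
    // read up to its newline even if that newline lies in the next block.
    while (file.getFilePointer() < end && (line = file.readLine()) != null) {
      process(line);
    }
  }

  private static void process(String line) {
    System.out.println(line);
  }
}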
Within my small 2-node cluster I set up my 4-core slave node to have 4 task
trackers, and I also limited my Java heap size to -Xmx1024m.
Is there a possibility that when the data gets broken up, it will break
at a place in the file that is not whitespace? Or is that already handled
when
Hi, thank you for replying to me. I don't know whether the job configuration
approach will work; for now I have solved it in another way.
I want to pass a value from main to map because I want to do a judgement in
map. For now I have written several map classes and do the judgement in main().
On Thu, Apr 26, 2012 at 5:58 PM, Devaraj k wrote:
> Hi Wang Ruijun,
>
>
Any suggestions or pointers would be helpful. Are there any best practices?
On Mon, Apr 23, 2012 at 3:27 PM, Mohit Anchlia wrote:
> I just wanted to check how do people design their storage directories for
> data that is sent to the system continuously. For eg: for a given
> functionality we get d
Manu,
to clarify:
root has no access to the mounted HDFS. Just follow the howto (a Java-API sketch of the same steps follows below):
1. create the group and the users on ALL nodes:
groupadd hdfs-user && adduser USERNAME -G hdfs-user
2. sudo into hdfs:
su - hdfs
3. create a directory in hdfs and change the rights:
hadoop fs -mkdir /someone
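The HDFS side of those steps (create the directory, set its owner and permissions) can also be scripted against the Java FileSystem API; a sketch with placeholder path, user, and group, assumed to run as the HDFS superuser:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.permission.FsPermission;

public class HdfsUserDirSetup {
  public static void main(String[] args) throws Exception {
    // Picks up the namenode address from core-site.xml on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    Path dir = new Path("/someone");                    // placeholder path
    fs.mkdirs(dir);
    fs.setOwner(dir, "someone", "hdfs-user");           // placeholder user and group
    fs.setPermission(dir, new FsPermission((short) 0775));
    fs.close();
  }
}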
Look at how many job_ directories there are on your slave nodes. We're
using Cloudera, so they are under the 'userlogs' directory; not sure where
they are on 'pure' Apache.
As we approach 30k we see this. (We run a monthly report that does tens of
thousands of jobs in a few days.) We've tried tuning the #
Yes, as I wrote: you can't use root as the user for writing; root (or the superuser)
has another context in HDFS. Just change into hdfs (su - hdfs) and try again.
For all users who should have access to the mounted FS, you should create a group
and chown them in HDFS (maybe /tmp/group or similar).
best,
A
Yeah Alex, I tried. But still I am not able to make it
[root@namenode ~]# *echo "hadoop-fuse-dfs#dfs://namenode:8020 /hdfs_mount
fuse usetrash,rw 0 0" >> /etc/fstab*
[root@namenode ~]# *mount -a
INFO fuse_options.c:162 Adding FUSE arg /hdfs_mount*
[root@namenode ~]# *mount | grep fuse
fuse on /hd
Manu,
did you mount HDFS via fstab:
"hadoop-fuse-dfs#dfs://namenode.local: /hdfs-mount fuse usetrash,rw 0 0"?
You could do that with:
"mkdir -p /hdfs-mount && chmod 777 /hdfs-mount && echo
"hadoop-fuse-dfs#dfs://NN.URI: /hdfs-mount fuse usetrash,rw 0 0" >>
/etc/fstab && mount -a ; mount"
-
On 04/26/2012 01:49 AM, john cohen wrote:
I had the same issue. My problem was the use of a VPN
connected to work while at the same time working
with M/R jobs on my Mac. It occurred to me that
maybe Hadoop was binding to the wrong IP (the IP
given to you after connecting through the VPN);
bottom lin
Thanks a lot Alex.
Actually I didn't try the NFS option, as I am trying to sort out this
hadoop-fuse mounting issue.
I can't change the ownership of the mount directory after the hadoop-fuse mount.
[root@namenode ~]# ls -ld /hdfs_mount/
drwxrwxr-x 11 nobody nobody 4096 Apr 9 12:34 /hdfs_mount/
[root@
This is to announce the Berlin Buzzwords program. The Program Committee has
completed reviewing all submissions and set up the schedule containing a great
lineup of speakers for this year's Berlin Buzzwords program. Among the speakers
we have Leslie Hawthorn (Red Hat), Alex Lloyd (Google), Michae
Hi,
I wrote a small writeup about it:
http://mapredit.blogspot.de/2011/11/nfs-exported-hdfs-cdh3.html
As you see, the FS is mounted as nobody and you are trying as root. Change the
permissions in your HDFS:
hadoop dfs -chmod / -chown
- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com
On Apr 26
Dear All,
I have installed *Hadoop-fuse* to mount the HDFS filesystem locally. I
could mount the HDFS without any issues, but I am not able to do any file
operations like *delete, copy, move* etc. directly. The directory ownership
automatically changed to *nobody:nobody* while mounting.
*[root@nam
Hi Wang Ruijun,
You can do it this way:
1. Set the value in the job configuration with some property name before submitting
the job.
2. Get the value in the map() function using the property name from the
configuration, and then perform your business logic (a minimal sketch follows below).
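For example (the property name "my.threshold" and the class names below are made up):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class PassValueExample {

  public static class MyMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private int threshold;

    @Override
    protected void setup(Context context) {
      // 2. Read the property back inside the task.
      threshold = context.getConfiguration().getInt("my.threshold", 0);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      // use 'threshold' here for the per-record judgement
    }
  }

  public static void main(String[] args) throws Exception {
    // 1. Set the value in the job configuration before submitting.
    Configuration conf = new Configuration();
    conf.setInt("my.threshold", 42);   // made-up property name and value
    Job job = new Job(conf, "pass-value-example");
    job.setJarByClass(PassValueExample.class);
    job.setMapperClass(MyMapper.class);
    // ... input/output paths, reducer, etc. as usual ...
  }
}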
Thanks
Devaraj
Hi,
It seems that you are trying to code a Hadoop job which takes input from stdin?
As far as I know that is not possible, as a Hadoop job works on input coming in
batches, not on real-time streams.
Please wait for others' responses as well, as I am a beginner and may be missing
something here.
Ajay Sr
Maybe a file hole exists; are the -sh and du -sch /tmp results the same?
add this configuration (hdfs-site.xml):
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>
2012/4/26 Lukáš Kryške
>
> Hello, I'm using Ubuntu Linux and Eclipse IDE for my Hadoop cluster
> development. I've two users in my OS: - hadoop (local cluster user)- user
> (local user where I use the Eclipse IDE, SVN for my
I use org.apache.hadoop.streaming.AutoInputFormat to handle sequence file input
for streaming, but I found that it produces the format below (the
key is a string, the value is binary):
"keystring\tvalue\n"
Since the value is binary, there are a lot of '\n' bytes within the value, so my mapper
can't distinguish them.
in
Hello everyone:
I have a problem. We know map() has its own parameters, such
as map(Object key, Text value, Context context).
The following is my structure:
public class aaa {
  public static class bbb {
    public void map() {
I want to use the variable nu
RHive uses the Hive Thrift server to connect to Hive. You can execute Hive
queries and get the results back into R data frames, and then play around with
them using R libraries. It's a pretty interesting project, given that you have
Hive set up on top of Hadoop.
Regards,
Praveenesh
On Thu, Apr 26, 2012 at 1:
Hello,
Yesterday I discovered the RHive project. It uses an R server on each datanode.
Has somebody tried it?
Guillaume
-Original Message-
From: Devi Kumarappan [mailto:kpala...@att.net]
Sent: Wednesday, April 25, 2012 10:56 PM
To: common-user@hadoop.apache.org
Subject: Re: Text Analysis
RH
Hello, I'm using Ubuntu Linux and the Eclipse IDE for my Hadoop cluster
development. I have two users in my OS: hadoop (the local cluster user) and user
(the local user where I use the Eclipse IDE, SVN for my project, etc.). When I am
developing my program under 'user' I am not able to debug my program directly
Might be worth checking your inode usage, as a lack of inodes will result in the
same error.
HTH
On 26 Apr 2012, at 07:01, Harsh J wrote:
> The transient Map->Reduce files do not go to the DFS, but rather onto
> the local filesystem directories specified by the "mapred.local.dir"
> parameter. I
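A quick way to see which directories those are on a given node (a small sketch; it assumes the Hadoop jars and that node's mapred-site.xml are on the classpath):

import org.apache.hadoop.mapred.JobConf;

public class ShowMapredLocalDir {
  public static void main(String[] args) {
    // JobConf pulls in mapred-default.xml and mapred-site.xml from the classpath.
    JobConf conf = new JobConf();
    // Comma-separated list of local directories used for intermediate map output.
    System.out.println(conf.get("mapred.local.dir"));
  }
}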