If I understand correctly, Hadoop or Big Data applications are highly I/O
bound. So for performant processing you would definitely prefer many
small physical machines.
Apart from that, setting up Hadoop on VMware should raise no issues that
would not also occur on physical installations.
Depending
And it is caused by this exception, as I've found out:
2013-12-19 13:14:28,237 ERROR [pipe-uplink-handler]
org.apache.hadoop.mapred.pipes.BinaryProtocol: java.io.EOFException
at java.io.DataInputStream.readByte(DataInputStream.java:267)
at
sorry to reply to my own thread.
Does anyone know the answer to this question?
If so, can you please tell me if my understanding is right or wrong?
thanks.
2013/12/17 Xiaobin She xiaobin...@gmail.com
hi,
I'm using libhdfs to work with HDFS in a C++ programme.
And I have encountered an
Hello,
In my experience with Flume, watching the HDFS Sink verbose output, I know
that even after a file has been flushed, but is still open, it reads as a 0-byte
file, even if there is actually data contained in the file.
An HDFS file is a meta-location that can accept streaming input for as
long as
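That behaviour is easy to reproduce directly against HDFS. A minimal sketch, assuming a Hadoop 2.x client and a pseudo-distributed NameNode at hdfs://localhost:8020 (the URI and path are made up): after hflush() other clients can read the data, but the length the NameNode reports typically stays at 0 until close().

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileLengthDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:8020"), conf);
    Path path = new Path("/tmp/open-file-demo.txt");

    FSDataOutputStream out = fs.create(path, true);
    out.writeBytes("flushed but not yet closed\n");
    out.hflush(); // other clients can read this data now...

    // ...but the NameNode still reports the pre-flush length (usually 0)
    System.out.println("length while open: " + fs.getFileStatus(path).getLen());

    out.close(); // close() is what updates the length at the NameNode
    System.out.println("length after close: " + fs.getFileStatus(path).getLen());
  }
}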
Ah, I see - thanks for clarifying.
john
From: Chris Nauroth [mailto:cnaur...@hortonworks.com]
Sent: Tuesday, December 17, 2013 4:32 PM
To: user@hadoop.apache.org
Subject: Re: HDFS short-circuit reads
Both of these methods return the same underlying data type that you're
ultimately interested
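For anyone joining the thread later: on Hadoop 2.x, enabling short-circuit reads on the client side mostly comes down to two properties. A hedged sketch follows; the domain-socket path is only an example and must match what the DataNodes have in hdfs-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShortCircuitClient {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Both properties must also be configured on the DataNodes.
    conf.setBoolean("dfs.client.read.shortcircuit", true);
    conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");

    FileSystem fs = FileSystem.get(conf);
    // Reads of blocks stored on the local DataNode can now bypass it entirely.
    fs.open(new Path("/tmp/some-file")).close();
  }
}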
Hi folks,
I am running Hadoop 2.2 in pseudo-distributed mode and I have an i5 processor
with 8 GB RAM running on CentOS 6.4. However my Nutch job fails a
few minutes into execution with an OOM exception. I have increased the
HADOOP_HEAPSIZE from 1000 MB to 4 GB, but I still face the issue.
Hello,
I am using hadoop-0.20.2-cdh3u1.
First question:
Can one omit sorting in streaming (e.g. when one only sums numbers)?
Second question:
Why do I have to run my jobs from an empty current working directory? When
I run them from my home directory, I get this:
13/12/19 16:22:40 ERROR
To Devin,
thank you very much for your explanation.
I did find that I can read the data out of the file even if I did not close
the file I'm writing to (the read operation is called on another file
handle opened on the same file, but still in the same process), which makes
me more confused at that
We have a large relational database ( ~ 500 GB, hundreds of tables ).
We have summary tables that we rebuild from scratch each night, which takes
about 10 hours.
From these summary tables, we have a web interface that accesses the
summary tables to build reports.
There is a business reason for
In big data terms, 500G isn't big. But, moving that much data around
every night is not trivial either. I'm going to guess at a lot here,
but at a very high level:
1. Sqoop the data required to build the summary tables into Hadoop.
2. Crunch the summaries into new tables (really just files on
I would also look at the current setup.
I agree with Chris that 500 GB is fairly insignificant.
Best,
Vinay Bagare
On Dec 19, 2013, at 12:51 PM, Chris Embree cemb...@gmail.com wrote:
In big data terms, 500G isn't big. But, moving that much data around
every night is not trivial either. I'm
OK, I just read the book section on this (Hadoop: The Definitive Guide), just to
be sure: the length of a file is stored in the NameNode, and it's updated only
after the client calls the NameNode on close of the file. At that point, if the
NameNode has received all the ACKs from the DataNodes, then it will set the length
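That also matches what was reported earlier in the thread: a reader that opens the in-progress file can still read the flushed bytes, even though the length reported by the NameNode lags behind until close. A minimal reader-side sketch (the path is hypothetical; it assumes another process is still writing the file):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadOpenFile {
  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/open-file-demo.txt"); // still being written elsewhere

    long reportedLen = fs.getFileStatus(path).getLen(); // may still read 0

    long readable = 0;
    byte[] buf = new byte[4096];
    FSDataInputStream in = fs.open(path);
    try {
      int n;
      while ((n = in.read(buf)) > 0) {
        readable += n; // counts flushed bytes, even beyond the reported length
      }
    } finally {
      in.close();
    }
    System.out.println("NameNode says " + reportedLen + " bytes, readable: " + readable);
  }
}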
I want to see my System.out.println() output in the console.
How do I do that?
I tried the code below but it is not displaying anything. I am using the mapred
API, the old one.
Did I do anything wrong?
Code
package tech;
import java.io.BufferedWriter;
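A note on where that output actually goes: in a normal MapReduce setup, System.out.println() inside a task never reaches the submitting console; it is captured in the task's stdout log and can be viewed through the JobTracker/TaskTracker web UI. A hedged sketch of the usual alternative, logging through commons-logging in the old mapred API (the class name and key/value types are made up for illustration):

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LoggingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, NullWritable> {

  private static final Log LOG = LogFactory.getLog(LoggingMapper.class);

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, NullWritable> output, Reporter reporter)
      throws IOException {
    // Goes to the task's syslog file, not to the submitting console.
    LOG.info("processing record at offset " + key.get());
    // System.out.println(...) would land in the task's stdout file instead.
    output.collect(value, NullWritable.get());
  }
}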
To Peyman,
thank you for your reply.
So the properties of the file are stored in the NameNode, and they will not be
updated until the file is closed.
But won't this cause some problems?
For example:
1. process A opens the file in write mode, writes 1 MB of data, flushes the data,
and holds the file handle
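A hedged side note on that scenario: on Hadoop 2.x the writer can ask the NameNode to pick up the flushed length without closing the file, by calling hsync with the UPDATE_LENGTH flag. A sketch follows; it assumes the output stream really is an HdfsDataOutputStream (libhdfs itself only exposes plain hflush/hsync), and the path is made up.

import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class UpdateLengthDemo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/update-length-demo.txt");

    FSDataOutputStream out = fs.create(path, true);
    out.writeBytes("flushed but not closed\n");

    if (out instanceof HdfsDataOutputStream) {
      // Persist the data and ask the NameNode to update the file length now,
      // instead of waiting for close().
      ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
    } else {
      out.hsync(); // non-HDFS file systems: no length update available
    }

    System.out.println("visible length: " + fs.getFileStatus(path).getLen());
    out.close();
  }
}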