I have a follow-up question on this thread: how do we make sure that, at the
getFileSplit phase, no records cross the boundary between different file
splits?
To explain my point better: for example, if each of my records is 100 bytes,
could there be a case where some record straddles two adjacent splits?
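For context, the usual convention in Hadoop (which TextInputFormat follows) is that a byte-range split may end mid-record: the reader owning a split finishes any record that starts inside it, and the reader of the next split skips forward to the first record starting at or after its own offset. With fixed-size 100-byte records that skip-forward is simple arithmetic; the offsets below are purely illustrative:

```shell
# Illustrative only: with fixed 100-byte records, a byte-range split may end
# mid-record. The reader owning the previous split finishes that record; the
# next reader rounds its start offset up to the next record boundary.
record_size=100
split_start=250          # hypothetical start offset of the second split

# First record fully owned by the second split starts here:
next_start=$(( (split_start + record_size - 1) / record_size * record_size ))
echo "$next_start"       # 300: bytes 250-299 belong to the previous reader
```

With variable-length records (e.g. newline-delimited text) the same idea applies, except the reader scans forward to the next delimiter instead of doing arithmetic.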
Congratulations! Wish I were there. :-)
Best,
Arber
On Sat, May 16, 2009 at 9:09 AM, He Yongqiang
heyongqi...@software.ict.ac.cn wrote:
Hi, all
On May 9, we held the second Hadoop In China salon. About 150 people
attended; 46% of them were engineers/managers from industry companies, and
...
So anyway, give Todd a shout if you want to try the DEBs out. Otherwise, if
you're interested in going down the Red Hat derivative route (Fedora, RHEL,
CentOS), you can use the RPMs.
Alex
On Tue, Apr 21, 2009 at 10:04 PM, Yabo-Arber Xu arber.resea...@gmail.com
wrote:
Hi there,
I set up a small cluster for testing. When I start the cluster from my master
node, I have to type a password to start each datanode and
tasktracker. That's pretty annoying and may be hard to manage as the
cluster grows. Is there a graceful way to handle this?
Best,
Arber
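The standard fix is passwordless (key-based) SSH from the master to every slave, since the start scripts log in over SSH. A minimal sketch follows; it is demonstrated against a scratch directory so it can be run safely, but on a real cluster the files live in `~/.ssh`, and `slave1` in the comment is a placeholder hostname:

```shell
# Passwordless-SSH sketch: the master's public key must appear in each
# slave's authorized_keys. Demonstrated here in a scratch directory; on a
# real cluster use ~/.ssh on the master and slaves instead.
SSH_DIR=$(mktemp -d)

# 1. Generate a passphrase-less RSA key pair (-P '' = empty passphrase,
#    -q = quiet, -f = output path).
ssh-keygen -t rsa -P '' -q -f "$SSH_DIR/id_rsa"

# 2. Install the public key on each slave. On a real cluster:
#      ssh-copy-id -i ~/.ssh/id_rsa.pub slave1
#    which amounts to appending the .pub file to the slave's authorized_keys:
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
chmod 600 "$SSH_DIR/authorized_keys"

echo "installed $(wc -l < "$SSH_DIR/authorized_keys") key(s)"
```

Note that the master usually starts daemons on itself too, so its own key should also go into its local `authorized_keys`. After this, `start-dfs.sh` and `start-mapred.sh` should run without password prompts.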
Thanks for all your help, especially Asteem's detailed instructions. It works
now!
Alex: I did not use RPMs, but several of my existing nodes have Ubuntu
installed. Is there any difference in running Hadoop on Ubuntu? I am thinking
of settling on one distribution before I start scaling up the cluster, but am
not sure which.
Hi Amandeep,
I just did the same investigation not long ago, and I was recommended to get
Amazon EC2 X-Large equivalent nodes (http://aws.amazon.com/ec2/#pricing): 8
EC2 Compute Units (4 virtual cores with 2 EC2 Compute Units each).