Along the lines of the email below, have there been any libraries built to
copy files into the cluster in parallel, using some sort of byte-offset
technique, etc.?
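
To make the question concrete, here is the kind of thing I am imagining. This
is just my own minimal sketch, not a real library: the /ingest target path and
the pool size are made up, it uses the stock org.apache.hadoop.fs.FileSystem
API, and it parallelizes across whole files rather than by byte offsets within
one file, since HDFS only allows a single writer per file.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ParallelLoader {
        public static void main(String[] args) throws Exception {
            final FileSystem fs = FileSystem.get(new Configuration());
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (final String local : args) {
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            // each thread streams one whole local file into the cluster
                            fs.copyFromLocalFile(new Path(local),
                                new Path("/ingest/" + new Path(local).getName()));
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }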
Thanks,
Ranjith
On Oct 30, 2012, at 9:24 AM, "M. C. Srivas" wrote:
> Loading a petabyte from a single machine will t
The file loaded last could be corrupted. Try to decompress the file and see if
you get any errors.
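
For a gzip file, something like this sketch is enough (assuming gzip
compression and the stock FileSystem API; a corrupt or truncated file will
throw an IOException before reaching end of stream):

    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class GzipCheck {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // read the whole stream; we only care whether decompression fails
            InputStream in = new GZIPInputStream(fs.open(new Path(args[0])));
            try {
                byte[] buf = new byte[64 * 1024];
                while (in.read(buf) != -1) { }
                System.out.println("OK: " + args[0]);
            } finally {
                in.close();
            }
        }
    }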
Thanks,
Ranjith
On Aug 9, 2012, at 8:07 PM, rei125 wrote:
> s.
, record and the data buffers. Given that the JVM for my map tasks is 700 MB
and the space left after taking out the space used for the buffers is 400 MB,
what is stored in this 400 MB?
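
For reference, these are the settings I am describing (property names from the
classic mapred.* configuration; the numbers are just my example):

    import org.apache.hadoop.conf.Configuration;

    public class MapMemoryExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("mapred.child.java.opts", "-Xmx700m"); // heap per map task JVM
            conf.setInt("io.sort.mb", 300);                 // map-side sort buffer
            // 700 MB of heap minus ~300 MB of sort buffer leaves ~400 MB,
            // which is what the question is about: user objects created in
            // map(), codec buffers, and the framework's own bookkeeping.
        }
    }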
Thanks,
Ranjith
Harsh,
Thanks for the response, bud. Appreciate it!
Thanks,
Ranjith
On May 21, 2012, at 11:09 PM, Harsh J wrote:
> Ranjith,
>
> MapReduce and HDFS are two different things. MapReduce uses HDFS (and
> can use any other FS as well) to do some efficient work, but HDFS does
> no
Thanks, Harsh. So when it connects directly to the DataNodes, it does not fire
off any mappers. How, then, does it get the data over? Is it just a
block-by-block copy?
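
Put differently, is it roughly equivalent to this? (My own sketch of a plain
client-side stream copy, not the actual FsShell code.)

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class ClientCopy {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream in = fs.open(new Path(args[0]));
            FSDataOutputStream out = fs.create(new Path(args[1]));
            // one client streams every byte: blocks are read from the source
            // DataNodes and written back through the normal write pipeline
            IOUtils.copyBytes(in, out, conf, true); // true = close both streams
        }
    }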
Thanks,
Ranjith
On May 21, 2012, at 9:22 PM, Harsh J wrote:
> Ranjith,
>
> Are you speaking of DistCp?
I have always wondered about this and am not sure about the behavior. When I
fire a MapReduce job to copy data over in a distributed fashion, I would expect
to see mappers executing the copy. What happens with a copy command from
hadoop fs?
Thanks,
Ranjith
user and
>> permissions must be rwx--
Thanks,
Ranjith
On May 17, 2012, at 5:37 AM, Luca Pireddu wrote:
> Hello all,
>
> we're trying to set up a multi-user MapReduce cluster that doesn't use HDFS.
> The idea is to use a central, shared JobTracker to which
hear from the rest of the
community about this to see if it is consistent with what they have seen.
Thanks,
Ranjith
On May 14, 2012, at 8:45 PM, "Manish Bhoge" wrote:
> You first need to copy data using copyFromLocal to your HDFS, and then you can
> utilize Pig and Hive programs for f