Adding values to a Configuration object does not really work unless you
serialize the config into a file and ship it over to the AM and containers as a
local resource. The application code would then need to load this file using
Configuration::addResource(). MapReduce does this by taking the job
configuration, serializing it to job.xml, and localizing that file for every task.
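As a minimal sketch of that round trip (the class and method names below are
made up for illustration; only writeXml() and addResource() are real Hadoop
APIs):

import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfShipping {
    // Client side: dump the populated Configuration to an XML file so it
    // can be shipped to the AM/containers as a local resource.
    static void writeConf(Configuration conf, String file) throws Exception {
        try (OutputStream out = Files.newOutputStream(Paths.get(file))) {
            conf.writeXml(out);
        }
    }

    // Task side: the localized file lands in the container's working
    // directory and can be pulled back in with addResource().
    static Configuration loadConf(String localizedName) {
        Configuration conf = new Configuration(false);
        conf.addResource(new Path(localizedName));
        return conf;
    }
}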
Thanks Yong.
Thanks & Regards,
B Anil Kumar.
In Avro, you need to think about a schema to match your data. Avro's schema is
very flexible and should be able to store all kinds of data.
If you have a Json string, you have 2 options to generate the Avro schema for
it:
1) Use "type: string" to store the whole Json string into Avro. This will b
Hi,
As of now in my jobs, I am using SequenceFileOutputFormat and I am emitting
custom Java objects as MR output.
Now I am planning to emit the output in Avro format. I went through a few
blogs but still have the following doubts.
1) My current custom Writable objects have a nested Json format as toString(),
so ...
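For the output-format switch itself, a driver sketch along these lines is a
reasonable starting point (the one-field schema and the output path are
placeholders, not from this job):

import org.apache.avro.Schema;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvroOutputDriver {
    public static void main(String[] args) throws Exception {
        // Placeholder schema; a real job would mirror the Writable's fields.
        Schema schema = new Schema.Parser().parse(
            "{\"type\": \"record\", \"name\": \"Rec\", \"fields\": ["
          + " {\"name\": \"payload\", \"type\": \"string\"}]}");

        Job job = Job.getInstance(new Configuration(), "avro-output");
        job.setJarByClass(AvroOutputDriver.class);
        job.setOutputFormatClass(AvroKeyOutputFormat.class);
        AvroJob.setOutputKeySchema(job, schema); // reducers emit AvroKey records
        job.setOutputValueClass(NullWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}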
MultipleInputs is nice. Most of the time, I use it for reduce-side join.
It's great; however, you'll need to specify a different Mapper class per input
directory.
In our case, we try to let the Mapper itself capture the directory
information, because these directories might contain data across m...
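Both approaches, sketched with hypothetical class names (UserMapper,
OrderMapper, TaggingMapper and the paths are made up):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class JoinSketch {
    // Classic wiring: one Mapper class per input directory.
    static void wireInputs(Job job) {
        MultipleInputs.addInputPath(job, new Path("/data/users"),
                TextInputFormat.class, UserMapper.class);
        MultipleInputs.addInputPath(job, new Path("/data/orders"),
                TextInputFormat.class, OrderMapper.class);
    }

    // Alternative: one shared Mapper recovers its directory from the split.
    // (This works with a plain FileInputFormat job; under MultipleInputs the
    // split is wrapped and the cast below would fail.)
    public static class TaggingMapper extends Mapper<LongWritable, Text, Text, Text> {
        private String sourceDir; // used in map() to tag records by source

        @Override
        protected void setup(Context context) {
            FileSplit split = (FileSplit) context.getInputSplit();
            sourceDir = split.getPath().getParent().getName();
        }
    }

    public static class UserMapper extends Mapper<LongWritable, Text, Text, Text> {}
    public static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {}
}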
Hi Prav,
You are correct, thanks for the explanation. As per the link below, I can see
that Job's methods internally call DistributedCache itself (
http://grepcode.com/file/repository.cloudera.com/content/repositories/releases/org.apache.hadoop/hadoop-mapreduce-client-core/2.0.0-cdh4.4.0/org/apache ...
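For reference, these are the Job-level calls in question; they delegate to
DistributedCache internally (the URIs below are illustrative):

import java.net.URI;
import org.apache.hadoop.mapreduce.Job;

public class CacheWiring {
    static void wire(Job job) throws Exception {
        // Equivalent to the older DistributedCache.addCacheFile(uri, conf).
        job.addCacheFile(new URI("/lookup/cities.txt#cities"));
        job.addCacheArchive(new URI("/lookup/geo.zip"));
    }
}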
Hi Amit,
Side data distribution is an altogether different concept. It's when you set
custom (key, value) pairs on the Job object, so that you can use them in your
mappers/reducers. It is good when you want to pass some small information to
your mappers/reducers, like extra comm...
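A minimal sketch of that pattern (the key name "myapp.threshold" is invented
for illustration):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ThresholdMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private int threshold;

    @Override
    protected void setup(Context context) {
        // Driver side would have set this with:
        // job.getConfiguration().setInt("myapp.threshold", 10);
        threshold = context.getConfiguration().getInt("myapp.threshold", 0);
    }
}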
Hi Prav,
Yes, you are correct that DistributedCache does not load files into memory.
Also, using the job configuration and DistributedCache are two different
approaches. I am referring to "Hadoop: The Definitive Guide",
Chapter 8 > Side Data Distribution (pages 288-295).
As you are saying that now m...
Hi Amit,
I am not sure how they are linked with DistributedCache. The job configuration
is not uploading any data into memory. As far as I am aware of how
DistributedCache works, nothing gets loaded into memory. DistributedCache
just copies the files onto the slave nodes, so that they are accessible to
mappers ...
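To make that concrete, a sketch of the task side, assuming the driver added
the file with a "#cities" symlink fragment (all names illustrative):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void setup(Context context) throws IOException {
        // "cities" is the symlink created in the task's working directory for
        // the localized on-disk copy; nothing is preloaded into memory.
        try (BufferedReader reader = new BufferedReader(new FileReader("cities"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // build an in-task lookup table from the side file
            }
        }
    }
}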