Hi,
I am using Hadoop 0.20.203.
I have performed simple vertical scalability experiments on Hadoop using
graph datasets and a BFS algorithm. My experiment configuration is
20 workers + master. In each test I divided the map slots and reduce slots
equally (M == R). I can process the
Hello,
I am trying to write MapReduce jobs that read data from JSON files and load
it into HBase tables.
Please suggest an efficient way to do this. I am trying to use the
Spring Data HBase template to make it thread safe and enable table locking.
I use the map methods to read and parse the
Hello Panshul,
My answers:
1. You can serialize the entire JSON into a byte[] and store it in a
cell. (Is it important for you to extract individual values from your JSON
and then put them into the table?)
2. You can write your own datatype to pass your object to the reducer,
but it must be a Writable and Comparable type.
hive-0.9.0-cdh4.1.2)
When you run a Hive command like SHOW TABLES or SHOW DATABASES, do
One correction: if your datatype is going to be used just as a value, you
actually don't need it to be comparable. But if you need it to be a key as
well, then it must be both Writable and Comparable (i.e., a WritableComparable).
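As a concrete illustration (class and field names here are just
placeholders), a bare-bones value type only needs Writable:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Illustrative custom value type carrying two fields parsed from the JSON.
public class JsonRecord implements Writable {
    private String name;
    private long count;

    // Hadoop instantiates the type reflectively, so a no-arg
    // constructor is required.
    public JsonRecord() {}

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        count = in.readLong();
    }
}

If you later need it as a key, implement WritableComparable<JsonRecord>
instead and add a compareTo() as well.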
Warm Regards,
Tariq
https://mtariq.jux.com/
cloudfront.blogspot.com
On Thu, Feb 7, 2013 at 4:58 PM, Mohammad Tariq
Hello,
Thank you for the reply.
1. I cannot serialize the JSON and store it as a whole. I need to extract
individual values and store them, as I later need to query the stored values
in various aggregation algorithms.
2. Can you please point me in a direction where I can find out how to write
a custom datatype?
You might find these links helpful:
http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026
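That link covers the type plumbing. For the HBase side (your point 1), the
usual pattern is a mapper that parses each JSON line and emits HBase Puts.
A rough sketch (the table schema, field names, and class names are made up,
and I'm using the Jackson that ships with Hadoop):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.map.ObjectMapper;

// Expects one JSON document per input line; extracts individual values
// into separate columns so they stay queryable later.
public class JsonToHBaseMapper
        extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private final ObjectMapper jsonParser = new ObjectMapper();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        JsonNode doc = jsonParser.readTree(line.toString());
        byte[] rowKey = Bytes.toBytes(doc.get("id").getTextValue());
        Put put = new Put(rowKey);
        // "cf", "name" and "count" are placeholders for your own schema.
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("name"),
                Bytes.toBytes(doc.get("name").getTextValue()));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"),
                Bytes.toBytes(doc.get("count").getLongValue()));
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}

Wiring it up with TableMapReduceUtil.initTableReducerJob("yourtable", null,
job) plus job.setNumReduceTasks(0) should send the Puts straight to the table.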
Hello,
Why not use a Pig script for that?
Make the JSON file available on HDFS, load it with
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html
and store it with
http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html
Is there a good reason why the OldCombinerRunner passes Reporter.NULL to
the combiner instead of the actual TaskReporter? The NewCombinerRunner
does use the TaskReporter when creating the context. If this is a bug I
will submit a JIRA with a patch.
I am using the MapReduce approach. I was looking into Avro to create my
own custom datatypes to pass from mapper to reducer.
With Avro I need to maintain a schema for all the types of JSON files I
am receiving, and since there will be many different MapReduce jobs
running, a different
Hi All,
Can anyone list the mandatory system-level checks
(ulimit, firewall, SELinux, ...) to perform before starting a Hadoop cluster?
Regards
Sathish
This conversation is probably better suited for common-user@, so I am moving
it over there; I have put common-dev@ in BCC.
I am not really sure what you mean by validate. I assume you want to test
that your library does what you want it to do. I would start out with
unit tests to validate the individual pieces.
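If any of those pieces are MapReduce classes, MRUnit lets you drive them in
isolation. A small sketch, assuming a WordCount-style mapper of your own
(WordCountMapper is hypothetical, standing in for whatever you are testing):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountMapperTest {

    @Test
    public void emitsOneCountPerToken() throws Exception {
        // Drives the mapper directly, with no cluster or MiniMR needed.
        MapDriver<LongWritable, Text, Text, IntWritable> driver =
                MapDriver.newMapDriver(new WordCountMapper());
        driver.withInput(new LongWritable(0), new Text("hadoop hadoop"))
              .withOutput(new Text("hadoop"), new IntWritable(1))
              .withOutput(new Text("hadoop"), new IntWritable(1))
              .runTest();
    }
}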
Is it possible to pass unmolested binary data through a map-only streaming
job from the command line? I.e., is there a way to avoid extra tabs and
newlines in the output? I don't need input splits or key/value pairs; I
just want one whole input file fed unmodified into a program, and its
output
Hello,
The lack of an HDFS API is just one of the drawbacks that motivated us
to abandon Streaming and develop Pydoop. Unfortunately, in the blog
post cited by Harsh J, Pydoop is only briefly mentioned because the
author failed to build and install it.
Here is how you solve your problem in
When you start Hadoop it is always better to set ulimit -n appropriately.
We normally disable SELinux and iptables on our Hadoop cluster, and instead
enforce network-level security around the entire cluster with a single
security perimeter.
On Thu, Feb 7, 2013 at 8:36 PM, sara raji sa848...@gmail.com wrote:
Hi All,
I agree it's a bug if there is a discrepancy between the APIs (we are
supposed to be supporting both for the time being). Please do file a
JIRA with a patch; there shouldn't be any harm in re-passing the
reporter object within the combiner.
On Thu, Feb 7, 2013 at 7:10 PM, Jim Donofrio
Without access to org.skyz.basic.KeyComparator.compare it is hard to say why
this is happening. It looks like you got an array with 0 entries and the given
function could not handle that.
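Purely as an illustration (we obviously can't see your class), a compare
that tolerates empty arrays would look something like this hypothetical
reconstruction:

import java.util.Comparator;

// Guesswork stand-in for org.skyz.basic.KeyComparator; the only point
// is the guard against empty input before dereferencing element 0.
public class KeyComparator implements Comparator<String[]> {
    @Override
    public int compare(String[] a, String[] b) {
        if (a.length == 0 || b.length == 0) {
            // Sort empty keys first instead of throwing.
            return a.length - b.length;
        }
        return a[0].compareTo(b[0]);
    }
}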
Why you would get an empty array there, I don't really know.
--Bobby
From: Aseem Anand aseem.ii...@gmail.com
Hi,
It is Hadoop 2.0.0-cdh4.1.1. The whole output is given below:
Hadoop 2.0.0-cdh4.1.1
Subversion
file:///data/1/jenkins/workspace/generic-package-centos32-6/topdir/BUILD/hadoop-2.0.0-cdh4.1.1/src/hadoop-common-project/hadoop-common
-r 581959ba23e4af85afd8db98b7687662fe9c5f20
On Fri, Feb 8,
Thanks, I managed to correlate the proper line numbers.
Are you using some form of custom serialization in your job code? That
is, are your keys non-Writable types? The specific NPE is arising from
the SerializationFactory not being able to find a serializer for your
key type.
unsubscribe
On 08/02/13 12:19 AM, Harsh J ha...@cloudera.com wrote:
The JIRA https://issues.apache.org/jira/browse/MAPREDUCE-2584 should
help such cases, if what I speculated above is indeed the case.
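In the non-Writable case, the usual fix is to register a serializer that
matches your types, along these lines (com.example.MySerialization is a
stand-in for your own Serialization<T> implementation):

import org.apache.hadoop.conf.Configuration;

public class SerializationSetup {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        // io.serializations is the list the SerializationFactory consults;
        // keep the default WritableSerialization and append your own.
        conf.setStrings("io.serializations",
                "org.apache.hadoop.io.serializer.WritableSerialization",
                "com.example.MySerialization");
    }
}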
On Fri, Feb 8, 2013 at 12:16 AM, Harsh J ha...@cloudera.com wrote:
Thanks, I managed to
Thanks for the clarification. I did pull the file from 2.0 into my fork and
was able to compile. I was going to submit the patch, but given that it's
already been pulled in I will just drop my fork for now.
-Viral
On Thu, Feb 7, 2013 at 9:09 AM, Robert Evans ev...@yahoo-inc.com wrote:
0.23 is
Please use only the CDH mailing list and do not copy this to hdfs-user.
On Thu, Feb 7, 2013 at 7:20 AM, samir das mohapatra samir.help...@gmail.com
wrote:
Any Suggestion...
On Thu, Feb 7, 2013 at 4:17 PM, samir das mohapatra
samir.help...@gmail.com wrote:
Hi All,
I could
Don't cross-post to irrelevant forums (user@hadoop).
On Thu, Feb 07, 2013 at 03:30AM, Viral Bajaria wrote:
Patai,
I am still curious: how do we monitor the consumption of this value on
each datanode?
You can use the getDataNodeStats() method of your
DistributedFileSystem instance. It returns an array of DatanodeInfo
objects which contain, among other things, the xceiver count that you are
looking for.
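A rough sketch (assumes fs.defaultFS, or fs.default.name on older
releases, points at your namenode):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class XceiverReport {
    public static void main(String[] args) throws Exception {
        // FileSystem.get() returns a DistributedFileSystem when the
        // default filesystem is HDFS.
        DistributedFileSystem dfs =
                (DistributedFileSystem) FileSystem.get(new Configuration());
        for (DatanodeInfo dn : dfs.getDataNodeStats()) {
            System.out.println(dn.getHostName() + ": "
                    + dn.getXceiverCount() + " active xceivers");
        }
    }
}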
Hi Suresh,
Thanks for the advice, but why are you being so territorial about the
lists? You shouldn't be. What matters is the solution, not the problem.
Note: I am looking for an answer from any user; the list does not matter,
because it is a common use scenario.
On Fri, Feb 8, 2013 at 3:31 AM, Suresh Srinivas sur...@hortonworks.comwrote:
Please only use CDH