Hadoop 0.20.203 vertical scalability

2013-02-07 Thread blah blah
Hi, I am using Hadoop 0.20.203. I have performed simple vertical scalability experiments on Hadoop using graph datasets and a BFS algorithm. My experiment configuration is 20 workers + master. In each test I divided the Map slots and Reduce slots equally (M == R). I can process the

MapReduce to load data in HBase

2013-02-07 Thread Panshul Whisper
Hello, I am trying to write MapReduce jobs to read data from JSON files and load it into HBase tables. Please suggest an efficient way to do it. I am trying to do it using the Spring Data HBase Template to make it thread safe and enable table locking. I use the Map methods to read and parse the
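
A minimal sketch of the map side of such a job, assuming one JSON document per input line and Jackson (org.codehaus.jackson, typically on the Hadoop classpath of that era) for parsing. The table layout, row key, and field names here are invented for illustration; a driver would typically wire this to HBase via TableMapReduceUtil.initTableReducerJob with zero reduces.

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.codehaus.jackson.JsonNode;
import org.codehaus.jackson.map.ObjectMapper;

// Map-only sketch: one JSON document per input line, one Put per document.
public class JsonToHBaseMapper
    extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private final ObjectMapper mapper = new ObjectMapper();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    JsonNode json = mapper.readTree(value.toString());
    // "id", "cf" and "amount" are made-up names for illustration;
    // "id" is assumed to be a JSON string field.
    byte[] rowKey = Bytes.toBytes(json.get("id").getTextValue());
    Put put = new Put(rowKey);
    // Store the field as its JSON text representation.
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("amount"),
            Bytes.toBytes(json.get("amount").toString()));
    context.write(new ImmutableBytesWritable(rowKey), put);
  }
}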

Re: MapReduce to load data in HBase

2013-02-07 Thread Mohammad Tariq
Hello Panshul, My answers: 1. You can serialize the entire JSON into a byte[] and store it in a cell. (Is it important for you to extract individual values from your JSON and then put them into the table?) 2. You can write your own datatype to pass your object to the reducer. But it must be a
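
A quick sketch of option 1, storing the whole document in a single cell. The table, family, and qualifier names are invented for the example; HTable is the HBase client class of that era.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class StoreWholeJson {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "docs");      // hypothetical table name
    String json = "{\"id\":\"42\",\"amount\":10}";
    Put put = new Put(Bytes.toBytes("row-42"));   // hypothetical row key
    // The entire serialized JSON goes into one cell.
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("json"), Bytes.toBytes(json));
    table.put(put);
    table.close();
  }
}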

RE: Hive Metastore DB Issue (Cloudera CDH4.1.2 MRv1 with hive-0.9.0-cdh4.1.2)

2013-02-07 Thread Viral Bajaria
When you run a hive command like show tables or show databases, do

Re: MapReduce to load data in HBase

2013-02-07 Thread Mohammad Tariq
One correction: if your datatype is going to be used just as a value, you actually don't need it to be comparable. But if you need it to be a key as well, then it must be both. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Thu, Feb 7, 2013 at 4:58 PM, Mohammad Tariq
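
To make that concrete, here is a minimal value-only datatype (the fields are invented for the example). A Writable is enough for a value; to use it as a key as well, it would implement WritableComparable and add a compareTo method.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

// A value-only datatype: Writable is enough, no compareTo() required.
public class JsonRecord implements Writable {
  private String id;
  private long amount;

  public JsonRecord() {}  // required no-arg constructor for deserialization

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(id);
    out.writeLong(amount);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    id = in.readUTF();       // must read fields in the same order
    amount = in.readLong();  // they were written
  }
}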

Re: MapReduce to load data in HBase

2013-02-07 Thread Panshul Whisper
Hello, Thank you for the reply. 1. I cannot serialize the JSON and store it as a whole. I need to extract individual values and store them, as later I need to query the stored values in various aggregation algorithms. 2. Can you please point me in the direction where I can find out how to write a data

Re: MapReduce to load data in HBase

2013-02-07 Thread Mohammad Tariq
You might find these links helpful : http://stackoverflow.com/questions/10961474/how-in-hadoop-is-the-data-put-into-map-and-reduce-functions-in-correct-types/10965026#10965026

Re: MapReduce to load data in HBase

2013-02-07 Thread Damien Hardy
Hello, Why not use a Pig script for that? Make the JSON file available on HDFS, load it with http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/builtin/JsonLoader.html and store it with http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/backend/hadoop/hbase/HBaseStorage.html

why does OldCombinerRunner pass Reporter.NULL to the combiner instead of the real reporter?

2013-02-07 Thread Jim Donofrio
Is there a good reason why the OldCombinerRunner passes Reporter.NULL to the combiner instead of the actual TaskReporter? The NewCombinerRunner does use the TaskReporter when creating the context. If this is a bug, I will submit a JIRA with a patch.

Re: MapReduce to load data in HBase

2013-02-07 Thread Panshul Whisper
I am using the MapReduce approach. I was looking into Avro to create my own custom datatypes to pass from Mapper to Reducer. With Avro I need to maintain the schema for all the types of JSON files I am receiving, and since there will be many different MapReduce methods running, a different
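
For reference, Avro's generic (non-code-generated) API keeps this manageable: one Schema object per JSON shape, with GenericRecord instances carried between mapper and reducer, e.g. via the org.apache.avro.mapred AvroKey/AvroValue wrappers. A sketch with an invented two-field schema:

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

// One schema per JSON shape; GenericRecord avoids generating a Java
// class for every document type.
public class AvroSketch {
  public static void main(String[] args) {
    String schemaJson =
        "{\"type\":\"record\",\"name\":\"Doc\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"string\"},"
        + "{\"name\":\"amount\",\"type\":\"long\"}]}";
    Schema schema = new Schema.Parser().parse(schemaJson);
    GenericRecord record = new GenericData.Record(schema);
    record.put("id", "42");
    record.put("amount", 10L);
    System.out.println(record);  // prints the record as JSON text
  }
}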

Hadoop Cluster: System level Checklist??

2013-02-07 Thread sara raji
Hi All, Can anyone list the mandatory system-level checks (ulimit, firewall, SELinux, ...) to perform before starting a Hadoop cluster? Regards, Sathish

Re: More information regarding the Project suggestions given on the Hadoop website

2013-02-07 Thread Robert Evans
This conversation is probably better for common-user@, so I am moving it over there; I put common-dev@ in the BCC. I am not really sure what you mean by validate. I assume you want to test that your library does what you want it to do. I would start out with unit tests to validate the individual

Using hadoop streaming with binary data

2013-02-07 Thread Jay Hacker
Is it possible to pass unmolested binary data through a map-only streaming job from the command line? I.e., is there a way to avoid extra tabs and newlines in the output? I don't need input splits or key/value pairs, I just want one whole input file fed unmodified into a program, and its output

Re: Creating files through the hadoop streaming interface

2013-02-07 Thread Simone Leo
Hello, the lack of an HDFS API is just one of the drawbacks that motivated us to abandon Streaming and develop Pydoop. Unfortunately, in the blog post cited by Harsh J, Pydoop is just briefly mentioned because the author failed to build and install it. Here is how you solve your problem in

Re: Hadoop Cluster: System level Checklist??

2013-02-07 Thread Nitin Pawar
When you start Hadoop, it's always better to set ulimit -n. We normally disable SELinux and iptables in our Hadoop cluster and then apply network-level security, enclosing the entire cluster within a single network security perimeter. On Thu, Feb 7, 2013 at 8:36 PM, sara raji sa848...@gmail.com wrote: Hi All,

Re: why does OldCombinerRunner pass Reporter.NULL to the combiner instead of the real reporter?

2013-02-07 Thread Harsh J
I agree it's a bug if there is a discrepancy between the APIs (we are supposed to be supporting both for the time being). Please do file a JIRA with a patch; there shouldn't be any harm in re-passing the reporter object within the combiner. On Thu, Feb 7, 2013 at 7:10 PM, Jim Donofrio

Re: QuickSort array out of bound exception during Secondary Sort

2013-02-07 Thread Robert Evans
Without access to org.skyz.basic.KeyComparator.compare it is hard to say why this is happening. It looks like you got an array with 0 entries and the given function could not handle that. I don't really know why. --Bobby From: Aseem Anand aseem.ii...@gmail.com
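
The thread does not include the comparator source, but the symptom is consistent with a raw compare that assumes non-empty byte ranges. A purely illustrative guard, with the key type assumed, might look like this:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.WritableComparator;

// Sketch of a comparator that guards against zero-length byte ranges.
// The real KeyComparator is not shown in the thread, so this is only
// an illustration of the failure mode Robert describes.
public class GuardedKeyComparator extends WritableComparator {
  protected GuardedKeyComparator() {
    super(IntWritable.class, true);  // assumed key type
  }

  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    if (l1 == 0 || l2 == 0) {
      // An empty serialized key usually means a write()/readFields()
      // mismatch; order by length instead of indexing past the buffer.
      return l1 == l2 ? 0 : (l1 < l2 ? -1 : 1);
    }
    return super.compare(b1, s1, l1, b2, s2, l2);
  }
}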

Re: Secondary Sort example error

2013-02-07 Thread Ravi Chandran
Hi, it is Hadoop 2.0.0-cdh4.1.1. The whole output is given below: Hadoop 2.0.0-cdh4.1.1 Subversion file:///data/1/jenkins/workspace/generic-package-centos32-6/topdir/BUILD/hadoop-2.0.0-cdh4.1.1/src/hadoop-common-project/hadoop-common -r 581959ba23e4af85afd8db98b7687662fe9c5f20 On Fri, Feb 8,

Re: Secondary Sort example error

2013-02-07 Thread Harsh J
Thanks, I managed to correlate proper line numbers. Are you using some form of custom serialization in your job code? That is, are your keys non-Writable types of some other kind? The specific NPE is arising from the SerializationFactory not being able to find a serializer for your
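
If that is the case, one way out is to register an additional serialization so the factory can resolve the key class. A sketch; note that JavaSerialization is inefficient and mostly useful for experiments:

import org.apache.hadoop.conf.Configuration;

public class SerializationConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Keep WritableSerialization first, then fall back to plain Java
    // serialization for non-Writable keys/values.
    conf.setStrings("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization",
        "org.apache.hadoop.io.serializer.JavaSerialization");
  }
}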

Re: Secondary Sort example error

2013-02-07 Thread Preeti Khurana
unsubscribe On 08/02/13 12:19 AM, Harsh J ha...@cloudera.com wrote: The JIRA https://issues.apache.org/jira/browse/MAPREDUCE-2584 should help such cases, if what I speculated above is indeed the case. On Fri, Feb 8, 2013 at 12:16 AM, Harsh J ha...@cloudera.com wrote: Thanks, I managed to

Re: KerberosUtil in hadoop 0.23

2013-02-07 Thread Viral Bajaria
Thanks for the clarification. I did pull the file from 2.0 into my fork and was able to compile. I was going to submit the patch but given that it's already been pulled in I will just drop my fork for now. -Viral On Thu, Feb 7, 2013 at 9:09 AM, Robert Evans ev...@yahoo-inc.com wrote: 0.23 is

Re: Hive Metastore DB Issue ( Cloudera CDH4.1.2 MRv1 with hive-0.9.0-cdh4.1.2)

2013-02-07 Thread Suresh Srinivas
Please use only the CDH mailing list and do not copy this to hdfs-user. On Thu, Feb 7, 2013 at 7:20 AM, samir das mohapatra samir.help...@gmail.com wrote: Any Suggestion... On Thu, Feb 7, 2013 at 4:17 PM, samir das mohapatra samir.help...@gmail.com wrote: Hi All, I could

Re: Hive Metastore DB Issue ( Cloudera CDH4.1.2 MRv1 with

2013-02-07 Thread Konstantin Boudnik
Don't cross post to irrelevant (user@hadoop) forums. On Thu, Feb 07, 2013 at 03:30 AM, Viral Bajaria wrote:

Re: xcievers

2013-02-07 Thread George Datskos
Patai, "I am still curious, how do we monitor the consumption of this value in each datanode?" You can use the getDataNodeStats() method of your DistributedFileSystem instance. It returns an array of DatanodeInfo which contains, among other things, the xceiver count that you are
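
A minimal client-side sketch of that, assuming the accessor on DatanodeInfo is getXceiverCount():

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

// Poll per-datanode xceiver counts from an HDFS client.
public class XceiverMonitor {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    if (fs instanceof DistributedFileSystem) {
      DistributedFileSystem dfs = (DistributedFileSystem) fs;
      for (DatanodeInfo dn : dfs.getDataNodeStats()) {
        System.out.println(dn.getHostName() + ": "
            + dn.getXceiverCount() + " xceivers");
      }
    }
  }
}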

Re: Hive Metastore DB Issue ( Cloudera CDH4.1.2 MRv1 with hive-0.9.0-cdh4.1.2)

2013-02-07 Thread samir das mohapatra
Hi Suresh, Thanks for the advice, but why act as if you hold a monopoly here? You should not. The point is the solution, not the problem. Note: I am asking any user; it does not matter which list, because it is a common usage scenario. On Fri, Feb 8, 2013 at 3:31 AM, Suresh Srinivas sur...@hortonworks.com wrote: Please only use CDH