Can you view (and post) the job counter values from the import job? They should be visible in the job history server.
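
If it's easier to pull them programmatically rather than through the web UI, something along these lines should do it -- just a rough sketch against the Hadoop 2.x mapreduce client API, where the job id argument is whatever id the import tool printed when it submitted the job (the placeholder below is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Cluster;
    import org.apache.hadoop.mapreduce.Counter;
    import org.apache.hadoop.mapreduce.CounterGroup;
    import org.apache.hadoop.mapreduce.Counters;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobID;

    public class DumpImportCounters {
        public static void main(String[] args) throws Exception {
            // Job id printed by the import tool, e.g. "job_1442345678901_0042" (placeholder)
            String jobId = args[0];
            Cluster cluster = new Cluster(new Configuration());
            Job job = cluster.getJob(JobID.forName(jobId));
            if (job == null) {
                System.err.println("Job not found: " + jobId);
                return;
            }
            // Print every counter group and counter the job recorded
            Counters counters = job.getCounters();
            for (CounterGroup group : counters) {
                System.out.println(group.getDisplayName());
                for (Counter counter : group) {
                    System.out.println("  " + counter.getDisplayName() + " = " + counter.getValue());
                }
            }
            cluster.close();
        }
    }
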
Also, did you see the import tool exit successfully (in the terminal
where you started it)?

- Gabriel

On Wed, Sep 16, 2015 at 6:24 PM, Gaurav Kanade <[email protected]> wrote:
> Hi guys
>
> I was able to get this to work after using bigger VMs for data nodes;
> however, the bigger problem I am facing now is that after my MR job
> completes successfully, I am not seeing any rows loaded in my table
> (count shows 0 both via Phoenix and HBase).
>
> Am I missing something simple?
>
> Thanks
> Gaurav
>
>
> On 12 September 2015 at 11:16, Gabriel Reid <[email protected]> wrote:
>>
>> Around 1400 mappers sounds about normal to me -- I assume your block
>> size on HDFS is 128 MB, which works out to roughly 1500 mappers for
>> 200 GB of input.
>>
>> To add to what Krishna asked, can you be a bit more specific about
>> what you're seeing (in log files or elsewhere) that leads you to
>> believe the data nodes are running out of capacity? Are map tasks
>> failing?
>>
>> If this is indeed a capacity issue, one thing you should ensure is
>> that map output compression is enabled. This doc from Cloudera
>> explains it (and the same information applies whether or not you're
>> using CDH):
>> http://www.cloudera.com/content/cloudera/en/documentation/cdh4/latest/CDH4-Installation-Guide/cdh4ig_topic_23_3.html
>>
>> Apart from that, there isn't any basic thing that you're likely to be
>> missing, so any additional information you can supply about what
>> you're running into would be useful.
>>
>> - Gabriel
>>
>>
>> On Sat, Sep 12, 2015 at 2:17 AM, Krishna <[email protected]> wrote:
>> > 1400 mappers on 9 nodes is about 155 mappers per datanode, which
>> > sounds high to me. There are very few specifics in your mail. Are
>> > you using YARN? Can you provide details like table structure, # of
>> > rows & columns, etc.? Do you have an error stack?
>> >
>> >
>> > On Friday, September 11, 2015, Gaurav Kanade <[email protected]>
>> > wrote:
>> >>
>> >> Hi All
>> >>
>> >> I am new to Apache Phoenix (and relatively new to MR in general),
>> >> but I am trying a bulk insert of a 200 GB tab-separated file into
>> >> an HBase table. This seems to start off fine and kicks off about
>> >> ~1400 mappers and 9 reducers (I have 9 data nodes in my setup).
>> >>
>> >> At some point I run into problems with this process, as the data
>> >> nodes seem to run out of capacity (from what I can see, my data
>> >> nodes have 400 GB of local space). Certain reducers eat up most of
>> >> the capacity on these nodes, slowing the process to a crawl and
>> >> ultimately leading to the Node Managers complaining that node
>> >> health is bad (log-dirs and local-dirs are bad).
>> >>
>> >> Is there some inherent setting I am missing that I need to set up
>> >> for this particular job?
>> >>
>> >> Any pointers would be appreciated.
>> >>
>> >> Thanks
>> >>
>> >> --
>> >> Gaurav Kanade,
>> >> Software Engineer
>> >> Big Data
>> >> Cloud and Enterprise Division
>> >> Microsoft
>
>
>
>
> --
> Gaurav Kanade,
> Software Engineer
> Big Data
> Cloud and Enterprise Division
> Microsoft
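
P.S. On the map output compression point in my earlier reply (quoted above), here's a minimal sketch of the settings involved. This assumes Hadoop 2.x property names and that the Snappy native libraries are installed on your cluster -- treat it as illustrative rather than the one true configuration:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.SnappyCodec;

    public class MapOutputCompressionExample {
        public static Configuration withMapOutputCompression(Configuration conf) {
            // Compress intermediate map output to cut local-disk spill and shuffle volume
            conf.setBoolean("mapreduce.map.output.compress", true);
            // Snappy is a common choice; assumes the native Snappy libraries are available
            conf.setClass("mapreduce.map.output.compress.codec",
                    SnappyCodec.class, CompressionCodec.class);
            return conf;
        }

        public static void main(String[] args) {
            Configuration conf = withMapOutputCompression(new Configuration());
            System.out.println(conf.get("mapreduce.map.output.compress.codec"));
        }
    }

If the import tool accepts the generic Hadoop options (most Tool-based jobs do), passing the same two properties with -D at submission time is the simplest way to turn this on without rebuilding anything.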
