Thanks Nick, I will open JIRA requests for both Phoenix and HBase. I will also chip in and contribute whatever I can :)
Thanks, Siva.

On Thu, Feb 12, 2015 at 11:10 AM, Nick Dimiduk <[email protected]> wrote:

> A custom line separator is a reasonable request. Please open JIRAs for the
> HBase and/or Phoenix import tools -- and provide a patch, if you're feeling
> generous ;)
>
> On Thu, Feb 12, 2015 at 10:39 AM, Siva <[email protected]> wrote:
>
>> Hi Gabriel,
>>
>> Using a special character other than (\n) as the line separator does not
>> work even with HBase ImportTsv. But I found something called RichImportTsv
>> on GitHub:
>>
>> https://github.com/kawaa/RichImportTsv
>>
>> It is 3 years old, though, and was implemented using the old APIs. We
>> should take a step to rewrite it with the new API.
>>
>> Thanks,
>> Siva.
>>
>> On Wed, Feb 11, 2015 at 11:40 PM, Gabriel Reid <[email protected]>
>> wrote:
>>
>>> Hi Siva,
>>>
>>> The Bulk CSV Loader (i.e. the MapReduce-based loader) definitely won't
>>> support records split over multiple input lines. It could be that loading
>>> via PSQL (as described on http://phoenix.apache.org/bulk_dataload.html)
>>> will allow multi-line records, as this might be supported by the
>>> underlying CSV parsing library (commons-csv), although I'm not sure. In
>>> any case, I can't really give you any advice on how to make it work
>>> there if it isn't working right now.
>>>
>>> I assume this also won't work in HBase's ImportTsv.
>>>
>>> - Gabriel
>>>
>>>
>>> On Thu, Feb 5, 2015 at 10:28 PM, Siva <[email protected]> wrote:
>>> > We have a table that contains a NOTE column; this column contains
>>> > lines of text separated by newlines. When I load the data from .csv
>>> > through the bulk loader, Phoenix fails with an error, and HBase
>>> > truncates the text when it encounters a newline and treats the rest
>>> > of the NOTE as a new record.
>>> >
>>> > Is there a way to specify a newline separator in the HBase or Phoenix
>>> > bulk load?
>>> >
>>> > With Phoenix:
>>> >
>>> > HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf \
>>> > hadoop jar /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar \
>>> > org.apache.phoenix.mapreduce.CsvBulkLoadTool --table test_leadwarehouse \
>>> > --input /user/sbhavanari/test_leadwarehouse.csv \
>>> > --zookeeper <zookeeper Ip>:2181:/hbase
>>> >
>>> > With HBase ImportTsv:
>>> >
>>> > hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' \
>>> > -Dimporttsv.columns=<col_list> test_leadwarehouse \
>>> > /user/data/test_leadwarehouse.csv
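A quick illustration of the point Gabriel raises about the underlying CSV parser: RFC 4180-style CSV parsers keep embedded newlines as part of a field when the field is quoted, rather than treating them as record separators. This is the behavior commons-csv (used by the PSQL path) is capable of; the sketch below uses Python's stdlib `csv` module purely as a stand-in, since it follows the same quoting convention -- it is not Phoenix or HBase code.

```python
# Sketch: RFC 4180-style quoting keeps embedded newlines inside a field.
# Python's stdlib csv module stands in here for commons-csv; both treat a
# newline inside a double-quoted field as data, not as a record separator.
import csv
import io

# A NOTE field spanning three lines, kept as ONE record by quoting it.
raw = 'id,note\n1,"first line\nsecond line\nthird line"\n2,single line\n'

rows = list(csv.reader(io.StringIO(raw)))

# Three rows come back (header + 2 records), not five: the quoted
# newlines stayed inside the NOTE field of record 1.
for row in rows:
    print(row)
```

The failure described in the thread is the opposite case: a line-oriented loader splits input on every `\n` before any CSV quoting is considered, so a multi-line NOTE value is chopped into bogus extra records.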
