Thanks Nick, I will open JIRA requests for both Phoenix and HBase. I will also chip in and contribute whatever I can :)
Thanks, Siva.

On Thu, Feb 12, 2015 at 11:10 AM, Nick Dimiduk <[email protected]> wrote:

> A custom line separator is a reasonable request. Please open JIRAs for the
> HBase and/or Phoenix import tools -- and provide a patch, if you're feeling
> generous ;)
>
> On Thu, Feb 12, 2015 at 10:39 AM, Siva <[email protected]> wrote:
>
>> Hi Gabriel,
>>
>> Using a special character other than (\n) as the line separator does not
>> work even with HBase ImportTsv. But I found something called RichImportTsv
>> on GitHub:
>>
>> https://github.com/kawaa/RichImportTsv
>>
>> It is 3 years old, though, and was implemented using the old APIs. We
>> should take a step to rewrite it with the new API.
>>
>> Thanks,
>> Siva.
>>
>> On Wed, Feb 11, 2015 at 11:40 PM, Gabriel Reid <[email protected]>
>> wrote:
>>
>>> Hi Siva,
>>>
>>> The Bulk CSV Loader (i.e. the MapReduce-based loader) definitely won't
>>> support records split over multiple input lines. It could be that loading
>>> via PSQL (as described on http://phoenix.apache.org/bulk_dataload.html)
>>> will allow multi-line records, as this might be supported by the
>>> underlying CSV parsing library (commons-csv), although I'm not sure. In
>>> any case, I can't really give you any advice on how to make it work
>>> there if it isn't working right now.
>>>
>>> I assume this also won't work in HBase's ImportTsv.
>>>
>>> - Gabriel
>>>
>>>
>>> On Thu, Feb 5, 2015 at 10:28 PM, Siva <[email protected]> wrote:
>>> > We have a table that contains a NOTE column; this column contains
>>> > lines of text separated by newlines. When I load the data from .csv
>>> > through the bulk loader, Phoenix fails with an error, and HBase
>>> > truncates the text when it encounters a newline and treats the rest
>>> > of the NOTE as a new record.
>>> >
>>> > Is there a way to specify a newline separator in the HBase or Phoenix
>>> > bulk load?
>>> >
>>> > With Phoenix:
>>> >
>>> > HADOOP_CLASSPATH=/usr/hdp/2.2.0.0-2041/hbase/lib/hbase-protocol.jar:/usr/hdp/2.2.0.0-2041/hbase/conf \
>>> > hadoop jar /usr/hdp/2.2.0.0-2041/phoenix/phoenix-4.2.0.2.2.0.0-2041-client.jar \
>>> > org.apache.phoenix.mapreduce.CsvBulkLoadTool --table test_leadwarehouse \
>>> > --input /user/sbhavanari/test_leadwarehouse.csv \
>>> > --zookeeper <zookeeper Ip>:2181:/hbase
>>> >
>>> > With HBase ImportTsv:
>>> >
>>> > hbase org.apache.hadoop.hbase.mapreduce.ImportTsv '-Dimporttsv.separator=,' \
>>> > -Dimporttsv.columns=<col_list> test_leadwarehouse \
>>> > /user/data/test_leadwarehouse.csv
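A quick illustration of the point Gabriel raises about the underlying CSV parser: RFC 4180-style CSV parsers keep embedded newlines as part of a field when the field is quoted, rather than treating them as record separators. This is the behavior commons-csv (used by the PSQL path) is capable of; the sketch below uses Python's stdlib `csv` module purely as a stand-in, since it follows the same quoting convention -- it is not Phoenix or HBase code.

```python
# Sketch: RFC 4180-style quoting keeps embedded newlines inside a field.
# Python's stdlib csv module stands in here for commons-csv; both treat a
# newline inside a double-quoted field as data, not as a record separator.
import csv
import io

# A NOTE field spanning three lines, kept as ONE record by quoting it.
raw = 'id,note\n1,"first line\nsecond line\nthird line"\n2,single line\n'

rows = list(csv.reader(io.StringIO(raw)))

# Three rows come back (header + 2 records), not five: the quoted
# newlines stayed inside the NOTE field of record 1.
for row in rows:
    print(row)
```

The failure described in the thread is the opposite case: a line-oriented loader splits input on every `\n` before any CSV quoting is considered, so a multi-line NOTE value is chopped into bogus extra records.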
