Might be a bug. Take a look at the CSVLoaderTest, as it has some testing around custom delimiters. Maybe add a test case with a sample line from your table to isolate the issue.
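The behavior such a test pins down is just field splitting on the custom delimiter. A rough sketch of the expected result (Python here purely for illustration — CSVLoaderTest itself is Java, and this uses Python's csv module rather than Phoenix's parser), using the sample customer.csv line from the thread below:

```python
import csv
import io

# Sample TPC-H customer.csv line (pipe-delimited, with a trailing '|').
line = ("6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|"
        "23-816-949-8373|7865.21|MACHINERY|r pinto beans. regular "
        "multipliers detect carefully. carefully final instructions affix "
        "quickly. packages boost af|")

# Parse with '|' as the field delimiter, as the -d option is meant to do.
fields = next(csv.reader(io.StringIO(line), delimiter="|"))

# The trailing '|' produces an empty 9th field; a loader mapping the 8
# CUSTOMER columns (C_CUSTKEY..C_COMMENT) has to tolerate or strip it,
# which is one thing worth asserting in such a test.
print(len(fields))   # 9
print(fields[0])     # 6967
print(fields[6])     # MACHINERY
```

That empty trailing field is a plausible suspect for a "wrong format" style error with TPC-H data, since every line ends in the delimiter.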
Patches welcome, of course.
Thanks,
James

On Thu, Feb 6, 2014 at 6:12 AM, Devin Pinkston <[email protected]> wrote:

> James,
>
> Looks like I'm on the right track; however, I'm not sure why it is not
> accepting my delimiters. I am using the TPC-H data set, so for instance
> here is what a line from customer.csv looks like:
>
> 6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|23-816-949-8373|7865.21|MACHINERY|r pinto beans. regular multipliers detect carefully. carefully final instructions affix quickly. packages boost af|
>
> When I try to import the csv file into my table "CUSTOMER", it looks like
> psql is not liking the delimiters I pass in. If I use the 3 numbers as in
> the usage below, I just get a wrong-format error, but it at least
> attempts to import the data. Any thoughts?
>
> ./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d | localhost:2181 customer.csv
>
> Usage: psql [-t table-name] [-h comma-separated-column-names | in-line]
>             [-d field-delimiter-char quote-char escape-char] <zookeeper>
>             <path-to-sql-or-csv-file>...
>   By default, the name of the CSV file is used to determine the Phoenix
>   table into which the CSV data is loaded, and the ordinal value of the
>   columns determines the mapping.
>   -t overrides the table into which the CSV data is loaded
>   -h overrides the column names to which the CSV data maps
>      A special value of in-line indicates that the first line of the CSV
>      file determines the columns to which the data maps.
>   -s uses strict mode, throwing an exception if a column name doesn't
>      match during CSV loading.
>   -d uses custom delimiters for the CSV loader; specify a single char
>      each for the field delimiter, quote char, and escape char.
>      A digit is NOT usually a delimiter and is taken as a control
>      character: 1 -> Ctrl-A, 2 -> Ctrl-B ... 9 -> Ctrl-I.
> Examples:
>   psql localhost my_ddl.sql
>   psql localhost my_ddl.sql my_table.csv
>   psql -t my_table my_cluster:1825 my_table2012-Q3.csv
>   psql -t my_table -h col1,col2,col3 my_cluster:1825 my_table2012-Q3.csv
>   psql -t my_table -h col1,col2,col3 -d 1 2 3 my_cluster:1825 my_table2012-Q3.csv
>
> Thanks
>
> From: Devin Pinkston [mailto:[email protected]]
> Sent: Thursday, February 06, 2014 8:41 AM
> To: [email protected]
> Subject: RE: Import Delimiter
>
> James,
>
> Interesting, thanks for the info. So if I were to import data containing
> pipe delimiters, I would have to use the non-map-reduce bulk loader. Are
> you saying that sqlline would have to be used?
>
> Sorry, I am trying to figure out how I can import these large flat files
> this way.
>
> Thank you.
>
> From: James Taylor [mailto:[email protected]]
> Sent: Wednesday, February 05, 2014 8:25 PM
> To: [email protected]
> Subject: Re: Import Delimiter
>
> You're right. It was added to the non-map-reduce bulk loader. This is the
> loader that loads local CSV files through the bin/psql.sh script. There's
> a -d option that was added in this pull request [1]. It would be nice to
> add this same functionality to our CSV map-reduce bulk loader too if
> anyone is interested.
>
> Thanks,
> James
>
> [1] https://github.com/forcedotcom/phoenix/pull/514
>
> On Wed, Feb 5, 2014 at 9:35 AM, Nick Dimiduk <[email protected]> wrote:
>
> Hi James,
>
> I'm looking through the bulkload job, and it looks to me like this isn't
> configurable at the moment. Have a look at
> https://github.com/apache/incubator-phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/map/reduce/MapReduceJob.java#L136
>
> Is there something I'm missing? Perhaps I'm looking in the wrong place?
> Thanks,
> Nick
>
> On Wed, Feb 5, 2014 at 10:16 AM, Devin Pinkston <[email protected]> wrote:
>
> James,
>
> Thanks for the quick response. Do you know what the argument or command
> is to pass in?
>
> For instance: ./csv-bulk-loader.sh -delimiter '|'
>
> Thanks
>
> From: James Taylor [mailto:[email protected]]
> Sent: Wednesday, February 05, 2014 11:51 AM
> To: [email protected]
> Subject: Re: Import Delimiter
>
> Hello,
> The CSV map-reduce based bulk loader supports custom delimiters. Might
> need to be doc-ed, though.
> Thanks,
> James
>
> On Wednesday, February 5, 2014, Devin Pinkston <[email protected]> wrote:
>
> Hello,
>
> I am trying to import data into HBase; however, I have '|' (pipe)
> delimiters in my file instead of commas. I don't see a way to pass in a
> different separator/delimiter with the jar. What would be the best way to
> import data like this?
>
> Thanks
>
> The information contained in this transmission may contain privileged and
> confidential information. It is intended only for the use of the
> person(s) named above. If you are not the intended recipient, you are
> hereby notified that any review, dissemination, distribution or
> duplication of this communication is strictly prohibited. If you are not
> the intended recipient, please contact the sender by reply e-mail and
> destroy all copies of the original message. Technica Corporation does not
> represent this e-mail to be free from any virus, fault or defect and it
> is therefore the responsibility of the recipient to first scan it for
> viruses, faults and defects. To reply to our e-mail administrator
> directly, please send an e-mail to [email protected]. Thank you.
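As a footnote to the -d digit convention quoted in the usage text above (1 -> Ctrl-A ... 9 -> Ctrl-I): the mapping is simply the digit's value taken as an ASCII control code. A hypothetical sketch of that convention (illustrative only, not Phoenix's actual argument parsing):

```python
def delimiter_char(arg: str) -> str:
    """Resolve a -d style argument per the convention in the usage text:
    a digit 1-9 stands for the ASCII control character Ctrl-A..Ctrl-I
    (0x01..0x09); any other single character is taken literally.
    Illustrative sketch, not Phoenix's implementation."""
    if len(arg) != 1:
        raise ValueError("delimiter must be a single character")
    if "1" <= arg <= "9":
        return chr(int(arg))  # '1' -> '\x01' (Ctrl-A), '9' -> '\x09' (Ctrl-I)
    return arg

print(repr(delimiter_char("1")))  # '\x01'
print(repr(delimiter_char("|")))  # '|'
```

So `-d 1 2 3` in the last usage example means Ctrl-A as the field delimiter, Ctrl-B as the quote char, and Ctrl-C as the escape char. A literal `|` would need shell quoting (`-d '|' ...`), since an unquoted `|` is interpreted by the shell as a pipe and never reaches psql.sh at all.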
