James,

Looks like I'm on the right track; however, I'm not sure why it isn't accepting 
my delimiters.  I'm using the TPC-H data set, so for instance here is what a 
line from customer.csv looks like:


6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|23-816-949-8373|7865.21|MACHINERY|r pinto beans. regular multipliers detect carefully. carefully final instructions affix quickly. packages boost af|
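For what it's worth, here is a quick plain-Python sketch (illustration only, not Phoenix code) of how that line splits on the pipe. Note that TPC-H rows end with a trailing '|', which yields an extra empty field that a loader expecting exactly 8 columns might reject:

```python
import csv

# The sample customer.csv line from above, rejoined into one record.
line = ('6967|Customer#000006967|uMPce8nER9v3PCIcsZmNlSrCKcau6tJd4qe|13|'
        '23-816-949-8373|7865.21|MACHINERY|r pinto beans. regular multipliers '
        'detect carefully. carefully final instructions affix quickly. '
        'packages boost af|')

row = next(csv.reader([line], delimiter='|'))
print(len(row))  # 9: the 8 CUSTOMER columns plus an empty field from the trailing |
print(row[1])    # Customer#000006967
```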

When I try to import the CSV file into my table "CUSTOMER", psql doesn't seem 
to accept the delimiters I pass in.  If I use the 3 numbers as in the usage 
below, I just get a wrong-format error, but it at least attempts to import 
the data.  Any thoughts?


./psql.sh -t CUSTOMER -h C_CUSTKEY,C_NAME,C_ADDRESS,C_NATIONKEY,C_PHONE,C_ACCTBAL,C_MKTSEGMENT,C_COMMENT -d | localhost:2181 customer.csv
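One thing worth checking (my assumption, not something stated in the thread): in that command the | after -d is unquoted, so the shell treats it as a pipeline operator and psql.sh never sees a delimiter argument. A tiny sketch of the difference, using a hypothetical print_args function as a stand-in for psql.sh:

```shell
# print_args is a hypothetical stand-in for psql.sh; it just shows which
# arguments the shell actually delivers to the program.
print_args() { printf '[%s]' "$@"; echo; }

# Unquoted, the shell parses | as a pipeline operator:
#   ./psql.sh ... -d | localhost:2181 customer.csv
# so -d ends one command and localhost:2181 starts another.

# Quoted (or escaped as \|), the literal pipe reaches the program:
print_args -d '|' localhost:2181 customer.csv
# prints [-d][|][localhost:2181][customer.csv]
```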

Usage: psql [-t table-name] [-h comma-separated-column-names | in-line] [-d field-delimiter-char quote-char escape-char] <zookeeper> <path-to-sql-or-csv-file>...

  By default, the name of the CSV file is used to determine the Phoenix table
  into which the CSV data is loaded, and the ordinal value of the columns
  determines the mapping.
  -t overrides the table into which the CSV data is loaded
  -h overrides the column names to which the CSV data maps
     A special value of in-line indicates that the first line of the CSV file
     determines the columns to which the data map.
  -s uses strict mode, throwing an exception if a column name doesn't match
     during CSV loading.
  -d uses custom delimiters for the CSV loader; specify a single char each for
     the field delimiter, quote char, and escape char.
     A digit is not itself taken as a delimiter; it is mapped to a control
     character: 1 -> Ctrl+A, 2 -> Ctrl+B ... 9 -> Ctrl+I.

Examples:
  psql localhost my_ddl.sql
  psql localhost my_ddl.sql my_table.csv
  psql -t my_table my_cluster:1825 my_table2012-Q3.csv
  psql -t my_table -h col1,col2,col3 my_cluster:1825 my_table2012-Q3.csv
  psql -t my_table -h col1,col2,col3 -d 1 2 3 my_cluster:1825 my_table2012-Q3.csv
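The digit-to-control-character rule in the -d description can be sketched like this (an illustration of the mapping as described in the usage text, not Phoenix's actual implementation):

```python
def delimiter_char(arg: str) -> str:
    """Map '1'..'9' to Ctrl+A..Ctrl+I (code points 1..9), as the usage
    text describes; anything else is taken literally."""
    if len(arg) == 1 and arg in '123456789':
        return chr(int(arg))  # e.g. '1' -> '\x01' (Ctrl+A)
    return arg

print(repr(delimiter_char('1')))  # '\x01' (Ctrl+A)
print(repr(delimiter_char('9')))  # '\t'   (Ctrl+I is the tab character)
print(repr(delimiter_char('|')))  # '|'
```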

Thanks



From: Devin Pinkston [mailto:[email protected]]
Sent: Thursday, February 06, 2014 8:41 AM
To: [email protected]
Subject: RE: Import Delimiter

James,

Interesting, thanks for the info.  So if I were to import data containing pipe 
delimiters, I would have to use the non-map-reduce bulk loader.  Are you 
saying that sqlline would have to be used?

Sorry, I am trying to figure out how I can import these large flat files this 
way.

Thank you.

From: James Taylor [mailto:[email protected]]
Sent: Wednesday, February 05, 2014 8:25 PM
To: [email protected]
Subject: Re: Import Delimiter

You're right. It was added to the non map-reduce bulk loader. This is the 
loader that loads local CSV files through the bin/psql.sh script. There's a -d 
option that was added in this pull request[1]. It would be nice to add this 
same functionality to our csv map-reduce bulk loader too if anyone is 
interested.
Thanks,
James

[1] https://github.com/forcedotcom/phoenix/pull/514
On Wed, Feb 5, 2014 at 9:35 AM, Nick Dimiduk <[email protected]> wrote:
Hi James,

I'm looking through the bulkload job, and it looks to me like this isn't 
configurable at the moment. Have a look at 
https://github.com/apache/incubator-phoenix/blob/master/phoenix-core/src/main/java/org/apache/phoenix/map/reduce/MapReduceJob.java#L136

Is there something I'm missing? Perhaps I'm looking in the wrong place?

Thanks,
Nick

On Wed, Feb 5, 2014 at 10:16 AM, Devin Pinkston <[email protected]> wrote:
James,

Thanks for the quick response.  Do you know what the argument or command is to 
pass in?

For instance ./csv-bulk-loader.sh -delimiter '|'

Thanks

From: James Taylor [mailto:[email protected]]
Sent: Wednesday, February 05, 2014 11:51 AM
To: [email protected]
Subject: Re: Import Delimiter

Hello,
The CSV map-reduce based bulk loader supports custom delimiters. It might need 
to be documented, though.
Thanks,
James

On Wednesday, February 5, 2014, Devin Pinkston <[email protected]> wrote:

Hello,



I am trying to import data into HBase; however, I have '|' (pipe) delimiters 
in my file instead of commas.  I don't see a way to pass in a different 
separator/delimiter with the jar.  What would be the best way to import data 
like this?



Thanks


The information contained in this transmission may contain privileged and 
confidential information.
It is intended only for the use of the person(s) named above.
If you are not the intended recipient, you are hereby notified that any review, 
dissemination, distribution or duplication of this communication is strictly 
prohibited.
If you are not the intended recipient, please contact the sender by reply 
e-mail and destroy all copies of the original message.
Technica Corporation does not represent this e-mail to be free from any virus, 
fault or defect and it is therefore the responsibility of the recipient to 
first scan it for viruses, faults and defects.
To reply to our e-mail administrator directly, please send an e-mail to 
[email protected]. Thank you.
