[ 
http://issues.apache.org/jira/browse/SOLR-66?page=comments#action_12447918 ] 
            
Fuad Efendi commented on SOLR-66:
---------------------------------

CSV:
- should we support standard CSVs generated by Excel, Oracle DataPump, etc?

XML: we currently preprocess some data to create XML, then we post it to SOLR.

Can we preprocess standard CSV? For instance, we have two tables: CATEGORY 
(parent) and PRODUCT (child).
CSV produced by Oracle might look like this:

001,Paper,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
001,Paper,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"

Here, [001,Paper] is a single record from the CATEGORY table, and the rest is 
the PK, SKU, and NAME from the PRODUCT table.

1. Use 'extended' CSV such as
001,Paper,multi-value:"001,17R7021,14 7/8 X 8 1/2"" - 1/2"" 
Greenbar002,17R8018,8 1/2 x 11"" Micro Perf @ 3 2/3"""
(multi-value:"<comma separated>,...")
- very difficult... and not compatible with exported data...

2. Standard CSV with fixed width (the same number of columns on every line) + 
preprocessing (sorting, and removing repeated values)

001,Paper,001,17R7021,14 7/8 X 8 1/2" - 1/2" Greenbar
001,,002,17R8018,8 1/2 x 11" Micro Perf @ 3 2/3"
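
The preprocessing step in option 2 could be sketched roughly as follows. This is only an illustration in Python, not anything that exists in Solr: the function name blank_repeated_parent and the column positions (category PK in column 0, category name in column 1, product PK in column 2) are assumptions taken from the sample rows above.

```python
def blank_repeated_parent(rows, key_col=0, child_key_col=2, parent_cols=(1,)):
    """Sort rows by (parent key, child key) and blank repeated parent values.

    The parent key itself is never blanked, so every line still says
    which category it belongs to.
    """
    ordered = sorted(rows, key=lambda r: (r[key_col], r[child_key_col]))
    out = []
    last_key = None
    for row in ordered:
        row = list(row)  # copy, so the caller's rows are left untouched
        if row[key_col] == last_key:
            for col in parent_cols:
                row[col] = ""
        last_key = row[key_col]
        out.append(row)
    return out


# Rows as they might come out of a join export (already split into fields).
rows = [
    ["001", "Paper", "002", "17R8018", '8 1/2 x 11" Micro Perf @ 3 2/3"'],
    ["001", "Paper", "001", "17R7021", '14 7/8 X 8 1/2" - 1/2" Greenbar'],
]

for row in blank_repeated_parent(rows):
    print(",".join(row))
```

The sketch works on rows that are already split into fields, so it sidesteps the question of how the unescaped double quotes in the NAME column should be parsed.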


We removed the repeated value 'Paper' but left the primary key of the category 
intact. This should work with both the standard 'large' CSV and the 
preprocessed one. We also avoid one huge single line in a case such as IBM 
producing many different kinds of paper; instead we get multiple lines of 
fixed width. The first column (the repeated 001 value) is the primary key, 
the same as <field name="id">001</field>
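
On the loading side, the idea that the same reader handles both the 'large' CSV and the preprocessed one might look something like the sketch below. It is a hypothetical illustration only: group_rows and the field names category, product_pk, sku and name are made up for the example (only id comes from the comment above), and the multi-valued fields are represented as plain Python lists.

```python
def group_rows(rows):
    """Collect child rows under their parent key, one document per key.

    Blank parent columns are simply ignored, so full rows and rows with
    repeated values removed produce the same documents.
    """
    docs = {}
    for cat_id, cat_name, prod_pk, sku, name in rows:
        doc = docs.setdefault(
            cat_id,
            {"id": cat_id, "category": None, "product_pk": [], "sku": [], "name": []},
        )
        if cat_name and doc["category"] is None:
            doc["category"] = cat_name
        doc["product_pk"].append(prod_pk)
        doc["sku"].append(sku)
        doc["name"].append(name)
    return list(docs.values())


full = [
    ["001", "Paper", "001", "17R7021", '14 7/8 X 8 1/2" - 1/2" Greenbar'],
    ["001", "Paper", "002", "17R8018", '8 1/2 x 11" Micro Perf @ 3 2/3"'],
]
preprocessed = [
    ["001", "Paper", "001", "17R7021", '14 7/8 X 8 1/2" - 1/2" Greenbar'],
    ["001", "", "002", "17R8018", '8 1/2 x 11" Micro Perf @ 3 2/3"'],
]

# Both inputs yield one document with id "001" and two values per child field.
print(group_rows(full) == group_rows(preprocessed))
```

Because grouping is keyed on the first column, the rows for one category do not even have to be adjacent, although sorted input keeps the file easier to read.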


> bulk data loader
> ----------------
>
>                 Key: SOLR-66
>                 URL: http://issues.apache.org/jira/browse/SOLR-66
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Yonik Seeley
>         Assigned To: Yonik Seeley
>
> A way to efficiently load simple formatted text files, including CSV files.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
