I read CSV files into a table using mysqlimport --fields-terminated-by=',' --replace .....
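For concreteness, the invocation looks roughly like this (database and file names are made up for illustration; mysqlimport derives the table name from the file name, so data.csv loads into a table called data):

    # --replace overwrites rows that collide on a unique key;
    # --local reads the file from the client host
    mysqlimport --local --replace --fields-terminated-by=',' mydb data.csv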
That would work fine and, by using the appropriate unique index, it would deal with the fact that some of the data appears in multiple CSV files. However, though the same data is in several files, the date format is not the same in all of them, so the resulting table ends up with duplicate records that don't violate the unique key. I devised a way of adding an additional column and converting the dates into a "proper" DATE format, thereby making the duplicates detectable.

There are several ways I can get rid of them (rough SQL sketches in the P.S. below):

a) Write out the table using SELECT DISTINCT ... INTO OUTFILE with the appropriate GROUP BY. Then truncate the table and read it back with LOAD DATA INFILE.

b) Create a new table with the same structure but with a slightly different multiple-column unique key. REPLACE INTO the new table and the new key will remove the duplicates. Truncate the original table and copy the records back from the interim table.

There's a major hassle with method a): the OUTFILE has to be removed before the same file name can be used again, and that requires access privileges on the database directory. I'm not entirely happy with method b) either, since it is neither particularly scalable nor generic.

I'd have thought this would be an FAQ, but apparently not, judging from my searching. I normally work with a language called S (dialect R), which handles matrices in a multitude of ways. Removing duplicates is straightforward in that language, but from what I know about SQL so far, it is rather complicated in MySQL.

What do other people do with duplicates?

TIA

-- 
   ___     Patrick Connolly
 {~._.~}
 _( Y )_   Good judgment comes from experience
(:_~*~_:)  Experience comes from bad judgment
 (_)-(_)
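P.S. In case rough SQL makes the above clearer, here are sketches of the three pieces. All table and column names (data, raw_date, norm_date, id) are invented for illustration, and this assumes a MySQL new enough to have STR_TO_DATE().

First, the extra-column conversion that makes the duplicates detectable, assuming (purely as an example) that some files used day/month/year and others used ISO dates:

    ALTER TABLE data ADD COLUMN norm_date DATE;

    -- files that used day/month/year, e.g. 31/12/2003
    UPDATE data SET norm_date = STR_TO_DATE(raw_date, '%d/%m/%Y')
     WHERE raw_date LIKE '%/%';

    -- files that used ISO dates, e.g. 2003-12-31
    UPDATE data SET norm_date = STR_TO_DATE(raw_date, '%Y-%m-%d')
     WHERE raw_date LIKE '%-%';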
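Method a), dumping distinct rows and reloading (the file path is invented):

    SELECT DISTINCT * INTO OUTFILE '/tmp/data.dedup'
      FIELDS TERMINATED BY ','
      FROM data;

    TRUNCATE TABLE data;

    LOAD DATA INFILE '/tmp/data.dedup' INTO TABLE data
      FIELDS TERMINATED BY ',';

This is where the hassle bites: /tmp/data.dedup has to be deleted, with the appropriate filesystem privileges, before the SELECT can be run a second time.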
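And method b), letting a stricter multiple-column unique key squeeze the duplicates out (the key columns here are only an example, and CREATE TABLE ... LIKE needs MySQL 4.1; on older servers the full CREATE TABLE has to be spelled out):

    CREATE TABLE data_dedup LIKE data;
    ALTER TABLE data_dedup ADD UNIQUE KEY dedup_key (id, norm_date);

    -- REPLACE keeps only the last row inserted for each key value
    REPLACE INTO data_dedup SELECT * FROM data;

    TRUNCATE TABLE data;
    INSERT INTO data SELECT * FROM data_dedup;
    DROP TABLE data_dedup;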