Yes, you can write your own custom mapper to do the conversion (see CsvToKeyValueMapper and CsvUpsertExecutor#createConversionFunction). Alternatively, consider chaining jobs (a first job with the multiple inputs that standardizes the date format, followed by the CsvBulkLoadTool), or writing a custom TextInputFormat that standardizes the date format in the input before it reaches the CsvBulkLoadTool.
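To make the chaining idea concrete, here is a minimal sketch of the per-record normalization that a standardization job (or a custom mapper) could apply before handing the CSV to the CsvBulkLoadTool. The class name, column index, and source date format (yyyyMMdd, inferred from the 20160912 example below) are assumptions for illustration, not Phoenix API:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class DateNormalizer {
    // Assumed source format of the incoming files (yyyyMMdd, e.g. 20160912).
    private static final SimpleDateFormat IN = new SimpleDateFormat("yyyyMMdd");
    // Target format that Phoenix's default date parser accepts.
    private static final SimpleDateFormat OUT = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");

    // Rewrite one CSV line, converting the date column in place.
    static String normalizeLine(String line, int dateColumn) throws ParseException {
        String[] fields = line.split(",", -1);
        fields[dateColumn] = OUT.format(IN.parse(fields[dateColumn]));
        return String.join(",", fields);
    }

    public static void main(String[] args) throws ParseException {
        // "1,20160912,foo" -> "1,2016-09-12 00:00:00,foo"
        System.out.println(normalizeLine("1,20160912,foo", 1));
    }
}
```

In a real chained job this logic would live in the map() method of the first job's Mapper, with a different normalizer selected per input file type, which addresses the "different date format in some other types of files" concern.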
On Thu, Sep 7, 2017 at 1:37 AM, Sriram Nookala <[email protected]> wrote:

> I'm still trying to set those up in Amazon EMR. However, setting
> `phoenix.query.dateFormatTimeZone` wouldn't fix the issue for all files,
> since we could receive a different date format in some other types of files.
> Is there an option to write a custom mapper to transform the date?
>
> On Tue, Sep 5, 2017 at 2:50 PM, Josh Elser <[email protected]> wrote:
>
>> Sriram,
>>
>> Did you set the timezone and date-format configuration properties
>> correctly for your environment?
>>
>> See `phoenix.query.dateFormatTimeZone` and `phoenix.query.dateFormat` as
>> described at http://phoenix.apache.org/tuning.html
>>
>> On 9/5/17 2:05 PM, Sriram Nookala wrote:
>>
>>> I'm trying to bulk-load data using the CsvBulkLoadTool. One of the
>>> columns is a date in the format yyyyMMdd, for example 20160912. I don't
>>> get an error, but the parsing is wrong: in sqlline I see the date show
>>> up as 20160912-01-01 00:00:00.000. I had assumed, per the fix for
>>> https://issues.apache.org/jira/browse/PHOENIX-1127, that all date
>>> values would be parsed correctly.
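For files that do share a single date format, the configuration route discussed above can be passed directly on the command line, since the CsvBulkLoadTool accepts standard Hadoop -D properties. A sketch of such an invocation (jar name, table, and input path are placeholders):

```shell
# Placeholder jar/table/input values; the -D properties are the ones from
# http://phoenix.apache.org/tuning.html
hadoop jar phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  -Dphoenix.query.dateFormat=yyyyMMdd \
  -Dphoenix.query.dateFormatTimeZone=UTC \
  --table MY_TABLE \
  --input /data/input.csv
```

This only helps when every file uses the same format; mixed formats still need one of the mapper/chaining approaches.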
