Hi guys,

I need to create a UDF that defines a custom load location.

Before attempting the UDF, I tried parameter substitution inside the Pig script, which does not work:
--myscript.pig
time = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS DATE;
start_ts = foreach time generate startTS(DATE);
raw = LOAD '/home/raw/report/$END' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);

run -param PATH='/home/raw/reports/$END/*' hdfs:/home/ridwan/pig-script/update_test.pig

I expected PATH to take on the content of start_ts.

So here is the solution I have in mind:
- Create a customLoad() UDF that accepts a tuple as input:
    - constructor
    public customLoad(Tuple input) throws ExecException {
        String str = input.get(0).toString();
        Date date = new Date(((Long.parseLong(str) * 1000)) + (60 * 60 * 1000));
        // "yyyy" is the calendar year; "YYYY" would be the week-based year
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
        newpath = sdf.format(date);
    }
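
For reference, the timestamp-to-path conversion in the constructor can be tried on its own, outside Pig. This is only a minimal standalone sketch, not the actual UDF: the class name HourlyPathSketch and the method hourlyPath are made up for illustration, and pinning the time zone to UTC is an assumption (the constructor above uses the JVM default). It uses the lowercase yyyy pattern, since uppercase YYYY means the week-based year in SimpleDateFormat and can differ from the calendar year around New Year.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper class, used only to illustrate the conversion.
class HourlyPathSketch {

    // Turns an epoch timestamp in seconds into a "yyyy/MM/dd/HH" path
    // segment for the hour that follows the timestamp.
    static String hourlyPath(long epochSeconds) {
        // Convert seconds to milliseconds, then add one hour.
        Date date = new Date(epochSeconds * 1000L + 60L * 60L * 1000L);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
        // Assumption: directories are named in UTC.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(date);
    }

    public static void main(String[] args) {
        // The timestamp arrives as a string, as in the constructor above.
        String str = "0";
        System.out.println(hourlyPath(Long.parseLong(str)));
    }
}
```

With the input "0" (the Unix epoch), the resulting path segment is 1970/01/01/01.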

and then updating the path's location in setLocation(), assuming the default location is /home/raw/report:

    @Override
    public void setLocation(String location, Job job) throws IOException {
        FileInputFormat.setInputPaths(job, location + newpath + "/*");
    }

raw = LOAD '/home/raw/report/' USING customLoad(start_ts);

But this gives me an error:
ERROR 1200: <line 7, column 51> mismatched input 'start_ts' expecting RIGHT_PAREN

What have I done wrong?

Thanks a lot,
Kenji
