I have some experience using RCFile with the new MapReduce API through the HCatalog project ( http://incubator.apache.org/hcatalog/ ).

For the output part, your main method needs the following (RCFileMapReduceOutputFormat comes from HCatalog):

    job.setOutputFormatClass(RCFileMapReduceOutputFormat.class);
    // numCols is the total number of columns of your output table
    RCFileMapReduceOutputFormat.setColumnNumber(job.getConfiguration(), numCols);
    RCFileMapReduceOutputFormat.setOutputPath(job, new Path(outputPath));
    RCFileMapReduceOutputFormat.setCompressOutput(job, true);

The Map class would look like this:

    // Imports needed by the mapper (they go at the top of the file)
    import java.io.IOException;
    import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
    import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public static class Map
            extends Mapper<Object, Text, NullWritable, BytesRefArrayWritable> {

        private int numCols;
        private BytesRefArrayWritable bytes;

        @Override
        protected void setup(Context context)
                throws IOException, InterruptedException {
            // Read back the column count stored by setColumnNumber() in main
            numCols = context.getConfiguration()
                    .getInt("hive.io.rcfile.column.number.conf", 0);
            bytes = new BytesRefArrayWritable(numCols);
        }

        @Override
        public void map(Object key, Text line, Context context)
                throws IOException, InterruptedException {
            // Reuse the same row object for every input line
            bytes.clear();
            // Assumes every line has at least numCols '|'-separated fields
            String[] cols = line.toString().split("\\|");
            for (int i = 0; i < numCols; i++) {
                byte[] fieldData = cols[i].getBytes("UTF-8");
                bytes.set(i, new BytesRefWritable(fieldData, 0, fieldData.length));
            }
            context.write(NullWritable.get(), bytes);
        }
    }

Basically, you need to convert each row to a BytesRefArrayWritable object (the variable named bytes in the example above).

For the input part, I did not know how to use RCFileMapReduceInputFormat to write a MapReduce job for a join operation, so I wrote a custom InputFormat and RecordReader. You can find these two classes (MultiRCFileMapReduceInputFormat and MultiRCFileMapReduceRecordReader) at http://www.cse.ohio-state.edu/~huai/RCFile/ . At the same link, TestPrintTables.java is an example program that converts tables in RCFile format to text. I hope that example is self-explanatory.

I hope this helps.

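One thing to watch in the map() method above: String.split("\\|") drops trailing empty strings, so a row whose last columns are empty yields fewer than numCols fields and cols[i] would then throw ArrayIndexOutOfBoundsException. Below is a minimal sketch of one way to guard against that, using only the JDK. The class name RowSplitter and the method toFields are hypothetical names for illustration, not part of Hive or HCatalog, and I am assuming you want empty trailing columns kept as empty fields.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Hive or HCatalog.
class RowSplitter {

    /** Splits a '|'-delimited line into exactly numCols UTF-8 encoded fields. */
    static byte[][] toFields(String line, int numCols) {
        // A limit of -1 keeps trailing empty strings; split("\\|") alone drops them.
        String[] cols = line.split("\\|", -1);
        if (cols.length < numCols) {
            throw new IllegalArgumentException(
                    "expected " + numCols + " columns but found " + cols.length);
        }
        byte[][] fields = new byte[numCols][];
        for (int i = 0; i < numCols; i++) {
            fields[i] = cols[i].getBytes(StandardCharsets.UTF_8);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "1|alice||";                      // last two columns are empty
        System.out.println(line.split("\\|").length);   // prints 2
        System.out.println(toFields(line, 4).length);   // prints 4
    }
}
```

Inside map(), each byte[] returned by a helper like this could be wrapped in a BytesRefWritable exactly as in the loop above. Whether a short row should be rejected, as here, or padded with empty fields depends on your data.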
Thanks,
Yin

On Wed, Dec 14, 2011 at 8:54 AM, Dominik Wiernicki <d...@touk.pl> wrote:
> Hi,
>
> Can someone show me how to use RCfile in plain MapReduce job (as Input and
> Output Format)?
> Please.