Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Demai Ni
> .. Owen
>
> On Tue, Apr 28, 2015 at 11:02 AM, Demai Ni wrote:
>
>> Alan and Grant,
>>
>> many thanks. Grant's comment is exactly on the point I am exploring.
>>
>> A bit of background: I am working on an MPP way to read ORC files
>> through

Re: ORC file across multiple HDFS blocks

2015-04-28 Thread Demai Ni
Alan and Grant, many thanks. Grant's comment is exactly on the point I am exploring. A bit of background: I am working on an MPP way to read ORC files through this C++ API (https://github.com/hortonworks/orc) by Owen and team. The MPP mechanism uses one (or several) independent processes per

ORC file across multiple HDFS blocks

2015-04-24 Thread Demai Ni
Hi guys, I am working on reading ORC files directly from an HDFS cluster, and hope to leverage HDFS short-circuit local reads (http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ShortCircuitLocalReads.html) as much as possible. According to the ORC design, each ORC file usually contain

load TPCH HBase tables through Hive

2015-03-02 Thread Demai Ni
Hi folks, I am using the HBase integration feature of Hive (https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration) to load TPCH tables into HBase, with Hive 0.13 and HBase 0.98.6. The load works well. However, as documented here: https://cwiki.apache.org/confluence/display/Hive/HBaseIn