I googled, but did not find any sample code.
On Fri, Jan 22, 2016 at 9:50 AM, Rex X wrote:
> The two SequentialTextFiles correspond to two Hive tables, say tableA and
> tableB below on
>
> hdfs://hive/tableA/YYYY/MM/DD/*/part-0
> and
> hdfs://hive/tableB/YYYY/MM/DD/*/part-0
The two SequentialTextFiles correspond to two Hive tables, say tableA and
tableB below on
hdfs://hive/tableA/YYYY/MM/DD/*/part-0
and
hdfs://hive/tableB/YYYY/MM/DD/*/part-0
Both of them are partitioned by date, for example,
hdfs://hive/tableA/2016/01/01/*/part-0
Now we wa
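One way to read both dated tables in a single streaming job is to pass one -input option per table; everything matched feeds the same mapper. A sketch with hypothetical month globs and output path, modeled on the layout above:

```shell
# Sketch only: one -input per Hive table directory; the globs select
# one month of date partitions (YYYY/MM/DD layout assumed above).
hadoop jar hadoop-streaming.jar \
    -input 'hdfs://hive/tableA/2016/01/*/*/part-0' \
    -input 'hdfs://hive/tableB/2016/01/*/*/part-0' \
    -output /tmp/streaming_out \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
```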
have any information about your data.
>
> I don't think we can help you with this. Also, I cannot understand what
> you are trying to achieve. Please also tell us why you are using Hadoop
> streaming instead of Hive to do your operations.
>
> Regards,
> LLoyd
>
The given sequential files correspond to an external Hive table.
They are stored in
/tableName/part-0
/tableName/part-1
...
There are about 2000 attributes in the table. Now I want to process the
data using Hadoop streaming and MapReduce. The first step is to find the
offset and length fo
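A minimal mapper that reports each field's byte offset and length could look like the awk sketch below. It assumes the ~2000 attributes are tab-delimited (the real delimiter depends on the table's SerDe), and the `mapper` function name is only for local testing; streaming would invoke the awk script directly via -mapper.

```shell
# Hypothetical mapper: for every tab-delimited record on stdin, emit
# "field-index <TAB> byte-offset <TAB> length" for each attribute.
mapper() {
    awk -F'\t' '{
        off = 0
        for (i = 1; i <= NF; i++) {
            len = length($i)
            printf "%d\t%d\t%d\n", i, off, len
            off += len + 1   # +1 skips the delimiter byte
        }
    }'
}

# Feed it one record the way streaming would:
printf 'a\tbb\tccc\n' | mapper
# prints, one tab-separated line per field:
# 1<TAB>0<TAB>1
# 2<TAB>2<TAB>2
# 3<TAB>5<TAB>3
```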
Hi Camusensei,
Thank you. That's very helpful!
Rex
On Thu, Jan 21, 2016 at 1:41 AM, Namikaze Minato
wrote:
> Hi Rex X,
>
> We are using the -outputformat option of hadoop-streaming.
> Here is the detail: http://www.infoq.com/articles/HadoopOutputFormat
>
> Regards,
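For reference, the streaming option takes a Java class name; a sketch reusing the example command from further down in the thread, swapping the default text writer for the stock SequenceFile one:

```shell
# Sketch: replace the default TextOutputFormat with
# SequenceFileOutputFormat via -outputformat.
hadoop jar hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc \
    -outputformat org.apache.hadoop.mapred.SequenceFileOutputFormat
```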
>
> Regards
> Rohit Sarewar
>
>
> On Thu, Jan 21, 2016 at 5:13 AM, Rex X wrote:
>
>> Dear all,
>>
>> To be specific, for example, given
>>
>> hadoop jar hadoop-streaming.jar \
>> -input myInputDirs \
>> -output
Dear all,
To be specific, for example, given
hadoop jar hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper /bin/cat \
-reducer /usr/bin/wc
Where myInputDirs has a *dated* subfolder structure of
/input_dir/yyyy/mm/dd/part-*
I want myOutp
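Under a dated layout like that, a glob on -input can select a date range in one job; a sketch assuming yyyy/mm/dd subfolders and a hypothetical month:

```shell
# Sketch: glob over the dated subfolders so one streaming job reads
# every part file for January 2016.
hadoop jar hadoop-streaming.jar \
    -input '/input_dir/2016/01/*/part-*' \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /usr/bin/wc
```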