Hive version: 0.10.0
hive> from testpoker select
transform(ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon)
using 'calcpoker.py' as
ldate,gameid,userid,pid,win,fold,allin,cardtype,cards ;
03/13/13 1009 185690475 8639 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 187270278 92030 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 184151687 8639 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 186012530 8593 1 0 1 7 8|21|16|42|39 NULL NULL NULL NULL NULL NULL NULL NULL
03/13/13 1009 180286243 92041 0 1 0 -1 NULL NULL NULL NULL NULL NULL NULL NULL NULL
The last 8 NULLs in each row are data I did not expect. I don't know where
they come from or how to get rid of them. Please give me some advice. Thanks.
Andy
-----
Python file:
[hbase@h46 hive-0.10.0]$ cat calcpoker.py
#!/usr/bin/env python
# coding:utf8
import sys
import datetime
def calcwin():
    for line in sys.stdin:
        #line = line.strip()
        (ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon)=line.strip().split()
        win = '0'
        if fold=='1':
            print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)
            continue
        cw = []
        if chipwon == "NULL":
            print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)
            continue
        #print "userid win ",userid
        cw=chipwon.split('|')
        chipwonv=0
        roundbetv=int(roundbet)
        for v in cw:
            chipwonv += int(v.split(':')[1])
        #print "chipwonv:%d,roundbet:%d"%(chipwonv,roundbetv)
        if chipwonv > roundbetv:
            win = '1'
        #print ' '.join([ldate,ltime,threadid,gameid,userid,pid,roundbet,fold,allin,cardtype,cards,chipwon])
        print ' '.join([ldate,gameid,userid,pid,win,fold,allin,cardtype,cards])
        #print '%s %s %s %s %s %s %s %s %s'%(ldate,gameid,userid,pid,win,fold,allin,cardtype,cards)
calcwin()
I tested the script outside of MapReduce and it works fine:
[hbase@h46 hive-0.10.0]$ ./calcpoker.py
03/13/13 14:59:51 00000ab4 1009 185690475 8639 240 1 0 -1 NULL NULL
03/13/13 14:59:51 00000cb4 1009 187270278 92030 600 1 0 -1 NULL NULL
03/13/13 14:59:52 000003d8 1009 184151687 8639 600 1 0 -1 NULL NULL
03/13/13 14:59:52 00000ba8 1009 186012530 8593 154135 0 1 7 8|21|16|42|39 0:73250|1:60500|2:100135
03/13/13 14:59:52 00000a88 1009 180286243 92041 100 1 0 -1 NULL NULL
03/13/13 1009 185690475 8639 0 1 0 -1 NULL
03/13/13 1009 187270278 92030 0 1 0 -1 NULL
03/13/13 1009 184151687 8639 0 1 0 -1 NULL
03/13/13 1009 186012530 8593 1 0 1 7 8|21|16|42|39
03/13/13 1009 180286243 92041 0 1 0 -1 NULL
The first five lines are the source data on HDFS.
The last five lines are the result that calcpoker.py printed.
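
For reference, here is a small sketch for checking how one output line of the script would be split into the 9 declared columns. It assumes Hive's default TRANSFORM row format, in which the script's standard output is treated as tab-separated string columns and columns that are missing come back as NULL. The names `COLUMNS` and `split_like_hive` are made up for this illustration and are not part of Hive; the sketch only mimics the splitting, it does not call Hive. It is written so that it runs under both Python 2 and Python 3.

```python
# Hypothetical helper: mimics a consumer that splits each script output line
# on tabs (assumed to be Hive's default for TRANSFORM ... USING ... AS ...).
COLUMNS = ['ldate', 'gameid', 'userid', 'pid', 'win',
           'fold', 'allin', 'cardtype', 'cards']


def split_like_hive(line, ncols=len(COLUMNS)):
    """Split one output line on tabs and pad missing columns with None."""
    fields = line.rstrip('\n').split('\t')
    return fields + [None] * (ncols - len(fields))


# One record taken from the local test run above.
row = ['03/13/13', '1009', '185690475', '8639', '0', '1', '0', '-1', 'NULL']

# Joined with spaces, as in the script's print statements.
space_joined = split_like_hive(' '.join(row))
# Joined with tabs, for comparison.
tab_joined = split_like_hive('\t'.join(row))

for name, value in zip(COLUMNS, space_joined):
    print('%-8s %r' % (name, value))
```

Under that assumption, the space-joined line contains no tab, so the whole line ends up in `ldate` and the other 8 columns are `None`, while the tab-joined line fills all 9 columns.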