I am using HDInsight, and can issue table commands to my MongoDB system, but 
its not acting like i think it should.I realize that when i create an external 
table in hive, and give it mongodb properties, it is a shell of a table within 
hive, and that the real table resides in MongoDB. If I drop the hive table, it 
drops the shell, as it were, leaving the table and data in mongo alone.I have 
read that I can truncate said data in MongoDB from HIVE, but have never gotten 
this to work.I am using the following JARS to interact with Mongo from 
HDInsight via HIVE HQL Queries.   ADD JAR  
wasb://j...@ilreportdata.blob.core.windows.net/mongo-java-driver-2.13.0.jar; 
ADD JAR  
wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-core-1.3.2.jar; ADD 
JAR  
wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-hive-1.3.2.jar; I 
can issue a create if not exists command on a table and give it a schema. If 
the table already exists in mongo, then all is wellIf the table, sorry, 
collection, does not exist, then i get no error from hive system when i create 
the table. But if i query the table in any way, i get errors.So i have taken to 
issuing a create if not exists on the table from hive, then i drop the table, 
which removes the mongo table and the hive table, then i recreate the table in 
hive.The table doesnt actually exist in mongo yet, until data populates the 
collection.
So, to ensure that my collections get unfilled and filled, i now ensure that en 
empty collection exists in mongo, then issue a create statement from hive, then 
drop, then create, then fill with data.And i can run this over and over and be 
fine. As long as the collection exists in mongo, it will continue to empty and 
fill with new data when process is ran.
What am i doing wrong.


------Some code
 $MongoTable = "StudentUsage" switch ($Environment) { 'Test' { $TBLPROPERTIES = 
"TBLPROPERTIES( 'mongo.uri' = 
'mongodb://appAdmin:passw...@test.servers.mongodirector.com:27017/test_reports.$($MongoTable)?authsource=admin&replicaSet=RS-Test-0'
 );" } 'Prod' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 
'mongodb://appAdmin:passw...@production.servers.mongodirector.com:27017/production_reports.$($MongoTable)?authsource=admin&replicaSet=RS-Production-0');"
 } 'RC' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 
'mongodb://appAdmin:passw...@rc.servers.mongodirector.com:27017/rc_reports.$($MongoTable)?authsource=admin');"
 } 'Demo' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 
'mongodb://appAdmin:passw...@demo.servers.mongodirector.com:27017/demo_reports.$($MongoTable)?authsource=admin');"
 } } #The location of the flattened json documents that contain student data 
$DocTable = $Environment + "Docs" $JSonLocation = 
"wasb://d...@blob.core.windows.net/StudentData/$DocTable"; #Hive table that 
will pull out Activity Counted data about students $DocsActivityCounted = 
$Environment + "DocsActivityCounted"; #Hive table that will have properties 
indicating it is a Mongodb table $StudentUsage = $Environment + "StudentUsage"; 
$StudentUsageH = $Environment + "StudentUsageHive";
 $Query = "
 ADD JAR 
wasb://j...@ilreportdata.blob.core.windows.net/json-serde-1.3-jar-with-dependencies.jar;
 ADD JAR 
wasb://j...@ilreportdata.blob.core.windows.net/mongo-java-driver-2.13.0.jar; 
ADD JAR 
wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-core-1.3.2.jar; ADD 
JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-hive-1.3.2.jar; 
--This will create a HIVE table that has properties that make it a Mongo table. 
create table if not exists $StudentUsage  ( studentid string, collectedyear 
int, collectedweek int, elapsedseconds float ) stored by 
'com.mongodb.hadoop.hive.MongoStorageHandler' $TBLPROPERTIES select count(*) 
from $StudentUsage; --But we need to drop it after it was created, so that the 
mongo table actually is dropped. drop table if exists $StudentUsage; --Then we 
need to create it again, so that it is empty and ready to be filled by the 
insertoverwrite, which only fills, not overwrites. create table if not exists 
$StudentUsage ( studentid string, collectedyear int, collectedweek int, 
elapsedseconds float ) stored by 'com.mongodb.hadoop.hive.MongoStorageHandler' 
$TBLPROPERTIES insert overwrite table $StudentUsage select  studentid, 
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'Y') as year, 
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'w') as weekofyear, 
sum(elapsedseconds) from $StudentUsageH GROUP BY studentid, 
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'Y'), 
from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'w') order by year, 
weekofyear, studentid




TJay

Reply via email to