I am using HDInsight, and can issue table commands to my MongoDB system, but its not acting like i think it should.I realize that when i create an external table in hive, and give it mongodb properties, it is a shell of a table within hive, and that the real table resides in MongoDB. If I drop the hive table, it drops the shell, as it were, leaving the table and data in mongo alone.I have read that I can truncate said data in MongoDB from HIVE, but have never gotten this to work.I am using the following JARS to interact with Mongo from HDInsight via HIVE HQL Queries. ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-java-driver-2.13.0.jar; ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-core-1.3.2.jar; ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-hive-1.3.2.jar; I can issue a create if not exists command on a table and give it a schema. If the table already exists in mongo, then all is wellIf the table, sorry, collection, does not exist, then i get no error from hive system when i create the table. But if i query the table in any way, i get errors.So i have taken to issuing a create if not exists on the table from hive, then i drop the table, which removes the mongo table and the hive table, then i recreate the table in hive.The table doesnt actually exist in mongo yet, until data populates the collection. So, to ensure that my collections get unfilled and filled, i now ensure that en empty collection exists in mongo, then issue a create statement from hive, then drop, then create, then fill with data.And i can run this over and over and be fine. As long as the collection exists in mongo, it will continue to empty and fill with new data when process is ran. What am i doing wrong.
------Some code $MongoTable = "StudentUsage" switch ($Environment) { 'Test' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 'mongodb://appAdmin:passw...@test.servers.mongodirector.com:27017/test_reports.$($MongoTable)?authsource=admin&replicaSet=RS-Test-0' );" } 'Prod' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 'mongodb://appAdmin:passw...@production.servers.mongodirector.com:27017/production_reports.$($MongoTable)?authsource=admin&replicaSet=RS-Production-0');" } 'RC' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 'mongodb://appAdmin:passw...@rc.servers.mongodirector.com:27017/rc_reports.$($MongoTable)?authsource=admin');" } 'Demo' { $TBLPROPERTIES = "TBLPROPERTIES( 'mongo.uri' = 'mongodb://appAdmin:passw...@demo.servers.mongodirector.com:27017/demo_reports.$($MongoTable)?authsource=admin');" } } #The location of the flattened json documents that contain student data $DocTable = $Environment + "Docs" $JSonLocation = "wasb://d...@blob.core.windows.net/StudentData/$DocTable"; #Hive table that will pull out Activity Counted data about students $DocsActivityCounted = $Environment + "DocsActivityCounted"; #Hive table that will have properties indicating it is a Mongodb table $StudentUsage = $Environment + "StudentUsage"; $StudentUsageH = $Environment + "StudentUsageHive"; $Query = " ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/json-serde-1.3-jar-with-dependencies.jar; ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-java-driver-2.13.0.jar; ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-core-1.3.2.jar; ADD JAR wasb://j...@ilreportdata.blob.core.windows.net/mongo-hadoop-hive-1.3.2.jar; --This will create a HIVE table that has properties that make it a Mongo table. create table if not exists $StudentUsage ( studentid string, collectedyear int, collectedweek int, elapsedseconds float ) stored by 'com.mongodb.hadoop.hive.MongoStorageHandler' $TBLPROPERTIES select count(*) from $StudentUsage; --But we need to drop it after it was created, so that the mongo table actually is dropped. drop table if exists $StudentUsage; --Then we need to create it again, so that it is empty and ready to be filled by the insertoverwrite, which only fills, not overwrites. create table if not exists $StudentUsage ( studentid string, collectedyear int, collectedweek int, elapsedseconds float ) stored by 'com.mongodb.hadoop.hive.MongoStorageHandler' $TBLPROPERTIES insert overwrite table $StudentUsage select studentid, from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'Y') as year, from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'w') as weekofyear, sum(elapsedseconds) from $StudentUsageH GROUP BY studentid, from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'Y'), from_unixtime(unix_timestamp(datetime, 'yyyy-MM-dd'),'w') order by year, weekofyear, studentid TJay