I'm trying to use the Hive streaming API to insert rows into Hive. I can see delta files being created with the data, but querying the table with SQL returns no rows.
I made test cases using both the old and new streaming APIs; both have the same issue.

Table creation:

    CREATE TABLE test_streaming_v1 (value string)
    CLUSTERED BY (value) INTO 4 BUCKETS
    STORED AS ORC
    TBLPROPERTIES('orc.compress' = 'ZLIB', 'transactional' = 'true');

    GRANT ALL ON test_streaming_v1 TO USER storm;

    sudo -u hdfs hdfs dfs -setfacl -m user:storm:rwx /warehouse/tablespace/managed/hive/test_streaming_v1
    sudo -u hdfs hdfs dfs -setfacl -m default:user:storm:rwx /warehouse/tablespace/managed/hive/test_streaming_v1

    # repeat for v2

After running the test programs, `SELECT COUNT(*)` on both tables returns 0. However, there are clearly delta files in HDFS:

    # sudo -u hdfs hdfs dfs -ls -R -h /warehouse/tablespace/managed/hive/test_streaming_v1
    drwxrwx---+  - storm hadoop     0 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010
    -rw-rw----+  3 storm hadoop     1 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/_orc_acid_version
    -rw-rw----+  3 storm hadoop 1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00000
    -rw-rw----+  3 storm hadoop 1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00001
    -rw-rw----+  3 storm hadoop 1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00002
    -rw-rw----+  3 storm hadoop 1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00003

    # sudo -u hdfs hdfs dfs -ls -R -h /warehouse/tablespace/managed/hive/test_streaming_v2
    drwxrwx---+  - storm hadoop     0 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001
    -rw-rw----+  3 storm hadoop     1 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/_orc_acid_version
    -rw-rw----+  3 storm hadoop   974 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00000
    -rw-rw----+  3 storm hadoop   989 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00001
    -rw-rw----+  3 storm hadoop   983 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00002
    -rw-rw----+  3 storm hadoop   997 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00003

And if I examine the file contents, they look more or less correct:

    > ./orc-contents /tmp/bucket_00000 | head
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"value": "3"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 1, "currentTransaction": 1, "row": {"value": "10"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 2, "currentTransaction": 1, "row": {"value": "15"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 3, "currentTransaction": 1, "row": {"value": "18"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 4, "currentTransaction": 1, "row": {"value": "21"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 5, "currentTransaction": 1, "row": {"value": "24"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 6, "currentTransaction": 1, "row": {"value": "36"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 7, "currentTransaction": 1, "row": {"value": "37"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 8, "currentTransaction": 1, "row": {"value": "46"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 9, "currentTransaction": 1, "row": {"value": "48"}}

I've verified that Hive can read the bucket files even though the storm user owns them. The Java sources and the logs from running them are attached.
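For what it's worth, the "bucket" value in that dump decodes sanely. This is a throwaway helper of mine, not part of the test programs, and it assumes what I understand to be Hive's BucketCodec V1 bit layout (version in the top 3 bits, bucket id in 12 bits starting at bit 16, statement id in the low 12 bits); org.apache.hadoop.hive.ql.io.BucketCodec is the authoritative encoding.

```java
// Hypothetical helper: decode the ACID "bucket" field printed by
// orc-contents, assuming the BucketCodec V1 bit layout described above.
public class BucketField {
    static int version(int b)     { return b >>> 29; }           // top 3 bits
    static int bucketId(int b)    { return (b >>> 16) & 0xFFF; } // 12 bits at bit 16
    static int statementId(int b) { return b & 0xFFF; }          // low 12 bits

    public static void main(String[] args) {
        int b = 536870912; // the value from bucket_00000 above (0x20000000)
        System.out.println("version=" + version(b)
                + " bucketId=" + bucketId(b)
                + " statementId=" + statementId(b));
        // prints "version=1 bucketId=0 statementId=0"
    }
}
```

That gives version 1, bucket 0, statement 0, which is consistent with the rows sitting in bucket_00000, so the writer side looks coherent to me.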
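In case the directory names matter: as I understand the Hive 3 ACID layout, a delta directory is named delta_&lt;minWriteId&gt;_&lt;maxWriteId&gt;, so the v1 table's delta_0000001_0000010 covers write ids 1 through 10 (presumably the old API's transaction batch), while the v2 table's delta_0000001_0000001 covers a single write id. A sketch of that reading (the parser is mine, for illustration only):

```java
// Hypothetical parser for ACID delta directory names of the form
// delta_<minWriteId>_<maxWriteId>, as seen in the HDFS listings above.
public class DeltaName {
    static long[] writeIdRange(String dirName) {
        String[] parts = dirName.split("_");
        return new long[] { Long.parseLong(parts[1]), Long.parseLong(parts[2]) };
    }

    public static void main(String[] args) {
        long[] v1 = writeIdRange("delta_0000001_0000010");
        long[] v2 = writeIdRange("delta_0000001_0000001");
        System.out.println("v1 covers write ids " + v1[0] + ".." + v1[1]); // 1..10
        System.out.println("v2 covers write ids " + v2[0] + ".." + v2[1]); // 1..1
    }
}
```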
Hive Server: 3.1.0
Hive JARs: 3.1.1

Anyone have any ideas what's going wrong and how to fix it?

Sincerely,
Alex Parrill
Log from the v2 test run:

    19/07/24 13:59:03 INFO conf.HiveConf: Found configuration file file:/etc/hive/3.1.0.0-78/0/hive-site.xml
    19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
    19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.strict.managed.tables does not exist
    19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
    19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
    Connecting
    19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: Creating metastore client for streaming-connection
    19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://use1-hadoop-5.datto.lan:9083
    19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
    19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Connected to metastore.
    19/07/24 13:59:04 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=storm (auth:SIMPLE) retries=24 delay=5 lifetime=0
    19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: Creating metastore client for streaming-connection-heartbeat
    19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://use1-hadoop-3.datto.lan:9083
    19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 2
    19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Connected to metastore.
    19/07/24 13:59:04 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=storm (auth:SIMPLE) retries=24 delay=5 lifetime=0
    19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: STREAMING CONNECTION INFO: { metastore-uri: thrift://use1-hadoop-3.datto.lan:9083,thrift://use1-hadoop-5.datto.lan:9083, database: default, table: test_streaming_v2, partitioned-table: false, dynamic-partitioning: false, username: storm, secure-mode: false, record-writer: StrictJsonWriter, agent-info: hive-test-thing }
    Beginning Transaction
    19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: Starting heartbeat thread with interval: 150000 ms initialDelay: 40010 ms for agentInfo: hive-test-thing
    19/07/24 13:59:05 INFO streaming.AbstractRecordWriter: Created new filesystem instance: 1806920695
    19/07/24 13:59:05 INFO streaming.AbstractRecordWriter: Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
    19/07/24 13:59:05 INFO common.HeapMemoryMonitor: Setting collection usage threshold to 501324177
    19/07/24 13:59:05 INFO streaming.HiveStreamingConnection: Opened new transaction batch TxnId/WriteIds=[201129/1...201129/1] on connection = { metaStoreUri: thrift://use1-hadoop-3.datto.lan:9083,thrift://use1-hadoop-5.datto.lan:9083, database: default, table: test_streaming_v2 }; TxnStatus[O] LastUsed txnid:0
    Writing
    19/07/24 13:59:06 INFO impl.MemoryManagerImpl: orc.rows.between.memory.checks=5000
    19/07/24 13:59:06 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:06 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:06 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:06 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:06 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:07 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:07 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    19/07/24 13:59:07 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
    Committing Transaction
    19/07/24 13:59:07 INFO streaming.AbstractRecordWriter: Stats before flush: [record-updaters: 4, partitions: 1, buffered-records: 50 total-records: 50 buffered-ingest-size: 740B, total-ingest-size: 740B tenured-memory-usage: used/max => 8.06MB/683.00MB]
    19/07/24 13:59:07 INFO streaming.AbstractRecordWriter: Flushing record updater for partitions: default.test_streaming_v2
    19/07/24 13:59:08 INFO streaming.AbstractRecordWriter: Stats after flush: [record-updaters: 4, partitions: 1, buffered-records: 0 total-records: 50 buffered-ingest-size: 0B, total-ingest-size: 740B tenured-memory-usage: used/max => 10.25MB/683.00MB]
    Closing
    19/07/24 13:59:08 INFO streaming.AbstractRecordWriter: Stats before close: [record-updaters: 4, partitions: 1, buffered-records: 0 total-records: 50 buffered-ingest-size: 0B, total-ingest-size: 740B tenured-memory-usage: used/max => 10.25MB/683.00MB]
    19/07/24 13:59:08 INFO streaming.AbstractRecordWriter: Closing updater for partitions: default.test_streaming_v2
    19/07/24 13:59:09 INFO streaming.AbstractRecordWriter: Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 50 buffered-ingest-size: 0B, total-ingest-size: 740B tenured-memory-usage: used/max => 10.25MB/683.00MB]
    19/07/24 13:59:09 INFO metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 1
    19/07/24 13:59:09 INFO metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 0
    19/07/24 13:59:09 INFO streaming.HiveStreamingConnection: Closed streaming connection. Agent: hive-test-thing Stats: {records-written: 50, records-size: 740, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 7 }
Log from the v1 test run:

    Initializing
    19/07/24 13:59:45 INFO conf.HiveConf: Found configuration file file:/etc/hive/3.1.0.0-78/0/hive-site.xml
    19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
    19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.strict.managed.tables does not exist
    19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
    19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
    19/07/24 13:59:46 INFO common.HiveClientCache: Initializing cache: eviction-timeout=120 initial-capacity=50 maximum-capacity=50
    19/07/24 13:59:46 INFO metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://use1-hadoop-5.datto.lan:9083
    19/07/24 13:59:46 INFO metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
    19/07/24 13:59:46 INFO metastore.HiveMetaStoreClient: Connected to metastore.
    19/07/24 13:59:46 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient ugi=storm (auth:SIMPLE) retries=24 delay=5 lifetime=0
    19/07/24 13:59:46 WARN metastore.HiveMetaStoreClient: Unexpected increment of user count beyond one: 2 HCatClient: thread: 1 users=2 expired=false closed=false
    19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
    19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.strict.managed.tables does not exist
    19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
    19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
    19/07/24 13:59:47 WARN metastore.HiveMetaStoreClient: Unexpected increment of user count beyond one: 3 HCatClient: thread: 1 users=3 expired=false closed=false
    Writing
    19/07/24 13:59:48 INFO impl.MemoryManagerImpl: orc.rows.between.memory.checks=5000
    19/07/24 13:59:49 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:49 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
    19/07/24 13:59:49 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:49 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:50 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
    19/07/24 13:59:50 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:50 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:50 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
    19/07/24 13:59:50 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:50 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    19/07/24 13:59:50 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
    19/07/24 13:59:50 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
    Closing
    19/07/24 13:59:53 WARN metastore.HiveMetaStoreClient: Non-zero user count preventing client tear down: users=2 expired=false
    19/07/24 13:59:53 WARN metastore.HiveMetaStoreClient: Non-zero user count preventing client tear down: users=1 expired=false
    19/07/24 13:59:57 INFO metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 0