I'm trying to use the Hive streaming API to put rows into Hive. I can see
delta files being created with the data, but when I query the table with
SQL, Hive does not return any of the rows.

I made test cases using both the old and the new streaming APIs; both have
the same issue.
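
For reference, the two tests boil down to more or less the following.
These are simplified sketches (error handling omitted, class names just for
illustration); the attached Java source has the real versions. The new-API
test:

    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hive.streaming.HiveStreamingConnection;
    import org.apache.hive.streaming.StrictJsonWriter;

    public class StreamingV2Test {
        public static void main(String[] args) throws Exception {
            // Picks up hive-site.xml from the classpath
            HiveConf conf = new HiveConf();

            HiveStreamingConnection connection = HiveStreamingConnection.newBuilder()
                    .withDatabase("default")
                    .withTable("test_streaming_v2")
                    .withAgentInfo("hive-test-thing")
                    .withRecordWriter(StrictJsonWriter.newBuilder().build())
                    .withHiveConf(conf)
                    .connect();

            connection.beginTransaction();
            for (int i = 0; i < 50; i++) {
                connection.write(("{\"value\": \"" + i + "\"}")
                        .getBytes(StandardCharsets.UTF_8));
            }
            connection.commitTransaction();
            connection.close();
        }
    }

The old-API test does the same thing with the deprecated HCatalog streaming
classes (the batch of 10 transactions is what produces the
delta_0000001_0000010 directory):

    import java.nio.charset.StandardCharsets;

    import org.apache.hive.hcatalog.streaming.HiveEndPoint;
    import org.apache.hive.hcatalog.streaming.StreamingConnection;
    import org.apache.hive.hcatalog.streaming.StrictJsonWriter;
    import org.apache.hive.hcatalog.streaming.TransactionBatch;

    public class StreamingV1Test {
        public static void main(String[] args) throws Exception {
            // null metastore URI -> taken from hive-site.xml; null partition
            // values because the table is unpartitioned
            HiveEndPoint endPoint =
                    new HiveEndPoint(null, "default", "test_streaming_v1", null);
            StreamingConnection conn = endPoint.newConnection(false, "hive-test-thing");
            StrictJsonWriter writer = new StrictJsonWriter(endPoint);

            TransactionBatch batch = conn.fetchTransactionBatch(10, writer);
            batch.beginNextTransaction();
            for (int i = 0; i < 50; i++) {
                batch.write(("{\"value\": \"" + i + "\"}")
                        .getBytes(StandardCharsets.UTF_8));
            }
            batch.commit();
            batch.close();
            conn.close();
        }
    }

Both runs write 50 records and commit cleanly, which matches the stats in
the attached logs.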

Table creation:

    CREATE TABLE test_streaming_v1 (value string)
      CLUSTERED BY (value) INTO 4 BUCKETS
      STORED AS ORC
      TBLPROPERTIES ('orc.compress' = 'ZLIB', 'transactional' = 'true');

    GRANT ALL ON test_streaming_v1 TO USER storm;

    sudo -u hdfs hdfs dfs -setfacl -m user:storm:rwx /warehouse/tablespace/managed/hive/test_streaming_v1
    sudo -u hdfs hdfs dfs -setfacl -m default:user:storm:rwx /warehouse/tablespace/managed/hive/test_streaming_v1

    # repeat for v2

After running the tests, counting rows returns 0 for both tables:
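
    SELECT COUNT(*) FROM test_streaming_v1;  -- returns 0
    SELECT COUNT(*) FROM test_streaming_v2;  -- returns 0

However, there are clearly delta files in HDFS: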

    # sudo -u hdfs hdfs dfs -ls -R -h /warehouse/tablespace/managed/hive/test_streaming_v1
    drwxrwx---+  - storm hadoop          0 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010
    -rw-rw----+  3 storm hadoop          1 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/_orc_acid_version
    -rw-rw----+  3 storm hadoop      1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00000
    -rw-rw----+  3 storm hadoop      1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00001
    -rw-rw----+  3 storm hadoop      1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00002
    -rw-rw----+  3 storm hadoop      1.1 K 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00003

    # sudo -u hdfs hdfs dfs -ls -R -h /warehouse/tablespace/managed/hive/test_streaming_v2
    drwxrwx---+  - storm hadoop          0 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001
    -rw-rw----+  3 storm hadoop          1 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/_orc_acid_version
    -rw-rw----+  3 storm hadoop        974 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00000
    -rw-rw----+  3 storm hadoop        989 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00001
    -rw-rw----+  3 storm hadoop        983 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00002
    -rw-rw----+  3 storm hadoop        997 2019-07-24 13:59 /warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00003

And if I examine the file contents, they look more or less correct (as far
as I can tell, the bucket value 536870912 is just the encoded ACID bucket
property for bucket 0):

    > ./orc-contents /tmp/bucket_00000 | head
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 0, "currentTransaction": 1, "row": {"value": "3"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 1, "currentTransaction": 1, "row": {"value": "10"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 2, "currentTransaction": 1, "row": {"value": "15"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 3, "currentTransaction": 1, "row": {"value": "18"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 4, "currentTransaction": 1, "row": {"value": "21"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 5, "currentTransaction": 1, "row": {"value": "24"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 6, "currentTransaction": 1, "row": {"value": "36"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 7, "currentTransaction": 1, "row": {"value": "37"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 8, "currentTransaction": 1, "row": {"value": "46"}}
    {"operation": 0, "originalTransaction": 1, "bucket": 536870912, "rowId": 9, "currentTransaction": 1, "row": {"value": "48"}}

I've verified that Hive can read the bucket files even though the storm
user owns them.

The Java source and the logs from running the tests are attached.

Hive server: 3.1.0
Hive JARs: 3.1.1

Does anyone have any ideas about what's going wrong and how to fix it?

Sincerely,
Alex Parrill
Attached logs (first run: new API, test_streaming_v2; second run: old API, test_streaming_v1):

19/07/24 13:59:03 INFO conf.HiveConf: Found configuration file file:/etc/hive/3.1.0.0-78/0/hive-site.xml
19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.strict.managed.tables does not exist
19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
19/07/24 13:59:03 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
Connecting
19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: Creating metastore client for streaming-connection
19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://use1-hadoop-5.datto.lan:9083
19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Connected to metastore.
19/07/24 13:59:04 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=storm (auth:SIMPLE) retries=24 delay=5 lifetime=0
19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: Creating metastore client for streaming-connection-heartbeat
19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://use1-hadoop-3.datto.lan:9083
19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 2
19/07/24 13:59:04 INFO metastore.HiveMetaStoreClient: Connected to metastore.
19/07/24 13:59:04 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=storm (auth:SIMPLE) retries=24 delay=5 lifetime=0
19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: STREAMING CONNECTION INFO: { metastore-uri: thrift://use1-hadoop-3.datto.lan:9083,thrift://use1-hadoop-5.datto.lan:9083, database: default, table: test_streaming_v2, partitioned-table: false, dynamic-partitioning: false, username: storm, secure-mode: false, record-writer: StrictJsonWriter, agent-info: hive-test-thing }
Beginning Transaction
19/07/24 13:59:04 INFO streaming.HiveStreamingConnection: Starting heartbeat thread with interval: 150000 ms initialDelay: 40010 ms for agentInfo: hive-test-thing
19/07/24 13:59:05 INFO streaming.AbstractRecordWriter: Created new filesystem instance: 1806920695
19/07/24 13:59:05 INFO streaming.AbstractRecordWriter: Memory monitorings settings - autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
19/07/24 13:59:05 INFO common.HeapMemoryMonitor: Setting collection usage threshold to 501324177
19/07/24 13:59:05 INFO streaming.HiveStreamingConnection: Opened new transaction batch TxnId/WriteIds=[201129/1...201129/1] on connection = { metaStoreUri: thrift://use1-hadoop-3.datto.lan:9083,thrift://use1-hadoop-5.datto.lan:9083, database: default, table: test_streaming_v2 };  TxnStatus[O] LastUsed txnid:0
Writing
19/07/24 13:59:06 INFO impl.MemoryManagerImpl: orc.rows.between.memory.checks=5000
19/07/24 13:59:06 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:06 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:06 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:06 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:06 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:07 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:07 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
19/07/24 13:59:07 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v2/delta_0000001_0000001/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
Committing Transaction
19/07/24 13:59:07 INFO streaming.AbstractRecordWriter: Stats before flush: [record-updaters: 4, partitions: 1, buffered-records: 50 total-records: 50 buffered-ingest-size: 740B, total-ingest-size: 740B tenured-memory-usage: used/max => 8.06MB/683.00MB]
19/07/24 13:59:07 INFO streaming.AbstractRecordWriter: Flushing record updater for partitions: default.test_streaming_v2
19/07/24 13:59:08 INFO streaming.AbstractRecordWriter: Stats after flush: [record-updaters: 4, partitions: 1, buffered-records: 0 total-records: 50 buffered-ingest-size: 0B, total-ingest-size: 740B tenured-memory-usage: used/max => 10.25MB/683.00MB]
Closing
19/07/24 13:59:08 INFO streaming.AbstractRecordWriter: Stats before close: [record-updaters: 4, partitions: 1, buffered-records: 0 total-records: 50 buffered-ingest-size: 0B, total-ingest-size: 740B tenured-memory-usage: used/max => 10.25MB/683.00MB]
19/07/24 13:59:08 INFO streaming.AbstractRecordWriter: Closing updater for partitions: default.test_streaming_v2
19/07/24 13:59:09 INFO streaming.AbstractRecordWriter: Stats after close: [record-updaters: 0, partitions: 1, buffered-records: 0 total-records: 50 buffered-ingest-size: 0B, total-ingest-size: 740B tenured-memory-usage: used/max => 10.25MB/683.00MB]
19/07/24 13:59:09 INFO metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 1
19/07/24 13:59:09 INFO metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 0
19/07/24 13:59:09 INFO streaming.HiveStreamingConnection: Closed streaming connection. Agent: hive-test-thing Stats: {records-written: 50, records-size: 740, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, metastore-calls: 7 }

Initializing
19/07/24 13:59:45 INFO conf.HiveConf: Found configuration file file:/etc/hive/3.1.0.0-78/0/hive-site.xml
19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.strict.managed.tables does not exist
19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
19/07/24 13:59:46 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
19/07/24 13:59:46 INFO common.HiveClientCache: Initializing cache: eviction-timeout=120 initial-capacity=50 maximum-capacity=50
19/07/24 13:59:46 INFO metastore.HiveMetaStoreClient: Trying to connect to metastore with URI thrift://use1-hadoop-5.datto.lan:9083
19/07/24 13:59:46 INFO metastore.HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
19/07/24 13:59:46 INFO metastore.HiveMetaStoreClient: Connected to metastore.
19/07/24 13:59:46 INFO metastore.RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hive.hcatalog.common.HiveClientCache$CacheableHiveMetaStoreClient ugi=storm (auth:SIMPLE) retries=24 delay=5 lifetime=0
19/07/24 13:59:46 WARN metastore.HiveMetaStoreClient: Unexpected increment of user count beyond one: 2 HCatClient: thread: 1 users=2 expired=false closed=false
19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.strict.managed.tables does not exist
19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.stats.fetch.partition.stats does not exist
19/07/24 13:59:47 WARN conf.HiveConf: HiveConf of name hive.heapsize does not exist
19/07/24 13:59:47 WARN metastore.HiveMetaStoreClient: Unexpected increment of user count beyond one: 3 HCatClient: thread: 1 users=3 expired=false closed=false
Writing
19/07/24 13:59:48 INFO impl.MemoryManagerImpl: orc.rows.between.memory.checks=5000
19/07/24 13:59:49 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:49 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
19/07/24 13:59:49 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00001 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:49 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:50 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
19/07/24 13:59:50 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00002 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:50 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:50 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
19/07/24 13:59:50 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00000 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:50 INFO impl.PhysicalFsWriter: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
19/07/24 13:59:50 INFO impl.OrcCodecPool: Got brand-new codec ZLIB
19/07/24 13:59:50 INFO impl.WriterImpl: ORC writer created for path: hdfs://hdpinternal/warehouse/tablespace/managed/hive/test_streaming_v1/delta_0000001_0000010/bucket_00003 with stripeSize: 8388608 blockSize: 268435456 compression: ZLIB bufferSize: 32768
Closing
19/07/24 13:59:53 WARN metastore.HiveMetaStoreClient: Non-zero user count preventing client tear down: users=2 expired=false
19/07/24 13:59:53 WARN metastore.HiveMetaStoreClient: Non-zero user count preventing client tear down: users=1 expired=false
19/07/24 13:59:57 INFO metastore.HiveMetaStoreClient: Closed a connection to metastore, current connections: 0
