Hi,

I'm using the PutHive3Streaming processor to write roughly 30,000 Avro 
records into a Hive table (roughly 150 GB). This works for a while, but 
memory usage keeps increasing over time, and after a few thousand records the 
memory is full.
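
As a rough sanity check on these figures, the sketch below computes the average 
record size. It assumes that the 150 GB refers to the total size of the 
records, that "GB" means 10**9 bytes, and that the records are roughly equal 
in size; none of this is confirmed above, and the function name is only 
illustrative.

```python
# Assumptions (not confirmed by the post): 150 GB is the total record volume,
# 1 GB = 10**9 bytes, and all records are roughly the same size.
TOTAL_BYTES = 150 * 10**9
RECORD_COUNT = 30_000


def average_record_bytes(total_bytes, record_count):
    """Return the mean size of one record in bytes."""
    return total_bytes / record_count


avg = average_record_bytes(TOTAL_BYTES, RECORD_COUNT)
print(f"average record size: {avg / 10**6:.1f} MB")  # prints "average record size: 5.0 MB"
```

Under these assumptions, every 1,000 records that stay in memory would account 
for about 5 GB.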

When I remove the PutHive3Streaming processor from my flow, everything works 
OK. So, the problem must be the PutHive3Streaming processor.

The Record Reader is an AvroReader. The NiFi version is 1.11.4 and the Hive 
version is 3.1.0.

The Hive table is created as follows:
create table yyyy(id string, data string) clustered by (id) into 50 buckets 
stored as orc tblproperties("transactional"="true")

Any ideas whether I'm doing something wrong, or is there really a memory leak 
in the processor?

Thanks in advance and best regards

Martin



2020-05-08 09:37:22,076 INFO [Timer-Driven Process Thread-44] 
o.a.h.streaming.HiveStreamingConnection Creating metastore client for 
streaming-connection
2020-05-08 09:37:22,091 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with 
URI thrift://xxxxxx:9083
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, 
current connections: 1
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] 
o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) 
retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] 
o.a.h.streaming.HiveStreamingConnection Creating metastore client for 
streaming-connection-heartbeat
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with 
URI thrift://xxxx:9083
2020-05-08 09:37:22,094 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore, 
current connections: 2
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44] 
o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE) 
retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] 
o.a.h.streaming.HiveStreamingConnection STREAMING CONNECTION INFO: { 
metastore-uri: thrift://xxxxxxxx:9083, database: default, table: yyyy, 
partitioned-table: false, dynamic-partitioning: false, username: nifi, 
secure-mode: false, record-writer: HiveRecordWriter, agent-info: NiFi 
PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 
152[Timer-Driven Process Thread-44] }
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44] 
o.a.h.streaming.HiveStreamingConnection Starting heartbeat thread with 
interval: 150000 ms initialDelay: 64376 ms for agentInfo: NiFi 
PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 
152[Timer-Driven Process Thread-44]
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Created new filesystem instance: 
484197097
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Memory monitorings settings - 
autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44] 
o.a.hadoop.hive.common.HeapMemoryMonitor Setting collection usage threshold to 
2147483647
2020-05-08 09:37:22,233 WARN [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is 
already low. Increase memory to improve performance. Used: 129.54GB Max: 
130.00GB
2020-05-08 09:37:22,233 INFO [Timer-Driven Process Thread-44] 
o.a.h.streaming.HiveStreamingConnection Opened new transaction batch 
TxnId/WriteIds=[292449/7336...292449/7336] on connection = { metaStoreUri: 
thrift://xxxxxxxx:9083, database: default, table: yyyy };  TxnStatus[O] 
LastUsed txnid:0
2020-05-08 09:37:22,272 INFO [Timer-Driven Process Thread-44] 
org.apache.orc.impl.PhysicalFsWriter ORC writer created for path: 
hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with 
stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,292 INFO [Timer-Driven Process Thread-44] 
org.apache.orc.impl.WriterImpl ORC writer created for path: 
hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with 
stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters: 
50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size: 
0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Flushing record updater for partitions: 
default.yyyy
2020-05-08 09:37:22,578 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Stats after flush: [record-updaters: 
50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 
0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Stats before close: [record-updaters: 
50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 
0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Closing updater for partitions: 
default.yyyy
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] 
o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0, 
partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B, 
total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, 
current connections: 1
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] 
o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore, 
current connections: 0
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44] 
o.a.h.streaming.HiveStreamingConnection Closed streaming connection. Agent: 
NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread 
152[Timer-Driven Process Thread-44] Stats: {records-written: 1, records-size: 
0, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0, 
metastore-calls: 7 }

When the flow starts, memory usage is at Used: 2.63GB Max: 130.00GB
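
To see how quickly the heap fills up, the tenured-memory-usage values in the 
log excerpt above can be extracted and compared over time. A minimal sketch in 
Python, assuming the field keeps the format shown above and that the units are 
binary multiples (1 GB = 1024**3 bytes); the names tenured_usage and 
usage_ratios are illustrative only.

```python
import re

# Matches the "tenured-memory-usage" field that AbstractRecordWriter logs in
# its "Stats before/after flush/close" entries, e.g.
#   tenured-memory-usage: used/max => 129.54GB/130.00GB
USAGE = re.compile(
    r"tenured-memory-usage:\s*used/max\s*=>\s*"
    r"(?P<used>[\d.]+)(?P<used_unit>[KMGT]?B)\s*/\s*"
    r"(?P<max>[\d.]+)(?P<max_unit>[KMGT]?B)"
)

# Assumption: units are binary multiples; adjust if the logger uses 10**n.
UNIT = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def tenured_usage(log_text):
    """Return (used_bytes, max_bytes) for every matching entry, in log order."""
    samples = []
    for m in USAGE.finditer(log_text):
        used = float(m["used"]) * UNIT[m["used_unit"]]
        maximum = float(m["max"]) * UNIT[m["max_unit"]]
        samples.append((used, maximum))
    return samples


def usage_ratios(log_text):
    """Return used/max as a fraction for every matching entry, in log order."""
    return [used / maximum for used, maximum in tenured_usage(log_text)]
```

Calling usage_ratios on the contents of the NiFi application log (by default 
logs/nifi-app.log, though the path depends on the installation) gives one 
fraction per stats entry. A series that climbs steadily and never drops after 
a close would be consistent with memory not being released.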
