Hi,
I'm using the PutHive3Streaming processor to write roughly 30,000 Avro
records into a Hive table (roughly 150 GB). This works for a while, but the
used memory keeps increasing over time, and after a few thousand records the
memory is full.
When I remove the PutHive3Streaming processor from my flow, everything works
fine, so the problem must be the PutHive3Streaming processor.
I'm using AvroReader as the Record Reader. The NiFi version is 1.11.4 and Hive
is version 3.1.0.
The Hive table is created as follows:
create table yyyy(id string, data string) clustered by (id) into 50 buckets
stored as orc tblproperties("transactional"="true")
Any ideas whether I'm doing something wrong, or is there really a memory leak
in the processor?
Thanks in advance and best regards
Martin
2020-05-08 09:37:22,076 INFO [Timer-Driven Process Thread-44]
o.a.h.streaming.HiveStreamingConnection Creating metastore client for
streaming-connection
2020-05-08 09:37:22,091 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with
URI thrift://xxxxxx:9083
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore,
current connections: 1
2020-05-08 09:37:22,092 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44]
o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class
org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE)
retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44]
o.a.h.streaming.HiveStreamingConnection Creating metastore client for
streaming-connection-heartbeat
2020-05-08 09:37:22,093 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Trying to connect to metastore with
URI thrift://xxxx:9083
2020-05-08 09:37:22,094 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Opened a connection to metastore,
current connections: 2
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Connected to metastore.
2020-05-08 09:37:22,095 INFO [Timer-Driven Process Thread-44]
o.a.h.h.m.RetryingMetaStoreClient RetryingMetaStoreClient proxy=class
org.apache.hadoop.hive.metastore.HiveMetaStoreClient ugi=nifi (auth:SIMPLE)
retries=24 delay=5 lifetime=0
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44]
o.a.h.streaming.HiveStreamingConnection STREAMING CONNECTION INFO: {
metastore-uri: thrift://xxxxxxxx:9083, database: default, table: yyyy,
partitioned-table: false, dynamic-partitioning: false, username: nifi,
secure-mode: false, record-writer: HiveRecordWriter, agent-info: NiFi
PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread
152[Timer-Driven Process Thread-44] }
2020-05-08 09:37:22,220 INFO [Timer-Driven Process Thread-44]
o.a.h.streaming.HiveStreamingConnection Starting heartbeat thread with
interval: 150000 ms initialDelay: 64376 ms for agentInfo: NiFi
PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread
152[Timer-Driven Process Thread-44]
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Created new filesystem instance:
484197097
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Memory monitorings settings -
autoFlush: true memoryUsageThreshold: 0.7 ingestSizeThreshold: 0
2020-05-08 09:37:22,232 INFO [Timer-Driven Process Thread-44]
o.a.hadoop.hive.common.HeapMemoryMonitor Setting collection usage threshold to
2147483647
2020-05-08 09:37:22,233 WARN [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter LOW MEMORY ALERT! Tenured gen memory is
already low. Increase memory to improve performance. Used: 129.54GB Max:
130.00GB
2020-05-08 09:37:22,233 INFO [Timer-Driven Process Thread-44]
o.a.h.streaming.HiveStreamingConnection Opened new transaction batch
TxnId/WriteIds=[292449/7336...292449/7336] on connection = { metaStoreUri:
thrift://xxxxxxxx:9083, database: default, table: yyyy }; TxnStatus[O]
LastUsed txnid:0
2020-05-08 09:37:22,272 INFO [Timer-Driven Process Thread-44]
org.apache.orc.impl.PhysicalFsWriter ORC writer created for path:
hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with
stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,292 INFO [Timer-Driven Process Thread-44]
org.apache.orc.impl.WriterImpl ORC writer created for path:
hdfs://hadoop1/tmp/nifi/yyyy/delta_0007336_0007336/bucket_00024 with
stripeSize: 8388608 blockSize: 268435456 compression: NONE bufferSize: 32768
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Stats before flush: [record-updaters:
50, partitions: 1, buffered-records: 1 total-records: 1 buffered-ingest-size:
0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,420 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Flushing record updater for partitions:
default.yyyy
2020-05-08 09:37:22,578 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Stats after flush: [record-updaters:
50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size:
0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Stats before close: [record-updaters:
50, partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size:
0B, total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,582 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Closing updater for partitions:
default.yyyy
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44]
o.a.hive.streaming.AbstractRecordWriter Stats after close: [record-updaters: 0,
partitions: 1, buffered-records: 0 total-records: 1 buffered-ingest-size: 0B,
total-ingest-size: 0B tenured-memory-usage: used/max => 129.54GB/130.00GB]
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore,
current connections: 1
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44]
o.a.h.hive.metastore.HiveMetaStoreClient Closed a connection to metastore,
current connections: 0
2020-05-08 09:37:22,600 INFO [Timer-Driven Process Thread-44]
o.a.h.streaming.HiveStreamingConnection Closed streaming connection. Agent:
NiFi PutHive3Streaming [01711000-51db-1b1c-227a-f276a5410458] thread
152[Timer-Driven Process Thread-44] Stats: {records-written: 1, records-size:
0, committed-transactions: 1, aborted-transactions: 0, auto-flushes: 0,
metastore-calls: 7 }
For comparison, at the start of the run the memory usage was Used: 2.63GB,
Max: 130.00GB.
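To put the LOW MEMORY ALERT figures in context, here is a minimal Java sketch that checks the same used/max ratio against the 0.7 threshold shown in the "Memory monitorings settings" log line, and reads the live tenured pool via the standard java.lang.management API. Two assumptions here: that "memoryUsageThreshold: 0.7" means a used/max ratio on the tenured (old) generation pool, and that the tenured pool is the first heap pool that supports usage thresholds. This is similar in spirit to what Hive's HeapMemoryMonitor reports, but it is not Hive's code; the class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class TenuredGenCheck {
    // Assumed meaning of "memoryUsageThreshold: 0.7" from the log: used/max ratio.
    static final double MEMORY_USAGE_THRESHOLD = 0.7;

    /** Returns true when used/max is strictly above the threshold. */
    static boolean isLowMemory(long usedBytes, long maxBytes, double threshold) {
        if (maxBytes <= 0) {
            return false; // getMax() returns -1 when the maximum is undefined
        }
        return (double) usedBytes / maxBytes > threshold;
    }

    /**
     * Heuristic (assumption): the first heap pool that supports usage thresholds,
     * e.g. "G1 Old Gen", "PS Old Gen" or "Tenured Gen" depending on the collector.
     */
    static MemoryPoolMXBean findTenuredPool() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                return pool;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        double gb = 1024.0 * 1024 * 1024;
        long max = (long) (130.00 * gb);

        // Figures from the LOW MEMORY ALERT line: 129.54GB used of 130.00GB.
        System.out.println(isLowMemory((long) (129.54 * gb), max, MEMORY_USAGE_THRESHOLD)); // true
        // Figures from the start of the run: 2.63GB used of 130.00GB.
        System.out.println(isLowMemory((long) (2.63 * gb), max, MEMORY_USAGE_THRESHOLD)); // false

        // Live values for the JVM running this class (output depends on the collector).
        MemoryPoolMXBean tenured = findTenuredPool();
        if (tenured != null) {
            MemoryUsage usage = tenured.getUsage();
            System.out.printf("%s used=%d max=%d%n", tenured.getName(), usage.getUsed(), usage.getMax());
        }
    }
}
```

With 129.54 of 130.00 GB in use the ratio is above 0.99, far past 0.7, whereas 2.63 GB at startup is about 0.02, which matches the growth described above.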
------------------------------------------------------------------------------
FIZ Karlsruhe - Leibniz-Institut für Informationsinfrastruktur GmbH.
Registered office: Eggenstein-Leopoldshafen, Mannheim Local Court (Amtsgericht
Mannheim) HRB 101892.
Managing Director: Sabine Brünger-Weilandt.
Chair of the Supervisory Board: MinDirig’in Dr. Angelika Willms-Herget.
FIZ Karlsruhe is certified with the "audit berufundfamilie" seal.