Hello Folks:

I would like to request your help with a Flume configuration that replicates files from one node to another; we currently have an issue with files being lost during the replication process.

The following diagram represents the current architecture in which Flume is running, replicating files in Avro format to HDFS and Solr.


When we check the data at both destinations, we find that not all of it was replicated from the source; files are being lost.
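
For context, the upstream node sends events over Avro to the Avro source configured on Node 2 (port 50001, shown below). Purely as an illustration of the assumed topology, the sender side would look roughly like the following sketch (the agent name, channel name, and hostname here are illustrative, not the real ones):

# Node 1 (sender) sketch: Avro sink pointing at Node 2's Avro source
agent1.sinks = avroSink
agent1.sinks.avroSink.type = avro
agent1.sinks.avroSink.channel = ch1
agent1.sinks.avroSink.hostname = node2.example.com
agent1.sinks.avroSink.port = 50001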

    Below is the configuration file for Node 2:


Node 2:

a1.sources = r1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001

a1.channels = c1 c2

# Use a channel c1 which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Use a channel c2 which buffers events in memory
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000

# Interceptor definition, in case multiplexing is used

// Note: the customer wants to use the replicating selector. Is it necessary to keep the interceptor declaration in this file, given that the interceptor is only used for multiplexing? (See the sketch after the configuration.) //

a1.sources.r1.interceptors = pcgInterceptor
a1.sources.r1.interceptors.pcgInterceptor.type = pcg.PcgInterceptor$Builder
a1.sources.r1.interceptors.pcgInterceptor.solrServer = http://XXX.YYY.ZZZ.WWW:NN/solr/pcgs_dt_datos_panel_cntrl_shard1_replica1/
a1.sources.r1.interceptors.pcgInterceptor.paramKeys = TPO_REG,START_YEAR,START_MONTH,START_DAY,START_HOUR,START_MINUTE,START_SECONDS,END_YEAR,END_MONTH,END_DAY,END_HOUR,END_MINUTE,END_SECONDS,COD_EST,PROCESS_NAME,TPO_MLL,SGL_SIS,SGL_SUB_SIS,NOM_TAR,COD_SEC,PSO_ARQ,REG_LEI,REG_PCS,REG_RCH,NOM_ARQ,FEC_OPE,DISK,MEMORY,CPU,PID,RANKING,END_LINE

# Define channel selector and define mapping
a1.sources.r1.selector.type = replicating
a1.sinks = k1 k2

# Sink (destination) definitions
# Describe the first sink, Solr sink k1, which stores the manager's data only; it is associated with channel c1

a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.channel = c1
a1.sinks.k1.morphlineFile = k1.conf
a1.sinks.k1.morphlineId = pcg
a1.sinks.k1.isProductionMode = true
a1.sinks.k1.batchSize = 1

# Describe HDFS sink k2, which stores the developer's data only; it is associated with channel c2
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://pcg/pcgs_dt_evnto
a1.sinks.k2.hdfs.rollInterval = 0
a1.sinks.k2.hdfs.rollSize = 1073741824
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.idleTimeout = 28800
a1.sinks.k2.hdfs.kerberosPrincipal = ing...@corp.corp
a1.sinks.k2.hdfs.kerberosKeytab = /home/ingest/ingest.keytab

# Bind the source and the sinks to the channels
# a1.sources.spoolDirectory.channels = c1 c2

a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
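
For reference on the interceptor question above: with a replicating selector, every event is copied to all configured channels regardless of any headers, so the selector itself does not depend on an interceptor (headers only matter for a multiplexing selector, which routes on selector.header plus selector.mapping.* entries that an interceptor typically populates). Purely as a sketch, and assuming the PcgInterceptor exists only to support multiplexed routing, the fan-out portion could be reduced to the following (reusing the component names above, with the interceptor lines simply omitted):

# Minimal replicating fan-out (sketch; interceptor lines omitted)
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2

a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2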

I would appreciate your feedback on this question.

Best Regards
PEHC


