Hello folks,
I would like to request your help with our Flume configuration for
replicating files from one node to another; we currently have an issue
with files being lost during the replication process.
The following diagram shows the current architecture in which Flume is
running, replicating files in Avro format to HDFS and Solr.
When we check the data at both destinations, we find that not all of the
information was replicated from the source; some files are lost.
Below is the configuration file for Node 2:
Node2:
a1.sources = r1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001

a1.channels = c1 c2

# Use a channel c1 which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000

# Use a channel c2 which buffers events in memory
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000

# Interceptor definition, for the case of multiplexing
// Note: the customer wants to use replicating. Is it necessary to keep the
// interceptor declaration in the file, given that an interceptor is only
// needed for multiplexing? (See the sketch after the configuration.) //
a1.sources.r1.interceptors = pcgInterceptor
a1.sources.r1.interceptors.pcgInterceptor.type = pcg.PcgInterceptor$Builder
a1.sources.r1.interceptors.pcgInterceptor.solrServer = http://XXX.YYY.ZZZ.WWW:NN/solr/pcgs_dt_datos_panel_cntrl_shard1_replica1/
a1.sources.r1.interceptors.pcgInterceptor.paramKeys = TPO_REG,START_YEAR,START_MONTH,START_DAY,START_HOUR,START_MINUTE,START_SECONDS,END_YEAR,END_MONTH,END_DAY,END_HOUR,END_MINUTE,END_SECONDS,COD_EST,PROCESS_NAME,TPO_MLL,SGL_SIS,SGL_SUB_SIS,NOM_TAR,COD_SEC,PSO_ARQ,REG_LEI,REG_PCS,REG_RCH,NOM_ARQ,FEC_OPE,DISK,MEMORY,CPU,PID,RANKING,END_LINE

# Define channel selector and define mapping
a1.sources.r1.selector.type = replicating

a1.sinks = k1 k2

# Definition of the sinks (destinations)
# Describe first Solr sink k1 to store manager's data only; it is
# associated with channel c1
a1.sinks.k1.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
a1.sinks.k1.channel = c1
a1.sinks.k1.morphlineFile = k1.conf
a1.sinks.k1.morphlineId = pcg
a1.sinks.k1.isProductionMode = true
a1.sinks.k1.batchSize = 1

# Describe second HDFS sink k2 to store developer's data only; it is
# associated with channel c2
a1.sinks.k2.type = hdfs
a1.sinks.k2.channel = c2
a1.sinks.k2.hdfs.path = hdfs://pcg/pcgs_dt_evnto
a1.sinks.k2.hdfs.rollInterval = 0
a1.sinks.k2.hdfs.rollSize = 1073741824
a1.sinks.k2.hdfs.rollCount = 0
a1.sinks.k2.hdfs.idleTimeout = 28800
a1.sinks.k2.hdfs.kerberosPrincipal = ing...@corp.corp
a1.sinks.k2.hdfs.kerberosKeytab = /home/ingest/ingest.keytab

# Bind the source and the sinks to the channels
# a1.sources.spoolDirectory.channels = c1 c2
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
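
To make the question about the interceptor concrete: if the replicating
selector really does make the interceptor unnecessary, my understanding is
that the source section could be reduced to something like the sketch below.
This is only a sketch and assumes the pcgInterceptor exists solely to set
headers for a multiplexing selector; if it also performs work that Solr needs
downstream, then of course it has to stay.

# Sketch only: source section without the interceptor, assuming the
# interceptor is used purely to support a multiplexing selector.
a1.sources = r1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 50001
a1.sources.r1.channels = c1 c2

# Replicating is Flume's default selector, so this line is optional.
a1.sources.r1.selector.type = replicating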
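
Also, in case it is related to the lost files: both c1 and c2 are memory
channels with capacity 1000, so any events still buffered in memory are lost
if the agent stops or crashes. Would switching to file channels, roughly as
sketched below, be a reasonable way to rule that out? The checkpoint and data
directory paths here are just placeholders, not our real paths.

# Sketch only: durable file channels instead of memory channels.
# The checkpointDir/dataDirs paths are placeholders.
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /var/flume/c1/checkpoint
a1.channels.c1.dataDirs = /var/flume/c1/data
a1.channels.c1.capacity = 1000000

a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /var/flume/c2/checkpoint
a1.channels.c2.dataDirs = /var/flume/c2/data
a1.channels.c2.capacity = 1000000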
I would appreciate your feedback on this question.
Best regards,
PEHC