Hi,

I have a large set of small files; each file is around 7-10 KB in size.
In total I have 350K files, around 6 GB.

I have tried many options in my Flume configuration, but whatever I
change, Solr takes 2 seconds to ingest each file. My current configuration:
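For scale, here is a quick back-of-the-envelope estimate. It is a sketch that simply assumes every file takes the same 2 seconds and that files are ingested one after another; the constant and function names are illustrative only.

```python
# Back-of-the-envelope estimate of total ingest time, assuming every file
# takes the same fixed time and files are processed strictly one at a time.
FILE_COUNT = 350_000      # number of files, from the figures above
SECONDS_PER_FILE = 2.0    # observed ingest time per file


def total_ingest_seconds(file_count: int, seconds_per_file: float) -> float:
    """Return the total sequential ingest time in seconds."""
    return file_count * seconds_per_file


if __name__ == "__main__":
    seconds = total_ingest_seconds(FILE_COUNT, SECONDS_PER_FILE)
    hours = seconds / 3600
    days = hours / 24
    # prints: 700,000 s = 194.4 h = 8.1 days
    print(f"{seconds:,.0f} s = {hours:,.1f} h = {days:.1f} days")
```

At that rate the full set would need a little over 8 days of continuous ingestion.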


agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source

agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/final
agent.sources.SpoolDirSrc.basenameHeader = true
#agent.sources.SpoolDirSrc.batchSize = 100000

agent.sources.SpoolDirSrc.fileHeader = true
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder


# Use a file channel, which buffers events on disk
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 1000
agent.channels.FileChannel.transactionCapacity = 1000

#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink

agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
#agent.sinks.SolrSink.batchsize = 100000
#agent.sinks.SolrSink.batchDurationMillis = 5000
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
agent.sinks.SolrSink.batchSize = 100000
agent.sinks.SolrSink.txnEventMax = 10000000

agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel
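To double-check what Flume actually ends up with, here is a minimal Python sketch (a hypothetical helper, not part of Flume). It reads simple single-line `key = value` entries, lets a repeated key's last definition win as java.util.Properties does, and flags two things visible in the config above: keys set more than once (the source and sink channel are each set first to `fileChannel` and later to `FileChannel`), and a sink `batchSize` larger than its channel's `transactionCapacity`, since a Flume sink generally takes its whole batch inside one channel transaction. Assumptions: `=` is the only separator handled, backslash continuations and escapes are ignored, and the names `parse_properties` and `check_flume_config` are made up for illustration.

```python
# Hypothetical helper (not part of Flume): sanity-check a Flume agent
# properties file. Assumes simple single-line "key = value" entries;
# ":" separators, backslash continuations and escapes are NOT handled.
from collections import Counter


def parse_properties(text: str) -> tuple[dict[str, str], Counter]:
    """Parse key = value lines; a later definition overrides an earlier one."""
    props: dict[str, str] = {}
    counts: Counter = Counter()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; ignored by this sketch
        key = key.strip()
        props[key] = value.strip()  # last definition wins
        counts[key] += 1
    return props, counts


def check_flume_config(text: str, agent: str = "agent") -> list[str]:
    """Return warnings for duplicate keys and oversized sink batches."""
    props, counts = parse_properties(text)
    warnings = [
        f"{key} is set {n} times; the last value ({props[key]!r}) wins"
        for key, n in counts.items()
        if n > 1
    ]

    declared_channels = props.get(f"{agent}.channels", "").split()
    for sink in props.get(f"{agent}.sinks", "").split():
        channel = props.get(f"{agent}.sinks.{sink}.channel")
        if channel not in declared_channels:
            warnings.append(f"sink {sink} uses undeclared channel {channel!r}")
            continue
        batch = props.get(f"{agent}.sinks.{sink}.batchSize")
        txn = props.get(f"{agent}.channels.{channel}.transactionCapacity")
        if batch and txn and int(batch) > int(txn):
            warnings.append(
                f"sink {sink} batchSize {batch} exceeds channel {channel} "
                f"transactionCapacity {txn}"
            )
    return warnings


if __name__ == "__main__":
    # Excerpt of the configuration above (only the keys the checks read).
    sample = """
agent.channels = FileChannel
agent.sinks = SolrSink
agent.channels.FileChannel.transactionCapacity = 1000
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.batchSize = 100000
agent.sinks.SolrSink.channel = FileChannel
"""
    for warning in check_flume_config(sample):
        print("WARNING:", warning)
```

On this excerpt it reports the duplicated sink channel key and the 100000 vs 1000 batch/transaction mismatch; run against the full configuration it would also report the duplicated `agent.sources.SpoolDirSrc.channels` key. The check only flags the mismatch; it does not show that this is what causes the 2 seconds per file.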

My collection has 2 shards and a replication factor of 1.

Kindly let me know how I can make ingestion faster.

Regards,
~Sri
