Hi, I have a large set of small files, each around 7-10 KB in size. In total I have 350K files amounting to around 6 GB.
I have tried many changes to my Flume configuration, but whatever I change, Solr still takes 2 seconds to ingest each file.

agent.sources = SpoolDirSrc
agent.channels = FileChannel
agent.sinks = SolrSink

# Configure Source
agent.sources.SpoolDirSrc.channels = fileChannel
agent.sources.SpoolDirSrc.type = spooldir
agent.sources.SpoolDirSrc.spoolDir = /app/home/solr/final
agent.sources.SpoolDirSrc.basenameHeader = true
#agent.sources.SpoolDirSrc.batchSize = 100000
agent.sources.SpoolDirSrc.fileHeader = true
agent.sources.SpoolDirSrc.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder

# Use a channel that buffers events in memory
agent.channels.FileChannel.type = file
agent.channels.FileChannel.capacity = 1000
agent.channels.FileChannel.transactionCapacity = 1000
#agent.channels.FileChannel.transactionCapacity = 10000

# Configure Solr Sink
agent.sinks.SolrSink.type = org.apache.flume.sink.solr.morphline.MorphlineSolrSink
agent.sinks.SolrSink.morphlineFile = /etc/flume/conf/morphline.conf
#agent.sinks.SolrSink.batchsize = 100000
#agent.sinks.SolrSink.batchDurationMillis = 5000
agent.sinks.SolrSink.channel = fileChannel
agent.sinks.SolrSink.morphlineId = morphline1
agent.sinks.SolrSink.tika.config = tikaConfig.xml
agent.sinks.SolrSink.rollCount = 0
agent.sinks.SolrSink.rollInterval = 0
agent.sinks.SolrSink.rollsize = 100000000
agent.sinks.SolrSink.idleTimeout = 0
agent.sinks.SolrSink.batchSize = 100000
agent.sinks.SolrSink.txnEventMax = 10000000

agent.sources.SpoolDirSrc.channels = FileChannel
agent.sinks.SolrSink.channel = FileChannel

My collection has 2 shards and a replication factor of 1.

Kindly let me know how I can improve this.

Regards,
~Sri
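
P.S. To put the 2 seconds per file in perspective, here is a rough back-of-the-envelope estimate. It assumes the rate stays constant and that files are ingested strictly one after another; both are my simplifying assumptions, not measurements, and the function names are only illustrative.

```python
# Rough estimate only: assumes a constant per-file cost and no parallelism.
FILE_COUNT = 350_000        # total number of files to ingest
SECONDS_PER_FILE = 2.0      # observed ingest time per file


def total_ingest_days(file_count: int, seconds_per_file: float) -> float:
    """Return the total sequential ingest time in days."""
    return file_count * seconds_per_file / 86_400


def required_rate(file_count: int, target_hours: float) -> float:
    """Return the files-per-second rate needed to finish within target_hours."""
    return file_count / (target_hours * 3600)


if __name__ == "__main__":
    print(f"{total_ingest_days(FILE_COUNT, SECONDS_PER_FILE):.1f} days at the current rate")
    print(f"{required_rate(FILE_COUNT, 1.0):.1f} files/s needed to finish in 1 hour")
```

At the current rate the full set would take roughly 8 days, which is why I am looking for a way to batch or parallelize the ingest.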
