Perfect Iain. Worked like a charm.
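
For the archives, here is roughly what the source side looks like with that setting applied. This is a minimal sketch assuming a spooling directory source; the agent, source, channel names, and paths below are illustrative only. The key line is the deserializer.maxLineLength override, since the LINE deserializer otherwise caps events at 2048 characters:

    # Illustrative names -- adjust to your own agent/source/channel
    agent1.sources = source1
    agent1.channels = memoryChannel
    agent1.sources.source1.type = spooldir
    agent1.sources.source1.channels = memoryChannel
    agent1.sources.source1.spoolDir = /var/log/app-events
    # Default deserializer is LINE; its maxLineLength defaults to 2048 and
    # truncates longer events. Raise it above the longest expected line.
    agent1.sources.source1.deserializer = LINE
    agent1.sources.source1.deserializer.maxLineLength = 1048576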
> On Aug 31, 2015, at 11:19 AM, iain wright <[email protected]> wrote:
>
> I'd expect it to work with any source; I've used it with exec &
> spoolingdirsource.
>
> Cheers,
>
> --
> Iain Wright
>
> On Mon, Aug 31, 2015 at 11:14 AM, Guyle M. Taber <[email protected]> wrote:
> Fantastic.
>
> So with this deserializer setting, it's not dependent on the source being a
> logger type?
>
>> On Aug 31, 2015, at 11:12 AM, iain wright <[email protected]> wrote:
>>
>> Hi Guyle,
>>
>> We ran into the same thing.
>>
>> Please see https://flume.apache.org/FlumeUserGuide.html#line
>>
>> On the originating source (where the event enters Flume for the first
>> time), increase maxLineLength, e.g.:
>>
>> ...
>> agent1.sources.source1.deserializer.maxLineLength = 1048576
>> ...
>>
>> Best,
>>
>> --
>> Iain Wright
>>
>> On Mon, Aug 31, 2015 at 11:03 AM, Guyle M. Taber <[email protected]> wrote:
>> I'm using an Avro sink to send events to HDFS, and with long content lines
>> our lines seem to be getting truncated at about the 2060-character mark.
>> How can I prevent long lines from being truncated when using an Avro sink
>> in this fashion?
>>
>> Here's a snippet of an event from the raw logs before Flume is involved.
>> I've toggled hidden characters so you can see the EOL character being
>> inserted, which breaks up the event into two lines.
>>
>> …utm_campaign=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4&camp=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4^Isearch-term[=]^Isession-id[=]720D69AB19F1DD17D27A948C9B31D380^Istore-id[=]^Itracking-ticket-id[=]^Itracking-ticket-number[=]^Ievent-session-id[=]98df4905-51ab-43a9-92d9-35d879a69b9a
>> $
>>
>> Here's a snippet of an event that gets truncated.
>>
>> …utm_campaign=%E5%81%A5%E5%BA%B7%E7%BE%8E%E6%8A%A4&camp=%E5%81%A5%E5%BA%$
>>
>> B7%E7%BE%8E%E6%8A%A4^Isearch-term[=]^Isession-id[=]720D69AB19F1DD17D27A948C9B31D380^Istore-id[=]^Itracking-ticket-id[=]^Itracking-ticket-number[=]^Ievent-session-id[=]98df4905-51ab-43a9-92d9-35d879a69b9a
>> $
>>
>> Here is our sink on the sending node.
>>
>> agent.sinks = AvroSink
>> agent.sinks.AvroSink.type = avro
>> agent.sinks.AvroSink.channel = memoryChannel
>> agent.sinks.AvroSink.hostname = flume.mydomain.int
>> agent.sinks.AvroSink.port = 4169
>> agent.sinks.AvroSink.batchSize = 0
>> agent.sinks.AvroSink.rollSize = 0
>> agent.sinks.AvroSink.rollInterval = 0
>> agent.sinks.AvroSink.rollCount = 0
>> agent.sinks.AvroSink.idleTimeout = 0
>> agent.sinks.AvroSink.useLocalTimeStamp = true
>>
>> Here is our sink on the HDFS receiving side.
>>
>> dp1.sinks.sinkCN.type = hdfs
>> dp1.sinks.sinkCN.channel = channelCN
>> dp1.sinks.sinkCN.hdfs.filePrefix = %{basename}-
>> dp1.sinks.sinkCN.hdfs.path = hdfs://sf1-hadoopnn1.mydomain.int/flume/events/ods/cn/fe_event/%{host}/%y-%m-%d
>> dp1.sinks.sinkCN.hdfs.fileType = DataStream
>> dp1.sinks.sinkCN.hdfs.writeFormat = Text
>> dp1.sinks.sinkCN.hdfs.rollSize = 0
>> dp1.sinks.sinkCN.hdfs.rollCount = 0
>> dp1.sinks.sinkCN.hdfs.batchSize = 5000
