How did you manage to upload such a large dataset to S3 successfully? I have 
a question for you: how do you fix the WARN below?


I want to upload the collected data to Amazon S3, but I get the messages 
below when I set

agent.sinks.k1.hdfs.path = s3://bigdata/test
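
(For completeness, the rest of the sink section is a plain HDFS sink along 
these lines; the channel name c1 and the roll settings here are illustrative, 
not my exact values:)

agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = s3://bigdata/test
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.rollInterval = 300
agent.sinks.k1.hdfs.rollSize = 0
agent.sinks.k1.hdfs.rollCount = 0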


14 Nov 2015 16:26:01,518 INFO  [hdfs-k1-call-runner-0] 
(org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:188)
  - FileSystem's output stream doesn't support getNumCurrentReplicas; 
--HDFS-826 not available; 
fsOut=com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream; 
err=java.lang.NoSuchMethodException: 
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.getNumCurrentReplicas()
14 Nov 2015 16:26:01,518 INFO  [hdfs-k1-call-runner-0] 
(org.apache.flume.sink.hdfs.AbstractHDFSWriter.reflectGetNumCurrentReplicas:188)
  - FileSystem's output stream doesn't support getNumCurrentReplicas; 
--HDFS-826 not available; 
fsOut=com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream; 
err=java.lang.NoSuchMethodException: 
com.amazon.ws.emr.hadoop.fs.s3n.MultipartUploadOutputStream.getNumCurrentReplicas()
14 Nov 2015 16:26:01,518 WARN  [SinkRunner-PollingRunner-DefaultSinkProcessor] 
(org.apache.flume.sink.hdfs.BucketWriter.getRefIsClosed:183)  - isFileClosed is 
not available in the version of HDFS being used. Flume will not attempt to 
close files if the close fails on the first attempt
java.lang.NoSuchMethodException: 
com.amazon.ws.emr.hadoop.fs.EmrFileSystem.isFileClosed(org.apache.hadoop.fs.Path)
        at java.lang.Class.getMethod(Class.java:1665)
        at 
org.apache.flume.sink.hdfs.BucketWriter.getRefIsClosed(BucketWriter.java:180)
        at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:268)
        at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:514)
        at 
org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)
        at 
org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
        at java.lang.Thread.run(Thread.java:745)


I have found the JIRA issue about it: 
https://issues.apache.org/jira/browse/FLUME-2427

But I can't find a fix for the WARN there, and I can't find any API 
documentation for com.amazon.ws.emr.hadoop.fs.EmrFileSystem.
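
The only workaround I can think of is hiding the messages by raising the log 
level for those classes in conf/log4j.properties (assuming Flume's stock 
log4j 1.x setup), which silences the noise but doesn't actually fix anything:

log4j.logger.org.apache.flume.sink.hdfs.AbstractHDFSWriter = ERROR
log4j.logger.org.apache.flume.sink.hdfs.BucketWriter = ERROR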

Please help me fix it.

Thanks


> On 14 Nov, 2015, at 9:30 am, iain wright <[email protected]> wrote:
> 
> @Hari -- good to know, thank you. I had trouble with the file channel at 
> very large capacities when re-running a 1TB dataset into S3 (the source was 
> much faster than the sink). I can confirm it worked fine with ~50M events in 
> the channel; I ended up splitting the source data and fanning out to 
> multiple S3-writing agents.
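> 
> (If you'd rather keep a single agent, a load-balancing sink group feeding 
> several S3 sinks is a rough approximation of the same fan-out; the group 
> and sink names here are made up:)
> 
> agent.sinkgroups = g1
> agent.sinkgroups.g1.sinks = s3-sink-1 s3-sink-2
> agent.sinkgroups.g1.processor.type = load_balance
> agent.sinkgroups.g1.processor.selector = round_robin
> # s3-sink-1 and s3-sink-2 must also be declared under agent.sinks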
> 
> -- 
> Iain Wright
> 
> 
> On Thu, Nov 12, 2015 at 11:19 AM, Hari Shreedharan <[email protected]> wrote:
> So there are a couple of issues related to int overflows: the checkpoint 
> file is mmap-ed, so indexing is done with a (signed 32-bit) integer, and 
> since each event takes 16 bytes in the checkpoint, the total number of 
> events can be at most about 2 billion / 16 (give or take), so your channel 
> capacity needs to stay below that. I have not worked out the exact numbers, 
> but this is the approximate range. If this is something that concerns you, 
> please file a JIRA. I wanted to get to this at some point, but didn't see 
> the urgency.
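> 
> Back-of-envelope, under those assumptions:
> 
>     max events ≈ 2^31 bytes / 16 bytes per event
>                = 2,147,483,648 / 16
>                ≈ 134 million
> 
> So capacities in the tens of millions should be safe, while anything 
> approaching 100M is getting close to the limit.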
> 
> Thanks,
> Hari Shreedharan
> 
> 
> 
> 
>> On Nov 12, 2015, at 8:39 AM, Jeff Alfeld <[email protected]> wrote:
>> 
>> Now that the channels are working again, it raises the question of why 
>> this occurred. If there is a theoretical limit on file channel size, 
>> beyond disk-space limitations, what is that limit?
>> 
>> Jeff
>> 
>> On Thu, Nov 12, 2015 at 10:23 AM Jeff Alfeld <[email protected]> wrote:
>> Thanks for the assist, it seems that clearing the directories once more and 
>> lowering the capacity of the channel has allowed the service to start 
>> successfully on this server.
>> 
>> Jeff
>> 
>> On Thu, Nov 12, 2015 at 10:03 AM Ahmed Vila <[email protected]> wrote:
>> A channel capacity of 100M seems excessive to me; try lowering it.
>> Please check that you have at least 512MB of free space on the device 
>> where you're storing the channel data and checkpoint.
>> 
>> To me it looks like Flume tries to replay the channel log but hits an EOF. 
>> Please make sure there are no hidden files in there.
>> Maybe removing the settings for the data and checkpoint dirs would be the 
>> best first thing to try, so that Flume creates 
>> ~/.flume/file-channel/checkpoint and ~/.flume/file-channel/data itself.
>> 
>> Finally, you might want to try setting use-fast-replay, or even 
>> use-log-replay-v1, to true (see the sketch below).
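>> 
>> Concretely, something like this (channel name taken from your config; the 
>> capacity value is only an example):
>> 
>> agent.channels.bluecoat-channel.capacity = 1000000
>> agent.channels.bluecoat-channel.use-fast-replay = true
>> # if fast replay still fails, try the old replay logic instead:
>> # agent.channels.bluecoat-channel.use-log-replay-v1 = true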
>> 
>> 
>> On Tue, Nov 10, 2015 at 5:38 PM, Jeff Alfeld <[email protected]> wrote:
>> I am having an issue on a server that I am standing up to forward log data 
>> from a spooling directory to our hadoop cluster. I am receiving the 
>> following errors when flume is starting up:
>> 
>> 10 Nov 2015 16:13:25,751 INFO  [conf-file-poller-0] 
>> (org.apache.flume.node.Application.startAllComponents:145)  - Starting 
>> Channel bluecoat-channel
>> 10 Nov 2015 16:13:25,751 INFO  [lifecycleSupervisor-1-0] 
>> (org.apache.flume.channel.file.FileChannel.start:269)  - Starting 
>> FileChannel bluecoat-channel { dataDirs: 
>> [/Dropbox/flume_tmp/bluecoat-channel/data] }...
>> 10 Nov 2015 16:13:25,751 INFO  [conf-file-poller-0] 
>> (org.apache.flume.node.Application.startAllComponents:145)  - Starting 
>> Channel fs-channel
>> 10 Nov 2015 16:13:25,751 INFO  [lifecycleSupervisor-1-2] 
>> (org.apache.flume.channel.file.FileChannel.start:269)  - Starting 
>> FileChannel fs-channel { dataDirs: [/Dropbox/flume_tmp/fs-channel/data] }...
>> 10 Nov 2015 16:13:25,778 INFO  [lifecycleSupervisor-1-2] 
>> (org.apache.flume.channel.file.Log.<init>:336)  - Encryption is not enabled
>> 10 Nov 2015 16:13:25,778 INFO  [lifecycleSupervisor-1-0] 
>> (org.apache.flume.channel.file.Log.<init>:336)  - Encryption is not enabled
>> 10 Nov 2015 16:13:25,779 INFO  [lifecycleSupervisor-1-2] 
>> (org.apache.flume.channel.file.Log.replay:382)  - Replay started
>> 10 Nov 2015 16:13:25,779 INFO  [lifecycleSupervisor-1-0] 
>> (org.apache.flume.channel.file.Log.replay:382)  - Replay started
>> 10 Nov 2015 16:13:25,780 INFO  [lifecycleSupervisor-1-0] 
>> (org.apache.flume.channel.file.Log.replay:394)  - Found NextFileID 0, from []
>> 10 Nov 2015 16:13:25,780 INFO  [lifecycleSupervisor-1-2] 
>> (org.apache.flume.channel.file.Log.replay:394)  - Found NextFileID 0, from []
>> 10 Nov 2015 16:13:25,784 ERROR [lifecycleSupervisor-1-0] 
>> (org.apache.flume.channel.file.Log.replay:492)  - Failed to initialize Log 
>> on [channel=bluecoat-channel]
>> java.io.EOFException
>>      at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
>>      at java.io.RandomAccessFile.readLong(RandomAccessFile.java:860)
>>      at 
>> org.apache.flume.channel.file.EventQueueBackingStoreFactory.get(EventQueueBackingStoreFactory.java:80)
>>      at org.apache.flume.channel.file.Log.replay(Log.java:426)
>>      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:290)
>>      at 
>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>>      at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> 10 Nov 2015 16:13:25,786 ERROR [lifecycleSupervisor-1-0] 
>> (org.apache.flume.channel.file.FileChannel.start:301)  - Failed to start the 
>> file channel [channel=bluecoat-channel]
>> java.io.EOFException
>>      at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
>>      at java.io.RandomAccessFile.readLong(RandomAccessFile.java:860)
>>      at 
>> org.apache.flume.channel.file.EventQueueBackingStoreFactory.get(EventQueueBackingStoreFactory.java:80)
>>      at org.apache.flume.channel.file.Log.replay(Log.java:426)
>>      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:290)
>>      at 
>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>>      at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> 10 Nov 2015 16:13:25,784 ERROR [lifecycleSupervisor-1-2] 
>> (org.apache.flume.channel.file.Log.replay:492)  - Failed to initialize Log 
>> on [channel=fs-channel]
>> java.io.EOFException
>>      at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
>>      at java.io.RandomAccessFile.readLong(RandomAccessFile.java:860)
>>      at 
>> org.apache.flume.channel.file.EventQueueBackingStoreFactory.get(EventQueueBackingStoreFactory.java:80)
>>      at org.apache.flume.channel.file.Log.replay(Log.java:426)
>>      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:290)
>>      at 
>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>>      at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> 10 Nov 2015 16:13:25,787 ERROR [lifecycleSupervisor-1-2] 
>> (org.apache.flume.channel.file.FileChannel.start:301)  - Failed to start the 
>> file channel [channel=fs-channel]
>> java.io.EOFException
>>      at java.io.RandomAccessFile.readInt(RandomAccessFile.java:827)
>>      at java.io.RandomAccessFile.readLong(RandomAccessFile.java:860)
>>      at 
>> org.apache.flume.channel.file.EventQueueBackingStoreFactory.get(EventQueueBackingStoreFactory.java:80)
>>      at org.apache.flume.channel.file.Log.replay(Log.java:426)
>>      at org.apache.flume.channel.file.FileChannel.start(FileChannel.java:290)
>>      at 
>> org.apache.flume.lifecycle.LifecycleSupervisor$MonitorRunnable.run(LifecycleSupervisor.java:251)
>>      at 
>> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>>      at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>>      at 
>> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>      at 
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>      at java.lang.Thread.run(Thread.java:745)
>> 
>> Any suggestions on why this is occurring? I have tried stopping the 
>> service and clearing the contents of the data and checkpoint directories, 
>> with no change. I have also verified that the flume daemon user account 
>> has full permissions to the checkpoint and data directories.
>> 
>> Below is the config that I am currently trying to use:
>> 
>> 
>> #global
>> agent.sources = bluecoat-src fs-src
>> agent.channels = bluecoat-channel fs-channel
>> agent.sinks = bc-avro fs-avro
>> 
>> 
>> #kc bluecoat logs
>> agent.sources.bluecoat-src.type = spooldir
>> agent.sources.bluecoat-src.channels = bluecoat-channel
>> agent.sources.bluecoat-src.spoolDir = /Dropbox/flume
>> agent.sources.bluecoat-src.basenameHeader = true
>> agent.sources.bluecoat-src.basenameHeaderKey = basename
>> agent.sources.bluecoat-src.deserializer = line
>> agent.sources.bluecoat-src.deserializer.maxLineLength = 32000
>> agent.sources.bluecoat-src.deletePolicy = immediate
>> agent.sources.bluecoat-src.decodeErrorPolicy = IGNORE
>> agent.sources.bluecoat-src.maxBackoff = 10000
>> 
>> agent.channels.bluecoat-channel.type = file
>> agent.channels.bluecoat-channel.capacity = 100000000
>> agent.channels.bluecoat-channel.checkpointDir = 
>> /Dropbox/flume_tmp/bluecoat-channel/checkpoint
>> agent.channels.bluecoat-channel.dataDirs = 
>> /Dropbox/flume_tmp/bluecoat-channel/data
>> 
>> agent.sinks.bc-avro.type = avro
>> agent.sinks.bc-avro.channel = bluecoat-channel
>> agent.sinks.bc-avro.hostname = {destination server address}
>> agent.sinks.bc-avro.port = 4141
>> agent.sinks.bc-avro.batch-size = 250
>> agent.sinks.bc-avro.compression-type = deflate
>> agent.sinks.bc-avro.compression-level = 9
>> 
>> 
>> #kc fs logs
>> agent.sources.fs-src.type = spooldir
>> agent.sources.fs-src.channels = fs-channel
>> agent.sources.fs-src.spoolDir = /Dropbox/fs
>> agent.sources.fs-src.deserializer = line
>> agent.sources.fs-src.deserializer.maxLineLength = 32000
>> agent.sources.fs-src.deletePolicy = immediate
>> agent.sources.fs-src.decodeErrorPolicy = IGNORE
>> agent.sources.fs-src.maxBackoff = 10000
>> 
>> agent.channels.fs-channel.type = file
>> agent.channels.fs-channel.capacity = 100000000
>> agent.channels.fs-channel.checkpointDir = 
>> /Dropbox/flume_tmp/fs-channel/checkpoint
>> agent.channels.fs-channel.dataDirs = /Dropbox/flume_tmp/fs-channel/data
>> 
>> agent.sinks.fs-avro.type = avro
>> agent.sinks.fs-avro.channel = fs-channel
>> agent.sinks.fs-avro.hostname = {destination server address}
>> agent.sinks.fs-avro.port = 4145
>> agent.sinks.fs-avro.batch-size = 250
>> agent.sinks.fs-avro.compression-type = deflate
>> agent.sinks.fs-avro.compression-level = 9
>> 
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> 
>> -- 
>> Best regards,
>> 
>> Ahmed Vila | Senior software developer
>> DevLogic | Sarajevo | Bosnia and Herzegovina
>> 
>> Office : +387 33 942 123
>> Mobile: +387 62 139 348
>> 
>> Website: www.devlogic.eu
>> E-mail : [email protected]
> 
> 
