OK, I pulled over all of the Hadoop jar files. Now I am seeing this:
30 Sep 2014 19:39:26,973 INFO [Twitter Stream consumer-1[initializing]]
(twitter4j.internal.logging.SLF4JLogger.info:83) - Establishing connection.
30 Sep 2014 19:39:28,204 INFO [Twitter Stream consumer-1[Establishing
connection]] (twitter4j.internal.logging.SLF4JLogger.info:83) - Connection
established.
30 Sep 2014 19:39:28,205 INFO [Twitter Stream consumer-1[Establishing
connection]] (twitter4j.internal.logging.SLF4JLogger.info:83) - Receiving
status stream.
30 Sep 2014 19:39:28,442 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.HDFSDataStream.configure:58) - Serializer = TEXT,
UseRawLocalFileSystem = false
30 Sep 2014 19:39:28,591 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.BucketWriter.open:261) - Creating
hdfs://10.0.0.14/tmp//twitter.1412105968443.ds.tmp
30 Sep 2014 19:39:28,690 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.sink.hdfs.HDFSEventSink.process:467) - process failed
java.lang.UnsupportedOperationException: Not implemented by the DistributedFileSystem FileSystem implementation
    at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:214)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2365)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2375)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
    at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:270)
    at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:262)
    at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:718)
    at org.apache.flume.sink.hdfs.BucketWriter.runPrivileged(BucketWriter.java:183)
    at org.apache.flume.sink.hdfs.BucketWriter.access$1700(BucketWriter.java:59)
    at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:715)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Is there something misconfigured on my Hadoop node?
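From searching around, this "Not implemented by the DistributedFileSystem" error looks like the classic symptom of mixed Hadoop jar versions on the classpath, e.g. an old hadoop-core jar sitting next to the 2.x jars. One thing I may try instead of hand-copying jars is letting Hadoop assemble the classpath itself; a minimal sketch, assuming Hadoop 2.5 is installed on the Flume host and HADOOP_HOME is set:

# conf/flume-env.sh: take every Hadoop jar from one consistent install
export FLUME_CLASSPATH="$($HADOOP_HOME/bin/hadoop classpath)"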
Thanks.
On Sep 30, 2014, at 2:51 PM, Hari Shreedharan <[email protected]> wrote:
> You actually need to add all of Hadoop’s dependencies to the Flume classpath.
> It looks like Apache Commons Configuration is missing from the classpath.
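>
> For example, a sketch assuming a standard Hadoop 2.x tarball layout (paths illustrative):
>
> # locate the jar that provides org.apache.commons.configuration.Configuration
> find $HADOOP_HOME -name 'commons-configuration*.jar'
> # Hadoop 2.x ships it under share/hadoop/common/lib; add that jar (and the
> # rest of Hadoop's lib directory) to FLUME_CLASSPATH in conf/flume-env.sh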
>
> Thanks,
> Hari
>
>
> On Tue, Sep 30, 2014 at 11:48 AM, Ed Judge <[email protected]> wrote:
>
> Thank you. I am using Hadoop 2.5, which I think uses protobuf-java-2.5.0.jar.
>
> I am getting the following error even after adding those 2 jar files to my
> flume-ng classpath:
>
> 30 Sep 2014 18:27:03,269 INFO [lifecycleSupervisor-1-0]
> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start:61)
> - Configuration provider starting
> 30 Sep 2014 18:27:03,278 INFO [conf-file-poller-0]
> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:133)
> - Reloading configuration file:./src.conf
> 30 Sep 2014 18:27:03,288 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,289 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:930)
> - Added sinks: k1 Agent: a1
> 30 Sep 2014 18:27:03,289 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,292 WARN [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration.<init>:101) - Configuration
> property ignored: i# = Describe the sink
> 30 Sep 2014 18:27:03,292 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,292 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,293 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty:1016)
> - Processing:k1
> 30 Sep 2014 18:27:03,312 INFO [conf-file-poller-0]
> (org.apache.flume.conf.FlumeConfiguration.validateConfiguration:140) -
> Post-validation flume configuration contains configuration for agents: [a1]
> 30 Sep 2014 18:27:03,312 INFO [conf-file-poller-0]
> (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:150) -
> Creating channels
> 30 Sep 2014 18:27:03,329 INFO [conf-file-poller-0]
> (org.apache.flume.channel.DefaultChannelFactory.create:40) - Creating
> instance of channel c1 type memory
> 30 Sep 2014 18:27:03,351 INFO [conf-file-poller-0]
> (org.apache.flume.node.AbstractConfigurationProvider.loadChannels:205) -
> Created channel c1
> 30 Sep 2014 18:27:03,352 INFO [conf-file-poller-0]
> (org.apache.flume.source.DefaultSourceFactory.create:39) - Creating instance
> of source r1, type org.apache.flume.source.twitter.TwitterSource
> 30 Sep 2014 18:27:03,363 INFO [conf-file-poller-0]
> (org.apache.flume.source.twitter.TwitterSource.configure:110) - Consumer
> Key: 'tobhMtidckJoe1tByXDmI4pW3'
> 30 Sep 2014 18:27:03,363 INFO [conf-file-poller-0]
> (org.apache.flume.source.twitter.TwitterSource.configure:111) - Consumer
> Secret: '6eZKRpd6JvGT3Dg9jtd9fG9UMEhBzGxoLhLUGP1dqzkKznrXuQ'
> 30 Sep 2014 18:27:03,363 INFO [conf-file-poller-0]
> (org.apache.flume.source.twitter.TwitterSource.configure:112) - Access
> Token: '1588514408-o36mOSbXYCVacQ3p6Knsf6Kho17iCwNYLZyA9V5'
> 30 Sep 2014 18:27:03,364 INFO [conf-file-poller-0]
> (org.apache.flume.source.twitter.TwitterSource.configure:113) - Access Token
> Secret: 'vBtp7wKsi2BOQqZSBpSBQSgZcc93oHea38T9OdckDCLKn'
> 30 Sep 2014 18:27:03,825 INFO [conf-file-poller-0]
> (org.apache.flume.sink.DefaultSinkFactory.create:40) - Creating instance of
> sink: k1, type: hdfs
> 30 Sep 2014 18:27:03,874 ERROR [conf-file-poller-0]
> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)
> - Failed to start agent because dependencies were not found in classpath.
> Error follows.
> java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
>     at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
>     at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
>     at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
>     at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
>     at org.apache.flume.sink.hdfs.HDFSEventSink.authenticate(HDFSEventSink.java:553)
>     at org.apache.flume.sink.hdfs.HDFSEventSink.configure(HDFSEventSink.java:272)
>     at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
>     at org.apache.flume.node.AbstractConfigurationProvider.loadSinks(AbstractConfigurationProvider.java:418)
>     at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:103)
>     at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:140)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>     ... 17 more
> 30 Sep 2014 18:27:33,491 INFO [agent-shutdown-hook]
> (org.apache.flume.lifecycle.LifecycleSupervisor.stop:79) - Stopping
> lifecycle supervisor 10
> 30 Sep 2014 18:27:33,493 INFO [agent-shutdown-hook]
> (org.apache.flume.node.PollingPropertiesFileConfigurationProvider.stop:83) -
> Configuration provider stopping
> [vagrant@localhost 6]$
>
> Is there another jar file I need?
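>
> In case it matters, I am adding the two jars via flume-ng's -C/--classpath option, roughly like this (paths trimmed):
>
> $ bin/flume-ng agent -n a1 -f ./src.conf -C "/path/to/hadoop-common-2.5.0.jar:/path/to/hadoop-hdfs-2.5.0.jar"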
>
> Thanks.
>
> On Sep 29, 2014, at 9:04 PM, shengyi.pan <[email protected]> wrote:
>
>> You need hadoop-common-x.x.x.jar and hadoop-hdfs-x.x.x.jar on your
>> flume-ng classpath, and the version of those Hadoop jars must match your
>> Hadoop system.
>>
>> If you sink to hadoop-2.0.0, you should use "protobuf-java-2.4.1.jar"
>> (by default, flume-1.5.0 uses "protobuf-java-2.5.0.jar"; the jar file is
>> under the Flume lib directory), because the protobuf interface of hdfs-2.0
>> is compiled with protobuf-2.4, and with protobuf-2.5 the flume-ng agent
>> will fail to start.
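>>
>> For example (illustrative; adjust the paths to your install):
>>
>> # swap the protobuf jar that ships with flume-1.5.0 for the 2.4.1 one
>> mv $FLUME_HOME/lib/protobuf-java-2.5.0.jar /tmp/
>> cp /path/to/protobuf-java-2.4.1.jar $FLUME_HOME/lib/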
>>
>>
>>
>>
>> 2014-09-30
>> shengyi.pan
>> From: Ed Judge <[email protected]>
>> Sent: 2014-09-29 22:38
>> Subject: HDFS sink to a remote HDFS node
>> To: "[email protected]" <[email protected]>
>> Cc:
>>
>> I am trying to run the flume-ng agent on one node with an HDFS sink pointing
>> to an HDFS filesystem on another node. Is this possible? What packages/jar
>> files are needed on the Flume agent node for this to work? A secondary goal
>> is to install only what is needed on the flume-ng node.
>>
>> # Describe the sink
>> a1.sinks.k1.type = hdfs
>> a1.sinks.k1.hdfs.path = hdfs://<remote IP address>/tmp/
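>>
>> The rest of the config is the usual wiring, roughly (trimmed; Twitter
>> credentials and sink tuning omitted):
>>
>> a1.sources = r1
>> a1.channels = c1
>> a1.sinks = k1
>> # Twitter source feeding a memory channel, drained by the HDFS sink
>> a1.sources.r1.type = org.apache.flume.source.twitter.TwitterSource
>> a1.sources.r1.channels = c1
>> a1.channels.c1.type = memory
>> a1.sinks.k1.channel = c1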
>>
>>
>> Thanks,
>> Ed
>
>