Installing and Using Flume

Published: 2016-02-12 | Category: Big Data

Configuring Flume for Passive Log Collection

1. Extract the Flume archive to the target directory

[root@node1 software]# tar -zxf apache-flume-1.6.0-bin.tar.gz -C /opt/modules

1.1. Rename the directory

[root@node1 modules]# mv apache-flume-1.6.0-bin flume-1.6.0

2. Configure the Flume environment variables

[root@node1 ~]# ls -a
[root@node1 ~]# vi .bash_profile

export FLUME_HOME=/opt/modules/flume-1.6.0
export PATH=$PATH:$FLUME_HOME/bin

3. Apply the configuration

[root@node1 ~]# source .bash_profile
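
If the variables took effect, flume-ng now resolves from any directory; a quick sanity check (output abbreviated to the version line):

[root@node1 ~]# flume-ng version
Flume 1.6.0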

4. Create the Flume configuration file

[root@node1 flume-1.6.0]# vi option1

# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node1
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
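
Taken together, this agent holds everything in memory: the netcat source listens on node1:44444 and turns each received line into an event, the memory channel buffers up to 1000 events (moved in transactions of 100), and the logger sink writes each event to the agent's log, which is why the messages show up on the console in step 8.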

5. Start the Flume agent

[root@node1 flume-1.6.0]# bin/flume-ng agent --conf /opt/modules/flume-1.6.0/conf \
--conf-file /opt/modules/flume-1.6.0/option1 --name a1 -Dflume.root.logger=INFO,console

6. Check the listening port

[root@node1 ~]# netstat -ntpl
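
With the agent running, the Flume JVM should show up bound to port 44444; the relevant netstat line looks roughly like this (PID and address are illustrative):

tcp        0      0 192.168.1.101:44444         0.0.0.0:*                   LISTEN      2871/java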

7. Communicate with port 44444

[root@node1 ~]# telnet node1 44444

Send a message:
hello matrix! good job!
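
A typical session looks like this; the netcat source acknowledges each received line with OK (IP illustrative):

[root@node1 ~]# telnet node1 44444
Trying 192.168.1.101...
Connected to node1.
Escape character is '^]'.
hello matrix! good job!
OK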

8. Flume collects the message sent to the listening port
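
The logger sink prints each event to the console as a short hex dump of the body followed by a readable prefix; the line has roughly this shape (only the first 16 bytes are shown, as the sink truncates by default):

Event: { headers:{} body: 68 65 6C 6C 6F 20 6D 61 74 72 69 78 21 20 67 6F hello matrix! go }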

Configuring Flume for Active Log Collection

1. Create the /opt/modules/flume-1.6.0/option2 file

[root@node1 flume-1.6.0]# cp /opt/modules/flume-1.6.0/option1 ./option2

[root@node1 flume-1.6.0]# vi option2
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/data/

# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://matrix/usr/flume/%Y-%m-%d/%H-%M
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 10240
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 5
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.callTimeout = 360000

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
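
A few notes on these settings: rollInterval = 0 and rollCount = 0 disable time- and count-based rolling, so files roll purely on size (rollSize = 10240 bytes); fileType = DataStream writes raw event bodies instead of the default SequenceFile; round/roundValue/roundUnit make the %H-%M escape in the path round down to 5-minute marks, so events are bucketed into a new directory every 5 minutes; useLocalTimeStamp = true fills the time escapes from the agent's clock rather than requiring a timestamp header; and callTimeout = 360000 raises the HDFS operation timeout well above its 10-second default (see the errors section below).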

2. Create the local directory Flume will collect from

[root@node1 flume-1.6.0]# mkdir /opt/data

3. Create the parent directory on HDFS for the collected data

[root@node1 hadoop-2.5.1]# hadoop fs -mkdir /usr/flume

4. Start Flume to begin active collection

[root@node1 flume-1.6.0]# ./bin/flume-ng agent --conf /opt/modules/flume-1.6.0/conf \
--conf-file /opt/modules/flume-1.6.0/option2 --name a1 -Dflume.root.logger=INFO,console

5. Place a file in the /opt/data directory
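
Any UTF-8 text file will do; for example:

[root@node1 ~]# echo "hello matrix! good job!" > /opt/data/test.log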

6. Flume picks up and collects the file from /opt/data

7. View the directories Flume created on HDFS through the Hadoop Web UI
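
The same check can be done from the command line if the Web UI is inconvenient (output omitted):

[root@node1 hadoop-2.5.1]# hadoop fs -ls -R /usr/flume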

(Optional) Configure the Flume environment file by renaming the template and editing it:

[root@node1 conf]# mv flume-env.sh.template flume-env.sh
[root@node1 conf]# vi flume-env.sh
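
At minimum, flume-env.sh usually just points Flume at the JDK; a minimal sketch, assuming the JDK lives at /usr/java/jdk1.7.0_79 (adjust the path to your installation):

export JAVA_HOME=/usr/java/jdk1.7.0_79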

Goal: one directory per day, one file generated per hour. (Note: with roundValue = 5 and roundUnit = minute as configured above, the %H-%M bucket actually advances every 5 minutes; to get hourly buckets, round on the hour instead.)

Errors
2016-03-23 16:49:09,406 (pool-5-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:256)] FATAL: Spool Directory source r1: { spoolDir: /opt/data/ }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.nio.charset.MalformedInputException: Input length = 1

Cause: the spooled file is not valid in the character set the source expects (UTF-8 by default), so decoding fails.
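
If the offending files cannot be re-encoded, the spooling source's charset options are one workaround; a sketch (IGNORE silently drops undecodable bytes, so use it knowingly):

a1.sources.r1.inputCharset = UTF-8
a1.sources.r1.decodeErrorPolicy = IGNORE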

2016-03-23 16:46:40,945 (pool-4-thread-1) [ERROR - org.apache.flume.source.SpoolDirectorySource$SpoolDirectoryRunnable.run(SpoolDirectorySource.java:256)] FATAL: Spool Directory source r1: { spoolDir: /opt/data/ }: Uncaught exception in SpoolDirectorySource thread. Restart or reconfigure Flume to continue processing.
java.lang.IllegalStateException: File name has been re-used with different files. Spooling assumptions violated for /opt/data/aaa.COMPLETED

Cause: a new file with the same name as an already-processed one (whose .COMPLETED marker still exists) was dropped into the spool directory. Remove the stale marker file or use unique file names.

2016-03-23 16:29:44,464 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:234)] Creating hdfs://usr/flume/2016-03-23/16-25/FlumeData.1458721731655.tmp
2016-03-23 16:29:54,465 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:455)] HDFS IO error
java.io.IOException: Callable timed out after 10000 ms on file: hdfs://usr/flume/2016-03-23/16-25/FlumeData.1458721731655.tmp
at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:693)
at org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter.append(BucketWriter.java:514)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:418)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.TimeoutException
at java.util.concurrent.FutureTask.get(FutureTask.java:201)
at org.apache.flume.sink.hdfs.BucketWriter.callWithTimeout(BucketWriter.java:686)
... 6 more

Cause: the HDFS sink's hdfs.callTimeout defaults to 10000 ms, which this call exceeded; option2 raises it with hdfs.callTimeout = 360000.

Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: usr

Cause: in a path like hdfs://usr/flume/..., HDFS parses "usr" as the NameNode host or nameservice. The URI must include the filesystem authority, e.g. hdfs://matrix/usr/flume/... as in option2.
Author: Matrix
Original link: https://matrixsparse.github.io/2016/02/12/Flume安装使用/
License: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 4.0. Please credit the source when reposting!
