As promised in the previous post, this post looks in detail at the properties in a Flume agent configuration. For ease of understanding, we will use the same flume.conf file created in the previous post.
The Flume agent configuration file, flume.conf, follows the Java properties file format with hierarchical property settings. The filename flume.conf is not fixed: we can give the file any name, as long as we pass the same name as <conf-file> when starting the agent with the flume-ng command.
Flume Agent Configuration:
We will describe the properties in our flume.conf file section by section.
Agent1.sources = netcat-source
Agent1.channels = memory-channel
Agent1.sinks = logger-sink
These first three lines name the agent and define the sources, sinks, and channels associated with it.
The first qualifier in the above three lines is the agent name. The agent name can be any string that starts with a letter; it should not start with a digit or a special character.
The second qualifier identifies the component type: sources, channels, or sinks. These keywords are fixed and cannot be replaced with other names.
The right-hand side values are the names we give to the agent's three components. They can be any strings without spaces. Descriptive names are optional but preferable, because they make log messages easier to follow when debugging. To specify multiple values on one line, separate them with spaces.
For example, netcat-source is a single value, but if we write it as netcat source, it is treated as two sources (netcat and source) for the same agent.
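As a minimal sketch of the space-separated form, the fragment below declares two sources and two sinks for the same agent. The names second-source and second-sink are hypothetical and used only for illustration; each declared component would still need its own configuration lines.

```properties
# Hypothetical agent with two sources and two sinks sharing one channel.
# second-source and second-sink are illustrative names, not part of our flume.conf.
Agent1.sources = netcat-source second-source
Agent1.channels = memory-channel
Agent1.sinks = logger-sink second-sink
```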
Agent1.sources.netcat-source.type = netcat
Agent1.sources.netcat-source.bind = localhost
Agent1.sources.netcat-source.port = 44444
These lines specify the configuration for the source. The first qualifier is the agent name, as in the first section. The second qualifier is the reserved keyword sources. The third qualifier is the source name given in the first section. The fourth qualifier names a property of the source, and the right-hand side is the value for that property. We can set as many properties as the source type supports.
Since we are using the Netcat source, the configuration values specify how it should bind to the network.
A netcat-like source opens the specified port and listens for data. It expects the supplied data to be newline-separated text. Each line of text is turned into a Flume event and sent via the connected channel.
Below are some of the properties that can be set on the netcat source. The type, bind, and port properties are required.
|Property Name||Default||Description|
|type||–||The component type name, needs to be netcat|
|bind||–||Host name or IP address to bind to|
|port||–||Port # to bind to|
|max-line-length||512||Max line length per event body (in bytes)|
|ack-every-event||true||Respond with an “OK” for every event received|
|selector.type||replicating||replicating or multiplexing|
|selector.*||–||Depends on the selector.type value|
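To illustrate the optional properties, the sketch below extends our netcat source with max-line-length and ack-every-event. The values 1024 and false are assumptions chosen for illustration, not recommendations.

```properties
# Required properties (from our flume.conf)
Agent1.sources.netcat-source.type = netcat
Agent1.sources.netcat-source.bind = localhost
Agent1.sources.netcat-source.port = 44444
# Optional properties; the values below are illustrative assumptions
Agent1.sources.netcat-source.max-line-length = 1024
Agent1.sources.netcat-source.ack-every-event = false
```

With ack-every-event set to false, the source no longer replies with OK after each line, which can be convenient when feeding data from a script.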
Agent1.sinks.logger-sink.type = logger
The qualifiers and values follow the same pattern as in the second section. This line specifies that the sink is the logger sink, which is further configured through the command line or the log4j properties file.
The logger sink is typically useful for testing and debugging. When the sink type is logger and logging is configured through the log4j.properties file in FLUME_CONF_DIR, all events are written to the log file named by flume.log.file, under the directory named by flume.log.dir.
With flume.log.dir set to ./logs, starting the Flume agent creates a logs folder in the present working directory and writes its log messages to the flume.log file in that folder.
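For reference, the relevant entries in log4j.properties typically look like the sketch below. It assumes the stock file shipped with Flume 1.x; check your own copy in FLUME_CONF_DIR, since the values may differ.

```properties
# Assumed defaults from the stock Flume log4j.properties; verify against your installation
flume.root.logger=INFO,LOGFILE
flume.log.dir=./logs
flume.log.file=flume.log
```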
In the previous post, we passed a Java option (-Dflume.root.logger=INFO,console) when starting the agent with the flume-ng command, to force Flume to log to the console.
$ flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent1 -Dflume.root.logger=INFO,console
So, we were able to see the event messages on console itself.
At the end of this post, we will start the same agent, Agent1, without the above Java option (-Dflume.root.logger=INFO,console). The agent will then pick up its logging properties from the log4j.properties file automatically, and the log messages will be written to the log file.
Agent1.channels.memory-channel.type = memory
Agent1.channels.memory-channel.capacity = 1000
Agent1.channels.memory-channel.transactionCapacity = 100
These lines specify the channel to be used and then add the type-specific configuration values. In this case we are using the memory channel and we specify its capacity. The memory channel is non-persistent: there is no external storage mechanism behind it.
The events are stored in an in-memory queue with a configurable maximum size. It is ideal for flows that need higher throughput and can tolerate losing the staged data if the agent fails.
Below are some of the properties for the memory channel. Only the type property is required.
|Property Name||Default||Description|
|type||–||The component type name, needs to be memory|
|capacity||100||The maximum number of events stored in the channel|
|transactionCapacity||100||The maximum number of events the channel will take from a source or give to a sink per transaction|
|keep-alive||3||Timeout in seconds for adding or removing an event|
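As a small sketch, the optional keep-alive property can be added alongside the channel settings from our flume.conf. The value 5 is an assumption for illustration only.

```properties
Agent1.channels.memory-channel.type = memory
# Total number of events the channel can hold
Agent1.channels.memory-channel.capacity = 1000
# Events moved per transaction; this should not exceed capacity
Agent1.channels.memory-channel.transactionCapacity = 100
# Illustrative value: wait up to 5 seconds to add or remove an event
Agent1.channels.memory-channel.keep-alive = 5
```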
Agent1.sources.netcat-source.channels = memory-channel
Agent1.sinks.logger-sink.channel = memory-channel
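These last lines connect the source and the sink to the channel. Note that a source uses the plural key channels, because it can write to more than one channel, while a sink uses the singular key channel, because it reads from exactly one.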
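Putting the sections together, the complete flume.conf discussed in this post would look like the following. The comments are added here for orientation and are not part of the original file.

```properties
# Section 1: name the agent's components
Agent1.sources = netcat-source
Agent1.channels = memory-channel
Agent1.sinks = logger-sink

# Section 2: configure the source
Agent1.sources.netcat-source.type = netcat
Agent1.sources.netcat-source.bind = localhost
Agent1.sources.netcat-source.port = 44444

# Section 3: configure the sink
Agent1.sinks.logger-sink.type = logger

# Section 4: configure the channel
Agent1.channels.memory-channel.type = memory
Agent1.channels.memory-channel.capacity = 1000
Agent1.channels.memory-channel.transactionCapacity = 100

# Section 5: bind the source and sink to the channel
Agent1.sources.netcat-source.channels = memory-channel
Agent1.sinks.logger-sink.channel = memory-channel
```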
Note: The <conf-dir> directory would include a shell script flume-env.sh and potentially a log4j properties file.
The sections named in this post are not actual sections in the Flume configuration file; the properties are divided into sections only for ease of explanation.
Logging Messages into Log File Instead of Console:
Start the agent with the command below.
$ flume-ng agent --conf $FLUME_CONF_DIR --conf-file $FLUME_CONF_DIR/flume.conf --name Agent1
Open another terminal, connect to the configured port with the curl utility (for example, curl telnet://localhost:44444), type some messages, and press Enter after each one. Finally, close the curl connection with Ctrl+C.
Now open the flume.log file in the logs folder in the current working directory.
$ gedit logs/flume.log
We can see the message events written into the flume.log file.
So, the network traffic events are successfully routed to the log file.