Grep, sort, uniq, awk, sed are all worthy tools, but part of the job is deciding when one tool, or tool-set is no longer enough to accomplish your mission. When should you turn over the job to a true sorting engine, a relational database?

Unix systems (everyone uses a unix system for serious crunching of logs right?) have a whole host of great command-line tools:

  1. sort
  2. grep
  3. awk
  4. sed
  5. uniq
  6. cut (which I personally shun)

These are excellent for quickie jobs like: analyzing router logs during an incident, finding trends in web logs, looking for that problem child in syslog output. These tools, however, fall short in terms of flexibility for longer term searching and grouping of results. Reporting on a regular basis can have some serious enhancement from simple database queries, the power that a database brings to log reporting and analysis is truly a sight to behold.

For simple firewall or access-list logs the ability to look over a time period with ease and group by source, destination, ports, protocols and combinations of these is incredible. Compare this with the arduous task of tracking totals in awk or sed or perl and parsing compressed logfiles over time. Clearly at some point you’d have to agree that using a database to analyze logs can be useful. Most of the time the decision to use a database comes too late in the process I believe, after there is already a monstrosity of a awk or perl script doing far too much work and taking up far too much time (personnel and machine).

So, I present a simple log file analysis process for firewall logs, assuming ‘firewall’ means iptables/netfilter, though any firewall should produce logs of a format suitable for processing in this manner. First things first, decide on an appropriate database backend, something simple and easy and available. I use MySQL it’s all of these things.  Now let’s assume that it’s actually running and you have created atleast the bare-bones tables and privileges, securing MySQL is left as an exercise to the reader thus far. I don’t believe in GUI management tools for databases, or much of anything else actually. Perhaps I’m a luddite I apologize for that, but I believe you learn more through doing than clicking. Let’s create the new database to hold our magical firewall data:

create database firewall;

Simple as that we have a database named ‘firewall’! Let’s create a user to query the database, it would be nice to have a ‘READ ONLY’ and a ‘READ/WRITE’ user, but for now we’ll just use one userid:

insert into user values (
‘localhost’,’firewall’,password(’your-passwd-here’),
‘Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’
);

insert into db values (
‘localhost’,’firewall’,’firewall’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’,’Y’
);

Now we have a userid that can access the firewall database and do our bidding. Next we need to connect to the database and create our tables to hold the data:


connect firewall;

Let’s create the table to hold our firewall log data:

--
-- Data from the firewall logs
create table firewall (
SRC int(10) unsigned,
DST int(10) unsigned,
PROTO VARCHAR(8),
SPORT int(5) unsigned,
DPORT int(5) unsigned
);

First things first, notice that the SRC (source address) and DST (destination address) are stored not as varchar or text fields but as unsigned integers. This permits us to sort by ip address, match inside ranges of ip addresses and lots of other nice things. PostgreSQL has a possibly better method of handling ip address information with their included ‘inet’ datatype, but for simple things MySQL seems to work just fine with unsigned integers. Also, we are storing the ports as integers, you can always make a reference table for ports and join against it, but heck who doesn’t already know that tcp/2105 is eklogin? Of course, a great reference for these things is the IANA address family numbers list.

Once you have the tables defined all you need do is load data, the standard netfilter/iptables log format is:

IN=eth0 OUT= MAC=00:f0:a6:01:3b:c2:01:08:2a:ff:dd:26:40:08:00 SRC=69.20.1.77 DST=128.2.35.50 LEN=32 TOS=0x00 PREC=0x00 TTL=3 ID=779 PROTO=UDP SPT=11124 DPT=33438 LEN=12

Of this the important parts for our reporting are fields (in awk order) are 4, 5, 11, 12, 13. Unfortunately we need to so some gymnastics to make these convert over well for mysql load, first find these elements:


awk ‘{print $4 “ “ $5 “ “ $11 “ “ $12 “ “ $13 }’ firewall-log | \
sed ‘s/ / = /g’ | \
awk -F= ‘{print $2 “,” $4 “,” $6 “,” $8 “,” $10 }’ > /tmp/firewall-log.csv

Now that we have them in a comma seperated value list we can load that into mysql which should be fairly simple:


mysql -u firewall -p firewall << EOF
load data local infile "/tmp/firewall-log.csv"
into table firewall;
quit;
EOF

This should load all data in the firewall log into the database. You should be able to query it now in a structured way. We’ll cover later some possible reports that might be of interest, simple things like:

  1. Top source ips per protocol
  2. Top destination ports per protocol
  3. Trending of port usage over time
  4. Queries for specifc ip source or destination activities.

There are perhaps a limitless number of reports and things to query, now you have the engine in place to do the querying.