Correlation rule tuning

Lots of organizations are deploying SIEM systems either to do their due diligence or because it’s part of a regulatory requirement. One misconception, typically derived from marketing material, is that you plug it in, turn it on, and voila, instant security. This couldn’t be further from the truth. I think of a SIEM as a meta-IDS (Intrusion Detection System): it is trying to find the needles in the haystack. Most of the deployments I’ve worked on receive millions of events per day, and many of those events are informational. Sometimes regulatory requirements make it mandatory to send those events to the SIEM, so my goal is always to maximize our resources and make the best of the situation. When you’re getting millions of firewall events per day, for example, you can either let them take up space on your SAN uselessly or you can use them to try to detect misuse.

The first thing you need to do is identify which systems will be forwarding events. Typically these are all switches, routers, servers, applications, and security systems (Network/Host Intrusion Prevention, firewalls, anti-malware, etc.). How many devices you forward events from will depend on how much money you are willing to spend on event collectors, which receive and normalize events, and on the storage needed to keep all of this data around. Deciding what events to send to your SIEM is often challenging. The SIEM you are evaluating will have two capacity limits to be aware of. The first is storage: how much space will your events take? To get a rough estimate, go to every system that will be forwarding events, report on how much it logged in a day, multiply that by your retention period in days, and add the results together. For instance, with a 90-day retention policy: (firewall logs for the day * 90) + (IPS logs for the day * 90) = required storage. The second is events per second (EPS). At the very least, go to all of the devices that will be forwarding events, report on how many events they generated in a day, and divide that total by 86,400 (the number of seconds in a day). This gives an approximate average EPS, which will determine the number and size of event collectors. Keep in mind that it is an average; peaks during the busiest hours will run higher.
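
The two estimates above are simple arithmetic, so here is a minimal sketch of them in Python. The device names and daily figures are made-up placeholders, not measurements; substitute the numbers you report from your own devices.

```python
"""Rough SIEM sizing from per-device daily figures (all numbers are hypothetical)."""

SECONDS_PER_DAY = 86_400
RETENTION_DAYS = 90  # example retention policy from the text

# Hypothetical daily figures gathered from each forwarding device:
# device name -> (bytes logged per day, events logged per day)
DAILY_STATS = {
    "border-firewall": (12 * 1024**3, 40_000_000),
    "ips": (2 * 1024**3, 3_000_000),
}


def required_storage_bytes(stats: dict, retention_days: int) -> int:
    """Sum of (daily log volume * retention days) across all devices."""
    return sum(daily_bytes * retention_days for daily_bytes, _ in stats.values())


def average_eps(stats: dict) -> float:
    """Total events per day divided by seconds per day: an average, not a peak."""
    return sum(events for _, events in stats.values()) / SECONDS_PER_DAY


if __name__ == "__main__":
    tib = required_storage_bytes(DAILY_STATS, RETENTION_DAYS) / 1024**4
    print(f"storage needed: {tib:.2f} TiB")
    print(f"average EPS: {average_eps(DAILY_STATS):.0f}")
```

Size the collectors with some headroom above the average EPS this produces, since bursts will exceed it.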

The purpose of this post is to help develop ideas for custom correlation rule use cases; a SIEM sizing and requirements guide can come later. So for now, let’s assume that you already have a SIEM in place and want to get started with it.

Vendor-Provided Correlation Rules

My general methodology with a SIEM (and any Intrusion Prevention System, for that matter) is to enable everything, see what happens, and tune back what you are not interested in. In many cases you have paid for the content, and what better way to get the best bang for your buck than to see how it works in your environment? Once your events are being forwarded, enable the correlation rules and watch how they react. For example, if firewall events from your network monitoring system polling system information via SNMP (UDP port 161) trigger a port scanning detection rule, you would not turn off the entire correlation rule. Instead, find the mechanism that lets you ignore that specific traffic for that specific rule.
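
Every SIEM has its own way of doing this (filters, exception lists, or rule conditions), so the following is only a vendor-neutral sketch of the idea: an exception that is scoped to one rule rather than applied globally. The `Event` fields, the rule name `port_scan_detected`, and the monitoring-system address are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Event:
    """Hypothetical normalized firewall event."""
    src_ip: str
    dst_ip: str
    protocol: str  # "tcp" or "udp"
    dst_port: int


@dataclass(frozen=True)
class RuleException:
    """Traffic to ignore, expressed as source network + protocol + destination port."""
    src_net: str
    protocol: str
    dst_port: int

    def matches(self, event: Event) -> bool:
        return (ip_address(event.src_ip) in ip_network(self.src_net)
                and event.protocol == self.protocol
                and event.dst_port == self.dst_port)


# Exceptions are keyed by rule name, so the SNMP poller is ignored ONLY by the
# port-scan rule; every other rule still sees its traffic.
EXCEPTIONS = {
    "port_scan_detected": [RuleException("10.0.5.20/32", "udp", 161)],  # hypothetical NMS
}


def is_suppressed(rule_name: str, event: Event) -> bool:
    return any(exc.matches(event) for exc in EXCEPTIONS.get(rule_name, []))
```

Keeping the exception narrow (one source, one protocol, one port, one rule) means a real scan launched from the same subnet would still fire.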

I have seen rules that need to be modified slightly to become effective. For example, a backdoor correlation rule that monitors for TCP port 31337 will occasionally trigger on firewall events for perfectly ordinary outbound connections. Without getting too detailed: when a computer initiates a connection to a web server on TCP port 80, it opens a random local source port, typically between 1024 and 65535, and that port will sometimes be 31337. Modifying the rule to match 31337 only as the destination port may be a good way to tune it.

Using the same example, McAfee Rogue System Detector scans hosts for TCP 31337 during service discovery of the network. Even though internal firewalls/routers may be permitting and logging this traffic, the target hosts are probably not (hopefully not) running these services. In this case you may want to ignore traffic from the Rogue System Detectors to destination TCP port 31337.
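
Putting both tuning steps together, a sketch of the backdoor condition might look like this. It matches 31337 only as the destination port and skips known scanner sources; the sensor addresses are hypothetical placeholders, and a real rule would be expressed in your SIEM's own rule language.

```python
from ipaddress import ip_address

BACKDOOR_PORT = 31337
# Hypothetical addresses of the Rogue System Detector sensors; use your own.
RSD_SENSORS = {ip_address("10.0.2.15"), ip_address("10.0.3.15")}


def backdoor_rule(src_ip: str, dst_port: int, protocol: str) -> bool:
    """Alert on TCP connections *to* port 31337, excluding known RSD scans.

    The source port is deliberately not an input: a client that happens to pick
    31337 as its ephemeral source port for an ordinary web request is not a
    backdoor.
    """
    if protocol != "tcp" or dst_port != BACKDOOR_PORT:
        return False
    return ip_address(src_ip) not in RSD_SENSORS
```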

Potential Malware Calling Home

The way malware behaves in our networks is a moving target, but it tends to move like cars on a highway rather than at light speed. So today there are several indicators we can monitor that would allow us to infer either an infection or misuse by an internal employee or contractor.

Resolving domain names can be important for keeping malware stable and allowing quick changes of IP address. For example, if I program my malware to connect to a web server at pwnd.example.net, as the malware administrator I would like to be able to change the IP of my web server in case someone pulls the plug on the one I’m using. If the malware is programmed to connect to a static IP, I will lose that malware network. If I use DNS, I can mitigate some of this risk by getting a new web server, setting up shop, and pointing pwnd.example.net at the new IP. In most environments I’ve been in, there are only a handful of DNS servers that all internal systems are configured to use. The first stanza of this correlation rule is therefore: if the source or destination port is UDP or TCP 53 AND neither the source nor the destination IP is on your list of approved DNS servers, trigger the alert.
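
Here is a minimal sketch of that DNS stanza as a predicate over a single firewall event. The approved-server addresses are hypothetical. Because the approved resolvers themselves talk to outside DNS servers, the condition requires that *neither* endpoint is approved before it alerts.

```python
from ipaddress import ip_address

# Hypothetical approved internal DNS servers.
APPROVED_DNS = {ip_address("10.0.0.53"), ip_address("10.0.1.53")}


def rogue_dns(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              protocol: str) -> bool:
    """True for DNS traffic (TCP/UDP 53) that bypasses every approved DNS server."""
    if protocol not in ("tcp", "udp"):
        return False
    if 53 not in (src_port, dst_port):
        return False
    return (ip_address(src_ip) not in APPROVED_DNS
            and ip_address(dst_ip) not in APPROVED_DNS)
```

A client querying an approved resolver, or that resolver querying the Internet, passes silently; a workstation talking directly to an outside server on port 53 does not.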

Another stanza to add to this rule covers approved proxy servers, if you are using one that is not in transparent mode. At your border firewalls, the only outbound TCP port 80 traffic from the LAN subnet should come from the proxy server. Anything else could be an employee or contractor attempting to subvert this control, or malware configured to do so. So, in addition to the stanza above: if the source IP is NOT your proxy and the destination is TCP port 80, trigger the alert. You may also want to AND this with the logging device being a border firewall, to reduce the number of events that need to be investigated.
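
As a sketch, the proxy stanza with the border-firewall condition ANDed in could look like the following. The proxy address and the device name are hypothetical; use whatever identifiers your SIEM normalizes them to.

```python
from ipaddress import ip_address

PROXY_IPS = {ip_address("10.0.0.80")}  # hypothetical explicit (non-transparent) proxy
BORDER_FIREWALLS = {"fw-border-01"}     # hypothetical device name as reported to the SIEM


def proxy_bypass(device: str, src_ip: str, dst_port: int, protocol: str) -> bool:
    """True when a border firewall sees TCP/80 traffic that did not come from the proxy."""
    return (device in BORDER_FIREWALLS
            and protocol == "tcp"
            and dst_port == 80
            and ip_address(src_ip) not in PROXY_IPS)
```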

Another stanza may monitor for IRC traffic. If IRC is permitted, you will see pretty quickly how many people are using it (it won’t be many), and you can hopefully tune the rule to trigger only when a certain number of events occur within a certain time window. Then you could look for a source or destination port of TCP 6666, 6667, 7777, and a few others. Another thing I like to do is configure a rule on my Network Intrusion Prevention System that looks for any packets with IRC as the protocol and triggers an IPS event. Then look for that IPS event in this stanza of the rule too, which should ensure you catch IRC on any port at your egress point.
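
The "certain number of events in a certain amount of time" condition is a per-host sliding window. A minimal sketch, assuming events arrive in timestamp order; the threshold and window values are placeholders you would tune, and `ips_irc_signature` stands in for the custom IPS event described above.

```python
from collections import defaultdict, deque

IRC_PORTS = {6666, 6667, 7777}  # ports named in the text; extend as needed


class IrcStanza:
    """Fire when one host produces `threshold` IRC events within `window` seconds."""

    def __init__(self, threshold: int = 20, window: float = 300.0):
        self.threshold = threshold  # hypothetical default: 20 events...
        self.window = window        # ...within 5 minutes
        self._seen = defaultdict(deque)  # host -> timestamps of matching events

    def observe(self, ts: float, host: str, src_port: int, dst_port: int,
                ips_irc_signature: bool = False) -> bool:
        is_irc = (ips_irc_signature
                  or src_port in IRC_PORTS
                  or dst_port in IRC_PORTS)
        if not is_irc:
            return False
        times = self._seen[host]
        times.append(ts)
        # Drop events that have aged out of the window (assumes in-order timestamps).
        while times and ts - times[0] > self.window:
            times.popleft()
        return len(times) >= self.threshold
```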

Yet another stanza could cover hosts attempting to use an SMTP server other than yours.
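
The SMTP stanza follows the same shape as the DNS one. A minimal sketch, with a hypothetical relay address; 465 and 587 are included alongside 25 because they are the standard SMTP-over-TLS and mail submission ports.

```python
from ipaddress import ip_address

MAIL_RELAYS = {ip_address("10.0.0.25")}  # hypothetical internal mail relay
SMTP_PORTS = {25, 465, 587}              # SMTP, SMTP over implicit TLS, submission


def unapproved_smtp(src_ip: str, dst_ip: str, dst_port: int) -> bool:
    """True for mail traffic where neither endpoint is an approved relay."""
    if dst_port not in SMTP_PORTS:
        return False
    # Clients legitimately talk to the relay, and the relay legitimately
    # delivers to outside servers, so either endpoint being a relay is fine.
    return (ip_address(src_ip) not in MAIL_RELAYS
            and ip_address(dst_ip) not in MAIL_RELAYS)
```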

Misuse of Administration Accounts

Every environment I have been in has Windows and *nix servers. These systems have default administration accounts: administrator and root, respectively. It is best practice to give actual system administrators dedicated administration user accounts so that there is accountability during administration. If someone logs in as root and shuts down a service, how would you know who it was? You may be able to track it back by IP, but not with certainty. Typically organizations don’t want administrators’ regular user accounts to carry administrative privileges, which helps mitigate mistakes. Instead, each administrator has a separate account for administration, for example username_a, which gives a certain level of assurance that changes are deliberate. The credentials for the default administration accounts are then printed, locked in a fireproof box somewhere, and used for emergencies only.

That means that if we see someone logging into a system with the username administrator or root, either an administrator is misusing the default account or it may have been compromised. It is important to alert specifically when the login was successful. This rule is easy to test. Most environments will have systems and/or scripts that automate administration tasks, so you will need to filter those out of the correlation rule. This does leave residual risk, but we are doing the most with what we have available to us. If you don’t like that risk, then do the right thing and change the account those scripts use ;).
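
A sketch of the rule logic, assuming the SIEM has already normalized logins into a username, an outcome, and a source address (those field values and the automation host are hypothetical):

```python
# Default accounts from the text. Comparison is case-insensitive so that
# "Administrator" and "ADMINISTRATOR" on Windows are both caught.
DEFAULT_ADMIN_ACCOUNTS = {"administrator", "root"}
# Hypothetical automation hosts whose scripted logins are expected and filtered out.
AUTOMATION_SOURCES = {"10.0.0.40"}


def default_admin_login(username: str, outcome: str, src_ip: str) -> bool:
    """Alert only on SUCCESSFUL logins to a default admin account from a non-automation host."""
    return (username.lower() in DEFAULT_ADMIN_ACCOUNTS
            and outcome == "success"
            and src_ip not in AUTOMATION_SOURCES)
```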

HTTP Tunneling

This rule is similar to the malware calling home rule in that we look for potential misuse by first looking for strange behavior. If a network enforces least privilege, the user network will only be able to send HTTP and HTTPS from the inside network out to the Internet, and all of its SMTP traffic should go to the internal mail relay. If users are tunneling other protocols through HTTP, they are likely attempting to evade controls, or it could be malware doing the same. This rule requires a Network Intrusion Detection/Prevention System or an application layer firewall. You will need to create a rule on that device that monitors for TCP port 80 traffic that is NOT HTTP, or TCP port 443 traffic that is NOT TLS (HTTPS is encrypted, so unless the device decrypts it, on port 443 it can generally only verify that the session is TLS). On the SIEM, you then only need to watch for one of these events to trigger the alert. Again, when you first create this rule you may need to tune it on the log-generating device(s) and/or filter certain hosts from triggering the correlation rule.
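
To show what "NOT HTTP" and "NOT TLS" mean at the byte level, here is a deliberately naive sketch that inspects the first payload a client sends. It is only an illustration of the idea: a real NIDS or application layer firewall uses far more thorough protocol detection. It checks for an HTTP method token on port 80, and on port 443 for a TLS handshake record header (content type 0x16 followed by a major version byte of 0x03).

```python
HTTP_METHODS = (b"GET ", b"POST ", b"HEAD ", b"PUT ", b"DELETE ",
                b"OPTIONS ", b"PATCH ", b"CONNECT ", b"TRACE ")


def looks_like_http(payload: bytes) -> bool:
    """Cleartext HTTP requests start with a method token."""
    return payload.startswith(HTTP_METHODS)


def looks_like_tls(payload: bytes) -> bool:
    """TLS record header: content type 0x16 (handshake), then major version 0x03."""
    return len(payload) >= 3 and payload[0] == 0x16 and payload[1] == 0x03


def possible_tunnel(dst_port: int, first_client_payload: bytes) -> bool:
    """Flag port 80 traffic that isn't HTTP and port 443 traffic that isn't TLS."""
    if dst_port == 80:
        return not looks_like_http(first_client_payload)
    if dst_port == 443:
        return not looks_like_tls(first_client_payload)
    return False
```

For example, an SSH client pointed at port 443 opens with the text banner `SSH-2.0-...`, which fails the TLS check and would be flagged.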

Potential Server Compromise

This rule can be time-consuming to create for your environment, but it is one of my favorites. You may choose to create this type of rule only for critical hosts. Here is the concept. We will use a public-facing web server as the example, but the same idea applies to any server.

A typical web server listens for connections on TCP port 80. The only connections you should see for it in firewall logs are random source IP addresses being permitted to access TCP port 80 on your server as the destination. When you open a web browser and connect to a website, your computer opens a local port between 1024 and 65535 and connects from it to TCP port 80 on the web server. So if you see a firewall log showing your web server connecting from a high source port to any other system, someone is initiating a connection from that web server. If they are browsing websites or hopping to other systems from there, that should be frowned upon and corrected. It may also be someone who has already compromised the system and is sending information back to their own website or FTP server. Similarly, if you see someone connect to a port other than 80 on that web server, then another service is running on it. Either someone set something new up, or it may be a backdoor.
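
A sketch of both conditions for a single web server, evaluated against permitted firewall events only (denied attempts to other ports are just scans). The server address and the field names are hypothetical.

```python
from ipaddress import ip_address
from typing import Optional

WEB_SERVER = ip_address("192.0.2.10")  # hypothetical public web server
SERVICE_PORTS = {80}                   # ports this server is supposed to serve


def server_anomaly(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                   action: str) -> Optional[str]:
    """Return a reason string for suspicious permitted traffic, else None."""
    if action != "permit":
        return None
    if ip_address(src_ip) == WEB_SERVER and src_port not in SERVICE_PORTS:
        # Traffic leaving the server from a non-service (ephemeral) port means
        # the server itself initiated the connection.
        return "outbound connection initiated by web server"
    if ip_address(dst_ip) == WEB_SERVER and dst_port not in SERVICE_PORTS:
        return "connection to unexpected service on web server"
    return None
```

Replies from port 80 back to clients pass, since their source port is the service port.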

In conclusion, these are some ideas to get you started developing correlation rules. Be creative. When building these rules, you will always get a lot of false positives in the beginning. Do not get discouraged. Create your rule, then either replay several weeks’ worth of data through it or let it run and keep an eye on it.

There are many other things to consider when deploying a SIEM. At least a couple of times per week, senior engineers should be perusing the base events to look at the logs that are NOT getting correlated; there could be many things happening that you don’t want to happen but simply don’t have a correlation rule for yet. Importing vulnerability assessment results can really help increase effectiveness and efficiency. Events also need to be monitored to ensure they are being normalized correctly. Perhaps we will dig into some of these issues another time.

Strange Bandwidth Utilization

There are a couple of ways to look at this: potential DDoS detection and potential exfiltration. The most common way to get this data is from switch and router flow records (e.g., NetFlow). Depending on the environment, there may be other options, such as forwarding Arbor Networks or Network Intrusion Prevention events to the SIEM. Either way, these rules can take some time to benchmark and tune, because bandwidth utilization is typically somewhat sporadic.

To detect potential DDoS attacks, a good start is to monitor ingress traffic directed at a handful of critical assets that would stop the organization from functioning if they became inaccessible. The rule would look something like: if the bandwidth directed to my web servers exceeds 40 Mb/s for 10 minutes or more, trigger an alert.
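
A minimal sketch of the "above X Mb/s for 10 minutes or more" check, assuming you have already summed flow bytes toward the protected assets into one-minute buckets (if your flow export is sampled, those counts must first be scaled up by the sampling rate). The bucket size and function names are assumptions for illustration.

```python
from typing import Sequence


def bytes_to_mbps(byte_count: int, interval_s: int = 60) -> float:
    """Convert bytes seen in one interval to megabits per second (decimal units)."""
    return byte_count * 8 / interval_s / 1_000_000


def sustained_over(per_minute_bytes: Sequence[int], threshold_mbps: float,
                   minutes: int) -> bool:
    """True if `minutes` consecutive one-minute buckets all exceed the threshold."""
    run = 0
    for byte_count in per_minute_bytes:
        run = run + 1 if bytes_to_mbps(byte_count) > threshold_mbps else 0
        if run >= minutes:
            return True
    return False
```

With `threshold_mbps=40` and `minutes=10` this expresses the DDoS rule above. Requiring consecutive buckets keeps a single busy minute from firing the alert.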

Exfiltration is the act of pulling data out of the network after it has been compromised. For example, egress bandwidth utilization from a file share server may increase. The rule would look similar to the DDoS rule: if traffic leaving an asset exceeds 3 Mb/s for 10 minutes or more, trigger an alert.
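
It can help to translate these rate thresholds into data volume, for example when explaining them to the owner of the asset. Using decimal units (1 byte = 8 bits) over the 10-minute (600 s) window:

$$
\begin{aligned}
\text{Exfiltration: } & 3\ \text{Mb/s} \times 600\ \text{s} = 1800\ \text{Mb} = 225\ \text{MB} \\
\text{DDoS: } & 40\ \text{Mb/s} \times 600\ \text{s} = 24{,}000\ \text{Mb} = 3000\ \text{MB} = 3\ \text{GB}
\end{aligned}
$$

Since the rules fire only when the rate stays above the threshold for the whole window, these are the minimum volumes that must cross the boundary before either alert triggers.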

The purpose of these rules is to provide some guidance on how to further leverage your SIEM solution. Even if they do not apply to your network specifically, I hope they help you think about custom correlation rules you can create to fit your environment. Feel free to reach out if you want to discuss further. Some of my favorite SIEM systems are ArcSight and Q1 Labs (QRadar).