Mitigating bandwidth-saturation attacks

Status

This article is a reprise of the paper “Identifying Legitimate Clients under Distributed Denial-of-Service Attacks” published in NSS2010 (Network and System Security), September 2010.  It attempts to refine some of the reasoning in the paper, and address some comments given in response to its presentation.

Abstract

This page describes a mechanism to mitigate the effects of a certain class of Distributed Denial of Service (DDoS) attacks, specifically those that aim to consume the bandwidth of network links approaching a target server with packets that form no useful part of the service.

Problem statement

There are many kinds of Distributed Denial of Service (DDoS) attacks, but a particularly difficult one to deal with occurs at the IP level.  The purpose of an IP network is to deliver packets from A to B as best it can, without regard to the utility of those packets as far as B is concerned; if a packet turns out to be problematic, B will deal with it in due course.  Although a single host A1 is able to send only a relatively small volume of such traffic, a large collection of such hosts A1…An can send altogether n separate flows which aggregate together as they approach the target B, such that their volume exceeds the capacity of some of the links approaching B.  This illegitimate aggregate competes proportionately with legitimate traffic where there is insufficient capacity for both, thus causing most legitimate packets to be dropped and reducing the service provided by the target.

To mitigate this problem, network elements upstream of the overwhlemed links must be able to a distinguish legitimate packet from an illegitimate one, before deciding whether to forward it.  The target is surely best placed to determine an individual packet’s legitimacy, as a legitimate packet will form a consistent part of a flow packets, the previous ones of which are already being held and processed by the target.  However, this determination takes place too late to prevent saturation by illegitimate packets.

Can a network element be instrumented to discriminate packets by their utility to the destination with sufficient accuracy?  Packet size seems a good indicator of utility, but only from an operational perspective; 90% link utilization could be seen as an indication of the worth of installing the link, despite most of it being ultimately useless during an attack.  Deep Packet Inspection (DPI) could allow the network element to get at least a transport-level view of traffic, but it is not guaranteed to get a complete view, and might need to let through several potential attack packets before it can determine that they belong to illegitimate flows, or consume resources storing them until the determination is made (which itself risks postponing that determination, if those initial packets must be delivered before any others are seen).

If we see the target as the ultimate agent of packet discrimination, we transform the problem into one of delivering this discrimination upstream to network elements beyond the saturated links, where the discrimination can be applied.  This discrimination could simply be that some source IP addresses are good, and others are bad.  This, of course, has its own problems:

  • An element at the network edge (the target) must persuade an entity deep in the network, probably in a different administrative domain, to filter its incoming traffic on some criteria.  The administrators of these two elements likely have no prior relationship (unless they are immediate neighbours), and so do not trust each other.  The server admin cannot expect the network admin to do something for them, and the network admin cannot trust that they would be doing it for those purported to request it.
  • Source-IP-based filtering criteria need only be propagated towards the sources identified, but routing is based on destination IP.  The route from target server to client might not be the inverse of the route from client to server, i.e. the routes are asymmetric.
  • Discrimination is made harder by the possibility of spoofed source addresses.  The fact that one packet with source A has been observed at the server to be illegitimate does not imply that all source-A packets are bad, as it might not have come from source A at all.  Indeed, if an attacking host has managed to spoof A, it might be able to spoof another address for its next packet, so attempting to block A does not necessarily block the next packet.
  • Network elements are optimized for destination-IP-based discrimination, not source-based.  Filtering on a set of source addresses could excessively consume a network’s resources, it must be scalable with the number of attackers and legitimate clients, and must not expose the core elements to new vectors of attack.

The goal of the mechanism described herein is, in the face of spoofing and large numbers of attacking hosts, to prevent illegitimate packets from reaching the server and a sufficient number of adjacent links such that there is sufficient capacity for legitimate traffic.  Failing that, it should at least restrict illegitimate packets with spoofed source addresses, allowing the target server to make transport-/application-level decisions based on source addresses with confidence.  It is also important that the mechanism should work with clients subject to Network-Address Translation (NAT), whereby legitimate clients could unwittingly share an IP address with attacking hosts.

Mechanism overview

First, the mechanism is not engaged by default.  Under normal conditions, routers make no special discrimination, and clients take no special actions.  When a particular server is attacked, network elements are instructed to take steps to protect it, but have no effect on unrelated traffic.

The approach to discrimination is to use the target server to indicate whether a particular source address is legitimate, having behaved properly so far in interacting with the server.  Network elements are informed of source addresses known to be legitimate, i.e. each keeps a ‘whitelist’.  Clients are motivated to behave well, because this gets them into the whitelist.  Attackers, who cannot behave well by definition, do not get whitelisted.  Additional techniques are employed to allow clients to behave well enough to interact with the server long enough to get whitelisted.

The mechanism operates under the following assumptions:

  • Adjacent ASes are prepared to interact to protect target servers, with special behaviour at the edges hosting targets and/or their clients.  While this might not be the case now, we hope that the quality of a mechanism to not require an AS to trust beyond its immediate neighbours will be appealing to AS operators.
  • Advertised BGP routes have not been compromised.  If they are, You Have Bigger Problems.
  • Routes are relatively stable.  The mechanism should cope with occasional changes, with the cost of temporary disruption.
  • Clients respond to ICMP echo requests with echo replies.  Although this feature is often turned off, a client could make specific exceptions for certain ICMP messages which the client can see are useful to it.

Techniques for discriminating packets

The mechanism employs three techniques to help discriminate good and bad packets.

Computational proof of work

This technique requires that an Issuer, on behalf of a Server (which could be the Issuer), publishes a nonce or salt value to potential Clients.  Clients attempt to reach the Server by communicating through one or more Verifiers, which have also received the nonce.  A Verifier only passes a message on to the Server if it contains a proof.

A Client generates a proof by hashing a candidate proof with details of the communication (e.g. source and destination addresses) and the published nonce.  Verifiers impose a constraint on the proof in that the hash should yield a result with a pattern within it, e.g. a minimum number of trailing zeros.  If the hash is irreversible, the Client can only find a proof by hashing many candidates and checking their results.  Statistically, this will involve a certain amount of work, hence the successful candidate acts as a proof of that work.

A Client must perform many computations to find a proof.  In contrast, a Verifier can check a proof with a single computation.

There may be several degrees of constraint, e.g. varying numbers of trailing zeros, and a Verifier may use this to prioritize messages.  Those with more zeros demonstrate more work, and could be queued ahead of those with fewer, or the threshold below which messages are discarded could be varied to impose greater load on the Clients.

Ping cookies

Ping cookies are essentially flow cookies, i.e. a host includes some opaque data (the cookie) in its packets to another host, which does nothing but echo back the cookie unchanged.  The first host then knows that it is talking to the right peer.

Ping cookies are additionally cryptographically signed with an asymmetric cypher.  This allows entities that observe the cookie in transit to know that it came from a certain originator, provided they have that originator’s public key.

Finally, ping cookies travel as the payload of ICMP Echo Request/Reply messages.  An originator can dispatch a cookie to another host in order to deliver a message to all nodes on the path from the originator to the host, and from the host back to the originator, even if these are different paths.

Bloom filters

A Bloom filter is an approximation of a set.  A basic Bloom filter consists of an sequence of bits, initially zero, and a fixed number of independent hash functions.  To add a value to the set, it is passed through each hash function, yielding the same number of bit positions (i.e. 8 hash functions yields 8 bit positions).  Each bit in the sequence identified by a position is then set.  To test if a value is in the set, it is passed through the hash functions as before, and each identified bit is inspected.  If any are zero, it is certain that the value is not in the set.  If they are all non-zero, there is a predictable chance that the value is in the set, as determined by the number of bits in the sequence, and the number of hash functions.

Therefore, with the benefits of fixed size and O(1) insertion and inspection time, a Bloom filter yields no false negatives, but risks some false positives.

Removal from a simple bit-based Bloom filter is not possible, except to clear the set, as that would risk removing a bit shared by several values.

A counting Bloom filter replaces the bits with multibit counters.  When adding, each identified counter is incremented.  Removal can be done with care, if it is certain that a value is present when removed — however, this cannot be determined by the Bloom filter itself, because of the risk of false positives.

A decaying Bloom filter is a counting Bloom filter.  Instead of removing specific values, the filter is periodically decayed by reducing the value of all counters (or a random subset thereof).  By incrementing linearly, and decrementing exponentially, one can model the presence of a value in the set as a stable ‘cylinder of liquid’.  Regular, periodic commands to include the value cause its level of presence to rise initially, until it reaches a point where the exponential decay matches the linear increase.  If the commands are suddenly stopped, the level will decrease to below a threshold in an amount of time related to the previous commands’ frequency, rather than the duration of the era that the commands where issued.

Additions to the network

This mechanism requires changes to the network.  The new components can remain dormant while no attack is taking place, and the cost of their additional overhead to mitigate a current attack must be less than the cost of the attack itself to the target server.

  1. It is assumed that a co-ordinating entity co-located with the target monitors the target’s traffic and detects network-level DDoS attacks.  This Mitigation Initiator (MI) makes decisions about when to deploy and withdraw the mitigation mechanism.
  2. Each AS runs a Mitigation Co-ordinator (MC), which keeps open a channel of communication with each of its neighbour ASes’ MCs, and with the MIs of each potential target in its own network.  These channels are used to propagate deployment/withdrawal commands across the entire network.
  3. Optionally, each AS implements filtering for any traffic passing through it.  Typically, it would exist on each ingress, but could also be placed on dedicated scrubbing machines, with suspect traffic tunnelled to it.  Filtering is under the control of the local MC.
  4. Clients are able to include computational proofs of work into certain outgoing packets used to establish connections (such as TCP handshake packets).  Failing that, agents on clients’ gateways inject such proofs on their behalf.  In either case, the entity responsible for adding proofs to packets is referred to as a Client Agent (CA), and it interacts with a local MC to receive propagated deployment/withdrawal commands.

Filtering

For each target host T, where an AS implements filtering, and which AS has been informed that T is under attack, the filtering mechanism must maintain two sets:

  1. A set of IP addresses, called the host set.
  2. A set of <IP, port> tuples, called the host-port set.

These are also informally referred to as whitelists.

Where an AS implements filtering at separate locations (e.g. at each ingress), a pair of sets may be maintained at each location, and need not be consistent.

All traffic to T should pass through a filter (in a supporting AS).  By default, the filter restricts such traffic, e.g. by applying a rate limiter.  However, a packet can by-pass this restriction entirely under any of three conditions:

  1. The packet contains a verifiable and unique proof of work.
  2. The packet’s source host is present in the host set.
  3. The packet’s source host and source port as a tuple are present in the host-port set.

Parameters for rate limiting, and verification of proofs of work and ping cookies are set by the MI, and distributed across ASes through the MCs.  These parameters may be cycled periodically to prevent attackers from using previously successful proofs or cracked cyphers.  When parameters are cycled, filters must be prepared to accept previous values for a short time during transition.

Unique proofs

Unique proofs are identified by retaining a set of recent proofs as (for example) a decaying Bloom filter, or as the union of several stages of a simple Bloom filter (periodically discarding the oldest stage and inserting a new empty stage).

Whitelisting and de-listing

An IP address, or a host-port tuple, can be whitelisted by adding it to the host set or host-port set as appropriate.  The filter must only do this if it observes an ICMP Echo Reply containing a verifiable ping cookie for the host or host-port, with the host as source address.  Unless refreshed periodically, each entry in the sets must be removed.

The sets can be implemented as decaying Bloom filters.  When the verified Echo Reply is observed, the key (host or host-port) is passed through each hash function to identify a counter.  Each counter is then incremented by a fixed amount.  Periodically, each counter decays exponentially, e.g. by shifting right one bit.  Notwithstanding false positives, this guarantees that a particular key will disappear entirely within a time bounded by the rate at which refreshment took place, rather than by the arbitrary duration that the key was repeatedly added.

Propagation of mitigation commands

A Mitigation Initiator (MI) issues commands to its neighbour MCs about target T that is currently under attack.  A command includes filtering parameters for verifying ping cookies, for generating and verifying proof of work, and for configuring rate limiting.  It is also timestamped.

On receiving a command, an MC must determine whether it is valid.  It is valid if it comes from a neighbour to which it relays packets destined to T.  This determination could be temporary as routes change, so it should not discard the command, but hold it until it becomes valid, times out, or is overridden by a later command from the same neighbour.

An MC should re-issue a valid command to MCs (and ultimately to CAs) in other neighbour ASes.  These should not be seen as the same command being relayed, but a new command conveying the same requirement to protect T.  If the AS has several egresses to the same destination T, it should merge any valid commands to form the new command; for example, rate limits could be summed.  When re-issuing to several MCs, the MC could divide the rate limit among them according to local policies.  In contrast, proof-of-work and ping-cookie parameters must always be the latest issued.

Analysis

Client-target interactions

Here are classes of hosts interacting with the target server during an attack.  We show that legitimate clients should be able to talk to the server with little hinderance, while attackers either don’t get through, or must reveal their true IP addresses and expose themselves to transport- and application-level defences.

*define well-behaved, i.e. conforming to TCP flow control

Legitimate clients (not behind NAT)

The client’s CA has already received proof-of-work parameters for T, and continues to receive updates for them.  For the latest parameters, it intercepts the SYN and ACK packets from the client, generates and injects proof-of-work, and relays the packets.

Because of the proofs, such packets reach T with relatively little hinderance.  On receiving the ACK, T adds the client’s IP to a set of addresses for which it is issuing periodic ping cookies.

Subsequent packets from the client do not carry proofs, so must compete with attack traffic.  However, a ping cookie eventually comes through, and is echoed back ahead of the clients other packets.  As it passes through filters, they recognize the cookie, verify it, and add the client’s IP to their host sets.  Subsequent traffic from the host passes through unhindered.

If the client starts a new connection to T, and its address is recently refreshed in the whitelists, it can pass through unhindered, whether proofs are added to the handshake packets or not.

If a client is already talking to T when the attack starts, T can start issuing ping cookies for it straight away.  Its packets will shortly get whitelisted, and re-affirm to T that it is a well-behaved client, and deserves refreshing of its whitelist status.  This happens even if the client has no CA.

Spoofing attackers

The attacking host sends packets belonging to no particular connection, and with source addresses which the host cannot receive on.

If the host adds unique proofs of work to its packets, they get through the filters unhindered, but it can only send at the rate it can generate unique proofs.  If it uses the same proofs with several packets, only the first gets through, until that proof expires.  If it cycles through several proofs, these will become invalid after the MI issues new proof parameters.

Also, because the attacker is spoofing, it cannot form connections on any of the spoofed addresses.  Therefore, it cannot get T to issue ping cookies for it, and it cannot get any of the addresses it uses whitelisted.

Non-spoofing attackers not forming connections

The attacking host sends packets belonging to no particular connection, but using the address it normally receives on.

If the host adds unique proofs of work to its packets, they get through the filters unhindered, but it can only send at the rate it can generate unique proofs.  If it uses the same proofs with several packets, only the first gets through, until that proof expires.  If it cycles through several proofs, these will become invalid after the MI issues new proof parameters.

Also, although the attacker is not spoofing, it is not behaving well enough to form connections on its address.  Therefore, it cannot get T to issue ping cookies for it, and it cannot get any of the addresses it uses whitelisted.

Non-spoofing attackers forming connections

The attacker correctly tries to form connections with the server.

If it does not add proofs to its handshake packets, it will have difficulty forming connections.  As long as it does not form connections, it cannot get whitelisted.

Until the host forms a connection, if it adds unique proofs of work to all its packets, they will get through the filters unhindered, but it can only send at the rate it can generate unique proofs.  If it uses the same proofs with several packets, only the first gets through, until that proof expires.  If it cycles through several proofs, these will become invalid after the MI issues new proof parameters.

After it forms a connection and gets itself whitelisted, it can send packets that reach T using the same source address at any rate.  However,since these have a greater chance of reaching the server, the server can observe them and stop whitelisting the attacker’s IP address.

If the attacker obeys the flow-control rules of the connection, it cannot attack with bandwidth, and can only attack at upper layers.

Mixed spoofing and non-spoofing attackers

This may occur:

  • when an attacker establishes a well-behaved connection, and simultaneously sends spoofed-source packets; or
  • when an attacker spoofs a legitimate client; or
  • when an attacker and a legitimate client are behind the same NAT.

In the first case, the well-behaved connection cannot be used for bandwidth saturation, and the spoofed packets are not using whitelisted addresses, so they don’t do any damage.  It is practically the same as two independent cases, one well-behaved, the other attacking, each handled independently as described in previous sections.

In the second case, T may initially see many of the attack packets, because it has whitelisted the legitimate client.  However, as a result of seeing this bad behaviour, it now stops refreshing the whitelisting of the client’s host address, but can start whitelisting its full host-port tuple.  This has the disadvantage that, once the client’s host whitelisting has decayed, the client will have to add proofs to the handshake packets of all its connections.  However, the attacker’s packets no longer get through unless it can guess which ports the client is using.

The third case is almost identical to the second, although it is the NAT that makes the attacker appear to spoof the legitimate client.

Incomplete deployment

It is sufficient that deployment covers all potential legitimate clients, and all ASes between those clients and the target.  In practice, this probably means universal deployment, though the mechanism might yield partial success in blocking illegitimate traffic under partial deployment.  In particular, deployment is not necessary along paths where there is only illegitimate traffic.

Indeed, clients are encouraged to take part, because it promises better service, compared to non-participating clients, when that service is under attack.

Why it should work

Essentially, an attacker is forced to do something not in its interests (use its real IP address), or subject its packets to aggressive filtering.  Getting whitelisted requires forming connections, and that in turn require not spoofing.  Staying whitelisted requires behaving properly, at least upto the transport level, because the server can veto an IP address at this level simply by closing any existing connections from it and refusing new connections. Attempting to attack even higher up risks detection and blocking by the attacker’s (now revealed) IP address.

Trying to by-pass filtering by using proof-of-work requires the attacker to slow down, or to send packets more frequently but with inadequate proofs.

We can consider three overlapping sets of peers interacting with the target server T:

  1. those who are not spoofing,
  2. those who are participating in ping cookies,
  3. those who are attacking.

Consider those who are participating in cookies, but are also spoofing.  Since they are spoofing, they don’t receive and echo back the cookies for the addresses they are spoofing.  The return path is therefore not necessarily the right path for the attacker to use to send further packets with the same spoofed address.  Effectively, it is not participating in cookies, so this area is (in most cases) empty.

We now label the five remaining areas, A-E.  A and B are non-spoofing legitimate clients, but B is participating in cookies.  C and D are also not spoofing, but intend to do damage; D is participating in cookies.  Finally, E is attacking and spoofing; it can’t be participating in cookies.

Ideally, we would want a method of discrimination along the edge of the third set (separating our good clients A and B from the others), but we can’t achieve that directly, because this discrimination must be performed within the network.

Instead, we can discriminate on the border of the second set, i.e. between those who are participating in ping cookies (B and D), and those who are not (A, C, E).  Now consider that anyone in the participating sets (B and D) can be excluded from it under the control of the target, and forced into A and C.  This effectively separates B and D (i.e. along our ideal discriminant), as the server can identify bad clients in D, and re-classify them as C.

Outstanding issues

What is the impact of partial deployment?  How many ASes must implement the mechanism?  Does every AS need to implement filtering, or just be prepared to relay filtering to its neighbours?  What is the prospect of wide deployment of CAs?

Can filters be overloaded by bogus ping cookies and proofs of work?

Is there a cypher which is either extremely difficult to break, or can be very easily recycled (i.e. key pairs can be discarded and re-generated with low cost), yet which can be verified cheaply?