Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds

Srikanth Kandula

  Dina Katabi

       Matthias Jacob

Arthur Berger

         MIT        Princeton MIT/Akamai

{kandula,dina}@csail.mit.edu         mjacob@princeton.edu awberger@mit.edu

Abstract- Recent denial of service attacks are mounted by professionals using Botnets of tens of thousands of compromised machines. To circumvent detection, attackers are increasingly moving away from bandwidth floods to attacks that mimic the Web browsing behavior of a large number of clients, and target expensive higher-layer resources such as CPU, database and disk bandwidth. The resulting attacks are hard to defend against using standard techniques, as the malicious requests differ from the legitimate ones in intent but not in content.

We present the design and implementation of Kill-Bots, a kernel extension to protect Web servers against DDoS attacks that masquerade as flash crowds. Kill-Bots provides authentication using graphical tests but is different from other systems that use graphical tests. First, Kill-Bots uses an intermediate stage to identify the IP addresses that ignore the test, and persistently bombard the server with requests despite repeated failures at solving the tests. These machines are bots because their intent is to congest the server. Once these machines are identified, Kill-Bots blocks their requests, turns the graphical tests off, and allows access to legitimate users who are unable or unwilling to solve graphical tests. Second, Kill-Bots sends a test and checks the client's answer without allowing unauthenticated clients access to sockets, TCBs, and worker processes. Thus, it protects the authentication mechanism from being DDoSed. Third, Kill-Bots combines authentication with admission control. As a result, it improves performance, regardless of whether the server overload is caused by DDoS or a true Flash Crowd.

1 Introduction

Denial of service attacks are increasingly mounted by professionals in exchange for money or material benefits (35). Botnets of thousands of compromised machines are rented by the hour on IRC and used to DDoS online businesses to extort money or obtain commercial advantages (26,17,45). The DDoS business is thriving; increasingly aggressive worms can infect up to 30,000 new machines per day. These zombies/bots are then used for DDoS and other attacks (17,43). In particular, (35) reports that a Massachusetts businessman "paid members of the computer underground to launch organized, crippling DDoS attacks against three of his competitors." The attackers used Botnets of more than 10,000 machines. When the simple SYN flood failed, they launched an HTTP flood, downloading many large image files from the victim server. "At its peak, the onslaught allegedly kept the victim company offline for two weeks." In another instance, attackers ran a massive number of queries through the victim's search engine, bringing the server down (35).

To circumvent detection, attackers are increasingly moving away from pure bandwidth floods to stealthy DDoS attacks that masquerade as flash crowds. They profile the victim server and mimic legitimate Web browsing behavior of a large number of clients. These attacks target higher layer server resources like sockets, disk bandwidth, database bandwidth and worker processes (13,35,24). We call such DDoS attacks CyberSlam, after the first FBI case involving DDoS-for-hire (35). The MyDoom worm (13), many DDoS extortion attacks (24), and recent DDoS-for-hire attacks are all instances of CyberSlam (12,35,24).

Countering CyberSlam is a challenge because the requests originating from the zombies are indistinguishable from the requests generated by legitimate users. The malicious requests differ from the legitimate ones in intent but not in content. The malicious requests arrive from a large number of geographically distributed machines; thus they cannot be filtered on the IP prefix. Also, many sites do not use passwords or login information, and even when they do, passwords could be easily stolen from the hard disk of a compromised machine. Further, checking the site-specific password requires establishing a connection and allowing unauthenticated clients to access socket buffers, TCBs, and worker processes, making it easy to mount an attack on the authentication mechanism itself. Defending against CyberSlam using computational puzzles, which require the client to perform heavy computation before accessing the site, is not effective because computing power is usually abundant in a Botnet. Finally, in contrast to bandwidth attacks (40,27), it is difficult to detect big resource consumers when the attack targets higher-layer bottlenecks such as CPU, database, and disk because commodity operating systems do not support fine-grained resource monitoring (15,48). Further, an attacker can resort to mutating attacks which cycle between different bottlenecks (25).

This paper proposes Kill-Bots, a kernel extension that protects Web servers against CyberSlam attacks. It is targeted towards small or medium online businesses as well as non-commercial Web sites. Kill-Bots combines two functions: authentication and admission control.

(a) Authentication: The authentication mechanism is activated when the server is overloaded. It has 2 stages:

In , Kill-Bots requires each new session to solve a reverse Turing test to obtain access to the server. Humans can easily solve a reverse Turing test, but zombies cannot. We focus on graphical tests (47), though Kill-Bots works with other types of reverse Turing tests. Legitimate clients either solve the test, or try to reload a few times and, if they still cannot access the server, decide to come back later. In contrast, the zombies which want to congest the server continue sending new requests without solving the test. Kill-Bots uses this difference in behavior between legitimate users and zombies to identify the IP addresses that belong to zombies and drop their requests. Kill-Bots uses SYN cookies to prevent spoofing of IP addresses and a Bloom filter to count how often an IP address failed to solve a test. It discards requests from a client if the number of its unsolved tests exceeds a given threshold (e.g., 32).
Kill-Bots switches to after the set of detected zombie IP addresses stabilizes (i.e., the filter does not learn any new bad IP addresses). In this stage, tests are no longer served. Instead, Kill-Bots relies solely on the Bloom filter to drop requests from malicious clients. This allows legitimate users who cannot, or do not want to solve graphical puzzles access to the server despite the ongoing attack.

(b) Admission Control: Kill-Bots combines authentication with admission control. A Web site that performs authentication to protect itself from DDoS encounters a general problem: It has a certain pool of resources, which it needs to divide between authenticating new arrivals and servicing clients that are already authenticated. Devoting excess resources to authentication might leave the server unable to fully serve the authenticated clients, and hence, wastes server resources on authenticating new clients that it cannot serve. On the other hand, devoting excess resources to serving authenticated clients reduces the rate at which new clients are authenticated and admitted, leading to idle periods with no clients in service.

Kill-Bots computes the admission probability $\alpha$ that maximizes the server's goodput (i.e., the optimal probability with which new clients should be authenticated). It also provides a controller that allows the server to converge to the desired admission probability using simple measurements of the server's utilization. Admission control is a standard mechanism for combating server overload (18,46,48), but Kill-Bots examines admission control in the context of malicious clients and connects it with client authentication.

**Figure 1:** **Kill-Bots Overview. Note that graphical puzzles are only served during .**
$\begin{figure}\centering \includegraphics*[width=3in]{figures/intro-flowchart.eps} \vskip -.05in\vskip -.15in \end{figure}$

Fig. 1 summarizes Kill-Bots. When a new connection arrives, it is first checked against the list of detected zombie addresses. If the IP address is not recognized as a zombie, Kill-Bots admits the connection with probability $\alpha=f(load)$ . In , admitted connections are served a graphical puzzle. If the client solves the puzzle, it is given a Kill-Bots HTTP cookie which allows its future connections, for a short period, to access the server without being subject to admission control and without having to solve new puzzles. In , Kill-Bots no longer issues puzzles; admitted connections are immediately given a Kill-Bots HTTP cookie.

Kill-Bots has a few important characteristics.

Kill-Bots addresses graphical tests' bias against users who are unable or unwilling to solve them. Prior work that employs graphical tests ignores the resulting user inconvenience as well as their bias against blind and inexperienced humans (32). Kill-Bots is the first system to employ graphical tests to distinguish humans from automated zombies, while limiting their negative impact on legitimate users who cannot or do not want to solve them.
Kill-Bots sends a puzzle without giving access to TCBs or socket buffers. Typically, sending the client a puzzle requires establishing a connection and allowing unauthenticated clients to access socket buffers, TCB's, and worker processes, making it easy to DoS the authentication mechanism itself. Ideally, a DDoS protection mechanism minimizes the resources consumed by unauthenticated clients. Kill-Bots introduces a modification to the server's TCP stack that can send a 1-2 packet puzzle at the end of the TCP handshake without maintaining any connection state, and while preserving TCP congestion control semantics.
Kill-Bots improves performance, regardless of whether server overload is caused by DDoS attacks or true Flash Crowds, making it the first system to address both DDoS and Flash Crowds within a single framework. This is an important side effect of using admission control, which allows the server to admit new connections only if it can serve them.
In addition, Kill-Bots requires no modifications to client software, is transparent to Web caches, and is robust to a wide variety of attacks (see §4).

We implement Kill-Bots in the Linux kernel and evaluate it in the wide-area network using PlanetLab. Additionally, we conduct an experiment on human users to quantify user willingness to solve graphical puzzles to access a Web server. On a standard 2GHz Pentium IV machine with 1GB of memory and 512kB L2 cache running a mathopd (9) web-server on top of a modified Linux 2.4.10 kernel, Kill-Bots serves graphical tests in 31 $\mu$ s; identifies malicious clients using the Bloom filter in less than 1 $\mu$ s; and can survive DDoS attacks of up to 6000 HTTP requests per second without affecting response times. Compared to a server that does not use Kill-Bots, our system survives attack rates 2 orders of magnitude higher, while maintaining response times around their values with no attack. Furthermore, in our Flash Crowds experiments, Kill-Bots delivers almost twice as much goodput as the baseline server and improves response times by 2 orders of magnitude. These results are for an event driven OS that relies on interrupts. The per-packet cost of taking an interrupt is fairly large $\approx 10\mu$ s (23). We expect better performance with polling drivers (30).

2 Threat Model

Kill-Bots aims to improve server performance under CyberSlam attacks, which mimic legitimate Web browsing behavior and consume higher layer server resources such as CPU, memory, database and disk bandwidth. Prior work proposes various filters for bandwidth floods (7,27,16,21); Kill-Bots does not address these attacks. Attacks on the server's DNS entry or on the routing entries are also outside the scope of this paper.

We assume the attacker may control an arbitrary number of machines that can be widely distributed across the Internet. The attacker may also have arbitrarily large CPU and memory resources. An attacker cannot sniff packets on the server's local network or on a major link that carries traffic for a large number of legitimate users. Further, the attacker does not have physical access to the server itself. Finally, the zombies cannot solve the graphical test and the attacker is not able to concentrate a large number of humans to continuously solve puzzles.

3 The Design of Kill-Bots

Kill-Bots is a kernel extension to Web servers. It combines authentication with admission control.

3.1 Authentication

During periods of severe overload, Kill-Bots authenticates clients before granting them service. The authentication has two stages that use different authentication mechanisms. Below, we explain in detail.

**Figure 2:** **A Kill-Bots server transitions between NORMAL and SUSPECTED_ATTACK modes based on server load.**
$\begin{figure}\centering \includegraphics*[width=3in]{figures/states.eps} \vskip -.05in\vskip -.0in \end{figure}$

3.1.1 Activating the Authentication Mechanism

A Kill-Bots Web-server is in either of two modes, NORMAL or SUSPECTED_ATTACK, as shown in Fig. 2. When the Web server perceives resource depletion beyond an acceptable limit, $\kappa_1$ , it shifts to the SUSPECTED_ATTACK mode. In this mode, every new connection has to solve a graphical test before allocation of any state on the server takes place. When the user correctly solves the test, the server grants the client access to the server for the duration of an HTTP session. Connections that began before the server switched to the SUSPECTED_ATTACK mode continue to be served normally until they terminate. However, the server will timeout these connections if they last longer than a certain duration (our implementation uses 5 minutes). The server continues to operate in the SUSPECTED_ATTACK mode until the load goes down to its normal range and crosses a particular threshold $\kappa_2 < \kappa_1$ . The load is estimated using an exponential weighted average. The values of $\kappa_1$ and $\kappa_2$ will vary depending on the normal server load. For example, if the server is provisioned to work with 40% utilization, then one may choose $\kappa_1=70\%$ and $\kappa_2=50\%$ .

A couple of points are worth noting. First, the server behavior is unchanged in the NORMAL mode, and thus the system has no overhead in the common case of no attack. Second, an attack that forces Kill-Bots to switch back-and-forth between the two modes is harmless because the cost for switching is minimal. The only potential switching cost is the need to timeout very long connections that started in the NORMAL mode. Long connections that started in a prior SUSPECTED_ATTACK mode need not be timed out because their users have already been authenticated.

**Figure 4:** **Screenshot of a graphical puzzle.**
$\begin{figure}\begin{center} \includegraphics*[width=3in]{figures/abc.eps} \vskip -.1in\vskip -.15in \end{center}\end{figure}$

3.1.2 Stage 1: CAPTCHA-Based Authentication

After switching to the SUSPECTED_ATTACK mode, the server enters

, in which it authenticates clients using graphical tests, i.e., CAPTCHAs (47), as in Fig. 4.

(a) Modifications to Server's TCP Stack: Upon the arrival of a new HTTP request, Kill-Bots sends a graphical test and validates the corresponding answer without allocating any TCBs, socket buffers, or worker processes on the server. We achieve this by a minor modification to the server TCP stack. As shown in Fig. 3, similarly to a typical TCP connection, a Kill-Bots server responds to a SYN packet with a SYN cookie. The client receives the SYN cookie, increases its congestion window to two packets, transmits a SYNACKACK and the first data packet that usually contains the HTTP request. In contrast to a typical connection, the Kill-Bots kernel does not create a new socket upon completion of the TCP handshake. Instead, the SYNACKACK is discarded because the first data packet from the client repeats the same acknowledgment sequence number as the SYNACKACK.

**Figure 3:** **Kill-Bots modifies server's TCP stack to send tests to new clients without allocating a socket or other connection resources.**
$\begin{figure}\centering \includegraphics*[width=2.8in]{figures/HG_connect.eps} \vskip -0.05in\vskip -.05in \end{figure}$

When the server receives the client's data packet, it checks whether it is a puzzle answer. (An answer has an HTTP request of the form GET /validate?answer=ANSWER, where is the puzzle id.) If the packet is not an answer, the server replies with a new graphical test, embedded in an HTML form (Fig. 5). Our implementation uses CAPTCHA images that fit in 1-2 packets. Then, the server immediately closes the connection by sending a FIN packet and does not wait for the FIN ack. On the other hand, the client packet could be a puzzle answer. When a human answers the graphical test, the HTML form (Fig. 5) generates an HTTP request GET /validate?answer=ANSWER that reports the answer to the server. If the packet is an answer, the kernel checks the cryptographic validity of the ANSWER (see (c) below). If the check succeeds, a socket is established and the request is delivered to the application.

**Figure 5:** **HTML source for the puzzle**
$\begin{figure}\begin{center} \includegraphics*[width=2.8in]{figures/form.eps} \vskip -.05in\end{center}\vskip -0.15in \end{figure}$

Note the above scheme preserves TCP congestion control semantics, does not require modifying the client software, and prevents attacks that hog TCBs and sockets by establishing connections that exchange no data.

(b) One Test Per Session: It would be inconvenient if legitimate users had to solve a puzzle for every HTTP request or every TCP connection. The Kill-Bots server gives an HTTP cookie to a user who solves the test correctly. This cookie allows the user to re-enter the system for a specific period of time, (in our implementation, min). If a new HTTP request is accompanied by a cryptographically valid HTTP cookie, the Kill-Bots server creates a socket and hands the request to the application without serving a new graphical test.

**Figure 6:** **Kill-Bots Token**
$\begin{figure}\begin{center} \includegraphics*[width=3in]{figures/pid.eps} \vskip -.05in\end{center}\vskip -0.25in \end{figure}$

(c) Cryptographic Support: When the Kill-Bots server issues a puzzle, it creates a Token as shown in Fig. 6. The token consists of a 32-bit puzzle ID , a 96-bit random number , the 32-bit creation time of the token, and a 32-bit collision-resistant hash of , , and along with the server secret. The token is embedded in the same HTML form as the puzzle (Fig. 6) and sent to the client.

When a user solves the puzzle, the browser reports the answer to the server along with the Kill-Bots token. The server first verifies the token by recomputing the hash. Second, the server checks the Kill-Bots token to ensure the token was created no longer than 4 minutes ago. Next, the server checks if the answer to the puzzle is correct. If all checks are successful, the server creates a Kill-Bots HTTP cookie and gives it to the user. The cookie is created from the token by updating the token creation time and recording the token in the table of valid Kill-Bots cookies. Subsequently, when a user issues a new TCP connection with an existing Kill-Bots cookie, the server validates the cookie by recomputing the hash and ensuring that the cookie has not expired, i.e., no more than 30 minutes have passed since cookie creation. The Kill-Bots server uses the cookie table to keep track of the number of simultaneous HTTP requests that belong to each cookie.

(d) Protecting Against Copy Attacks: What if the attacker solves a single graphical test and distributes the HTTP cookie to a large number of bots? Kill-Bots introduces a notion of per-cookie fairness to address this issue. Each correctly answered graphical test allows the client to execute a maximum of 8 simultaneous HTTP requests. Distributing the cookie to multiple zombies makes them compete among themselves for these 8 connections. Most legitimate Web browsers open no more than 8 simultaneous connections to a single server (20).

3.1.3 Stage 2: Authenticating Users Who Do Not Answer CAPTCHAs

An authentication mechanism that relies solely on CAPTCHAs has two disadvantages. First, the attacker can force the server to continuously send graphical tests, imposing an unnecessary overhead on the server. Second, and more important, humans who are unable or unwilling to solve CAPTCHAs may be denied service.

To deal with this issue, Kill-Bots distinguishes legitimate users from zombies by their reaction to the graphical test rather than their ability to solve it. Once the zombies are identified, they are blocked from using the server. When presented with a graphical test, legitimate users may react as follows: (1) they solve the test, immediately or after a few reloads; (2) they do not solve the test and give up on accessing the server for some period, which might happen immediately after receiving the test or after a few attempts to reload. The zombies have two options; (1) either imitate human users who cannot solve the test and leave the system after a few trials, in which case the attack has been subverted, or (2) keep sending requests though they cannot solve the test. However, by continuing to send requests without solving the test, the zombies become distinguishable from legitimate users, both human and machine.

In , Kill-Bots tracks how often a particular IP address has failed to solve a puzzle. It maintains a Bloom filter (10) whose entries are 8-bit counters. Whenever a client is given a graphical puzzle, its IP address is hashed and the corresponding entries in the Bloom filter are incremented. In contrast, whenever a client comes back with a correct answer, the corresponding entries in the Bloom filter are decremented. Once all the counters corresponding to an IP address reach a particular threshold $\xi$ (in our implementation $\xi$ =32), the server drops all packets from that IP and gives no further tests to that client.

When the attack starts, the Bloom filter has no impact and users are authenticated using graphical puzzles. Yet, as the zombies receive more puzzles and do not answer them, their counters pile up. Once a client has $\xi$ unanswered puzzles, it will be blocked. As more zombies get blocked, the server's load will decrease and approach its normal level. Once this happens the server no longer issues puzzles; instead it relies solely on the Bloom filter to block requests from the zombie clients. We call this mode . Sometimes the attack rate is so high that even though the Bloom filter catches all attack packets, the overhead of receiving the packets by the device driver dominates. If the server notices that both the load is stable and the Bloom filter is not catching any new zombie IPs, then the server concludes that the Bloom filter has caught all attack IP addresses and switches off issuing puzzles, i.e., the server switches to . If subsequently the load increases, then the server resumes issuing puzzles.

In our experiments, the Bloom filter detects and blocks all offending clients within a few minutes. In general, the higher the attack rate, the faster the Bloom filter will detect the zombies and block their requests. A full description of the Bloom filter is in §5. We detail Kill-Bots interaction with Web proxies/NATs in §8.

3.2 Admission Control

A Web site that performs authentication to protect itself from DDoS has to divide its resources between authenticating new clients and servicing those already authenticated. Devoting excess resources to authentication might leave the server unable to fully service the authenticated clients; thereby wasting the resources on authenticating new clients that it cannot serve. On the other hand, devoting excess resources to serving authenticated clients may cause the server to go idle because it hasn't authenticated enough new clients. Thus, there is an optimal authentication probability, $\alpha^*$ , that maximizes the server's goodput.

In (39), we have modeled a server that implements an authentication procedure in the interrupt handler. This is a standard location for packet filters and kernel firewalls (38,29,3). It allows dropping unwanted packets as early as possible. Our model is fairly general and independent of how the authentication is performed. The server may be checking client certificates, verifying their passwords, or asking them to solve a puzzle. Furthermore, we make no assumptions about the distribution or independence of the inter-arrival times of legitimate sessions, or of attacker requests, or of service times.

The model in (39) computes the optimal probability with which new clients should be authenticated. Below, we summarize these results and discuss their implications. Table 1 describes our variables.

Table 1: Variables used in the analysis

Var	Description
$\alpha$	Admission Prob. Drop probability= $1-\alpha$ .
$\lambda_a$	Arrival rate of attacking HTTP requests
$\lambda_s$	Arrival rate of legitimate HTTP sessions
$\frac{1}{\mu_p}$	Mean time to serve a puzzle
$\frac{1}{\mu_h}$	Mean time to serve an HTTP request
$\rho_p$	Fraction of server time spent in authenticating clients
$\rho_h$	Fraction of server time spent in serving authenticated clients
$\rho_i$	Fraction of time the server is idle
$\frac{1}{q}$	Mean # of requests per legitimate session

When a request from an unauthenticated client arrives, the server attempts to authenticate it with probability $\alpha$ and drop it with probability $1-\alpha$ . The optimal value of $\alpha$ -i.e., the value that maximizes the server's goodput (the CPU time spent on serving HTTP requests) is:

$\begin{displaymath} {\small \alpha^* = \min \left(\frac {q \mu_p}{(B+q)\lambda_s... ...ambda_a},1 \right), \mbox { and } B=\frac {\mu_p} {\mu_h}, } \end{displaymath}$

(1)

where $\lambda_a$ is the attack request rate, $\lambda_s$ is the legitimate users' session rate, $\frac{1}{\mu_p}$ is the average time taken to serve a puzzle, $\frac{1}{\mu_h}$ is the average time to serve an HTTP request, and $\frac{1}{q}$ is the average number of requests in a session. This yields an optimal server goodput, which is given by:

$\begin{displaymath} {\small \rho_g^* = \min\left(\frac{\lambda_s}{q\mu_h}, \frac... ...} {(1+\frac{q}{B})\lambda_s + q \frac{\lambda_a}{B}}\right). } \end{displaymath}$

(2)

In comparison, a server that does not use authentication has goodput:

$\begin{displaymath} {\small \rho^b_g = \min\left(\frac{\lambda_s}{q\mu_h}, \frac{\lambda_s} {\lambda_s +q\lambda_a}\right). } \end{displaymath}$

(3)

To combat DDoS, authentication should consume fewer resources than service, i.e., $\mu_p » \mu_h$ . Hence,

, and the server with authentication can survive attack rates that are

times larger without loss in goodput.

Also, compare the optimal goodput, $\rho^*_g$ , with the goodput of a server that implements authentication without admission control (i.e., $\alpha=1$ ) given by:

$\begin{displaymath} {\small \rho^a_g = \min \left(\frac{\lambda_s}{q\mu_h}, \ma... ...t( 0, 1 - \frac{\lambda_a+\lambda_s} {\mu_p} \right)\right). } \end{displaymath}$

(4)

For attack rates, $\lambda_a > \mu_p$ , the goodput of the server with no admission goes to zero, whereas the goodput of the server that uses admission control decreases gracefully.

**Figure 7:** Comparison of the goodput of a base/unmodified server with a server that uses authentication only (TOP) and a server that uses both authentication & admission control (BOTTOM). Server load due to legitimate requests is 50%. The graphs show that authentication improves goodput, is even better with admission control, particularly at high attack rates.
$\begin{figure}\centering \epsfig{file=figures/analysis3.eps,width=3in} \vskip -.05in\vskip -.15in \end{figure}$

Fig. 7 illustrates the above results: A Pentium-IV, 2.0GHz 1GB RAM, machine serves 2-pkt puzzles at a peak rate of 6000/sec ( $\mu_p = 6000$ ). Assume, conservatively, that each HTTP request fetches 15KB files ( $\mu_h = 1000$ ), that a user makes requests in a session () and that the normal server load is 50%. By substituting in Eqs. 3, 4, and 2, Fig. 7 compares the goodput of a server that does not use authentication ( base server) with the goodput of a server with authentication only ( $\alpha=1$ ), and a server with both authentication and admission control ( $\alpha=\alpha^*$ ). The top graph shows that authentication improves server goodput. The bottom graph shows the additional improvement from admission control.

**Figure 8:** **Phase plot showing how Kill-Bots adapts the admission probability to operate at a high goodput**
$\begin{figure}\centering \includegraphics*[width=2.5in]{figures/controller.eps} \vskip -.05in\vskip -.1in \end{figure}$

3.3 Adaptive Admission Control

How to make the server function at the optimal admission probability? Computing $\alpha^*$ from Eq. 1 requires values for parameters that are typically unknown at the server and change over time, such as the attack rate, $\lambda_a$ , the legitimate session rate, $\lambda_s$ , and the number of requests per session, $\frac{1}{q}$ .

To deal with the above difficulty, Kill-Bots uses an adaptive scheme. Based on simple measurements of the server's idle cycles, Kill-Bots adapts the authentication probability $\alpha$ to gradually approach $\alpha^*$ . Let $\rho_i , \rho_p, \rho_h$ denote the fraction of time the server is idle, serving puzzles and serving HTTP requests respectively. We have:

$\begin{displaymath} \rho_h+\rho_p+\rho_i=1. \end{displaymath}$

(5)

If the current authentication probability $\alpha < \alpha^*$ , the authenticated clients are too few and the server will spend a fraction of its time idle, i.e., $\rho_i > 0$ . In contrast, if $\alpha > \alpha^*$ , the server authenticates more clients than it can serve and $\rho_i = 0$ . The optimal probability $\alpha^*$ occurs when the idle time transitions to zero. Thus, the controller should increase $\alpha$ when the server experiences a substantial idle time and decrease $\alpha$ otherwise.

However, the above adaptation rule is not as simple as it sounds. We use Fig. 8 to show the relation between the fraction of time spent on authenticating clients $\rho_p$ and that spent serving HTTP requests $\rho_h$ . The line labeled ``Zero Idle Cycles'' refers to the states in which the system is highly congested $\rho_i =0 \rightarrow \rho_p +\rho_h =1$ . The line labeled ``Underutilized'' refers to the case in which the system has some idle cycles, i.e., $\alpha < \alpha^*$ . In this case, a fraction $\alpha$ of all arrivals are served puzzles. The average time to serve a puzzle is $\frac{1}{\mu_p}$ . Thus, the fraction of time the server is serving puzzles $\rho_p = \alpha \frac{\lambda_s+\lambda_a}{\mu_p}.$ Further, an $\alpha$ fraction of legitimate sessions have their HTTP requests served. Thus, the fraction of time the server serves HTTP is $\rho_h = \alpha \frac{\lambda_s}{ q \mu_h},$ where $\frac{1}{\mu_h}$ is the per-request average service time, and $\frac{1}{q}$ is the average number of requests in a session. Consequently,

$\begin{displaymath}\forall \alpha < \alpha^* : \rho_h = \left(\frac{\lambda_s}{\lambda_s+\lambda_a} \frac{\mu_p}{q\mu_h}\right) \rho_p, \end{displaymath}$

which is the line labeled ``Underutilized'' in Fig. 8. As the fraction of time the system is idle $\rho_i$ changes, the system state moves along the solid line segments A $\rightarrow$ B $\rightarrow$ C. Ideally, one would like to operate the system at point B which maximizes the system's goodput, $\rho_g=\rho_h$ , and corresponds to $\alpha=\alpha^*$ . However, it is difficult to operate at point B because the system cannot tell whether it is at B or not; all points on the segment B-C exhibit $\rho_i = 0$ . It is easier to stabilize the system at point E where the system is slightly underutilized because small deviations from E exhibit a change in the value of $\rho_i$ , which we can measure. We pick E such that the fraction of idle time at E is $\beta = \frac{1}{8}$ .

Next, we would like to decide how aggressively to adapt $\alpha$ . Substituting the values of $\rho_p$ and $\rho_h$ from the previous paragraph in Eq. 5 yields:

$\begin{displaymath} \forall \alpha < \alpha^*: \alpha \left(\frac{\lambda_a + \lambda_s}{\mu_p} + \frac{\lambda_s}{q\mu_h}\right) = 1 - \rho_i. \end{displaymath}$

Hence, $\forall \alpha[t], \alpha[t+\tau] <\alpha^*$ :

$\begin{displaymath} \frac{\alpha[t+\tau]}{\alpha[t]}=\frac{1-\rho_i[t+\tau]}{1-\... ...ac{\Delta\alpha}{\alpha[t]}=\frac{\Delta \rho_i}{1-\rho_i[t]}, \end{displaymath}$

where $\alpha[t]$ and $\alpha[t+\tau]$ correspond to the values at time

and $\tau$ seconds later. Thus, every $\tau$ =10s, we adapt the admission probability according to the following rules:

$\begin{displaymath} \Delta \alpha = \left\{ \begin{array}{ll} \gamma_1 \alpha \f... ...e{.1cm}\\ - \gamma_3 \alpha. & \rho_i = 0 \end{array} \right. \end{displaymath}$

(6)

where $\gamma_1$ , $\gamma_2$ , and $\gamma_3$ are constants, which Kill-Bots set to $\frac{1}{8}$ , $\frac{1}{4}$ , and $\frac{1}{4}$ respectively. The above rules move $\alpha$ proportionally to how far the system is from the chosen equilibrium point E, unless there are no idle cycles. In this case, $\alpha$ is decreased aggressively to go back to the stable regime around point E.

4 Security Analysis

This section discusses Kill-Bots's ability to handle a variety of attacks from a determined adversary.

(a) Socially-engineered attack: In a socially-engineered attack, the adversary tricks a large number of humans to solving puzzles on his behalf. Recently, spammers employed this tactic to bypass graphical tests that Yahoo and Hotmail use to prevent automated creation of email accounts (4). The spammers ran a porn site that downloaded CAPTCHAs from the Yahoo/Hotmail email creation Web page, forced its own visitors to solve these CAPTCHAs before granting access, and used these answers to create new email accounts.

Kill-Bots is much more resilient to socially engineered attacks. In contrast to email account creation where the client is given an ample amount of time to solve the puzzle, puzzles in Kill-Bots expire 4 minutes after they have been served. Thus, the attacker cannot accumulate a store of answers from human users to mount an attack. Indeed, the attacker needs a continuous stream of visitors to his site to be able to sustain a DDoS attack. Further, Kill-Bots maintains a loose form of fairness among authenticated clients, allowing each of them a maximum of 8 simultaneous connections. To grab most of the server's resources, an attacker needs to maintain the number of authenticated malicious clients much larger than that of legitimate users. For this, the attacker needs to control a server at least as popular as the victim Web server. Such a popular site is an asset. It is unlikely that the attacker will jeopardize his popular site to DDoS an equally or less popular Web site. Furthermore, one should keep in mind that security is a moving target; by forcing the attacker to resort to socially engineered attacks, we made the attack harder and the probability of being convicted higher.

(b) Polluting the Bloom filter: The attacker may try to spoof his IP address and pollute the Bloom filter, causing Kill-Bots to mistake legitimate users as malicious. This attack however is not possible because SYN cookies prevent IP spoofing and Bloom filter entries are modified after the SYN cookie check succeeds (Fig. 10).

(c) Copy attacks: In a copy attack, the adversary solves one graphical puzzle, obtains the corresponding HTTP cookie, and distributes it to many zombies to give them access to the Web site. It might seem that the best solution to this problem is to include a secure one-way hash of the IP address of the client in the cookie. Unfortunately, this approach does not deal well with proxies or mobile users. Kill-Bots protects against copy attacks by limiting the number of in-progress requests per puzzle answer. Our implementation sets this limit to 8.

(d) Replay attacks: A session cookie includes a secure hash of the time it was issued and is only valid during a certain time interval. If an adversary tries to replay a session cookie outside its time interval it gets rejected. An attacker may solve one puzzle and attempt to replay the ``answer'' packet to obtain many Kill-Bots cookies. Recall that when Kill-Bots issues a cookie for a valid answer, the cookie is an updated form of the token (Fig 6). Hence, replaying the ``answer'' yields the same cookie.

(e) Database attack: The adversary might try to collect all possible puzzles and the corresponding answers. When a zombie receives a puzzle, it searches its database for the corresponding answer, and sends it back to the server. To protect from this attack, Kill-Bots uses a large number of puzzles and periodically replaces puzzles with a new set. Generation of the graphical puzzles is relatively easy (47). Further, the space of all possible graphical puzzles is huge. Building a database of these puzzles and their answers, distributing this database to all zombies, and ensuring they can search it and obtain answers within 4 minutes (lifetime of a puzzle) is very difficult.

(f) Concerns regarding in-kernel HTTP header processing: Kill-Bots does not parse HTTP headers; it pattern matches the arguments to the GET and the Cookie: fields against the fixed string validate and against a 192-bit Kill-Bots cookie respectively. The pattern-matching is done in-place, i.e. without copying the packet and is cheap; $<8\mu$ s per request (§6.1.2).

(g) Breaking the CAPTCHA: Prior work on automatically solving simple CAPTCHAs exists (33), but such programs are not available to the public for security reasons (33). However, when one type of CAPTCHAs get broken, Kill-Bots can switch to a different kind.

**Figure 9:** **A Modular representation of the Kill-Bots code.**
$\begin{figure}\centering \includegraphics*[height=1.5in]{figures/impl.eps} \vskip -.05in\vskip -.1in \end{figure}$

5 Kill-Bots System Architecture

Fig. 9 illustrates the key components of Kill-Bots, which we briefly describe below.

(a) The Puzzle Managerconsists of two components. First, a user-space stub that asynchronously generates new puzzles and notifies the kernel-space portion of the Puzzle Manager of their locations. Generation of the graphical puzzles is relatively easy (1), and can either be done on the server itself in periods of inactivity (at night) or on a different dedicated machine. Also puzzles may be purchased from a trusted third party. The second component is a kernel-thread that periodically loads new puzzles from disk into the in-memory Puzzle Table.

(b) The Request Filter (RF) processes every incoming TCP packet addressed to port 80. It is implemented in the bottom half of the interrupt handler to ensure that unwanted packets are dropped as early as possible.

**Figure 10:** **The path traversed by new sessions in Kill-Bots. This code-path is implemented by the Request Filter module.**
$\begin{figure}\centering \begin{tabular}{c} \includegraphics*[width=3.0in]{figur... ...n]{figures/flowchart2.eps} \end{tabular}\vskip -.05in\vskip -.1in \end{figure}$

Fig. 10 provides a flowchart representation of the RF code. When a TCP packet arrives for port 80, the RF first checks whether it belongs to an established connection in which case the packet is immediately queued in the socket's receive buffer and left to standard kernel processing. Otherwise the filter checks whether the packet starts a new connection (i.e., is it a SYN?), in which case, the RF replies with a SYNACK that contains a standard SYN cookie. If the packet is not a SYN, the RF examines whether it contains any data; if not, the packet is dropped without further processing. Next, the RF performs two inexpensive tests in an attempt to drop unwanted packets quickly. It hashes the packet's source IP address and checks whether the corresponding entries in the Bloom filter have all exceeded $\xi$ unsolved puzzles, in which case the packet is dropped. Otherwise, the RF checks that the acknowledgment number is a valid SYN cookie.

If the packet passes all of the above checks, the RF looks for 3 different possibilities: (1) this might be the first data packet from an unauthenticated client, and thus it goes through admission control and is dropped with probability $1-\alpha$ . If accepted, the RF sends a puzzle and terminates the connection immediately; (2) this might be from a client that has already received a puzzle and is coming back with an answer. In this case, the RF verifies the answer and assigns the client an HTTP cookie, which allows access to the server for a period of time; (3) it is from an authenticated client that has a Kill-Bots HTTP cookie and is coming back to retrieve more objects. If none of the above is true, the RF drops this packet. These checks are ordered according to their increasing cost to shed attackers as cheaply as possible.

(c) The Puzzle Table maintains the puzzles available to be served to users. To avoid races between writes and reads to the table, we divide the Puzzle Table into two memory regions, a write window and a read window. The Request Filter fetches puzzles from the read window, while the Puzzle Manager loads new puzzles into the write window periodically in the background. Once the Puzzle Manager loads a fresh window of puzzles, the read and write windows are swapped atomically.

(d) The Cookie Table maintains the number of concurrent connections for each HTTP cookie (limited to 8).

(e) The Bloom Filter counts unanswered puzzles for each IP address, allowing the Request Filter to block requests from IPs with more than $\xi$ unsolved puzzles. Our implementation sets $\xi=32$ . Bloom filters are characterized by the number of counters and the number of hash functions that map keys onto counters. Our implementation uses $N=2^{20}$ and . Since a potentially large set of keys (32-bit IPs), are mapped onto much smaller storage (N counters), Bloom filters are essentially lossy. This means that there is a non-zero probability that all counters corresponding to a legitimate user pile up to $\xi$ due to collisions with zombies. Assuming distinct zombies and uniformly random hash functions, the probability a legitimate client is classified as a zombie is approximately $(1 - e^{ -ka/N} )^k \approx {(\frac{ka}{N})^k}$ . Given our choice of and , this probability for 75,000 zombies is .

6 Evaluation

We evaluate a Linux-based kernel implementation of Kill-Bots in the wide-area network using PlanetLab.

6.1 Experimental Environment

(a) Web Server: The web server is a 2GHz P4 with 1GB RAM and 512kB L2 cache running an unmodified mathopd (9) web-server on top of a modified Linux 2.4.10 kernel. We chose mathopd because of its simplicity. The Kill-Bots implementation consists of (1) 300 lines of modifications to kernel code, mostly in the TCP/IP protocol stack and (2) 500 additional lines for implementing the puzzle manager, the bloom filter and the adaptive controller. To obtain realistic server workloads, we replicate both static and dynamic content served by two web-sites, the CSAIL web-server and a Debian mirror.

(b) Modeling Request Arrivals: Legitimate clients generate requests by replaying HTTP traces collected at the CSAIL web-server and a Debian mirror. Multiple segments of the trace are played simultaneously to control the load generated by legitimate clients. A zombie issues requests at a desired rate by randomly picking a URI (static/dynamic) from the content available on the server.

**Figure 11:** **Our Experimental Setup.**
$\begin{figure}\includegraphics*[width=3.0in]{figures/exp_setup.eps} \vskip -0.05in\vskip -0.1in \end{figure}$

(c) Experiment Setup: We evaluate Kill-Bots in the wide-area network using the setup in Fig. 11. The Web server is connected to a 100Mbps Ethernet. We launch CyberSlam attacks from 100 different nodes on PlanetLab using different port ranges to simulate multiple attackers per node. Each PlanetLab node simulates up to 256 zombies--a total of 25,600 attack clients. We emulate legitimate clients on machines connected over the Ethernet, to ensure that any difference in their performance is due to the service they receive from the Web server, rather than wide-area path variability.

(d) Emulating Clients: We use WebStone2.5 (2) to emulate both legitimate Web clients and attackers. WebStone is a benchmarking tool that issues HTTP requests to a web-server given a specific distribution over the requests. We extended WebStone in two ways. First, we added support for HTTP sessions, cookies, and for replaying requests from traces. Second, we need the clients to issue requests at specific rate independent of how the web-server responds to the load. For this, we rewrote WebStone's networking code using libasync (28), an asynchronous socket library.

6.1.1 Metrics

We evaluate Kill-Bots by comparing the performance of a base server (i.e., a server with no authentication) with its Kill-Bots mirror operating under the same conditions. Server performance is measured using these metrics:

(a) Goodput of legitimate clients: The number of bytes per second delivered to all legitimate client applications. Goodput ignores TCP retransmissions and is averaged over 30s windows.
(b) Response times of legitimate clients: The elapsed time before a request is completed or timed out. We timeout incomplete requests after 60s.
(c) Cumulative number of legitimate requests dropped: The total number of legitimate requests dropped since the beginning of the experiment.

6.1.2 Microbenchmarks

We run microbenchmarks on the Kill-Bots kernel to measure the time taken by the various modules. We use the x86 rdtsc instruction to obtain fine-grained timing information; rdtsc reads a hardware timestamp counter that is incremented once every CPU cycle. On our 2GHz web-server, this yields a resolution of

nanoseconds. The measurements are for CAPTCHAs of 1100 bytes.

Table 2: Kill-Bots Microbenchmarks

Function	CPU Latency
Bloom Filter Access	.7 $\mu s$
Processing HTTP Header	8 $\mu s$
SYN Cookie Check	11 $\mu s$
Serving puzzle	31 $\mu s$

Table 2 shows our microbenchmarks. The overhead for issuing a graphical puzzle is $\approx 40\mu$ s (process http header +serve puzzle), which means that the CPU can issue puzzles faster than the time to transmit a 1100B puzzle on our 100Mb/s Ethernet. However, the authentication cost is dominated by standard kernel code for processing incoming TCP packets, mainly the interrupts ( $\approx 10\mu$ s per packet (23), about 10 packets per TCP connection). Thus, the CPU is the bottleneck for authentication and as shown in §6.4, performing admission control based on CPU utilization is beneficial.

Note also that checking the Bloom filter is much cheaper than other operations including the SYN cookie check. Hence, for incoming requests, we perform the Bloom filter check before the SYN cookie check (Fig. 14). In , the Bloom filter drops all zombie packets; hence performance is limited by the cost for interrupt processing and device driver access. We conjecture that using polling drivers (23,30) will improve performance at high attack rates.

6.2 Kill-Bots under CyberSlam

We evaluate the performance of Kill-Bots under CyberSlam attacks, using the setting described in §6.1. We also assume only 60% of the legitimate clients solve the CAPTCHAs; the others are either unable or unwilling to solve them. This is supported by the results in §6.6.

**Figure 12:** Kill-Bots under CyberSlam: Goodput and average response time of legitimate users at different attack rates for both a base server and its Kill-Bots version. Kill-Bots substantially improves server performance at high attack rates.
$\begin{figure}\begin{tabular}{cc} \epsfig{file=figures/f1tp.eps, width=3in} & ... ...esize (b) Response Time} \end{tabular}\vskip -0.05in\vskip -.1in \end{figure}$

Fig. 12 compares the performance of Kill-Bots with a base (i.e., unmodified) server, as the attack request rate increases. Fig. 12a shows the goodput of both servers. Each point on the graph is the average goodput of the server in the first twelve minutes after the beginning of the attack. A server protected by Kill-Bots endures attack rates multiple orders of magnitude higher than the base server. At very high attack rates, the goodput of the Kill-Bots server decreases as the cost of processing interrupts becomes excessive. Fig. 12b shows the response time of both web servers. The average response time experienced by legitimate users increases dramatically when the base server is under attack. In contrast, the average response time of users accessing a Kill-Bots server is unaffected by the ongoing attack.

**Figure 13:** Comparison of Kill-Bots' performance to server with no attackers when only 60% of the legitimate users solve puzzles. Attack lasts from 600s to 2400s. (a) Goodput quickly improves once bloom catches all attackers. (b) Response times improve as soon as the admission control reacts to the beginning of attack. (c) Admission control is useful both in and in , after bloom catches all zombies. Puzzles are turned off when Kill-Bots enters improving goodput.
$\begin{figure}\begin{tabular}{c} \includegraphics[width=3.0in]{figures/1.6sum_gp... ...c) Admission probability} \end{tabular}\vskip -.05in\vskip -.15in \end{figure}$

Fig. 13 shows the dynamics of Kill-Bots during a CyberSlam attack, with $\lambda_a=4000$ req/s. The figure also shows the goodput and mean response time with no attackers, as a reference. The attack begins at s and ends at s. At the beginning of the attack, the goodput decreases (Fig. 13a) and the mean response time increases (Fig. 13b). Yet, quickly the admission probability decreases (Fig. 13c), causing the mean response time to go back to its value when there is no attack. The goodput however stays low because of the relatively high attack rate, and because many legitimate users do not answer puzzles. After a few minutes, the Bloom filter catches all zombie IPs, causing puzzles to no longer be issued (Fig. 13c). Kill-Bots now moves to and performs authentication based on just the Bloom filter. This causes a large increase in goodput (Fig. 13a) due to both the admission of users who were earlier unwilling or unable to solve CAPTCHAs and the reduction in authentication cost. In this experiment, despite the ongoing CyberSlam attack, Kill-Bots' performance in ( onwards), is close to that of a server not under attack. Note that the normal load significantly varies with time and the adaptive controller (Fig. 13c) reacts to this load $t\in[1200,2400]s$ , keeping response times low, yet providing reasonable goodput.

6.3 Kill-Bots under Flash Crowds

We evaluate the behavior of Kill-Bots under a Flash Crowd. We emulate a Flash Crowd by playing our Web logs at a high speed to generate an average request rate of 2000 req/s. The request rate when there is no flash crowd is 300 req/s. This matches Flash Crowd request rates reported in prior work (19,20). In our experiment, a Flash Crowd starts at

and ends at

**Figure 14:** **Kill-Bots under Flash Crowds: The Flash Crowd event lasts from =1200s to =3000s. Though Kill-Bots has a slightly lower throughput, its Goodput is much higher and its avg. response time is lower.**
$\begin{figure}\begin{tabular}{c} \includegraphics[width=3.0in]{figures/f3tp.eps}... ...probability at the server} \end{tabular}\vskip -.05in\vskip -.2in \end{figure}$

Fig. 14 compares the performance of the base server against its Kill-Bots mirror during the Flash Crowd event. The figure shows the dynamics as functions of time. Each point in each graph is an average measurement over a 30s interval. We first show the total throughput of both servers in Fig. 14a. Kill-Bots has slightly lower throughput for two reasons. First, Kill-Bots attempts to operate at $\beta$ =12% idle cycles rather than at zero idle cycles. Second, Kill-Bots uses some of the bandwidth to serve puzzles. Fig. 14b reveals that the throughput figures are misleading; though Kill-Bots has a slightly lower throughput than the base server, its goodput is substantially higher (almost 100% more). This indicates that the base server wasted its throughput on retransmissions and incomplete transfers. Fig. 14c provides further supporting evidence-Kill-Bots drastically reduces the avg. response time.

That Kill-Bots improves server performance during Flash Crowds might look surprising. Although all clients in a Flash Crowd can answer the graphical puzzles, Kill-Bots computes an admission probability $\alpha$ such that the system only admits users it can serve. In contrast, a base server with no admission control accepts additional requests even when overloaded. Fig. 14d supports this argument by showing how the admission probability $\alpha$ changes during the Flash Crowd event to allow the server to shed away the extra load.

**Figure 15:** Cumulative numbers of dropped requests and dropped sessions under a Flash Crowd event lasting from to . Kill-Bots adaptively drops sessions upon arrival, ensuring that accepted sessions obtain full service, i.e. have fewer requests dropped.
$\begin{figure}\includegraphics[width=3in]{figures/f3drops.eps} \vskip -.05in\vskip -.15in \end{figure}$

Finally, Fig. 15 shows the cumulative number of dropped requests and dropped sessions during the Flash Crowd event for both the base server and the Kill-Bots server. Interestingly, the figure shows that Kill-Bots drops more sessions but fewer requests than the base server. The base server accepts new sessions more often than Kill-Bots but keeps dropping their requests. Kill-Bots drops sessions upon arrival, but once a session is admitted it is given a Kill-Bots cookie which allows it access to the server for 30min.

Note that Flash Crowds is just one example of a scenario in which Kill-Bots only needs to perform admission control. Kill-Bots can easily identify such scenarios-high server load but few bad bloom entries. Kill-Bots decouples authentication from admission control by no longer issuing puzzles; instead every user that passes the admission control check gets a Kill-Bots cookie.

6.4 Importance of Admission Control

**Figure 16:** Server goodput substantially improves with adaptive admission control. Figure is similar to Fig. 7 but is based on wide-area experiments rather than analysis. (For clarity, the Bloom filter is turned off in this experiment.)
$\begin{figure}\epsfig{file=figures/f1tp_2.eps, width=3in} \vskip -0.1in\vskip -.13in \end{figure}$

In §3.2, using a simple model, we showed that authentication is not enough, and good performance requires admission control. Fig. 16 provides experimental evidence that confirms the analysis. The figure compares the goodput of a version of Kill-Bots that uses only puzzle-based authentication, with a version that uses both puzzle-based authentication and admission control. We turn off the Bloom filter in these experiments because we are interested in measuring the goodput gain obtained only from admission control. The results in this figure are fairly similar to those in Fig. 7; admission control dramatically increases server resilience and performance.

6.5 Impact of Different Attack Strategies

The attacker might try to increase the severity of the attack by prolonging the time until the Bloom filter has discovered all attack IPs and blocked them, i.e., by delaying transition from to . To do so, the attacker uses the zombie IP addresses slowly, keeping fresh IPs for as long as possible. We show that the attacker does not gain much by doing so. Indeed, there is a tradeoff between using all zombie IPs quickly to create a severe attack for a short period vs. using them slowly to prolong a milder attack.

**Figure 17:** Comparison between 2 attack strategies; A fast strategy that uses all fresh zombie IPs in a short time, and a slow strategy that consumes fresh zombie IPs slowly. Graphs show a tradeoff; the slower the attacker consumes the IPs, the longer it takes the Bloom filter to detect all zombies. But the attack caused by the slower strategy though lasts longer has a milder impact on the goodput and response time.
$\begin{figure}\begin{tabular}{c} \includegraphics[width=3.0in]{figures/1.5tp.eps... ... time of legitimate users} \end{tabular}\vskip -.1in\vskip -.15in \end{figure}$

Fig. 17 shows the performance of Kill-Bots under two attack strategies; A fast strategy in which the attacker introduces a fresh zombie IP every 2.5 seconds, and a slow strategy in which the attacker introduces a fresh zombie IP every 5 seconds. In this experiment, the total number of zombies in the Botnet is 25000 machines, and the aggregate attack rate is constant and fixed at $\lambda_a=4000$ req/s. The figure shows that the fast attack strategy causes a short but high spike in mean response time, and a substantial reduction in goodput that lasts for a short interval (about 13 minutes), until the Bloom filter catches the zombies. On the other hand, the slow strategy affects performance for a longer interval ( $\sim$ 25 min) but has a milder impact on goodput and response time.

6.6 User Willingness to Solve Puzzles

We conducted a user study to evaluate the willingness of users to solve CAPTCHAs. We instrumented our research group's Web server to present puzzles to 50% of all external accesses to the index.html page. Clients that answer the puzzle correctly are given an HTTP cookie that allows them access to the server for an hour. The experiment lasted from Oct. 3 until Oct. 7. During that period, we registered a total of 973 accesses to the page, from 477 distinct IP addresses.

We compute two types of results. First, we filter out requests from known robots, using the User-Agent field, and compute the fraction of clients who answered our puzzles. We find that 55% of all clients answered the puzzles. It is likely that some of the remaining requests are also from robots but don't use well-known User-Agent identifiers, so this number underestimates the fraction of humans that answered the puzzles. Second, we distinguish between clients who check only the group's main page and leave the server, and those who follow one or more links. We call the latter interested surfers. We would like to check how many of the interested surfers answered the graphical puzzle because these users probably bring more value to the Web site. We find that 74% of interested users answer puzzles. Table 3 summarizes our results. These results may not be representative of users in the Internet, as the behavior of user populations may differ from one server to another.

Table 3: The percentage of users who answered a graphical puzzle to access the Web server. We define interested surfers as those who access two or more pages on the Web site.

Case	%Users
Answered puzzle	55%
Interested surfers who answered puzzle	74%

7 Related Work

Related work falls into the following areas.

(a) Denial of Service: Much prior work on DDoS describes specific attacks (e.g., SYN flood (36), Smurf (11), reflector attacks (34) etc.), and presents detection techniques or countermeasures. In contrast to Kill-Bots, prior work focuses on lower layers attacks and bandwidth floods. The backscatter technique (31) detects DDoS sources by monitoring traffic to unused segments of the IP address space. Traceback (40) uses in-network support to trace offending packets to their source. Many variations to the traceback idea detect low-volume attacks (41,5,49). Others detect bandwidth floods by mis-match in the volumes of traffic (16) and some (27) pushback filtering to throttle traffic closer to its source. Anderson et al. (7) propose that routers only forward packets with capabilities. Juels and Brainard (8) first proposed computational client puzzles as a SYN flood defense. Some recent work uses overlays as distributed firewalls (21,6). Clients can only access the server through the overlay nodes, which filter packets. The authors of (32) propose to use graphical tests in the overlay. Their work is different from ours because Kill-Bots uses CAPTCHAs only as an intermediate stage to identify the offending IPs. Further, Kill-Bots combines authentication with admission control and focusses on efficient kernel implementation.

(b) CAPTCHAs: Our authentication mechanism uses graphical tests or CAPTCHAs (47). Several other reverse Turing tests exist (37,14,22). CAPTCHAs are currently used by many online businesses (e.g. Yahoo!, Hotmail).

(c) Flash Crowds and Server Overload: Prior work (18,48) shows that admission control improves server perfomance under overload. Some admission control schemes (15,46) manage OS resources better. Others (20) persistently drop TCP SYN packets in routers to tackle Flash Crowds. Still others (44,42) shed extra load onto an overlay or a peer-to-peer network. Kill-Bots couples admission control with authentication.

8 Limitations & Open Issues

A few limitations and open issues are worth discussing. First, Kill-Bots interacts in a complex manner with Web Proxies and NATs, which multiplex a single IP address among multiple users. If all clients behind the proxy are legitimate users, then sharing the IP address has no impact. In contrast, if a zombie shares the proxy IP with legitimate clients and uses the proxy to mount an attack on the Web server, Kill-Bots may block all subsequent requests from the proxy IP address. To ameliorate such fate-sharing, Kill-Bots increments the Bloom counter by

when giving out a puzzle but decrements the Bloom counters by $x \geq 1$ whenever a puzzle is answered. Kill-Bots picks

based on server policy. If

, the proxy IP will be blocked only if the zombies traffic forwarded by the proxy/NAT is at least

times the legitimate traffic from the proxy. Further, the value of

can be adapted; if the server load is high even after the Bloom filter stops catching new IPs, Kill-Bots decreases the value of

because it can no longer afford to serve a proxy that has such a large number of zombies behind it.

Second, Kill-Bots has a few parameters that we have assigned values based on experience. For example, we set the Bloom filter threshold $\xi=32$ because even legitimate users may drop puzzles due to congestion or indecisiveness and should not be punished. There is nothing special about 32, we only need a value that is neither too big nor too small. Similarly, we allow a client that answers a CAPTCHA a maximum of 8 parallel connections as a trade-off between the improved performance gained from parallel connections and the desire to limit the loss due to a compromised cookie.

Third, Kill-Bots assumes that the first data packet of the TCP connection will contain the GET and Cookie lines of the HTTP request. In general the request may span multiple packets, but we found this to happen rarely.

Forth, the Bloom filter needs to be flushed eventually since compromised zombies may turn into legitimate clients. The Bloom filter can be cleaned either by resetting all entries simultaneously or by decrementing the various entries at a particular rate. In the future, we will examine which of these two strategies is more suitable.

9 Conclusion

The Internet literature contains a large body of research on denial of service solutions. The vast majority assume that the destination can distinguish between malicious and legitimate traffic by performing simple checks on the content of packets, their headers, or their arrival rates. Yet, attackers are increasingly disguising their traffic by mimicking legitimate users access patterns, which allows them to defy traditional filters. This paper focuses on protecting Web servers from DDoS attacks that masquerade as Flash Crowds. Underlying our solution is the assumption that most online services value human surfers much more than automated accesses. We present a novel design that uses CAPTCHAs to distinguish the IP addresses of the attack machines from those of legitimate clients. In contrast to prior work on CAPTCHAs, our system allows legitimate users to access the attacked server even if they are unable or unwilling to solve graphical tests. We implemented our design in the Linux kernel and evaluated it in Planetlab.

Acknowledgements

We thank Shan Sinha for working on an early version of Kill-Bots. We also thank Hari Balakrishnan, Robert Morris, Thomer Gil, David Andersen, Chuck Blake, Magdalena Balazinska, Allen Miu, Nate Kushman, our sheperd Stefan Savage, and the anonymous reviewers for their constructive comments. David Mazieres for libasync; Eddie Kohler for Click; and Michel Gorachzko for great technical support.

Bibliography

1: Jcaptcha.
jcaptcha.sourceforge.net/.
2: Mindcraft Inc. Webstone - The Benchmark for Web Servers.
http://www.mindcraft.com/webstone/.
3: Netfilter/Iptables.
http://www.netfilter.org.
4: Porn Gets Spammers Past Hotmail, Yahoo Barriers.
CNet News, May 2004.
http://news.com.com/2100-1023_3-5207290.html.
5: Alex Snoeren et al.
Hash-Based IP Traceback.
In SIGCOMM, 2001.
6: D. Andersen.
Mayday: Distributed Filtering for Internet services.
In USITS, 2003.
7: T. Anderson, T. Roscoe, and D. Wetherall.
Preventing Internet Denial-of-Service with Capabilities.
In HotNets, 2003.
8: Ari Juels et al.
Client Puzzles: A Cryptographic Countermeasure Against Connection Depletion Attacks.
In NDSS, 1999.
9: M. Boland.
Mathopd.
http://www.mathopd.org.
10: A. Broder and M. Mitzenmacher.
Network Applications of Bloom Filters: A Survey.
In Allerton, 2002.
11: CERT.
Advisory CA-1998-01 Smurf IP Denial-of-Service Attacks, 1998.
http://www.cert.org/advisories/CA-1998-01.html.
12: CERT.
Advisory CA-2003-20 W32/Blaster worm, 2003.
13: CERT.
Incident Note IN-2004-01 W32/Novarg.A Virus, 2004.
14: A. Coates, H. Baird, and R. Fateman.
Pessimal print: A Reverse Turing Test.
In IAPR, 1999.
15: G. Banga et al.
Resource Containers: A New Facility for Resource Management in Server Systems.
In OSDI, 1999.
16: T. Gil and M. Poletto.
MULTOPS: A Data-Structure for Bandwidth Attack Detection.
In USENIX Security, 2001.
17: E. Hellweg.
When Bot Nets Attack.
MIT Technology Review, September 2004.
18: R. Iyer et al.
Overload Control Mechanisms for Web Servers.
In Workshop on Perf. and QoS of Next Gen. Networks, 2000.
19: J. Jung et al.
Flash Crowds and Denial of Service Attacks: Characterization and Implications for CDNs.
In WWW, 2002.
20: H. Jamjoom and K. G. Shin.
Persistent Dropping: An Efficient Control of Traffic.
In ACM SIGCOMM, 2003.
21: A. Keromytis, V. Misra, and D. Rubenstein.
SOS: Secure Overlay Services.
In ACM SIGCOMM, 2002.
22: G. Kochanski, D. Lopresti, and C. Shih.
A Reverse Turing Test using speech.
In ICSLP, 2002.
23: E. Kohler, R. Morris, B. Chen, J. Jannotti, and M. F. Kaashoek.
The Click Modular Router.
ACM TOCS, 2000.
24: J. Leyden.
East European Gangs in Online Protection Racket, 2003.
www.theregister.co.uk/2003/11/12/east_european_gangs_in_online/.
25: J. Leyden.
DDoSers attack DoubleClick, 2004.
www.theregister.co.uk/2004/07/28/ddosers_attack_doubleclick/.
26: J. Leyden.
The Illicit trade in Compromised PCs, 2004.
www.theregister.co.uk/2004/04/30/spam_biz/.
27: R. Mahajan et al.
Controlling High Bandwidth Aggregates in the Network.
CCR, 2002.
28: D. Mazieres.
Toolkit for User-Level File Sys.
In USENIX, 2001.
29: S. McCanne.
The Berkeley Packet Filter Man page, May 1991.
BPF distribution available at ftp://ftp.ee.lbl.gov.
30: J. Mogul and K. K. Ramakrishnan.
Eliminating Receive Livelock in an Interrupt-driven Kernel.
In USENIX Tech. Conf., 1996.
31: D. Moore, G. Voelker, and S. Savage.
Inferring Internet Denial-of-Service Activity.
In USENIX Security, 2001.
32: W. G. Morein et al.
Using Graphic Turing Tests to Counter Automated DDoS Attacks.
In ACM CCS, 2003.
33: G. Mori and J. Malik.
Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA.
In CVPR, 2003.
34: V. Paxson.
An Analysis of Using Reflectors for Distributed Denial-of-Service Attacks.
ACM CCR, 2001.
35: K. Poulsen.
FBI Busts Alleged DDoS Mafia, 2004.
http://www.securityfocus.com/news/9411.
36: L. Ricciulli, P. Lincoln, and P. Kakkar.
TCP SYN Flooding Defense.
In CNDS, 1999.
37: Y. Rui and Z. Liu.
ARTiFACIAL: Automated Reverse Turing Test Using FACIAL Features.
In Multimedia, 2003.
38: R. Russell.
Linux IP Chains-HOWTO.
http://people.netfilter.org/ rusty/ipchains/HOWTO.html.
39: S. Kandula et al.
Botz-4-sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds.
Technical Report TR-969, MIT., 2004.
40: S. Savage et al.
Practical Network Support for IP Traceback.
In SIGCOMM, 2000.
41: D. Song and A. Perrig.
Advanced and Authenticated Marking Schemes for IP Traceback.
In INFOCOM, 2001.
42: T. Stading, P. Maniatis, and M. Baker.
Peer-to-peer Caching Schemes to Address Flash Crowds.
In IPTPS, 2002.
43: S. Staniford, V. Paxson, and N. Weaver.
How to 0wn the Internet in Your Spare Time.
In USENIX Security, 2002.
44: A. Stavrou et al.
A lightweight, robust, P2P system to handle Flash Crowds.
IEEE JSAC, 2004.
45: L. Taylor.
Botnets and Botherds.
http://sfbay-infragard.org.
46: Thiemo Voigt et al.
Handling Multiple Bottlenecks in Web Servers Using Adaptive Inbound Controls.
In High-Speed Networks, 2002.
47: L. von Ahn et al.
Captcha: Using Hard AI Problems for Security.
In EUROCRYPT, 2003.
48: M. Welsh et al.
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services.
In SOSP, 2001.
49: A. Yaar et al.
Pi: A Path Identification Mechanism to Defend against DDoS Attacks.
In IEEE Security & Privacy, 2003.

About this document ...

Botz-4-Sale: Surviving Organized DDoS Attacks That Mimic Flash Crowds

This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)

The command line arguments were:
latex2html -split 0 -no_navigation -white -show_section_numbers XXX

The translation was initiated by Srikanth Kandula on 2005-03-28

Srikanth Kandula 2005-03-28

Srikanth Kandula	Dina Katabi	Matthias Jacob	Arthur Berger
MIT		Princeton	MIT/Akamai
{kandula,dina}@csail.mit.edu		mjacob@princeton.edu	awberger@mit.edu