Protecting SSH from known_hosts Address Harvesting


What is address harvesting?

If you use SSH, your SSH client stores a list in your home directory that maps the host name and IP address of every remote host you have connected to onto that host's public key. This database, known as the known_hosts file, has been used by attackers who compromise user accounts, steal passwords and identity keys, and then use the list of hosts to identify targets on which the same password or key can be used to compromise additional accounts. Worms could also use known_hosts data to identify new targets.
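To make the exposure concrete, here is a minimal Python sketch of how little effort it takes to pull target names out of an unhashed known_hosts file. The function name harvest_hosts and the sample entry are illustrative assumptions, not part of our collection script; the sketch only assumes the usual layout in which the first field of each entry is a comma-separated list of host names and addresses.

```python
def harvest_hosts(known_hosts_text):
    """Return the sorted host names and addresses found in unhashed entries.

    Hypothetical helper for illustration only.
    """
    hosts = set()
    for line in known_hosts_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # The first whitespace-separated field lists the host names.
        first_field = line.split()[0]
        for name in first_field.split(","):
            # Hashed entries start with "|1|" and reveal no host name.
            if name.startswith("|1|"):
                continue
            hosts.add(name)
    return sorted(hosts)


if __name__ == "__main__":
    sample = "example.org,192.0.2.10 ssh-rsa AAAAB3NzaC1yc2E\n"
    print(harvest_hosts(sample))
```

Anyone who can read the file can run the equivalent of this loop, which is why hashing the first field removes most of its value to an attacker.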

Are there data to show that this poses a real problem?

As of [Mon Sep 12 14:28:41 2005], we have collected known_hosts data from 179 hosts, 69 of which ran the script as root and submitted data from all user accounts. In total, we received 37,771 anonymized known_hosts entries from user accounts. These known_hosts entries lead to a total of 12,041 on 107 valid /8 networks (67% of all valid /8 networks).
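As a rough illustration of the /8 coverage measure, the following Python sketch groups IPv4 addresses by their /8 network and counts the distinct networks. The function name and the sample addresses are assumptions made for this example; the sketch does not encode which /8 networks the study treats as valid.

```python
import ipaddress


def slash8_networks(addresses):
    """Return the set of /8 networks that contain the given IPv4 addresses."""
    networks = set()
    for addr in addresses:
        # strict=False lets us pass a host address and get its enclosing /8.
        networks.add(ipaddress.ip_network(addr + "/8", strict=False))
    return networks


if __name__ == "__main__":
    sample = ["18.26.4.9", "18.7.22.1", "192.0.2.10"]
    for net in sorted(slash8_networks(sample)):
        print(net)
```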

The data collection script that was run on these hosts also parsed SSH2 identity key files to see what fraction of these key files had the encryption flag set. We were quite surprised to see that only 38.3% of 447 key files were encrypted. [More on data analysis results]
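One way such a check can be made, sketched here in Python, is to look for the "Proc-Type: 4,ENCRYPTED" header that traditional PEM-encoded private keys carry when they are protected by a passphrase. This is an assumption about how the flag can be detected, not a description of what our script does, and the function name is hypothetical. Key files in other formats would need a different test.

```python
def pem_key_is_encrypted(pem_text):
    """Return True if a traditional PEM private key has the encryption header."""
    in_headers = False
    for line in pem_text.splitlines():
        line = line.strip()
        if line.startswith("-----BEGIN ") and line.endswith("PRIVATE KEY-----"):
            # Header lines, if any, follow the BEGIN line directly.
            in_headers = True
            continue
        if in_headers:
            if line == "Proc-Type: 4,ENCRYPTED":
                return True
            if ":" not in line:
                # A blank line or base64 data ends the header section.
                return False
    return False
```

The check reads only the header lines, so it never needs the passphrase or the key material itself.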

Can I contribute data to the study?

On UNIX and derivative (BSD, Linux, and Sun) operating systems, download collect-ssh.tar.gz. If you have wget, you can do this from the command line using the following command:


Then, execute the following four commands.

tar zxf collect-ssh.tar.gz
cd collect-ssh
(or "gzip -d collect-ssh.tar.gz;tar xvf collect-ssh.tar")

You will then be shown the data being collected. When collection is complete, you will be asked if you are willing to submit it to us and prompted for a transmission method. If you are behind a firewall, we recommend email submission. Regardless of how the data is transmitted, it will be encrypted first.

If you run the script from a user account, only data from that account will be collected. If you can run the script as root, data on all users will be collected. If you plan to run the script as root and use NIS and LDAP, additional steps are required. Please read section II of the README for more information. [More on data collection]

How can I protect my known_hosts?

The recently released version 4.0 of OpenSSH incorporates a known_hosts hashing scheme. Upgrading to this version will give your system host hashing capability. Unfortunately, the feature must be turned on manually via configuration options and each known_hosts file must be converted to a hashed format manually. To ease your transition to a hashed hosts configuration, we have provided installation and configuration instructions for enabling the hashing option and a conversion tool which will convert all known_hosts files on your system when run as superuser. Follow the instructions for the appropriate platform at one of these links:
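For readers curious about what a hashed entry looks like, the Python sketch below reproduces the format used by the OpenSSH hashing feature (enabled with the HashKnownHosts option): the host field becomes "|1|", a base64-encoded random salt, "|", and the base64-encoded HMAC-SHA1 of the host name keyed with that salt. The function names are illustrative assumptions; this is a sketch for understanding the format and is not a replacement for the conversion tool.

```python
import base64
import hashlib
import hmac
import os


def hash_host(hostname, salt=None):
    """Return the hashed known_hosts field for one host name or address."""
    if salt is None:
        salt = os.urandom(20)  # a fresh 20-byte salt for each entry
    digest = hmac.new(salt, hostname.encode("ascii"), hashlib.sha1).digest()
    return "|1|{}|{}".format(
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    )


def matches(hashed_field, hostname):
    """Check whether a hashed field was produced from the given host name."""
    parts = hashed_field.split("|")
    if len(parts) != 4 or parts[0] != "" or parts[1] != "1":
        return False
    salt = base64.b64decode(parts[2])
    expected = base64.b64decode(parts[3])
    actual = hmac.new(salt, hostname.encode("ascii"), hashlib.sha1).digest()
    return hmac.compare_digest(actual, expected)


if __name__ == "__main__":
    field = hash_host("example.org")
    print(field)
    print(matches(field, "example.org"))
```

The client can still verify a host it is asked to connect to, because it knows the name to test, while someone who only reads the file must guess candidate names one at a time.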

Alternatively, if you are unwilling to upgrade to an entirely new version of OpenSSH, we have provided a patch for earlier versions of OpenSSH (tested with versions 3.9 and 3.9p1) that hashes host names and IP addresses in the known_hosts file. README.hashed-hosts (included with the patch) provides a detailed description of the changes made, the newly available commands, and the known_hosts conversion tool. Note that the hashing scheme we originally implemented is not compatible with the one subsequently included in OpenSSH 4.0. Therefore, if you apply our patch now and later upgrade to OpenSSH 4.0, your users will be unable to use entries added to their known_hosts files after the patch was applied. Pre-existing entries will remain available in the encrypted backup files produced by our conversion tool, which could later be decrypted and hashed using OpenSSH 4.0. If you would prefer this option, follow the instructions for the appropriate platform at one of these links:



On May 9, Stuart Schechter presented some of our results as part of the keynote speech for The First International Workshop on Cluster Security at CCGrid 2005.


News bits related to our project:

Research Team



M. I. T. Computer Science and Artificial Intelligence Laboratory · 32 Vassar Street · Cambridge, MA 02139 · USA