==============================================================================
collect-ssh.pl - An information collection tool for OpenSSH configuration stats
==============================================================================

As part of our research project "Inoculating SSH Against Address-Harvesting
Worms", we collect anonymized information about the entries in known_hosts
files and fingerprints of the keys listed in the authorized_keys and public key
(*.pub) files in the ~/.ssh/ directory. More information on the project is
available at http://nms.lcs.mit.edu/projects/ssh/

We appreciate your support in participating in this study!

I. Requirements
===============

The script requires the following for proper execution:

  - The openssl command must be in the path
  - The Crypt::Rijndael perl module must be available

If you do not have the Crypt::Rijndael module, run the following command:

  $ ./build-CR.sh

This will build Crypt::Rijndael and Digest::SHA1 in place without installing
them to your system, so root privileges are not required.

II. Execution
=============

This part is easy:

  $ ./collect-ssh.pl

The script will display the generated results on the terminal and then ask the
user to confirm her desire to submit the results for our use.

The following options are allowed:

perl collect-ssh.pl

 Data collection procedure options (default behavior is to prompt for
 submission method):

  -c http             Submit via a TCP connection to port 80 (HTTP).
                      Does not automatically detect proxies, so this may fail
                      if your firewall blocks outgoing connections to port 80.
                      ('-h' is equivalent)
  -c mail             Pipe to /bin/mail and send to sshproject@nms.lcs.mit.edu
                      Uses your existing mail infrastructure to avoid
                      interference from firewalls. Fails if the host is not
                      able or not allowed to send mail.
                      ('-e' is equivalent)
  -c file:<filename>  Save to a file for manual submission. Please email the
                      file to sshproject@nms.lcs.mit.edu
                      ('-f <filename>' is equivalent)
  -c prompt           Prompt the user for the preferred option
  -x                  Exclude from collection procedure.
                      Multiple '-x' options may be specified.

 Standard options (use a space between option and value):

  -pf <file>          Specify an alternate path to your system passwd file.
                      Default is '/etc/passwd'. This is useful if your system
                      uses NIS or LDAP, which can sometimes cause problems for
                      the script on large systems. Tools that generate
                      passwd-style data from NIS and LDAP can avoid these
                      problems. (See "RUNNING AS ROOT?" below for more
                      information.)

 Alternate options to find user home directories (instead of -a or -pf):

  -u <path>           Collect data from files in the .ssh subdirectory beneath
                      the user home directory given as <path>.
                      (e.g. -u /home/johndoe collects data in
                      /home/johndoe/.ssh/)
  -m <path>           Like -u, but <path> specifies a master directory
                      containing user subdirectories whose
                      <path>/<user>/.ssh/ directories contain the known_hosts
                      files to be converted. (e.g. -m /var/home converts files
                      in /var/home/*/.ssh/. Globbing works as well, such as
                      -m /var/home/*/* when user home directories are
                      subdivided into multiple 'master' subdirectories.)
  -r <path>           Location of the superuser home directory.
                      Default is '/root'.

RUNNING AS ROOT? The following information may be useful.

If you use NIS or LDAP, a few additional steps are necessary. collect-ssh.pl
works by reading your system's /etc/passwd file, which is often incomplete on
systems using NIS, LDAP, or another directory service. However, NIS and LDAP
can generate a passwd-like file for you. For NIS, you simply need to execute
the following commands:

  $ ypcat passwd > passwd.tmp
  $ ./collect-ssh.pl -p passwd.tmp

For LDAP, you need the ldap2pass utility or some other way to generate a
passwd-style flat file from an LDAP directory. If you don't have ldap2pass,
it is available at http://www.fanying.com/projects/ldaputils.html.
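Before handing a generated passwd.tmp to the script, it can help to confirm that the file really is in passwd(5) format: seven colon-separated fields per line (name:password:UID:GID:GECOS:home:shell), with the home directory in the sixth field. The Python sketch below is a hypothetical helper, not part of collect-ssh.pl, and does not reproduce how the script itself filters accounts. It assumes NIS '+'/'-' inclusion entries and malformed lines can simply be skipped, and it lists each remaining account's home directory together with whether that directory has a .ssh subdirectory.

```python
import os
import sys


def parse_passwd(lines):
    """Yield (username, uid, home_directory) tuples from passwd(5)-style lines.

    Blank lines, '#' comments, NIS '+'/'-' entries (empty UID field) and
    lines that do not have exactly seven fields are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        # passwd(5): name:password:UID:GID:GECOS:home:shell
        if len(fields) != 7 or not fields[2].isdigit():
            continue
        name, _pw, uid, _gid, _gecos, home, _shell = fields
        yield name, int(uid), home


def report(path):
    """Print each account's UID, home directory and whether it has a .ssh/ subdirectory."""
    with open(path, encoding="utf-8", errors="replace") as f:
        for name, uid, home in parse_passwd(f):
            has_ssh = os.path.isdir(os.path.join(home, ".ssh"))
            print(f"{name}\t{uid}\t{home}\t{'.ssh' if has_ssh else '-'}")


if __name__ == "__main__":
    # Hypothetical usage: python3 check_passwd.py passwd.tmp
    for p in sys.argv[1:]:
        report(p)
```

On systems that automount home directories, the .ssh check in report() may itself trigger a mount for every account listed, so parse_passwd() alone is the safer check there.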
After generating a flat passwd file, you can simply invoke the collection
script with that flat file as an argument:

  $ ./collect-ssh.pl -p passwd.tmp

If generating a passwd file is undesirable, or if you experience failures
during script execution, other options are available. For instance, on one
NIS-enabled system using automount, we ran into problems with exceeding the
system's maximum number of mount points, which prevented the -p option from
working. To avoid this specific problem, as well as to provide an alternative
to passwd file data, we created the -u and -r options. Instead of reading a
passwd file, collect-ssh.pl can also search a directory hierarchy for home
directories. Using -u and -r, you can mount your user home directories as a
whole in your filesystem hierarchy and then simply point collect-ssh.pl at the
top level as follows:

  $ mount -t nfs serverA:/home /mnt/home1
  $ mount -t nfs serverB:/home /mnt/home2
  $ collect-ssh.pl -u /mnt/home1 -u /mnt/home2

In this example, collect-ssh.pl will recursively descend through the file
hierarchy under /mnt/home1 and /mnt/home2 looking for directories that are not
owned by root. These are taken to be home directories, and their paths are
used for collection. This is especially useful when the top level of the home
directory hierarchy is subdivided, for instance, into /mnt/home1/a/a/,
/mnt/home1/a/b/, etc.

Additionally, if you choose to use the -u option (and therefore ignore the
local passwd file), the -r option may be useful if the superuser's home
directory is not /root.

III. What collect-ssh.pl does
=============================

collect-ssh.pl collects the following information about the local host's
stored OpenSSH configuration state.

1) An anonymized (see [1]) version of the local host's IP address, or
   "CANTRESOLVE" if the address cannot be resolved.

2) OpenSSH state for individual user(s)

   If run as a regular user, only that user's .ssh/ directory will be
   analyzed.
   If the script is run as root, all users' .ssh/ directories will be
   analyzed. collect-ssh.pl does its best to avoid system user accounts, but
   it cannot do this perfectly, so the executing root user will be prompted
   to decide whether or not to analyze each account's data.

   - A SHA1 hash of the username
   - Operating system and kernel version (uname -r -s)
   - OpenSSH version (ssh -V)

   For each entry in a user's ~/.ssh/known_hosts and ~/.ssh/known_hosts2
   files:
   - An anonymized (see [1]) version of the IP address for the host
   - The entry's line number within the file

   For each entry in a user's ~/.ssh/authorized_keys and
   ~/.ssh/authorized_keys2 files:
   - A SHA1 hash of the public key

   For each of the user's ~/.ssh/*.pub files:
   - A SHA1 hash of the public key
   - A boolean specifying whether the associated private key is encrypted

As this information is collected, it is displayed on the user's terminal.
Following collection, the script will check with the user before submitting
this information to our server.

All results are encrypted using Rijndael and a randomly chosen 128-bit key.
That key is then encrypted with an RSA public key. The ciphertext and the
RSA-encrypted symmetric key are transmitted to our server via the HTTP POST
method.

When performing IP anonymization, collect-ssh.pl uses the technique described
in [1] with Rijndael as the encryption algorithm.

Users are of course encouraged to read the script to verify that these
operations are performed as described.

IV. Authors
===========

Jaeyeon Jung - MIT CSAIL
Stuart Schechter - Harvard University DEAS
Will Stockwell - MIT CSAIL

References
==========

[1] "Prefix-Preserving IP Address Anonymization: Measurement-based Security
    Evaluation and a New Cryptography-based Scheme", to appear in Proceedings
    of the IEEE International Conference on Network Protocols, Paris, 2002.
    http://www.cc.gatech.edu/computing/Telecomm/cryptopan/icnp02.ps
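Section III states that IP addresses are anonymized with the prefix-preserving technique of [1]. The property that technique provides is that two addresses sharing their first k bits map to anonymized addresses that also share exactly their first k bits: output bit i is input bit i XORed with a pseudorandom bit computed, under a secret key, from input bits 0..i-1. The Python sketch below illustrates only that property. It assumes HMAC-SHA256 from Python's standard library as the pseudorandom function, in place of the Rijndael-based function that [1] and collect-ssh.pl use, so its output will not match the script's. The names anonymize_ipv4, shared_prefix_len and the example key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def _prf_bit(key: bytes, prefix: str) -> int:
    """Return one pseudorandom bit derived from an address prefix ('0'/'1' string).

    ASSUMPTION: HMAC-SHA256 stands in for the Rijndael-based function of [1],
    so results differ from collect-ssh.pl even with the same key.
    """
    digest = hmac.new(key, prefix.encode("ascii"), hashlib.sha256).digest()
    return digest[0] >> 7  # most significant bit of the MAC


def anonymize_ipv4(addr: str, key: bytes) -> str:
    """Prefix-preserving anonymization: out[i] = in[i] XOR f(in[0..i-1])."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = "".join(str(int(bits[i]) ^ _prf_bit(key, bits[:i])) for i in range(32))
    return str(ipaddress.IPv4Address(int(out, 2)))


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading bits two IPv4 addresses have in common."""
    diff = int(ipaddress.IPv4Address(a)) ^ int(ipaddress.IPv4Address(b))
    return 32 - diff.bit_length()


if __name__ == "__main__":
    key = b"hypothetical secret key"
    a = anonymize_ipv4("18.26.4.9", key)
    b = anonymize_ipv4("18.26.4.200", key)
    # Both inputs share a /24 prefix, so the anonymized pair does too.
    print(shared_prefix_len("18.26.4.9", "18.26.4.200"), shared_prefix_len(a, b))  # prints: 24 24
```

Because the bit flipped at position i depends only on the bits before it, the mapping is also one-to-one, so distinct hosts in a known_hosts file remain distinct after anonymization while their subnet relationships are kept.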