An Introduction to CIF

Originally written by our good friend Kyle.

CIRTs and related organizations often handle incident detection as well as response. Both of these roles produce and consume threat intelligence in different ways. For example, we often want to correlate our network traffic with OSINT indicators (known bad IP addresses and URLs, MD5 hashes of suspicious files, etc.). I've started looking at the Collective Intelligence Framework (CIF) as a way to fulfill these needs. CIF development is sponsored by the REN-ISAC and the National Science Foundation, with most of the coding (and everything else!) handled by Wes Young. Everything is open source for those of us who like - or need - to hack directly on the code. In this article, I'll explain CIF, give some usage examples, and discuss test deployment scenarios.

Understanding CIF

From the perspective of a user, CIF allows you to run queries against many data sources at once. CIF comes preconfigured for a number of public OSINT sources. If you have other private data sources available, particularly via XML (RSS), JSON, or in a file (e.g. CSV), you can incorporate those, as well as additional OSINT sources.

Use cases include manually querying the database for specific indicators (e.g. "do we have any records for this IP address?") as well as pulling feeds of various sorts for use by security systems (e.g. "what URLs should we block at the proxy?"). CIF includes concepts of severity and confidence as well as privilege. This allows you to provide feeds of high-confidence public data to some systems while still allowing investigators to query private, unconfirmed data.

Essentially, CIF ingests data (typically on an hourly or daily basis, depending on the source), indexes it on the fly for performance reasons, performs correlation analytics (e.g. so that a URL also turns into domain and IP address information), and then makes it available in feeds via various output plugins. These plugins include tables and HTML for viewing by a user, but also IPtables rules, Snort rules, JSON, and CSV for processing by other security systems.

Usage examples

Everything below comes from the Perl client. I haven't yet dealt with the Python client, much less hacked on it, but that's coming Soon™.

cif -q infrastructure/malware -c 50 -s medium

gives a fairly large list of IP addresses associated with malware. (I used medium severity and 50% confidence in these examples.)

Even if you don't use a proxy server, you might find CIF useful for checking suspicious URLs:

cif -q url -c 50 -s medium -p snort

You now have a list of Snort rules to pull into your IDS.
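If you pull this feed on a schedule, you probably don't want a failed or empty query to wipe out the rules your IDS is already using. Here is a minimal sketch that reuses the exact command above. The function name refresh_snort_rules and the output path are hypothetical and not part of CIF:

```shell
# refresh_snort_rules OUTFILE
# Hypothetical helper: fetch the URL feed as Snort rules and replace OUTFILE
# only if the query succeeded and actually returned something.
refresh_snort_rules() {
    out=$1
    # Temp file next to the target so the final mv stays on one filesystem.
    # mktemp creates it with mode 0600; adjust if your IDS runs as another user.
    tmp=$(mktemp "${out}.XXXXXX") || return 1
    if cif -q url -c 50 -s medium -p snort > "$tmp" && [ -s "$tmp" ]; then
        mv "$tmp" "$out"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

You would call it as something like refresh_snort_rules /path/to/cif-url.rules and reload the IDS only when it returns success.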

Or, if you have your own list of IP addresses to check, such as when an ongoing case has new indicators, you can put them in a file and query each of them:

for f in $(cat hostlist.txt); do cif -q "$f" >> specific-ip.txt; done

This yields another list. You might see a few lines in that example with a "private" restriction and an impact of "search". This happens because, by default, CIF logs every query for a specific indicator. The searches themselves, such as those run by other investigators, may have significance apart from any feed data. However, if you don't want CIF to log a query, just use the "-n" parameter.
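For a longer-running case, the loop above can be wrapped in a small function that tolerates blank lines and comments in the host list and keeps the lookups out of the search log. This is a rough sketch using only the -q and -n options mentioned in this article. The function name cif_lookup_file and the "#" comment convention are my own assumptions, as is treating -n as a plain switch:

```shell
# cif_lookup_file FILE
# Hypothetical helper: query every indicator listed in FILE, one per line.
# Blank lines and lines starting with '#' are skipped (an assumed convention).
# Assumes -n is a plain switch that takes no argument.
cif_lookup_file() {
    # The "|| [ -n ... ]" part handles a last line with no trailing newline.
    while IFS= read -r indicator || [ -n "$indicator" ]; do
        case $indicator in
            ''|'#'*) continue ;;
        esac
        # </dev/null keeps cif from reading the rest of the list on stdin.
        cif -n -q "$indicator" < /dev/null
    done < "$1"
}
```

Usage would look like cif_lookup_file hostlist.txt > specific-ip.txt. Drop the -n if you do want your searches to show up for other investigators.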

If you'd like to play with it some more, contact me for an API key and the address of my semi-public CIF server. Twitter or email both work fine.

Appendix: CIF on the Amazon cloud

Amazon Web Services provides a decent platform for testing CIF or running a public instance like mine. The following assumes some familiarity with Linux administration and at least a basic understanding of the Elastic Compute Cloud (EC2).

You can start with a small instance for the installation, but you'll quickly want to move to a medium instance at least. I run a large instance using the Ubuntu Cloud Guest server image. In general, follow the server install instructions for CIF. You'll also want to note the specifics for Ubuntu as they contain a few workarounds you will need.

Allocate an Elastic IP and register it in DNS someplace, such as with Amazon Route 53. For the Security Group, only add HTTPS and SSH. You won't need anything else, and I recommend leaving it at this minimal state for security purposes. You'll also need an Elastic Block Store volume. While you can start with 10GB, expect that to grow a few GB per week, so you'll need to resize from time to time or create a larger volume at the beginning.

While not required for CIF installation, I can't recommend enough that you use git to manage config files. Srsly.

When installing Postgres, note that "peer" may appear in the original pg_hba.conf instead of "ident sameuser". Also, I did not use the values in the CIF doc, as Postgres didn't like them. I left everything in postgresql.conf at the defaults except:

work_mem = 512MB
checkpoint_segments = 32
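To double-check which values ended up in the file after editing, something like the following can help. It is only a sketch: the function name pg_conf_value is made up, it handles only the simple "key = value" form, and it reads the file rather than asking the running server. Also note that checkpoint_segments was removed in PostgreSQL 9.5 in favor of max_wal_size, so that second setting applies only to older releases.

```shell
# pg_conf_value KEY FILE
# Hypothetical helper: print the last uncommented value assigned to KEY in a
# postgresql.conf-style FILE (later entries override earlier ones).
pg_conf_value() {
    awk -v key="$1" '
        { sub(/#.*/, "") }                 # drop comments
        { n = split($0, kv, "=") }         # split "key = value"
        n >= 2 {
            k = kv[1]; gsub(/[ \t]/, "", k)
            if (k == key) {
                v = kv[2]; gsub(/^[ \t]+|[ \t]+$/, "", v)
                val = v
            }
        }
        END { if (val != "") print val }
    ' "$2"
}
```

On Ubuntu the file normally lives under /etc/postgresql/<version>/main/, so a check might look like pg_conf_value work_mem /etc/postgresql/<version>/main/postgresql.conf with your own version filled in.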

When setting up BIND9, first check /etc/resolv.conf for the IP addresses you should use as forwarders.
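To save some copying and pasting, the nameserver entries can be turned into a forwarders clause directly. This is a small sketch, not part of the CIF instructions. The function name bind_forwarders is hypothetical, and you should still eyeball the result before pasting it into the options block of your BIND config:

```shell
# bind_forwarders [FILE]
# Hypothetical helper: turn the nameserver lines of a resolv.conf-style FILE
# (default /etc/resolv.conf) into a BIND "forwarders" clause.
bind_forwarders() {
    awk '
        $1 == "nameserver" { list = list " " $2 ";" }
        END { if (list != "") print "forwarders {" list " };" }
    ' "${1:-/etc/resolv.conf}"
}
```

Run it before you point the machine at its own BIND instance. Once resolv.conf lists 127.0.0.1, the output would just forward BIND to itself.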