Data Leak Prevention

Data Loss Prevention (DLP) is a computer security term referring to systems that identify, monitor, and protect data in use (e.g., endpoint actions), data in motion (e.g., network actions), and data at rest (e.g., data storage) through deep content inspection and with a centralized management framework. The systems are designed to detect and prevent the unauthorized use and transmission of confidential information.

Data leak prevention products plug a gaping hole in most companies' security systems. Most security products are outwardly focused: they try to block external attacks. That is necessary, but it does not address the entire spectrum of security vulnerabilities that arise when data moves from inside the network out.

Data leakage prevention products, also known as anti-data leakage or data-loss prevention products, inspect content as it moves across the network and enforce policies so that confidential information does not escape the walls of the enterprise.

Various vendors also refer to DLP as Data Leak Prevention, Information Leak Detection and Prevention (ILDP), Information Leak Prevention (ILP), Content Monitoring and Filtering (CMF), or Extrusion Prevention System, by analogy to intrusion prevention system.

Types of DLP systems

Network DLP

Network DLP systems are also referred to as gateway-based systems. They are usually dedicated hardware/software platforms, typically installed on the organization's internet connection, that analyze network traffic (called data in motion), including email, IM, FTP, HTTP, and HTTPS, to search for unauthorized information transmissions. They have the advantage that they are simple to install and provide a relatively low cost of ownership. Network DLP systems can also discover data at rest (data stored throughout the enterprise) to identify areas of risk where confidential data is stored in inappropriate and/or unsecured locations.

Host-based DLP systems

Such systems run on end-user workstations or servers in the organization. Like network-based systems, host-based systems can address internal as well as external communications, and can therefore be used to control information flow between groups or types of users (e.g., 'Chinese walls'). They can also control email and instant messaging communications before they are stored in the corporate archive, such that a blocked communication (i.e., one which was never sent, and therefore not subject to retention rules) will not be identified in a subsequent legal discovery situation.

Host systems have the advantage that they can monitor and control access to physical devices (such as mobile devices with data storage capabilities) and in some cases can access information before it has been encrypted. Some host-based systems can also provide application controls to block attempted transmissions of confidential information, and provide immediate feedback to the user. They have the disadvantage that they need to be installed on every workstation in the network, and cannot be used on mobile devices (e.g., cell phones and PDAs) or on machines where they cannot practically be installed (for example, a workstation in an internet café).

Data Identification

DLP solutions include a number of techniques for identifying confidential or sensitive information. Sometimes confused with discovery, data identification is the process by which organizations use a DLP technology to determine what to look for (in motion, at rest, or in use). DLP solutions use multiple methods for deep content analysis, ranging from keywords, dictionaries, and regular expressions to partial document matching and fingerprinting. The strength of the analysis engine directly correlates to its accuracy, and accuracy is what keeps false positives (legitimate content flagged as sensitive) and false negatives (sensitive content missed) low. Accuracy can depend on many variables, some of which may be situational or technological. Testing a solution against representative data is recommended to confirm that its false positive and false negative rates are acceptably low.
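As a rough illustration of three of the identification methods named above, the following Python sketch combines a keyword dictionary, a regular expression for candidate payment card numbers (validated with the Luhn checksum to reduce false positives), and a simple form of partial document matching based on hashed word shingles. It is a minimal sketch under stated assumptions, not how any particular DLP product works: the keyword list, the card number pattern, the shingle size, and all function names are hypothetical choices made for this example.

```python
import hashlib
import re

# Hypothetical keyword dictionary; a real deployment would supply its own terms.
KEYWORDS = {"confidential", "internal only", "trade secret"}

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Regex match first, then discard candidates that fail the checksum."""
    hits = []
    for match in CARD_PATTERN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def find_keywords(text: str) -> set[str]:
    """Case-insensitive substring search for dictionary terms."""
    lowered = text.lower()
    return {kw for kw in KEYWORDS if kw in lowered}


def shingle_fingerprints(text: str, k: int = 5) -> set[str]:
    """Hash every run of k consecutive words (a 'shingle') in the text."""
    words = re.findall(r"\w+", text.lower())
    return {
        hashlib.sha256(" ".join(words[i:i + k]).encode("utf-8")).hexdigest()
        for i in range(len(words) - k + 1)
    }


def overlap_ratio(protected: str, candidate: str, k: int = 5) -> float:
    """Fraction of the protected document's shingles found in the candidate."""
    protected_fp = shingle_fingerprints(protected, k)
    if not protected_fp:
        return 0.0
    return len(protected_fp & shingle_fingerprints(candidate, k)) / len(protected_fp)


if __name__ == "__main__":
    message = "Confidential: card 4111 1111 1111 1111 on file."
    print(find_keywords(message))
    print(find_card_numbers(message))
```

The checksum step shows one way an analysis engine can trade a little extra work for fewer false positives: a random 16-digit string matches the pattern but usually fails the Luhn test. The overlap ratio would be compared against a threshold chosen by testing on representative data, since too low a threshold raises false positives and too high a threshold raises false negatives.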