# Steganography in contemporary cyberattacks

2017-08-03T09:00:22
ID SECURELIST:56FF2E253856D1167848A57EA1C56F6F
Type securelist
Reporter Alexey Shulmin
Modified 2017-08-03T09:00:22

## Statistical methods of analysis: histogram method

This method was suggested in 2000 by Andreas Westfeld and Andreas Pfitzmann, and is also known as the chi-squared method. Below we give a brief overview.

The entire image raster is analyzed. For each color, the number of dots possessing that color is counted within the raster. (For simplicity, we are dealing with an image with one color plane.) This method assumes that the number of pixels possessing two adjacent colors (i.e. colors different only by one least significant bit) differs substantially for a regular image that does not contain an embedded payload (see Figure A below). For a carrier image with a payload, the number of pixels possessing these colors is similar (see Figure B).

Figure A. An empty carrier | Figure B. A filled carrier.

The above is an easy way to visually represent this algorithm.

Strictly speaking, the algorithm consists of the following steps that must be executed sequentially:

• The expected occurrence frequency for the pixels of color i in a payload-embedded image is calculated as follows:
• The measured frequency of the occurrence of a pixel of specific color is determined as:
• The chi-squared criterion for _k-1 _degrees of freedom is calculated as:
• P is the probability that the distributions ni and ni* are equal under these conditions. It is calculated by integrating the density function:

Naturally, we have tested whether this method is suitable for detecting filled stego-containers. Here are the results.

Original image | Visual attack image | Chi-squared attack, 10 zones
---|---|---
| |
| |

The threshold values of the chi-squared distribution for p=0.95 and p=0.99 are 101.9705929 and 92.88655838 respectively. Thus, for the zones where the calculated chi-squared values are lower than the threshold, we can accept the original hypothesis "adjacent colors have similar frequency distributions, therefore we are dealing with a carrier image with a payload".

Indeed, if we look at the visual attack images, we can clearly see that these zones contain an embedded payload. Thus, this method works for high-entropy payloads.

## Statistical methods of analysis: RS method

Another statistical method of detecting payload carriers was suggested by Jessica Fridrich, Miroslav Goljan and Andreas Pfitzmann in 2001. It is called the RS method, where RS stands for 'regular/singular'.

The analyzed image is divided into a set of pixel groups. A special flipping procedure is then applied for each group. Based on the values of the discriminant function before and after the flipping procedure is applied, all groups are divided into regular, singular and unusable groups.

This algorithm is based on the assumption that the number of regular and singular pixel groups must be approximately equal in the original image and in the image after flipping is applied. If the numbers of these groups change appreciably after flipping is applied, this indicates that the analyzed image is a carrier with a payload.

The algorithm consists of the following steps:

• The original image is divided into groups of _n _pixels (x1, …, xn).
• The so-called discriminant function is defined which assigns to each group of pixels G = (x1, …, xn) a real number f(x1, …, xn) ∈
• The discriminant function for the groups of pixels (x1, …, xn) can be defined as follows:
• Then we define the flipping function which has the following properties:

Depending on the discriminant function's values prior to and after flipping is applied, all groups of pixels are divided into regular, singular and unusable groups:

We have put this method to the test as well, and obtained the following results. We used the same empty and payload-embedded carriers as in the previous test.

Original image | Visual attack image | Chi-squared attack, 10 zones
---|---|---
| |
| |

Note that this attack method does not pass the binary verdict in terms of "whether this specific carrier contains an embedded payload or not"; rather, it determines the approximate length of the embedded payload (as a percentage).

As can be seen from the results above, this method returned a verdict for the empty message that it was filled less than 1% with payload, and for the payload-embedded carrier it returned a verdict that it was about 44% filled. Obviously, these results are slightly off. Let's look at the filled container: from the visual attack it follows that more than 50% of the container is filled, while the RS attack tells us that 44% of the container is filled. Therefore, we can apply this method if we establish a certain "trigger threshold": our experiments showed that 10% is a sufficient threshold of reliability. If the RS attack claims that more than 10% of the container is full, you can trust this verdict and mark the container as full.

Now it's time to test these two methods in real-world conditions, on the Zero.T carriers in which the payload has regular entropy.

We ran the appropriate tests and here are the results:

Original image | Chi-squared attack | RS attack
---|---|---
| |
| |

As we see, a chi-squared attack is not applicable on low entropy images – it yields unsatisfactory or inaccurate results. However, the RS attack worked well: in both cases, it detected a hidden payload in the image. However, what do we do if automatic analysis methods show there is no payload, but we still suspect there might be one?

In that case, we can apply specific procedures that have been developed for specific malware families to extract the payload. For the aforementioned Zero.T loader, we have written our own embedded payload extraction tool. Its operation can be schematically presented as follows.

| + |
---|---|---
=

Obviously, if we get a valid result (in this specific case, an executable file), then the source image has an embedded payload in it.

## Is DNS tunneling also steganography?

Can we consider use of a DNS tunnel a subtype of steganography? Yes, definitely. For starters, let's recap on how a DNS tunnel works.

From a user computer in a closed network, a request is sent to resolve a domain, for example the domain wL8nd3DdINcGYAAj7Hh0H56a8nd3DdINcGYAlFDHBurWzMt[.]imbadguy[.]com to an IP address. (In this URL, the second-level domain name is not meaningful.) The local DNS server forwards this request to an external DNS server. The latter, in turn, does not know the third-level domain name, so it passes this request forward. Thus, this DNS request follows a chain of redirections from one DNS server to another, and reaches the DNS server of the domain imbadguy[.]com.

Instead of resolving a DNS request at the DNS server, threat actors can extract the information they require from the received domain name by decoding its first part. For example, information about the user's system can be transmitted in this way. In response, a threat actor's DNS server also sends some information in a decoded format, putting it into the third- or higher-level domain name.

This means the attacker has 255 characters in reserve for each DNS resolution, up to 63 characters for subdomains. 63 characters' worth of data is sent in each DNS request, and 63 characters are sent back in response, and so on. This makes it a decent data communications channel! Most importantly, it is concealed communication, as an unaided eye cannot see that any extra data is being communicated.

To specialists who are familiar with network protocols and, in particular, with DNS tunneling, a traffic dump containing this sort of communication will look quite suspicious – it will contain too many long domains that get successfully resolved. In this specific case, we are looking at the real-life example of traffic generated by the Trojan Backdoor.Win32.Denis, which uses a DNS tunnel as a concealed channel to communicate with its C&C.

A DNS tunnel can be detected with the help of any popular intrusion detection (IDS) tool such as Snort, Suiricata or BRO IDS. This can be done using various methods. For example, one obvious idea is to use the fact that domain names sent for DNS resolution are much longer than usual during tunneling. There are quite a few variations on this theme on the Internet:

alert udp any any -> any 53 (msg:"Large DNS Query, possible cover channel"; content:"|01 00 00 01 00 00 00 00 00 00|"; depth:10; offset:2; dsize:>40; sid:1235467;)

There is also this rather primitive approach:

Alert udp \$HOME_NET and -> any 53 (msg: "Large DNS Query"; dsize: >100; sid:1234567;)

There is plenty of room for experimenting here, trying to find a balance between the number of false positives and detecting instances of actual DNS tunneling.

Apart from suspiciously long domain names, what other factors may be useful? Well, anomalous syntax of domain names is another factor. All of us have some idea of what typical domain names look like – they usually contain letters and numbers. But if a domain name contains Base64 characters, it will look pretty suspicious, won't it? If this sort of domain name is also quite long, then it is clearly worth a closer look.

Many more such anomalies can be described. Regular expressions are of great help in detecting them.

We would like to note that even such a basic approach to detecting DNS tunnels works very well. We applied several of these rules for intrusion detection to the stream of malware samples sent to Kaspersky Lab for analysis, and detected several new, previously unknown backdoors that used DNS tunnels as a covert channel for C&C communication.

## Conclusions

We are seeing a strong upward trend in malware developers using steganography for different purposes, including for concealing C&C communication and for downloading malicious modules. This is an effective approach considering payload detection tools are probabilistic and expensive, meaning most security solutions cannot afford to process all the objects that may contain steganography payloads.

However, effective solutions do exist – they are based on combinations of different methods of analysis, prompt pre-detections, analysis of meta-data of the potential payload carrier, etc. Today, such solutions are implemented in Kaspersky Lab's Anti-Targeted Attack solution (KATA). With KATA deployed, an information security officer can promptly find out about a possible targeted attack on the protected perimeter and/or the fact that data is being exfiltrated.