Distributed YARA Malware Scanning System: KLara project

2018-03-10T05:35:36
ID N0WHERE:172601
Type n0where
Reporter N0where
Modified 2018-03-10T05:35:36

Description

The Klara project aims to help Threat Intelligence researchers hunt for new malware using Yara.

To hunt for malware efficiently, one needs a large collection of samples to search. Researchers usually need to run a Yara rule over a collection of malicious files and then retrieve the results. In some cases, the rule needs adjusting and the scan has to be repeated. Unfortunately, scanning a large collection of files takes time. With a custom, distributed architecture, however, scanning 10TB of files can take around 30 minutes.
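The 30-minute figure implies a high aggregate read rate. As a back-of-the-envelope check, assuming decimal units (1 TB = 10^12 bytes) and that every byte is read once (the article states neither):

```latex
\frac{10\ \text{TB}}{30\ \text{min}}
  = \frac{10 \times 10^{12}\ \text{B}}{1800\ \text{s}}
  \approx 5.6 \times 10^{9}\ \text{B/s}
  \approx 5.6\ \text{GB/s}
```

If the work is split evenly across N workers (the article does not give a worker count), each worker would need to sustain roughly 5.6/N GB/s under these assumptions.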

Klara, a distributed system written in Python, allows researchers to scan one or more Yara rules over collections of samples and notifies them by e-mail and through the web interface when scan results are ready.

Features

  • Modern web interface, allowing researchers to “fire and forget” their rules and receive results by e-mail or through the API
  • Powerful API, allowing automated Yara job submission, status checks and result retrieval. API documentation will be released soon.
  • Distributed system, running on commodity hardware
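The “fire and forget” workflow implies a job record that moves from submitted to finished and triggers a notification once results exist. The following is a hypothetical in-memory sketch of that lifecycle; the class, field and status names are illustrative assumptions, not Klara's actual API or database schema.

```python
import itertools
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical Yara scan job as a dispatcher might track it."""
    job_id: int
    rules: str
    repository: str
    owner_email: str
    status: str = "new"  # "new" until a worker reports results, then "finished"
    matches: List[str] = field(default_factory=list)


class JobRegistry:
    """In-memory stand-in for job records that would normally live in the database."""

    def __init__(self, notify: Callable[[Job], None]) -> None:
        self._jobs: Dict[int, Job] = {}
        self._ids = itertools.count(1)
        self._notify = notify  # called once per job, when it finishes

    def submit(self, rules: str, repository: str, owner_email: str) -> int:
        """Register a job and return its id immediately ("fire and forget")."""
        job = Job(next(self._ids), rules, repository, owner_email)
        self._jobs[job.job_id] = job
        return job.job_id

    def status(self, job_id: int) -> str:
        return self._jobs[job_id].status

    def finish(self, job_id: int, matches: List[str]) -> None:
        """Store a worker's results and fire the notification (e.g. an e-mail)."""
        job = self._jobs[job_id]
        job.matches = list(matches)
        job.status = "finished"
        self._notify(job)

    def results(self, job_id: int) -> Optional[List[str]]:
        """Return matches for a finished job, or None while it is still pending."""
        job = self._jobs[job_id]
        return job.matches if job.status == "finished" else None
```

The submitter only keeps the job id; polling `status()` or waiting for the notification callback are two ways of getting the results back, mirroring the API and e-mail paths described above.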

Architecture

Klara leverages Yara’s power, distributing scans using a dispatcher-worker model. Each worker server polls a dispatcher to check whether new jobs are available. If a new job is available, the worker checks whether the required scan repository exists on its own filesystem and, if it does, starts the Yara scan with the rules submitted by the researcher.
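One polling cycle of a worker can be sketched as follows. This is a minimal illustration under stated assumptions, not Klara's code: the dispatcher is an in-memory queue rather than the HTTP service, the job fields (`id`, `repository`, `pattern`) are hypothetical, and `substring_scan` is a plain byte-substring search standing in for Yara (a real worker would compile and run the submitted rules, for example with yara-python's `yara.compile()` and the compiled rules' `match()` method). Handing a job back when the repository is missing is also an assumption; the article only says the worker scans when the repository is present.

```python
import os
import tempfile
from typing import Any, Callable, Dict, List, Optional

Job = Dict[str, Any]  # hypothetical job shape: {"id", "repository", "pattern"}


class InMemoryDispatcher:
    """Hypothetical stand-in for the dispatcher's job queue."""

    def __init__(self, jobs: List[Job]) -> None:
        self.pending: List[Job] = list(jobs)
        self.results: Dict[Any, List[str]] = {}

    def next_job(self) -> Optional[Job]:
        return self.pending.pop(0) if self.pending else None

    def requeue(self, job: Job) -> None:
        self.pending.append(job)

    def report(self, job_id: Any, matches: List[str]) -> None:
        self.results[job_id] = matches


def substring_scan(repo_path: str, pattern: bytes) -> List[str]:
    """Stand-in for a Yara scan: list files under repo_path whose bytes contain pattern."""
    matches = []
    for dirpath, _dirs, files in os.walk(repo_path):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as fh:
                if pattern in fh.read():
                    matches.append(os.path.relpath(path, repo_path))
    return sorted(matches)


def poll_once(dispatcher: InMemoryDispatcher, repo_root: str,
              scan: Callable[[str, bytes], List[str]] = substring_scan) -> str:
    """Run one worker polling cycle and return "idle", "skipped" or "scanned"."""
    job = dispatcher.next_job()
    if job is None:
        return "idle"
    repo_path = os.path.join(repo_root, str(job["repository"]))
    if not os.path.isdir(repo_path):
        # This worker does not hold the repository: hand the job back untouched.
        dispatcher.requeue(job)
        return "skipped"
    dispatcher.report(job["id"], scan(repo_path, job["pattern"]))
    return "scanned"


def make_demo_root(files: Dict[str, bytes]) -> str:
    """Create a temporary repository root from {"repo/file": contents}."""
    root = tempfile.mkdtemp()
    for rel, data in files.items():
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as fh:
            fh.write(data)
    return root
```

For example, on a worker whose root holds only `repo_a`, a job for `repo_b` is returned to the queue ("skipped"), while a job for `repo_a` is scanned and its matches reported to the dispatcher.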

The main problem Klara tries to solve is running Yara jobs over a large collection of malware samples (>1TB) in a reasonable amount of time.

Installing Klara

Requirements for running Klara:

  • GNU/Linux (we recommend Ubuntu 16.04 or the latest LTS)
  • MySQL / MariaDB database
  • Python 2.7
  • Python virtualenv package
  • Yara (installed on workers)

Installing Klara consists of 4 parts:

  • Database installation
  • Worker installation
  • Dispatcher installation
  • Web interface installation

The components are connected to each other as follows:

                              +----------+          +----------------+
                              |          |          |                |
                  +---------->+ Database +<--+      |     nginx      |
                  |           |          |   |      |   (optional)   |
                  |           +----------+   |      |                |   
           +------+------+                   |      +-------+--------+   
           |             |                   |              |             
    +----->|  Dispatcher | <---+             |              |            
    |      |             |     |             |              |            
    |      +------+------+     |             |              v            
    |             |            |             |      +-------+--------+
    |             |            |             |      |                |
    |             |            |             |      |                |
+---+----+   +----+---+   +----+---+         ^------+   Web server   |
|        |   |        |   |        |                |                |
| Worker |   | Worker |   | Worker |                |                |
|        |   |        |   |        |                +----------------+
+--------+   +--------+   +--------+

Workers connect to the Dispatcher using a simple HTTP REST API. The Dispatcher and the Web server connect to the MySQL / MariaDB database over TCP. Because of this, components can be installed on separate machines / VMs. The only requirement is that TCP connections are allowed between them.
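Since TCP connectivity is the only requirement between components, a quick reachability check run from each machine can catch firewall problems early. A minimal sketch using Python's standard `socket` module; the host names are hypothetical, 3306 is the default MySQL / MariaDB port, and a worker-to-dispatcher entry would use whatever port your dispatcher is configured to listen on.

```python
import socket
from typing import Dict, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure or timeout
        return False


def check_links(links: Dict[str, Tuple[str, int]], timeout: float = 3.0) -> Dict[str, bool]:
    """Check every named link and report which ones accept TCP connections."""
    return {name: tcp_reachable(host, port, timeout) for name, (host, port) in links.items()}


# Hypothetical host names; 3306 is the default MySQL / MariaDB port.
EXAMPLE_LINKS = {
    "dispatcher -> database": ("db.example.internal", 3306),
    "web server -> database": ("db.example.internal", 3306),
}
```

Running `check_links(EXAMPLE_LINKS)` (with real host names) from the dispatcher and web server hosts shows which links are blocked; only a successful TCP handshake is tested, not database credentials or the HTTP API itself.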
