
Apr 19, 2016 - 2:42 p.m.

Heuristics File System Secret Search: blueflower


blueflower is a command-line tool that looks for secrets such as private keys or passwords in a file structure. Interesting files are detected using heuristics on their names and on their content. Unlike some forensics tools, blueflower does not search in RAM, and does not attempt to identify cryptographic keys or algorithms in binaries.


  • _search_ in the following types of files:
    • text/* MIME-typed files
    • PDF, DOCX, XLSX documents
    • tar, ZIP archives
    • bzip2, gzip compressed files/archives
  • _detection_ of
    • common key and password containers (SSH id_* , Apple keychain, Java KeyStore, etc.)
    • common encrypted containers (Truecrypt, PGP Disks, GnuPG files, encrypted ZIPs, etc.)
    • executables (PE, ELF, with heuristic packing detection)
    • other interesting files (Bitcoin wallets, PGP policies, etc.)
  • _hiding_ of secrets searched for (names, secret keys, etc.) via a hash file
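
As a rough illustration of content- and name-based detection, the sketch below recognizes a few of the formats listed above from their leading bytes and flags file names of the form id_*. It is not blueflower's code: the function names and the table are hypothetical, and only the magic values themselves are standard.

```python
import bz2
import fnmatch
import gzip

# Leading "magic" bytes of some formats named in the list above.
# The table and the labels are illustrative, not blueflower's own.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"PK\x03\x04", "zip"),  # DOCX and XLSX files are ZIP containers too
    (b"%PDF-", "pdf"),
    (b"\x7fELF", "elf"),
    (b"MZ", "pe"),           # DOS/PE executables start with "MZ"
]


def sniff(header):
    """Return a coarse type label for the first bytes of a file, or None."""
    for magic, label in MAGIC:
        if header.startswith(magic):
            return label
    return None


def looks_like_ssh_key(name):
    """Name heuristic for the SSH id_* pattern (also matches id_rsa.pub)."""
    return fnmatch.fnmatchcase(name, "id_*")


print(sniff(gzip.compress(b"secret")))
print(sniff(bz2.compress(b"secret")))
print(looks_like_ssh_key("id_rsa"))
```

A real detector would combine both signals and, as noted further down, would still produce false positives.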


blueflower is written for Python 2.7. It will not work on Python 3.x.


From the project’s top directory, you can directly run

python directory [hashes]


  • directory is the root of the file structure to explore
  • hashes is an optional file, which should be created with the script (see details below)

Results are written to a log file blueflower-YYYYMMDDhhmmss in CSV format.
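
For reference, a name in the blueflower-YYYYMMDDhhmmss form can be produced with the standard time module. This is a minimal sketch; log_name is a hypothetical helper, and whether blueflower uses local time or UTC is not stated here.

```python
import time


def log_name(when=None):
    """Build a name of the form blueflower-YYYYMMDDhhmmss.

    `when` is a time.struct_time; the current local time is assumed
    when it is omitted.
    """
    if when is None:
        when = time.localtime()
    return "blueflower-" + time.strftime("%Y%m%d%H%M%S", when)


print(log_name())
```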

(Run make clean to remove previous log files as well as .pyc files.)


  • no limit is set on the number of files processed (press ^C to interrupt gracefully)
  • RAR archives nested in other archives are not supported
  • there may be a lot of false positives



Python modules:

To install to the global packages directory:

sudo make install

(omit sudo on Windows)

To install locally (to site.USER_BASE):

make local

(Run make cleanall if you wish to clean up the project’s directory.)

blueflower can then be called from any location, assuming that the binaries are located in a directory included in your PATH.

Hashes file

Let’s say you have a list of strings that you want to search for without revealing them. These could be names, passwords, secret keys, etc. blueflower supports this by taking a list of hashes as an optional argument ( -H hashesfile ). This comes with a performance penalty, since every string matching the given regular expression has to be hashed.
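
The idea can be sketched as follows: every regex match in a text is hashed under a secret key and compared against the set of known hashes, so the secrets themselves never appear in the search input. This is an illustration only. The names are hypothetical, the code targets Python 3 rather than the 2.7 that blueflower requires, and keyed BLAKE2b truncated to 8 bytes stands in for the SipHash function that blueflower uses.

```python
import hashlib
import re


def keyed_hash(key, data):
    # Stand-in keyed hash: BLAKE2b truncated to 8 bytes, NOT SipHash.
    return hashlib.blake2b(data, key=key, digest_size=8).hexdigest()


def find_secrets(text, pattern, key, known_hashes):
    """Hash every regex match in `text`; return the hashes that are known."""
    hits = []
    for match in re.finditer(pattern, text):
        h = keyed_hash(key, match.group(0).encode("utf-8"))
        if h in known_hashes:
            hits.append(h)
    return hits


# Hypothetical inputs for illustration.
key = b"\x00" * 16
known = {keyed_hash(key, b"hunter2")}
text = "login: bob password: hunter2 ok"
print(find_secrets(text, r"[a-z0-9]{4,}", key, known))
```

Here three substrings match the pattern and are hashed, but only one of them is in the known set. The function reports hashes rather than matched strings.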


First, put your secret strings in a text file with one item per line, for example


Then, run

python yourfile

which will prompt you for

  • a regular expression that matches the set of strings
  • a password, which will be needed to run blueflower

This will create a file yourfile.hashes in the same directory as yourfile, which is then given as an argument to blueflower.


The first line of .hashes files contains the regular expression. The second line contains 2 lowercase hex strings separated by a comma (no space). These are generated by , and are respectively

  • a _ salt _ of 8 bytes (16 hex characters), generated using os.urandom(8)
  • a _ verifier _ of 8 bytes, which is the SipHash-2-2 hash of the salt using the key derived from the password
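
The header layout described above can be split and validated with a few lines of Python. This is a sketch under assumptions: parse_hashes is a hypothetical helper, the sample values are made up, and the lines after the header are returned without any checks.

```python
import re

# 8 bytes written as 16 lowercase hex characters.
HEX16 = re.compile(r"[0-9a-f]{16}\Z")


def parse_hashes(lines):
    """Split the lines of a .hashes file into (regex, salt, verifier, rest)."""
    lines = [ln.rstrip("\n") for ln in lines]
    if len(lines) < 2:
        raise ValueError("expected at least two lines")
    regex = lines[0]
    parts = lines[1].split(",")
    if len(parts) != 2 or not all(HEX16.match(p) for p in parts):
        raise ValueError("second line must be <salt>,<verifier> in lowercase hex")
    salt, verifier = parts
    return regex, salt, verifier, lines[2:]


# Made-up sample content, for illustration only.
sample = ["[a-z0-9]+\n", "0011223344556677,8899aabbccddeeff\n", "0123456789abcdef\n"]
print(parse_hashes(sample))
```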

The verifier serves to ensure that the password entered is the correct one, by checking that hashing the salt using the password entered yields a value identical to the verifier. The subsequent lines include the SipHash-2-2 hashes of each of the secret strings, in the same order as received, using the verified key. For example, the first 5 lines of a .hashes file can be



Key derivation

A 128-bit key is derived from the password using SipHash-1000-20000 (the SipHash PRF with 1000 compression rounds and 20000 finalization rounds). This should be slow enough to mitigate bruteforce attacks, and the use of a salt makes precomputation useless.

SipHash-1000-20000 was chosen rather than a dedicated password hashing scheme (bcrypt/scrypt/PBKDF2) for simplicity, to minimize dependencies, and because the GPU-friendliness of SipHash can be compensated by really slow hashing. The hashing speed is mostly independent of the password’s length, since the 20000-iteration bottleneck is the finalization.
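
To make the round parameters concrete, here is a pure-Python sketch of SipHash-c-d following the public SipHash specification. How blueflower feeds the password and salt into the function, and how it obtains 128 bits from a 64-bit output, is not described above, so the sketch only exposes the two round counts. The helper round_count is hypothetical and simply counts SipRounds for a given input length.

```python
import struct

MASK = 0xFFFFFFFFFFFFFFFF  # all arithmetic is on 64-bit words


def _rotl(x, b):
    return ((x << b) | (x >> (64 - b))) & MASK


def _sipround(v0, v1, v2, v3):
    v0 = (v0 + v1) & MASK
    v1 = _rotl(v1, 13) ^ v0
    v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK
    v3 = _rotl(v3, 16) ^ v2
    v0 = (v0 + v3) & MASK
    v3 = _rotl(v3, 21) ^ v0
    v2 = (v2 + v1) & MASK
    v1 = _rotl(v1, 17) ^ v2
    v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash(key, msg, c=2, d=4):
    """SipHash-c-d of `msg` (bytes) under a 16-byte key, as a 64-bit int."""
    k0, k1 = struct.unpack("<QQ", key)
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573
    # Zero-pad to a multiple of 8 bytes; the last byte holds len(msg) mod 256.
    padded = msg + b"\x00" * (7 - len(msg) % 8) + struct.pack("B", len(msg) & 0xFF)
    for (m,) in struct.iter_unpack("<Q", padded):
        v3 ^= m
        for _ in range(c):  # c compression rounds per 8-byte block
            v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m
    v2 ^= 0xFF
    for _ in range(d):  # d finalization rounds, independent of len(msg)
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3


def round_count(msg_len, c, d):
    """Total SipRounds for a message of msg_len bytes."""
    return c * (msg_len // 8 + 1) + d


print(round_count(8, 1000, 20000), round_count(64, 1000, 20000))
```

With c = 1000 and d = 20000, an 8-byte input costs 22000 rounds and a 64-byte input 29000, so the fixed 20000 finalization rounds account for most of the work at typical password lengths.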

Verifier and salt

The presence of the verifier string makes it possible to test the correctness of a key efficiently, and thus to brute-force keys at a high rate by computing many SipHash-2-2 instances in parallel. However, keys are 128 bits long, and thus practically unbreakable. Again, the use of a salt makes precomputation useless.

Using the same salt for both key derivation and verifier generation might be surprising, but it does not reduce security, since different hash functions are used and the unpredictability property is not affected.

Regular expression

The choice of the regular expression leaks information about the secret strings searched for, so choose it carefully: the more general the expression, the less it leaks, but the slower the search (more strings will be hashed).
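
The trade-off can be seen by counting how many substrings each pattern selects for hashing. The sample text and both patterns below are made up for illustration.

```python
import re


def candidates(text, pattern):
    """Return the substrings that would have to be hashed for this pattern."""
    return [m.group(0) for m in re.finditer(pattern, text)]


sample = "user=alice pin=4921 token=9f3a77c2 note=hello"
narrow = candidates(sample, r"\b\d{4}\b")  # reveals that secrets are 4-digit numbers
broad = candidates(sample, r"\w+")         # reveals little, but hashes every word
print(len(narrow), len(broad))
```

The narrow pattern hashes a single candidate but tells a reader of the .hashes file what the secrets look like. The broad pattern hides that, at the cost of hashing every word-like run in the text.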

Log file

The log file only contains the hash corresponding to the secret string detected, not the string matched. The log file thus does not directly reveal the secrets searched for. However,

  • the log file contains the name of the file including the secret string
  • one can easily modify blueflower to include the secrets detected in the log file
