Simovits

Strelka: Let us build a scanner

In this blog post, we’ll take a short look at what Strelka is and how it processes a file, and then we’ll build a new scanner for it.

Disclaimer

The code and integration of this solution aren’t battle-tested. This setup is mainly a guide for building and adding functionality to the Strelka solution. Use this at your own risk.

Introduction to Strelka

Strelka [1] is a modular data scanning platform that allows users or systems to submit files for analysis, extraction, and reporting of file content and metadata. The Strelka platform is inspired by Lockheed Martin’s Laika BOSS [2] solution.

What Strelka Actually Does

Strelka is a file analysis framework that runs in a Docker environment. Its purpose is to perform file extraction and metadata collection at enterprise scale. Let us walk through how Strelka handles a file scan:

The Initial Tasting

First, Strelka runs what it calls “taste” YARA rules against your file. This is its first phase. These specialized rules help Strelka quickly determine the file type, regardless of extension or header tricks malware might use. The “tasting” rules live in the taste.yara file, which is configured in backend.yaml. This file contains signatures for file types, for example the MZ header and BMP files.

The matching rule then adds a tag for the file type (a “flavor”, in the wording of backend.yaml) that the scanners can read.
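To make the idea concrete, here is a minimal sketch of type identification by magic bytes. It is plain Python rather than YARA, it is not Strelka’s actual code, and the flavor names are assumptions chosen for illustration:

```python
# Illustrative stand-in for the YARA-based "taste" step: plain magic-byte checks.
# The flavor names are assumptions for this sketch, not taken from taste.yara.
MAGIC_SIGNATURES = {
    "mz_file": b"MZ",   # DOS/PE executables start with the bytes "MZ"
    "bmp_file": b"BM",  # BMP images start with the bytes "BM"
}


def taste(data: bytes) -> list[str]:
    """Return the flavors whose magic bytes match the start of the data."""
    return [name for name, magic in MAGIC_SIGNATURES.items() if data.startswith(magic)]


if __name__ == "__main__":
    print(taste(b"MZ\x90\x00\x03"))  # header of a PE-like file
    print(taste(b"plain text"))      # no known signature
```

Because the check looks at the content and not the file name, renaming an executable to .txt does not change the result.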

The Scanner Selection

Based on this initial identification, Strelka dynamically selects which specialized scanners to run. It maintains a catalogue of over 50 scanners for common file types.

The scanners can then read the MIME type or the tags produced by the initial file identification step.
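The selection step can be sketched roughly as follows. The mapping is a simplified, hypothetical stand-in for what backend.yaml expresses, and the real selection logic in Strelka supports more than this (for example negative matches and priorities):

```python
# Hypothetical, simplified mapping in the spirit of backend.yaml: each scanner
# lists the flavors it accepts, and "*" means "run on every file".
SCANNER_FLAVORS = {
    "ScanPe": ["mz_file"],
    "ScanZip": ["zip_file"],
    "ScanClamav": ["*"],
}


def select_scanners(file_flavors: set[str]) -> list[str]:
    """Return the scanners whose accepted flavors overlap with the file's flavors."""
    selected = []
    for scanner, wanted in SCANNER_FLAVORS.items():
        if "*" in wanted or file_flavors.intersection(wanted):
            selected.append(scanner)
    return selected


if __name__ == "__main__":
    print(select_scanners({"mz_file"}))
    print(select_scanners({"text/plain"}))
```

A wildcard scanner such as the ClamAV scanner built later in this post is selected for every file, regardless of its flavors.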

The Recursive Dive

When Strelka spots a container file like a ZIP or email, it doesn’t just analyse the container, it extracts every embedded file and puts each one back through the entire analysis pipeline.
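A minimal sketch of this recursion, using only ZIP archives and the Python standard library, could look like the following. The function names and the depth limit are assumptions for illustration; Strelka’s own implementation differs:

```python
import io
import zipfile


def walk(data: bytes, name: str = "root", depth: int = 0, max_depth: int = 5):
    """Yield (depth, name, size) for a file and everything nested inside it."""
    yield depth, name, len(data)
    # Only ZIP containers are unpacked in this sketch. The depth limit guards
    # against runaway recursion on deeply nested archives.
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            # Each extracted file goes back through the same function.
            yield from walk(archive.read(info), info.filename, depth + 1, max_depth)


def make_zip(files: dict[str, bytes]) -> bytes:
    """Helper for the demo: build an in-memory ZIP from a name -> content mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for filename, content in files.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


if __name__ == "__main__":
    inner = make_zip({"a.txt": b"hello"})
    outer = make_zip({"inner.zip": inner, "b.txt": b"hi"})
    for depth, name, size in walk(outer):
        print("  " * depth + f"{name} ({size} bytes)")
```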

The Pattern Recognition with YARA

Finally, Strelka applies its full YARA ruleset to identify known malicious patterns. Unlike antivirus signatures that look for exact matches, these rules can spot techniques and behaviours.

The Final Report

The final output of a Strelka run is JSON. Strelka can be used as one step in a file-scanning process; for example, Security Onion uses Strelka [3] to analyse files extracted by Zeek or Suricata.

Strelka also provides a UI, so it’s possible to analyse a file by uploading it there and viewing the result.

Scanner implementation

Why Add ClamAV to Strelka?

This is a good exercise for learning how to build Strelka scanners, and it adds the capability of an antivirus scanner to Strelka. ClamAV [4] is an open-source antivirus engine.

Building a ClamAV Strelka Scanner

Here’s our implementation of the ClamAV scanner for Strelka. In this solution we’re using clamav-client [5]. The file should be added to the folder src\python\strelka\scanners and named, for example, scan_clamav.py. We’re running ClamAV as a separate container and connecting to the TCP socket of clamd; the configuration of this container is shown later in the post. The clamd daemon listens for incoming connections on a Unix and/or TCP socket and scans files or directories on demand. It is an optional feature of ClamAV that gives local or remote hosts access to ClamAV’s scanning engine through the clamd protocol.
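For readers curious about what happens on the wire, the sketch below talks to clamd directly using its INSTREAM command. It is not what clamav-client does internally line for line, and the function names and default host are assumptions that match the setup in this post:

```python
import socket
import struct


def instream_frames(data: bytes, chunk_size: int = 4096) -> bytes:
    """Build the byte stream for clamd's INSTREAM command."""
    frames = [b"zINSTREAM\0"]  # the "z" prefix marks a null-terminated command
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Each chunk is prefixed with its length as a 4-byte big-endian integer.
        frames.append(struct.pack("!I", len(chunk)) + chunk)
    # A zero-length chunk tells clamd that the stream is complete.
    frames.append(struct.pack("!I", 0))
    return b"".join(frames)


def scan_bytes(data: bytes, host: str = "clamav", port: int = 3310, timeout: float = 60.0) -> str:
    """Send data to clamd and return its raw reply, e.g. 'stream: OK'."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(instream_frames(data))
        reply = b""
        while not reply.endswith(b"\0"):
            part = conn.recv(4096)
            if not part:
                break
            reply += part
    return reply.rstrip(b"\0").decode("utf-8", errors="replace")
```

Calling scan_bytes requires a reachable clamd instance. In the scanner below, the clamav-client library handles this communication for us.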

import os
import time
import tempfile

from clamav_client import get_scanner
from clamav_client.clamd import BufferTooLongError, CommunicationError

from strelka import strelka


class ScanClamav(strelka.Scanner):
    """
    Scans files with the ClamAV antivirus daemon.
    
    Options:
        host: Hostname for ClamAV daemon connection.
            Defaults to 'clamav' as defined in the docker-compose.
        port: Port for ClamAV daemon connection.
            Defaults to 3310.
        max_bytes: Maximum number of bytes to scan.
            Defaults to 20000000 (20MB).
        include_passed: Include clean results in output.
            Defaults to False.
        tempfile_directory: Directory for temporary files.
            Defaults to '/tmp/'.
    """

    def scan(self, data, file, options, expire_at):
        """
        Scans the file data with ClamAV.
        """
        start = time.time()
        self.event = {}
        self.flags = []

        # Get configuration options
        host = options.get("host", "clamav")
        port = options.get("port", 3310)
        max_bytes = options.get("max_bytes", 20000000)
        include_passed = options.get("include_passed", False)
        tempfile_directory = options.get("tempfile_directory", "/tmp/")
        
        # TCP address format
        address = f"{host}:{port}"

        try:
            # Initialize scanner
            try:
                scanner = get_scanner({
                    "backend": "clamd",
                    "address": address,
                    "stream": True,
                    "timeout": 60.0,
                })
                
                # Check scanner connectivity
                scanner_info = scanner.info()
                self.event["scanner_info"] = {
                    "name": scanner_info.name,
                    "version": scanner_info.version,
                }
                if scanner_info.virus_definitions:
                    self.event["scanner_info"]["virus_definitions"] = scanner_info.virus_definitions
            except Exception as e:
                self.flags.append("clamd_initialization_error")
                self.event["error"] = f"Scanner initialization error: {str(e)}"
                self.event["elapsed"] = time.time() - start
                return
            
            # Apply max_bytes limit if needed
            if max_bytes > 0 and len(data) > max_bytes:
                scan_data = data[:max_bytes]
                self.flags.append("max_bytes_limited")
            else:
                scan_data = data
            
            # Create a temporary file for scanning
            scan_result = None
            with tempfile.NamedTemporaryFile(dir=tempfile_directory, delete=False) as temp_file:
                try:
                    temp_file.write(scan_data)
                    temp_file.flush()
                    temp_file_path = temp_file.name
                
                    # Scan the file
                    scan_result = scanner.scan(temp_file_path)
                finally:
                    # Clean up the temp file
                    try:
                        os.unlink(temp_file_path)
                    except (OSError, NameError):
                        pass
            
            # Process the scan results
            if scan_result:
                if scan_result.state == "OK":
                    # No virus found
                    self.event["scan_result"] = "clean"
                    if include_passed:
                        self.event["virus_name"] = None
                
                elif scan_result.state == "FOUND":
                    # Virus found
                    virus_name = scan_result.details
                    self.flags.append(f"virus_detected:{virus_name}")
                    self.event["scan_result"] = "infected"
                    self.event["virus_name"] = virus_name
                
                elif scan_result.state == "ERROR":
                    # Error during scanning
                    self.flags.append(f"clamd_error:{scan_result.details}")
                    self.event["scan_result"] = "error"
                    self.event["error"] = scan_result.details
                
                # Add scan_passed field based on the ScanResult.passed property
                if scan_result.passed is not None:
                    self.event["scan_passed"] = scan_result.passed
            else:
                # No scan result available
                self.flags.append("no_scan_result")
                self.event["scan_result"] = "error"
                self.event["error"] = "No scan result returned"
            
        except strelka.ScannerTimeout:
            raise
        except BufferTooLongError as e:
            self.flags.append("buffer_too_long")
            self.event["error"] = f"Buffer too long: {str(e)}"
        except CommunicationError as e:
            self.flags.append("clamd_communication_error")
            self.event["error"] = f"Communication error: {str(e)}"
        except ValueError as e:
            self.flags.append("value_error")
            self.event["error"] = f"Value error: {str(e)}"
        except IOError as e:
            self.flags.append("io_error")
            self.event["error"] = f"IO error: {str(e)}"
        except Exception as e:
            self.flags.append("scan_error")
            self.event["error"] = f"Unexpected error: {str(e)}"
        
        # Add elapsed time
        self.event["elapsed"] = time.time() - start

Make the solution work

To integrate our scanner with Strelka, we need to:

  1. Add to Docker Compose: Include ClamAV in our deployment
  2. Update the Strelka Backend Configuration: Register our scanner
  3. Update the poetry project files: Add the clamav-client dependency

Docker Compose Update

Add the following to your docker-compose.yaml file:

  clamav:
    image: clamav/clamav:latest
    container_name: strelka-clamav
    #volumes:
      # - clamav_data:/var/lib/clamav #Add if you need to persist the clamav db between restarts
    ports:
      - "3310:3310"
    environment:
      - CLAMAV_NO_FRESHCLAMD=false  # Keep freshclam running for automatic updates
    healthcheck:
      test: ["CMD", "clamdscan", "--ping", "1", "-c", "/etc/clamav/clamd.conf"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    networks:
      - net

Configuration update

Add the following to your backend.yaml:

  'ScanClamav':
    - positive:
        flavors:
          - "*"   # Make it scan all files
      priority: 5
      options:
        host: "clamav"
        port: 3310
        tempfile_directory: "/dev/shm/"
        max_bytes: 20000000
        include_passed: false
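The option names in backend.yaml must match the keys the scanner reads with options.get. The helper below is a hypothetical sketch that mirrors those calls from the scanner, to show that a key the scanner does not read is silently ignored and the default is used instead:

```python
def resolve_options(options: dict) -> dict:
    """Mirror the scanner's options.get(...) calls with the same defaults."""
    return {
        "host": options.get("host", "clamav"),
        "port": options.get("port", 3310),
        "max_bytes": options.get("max_bytes", 20000000),
        "include_passed": options.get("include_passed", False),
        "tempfile_directory": options.get("tempfile_directory", "/tmp/"),
    }


if __name__ == "__main__":
    # A correctly named key overrides the default.
    print(resolve_options({"host": "av.internal"})["host"])
    # A misnamed key ("hostname" is a made-up example) is ignored without any error.
    print(resolve_options({"hostname": "av.internal"})["host"])
```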


Update the poetry project files

Add the dependency clamav-client = "0.6.3" to pyproject.toml and generate a new poetry.lock file.

Example results from the ClamAV scanner

The ClamAV scanner produces results like the following. The examples cover one sample with a clean result and one with an infected result:

Scan with clean result from ClamAV:

Scan with non-clean result from ClamAV:
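As a rough guide to the shape of these results, the function below mirrors the result-handling branch of the scanner code above. It is a sketch only: the signature name is a made-up placeholder, and the scanner_info, scan_passed and elapsed fields that the real scanner also emits are left out:

```python
import json


def build_event(state: str, details=None):
    """Mirror the result-handling branch of ScanClamav.scan for one scan state."""
    event, flags = {}, []
    if state == "OK":
        event["scan_result"] = "clean"
    elif state == "FOUND":
        flags.append(f"virus_detected:{details}")
        event["scan_result"] = "infected"
        event["virus_name"] = details
    elif state == "ERROR":
        flags.append(f"clamd_error:{details}")
        event["scan_result"] = "error"
        event["error"] = details
    return event, flags


if __name__ == "__main__":
    # "Example.Signature" is a hypothetical placeholder, not a real ClamAV signature.
    for state, details in [("OK", None), ("FOUND", "Example.Signature")]:
        event, flags = build_event(state, details)
        print(json.dumps({"flags": flags, "event": event}, indent=2))
```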

Conclusion

In this blog post we showed how to implement a ClamAV scanner in Strelka. This was a way for me to learn how to add a new scanner to Strelka; hopefully you learned something as well. The implementation is quite simple and adds ClamAV as a separate Docker container. ClamAV could also be built into the Strelka backend if needed; if that solution is chosen, the scanner’s code and configuration need some small modifications to work. Also, this implementation doesn’t include any tests. I had some problems with this, because the tests run the implementation, which calls the scanner, and the scanner in turn calls the clamd TCP socket, which isn’t available when the tests run.
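One possible way around the missing clamd socket in tests is to replace the object returned by get_scanner with a fake. The sketch below shows the idea in a self-contained form; the summarize function is a simplified stand-in for the scanner logic, and both helper names are assumptions for illustration:

```python
from types import SimpleNamespace
from unittest import mock


def summarize(scanner, path: str) -> dict:
    """Tiny stand-in for the scanner logic: call scan() and map the state."""
    result = scanner.scan(path)
    return {"infected": result.state == "FOUND", "virus_name": result.details}


def make_fake_scanner(state: str, details=None):
    """Return a mock whose scan() result has the attributes the scanner code reads."""
    fake = mock.Mock()
    fake.scan.return_value = SimpleNamespace(
        state=state, details=details, passed=(state == "OK")
    )
    return fake


if __name__ == "__main__":
    fake = make_fake_scanner("FOUND", "Example.Signature")  # placeholder name
    print(summarize(fake, "/tmp/sample"))
```

In a real test, mock.patch could be used to swap get_scanner in the scanner module for a function that returns such a fake, so no network connection is made. The exact module path to patch depends on where the scanner file is placed.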

References

  1. https://target.github.io/strelka/
  2. https://github.com/lmco/laikaboss
  3. https://docs.securityonion.net/en/2.4/strelka.html
  4. https://docs.clamav.net/
  5. https://github.com/artefactual-labs/clamav-client