Strelka: Let us build a scanner
In this blog post, we’ll first go through some background on Strelka and then create a new scanner for it.
Disclaimer
The code and integration of this solution isn’t battle tested. This setup is mainly a guide for building and adding functionality to the Strelka solution. Use this at your own risk.
Introduction to Strelka
Strelka [1] is a modular data scanning platform that allows users or systems to submit files for analysis, extraction, and reporting of file content and metadata. The Strelka platform is inspired by Lockheed Martin’s Laika BOSS [2].

What Strelka Actually Does
Strelka is a file analysis framework that runs in a Docker environment. Its purpose is to perform file extraction and metadata collection at enterprise scale. Let us go through how a file scan is handled in Strelka:
The Initial Tasting
First, Strelka runs what it calls "taste" YARA rules against your file. This is its first phase. These specialized rules help Strelka quickly determine the file type, regardless of the extension or header tricks malware might use. The tasting rules live in the taste.yara file, which is configured in backend.yaml. This file contains signatures for file types, for example the MZ header of Windows executables and the header of a BMP file.

Each matching rule adds a file type tag to the file, which the scanners can read later.
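To make the idea of tasting concrete, here is a minimal Python sketch of magic-byte identification. It is not how Strelka does it (Strelka uses YARA rules for this step), and the tag names `mz_file` and `bmp_file` are assumptions chosen for illustration.

```python
# Illustrative only: Strelka's real tasting uses YARA rules in taste.yara.
# The tag names below are assumptions made for this sketch.
MAGIC_TAGS = [
    (b"MZ", "mz_file"),   # DOS/PE executables start with the bytes "MZ"
    (b"BM", "bmp_file"),  # Windows bitmaps start with the bytes "BM"
]


def taste(data: bytes) -> list:
    """Return the tags whose magic bytes match the start of the data."""
    return [tag for magic, tag in MAGIC_TAGS if data.startswith(magic)]


print(taste(b"MZ\x90\x00"))      # ['mz_file']
print(taste(b"BM\x36\x00\x00"))  # ['bmp_file']
print(taste(b"plain text"))      # []
```

Because only the content is inspected, renaming a file's extension has no effect on the result.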

The Scanner Selection
Based on this initial identification, Strelka dynamically selects which specialized scanners to run. It maintains a catalogue of over 50 scanners for common file types, for example:
- ScanPe for Windows executables
- ScanOle for Microsoft Office documents
- ScanPdf for PDF analysis
- ScanZip for compressed archives
- ScanJavascript for web scripts
The scanners can then read the MIME type or the tags assigned during the initial file identification step.
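The selection step can be pictured as a lookup from tags to scanners. The sketch below is a simplification under stated assumptions: the table and the tag names are hypothetical, and the real assignment logic in Strelka is driven by backend.yaml and supports more than this.

```python
# Hypothetical assignment table: scanner name -> tags it accepts.
# "*" is treated as a wildcard that matches every file.
ASSIGNMENTS = {
    "ScanPe": ["mz_file"],
    "ScanPdf": ["pdf_file"],
    "ScanZip": ["zip_file"],
    "ScanHash": ["*"],
}


def select_scanners(tags, assignments=ASSIGNMENTS):
    """Return the scanners whose accepted tags overlap with the file's tags."""
    selected = []
    for name, wanted in assignments.items():
        if "*" in wanted or set(wanted) & set(tags):
            selected.append(name)
    return selected


print(select_scanners(["mz_file"]))  # ['ScanPe', 'ScanHash']
print(select_scanners(["unknown"]))  # ['ScanHash']
```

A wildcard entry is what allows a general-purpose scanner to see every file, whatever its type.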

The Recursive Dive
When Strelka spots a container file such as a ZIP archive or an email, it doesn’t just analyse the container. It extracts every embedded file and puts each one back through the entire analysis pipeline.
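The recursion can be sketched with nothing but the Python standard library. This is only an illustration of the principle for ZIP files, with a hypothetical `walk` helper and a depth limit added as an assumption; Strelka's own extraction covers many more container formats.

```python
import io
import zipfile


def walk(data: bytes, name: str = "root", depth: int = 0, max_depth: int = 5):
    """Yield (depth, name) for a file and every file nested inside ZIP archives."""
    yield depth, name
    if depth >= max_depth or not zipfile.is_zipfile(io.BytesIO(data)):
        return
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for member in archive.namelist():
            if member.endswith("/"):  # skip directory entries
                continue
            # Each extracted file goes back through the same function.
            yield from walk(archive.read(member), member, depth + 1, max_depth)


def make_zip(files: dict) -> bytes:
    """Build an in-memory ZIP archive from a {name: content} mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for member, content in files.items():
            archive.writestr(member, content)
    return buffer.getvalue()


inner = make_zip({"note.txt": b"hello"})
outer = make_zip({"inner.zip": inner, "readme.txt": b"hi"})
for depth, name in walk(outer, "outer.zip"):
    print("  " * depth + name)
```

The depth limit matters in practice: without one, a deliberately nested archive could keep the pipeline busy indefinitely.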
The Pattern Recognition with YARA
Finally, Strelka applies its full YARA ruleset to identify known malicious patterns. Unlike antivirus signatures that look for exact matches, these rules can spot techniques and behaviours.
The Final Report
The final output of a Strelka run is a JSON document. Strelka can be used as one step in a file scanning process; for example, Security Onion uses Strelka [3] to analyse files extracted by Zeek or Suricata.
Strelka also provides a UI, so it’s possible to upload a file there and view the result.

Scanner implementation
Why Add ClamAV to Strelka?
Adding ClamAV is a good exercise for learning how to build Strelka scanners, and it gives the platform the capability of an antivirus scanner. ClamAV [4] is an open-source antivirus engine.
Building a ClamAV Strelka Scanner
Here’s our implementation of the ClamAV scanner for Strelka. In this solution we’re using clamav-client [5]. The scanner should be added to the folder src/python/strelka/scanners and be named, for example, scan_clamav.py. We run ClamAV as a separate container and connect to the TCP socket of clamd; the configuration of this container is covered later in this post. The clamd daemon listens for incoming connections on a Unix and/or TCP socket and scans files or directories on demand. Clamd is an optional feature of ClamAV that gives local or remote hosts access to ClamAV’s scanning engine through the clamd protocol.
import os
import time
import tempfile

from clamav_client import get_scanner
from clamav_client.clamd import BufferTooLongError, CommunicationError

from strelka import strelka


class ScanClamav(strelka.Scanner):
    """
    Scans files with the ClamAV antivirus daemon.

    Options:
        host: Hostname for ClamAV daemon connection.
            Defaults to 'clamav' as defined in the docker-compose.
        port: Port for ClamAV daemon connection.
            Defaults to 3310.
        max_bytes: Maximum number of bytes to scan.
            Defaults to 20000000 (20MB).
        include_passed: Include clean results in output.
            Defaults to False.
        tempfile_directory: Directory for temporary files.
            Defaults to '/tmp/'.
    """

    def scan(self, data, file, options, expire_at):
        """
        Scans the file data with ClamAV.
        """
        start = time.time()
        self.event = {}
        self.flags = []

        # Get configuration options
        host = options.get("host", "clamav")
        port = options.get("port", 3310)
        max_bytes = options.get("max_bytes", 20000000)
        include_passed = options.get("include_passed", False)
        tempfile_directory = options.get("tempfile_directory", "/tmp/")

        # TCP address format
        address = f"{host}:{port}"

        try:
            # Initialize scanner
            try:
                scanner = get_scanner({
                    "backend": "clamd",
                    "address": address,
                    "stream": True,
                    "timeout": 60.0,
                })

                # Check scanner connectivity
                scanner_info = scanner.info()
                self.event["scanner_info"] = {
                    "name": scanner_info.name,
                    "version": scanner_info.version,
                }
                if scanner_info.virus_definitions:
                    self.event["scanner_info"]["virus_definitions"] = scanner_info.virus_definitions
            except Exception as e:
                self.flags.append("clamd_initialization_error")
                self.event["error"] = f"Scanner initialization error: {str(e)}"
                self.event["elapsed"] = time.time() - start
                return

            # Apply max_bytes limit if needed
            if max_bytes > 0 and len(data) > max_bytes:
                scan_data = data[:max_bytes]
                self.flags.append("max_bytes_limited")
            else:
                scan_data = data

            # Create a temporary file for scanning
            scan_result = None
            with tempfile.NamedTemporaryFile(dir=tempfile_directory, delete=False) as temp_file:
                try:
                    temp_file.write(scan_data)
                    temp_file.flush()
                    temp_file_path = temp_file.name

                    # Scan the file
                    scan_result = scanner.scan(temp_file_path)
                finally:
                    # Clean up the temp file
                    try:
                        os.unlink(temp_file_path)
                    except (OSError, NameError):
                        pass

            # Process the scan results
            if scan_result:
                if scan_result.state == "OK":
                    # No virus found
                    self.event["scan_result"] = "clean"
                    if include_passed:
                        self.event["virus_name"] = None
                elif scan_result.state == "FOUND":
                    # Virus found
                    virus_name = scan_result.details
                    self.flags.append(f"virus_detected:{virus_name}")
                    self.event["scan_result"] = "infected"
                    self.event["virus_name"] = virus_name
                elif scan_result.state == "ERROR":
                    # Error during scanning
                    self.flags.append(f"clamd_error:{scan_result.details}")
                    self.event["scan_result"] = "error"
                    self.event["error"] = scan_result.details

                # Add scan_passed field based on the ScanResult.passed property
                if scan_result.passed is not None:
                    self.event["scan_passed"] = scan_result.passed
            else:
                # No scan result available
                self.flags.append("no_scan_result")
                self.event["scan_result"] = "error"
                self.event["error"] = "No scan result returned"

        except strelka.ScannerTimeout:
            raise
        except BufferTooLongError as e:
            self.flags.append("buffer_too_long")
            self.event["error"] = f"Buffer too long: {str(e)}"
        except CommunicationError as e:
            self.flags.append("clamd_communication_error")
            self.event["error"] = f"Communication error: {str(e)}"
        except ValueError as e:
            self.flags.append("value_error")
            self.event["error"] = f"Value error: {str(e)}"
        except IOError as e:
            self.flags.append("io_error")
            self.event["error"] = f"IO error: {str(e)}"
        except Exception as e:
            self.flags.append("scan_error")
            self.event["error"] = f"Unexpected error: {str(e)}"

        # Add elapsed time
        self.event["elapsed"] = time.time() - start
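For readers curious about what travels over the TCP socket, the sketch below frames data for clamd's INSTREAM command as described in the clamd manual: the command, then chunks prefixed with a 4-byte big-endian length, then a zero-length chunk to finish. Whether clamav-client uses exactly this path when "stream" is enabled is an assumption on my part. The helper names `build_instream` and `parse_reply` are hypothetical, and the code only builds and parses bytes without opening a connection.

```python
import struct


def build_instream(data: bytes, chunk_size: int = 4096) -> bytes:
    """Frame data for clamd's INSTREAM command (null-terminated 'z' form)."""
    frames = [b"zINSTREAM\0"]
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # Each chunk is prefixed with its length as a 4-byte big-endian integer.
        frames.append(struct.pack("!I", len(chunk)) + chunk)
    # A zero-length chunk tells clamd that the stream is complete.
    frames.append(struct.pack("!I", 0))
    return b"".join(frames)


def parse_reply(reply: bytes):
    """Split a clamd reply into (state, detail); detail is None for OK."""
    text = reply.rstrip(b"\0").decode()
    _, separator, verdict = text.partition(": ")
    if not separator:
        verdict = text  # reply without a "stream: " style prefix
    if verdict == "OK":
        return "OK", None
    detail, _, state = verdict.rpartition(" ")
    return state, detail


print(build_instream(b"abcdef", chunk_size=4))
print(parse_reply(b"stream: OK\0"))
print(parse_reply(b"stream: Example.Signature FOUND\0"))
```

The signature name in the last line is made up for the example. The states OK, FOUND and ERROR correspond to the values checked in the scanner above.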
Make the solution work
To integrate our scanner with Strelka, we need to:
- Add to Docker Compose: Include ClamAV in our deployment
- Update the Strelka Backend Configuration: Register our scanner
- Update the Poetry project files: Add the new dependency
Docker Compose Update
Add the following to your docker-compose.yaml file:
  clamav:
    image: clamav/clamav:latest
    container_name: strelka-clamav
    #volumes:
    #  - clamav_data:/var/lib/clamav  # Add if you need to persist the ClamAV db between restarts
    ports:
      - "3310:3310"
    environment:
      - CLAMAV_NO_FRESHCLAMD=false  # Keep automatic signature updates enabled
    healthcheck:
      # --ping takes the number of attempts as its argument
      test: ["CMD", "clamdscan", "--ping", "1", "-c", "/etc/clamav/clamd.conf"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s
    networks:
      - net
Configuration update
Add the following to your backend.yaml:
  'ScanClamav':
    - positive:
        flavors:
          - "*"  # Make it scan all files
      priority: 5
      options:
        # Option names must match what the scanner reads: "host" and "port"
        host: "clamav"
        port: 3310
        tempfile_directory: "/dev/shm/"
        max_bytes: 20000000
        include_passed: false
Update the poetry project files
Add the dependency clamav-client = "0.6.3" to pyproject.toml and generate a new poetry.lock file.
Example results from the ClamAV scanner
The ClamAV scanner produces results like the following. The two examples show one clean and one infected scan result:
Scan with clean result from ClamAV:

Scan with non-clean result from ClamAV:

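For downstream processing, the fields this scanner writes can be reduced to a simple verdict. The sketch below assumes you already have the ClamAV part of the Strelka event as a dictionary with the keys set in the scan method above (scan_result, virus_name, error). The `summarize` helper and the sample dictionaries are hypothetical and are not real scanner output.

```python
def summarize(clamav_event: dict) -> str:
    """Reduce the fields written by ScanClamav to a one-line verdict."""
    result = clamav_event.get("scan_result")
    if result == "infected":
        return f"INFECTED ({clamav_event.get('virus_name')})"
    if result == "clean":
        return "CLEAN"
    # Covers scan_result == "error" and events that only carry an error message.
    return f"ERROR ({clamav_event.get('error', 'unknown')})"


# Hypothetical events, shaped after the keys set in scan(); not real output.
clean = {"scan_result": "clean", "scan_passed": True}
infected = {
    "scan_result": "infected",
    "virus_name": "Example.Signature",
    "scan_passed": False,
}
failed = {"error": "Communication error: connection refused"}

for event in (clean, infected, failed):
    print(summarize(event))
```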
Conclusion
In this blog post we showed an implementation of a ClamAV scanner for Strelka. This was a way for me to learn how to implement a new scanner in Strelka; hopefully you learned something as well. The implementation is quite simple and adds ClamAV as a separate Docker container. ClamAV could also be built into the Strelka backend if needed. If that solution is chosen, the scanner’s code and configuration need some small modifications to work. This implementation doesn’t include any tests. I had some problems writing them, because the tests run the scanner, the scanner calls the clamd TCP socket, and that socket isn’t available in the test environment.
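One possible way around the missing clamd socket in tests is to replace the scanner object with a stub. The sketch below is not part of the implementation above: it assumes the result handling is factored into a small helper (here the hypothetical `classify`) and uses unittest.mock to stand in for the object returned by get_scanner.

```python
from types import SimpleNamespace
from unittest import mock


def classify(scanner, path):
    """Map a clamav-client style result to the labels used by the scanner."""
    result = scanner.scan(path)
    return {"OK": "clean", "FOUND": "infected"}.get(result.state, "error")


def make_stub(state, details=None):
    """Build a stand-in scanner whose scan() returns a fixed result."""
    stub = mock.Mock()
    stub.scan.return_value = SimpleNamespace(state=state, details=details)
    return stub


stub = make_stub("FOUND", "Example.Signature")
print(classify(stub, "/tmp/sample"))  # infected
stub.scan.assert_called_once_with("/tmp/sample")
```

In a real test for the scanner class, the same stub could be injected with mock.patch on the get_scanner name in the scanner module, so that no network connection is attempted. The exact module path to patch depends on where the scanner file ends up.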
References