Welcome to the official documentation for Strelka, an advanced tool for automated malware analysis. This documentation aims to provide comprehensive insights into the functionality and usage of Strelka, facilitating ease of use and development.
- Overview
- How Docs Work
- Running Docs Locally
- Automated Pipeline
- Documentation Format
- Backend Configuration
Strelka is designed for large-scale file analysis, providing robust scanning capabilities across various file types. The project's documentation is automatically generated and updated through GitHub Actions to reflect the latest changes in Target's Strelka repository.
strelka-docs builds and publishes new documents to the gh-pages branch. This branch is hosted on GitHub.
Documentation for Strelka is automatically generated:
- Daily at 2 AM UTC
- On push to branches: gh-pages, main
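The two triggers above can be expressed as a small sketch. It only models the stated schedule (daily at 02:00 UTC, plus pushes to gh-pages and main); the function names are hypothetical and this is not the project's actual workflow definition.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: the daily build fires at 02:00 UTC, as stated in the list above.
DAILY_RUN_UTC = time(hour=2, tzinfo=timezone.utc)

# Branches whose pushes trigger a build, taken from the list above.
BUILD_BRANCHES = {"gh-pages", "main"}


def next_daily_run(now: datetime) -> datetime:
    """Return the first 02:00 UTC strictly after `now` (must be timezone-aware)."""
    now_utc = now.astimezone(timezone.utc)
    candidate = datetime.combine(now_utc.date(), DAILY_RUN_UTC)
    if candidate <= now_utc:
        candidate += timedelta(days=1)
    return candidate


def builds_on_push(branch: str) -> bool:
    """Return True if a push to `branch` would trigger a documentation build."""
    return branch in BUILD_BRANCHES
```

For example, a change merged at 01:00 UTC would be picked up by the scheduled run one hour later, even if the push itself targeted a branch outside the trigger list.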
To set up and view the documentation locally, follow these steps:
-
Install Poetry
Download and install Poetry, a tool for handling Python package dependencies.
curl -sSL https://install.python-poetry.org | python3 -
-
Clone the Strelka Repository
Obtain the latest version of the strelka code from its repository.
git clone https://github.com/target/strelka
-
Install Dependencies
Use Poetry to install the necessary dependencies for running the documentation locally.
poetry install
-
(Optional) Replace Scanners
If you need to develop or test documentation for specific scanners, modify the scanner in the strelka/scanner folder.
-
Build the Documentation
Generate the latest version of the documentation by running the build script. This will create new .md files based on all of the scanner code.
python ./build_docs.py
-
Start the Local MkDocs Server
Use Poetry to run the MkDocs server and view the documentation locally.
poetry run mkdocs serve
-
Access the Documentation
Open your web browser and go to http://127.0.0.1:8000/target/strelka/ to view the local documentation.
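The install, build, and serve steps above can be chained in a small helper. This is a minimal sketch, assuming Poetry is already installed and the script is run from the cloned directory that contains build_docs.py; `LOCAL_DOCS_STEPS` and `run_local_docs` are hypothetical names, not part of Strelka.

```python
import shlex
import subprocess

# Commands copied from the steps above, in the order they are listed.
LOCAL_DOCS_STEPS = [
    "poetry install",
    "python ./build_docs.py",
    "poetry run mkdocs serve",  # blocks until interrupted (Ctrl+C)
]


def run_local_docs(steps=LOCAL_DOCS_STEPS, dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the argument lists that were (or would be) run.
    """
    planned = []
    for step in steps:
        argv = shlex.split(step)
        planned.append(argv)
        if dry_run:
            print("would run:", step)
        else:
            # check=True raises CalledProcessError on a non-zero exit code.
            subprocess.run(argv, check=True)
    return planned
```

Calling `run_local_docs()` prints the plan without touching the system; `run_local_docs(dry_run=False)` executes it and leaves the server running in the foreground.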
Scanner classes are documented following the Google docstring guidelines, with these sections:
- Description: A concise overview of the scanner's purpose and functionality.
  - Includes Scanner Type: Collection or Malware
- Attributes: Details about the scanner's attributes that define its behavior. Usually found outside functions or inside __init__. (Can be None)
- Other Parameters: Details about the scanner's options. These are usually defined at the top of the scan class or inside the backend.yml.
- Detection Use Cases: Examples of potential use cases for the scanner, highlighting its detection capabilities.
- Known Limitations: Acknowledgment of any limitations or areas for improvement in the scanner's functionality. (Can be None)
- Todo: List of potential script improvements / future implementations (Can be None)
- References: List of references used to develop / describe the scanner (Can be None)
- Contributors: List of users that have assisted in the development of the scanner.
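A checklist like this lends itself to an automated check. The sketch below is a hypothetical helper, not part of Strelka: it reports which of the sections listed above are missing from a class docstring, assuming the section markers are spelled as in the ScanEmail example in this document.

```python
# Hypothetical check: marker spellings are assumed from the ScanEmail example.
REQUIRED_MARKERS = {
    "Scanner Type": "Scanner Type:",
    "Attributes": "Attributes:",
    "Other Parameters": "Other Parameters:",
    "Detection Use Cases": "## Detection Use Cases",
    "Known Limitations": "## Known Limitations",
    "To Do": "## To Do",
    "References": "## References",
    "Contributors": "## Contributors",
}


def missing_sections(docstring: str) -> list:
    """Return the names of required sections that do not appear in `docstring`."""
    # Compare against stripped lines so docstring indentation does not matter.
    lines = {line.strip() for line in docstring.splitlines()}
    return [
        name
        for name, marker in REQUIRED_MARKERS.items()
        if not any(line.startswith(marker) for line in lines)
    ]
```

Sections marked "Can be None" still need their heading; the check only looks for the heading, not for content beneath it.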
class ScanEmail(strelka.Scanner):
    """
    Extracts and analyzes metadata, attachments, and optionally generates thumbnails from email messages.

    This scanner processes email files to extract and analyze metadata, attachments, and optionally generates
    thumbnail images of the email content for a visual overview. It supports both plain text and HTML emails,
    including inline images.

    Scanner Type: Collection

    ## Options

    Attributes:
        None

    Other Parameters:
        create_thumbnail (bool): Indicates whether a thumbnail should be generated for the email content.
        thumbnail_header (bool): Indicates whether email header information should be included in the thumbnail.
        thumbnail_size (int): Specifies the dimensions for the generated thumbnail images.

    ## Detection Use Cases
    !!! info "Detection Use Cases"
        - **Document Extraction**
            - Extracts and analyzes documents, including attachments, from email messages for content review.
        - **Thumbnail Generation**
            - Optionally generates thumbnail images of email content for visual analysis, which can be useful for
              quickly identifying the content of emails.
        - **Email Header Analysis**
            - Analyzes email headers for potential indicators of malicious activity, such as suspicious sender addresses
              or subject lines.

    ## Known Limitations
    !!! warning "Known Limitations"
        - **Email Encoding and Complex Structures**
            - Limited support for certain email encodings or complex email structures.
        - **Thumbnail Accuracy**
            - Thumbnail generation may not accurately represent the email content in all cases,
              especially for emails with complex layouts or embedded content.
        - **Limited Output**
            - Content is limited to a set amount of characters to prevent excessive output.

    ## To Do
    !!! question "To Do"
        - **Improve Error Handling**:
            - Enhance error handling for edge cases and complex email structures.
        - **Enhance Support for Additional Email Encodings and Content Types**:
            - Expand support for various email encodings and content types to improve scanning accuracy.

    ## References
    !!! quote "References"
        - [Python Email Parsing Documentation](https://docs.python.org/3/library/email.html)
        - [WeasyPrint Documentation](https://doc.courtbouillon.org/weasyprint/stable/)
        - [PyMuPDF (fitz) Documentation](https://pymupdf.readthedocs.io/en/latest/)

    ## Contributors
    !!! example "Contributors"
        - [Josh Liburdi](https://github.com/jshlbrd)
        - [Paul Hutelmyer](https://github.com/phutelmyer)
        - [Ryan O'Horo](https://github.com/ryanohoro)
    """
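Because the class docstring is the source of the generated page, a build step needs a way to read it without importing the scanner. The sketch below shows one possible approach using the standard `ast` module; it is an assumption about how such extraction could work, not a description of what build_docs.py actually does, and `class_docstrings` is a hypothetical name.

```python
import ast


def class_docstrings(source: str) -> dict:
    """Map each class name in `source` to its cleaned docstring (or None)."""
    # ast.parse only reads the syntax, so names like strelka.Scanner
    # do not need to be importable.
    tree = ast.parse(source)
    return {
        node.name: ast.get_docstring(node)
        for node in ast.walk(tree)
        if isinstance(node, ast.ClassDef)
    }


# Shortened stand-in for a scanner file, used only for illustration.
example = '''
class ScanEmail(strelka.Scanner):
    """
    Extracts and analyzes metadata from email messages.

    Scanner Type: Collection
    """
'''

docs = class_docstrings(example)
print(docs["ScanEmail"].splitlines()[0])
```

`ast.get_docstring` strips the common indentation, so the markdown headings and admonitions inside the docstring come out flush left and could be written to a .md file as-is.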
Function docstrings outline each function's purpose, arguments, and return values, promoting clarity and ease of use.
"""
Performs the scan operation on batch file data, extracting and categorizing different types of tokens.

Args:
    data (bytes): The batch file data as a byte string.
    file (strelka.File): The file object to be scanned.
    options (dict): Options for customizing the scan. These options can dictate specific behaviors
        like which tokens to prioritize or ignore.
    expire_at (datetime): Expiration timestamp for the scan result. This is used to determine when
        the scan result should be considered stale or outdated.
"""
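To show where such a docstring sits, here is a minimal, self-contained sketch of a method with the same four arguments. `ExampleScanner`, the `max_lines` option, and the returned dictionary are all hypothetical and chosen for illustration; a real scanner would subclass strelka.Scanner and may report results differently.

```python
from datetime import datetime, timedelta, timezone


class ExampleScanner:
    """Stand-in class for illustration; not a real Strelka scanner."""

    def scan(self, data, file, options, expire_at):
        """
        Counts the lines in the file data, up to a configurable limit.

        Args:
            data (bytes): The file data as a byte string.
            file (object): The file object to be scanned (unused in this sketch).
            options (dict): Options for customizing the scan.
            expire_at (datetime): Expiration timestamp for the scan result
                (unused in this sketch).

        Returns:
            dict: A dictionary with a single "line_count" key.
        """
        # Hypothetical option, read with a default so a missing key is safe.
        max_lines = options.get("max_lines", 1000)
        lines = data.splitlines()[:max_lines]
        return {"line_count": len(lines)}


scanner = ExampleScanner()
expiry = datetime.now(timezone.utc) + timedelta(hours=1)
result = scanner.scan(b"a\nb\nc\n", None, {"max_lines": 2}, expiry)
```

Reading options with `dict.get` and a default keeps the method usable when an option is absent from the configuration, which is why each option is worth listing in the docstring.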