Artifact Collection Feature
- Overview
- Usage
- Output Structure
- Artifact Organization
- Artifact Manifest
- Metadata Preservation
- Collection Process
- Filtering
- Integration with Profiles
- Use Cases
- Chain of Custody
- Performance Considerations
- Security Considerations
- Limitations
- Future Enhancements
- Example Workflow
- Troubleshooting
Overview
The artifact collection feature lets sus copy files that match analysis patterns while preserving their metadata. The design is inspired by forensic tools such as UAC (Unix-like Artifacts Collector) and KAPE (Kroll Artifact Parser and Extractor).
Usage
Enable artifact collection with the --collect flag:
# Basic collection
sus /path/to/analyze --collect --output-dir ./investigation
# With profiles
sus /evidence --profile profiles/composite/forensic-investigation.toml --collect --output-dir ./collected
# Incident response collection
sus /compromised-system --profile profiles/composite/incident-response.toml --collect
Output Structure
When collection is enabled, artifacts are organized in the output directory:
output/
├── analysis.db # Analysis database
├── files/ # Analyzed file metadata
├── collected/ # Collected artifacts (NEW)
│ ├── default/ # Files without specific tags
│ ├── server1/ # Files tagged as 'server1'
│ └── manifest.json # Collection manifest
└── manifest.json # Artifact manifest
Artifact Organization
Collected artifacts are organized by tag and filename:
- Files are grouped by their assigned tags (using the --tag-dir option)
- Filenames include the SHA256 hash to prevent conflicts
- The original directory structure is flattened for easier review
Example:
collected/
├── server1/
│ ├── a1b2c3d4...xyz_suspicious.exe
│ ├── e5f6g7h8...abc_malware.dll
│ └── ...
└── server2/
├── 9i0j1k2l...def_backdoor.sh
└── ...
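The naming scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the code sus uses: the helper names `sha256_of` and `collected_path` are hypothetical, and the exact layout `<tag>/<sha256>_<filename>` with the full hex digest is an assumption inferred from the example tree.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large artifacts are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collected_path(collected_root: Path, source: Path, tag: str = "default") -> Path:
    """Build <collected_root>/<tag>/<sha256>_<filename> (assumed naming scheme).

    Only the file name is kept, so the original directory structure is
    flattened; the hash prefix keeps same-named files from colliding.
    """
    return collected_root / tag / f"{sha256_of(source)}_{source.name}"
```

Because the hash is derived from file content, two files with the same name and different content map to different collected names, while byte-identical files with the same name map to the same one.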
Artifact Manifest
The manifest (collected/manifest.json) contains detailed information about each collected artifact:
{
"version": "1.0",
"collection_started": "2024-01-15T10:30:00Z",
"collection_completed": "2024-01-15T11:45:00Z",
"artifact_count": 42,
"total_size": 1048576,
"artifacts": [
{
"original_path": "/path/to/file.exe",
"collected_path": "/output/collected/server1/abc123...xyz_file.exe",
"sha256": "abc123...",
"file_size": 24576,
"created": "2024-01-10T08:00:00Z",
"modified": "2024-01-14T15:30:00Z",
"accessed": "2024-01-15T10:25:00Z",
"permissions": 755,
"uid": 1000,
"gid": 1000,
"collection_timestamp": "2024-01-15T10:35:00Z",
"tag": "server1",
"mime_type": "application/x-executable"
}
],
"notes": []
}
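As a quick sanity check, the manifest can be summarized with a short script. The sketch below relies only on the field names shown in the example manifest; the function name `summarize_manifest` is hypothetical, and whether `total_size` is meant to equal the sum of the per-artifact `file_size` values is an assumption, so the script reports both numbers instead of asserting that they match.

```python
import json
from pathlib import Path


def summarize_manifest(manifest_path: Path) -> dict:
    """Compare the manifest's declared totals with what its artifact list contains."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    artifacts = manifest.get("artifacts", [])

    # Count artifacts per tag; untagged entries are grouped under "default".
    by_tag = {}
    for artifact in artifacts:
        tag = artifact.get("tag", "default")
        by_tag[tag] = by_tag.get(tag, 0) + 1

    return {
        "declared_count": manifest.get("artifact_count"),
        "listed_count": len(artifacts),
        "declared_size": manifest.get("total_size"),
        "listed_size": sum(a.get("file_size", 0) for a in artifacts),
        "by_tag": by_tag,
    }
```

A difference between the declared and listed values is a cue to inspect the collection more closely before relying on it.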
Metadata Preservation
The collection feature preserves file metadata:
All Platforms
- File size
- Creation time (if available)
- Modification time
- Access time
- SHA256 hash
- MIME type
Unix/Linux
- File permissions (mode)
- User ID (UID)
- Group ID (GID)
Notes
- Ownership preservation (chown) typically requires root privileges
- Timestamps are preserved where the filesystem supports it
- Symbolic links are not followed (only their metadata is collected)
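To make the behavior described above concrete, here is a minimal Python sketch of a metadata-preserving copy. It is not how sus is implemented; the function name `copy_preserving_metadata` is hypothetical, and skipping symbolic links entirely is a simplification of "only their metadata is collected". It uses `shutil.copy2`, which copies content together with permission bits and access/modification times, and then attempts `os.chown`, which usually fails without elevated privileges.

```python
import os
import shutil
from pathlib import Path


def copy_preserving_metadata(source: Path, destination: Path) -> list[str]:
    """Copy one file and return notes about metadata that could not be preserved."""
    notes = []
    if source.is_symlink():
        # Simplification: links are never followed, so nothing is copied here.
        notes.append("symbolic link: not copied")
        return notes

    info = source.stat()
    destination.parent.mkdir(parents=True, exist_ok=True)

    # copy2 copies file content plus mode bits and atime/mtime where supported.
    shutil.copy2(source, destination)

    # Ownership is Unix-only and normally needs root; record a note on failure.
    if hasattr(os, "chown"):
        try:
            os.chown(destination, info.st_uid, info.st_gid)
        except OSError:
            notes.append("ownership not preserved (insufficient privileges)")
    return notes
```

Returning notes instead of raising keeps a single unprivileged copy from aborting a whole collection run, which mirrors the best-effort wording in the notes above.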
Collection Process
- Analysis Phase: Files are analyzed normally using profiles and patterns
- Pattern Matching: Files with pattern matches are identified
- Collection Phase: During cleanup, matched files are collected:
  - Query database for files with pattern matches
  - Exclude extracted/decoded files (collect originals only)
  - Copy files while preserving metadata
  - Create manifest records
- Manifest Saving: Complete manifest is saved as JSON
Filtering
Collection automatically filters files:
- Only files with pattern matches are collected
- Extracted files from archives are excluded (originals collected instead)
- Decoded files are excluded (originals collected instead)
- Files that no longer exist are skipped
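The four rules above amount to a single predicate per file. The sketch below shows that logic in Python under stated assumptions: the `FileRecord` type and its fields (`match_count`, `is_extracted`, `is_decoded`) are hypothetical stand-ins for whatever the analysis database actually stores.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class FileRecord:
    """Hypothetical view of one analyzed file; field names are assumptions."""
    path: Path
    match_count: int
    is_extracted: bool = False  # unpacked from an archive during analysis
    is_decoded: bool = False    # produced by decoding another file


def should_collect(record: FileRecord) -> bool:
    """Apply the filtering rules: matched, original, and still present on disk."""
    return (
        record.match_count > 0
        and not record.is_extracted
        and not record.is_decoded
        and record.path.exists()
    )
```

Checking existence last means the filesystem is only touched for records that already passed the cheaper in-memory checks.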
Integration with Profiles
Collection works with all profiles:
# Collect malware samples
sus /samples --profile profiles/base/malware.toml --collect
# Collect PII violations
sus /data --profile profiles/composite/pci-compliance.toml --collect
# Comprehensive forensic collection
sus /evidence --profile profiles/composite/forensic-investigation.toml --collect
Use Cases
Incident Response
Collect evidence of compromise:
sus /var/log --profile profiles/composite/incident-response.toml \
--collect --output-dir ./ir-evidence
Compliance Auditing
Collect files with PII or sensitive data:
sus /share/documents --profile profiles/composite/pci-compliance.toml \
--collect --output-dir ./compliance-violations
Malware Triage
Collect suspicious executables:
sus /downloads --profile profiles/base/malware.toml \
--collect --output-dir ./malware-samples
Multi-System Collection
Tag and collect from multiple systems:
sus /mnt/server1 --tag-dir '/mnt/server1:server1' \
/mnt/server2 --tag-dir '/mnt/server2:server2' \
--profile profiles/composite/forensic-investigation.toml \
--collect --output-dir ./multi-system-collection
Chain of Custody
The manifest provides chain of custody information:
- Original file path and collection time
- File hashes for integrity verification
- Metadata snapshots at collection time
- Collection version and tool information
Performance Considerations
- Collection happens after analysis completes
- Files are copied inside spawn_blocking tasks so that blocking file I/O does not stall the async runtime
- Large files are handled with memory-mapped I/O during analysis
- Manifest is written once at the end of collection
Security Considerations
- Collected files may contain malware - handle with care
- Collection does not sanitize or quarantine files
- Preserve the collected directory with appropriate permissions
- Consider encrypting the collected artifacts directory
- Verify manifest hashes before using collected artifacts
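Hash verification can be scripted against the manifest. The following Python sketch recomputes the SHA256 of every `collected_path` and compares it with the recorded `sha256` value. It assumes the manifest stores the full hex digest (the example manifest abbreviates it) and that `collected_path` resolves from the current working directory; the function name `verify_manifest` is hypothetical.

```python
import hashlib
import json
from pathlib import Path


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return one message per artifact that is missing or whose hash differs."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    failures = []
    for artifact in manifest.get("artifacts", []):
        collected = Path(artifact["collected_path"])
        if not collected.is_file():
            failures.append(f"missing: {collected}")
            continue

        # Hash in 1 MiB chunks so large artifacts do not exhaust memory.
        digest = hashlib.sha256()
        with collected.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)

        if digest.hexdigest() != artifact["sha256"].lower():
            failures.append(f"hash mismatch: {collected}")
    return failures
```

An empty result means every listed artifact is present and unchanged since collection. The script only reads the files and never executes them, but it should still be run in an environment suited to handling potential malware.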
Limitations
- Does not create forensic images (E01, AFF) - files are copied as-is
- Does not preserve NTFS Alternate Data Streams (ADS)
- Does not preserve extended attributes on all platforms
- Requires sufficient disk space for collected artifacts
- Ownership preservation requires appropriate privileges
Future Enhancements
Planned improvements:
- Support for forensic image formats (E01, AFF)
- Preservation of NTFS ADS and extended attributes
- Encryption of collected artifacts
- Compression of collection
- Incremental collection (collect only new matches)
- Collection reports in PDF/HTML format
Example Workflow
Complete incident response workflow:
# 1. Analyze and collect (store the output directory so later steps reuse it)
OUT=./ir-$(date +%Y%m%d-%H%M%S)
sus /compromised-system \
  --profile profiles/composite/incident-response.toml \
  --collect \
  --output-dir "$OUT"
# 2. Review manifest
jq '.artifact_count' "$OUT/collected/manifest.json"
# 3. Extract specific artifacts
jq -r '.artifacts[] | select(.mime_type | contains("executable")) | .collected_path' \
  "$OUT/collected/manifest.json"
# 4. Generate report
sus --server-only --output-dir "$OUT"
# Access http://localhost:8080 to review findings
Troubleshooting
No artifacts collected
- Verify files match patterns (check analysis.db)
- Ensure the --collect flag is specified
- Check file permissions for reading source files
Missing metadata
- Some filesystems don't support all metadata
- Creation time may not be available on all platforms
- Extended attributes require platform-specific support
Permission errors
- Ensure read access to source files
- Ensure write access to output directory
- Ownership preservation requires root/admin
Disk space issues
- Monitor available space before collection
- Use --max-file-size to limit large files
- Consider selective profiles to reduce matches
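Available space can be checked before starting a collection. The Python sketch below compares the total size of the source tree with the free space on the output volume. It is a conservative upper bound, since only files with pattern matches are actually collected; the function name `has_room_for` and the `safety_factor` parameter are hypothetical.

```python
import os
import shutil
from pathlib import Path


def has_room_for(source_root: Path, output_dir: Path, safety_factor: float = 1.1) -> bool:
    """Return True if the output volume could hold a full copy of source_root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                # Skip symbolic links: collection does not follow them.
                if not path.is_symlink():
                    total += path.stat().st_size
            except OSError:
                continue  # unreadable or vanished files are ignored

    free = shutil.disk_usage(output_dir).free
    return total * safety_factor <= free
```

The output directory must already exist, because `shutil.disk_usage` queries the filesystem that contains the given path.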