Breach Parser
This report details the findings and operational utility of Breach-Parser, a tool commonly used in external penetration testing to identify exposed user credentials from historical data breaches.
1. Executive Summary
Breach-Parser is a reconnaissance script designed to parse massive collections of leaked data (such as the Compilation of Many Breaches, or COMB) to identify email addresses and plaintext passwords associated with a target domain. This tool is a critical component of an External Pentest Playbook used to facilitate credential-based attacks.
2. Technical Overview
The tool operates by scanning indexed breach databases to extract specific patterns:
Target Scope: Filters results based on a specific domain (e.g., @company.com).
Data Extraction: Retrieves compromised email addresses and their corresponding passwords.
Output Format: Typically generates a structured list of unique credentials that can be utilized in downstream attack phases.
3. Operational Findings
During a standard assessment, Breach-Parser serves as the primary data source for:
Credential Stuffing: Attempting to use the leaked credentials directly on target logins (e.g., VPNs, O365).
Password Spraying: Using common patterns found in the breach data (e.g., Summer2021!) to guess active passwords for discovered accounts, as described in Johnermac's security notes.
User Identification: Building a list of valid internal usernames/emails that may not be publicly listed on the company website.
4. Risk Assessment
Identity Theft: Exposed credentials allow attackers to impersonate employees.
Lateral Movement: If a user reuses a breached password on internal systems, an external breach can lead to full network compromise.
Credential Reuse: Password reuse across personal and corporate accounts is widespread, so credentials leaked from a personal service may also work on corporate logins.
5. Recommended Mitigations
To defend against the exposure that Breach-Parser uncovers, organizations should implement:
Multi-Factor Authentication (MFA): The most effective defense against credential-based attacks.
Dark Web Monitoring: Utilizing platforms like the Omeal Ltd AI-Powered Platform to receive alerts when corporate emails appear in new leaks.
Password Audits: Regularly checking internal password hashes against known breach databases and forcing resets on accounts that match.
Security Awareness: Educating staff on the dangers of password reuse between personal and professional services.
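The password-audit step reduces to a set lookup: index the internal hashes once, then stream the breach hash list a single time. The sketch below is a minimal Python illustration under stated assumptions: `find_breached_accounts` is a hypothetical helper, the breach list is assumed to use a `HASH:COUNT` line format (similar to the downloadable Have I Been Pwned password lists), and both sides must use the same hash algorithm and hex encoding.

```python
from typing import Dict, Iterable, List


def find_breached_accounts(
    account_hashes: Dict[str, str],
    breach_lines: Iterable[str],
) -> List[str]:
    """Return the accounts whose stored hash appears in a breach hash list.

    account_hashes: account name -> hex password hash (hypothetical export
    from an internal identity store; hash type must match the breach list).
    breach_lines: lines in an assumed "HASH:COUNT" format.
    """
    # Index accounts by normalised (upper-case) hash; several accounts may share one.
    by_hash: Dict[str, List[str]] = {}
    for account, digest in account_hashes.items():
        by_hash.setdefault(digest.strip().upper(), []).append(account)

    flagged: List[str] = []
    # Stream the breach list once; memory is bounded by the internal hash index.
    for line in breach_lines:
        digest = line.split(":", 1)[0].strip().upper()
        if digest in by_hash:
            # pop() so each matching hash is reported only once.
            flagged.extend(by_hash.pop(digest))
    return sorted(flagged)


if __name__ == "__main__":
    # Placeholder hashes for illustration only.
    accounts = {"alice": "aaa111", "bob": "BBB222", "carol": "aaa111"}
    breach = ["AAA111:42", "CCC333:7"]
    print(find_breached_accounts(accounts, breach))
```

Accounts returned by such a check would be candidates for a forced reset; indexing the smaller internal set and streaming the much larger breach list keeps memory use proportional to the number of internal accounts.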
Technical Considerations
- Memory Management: Processing massive files (10GB+) requires streaming (line-by-line) rather than loading the whole file into RAM.
- Performance: Compiled languages (Go/Rust) are preferred over interpreted languages (Python) for high-volume parsing.
- Security: Breach dumps often contain malicious input (e.g., path traversal strings in password fields), so the parser must treat every field as opaque data, be hardened against exploits, and run in a sandboxed environment.
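The streaming and input-hardening points above can be illustrated together with a short domain filter. This is a minimal Python sketch (chosen for readability; the same structure carries over to Go or Rust), assuming a hypothetical plain `email:password` line format. The function name `iter_domain_credentials` and the domain-validation regex are illustrative and not taken from Breach-Parser. Iterating over a file object reads one line at a time, so memory grows with the number of matching credentials kept for de-duplication, not with the size of the dump.

```python
import re
from typing import Iterator, Set, TextIO, Tuple

# Conservative hostname check (an assumption): rejects "../" and other
# path-like values before the domain could be used, e.g., in an output filename.
_DOMAIN_RE = re.compile(r"^(?=.{1,253}$)([a-z0-9-]{1,63}\.)+[a-z]{2,63}$")


def iter_domain_credentials(stream: TextIO, domain: str) -> Iterator[Tuple[str, str]]:
    """Yield unique (email, password) pairs for `domain`, one line at a time.

    Assumes "email:password" lines; malformed lines are skipped, not raised on.
    """
    domain = domain.strip().lower()
    if not _DOMAIN_RE.match(domain):
        raise ValueError(f"refusing suspicious domain value: {domain!r}")
    suffix = "@" + domain
    seen: Set[Tuple[str, str]] = set()  # grows only with matches for this domain
    for raw in stream:  # lazy iteration: the file is never loaded whole
        line = raw.rstrip("\r\n")
        # Split on the first ":" only, so passwords containing ":" survive intact.
        email, sep, password = line.partition(":")
        if not sep:
            continue  # no delimiter: malformed record
        email = email.strip().lower()
        if not email.endswith(suffix):
            continue
        # The password stays an opaque string: never evaluated, used as a
        # path, or passed to a shell.
        pair = (email, password)
        if pair not in seen:
            seen.add(pair)
            yield pair


if __name__ == "__main__":
    import io

    sample = io.StringIO("Bob@Company.com:pass1\nalice@other.com:x\nbob@company.com:pass1\n")
    for email, password in iter_domain_credentials(sample, "company.com"):
        print(email, password)
```

In real use the stream would be something like `open(path, encoding="utf-8", errors="replace")`; the `errors="replace"` choice keeps undecodable bytes in a dump from aborting the whole run.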
Use Cases
- Threat intelligence teams aggregating breach data for exposure analysis.
- Incident responders identifying compromised accounts and notifying users.
- Security researchers measuring password reuse and common weak passwords.
- Enterprises scanning internal identity stores for matches and mitigating risk.
Performance & Scale Targets
- Process 10M records/minute on an 8-core / 32 GB RAM host
- Streaming parser (no full file load into memory)
- Parallel chunk processing for ZIP/7z archives
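For scale, 10M records/minute works out to roughly 166,700 records per second, or about 20,800 records per second per core on an 8-core host. One way to spread that load is to fan archive members out to a worker pool. The Python sketch below is an illustration under assumptions, not Breach-Parser's implementation: it parallelises per ZIP member rather than per byte-range chunk, the helper names `count_member_matches` and `count_archive_matches` are hypothetical, and it assumes `email:password` lines. 7z archives would need a third-party library and are not covered.

```python
import io
import zipfile
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Dict, Type


def count_member_matches(archive_path: str, member: str, suffix: str) -> int:
    """Stream one archive member and count lines whose email ends with `suffix`."""
    # Each worker opens its own handle; ZipFile objects are not shared between processes.
    with zipfile.ZipFile(archive_path) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", errors="replace")
            return sum(
                1
                for line in text
                if line.partition(":")[0].strip().lower().endswith(suffix)
            )


def count_archive_matches(
    archive_path: str,
    domain: str,
    workers: int = 8,
    executor_cls: Type[Executor] = ProcessPoolExecutor,
) -> Dict[str, int]:
    """Count matching lines per archive member, one member per worker task."""
    suffix = "@" + domain.lower()
    with zipfile.ZipFile(archive_path) as zf:
        members = [info.filename for info in zf.infolist() if not info.is_dir()]
    # With ProcessPoolExecutor on spawn-based platforms, call this from under an
    # `if __name__ == "__main__":` guard.
    with executor_cls(max_workers=workers) as pool:
        futures = {
            name: pool.submit(count_member_matches, archive_path, name, suffix)
            for name in members
        }
        return {name: fut.result() for name, fut in futures.items()}
```

Processes are the default because CPU-bound pure-Python parsing does not scale across threads under the GIL; passing `executor_cls=ThreadPoolExecutor` is convenient for testing or when decompression dominates.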
Example Output (Parsed):
{"username": "bob", "password": "password123", "email": "bob@mail.com", "ip": "192.168.1.1"}
{"username": "alice", "password": "letmein", "email": "alice@work.com", "ip": null}
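Emitting each record as one JSON object per line (JSON Lines) keeps the output streamable and easy to feed into later tooling. A minimal Python sketch, assuming the four field names shown in the example rows; `to_jsonl` and `write_jsonl` are hypothetical helpers, not part of Breach-Parser.

```python
import json
from typing import Iterable, TextIO

# Field order assumed from the example output; every row carries all four keys.
FIELDS = ("username", "password", "email", "ip")


def to_jsonl(record: dict) -> str:
    """Serialise one parsed record as a single JSON Lines row.

    Missing fields become null; keys outside FIELDS are dropped.
    """
    row = {field: record.get(field) for field in FIELDS}
    return json.dumps(row, ensure_ascii=False)


def write_jsonl(records: Iterable[dict], out: TextIO) -> int:
    """Write records one per line and return how many were written."""
    count = 0
    for record in records:
        out.write(to_jsonl(record) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    import sys

    write_jsonl(
        [{"username": "alice", "password": "letmein", "email": "alice@work.com"}],
        sys.stdout,
    )
```

Fixing the key set up front means downstream consumers can rely on every row having the same schema even when a source format lacks a field such as `ip`.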
1. Multi-Format Ingestion
- Input support: .txt, .csv, .tsv, .json, .jsonl, .sql (INSERT statements), .dmp, .gz, .zip, .7z
- Auto-detect delimiter (comma, tab, pipe, semicolon) for structured text
- Parse MongoDB dumps, Elasticsearch exports, and custom formats
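Delimiter auto-detection can be approximated by checking which candidate character splits the sample lines into a consistent number of fields. The sketch below uses a simple hand-rolled heuristic (an assumption, not Breach-Parser's documented logic); Python's standard `csv.Sniffer` offers a comparable guess but raises `csv.Error` when it cannot decide. `detect_delimiter` is a hypothetical name.

```python
from collections import Counter
from typing import Optional, Sequence

# Candidate delimiters from the feature list: comma, tab, pipe, semicolon.
CANDIDATES = (",", "\t", "|", ";")


def detect_delimiter(
    sample_lines: Sequence[str], candidates: Sequence[str] = CANDIDATES
) -> Optional[str]:
    """Guess the field delimiter from a few sample lines.

    Heuristic: for each candidate, find the most common per-line occurrence
    count; the candidate whose non-zero count is shared by the most lines wins.
    Ties go to the candidate listed first. Returns None if nothing qualifies.
    """
    best: Optional[str] = None
    best_score = 0
    for delim in candidates:
        counts = Counter(line.count(delim) for line in sample_lines if line.strip())
        if not counts:
            continue
        per_line, lines_agreeing = counts.most_common(1)[0]
        if per_line == 0:
            continue  # absent from most lines, e.g. a comma inside one password
        if lines_agreeing > best_score:
            best, best_score = delim, lines_agreeing
    return best


if __name__ == "__main__":
    sample = ["bob|bob@mail.com|pw1", "alice|alice@work.com|pw,2"]
    print(repr(detect_delimiter(sample)))
```

Requiring the same count on most lines is what lets a stray comma or semicolon inside a password field lose to the real delimiter.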