How to Scan Files and Directories for Sensitive Data
Use the Blindfold SDK to find PII hiding in CSV exports, JSON fixtures, config files, and documents. Examples in Python, JavaScript, Java, Go, and .NET.
Sensitive data hides in places you do not expect. CSV exports with customer emails sitting in a shared drive. JSON fixtures with real SSNs checked into a test directory. Config files with API keys. Markdown docs with patient names from a copy-paste.
Before you can protect it, you need to find it. Here is how to scan files for PII using the Blindfold SDK.
Scan a File
    from blindfold import Blindfold

    bf = Blindfold()

    # Check each line of the file and report which entity types it contains.
    with open("customers.csv") as f:
        for i, line in enumerate(f, 1):
            result = bf.detect(line.strip())
            if result.detected_entities:
                print(f"Line {i}: {[e.type for e in result.detected_entities]}")
    Line 1: ['Email Address', 'Phone Number']
    Line 3: ['SSN', 'Email Address']
    Line 7: ['Credit Card']
This runs offline in local mode — no API key needed. It covers 86 entity types including emails, credit cards, SSNs, IBANs, phone numbers, and passport numbers.
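The same per-line check extends to a whole directory tree. The sketch below is one possible way to do that, not an SDK feature: `scan_tree`, `Finding`, and `regex_detect` are hypothetical names introduced here, and `regex_detect` is a two-pattern stand-in so the example runs without the SDK installed. The detector is passed in as a plain callable that maps a line of text to a list of entity type names.

```python
import re
from pathlib import Path
from typing import Callable, Iterable, Iterator, NamedTuple

# Stand-in detector so the sketch runs without the SDK. These two patterns are
# illustrative only and far narrower than a real detector.
_PATTERNS = {
    "Email Address": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def regex_detect(text: str) -> list[str]:
    """Return the names of the stand-in patterns that match `text`."""
    return [name for name, pattern in _PATTERNS.items() if pattern.search(text)]


class Finding(NamedTuple):
    path: Path
    line_number: int
    entity_types: list[str]


def scan_tree(
    root: Path,
    detect: Callable[[str], list[str]] = regex_detect,
    extensions: Iterable[str] = (".csv", ".json", ".md", ".txt", ".yaml", ".yml"),
) -> Iterator[Finding]:
    """Yield one Finding per line that contains at least one detected entity."""
    wanted = {ext.lower() for ext in extensions}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in wanted:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, 1):
                entity_types = detect(line.strip())
                if entity_types:
                    yield Finding(path, line_number, entity_types)
```

To use the SDK instead of the stand-in, the `bf.detect()` call from the single-file example can be wrapped and passed in: `scan_tree(Path("exports"), detect=lambda line: [e.type for e in bf.detect(line).detected_entities])`. The extension list is an assumption; adjust it to the file types you actually keep.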
After You Find PII
Once you know which files contain sensitive data, you have a few options:
- Redact: replace PII with `[REDACTED]` using `bf.redact()` and overwrite the file.
- Tokenize: replace PII with reversible tokens using `bf.tokenize()`. Keep the mapping to restore originals later if needed.
- Delete: if the file should not exist (e.g. a test fixture with real data), remove it.
- Move: relocate to encrypted storage with access controls.
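For the redact-and-overwrite option, the overwrite step is worth doing carefully so that a crash mid-write cannot corrupt the file. The sketch below shows one way to do it under stated assumptions: `redact_file` and `regex_redact` are hypothetical names, and because this article does not show what `bf.redact()` returns, the redactor is taken as a plain callable from text to text. The regex stand-in exists only to make the example runnable.

```python
import os
import re
import tempfile
from pathlib import Path
from typing import Callable

# Stand-in redactor so the sketch runs without the SDK (illustrative pattern only).
_SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def regex_redact(text: str) -> str:
    """Replace anything shaped like an SSN with a fixed placeholder."""
    return _SSN.sub("[REDACTED]", text)


def redact_file(path: Path, redact: Callable[[str], str] = regex_redact) -> bool:
    """Rewrite `path` with redacted content. Return True if anything changed."""
    path = Path(path)
    original = path.read_text(encoding="utf-8")
    cleaned = redact(original)
    if cleaned == original:
        return False
    # Write to a temp file in the same directory, then swap it in, so a crash
    # mid-write cannot leave a half-written file behind. Note that the new file
    # gets the temp file's owner-only permissions, not the original's.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(cleaned)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
    return True
```

To plug in the SDK, wrap `bf.redact()` in a function that returns the redacted string and pass it as `redact`; check the SDK documentation for the exact shape of its result. Overwriting destroys the original, so keep a backup or rely on version control if the data may be needed later.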
The scan runs locally with no API key. For deeper detection of names, addresses, and medical terms, add an API key to enable NLP-powered detection alongside regex.