Privacy · March 3, 2026 · 6 min read

How to Scan Files and Directories for Sensitive Data

Use Blindfold SDK to find PII hiding in CSV exports, JSON fixtures, config files, and documents. Examples in Python, JavaScript, Java, Go, and .NET.

Sensitive data hides in places you do not expect. CSV exports with customer emails sitting in a shared drive. JSON fixtures with real SSNs checked into a test directory. Config files with API keys. Markdown docs with patient names from a copy-paste.

Before you can protect it, you need to find it. Here is how to scan files for PII using the Blindfold SDK.

Scan a File

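from blindfold import Blindfold

# No API key is passed, so detection runs in local mode.
bf = Blindfold()

# Scan line by line so each hit can be reported with its line number.
with open("customers.csv", encoding="utf-8") as f:
    for i, line in enumerate(f, 1):
        result = bf.detect(line.strip())
        if result.detected_entities:
            print(f"Line {i}: {[e.type for e in result.detected_entities]}")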

Output:

Line 1: ['Email Address', 'Phone Number']
Line 3: ['SSN', 'Email Address']
Line 7: ['Credit Card']

This runs offline in local mode, so no API key is needed. Local mode covers 86 entity types, including emails, credit cards, SSNs, IBANs, phone numbers, and passport numbers.
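The same line-by-line approach can be extended to a whole directory tree. The sketch below is one possible way to do it, not part of the SDK: `scan_directory`, `TEXT_SUFFIXES`, and the `detect_types` callable are hypothetical names introduced here. The detector is passed in as a plain function that returns a list of entity type names, so the walker itself has no dependency on Blindfold.

```python
from pathlib import Path
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical default: text-like formats of the kind this article mentions.
TEXT_SUFFIXES = (".csv", ".json", ".md", ".txt", ".yaml", ".yml")


def scan_directory(
    root: str,
    detect_types: Callable[[str], Iterable[str]],
    suffixes: Tuple[str, ...] = TEXT_SUFFIXES,
) -> Iterator[Tuple[Path, int, List[str]]]:
    """Yield (path, line_number, entity_types) for every line with a hit."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files whose extension is not in the allow-list.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        # errors="replace" keeps the scan going on files with odd encodings.
        with path.open(encoding="utf-8", errors="replace") as f:
            for line_no, line in enumerate(f, 1):
                types = list(detect_types(line.strip()))
                if types:
                    yield path, line_no, types


# Wiring in the SDK (not run here), reusing only the calls shown earlier.
# "exports" is a placeholder directory name.
#
# from blindfold import Blindfold
# bf = Blindfold()
# detect = lambda text: [e.type for e in bf.detect(text).detected_entities]
# for path, line_no, types in scan_directory("exports", detect):
#     print(f"{path}:{line_no}: {types}")
```

Filtering by extension keeps binary files out of the scan. Widen or narrow the suffix list to match what actually lives in your repositories and shared drives.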

After You Find PII

Once you know which files contain sensitive data, you have a few options:

Redact — replace PII with [REDACTED] using bf.redact() and overwrite the file.
Tokenize — replace PII with reversible tokens using bf.tokenize(). Keep the mapping to restore originals later if needed.
Delete — if the file should not exist (e.g. a test fixture with real data), remove it.
Move — relocate to encrypted storage with access controls.
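For the redact option, overwriting a file in place is safer when the redacted copy is fully written before the original is replaced. The sketch below illustrates that pattern under stated assumptions: `redact_file` and the `redact_text` callable are hypothetical names, and because this article does not show what `bf.redact()` returns, the function takes any `str -> str` redactor rather than guessing at the SDK's result object.

```python
import os
import shutil
import tempfile
from typing import Callable


def redact_file(path: str, redact_text: Callable[[str], str]) -> bool:
    """Rewrite `path` with each line passed through `redact_text`.

    Returns True if any line changed. The original is replaced only after
    the redacted copy has been written in full.
    """
    changed = False
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" preserves the file's original line endings.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as out, \
                open(path, encoding="utf-8", newline="") as src:
            for line in src:
                body = line.rstrip("\r\n")
                ending = line[len(body):]
                cleaned = redact_text(body)
                changed = changed or cleaned != body
                out.write(cleaned + ending)
        shutil.copymode(path, tmp_path)  # keep the original permissions
        os.replace(tmp_path, path)
    except BaseException:
        # Leave the original untouched and clean up the partial copy.
        os.unlink(tmp_path)
        raise
    return changed
```

To use it with the SDK, wrap `bf.redact()` in a small function that returns the redacted string, whichever way the SDK exposes it, and pass that function as `redact_text`. If the file is under version control, remember that the unredacted content remains in history.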

For deeper detection of names, addresses, and medical terms, add an API key to enable NLP-powered detection alongside the local regex-based scan.
