HIPAA · February 12, 2026 · 10 min read

HIPAA and AI: Protecting PHI in US Healthcare Apps

How to build AI-powered healthcare applications that protect PHI. Covers the hipaa_us policy, encryption mode, Safe Harbor de-identification, and BAA readiness.

AI is transforming healthcare at a staggering pace. From clinical note summarization and diagnostic support to patient-facing chatbots and drug interaction analysis, developers are building AI features that touch patient data every day. But healthcare data comes with the strictest regulatory framework in the US: HIPAA.

The penalties for getting it wrong are severe. Civil penalties range from $100 to $50,000 per violation, with an annual maximum of $1.5 million per violation category. Criminal penalties can include fines up to $250,000 and up to 10 years of imprisonment. And beyond the fines, a single breach can destroy patient trust and end a healthcare startup.

The core problem for AI developers is simple: every time you send patient data to an LLM, you risk exposing Protected Health Information (PHI). Even if the AI provider promises not to log inputs, HIPAA requires you to minimize disclosure and maintain a chain of accountability. This article shows you how to build HIPAA-compliant AI features using Blindfold to strip PHI before it ever reaches a model.

HIPAA Safe Harbor De-identification

HIPAA provides two methods for de-identifying health information: Expert Determination (a qualified statistician certifies the data cannot be used to identify an individual) and Safe Harbor (you remove a defined list of 18 identifiers). For developers building software, Safe Harbor is the practical choice because it gives you a concrete, auditable checklist rather than requiring an expensive expert review for every dataset.

The 18 Safe Harbor identifiers are:

  1. Names
  2. Geographic data smaller than a state (street address, city, zip code)
  3. Dates (except year) related to an individual (birth date, admission date, discharge date, date of death) and all ages over 89
  4. Phone numbers
  5. Fax numbers
  6. Email addresses
  7. Social Security numbers
  8. Medical record numbers
  9. Health plan beneficiary numbers
  10. Account numbers
  11. Certificate/license numbers
  12. Vehicle identifiers and serial numbers
  13. Device identifiers and serial numbers
  14. Web URLs
  15. IP addresses
  16. Biometric identifiers
  17. Full-face photographs and comparable images
  18. Any other unique identifying number, characteristic, or code

If you remove all 18 identifier types from a record and have no actual knowledge that the remaining information could identify a patient, the data is considered de-identified under HIPAA. De-identified data is no longer PHI and falls outside HIPAA regulation entirely. This is the key insight for AI developers: de-identified text can be sent to any AI provider without triggering HIPAA obligations.

Using Blindfold for HIPAA Compliance

Blindfold provides a dedicated hipaa_us policy that automatically detects all 18 Safe Harbor identifiers. Instead of building custom regex patterns or maintaining a hand-tuned NER pipeline for each identifier type, you set the policy once and Blindfold handles detection using its GLiNER-based machine learning engine.

The workflow is straightforward: tokenize PHI before sending text to an AI model, get the AI response with tokens instead of real data, then detokenize when you need to display real values to an authorized clinician.

python
from blindfold import Blindfold
from openai import OpenAI

client = Blindfold(
    api_key="your-api-key",
    region="us"
)

# Clinical note containing PHI
note = """Patient John Smith (DOB: 03/15/1982, SSN: 123-45-6789)
presented with chest pain. Dr. Emily Chen ordered an ECG.
Contact: john.smith@email.com, (555) 867-5309.
MRN: 4820193. Address: 742 Evergreen Terrace, Springfield, IL 62704."""

# Tokenize using the hipaa_us policy
result = client.tokenize(
    text=note,
    policy="hipaa_us"
)

# Safe Harbor identifiers detected and replaced, for example:
# "John Smith"          -> <Person_1>
# "03/15/1982"          -> <Date Of Birth_1>
# "123-45-6789"         -> <Ssn_1>
# "Dr. Emily Chen"      -> <Person_2>
# "john.smith@email.com"-> <Email Address_1>
# "(555) 867-5309"      -> <Phone Number_1>
# "4820193"             -> <Medical Record Number_1>
# "742 Evergreen..."    -> <Address_1>

# Now safe to send to any AI model
ai = OpenAI()
response = ai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": result.tokenized_text}]
)

# Detokenize to restore PHI for the clinician
restored = client.detokenize(
    text=response.choices[0].message.content,
    token_map=result.token_map
)

The AI model only ever sees tokens like <Person_1> and <Ssn_1> instead of real patient data. The token map stays in your infrastructure, and you only detokenize when displaying results to authorized users.
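
Access to detokenization should itself be gated, so that only authorized users ever see restored PHI. A minimal sketch, assuming a hypothetical role list and helper name (neither is part of the Blindfold SDK; only the `detokenize` call mirrors the usage above):

```python
# Sketch: gate detokenization behind a role check so unauthorized viewers
# only ever see tokens. AUTHORIZED_ROLES and detokenize_for_user are
# illustrative names, not part of the Blindfold SDK.

AUTHORIZED_ROLES = {"clinician", "compliance_officer"}

def detokenize_for_user(client, user_role: str, text: str, token_map: dict) -> str:
    """Restore PHI only for authorized roles; otherwise return tokens as-is."""
    if user_role not in AUTHORIZED_ROLES:
        # Unauthorized viewers keep the tokenized text -- no PHI exposure
        return text
    return client.detokenize(text=text, token_map=token_map).text
```

The token map itself should live in encrypted storage with the same role restrictions applied to reads.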

Encryption Mode for PHI

Tokenization replaces PHI with readable placeholders, which is ideal when you want the AI model to understand the structure of the text. But in some cases you need the PHI to be cryptographically protected rather than replaced with human-readable tokens: for example, when storing records in a database, transmitting them through an untrusted channel, or archiving clinical data for later processing.

Blindfold's encrypt operation uses AES/Fernet encryption to replace each detected identifier with an encrypted string. Only your encryption key can decrypt the values later.

python
# Encrypt PHI instead of tokenizing
encrypted = client.encrypt(
    text=note,
    policy="hipaa_us"
)

# PHI is now cryptographically unreadable
# "John Smith" -> "gAAAAABl..."
# Clinical content (diagnoses, medications) stays readable

# Decrypt when authorized personnel need the original data
decrypted = client.decrypt(
    text=encrypted.encrypted_text
)

When to use which: Use tokenize when you need the AI model to process and reason about the text structure — the model can see that <Person_1> is a person and work with the context. Use encrypt when you need to store or transmit data and want PHI to be cryptographically unrecoverable without your key.

US Region Processing

For healthcare applications, data residency is not optional in practice. While HIPAA itself does not explicitly mandate US-only processing, most healthcare organizations, compliance officers, and insurance contracts require it. Blindfold offers a dedicated US regional endpoint at us-api.blindfold.dev where all processing happens on US-based infrastructure. Your PHI never leaves US jurisdiction during the detection and tokenization step.

Configuring the SDK for US region is a single parameter:

python
from blindfold import Blindfold

# All requests go to us-api.blindfold.dev
client = Blindfold(
    api_key="your-api-key",
    region="us"
)

This is especially important when signing a Business Associate Agreement (BAA). A BAA typically specifies where data will be processed and stored. Being able to guarantee US-only processing simplifies the agreement and satisfies compliance officers during audits.

BAA Readiness

Under HIPAA, any service that creates, receives, maintains, or transmits PHI on behalf of a covered entity is a Business Associate and must sign a BAA. A Business Associate Agreement typically covers:

  • Permitted uses and disclosures of PHI
  • Safeguards the business associate will implement
  • Breach notification procedures and timelines
  • Return or destruction of PHI upon contract termination
  • Subcontractor obligations (downstream BAAs)

Blindfold's architecture simplifies BAA obligations significantly. The API operates on a zero-retention model: text is processed in memory, tokens or encrypted values are returned in the response, and no PHI is stored on Blindfold's servers. This means the “return or destruction of PHI” clause is satisfied by default — there is nothing to return or destroy. The attack surface for breach notification is minimal because no patient data persists beyond the API call.

Need a BAA? Blindfold offers BAAs for healthcare customers on paid plans. Contact support@blindfold.dev to start the process. Include your organization name, covered entity status, and expected data volume.

Building a Compliant Healthcare AI Pipeline

Let's walk through a complete end-to-end example: a patient intake system where a clinician enters notes, an AI model analyzes them for potential diagnoses and recommended follow-ups, and the results are displayed back to the clinician with real patient information restored.

python
from blindfold import Blindfold
from openai import OpenAI

blindfold = Blindfold(api_key="your-api-key", region="us")
openai_client = OpenAI()

def analyze_intake(clinician_notes: str) -> str:
    """Full HIPAA-compliant AI analysis pipeline."""

    # Step 1: Tokenize all PHI from the clinical notes
    tokenized = blindfold.tokenize(
        text=clinician_notes,
        policy="hipaa_us"
    )

    # Step 2: Send de-identified text to AI for analysis
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a clinical decision support assistant. "
                    "Analyze the intake notes and suggest potential "
                    "diagnoses and recommended follow-up tests. "
                    "Preserve all patient identifier tokens as-is."
            },
            {
                "role": "user",
                "content": tokenized.tokenized_text
            }
        ]
    )

    ai_output = response.choices[0].message.content

    # Step 3: Detokenize to restore real PHI for the clinician
    restored = blindfold.detokenize(
        text=ai_output,
        token_map=tokenized.token_map
    )

    return restored.text


# Example: patient intake with multiple PHI types
notes = """New patient Maria Garcia, DOB 11/22/1975, SSN 987-65-4321.
Chief complaint: persistent cough for 3 weeks, low-grade fever.
History of asthma. Current medications: albuterol inhaler.
Phone: (555) 234-5678. Insurance ID: BCBS-449281.
Referring physician: Dr. Robert Kim, Springfield Medical Group."""

result = analyze_intake(notes)
# The clinician sees the AI's assessment with Maria Garcia's
# real name, DOB, and all other identifiers restored
print(result)

In this pipeline, OpenAI never sees Maria Garcia's name, SSN, date of birth, phone number, or insurance ID. The AI model receives tokens like <Person_1>, <Ssn_1>, and <Phone Number_1> instead. When the AI response comes back, Blindfold swaps the tokens back so the clinician sees the full, personalized output. This pattern works for any AI use case in healthcare: clinical note summarization, diagnostic support, patient communication drafting, medical coding assistance, or clinical trial matching.

HIPAA Compliance Checklist for AI Apps

Use this checklist when building any AI feature that touches patient data:

  1. De-identify before AI processing. Tokenize or encrypt all PHI before sending text to any LLM or third-party AI service. Use the hipaa_us policy to cover all 18 Safe Harbor identifiers automatically.
  2. Use US region processing. Configure your Blindfold client with region="us" to ensure PHI never leaves US infrastructure during the detection step.
  3. Sign a BAA with every vendor that touches PHI. This includes your AI provider, your PII protection service, your cloud host, and any analytics tools. If a vendor refuses to sign a BAA, they cannot process PHI.
  4. Implement access controls on detokenization. Only authorized clinicians and staff should be able to call the detokenize endpoint. Store token maps securely and restrict access by role.
  5. Maintain audit logs. Record who accessed PHI, when, and why. Blindfold provides request audit logs that pair with your application-level logging for a complete compliance trail.
  6. Encrypt data at rest and in transit. Use TLS for all API calls (Blindfold enforces HTTPS). For stored records, consider using Blindfold's encrypt mode so PHI is cryptographically protected in your database.
  7. Plan for breach notification. HIPAA requires notification within 60 days of discovering a breach. By de-identifying data before it reaches third parties, you reduce the scope of what constitutes a reportable breach.
  8. Review and test regularly. Run your de-identification pipeline against sample clinical notes to verify all 18 identifier types are being caught. Use Blindfold's detect endpoint to audit what entities are found in your data.
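
The last checklist item can be automated as a regression check. A minimal sketch, assuming a detect-style result that exposes entity labels (the label names and response shape here are illustrative; match them to what your detect results actually return):

```python
# Sketch: compare the identifier types found in a sample note against the
# types you expect it to contain. EXPECTED_LABELS is illustrative.

EXPECTED_LABELS = {"Person", "Date Of Birth", "Ssn", "Phone Number", "Email Address"}

def missing_identifier_types(found_labels, expected=EXPECTED_LABELS):
    """Return expected identifier types that were not detected in the sample."""
    return expected - set(found_labels)

# In practice, found_labels would come from something like:
#   result = client.detect(text=sample_note, policy="hipaa_us")
#   found = {entity.label for entity in result.entities}
# and the test run fails if missing_identifier_types(found) is non-empty.
```

Run this against a fixed set of sample clinical notes in CI so a detection regression surfaces before it reaches production.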

Remember: HIPAA compliance is not just a technical problem. It requires organizational policies, staff training, and ongoing risk assessments alongside the technical safeguards described here. Blindfold handles the technical de-identification layer so you can focus on building your healthcare AI features with confidence.

Try It Yourself

Clone a complete working example from our cookbook and run it in minutes:

Start protecting sensitive data

Free plan includes 500K characters/month. No credit card required.