HIPAA and AI: Protecting PHI in US Healthcare Apps
How to build AI-powered healthcare applications that protect PHI. Covers the hipaa_us policy, encryption mode, Safe Harbor de-identification, and BAA readiness.
AI is transforming healthcare at a staggering pace. From clinical note summarization and diagnostic support to patient-facing chatbots and drug interaction analysis, developers are building AI features that touch patient data every day. But healthcare data comes with the strictest regulatory framework in the US: HIPAA.
The penalties for getting it wrong are severe. Civil penalties range from $100 to $50,000 per violation, with an annual maximum of $1.5 million per violation category. Criminal penalties can include fines up to $250,000 and imprisonment. And beyond the fines, a single breach can destroy patient trust and end a healthcare startup.
The core problem for AI developers is simple: every time you send patient data to an LLM, you risk exposing Protected Health Information (PHI). Even if the AI provider promises not to log inputs, HIPAA requires you to minimize disclosure and maintain a chain of accountability. This article shows you how to build HIPAA-compliant AI features using Blindfold to strip PHI before it ever reaches a model.
HIPAA Safe Harbor De-identification
HIPAA provides two methods for de-identifying health information: Expert Determination (a qualified statistician certifies the data cannot be used to identify an individual) and Safe Harbor (you remove a defined list of 18 identifiers). For developers building software, Safe Harbor is the practical choice because it gives you a concrete, auditable checklist rather than requiring an expensive expert review for every dataset.
The 18 Safe Harbor identifiers are:
- Names
- Geographic data smaller than a state (street address, city, zip code)
- Dates (except year) related to an individual (birth date, admission date, discharge date, date of death)
- Phone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate/license numbers
- Vehicle identifiers and serial numbers
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
If you remove all 18 identifier types from a record and have no actual knowledge that the remaining information could identify a patient, the data is considered de-identified under HIPAA. De-identified data is no longer PHI and falls outside HIPAA regulation entirely. This is the key insight for AI developers: de-identified text can be sent to any AI provider without triggering HIPAA obligations.
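To make the checklist concrete, here is a toy sanity check that flags a few of the easier identifier types with regular expressions. This is an illustration only, not a de-identification tool: pattern matching catches structured identifiers like SSNs and dates, but names, addresses, and medical record numbers require ML-based detection.

```python
import re

# Toy sanity check for a few of the 18 Safe Harbor identifier types.
# Illustration only: regexes miss names, addresses, MRNs, and other
# free-text identifiers, which need ML-based detection.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\(\d{3}\)\s?\d{3}-\d{4}"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "date": re.compile(r"\b\d{2}/\d{2}/\d{4}\b"),
}

def find_identifiers(text: str) -> dict[str, list[str]]:
    """Return any obvious Safe Harbor identifiers found in text."""
    hits = {name: pat.findall(text) for name, pat in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}

record = "DOB: 03/15/1982, SSN: 123-45-6789, john.smith@email.com"
print(find_identifiers(record))
```

A check like this is useful as a last-line regression test on your pipeline's output (any hit means an identifier leaked through), not as the detection layer itself.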
Using Blindfold for HIPAA Compliance
Blindfold provides a dedicated hipaa_us policy that automatically detects all 18 Safe Harbor identifiers. Instead of building custom regex patterns or maintaining a hand-tuned NER pipeline for each identifier type, you set the policy once and Blindfold handles detection using its GLiNER-based machine learning engine.
The workflow is straightforward: tokenize PHI before sending text to an AI model, get the AI response with tokens instead of real data, then detokenize when you need to display real values to an authorized clinician.
```python
from blindfold import Blindfold
from openai import OpenAI

client = Blindfold(
    api_key="your-api-key",
    region="us"
)

# Clinical note containing PHI
note = """Patient John Smith (DOB: 03/15/1982, SSN: 123-45-6789)
presented with chest pain. Dr. Emily Chen ordered an ECG.
Contact: john.smith@email.com, (555) 867-5309. MRN: 4820193.
Address: 742 Evergreen Terrace, Springfield, IL 62704."""

# Tokenize using the hipaa_us policy
result = client.tokenize(
    text=note,
    policy="hipaa_us"
)

# All 18 HIPAA identifiers detected and replaced:
# "John Smith"           -> <Person_1>
# "03/15/1982"           -> <Date Of Birth_1>
# "123-45-6789"          -> <Ssn_1>
# "Dr. Emily Chen"       -> <Person_2>
# "john.smith@email.com" -> <Email Address_1>
# "(555) 867-5309"       -> <Phone Number_1>
# "4820193"              -> <Medical Record Number_1>
# "742 Evergreen..."     -> <Address_1>

# Now safe to send to any AI model
ai = OpenAI()
response = ai.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": result.tokenized_text}]
)

# Detokenize to restore PHI for the clinician
restored = client.detokenize(
    text=response.choices[0].message.content,
    token_map=result.token_map
)
```
The AI model only ever sees tokens like <Person_1> and <Ssn_1> instead of real patient data. The token map stays in your infrastructure, and you only detokenize when displaying results to authorized users.
Encryption Mode for PHI
Tokenization replaces PHI with readable placeholders, which is ideal when you want the AI model to understand the structure of the text. But in some cases, you need the PHI to be cryptographically protected rather than replaced with human-readable tokens. For example, when storing de-identified records in a database, transmitting them through an untrusted channel, or archiving clinical data for later processing.
Blindfold's encrypt operation uses AES/Fernet encryption to replace each detected identifier with an encrypted string. Only your encryption key can decrypt the values later.
```python
# Encrypt PHI instead of tokenizing
encrypted = client.encrypt(
    text=note,
    policy="hipaa_us"
)

# PHI is now cryptographically unreadable
# "John Smith" -> "gAAAAABl..."
# Clinical content (diagnoses, medications) stays readable

# Decrypt when authorized personnel need the original data
decrypted = client.decrypt(
    text=encrypted.encrypted_text
)
```
When to use which: Use tokenize when you need the AI model to process and reason about the text structure — the model can see that <Person_1> is a person and work with the context. Use encrypt when you need to store or transmit data and want PHI to be cryptographically unrecoverable without your key.
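That decision can be captured in a small dispatch helper. This is a hypothetical sketch, not part of the Blindfold SDK; the destination labels are illustrative.

```python
def choose_operation(destination: str) -> str:
    """Pick the protection operation for a given data destination.

    Hypothetical helper: 'tokenize' keeps readable placeholders the
    model can reason about; 'encrypt' makes PHI unrecoverable
    without your key.
    """
    ai_destinations = {"llm", "chat", "summarization"}
    if destination in ai_destinations:
        return "tokenize"  # model sees <Person_1>-style tokens
    return "encrypt"       # storage / transmission: ciphertext only
```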
US Region Processing
For healthcare applications, data residency is not optional in practice. While HIPAA itself does not explicitly mandate US-only processing, most healthcare organizations, compliance officers, and insurance contracts require it. Blindfold offers a dedicated US regional endpoint at us-api.blindfold.dev where all processing happens on US-based infrastructure. Your PHI never leaves US jurisdiction during the detection and tokenization step.
Configuring the SDK for US region is a single parameter:
```python
from blindfold import Blindfold

# All requests go to us-api.blindfold.dev
client = Blindfold(
    api_key="your-api-key",
    region="us"
)
```
This is especially important when signing a Business Associate Agreement (BAA). A BAA typically specifies where data will be processed and stored. Being able to guarantee US-only processing simplifies the agreement and satisfies compliance officers during audits.
BAA Readiness
Under HIPAA, any service that creates, receives, maintains, or transmits PHI on behalf of a covered entity is a Business Associate and must sign a BAA. A Business Associate Agreement typically covers:
- Permitted uses and disclosures of PHI
- Safeguards the business associate will implement
- Breach notification procedures and timelines
- Return or destruction of PHI upon contract termination
- Subcontractor obligations (downstream BAAs)
Blindfold's architecture simplifies BAA obligations significantly. The API operates on a zero-retention model: text is processed in memory, tokens or encrypted values are returned in the response, and no PHI is stored on Blindfold's servers. This means the “return or destruction of PHI” clause is satisfied by default — there is nothing to return or destroy. The attack surface for breach notification is minimal because no patient data persists beyond the API call.
Need a BAA? Blindfold offers BAAs for healthcare customers on paid plans. Contact support@blindfold.dev to start the process. Include your organization name, covered entity status, and expected data volume.
Building a Compliant Healthcare AI Pipeline
Let's walk through a complete end-to-end example: a patient intake system in which a clinician enters notes, an AI model analyzes them for potential diagnoses and recommended follow-ups, and the results are displayed back to the clinician with real patient information restored.
```python
from blindfold import Blindfold
from openai import OpenAI

blindfold = Blindfold(api_key="your-api-key", region="us")
openai_client = OpenAI()

def analyze_intake(clinician_notes: str) -> str:
    """Full HIPAA-compliant AI analysis pipeline."""
    # Step 1: Tokenize all PHI from the clinical notes
    tokenized = blindfold.tokenize(
        text=clinician_notes,
        policy="hipaa_us"
    )

    # Step 2: Send de-identified text to AI for analysis
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a clinical decision support assistant. "
                           "Analyze the intake notes and suggest potential "
                           "diagnoses and recommended follow-up tests. "
                           "Preserve all patient identifier tokens as-is."
            },
            {
                "role": "user",
                "content": tokenized.tokenized_text
            }
        ]
    )
    ai_output = response.choices[0].message.content

    # Step 3: Detokenize to restore real PHI for the clinician
    restored = blindfold.detokenize(
        text=ai_output,
        token_map=tokenized.token_map
    )
    return restored.text

# Example: patient intake with multiple PHI types
notes = """New patient Maria Garcia, DOB 11/22/1975, SSN 987-65-4321.
Chief complaint: persistent cough for 3 weeks, low-grade fever.
History of asthma. Current medications: albuterol inhaler.
Phone: (555) 234-5678. Insurance ID: BCBS-449281.
Referring physician: Dr. Robert Kim, Springfield Medical Group."""

result = analyze_intake(notes)

# The clinician sees the AI's assessment with Maria Garcia's
# real name, DOB, and all other identifiers restored
print(result)
```
In this pipeline, OpenAI never sees Maria Garcia's name, SSN, date of birth, phone number, or insurance ID. The AI model receives tokens like <Person_1>, <Ssn_1>, and <Phone Number_1> instead. When the AI response comes back, Blindfold swaps the tokens back so the clinician sees the full, personalized output. This pattern works for any AI use case in healthcare: clinical note summarization, diagnostic support, patient communication drafting, medical coding assistance, or clinical trial matching.
HIPAA Compliance Checklist for AI Apps
Use this checklist when building any AI feature that touches patient data:
- De-identify before AI processing. Tokenize or encrypt all PHI before sending text to any LLM or third-party AI service. Use the hipaa_us policy to cover all 18 Safe Harbor identifiers automatically.
- Use US region processing. Configure your Blindfold client with region="us" to ensure PHI never leaves US infrastructure during the detection step.
- Sign a BAA with every vendor that touches PHI. This includes your AI provider, your PII protection service, your cloud host, and any analytics tools. If a vendor refuses to sign a BAA, they cannot process PHI.
- Implement access controls on detokenization. Only authorized clinicians and staff should be able to call the detokenize endpoint. Store token maps securely and restrict access by role.
- Maintain audit logs. Record who accessed PHI, when, and why. Blindfold provides request audit logs that pair with your application-level logging for a complete compliance trail.
- Encrypt data at rest and in transit. Use TLS for all API calls (Blindfold enforces HTTPS). For stored records, consider using Blindfold's encrypt mode so PHI is cryptographically protected in your database.
- Plan for breach notification. HIPAA requires notification within 60 days of discovering a breach. By de-identifying data before it reaches third parties, you reduce the scope of what constitutes a reportable breach.
- Review and test regularly. Run your de-identification pipeline against sample clinical notes to verify all 18 identifier types are being caught. Use Blindfold's detect endpoint to audit what entities are found in your data.
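The access-control and audit-log items can be sketched as a thin authorization wrapper around detokenization. Everything below is hypothetical scaffolding: the role set, in-memory audit log, and the stubbed detokenize function stand in for your real user store, durable logging, and the Blindfold call.

```python
from datetime import datetime, timezone

# Hypothetical role table and audit log; in production these would be
# your user store and a durable, append-only log.
AUTHORIZED_ROLES = {"clinician", "compliance_officer"}
audit_log: list[dict] = []

def detokenize_stub(text: str, token_map: dict) -> str:
    # Stand-in for client.detokenize(...): swaps tokens for real values.
    for token, value in token_map.items():
        text = text.replace(token, value)
    return text

def detokenize_for_user(user: str, role: str,
                        text: str, token_map: dict) -> str:
    """Restore PHI only for authorized roles, recording every attempt."""
    allowed = role in AUTHORIZED_ROLES
    audit_log.append({
        "user": user,
        "role": role,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    if not allowed:
        raise PermissionError(f"role {role!r} may not detokenize PHI")
    return detokenize_stub(text, token_map)

restored = detokenize_for_user(
    "dr_chen", "clinician",
    "Follow up with <Person_1> in two weeks.",
    {"<Person_1>": "Maria Garcia"},
)
```

Note that denied attempts are logged before the exception is raised, so the audit trail records who tried to access PHI, not just who succeeded.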
Remember: HIPAA compliance is not just a technical problem. It requires organizational policies, staff training, and ongoing risk assessments alongside the technical safeguards described here. Blindfold handles the technical de-identification layer so you can focus on building your healthcare AI features with confidence.
Try It Yourself
Clone a complete working example from our cookbook and run it in minutes:
- HIPAA Healthcare Chatbot — US region, hipaa_us policy, multi-turn chat, batch redaction
- All cookbook examples — OpenAI, LangChain, FastAPI, Express, E2B, and more