Role-Based Client Data Protection in Financial RAG Pipelines
Implement role-based PII control in financial services RAG systems. Relationship managers, risk analysts, compliance officers, and external auditors each see only the client data their role permits, satisfying PCI DSS, SOX, GDPR, and MiFID II requirements.
Financial institutions are deploying RAG systems at an accelerating pace — for client advisory, risk analysis, compliance monitoring, and internal knowledge bases. The promise is compelling: connect your LLM to client records, transaction histories, and regulatory filings, and let analysts, advisors, and compliance teams get answers in seconds instead of hours.
The problem is that those client records contain some of the most sensitive data in any industry. Account numbers, investment portfolios, credit scores, transaction histories, Social Security numbers, wire transfer details — all of it flows into the RAG pipeline and, without protection, straight to your LLM provider. In financial services, this is not just a privacy concern. It is a regulatory violation across multiple frameworks simultaneously.
Making matters more complex, different roles within a financial institution need fundamentally different views of the same data. A relationship manager needs client names and portfolio summaries to serve their clients, but has no business seeing internal risk scores. A risk analyst needs aggregated positions and risk metrics, but does not need to know which client they belong to. A compliance officer needs the complete picture for investigations. An external auditor needs to review transaction patterns without identifying individual clients.
This article shows you how to implement role-based PII control in financial RAG pipelines using Blindfold. Each role gets a tailored view of the data — the relationship manager sees client context without SSNs, the risk analyst sees anonymized metrics, the compliance officer sees everything, and the external auditor sees fully de-identified records. The same underlying data, four different privacy policies, enforced at the API level.
The Regulatory Landscape
Financial services operate under a uniquely dense web of data protection requirements. Before building any RAG system that touches client data, you need to understand the regulations that govern how that data can be accessed, processed, and shared.
- PCI DSS — The Payment Card Industry Data Security Standard governs how payment card data (card numbers, CVVs, expiration dates) is stored, processed, and transmitted. Any RAG system that ingests transaction records containing card data must comply. Sending unmasked card numbers to an LLM provider is a clear violation.
- SOX (Sarbanes-Oxley) — Requires financial reporting controls and audit trails. When a RAG system is used for financial analysis or reporting, SOX demands that you can demonstrate who accessed what data, when, and what controls were in place.
- GDPR — Applies to any client data belonging to EU residents. Sending personal data to a US-based LLM provider constitutes a cross-border data transfer under Articles 44–49. Data minimization under Article 5(1)(c) requires that only the minimum necessary personal data is processed for each purpose.
- MiFID II — The Markets in Financial Instruments Directive requires that investment advice records are maintained with full audit trails. When a RAG system generates investment advice, the complete chain of data access must be traceable.
- Chinese walls and information barriers — In investment banking, strict separation must exist between departments. An M&A analyst must not see information from the trading desk, and vice versa. These barriers are not optional — violating them can result in insider trading charges.
- Need-to-know principle — A cross-cutting requirement in virtually all financial regulation: every person should have access only to the information they need to perform their specific role. No more, no less.
The common thread across all of these frameworks is role-based access control. The same client record must be visible in different ways to different people, depending on their role and the regulatory context. A RAG system that gives everyone the same view of client data fails to meet any of these requirements.
Role-Based Access with Blindfold Policies
Blindfold's entity-level control lets you define exactly which PII types each role can see. By specifying which entities to redact per role, you create tailored views of the same underlying data. Here is how the four primary financial services roles map to Blindfold policies:
| Role | Sees | Redacted | Regulatory Basis |
|---|---|---|---|
| Relationship Manager | Client names, portfolio summaries, account summaries | SSN, full account numbers, internal risk scores | Client-facing — needs relationship context |
| Risk Analyst | Aggregated positions, risk metrics, transaction amounts | Client names, contact info, account numbers | Analysis does not need client identity |
| Compliance Officer | Full access — names, transactions, flags, contact info | Nothing — needs complete picture | Regulatory investigation authority |
| External Auditor | Transaction patterns, anonymized data | All PII — names, accounts, contact info | Independent review, no client identification |
Each role gets a different Blindfold configuration. The relationship manager's policy redacts only the most sensitive identifiers (SSN, full account numbers) while preserving client names and portfolio context. The risk analyst's policy strips all identifying information so the analysis focuses purely on positions and metrics. The compliance officer bypasses redaction entirely because investigations require the full picture. The external auditor uses the strictest policy, removing all personally identifiable information for an independent, arms-length review.
Implementation
Let us build a complete financial RAG system with role-based PII control. We will start with realistic client records, define role-specific entity configurations, and show how the same query produces different outputs for each role.
Sample Client Data
These records represent the kind of data a financial RAG system would ingest from a CRM, portfolio management system, or compliance database:
```python
# Sample financial client records
CLIENT_RECORDS = [
    "Client Record #CR-2024-001: Robert Anderson (robert.a@anderson-holdings.com, "
    "+1-212-555-0891, SSN 234-56-7890) — High-net-worth individual. Portfolio "
    "value: $4.2M. Holdings: 45% equities, 30% bonds, 25% alternatives. Credit "
    "score: 780. Account #AC-9928-4471. Annual advisory fee: $42,000.",
    "Client Record #CR-2024-002: Li Wei (li.wei@globalventures.cn, "
    "+86-138-0013-8000) — Corporate client. Account #AC-7735-2289. Wire transfer "
    "of $850,000 on 2024-03-15 flagged for enhanced due diligence. KYC review "
    "pending. Source of funds: Guangzhou property sale.",
    "Client Record #CR-2024-003: Sofia Martinez (sofia.m@martinez-family.com, "
    "+1-305-555-0234, SSN 567-89-0123) — Trust account beneficiary. Trust value: "
    "$2.8M. Monthly distribution: $15,000. Account #AC-3341-8856. Tax ID: "
    "EIN 47-1234567.",
]
```
Role Entity Configurations
Each role maps to a list of entity types that should be redacted before the data reaches the LLM. An empty list means no redaction (full access), and None triggers the strictest built-in policy:
```python
# Entity types to REDACT for each role.
# The listed entities are what gets REMOVED; everything else is visible.
ROLE_ENTITIES = {
    "relationship_manager": [
        "social security number",
        "credit card number",
    ],
    "risk_analyst": [
        "person",
        "email address",
        "phone number",
        "social security number",
        "address",
    ],
    "compliance_officer": [],   # Full access: no redaction
    "external_auditor": None,   # Strictest policy: redact all PII
}
```
Why these configurations? The relationship manager needs client names and portfolio details to maintain the advisory relationship, but SSNs and credit card numbers are never needed for client conversations. The risk analyst works with aggregated data and does not need to know who the clients are. The compliance officer has full investigative authority. The external auditor performs independent reviews and must not be able to identify individual clients.
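A lookup table like this should fail closed: a misspelled or unregistered role must never fall through to full access. The sketch below is one way to do that in plain Python. `resolve_tokenize_kwargs` is a hypothetical helper, not part of the Blindfold SDK, and it assumes the `entities=` and `policy="strict"` arguments behave as described in this article.

```python
from typing import Optional

# Same shape as the role configuration above:
# list -> redact these entity types, [] -> full access, None -> strict policy.
ROLE_ENTITIES: dict[str, Optional[list[str]]] = {
    "relationship_manager": ["social security number", "credit card number"],
    "risk_analyst": ["person", "email address", "phone number",
                     "social security number", "address"],
    "compliance_officer": [],
    "external_auditor": None,
}


def resolve_tokenize_kwargs(role: str) -> Optional[dict]:
    """Return keyword arguments for a tokenize call, or None for full access.

    Hypothetical helper: unknown roles raise instead of defaulting to
    full access, so a typo in a role name cannot leak data.
    """
    if role not in ROLE_ENTITIES:
        raise PermissionError(f"Unknown role: {role!r}")
    entities = ROLE_ENTITIES[role]
    if entities is None:
        return {"policy": "strict"}       # redact all PII
    if not entities:
        return None                       # full access, skip tokenization
    return {"entities": list(entities)}   # copy so callers cannot mutate config
```

A caller would skip tokenization when the helper returns `None` and otherwise pass the returned dictionary as keyword arguments to the tokenize call.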
The Complete Query Function
Here is the full implementation. The query_financial_rag function takes a question and a role, applies the appropriate Blindfold policy, and returns an answer with only the data that role is permitted to see:
```python
import os

import chromadb
from blindfold import Blindfold
from openai import OpenAI

# Initialize clients
blindfold_client = Blindfold(api_key=os.environ["BLINDFOLD_API_KEY"])
openai_client = OpenAI()

# Set up vector store with client records
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("financial_records")

# Ingest client records (CLIENT_RECORDS is the sample data defined above)
for i, record in enumerate(CLIENT_RECORDS):
    collection.add(documents=[record], ids=[f"cr-{i}"])

# Role-based entity configurations
ROLE_ENTITIES = {
    "relationship_manager": ["social security number", "credit card number"],
    "risk_analyst": ["person", "email address", "phone number",
                     "social security number", "address"],
    "compliance_officer": [],
    "external_auditor": None,
}


def query_financial_rag(question: str, role: str) -> str:
    # Retrieve relevant client records
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])

    # Combine context and question
    prompt = f"Client records:\n{context}\n\nQuestion: {question}"

    # Apply role-based PII protection
    entities = ROLE_ENTITIES[role]
    if role == "compliance_officer":
        # Full access: skip tokenization entirely
        safe_prompt = prompt
        mapping = {}
    elif entities is None:
        # External auditor: use strict policy to redact all PII
        tokenized = blindfold_client.tokenize(prompt, policy="strict")
        safe_prompt = tokenized.text
        mapping = tokenized.mapping
    else:
        # Role-specific entity redaction
        tokenized = blindfold_client.tokenize(prompt, entities=entities)
        safe_prompt = tokenized.text
        mapping = tokenized.mapping

    # Send to LLM
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a financial services assistant. Answer "
                           "questions using only the provided client records.",
            },
            {"role": "user", "content": safe_prompt},
        ],
    )
    ai_response = completion.choices[0].message.content

    # Detokenize if needed (compliance officer has no mapping).
    # Detokenizing restores the redacted values in the returned text, so
    # return ai_response unchanged for roles that must only see tokens.
    if mapping:
        restored = blindfold_client.detokenize(ai_response, mapping)
        return restored.text
    return ai_response
```
Running the Query
With the function in place, each role can ask the same question and receive a response tailored to their access level:
```python
# Same question, four different roles
question = "What is the status of the flagged wire transfer?"

for role in ROLE_ENTITIES:
    answer = query_financial_rag(question, role)
    print(f"\n[{role}]\n{answer}")
```
What Each Role Sees
When each role asks the same question — “What is the status of the flagged wire transfer?” — the RAG system retrieves the same client records but applies different privacy filters before the data reaches the LLM. Here is what each role sees:
Relationship Manager
The relationship manager sees client names and transaction details because they need this context to manage the client relationship. SSNs and credit card numbers are redacted:
Li Wei had a wire transfer of $850,000 on 2024-03-15 that was flagged for enhanced due diligence. The KYC review is currently pending. The source of funds was reported as a Guangzhou property sale. Account #AC-7735-2289 is the associated account. You may want to reach out to the client at li.wei@globalventures.cn to request additional documentation for the KYC review.
Risk Analyst
The risk analyst sees transaction amounts and risk flags but client identity is fully removed. Names, emails, phone numbers, and SSNs are all tokenized:
<Person_1> had a wire transfer of $850,000 flagged for enhanced due diligence. The KYC review is pending. The source of funds was reported as a property sale. The transfer amount of $850,000 exceeds the standard threshold, and the pending KYC status represents an elevated risk position.
Compliance Officer
The compliance officer sees everything — full names, contact information, account numbers, and all transaction details. Nothing is redacted because regulatory investigations require the complete picture:
Li Wei (li.wei@globalventures.cn, +86-138-0013-8000) has a wire transfer of $850,000 on 2024-03-15 flagged for enhanced due diligence on account #AC-7735-2289. The KYC review is pending. The source of funds is listed as a Guangzhou property sale. This transaction requires immediate attention under AML screening procedures.
External Auditor
The external auditor sees fully anonymized data. All PII — names, organizations, contact information, account numbers — is replaced with tokens. The auditor can evaluate transaction patterns and compliance procedures without identifying any client:
<Person_1> at <Organization_1> had a transfer of <Currency_1> flagged for enhanced due diligence. The KYC review is pending. The source of funds was a property sale. The flagging procedure appears to have been triggered correctly based on the transfer amount threshold.
Key insight: The same RAG pipeline, the same client records, the same question — but four entirely different responses tailored to each role's regulatory requirements. The Blindfold tokenization layer acts as a programmable privacy filter between your data and the LLM.
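To make the tokenize and detokenize round trip concrete without an API key, here is a toy, regex-based stand-in. It is not how Blindfold detects entities: `toy_tokenize`, `toy_detokenize`, and the two patterns are illustrative assumptions, and only the `<Type_N>` token shape is taken from the sample outputs above.

```python
import re

# Toy patterns for two entity types; real PII detection is far broader.
PATTERNS = {
    "Social_Security_Number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "Email_Address": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def toy_tokenize(text: str, entity_types: list[str]) -> tuple[str, dict[str, str]]:
    """Replace matches of the selected entity types with numbered tokens."""
    mapping: dict[str, str] = {}
    for entity_type in entity_types:
        seen: dict[str, str] = {}  # original value -> token, so repeats share a token

        def replace(match: re.Match) -> str:
            value = match.group(0)
            if value not in seen:
                token = f"<{entity_type}_{len(seen) + 1}>"
                seen[value] = token
                mapping[token] = value
            return seen[value]

        text = PATTERNS[entity_type].sub(replace, text)
    return text, mapping


def toy_detokenize(text: str, mapping: dict[str, str]) -> str:
    """Restore original values for every token present in the text."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text
```

Calling `toy_tokenize` on the same record with one entity type and then with two produces two different views of identical data, which is the role-based idea in miniature.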
Chinese Walls and Information Barriers
In investment banking, information barriers (commonly called “Chinese walls”) are legally mandated separations between departments. An analyst in the M&A division who has access to non-public information about a pending acquisition must not see data from the trading desk — and vice versa. Violating these barriers can constitute insider trading and carries severe criminal penalties.
When a RAG system spans multiple departments, Blindfold policies can enforce these barriers at the data level. The M&A analyst's policy redacts trading-specific entity types (position sizes, trading account identifiers, order details) while preserving deal-relevant information (company names, deal structures, timelines). The trader's policy does the reverse — redacting non-public deal information while preserving market data.
```python
# Information barrier policies for investment banking
BARRIER_POLICIES = {
    "ma_analyst": {
        # M&A analyst: redact trading desk data
        "entities": [
            "trading account",
            "order id",
            "position size",
        ],
    },
    "trader": {
        # Trader: redact non-public deal information
        "entities": [
            "deal name",
            "acquisition target",
            "deal value",
            "merger party",
        ],
    },
}


def query_with_barrier(question: str, department: str) -> str:
    # Retrieve documents across the shared knowledge base
    results = collection.query(query_texts=[question], n_results=5)
    context = "\n".join(results["documents"][0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}"

    # Apply information barrier
    policy = BARRIER_POLICIES[department]
    tokenized = blindfold_client.tokenize(prompt, entities=policy["entities"])

    # The LLM never sees cross-barrier data
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": tokenized.text},
        ],
    )
    return completion.choices[0].message.content
```
This approach enforces information barriers at the data level rather than the infrastructure level. Traditional approaches require separate databases, separate networks, and separate RAG pipelines for each side of the wall. With Blindfold, a single RAG pipeline serves both departments — the privacy layer ensures each department sees only the data it is permitted to access.
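Because entity detection can miss things, a deterministic check on the tokenized prompt can serve as a backstop before anything is sent to the LLM. The sketch below assumes a per-department restricted list maintained by compliance. `RESTRICTED_TERMS`, `find_barrier_leaks`, `enforce_barrier`, and the example terms are all hypothetical.

```python
import re

# Hypothetical restricted list: terms each department must never see.
# In practice this would come from the compliance team's watch list.
RESTRICTED_TERMS = {
    "trader": ["Project Falcon", "Acme Corp"],
    "ma_analyst": ["DESK-7 position book"],
}


def find_barrier_leaks(safe_prompt: str, department: str) -> list[str]:
    """Return restricted terms that still appear in a tokenized prompt.

    Raises KeyError for an unknown department, so the check fails closed.
    """
    leaks = []
    for term in RESTRICTED_TERMS[department]:
        # Case-insensitive, whole-phrase match
        pattern = rf"(?<!\w){re.escape(term)}(?!\w)"
        if re.search(pattern, safe_prompt, re.IGNORECASE):
            leaks.append(term)
    return leaks


def enforce_barrier(safe_prompt: str, department: str) -> str:
    """Raise instead of sending a prompt that crosses the barrier."""
    leaks = find_barrier_leaks(safe_prompt, department)
    if leaks:
        raise PermissionError(f"Barrier violation for {department}: {leaks}")
    return safe_prompt
```

Running `enforce_barrier` on the tokenized text just before the completion call means a detection miss results in a blocked request rather than a leak.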
PCI DSS Compliance
When your RAG system handles payment card data (credit card numbers, CVVs, expiration dates, cardholder names in payment contexts), PCI DSS compliance is mandatory. The standard requires that primary account numbers be rendered unreadable wherever they are stored and encrypted when transmitted over open, public networks, and that sensitive authentication data such as CVVs never be stored after authorization. Sending unmasked card numbers to an LLM provider that sits outside your PCI-compliant environment is a direct violation.
Blindfold's built-in pci_dss policy automatically detects and redacts credit card numbers, CVVs, expiration dates, and other payment data. Use this policy at minimum for any RAG pipeline that might encounter payment information:
```python
# PCI DSS policy catches all payment card data automatically
transaction_record = (
    "Payment received from Robert Anderson. Card ending 4471, "
    "full number 4532-0151-2847-4471, CVV 892, exp 09/2027. "
    "Amount: $42,000. Reference: TXN-2024-88291."
)

# Apply PCI DSS policy
tokenized = blindfold_client.tokenize(
    transaction_record,
    policy="pci_dss",
)

print(tokenized.text)
# Illustrative output (exact token names may differ):
# "Payment received from Robert Anderson. Card ending 4471,
#  full number <Credit_Card_1>, CVV <CVV_1>, exp <Expiry_Date_1>.
#  Amount: $42,000. Reference: TXN-2024-88291."
```
Important: The pci_dss policy is a minimum baseline. In practice, combine it with role-based entity configurations. A compliance officer using the PCI DSS policy still gets card data redacted — PCI DSS requirements override role-based access when payment data is involved.
For RAG pipelines that ingest both payment data and general client records, layer the PCI DSS policy on top of your role-based configuration. This ensures that even roles with broad access (like the compliance officer) do not inadvertently send raw card numbers to the LLM:
```python
def query_with_pci_compliance(question: str, role: str) -> str:
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n".join(results["documents"][0])
    prompt = f"Client records:\n{context}\n\nQuestion: {question}"

    # Step 1: Always apply PCI DSS policy first
    pci_result = blindfold_client.tokenize(prompt, policy="pci_dss")
    safe_prompt = pci_result.text
    mapping = dict(pci_result.mapping)

    # Step 2: Then apply role-based policy on top
    entities = ROLE_ENTITIES[role]
    if entities is None:
        # External auditor: strict policy on top of PCI protection
        role_result = blindfold_client.tokenize(safe_prompt, policy="strict")
    elif entities:
        role_result = blindfold_client.tokenize(safe_prompt, entities=entities)
    else:
        # Compliance officer: still gets PCI protection, nothing more
        role_result = None

    if role_result is not None:
        safe_prompt = role_result.text
        mapping.update(role_result.mapping)

    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": safe_prompt}],
    )
    ai_response = completion.choices[0].message.content

    if mapping:
        return blindfold_client.detokenize(ai_response, mapping).text
    return ai_response
```
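As defense in depth, the tokenized prompt can be scanned for digit runs that still pass the Luhn checksum, the standard check digit scheme for card numbers. This is a sketch independent of Blindfold. `find_residual_pans` and the candidate pattern are assumptions, and the pattern can produce false positives on other long digit sequences such as phone numbers.

```python
import re

# Candidate PANs: 13-19 digits, optionally separated by spaces or hyphens.
CANDIDATE_PAN = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum over a string of digits."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:      # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_residual_pans(text: str) -> list[str]:
    """Return digit runs in the text that still look like valid card numbers."""
    hits = []
    for match in CANDIDATE_PAN.finditer(text):
        digits = re.sub(r"[ -]", "", match.group(0))
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

If `find_residual_pans(safe_prompt)` returns anything, the safest response is to refuse to send the prompt and raise an alert rather than trying to mask the value on the fly.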
Audit Trail and Regulatory Reporting
Every Blindfold API call is logged with metadata that includes the timestamp, the policy applied, the entity types detected, and the session identifier. This creates a complete audit trail of who queried what data, which privacy policy was applied, and what information was protected — exactly what regulators ask for during audits.
SOX Compliance
SOX requires that financial reporting controls are documented and auditable. When a RAG system is used for financial analysis or reporting, the audit trail demonstrates:
- Which role accessed the system and when
- Which client records were retrieved for each query
- Which privacy policy was applied (and therefore which data was visible vs. redacted)
- The complete input/output chain for every LLM interaction
MiFID II Record-Keeping
MiFID II requires that firms keep records of all communications and decisions related to investment advice. When a relationship manager uses the RAG system to prepare for a client meeting, the Blindfold audit trail records the exact data that was accessed and the privacy controls that were in place. This satisfies the record-keeping requirements without storing raw client PII in your application logs.
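```python
import json
import logging
from datetime import datetime, timezone

audit_logger = logging.getLogger("rag.audit")


def log_audit_event(entry: dict) -> None:
    # Placeholder sink: forward to your SIEM or log aggregator instead
    audit_logger.info(json.dumps(entry))


# Every Blindfold call generates audit metadata
tokenized = blindfold_client.tokenize(
    prompt,
    entities=["social security number", "credit card number"],
)

# Log the audit trail (no PII in logs); the counts below are illustrative
audit_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "role": "relationship_manager",
    # Tokenize the query first if it can contain client names
    "query": "What is the status of the flagged wire transfer?",
    "policy_applied": "custom_entities",
    "entities_redacted": ["social security number", "credit card number"],
    "entities_detected": 4,
    "tokens_generated": 2,
    "records_retrieved": 3,
}

# This audit entry contains no PII, so it is safe to store
# in your SIEM, compliance database, or log aggregator
log_audit_event(audit_entry)
```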
The critical property of this audit trail is that it contains no PII itself. You can store these entries in your SIEM system, compliance database, or log aggregator without creating an additional data protection liability. Regulators can review the complete access history without being exposed to client data in the process.
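The counts in an audit entry can be derived from the tokenized prompt itself rather than hard-coded. The sketch below assumes tokens follow the `<Type_N>` shape shown in the sample outputs. `build_audit_entry` and its field names are hypothetical and not part of any SDK.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumes tokens look like <Entity_Type_N>, as in the sample outputs above.
TOKEN_PATTERN = re.compile(r"<([A-Za-z_]+)_(\d+)>")


def build_audit_entry(role: str, policy: str, safe_prompt: str,
                      records_retrieved: int) -> dict:
    """Summarize a tokenized prompt without copying any client values."""
    tokens = set(TOKEN_PATTERN.findall(safe_prompt))  # unique (type, index) pairs
    by_type = Counter(entity_type for entity_type, _ in tokens)
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "policy_applied": policy,
        "tokens_generated": len(tokens),
        "tokens_by_type": dict(by_type),
        "records_retrieved": records_retrieved,
    }
```

Only token types and counts reach the log, so the entry stays free of client values even if the prompt text changes.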
GDPR and EU Client Data
When your financial RAG system processes data belonging to EU residents, GDPR adds an additional layer of requirements on top of financial regulations. The data minimization principle under Article 5(1)(c) aligns naturally with role-based access control — each role should see only the minimum data necessary for their function.
For EU client data, configure Blindfold with the region="eu" parameter to ensure all PII processing stays within the EU. This prevents cross-border data transfers during the tokenization step itself:
```python
# EU-compliant configuration for financial services
blindfold_eu = Blindfold(
    api_key=os.environ["BLINDFOLD_API_KEY"],
    region="eu",  # All PII processing stays within the EU
)

# Use the GDPR policy for EU-specific entity detection
tokenized = blindfold_eu.tokenize(
    prompt,
    policy="gdpr_eu",
    entities=["social security number", "credit card number"],
)
# Detects EU-specific entities: IBAN, national IDs, EU tax IDs,
# in addition to the standard PII categories
```
The gdpr_eu policy detects EU-specific entity types such as IBAN codes, national identity numbers, and EU tax identifiers, in addition to standard PII categories. Combined with role-based entity configurations, this gives you GDPR-compliant access control that also satisfies financial regulatory requirements.
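If a single deployment serves both EU and non-EU clients, the choice of client instance can be driven by the client's country of residence. The sketch below is an assumption about how you might route requests: `region_for_client` is a hypothetical helper, and it treats the three non-EU EEA states as in scope because GDPR applies there as well. Whether residency is the right trigger for your firm is a legal question this code does not answer.

```python
from typing import Optional

# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})

# Iceland, Liechtenstein, Norway: EEA states where GDPR also applies.
EEA_NON_EU = frozenset({"IS", "LI", "NO"})


def region_for_client(country_code: str) -> Optional[str]:
    """Return "eu" for EU/EEA residents, or None to use the default client."""
    code = country_code.strip().upper()
    if code in EU_MEMBER_STATES or code in EEA_NON_EU:
        return "eu"
    return None
```

A caller would pick the EU-configured client when the helper returns `"eu"` and the default client otherwise.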
Production Architecture
In a production financial RAG deployment, the role-based privacy layer sits between your retrieval system and your LLM. Here is how the components fit together:
- Authentication layer — Verifies the user's identity and determines their role from your IAM system (Active Directory, Okta, internal RBAC).
- RAG retrieval — Searches the vector database using the original question. Retrieval happens before any privacy filtering so search quality is maximized.
- Blindfold privacy layer — Applies the role-specific tokenization policy to the combined context and question. This is the single point of enforcement for all access control decisions.
- LLM inference — Receives only the tokenized prompt. The LLM provider never sees raw client data regardless of the user's role.
- Detokenization — Restores real values in the response using the token mapping. Only the end user's browser sees the restored data.
- Audit logging — Records the role, policy, entity counts, and timestamps for every query. No PII in the logs.
Single point of enforcement: The Blindfold privacy layer is the only place where access control decisions are made. This avoids the fragility of scattered access checks throughout your application code. If a new role is added, you define one new entity configuration — the rest of the pipeline is unchanged.
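The six stages can be sketched as a small pipeline object whose stages are injected as callables, which keeps the privacy layer as the only place a role influences what the LLM sees. `RagPipeline` and its field names are hypothetical, and each callable stands in for a real component (IAM lookup, vector search, tokenization, LLM call, detokenization).

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RagPipeline:
    """Wires the six stages together; every stage is an injected callable."""
    resolve_role: Callable[[str], str]               # user id -> role
    retrieve: Callable[[str], list[str]]             # question -> documents
    protect: Callable[[str, str], tuple[str, dict]]  # (prompt, role) -> (safe prompt, mapping)
    infer: Callable[[str], str]                      # safe prompt -> LLM response
    restore: Callable[[str, dict], str]              # (response, mapping) -> final text
    audit_log: list[dict] = field(default_factory=list)

    def run(self, user_id: str, question: str) -> str:
        role = self.resolve_role(user_id)                   # 1. authentication
        documents = self.retrieve(question)                 # 2. retrieval
        prompt = "\n".join(documents) + f"\n\nQuestion: {question}"
        safe_prompt, mapping = self.protect(prompt, role)   # 3. privacy layer
        response = self.infer(safe_prompt)                  # 4. LLM sees only safe_prompt
        answer = self.restore(response, mapping)            # 5. detokenization
        self.audit_log.append(                              # 6. audit, no PII
            {"role": role, "records": len(documents), "tokens": len(mapping)}
        )
        return answer
```

Because every stage is a plain callable, the pipeline can be exercised in tests with stubs, and adding a role touches only the `protect` implementation.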
Handling Multi-Turn Conversations
Financial advisory conversations are often multi-turn. A relationship manager might ask “What is Robert Anderson's portfolio allocation?” followed by “How does that compare to our recommended allocation for his risk profile?” The role-based policy must be applied consistently across all turns.
The key pattern is mapping accumulation — each tokenization call produces a new mapping, and you merge them together so tokens from earlier turns can still be resolved in later responses:
```python
class FinancialAdvisorSession:
    def __init__(self, role: str):
        self.role = role
        self.accumulated_mapping = {}
        self.conversation_history = []

    def ask(self, question: str) -> str:
        # Retrieve relevant records
        results = collection.query(query_texts=[question], n_results=3)
        context = "\n".join(results["documents"][0])
        prompt = f"Client records:\n{context}\n\nQuestion: {question}"

        # Apply role-based tokenization
        entities = ROLE_ENTITIES[self.role]
        if self.role == "compliance_officer":
            safe_prompt = prompt
        elif entities is None:
            tokenized = blindfold_client.tokenize(prompt, policy="strict")
            safe_prompt = tokenized.text
            self.accumulated_mapping.update(tokenized.mapping)
        else:
            tokenized = blindfold_client.tokenize(prompt, entities=entities)
            safe_prompt = tokenized.text
            self.accumulated_mapping.update(tokenized.mapping)

        # Build messages with conversation history
        messages = [
            {
                "role": "system",
                "content": "You are a financial services assistant.",
            },
            *self.conversation_history,
            {"role": "user", "content": safe_prompt},
        ]

        completion = openai_client.chat.completions.create(
            model="gpt-4o", messages=messages
        )
        ai_response = completion.choices[0].message.content

        # Store tokenized history (no PII in session state)
        self.conversation_history.append(
            {"role": "user", "content": safe_prompt}
        )
        self.conversation_history.append(
            {"role": "assistant", "content": ai_response}
        )

        # Detokenize with accumulated mapping
        if self.accumulated_mapping:
            return blindfold_client.detokenize(
                ai_response, self.accumulated_mapping
            ).text
        return ai_response


# Usage
session = FinancialAdvisorSession(role="relationship_manager")
print(session.ask("What is Robert Anderson's portfolio allocation?"))
print(session.ask("How does that compare to our recommended allocation?"))
```
Notice that the conversation history stores only tokenized text. This means your session storage, Redis cache, or database contains no raw client PII — reducing your data protection surface area significantly.
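One caveat with accumulating mappings through `dict.update`: if token numbering restarts on each call, the same token could be bound to different clients in different turns, and the later value would silently overwrite the earlier one. The sketch below assumes a mapping is a plain dictionary from token to original value. `merge_mappings` is a hypothetical helper that refuses such collisions instead of overwriting.

```python
def merge_mappings(accumulated: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Merge token mappings, refusing to let one token stand for two values."""
    merged = dict(accumulated)
    for token, value in new.items():
        if token in merged and merged[token] != value:
            # Report only the token, never the underlying values
            raise ValueError(f"Token {token} is already bound to a different value")
        merged[token] = value
    return merged
```

If a collision is raised in practice, one option is to namespace tokens per turn before merging. Whether that is needed depends on how the tokenization service numbers tokens across calls.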
Compliance Summary
Here is how the role-based approach maps to each regulatory framework:
| Regulation | Requirement | How Blindfold Addresses It |
|---|---|---|
| PCI DSS | Never transmit card data in plain text | policy="pci_dss" automatically detects and redacts all payment card data |
| SOX | Audit trail for financial reporting controls | Every API call is logged with role, policy, and entity metadata |
| GDPR | Data minimization, EU data residency | Role-based entities enforce minimization; region="eu" keeps processing in the EU |
| MiFID II | Record-keeping for investment advice | Complete audit trail of data access and privacy controls applied |
| Information Barriers | Strict separation between departments | Department-specific entity configurations enforce barriers at the data level |
Try It Yourself
Ready to implement role-based PII control in your financial RAG system? Here are the resources to get started:
- RBAC Cookbook Example (Python) — Complete working example with four financial roles, sample client data, and output comparison
- RBAC Cookbook Example (Node.js) — Same example in TypeScript with Express middleware integration
- PCI DSS Policy Documentation — Full reference for the built-in PCI DSS, GDPR, HIPAA, and SOX compliance policies
- GDPR-Compliant AI — Deep dive into GDPR compliance for AI applications, including EU data residency and audit trails
The entire setup takes about twenty minutes. Define your role-entity mappings, wire up the query function with the appropriate Blindfold policy for each role, and every query through your financial RAG pipeline is automatically tailored to the user's access level. Your compliance team gets a complete audit trail, your LLM provider never sees raw client data, and your information barriers are enforced at the data level rather than the infrastructure level.