Finance · February 26, 2026 · 12 min read

Role-Based Client Data Protection in Financial RAG Pipelines

Implement role-based PII control in financial services RAG systems. Relationship managers, risk analysts, compliance officers, and external auditors each see only the client data their role permits, satisfying PCI DSS, SOX, GDPR, and MiFID II requirements.

Financial institutions are deploying RAG systems at an accelerating pace — for client advisory, risk analysis, compliance monitoring, and internal knowledge bases. The promise is compelling: connect your LLM to client records, transaction histories, and regulatory filings, and let analysts, advisors, and compliance teams get answers in seconds instead of hours.

The problem is that those client records contain some of the most sensitive data in any industry. Account numbers, investment portfolios, credit scores, transaction histories, Social Security numbers, wire transfer details — all of it flows into the RAG pipeline and, without protection, straight to your LLM provider. In financial services, this is not just a privacy concern. It is a regulatory violation across multiple frameworks simultaneously.

Making matters more complex, different roles within a financial institution need fundamentally different views of the same data. A relationship manager needs client names and portfolio summaries to serve their clients, but has no business seeing internal risk scores. A risk analyst needs aggregated positions and risk metrics, but does not need to know which client they belong to. A compliance officer needs the complete picture for investigations. An external auditor needs to review transaction patterns without identifying individual clients.

This article shows you how to implement role-based PII control in financial RAG pipelines using Blindfold. Each role gets a tailored view of the data — the relationship manager sees client context without SSNs, the risk analyst sees anonymized metrics, the compliance officer sees everything, and the external auditor sees fully de-identified records. The same underlying data, four different privacy policies, enforced at the API level.

The Regulatory Landscape

Financial services operate under a uniquely dense web of data protection requirements. Before building any RAG system that touches client data, you need to understand the regulations that govern how that data can be accessed, processed, and shared.

PCI DSS — The Payment Card Industry Data Security Standard governs how payment card data (card numbers, CVVs, expiration dates) is stored, processed, and transmitted. Any RAG system that ingests transaction records containing card data must comply. Sending unmasked card numbers to an LLM provider is a clear violation.
SOX (Sarbanes-Oxley) — Requires financial reporting controls and audit trails. When a RAG system is used for financial analysis or reporting, SOX demands that you can demonstrate who accessed what data, when, and what controls were in place.
GDPR — Applies to any client data belonging to EU residents. Sending personal data to a US-based LLM provider constitutes a cross-border data transfer under Articles 44–49. Data minimization under Article 5(1)(c) requires that only the minimum necessary personal data is processed for each purpose.
MiFID II — The Markets in Financial Instruments Directive requires that investment advice records are maintained with full audit trails. When a RAG system generates investment advice, the complete chain of data access must be traceable.
Chinese walls and information barriers — In investment banking, strict separation must exist between departments. An M&A analyst must not see information from the trading desk, and vice versa. These barriers are not optional — violating them can result in insider trading charges.
Need-to-know principle — A cross-cutting requirement in virtually all financial regulation: every person should have access only to the information they need to perform their specific role. No more, no less.

The common thread across all of these frameworks is role-based access control. The same client record must be visible in different ways to different people, depending on their role and the regulatory context. A RAG system that gives everyone the same view of client data fails to meet any of these requirements.

Role-Based Access with Blindfold Policies

Blindfold's entity-level control lets you define exactly which PII types each role can see. By specifying which entities to redact per role, you create tailored views of the same underlying data. Here is how the four primary financial services roles map to Blindfold policies:

Role	Sees	Redacted	Regulatory Basis
Relationship Manager	Client names, contact info, portfolio and account summaries	SSN, credit card numbers	Client-facing, needs relationship context
Risk Analyst	Aggregated positions, risk metrics, transaction amounts	Client names, email addresses, phone numbers, SSN, addresses	Analysis does not need client identity
Compliance Officer	Full access — names, transactions, flags, contact info	Nothing — needs complete picture	Regulatory investigation authority
External Auditor	Transaction patterns, anonymized data	All PII — names, accounts, contact info	Independent review, no client identification

Each role gets a different Blindfold configuration. The relationship manager's policy redacts only the most sensitive identifiers (SSNs and credit card numbers) while preserving client names, contact details, and portfolio context. The risk analyst's policy strips names, contact details, SSNs, and addresses so the analysis focuses purely on positions and metrics. The compliance officer bypasses redaction entirely because investigations require the full picture. The external auditor uses the strictest policy, removing all personally identifiable information for an independent, arm's-length review.

Implementation

Let us build a complete financial RAG system with role-based PII control. We will start with realistic client records, define role-specific entity configurations, and show how the same query produces different outputs for each role.

Sample Client Data

These records represent the kind of data a financial RAG system would ingest from a CRM, portfolio management system, or compliance database:

python

# Sample financial client records
CLIENT_RECORDS = [
    "Client Record #CR-2024-001: Robert Anderson (robert.a@anderson-holdings.com, "
    "+1-212-555-0891, SSN 234-56-7890) — High-net-worth individual. Portfolio "
    "value: $4.2M. Holdings: 45% equities, 30% bonds, 25% alternatives. Credit "
    "score: 780. Account #AC-9928-4471. Annual advisory fee: $42,000.",

    "Client Record #CR-2024-002: Li Wei (li.wei@globalventures.cn, "
    "+86-138-0013-8000) — Corporate client. Account #AC-7735-2289. Wire transfer "
    "of $850,000 on 2024-03-15 flagged for enhanced due diligence. KYC review "
    "pending. Source of funds: Guangzhou property sale.",

    "Client Record #CR-2024-003: Sofia Martinez (sofia.m@martinez-family.com, "
    "+1-305-555-0234, SSN 567-89-0123) — Trust account beneficiary. Trust value: "
    "$2.8M. Monthly distribution: $15,000. Account #AC-3341-8856. Tax ID: "
    "EIN 47-1234567.",
]

Role Entity Configurations

Each role maps to a list of entity types that should be redacted before the data reaches the LLM. An empty list means no redaction (full access), and None triggers the strictest built-in policy:

python

# Entity types to REDACT for each role
# The listed entities are what gets REMOVED — everything else is visible
ROLE_ENTITIES = {
    "relationship_manager": [
        "social security number",
        "credit card number",
    ],
    "risk_analyst": [
        "person",
        "email address",
        "phone number",
        "social security number",
        "address",
    ],
    "compliance_officer": [],    # Full access — no redaction
    "external_auditor": None,     # Strictest policy — redact all PII
}

Why these configurations? The relationship manager needs client names and portfolio details to maintain the advisory relationship, but SSNs and credit card numbers are never needed for client conversations. The risk analyst works with aggregated data and does not need to know who the clients are. The compliance officer has full investigative authority. The external auditor performs independent reviews and must not be able to identify individual clients.
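The convention above (empty list means full access, `None` means the strictest policy) can be made explicit with a small resolver that turns a role into keyword arguments for `tokenize`. This is a minimal sketch. `resolve_tokenize_kwargs` is a hypothetical helper. Treating an unknown role as if it were the external auditor, so that the check fails closed, is a design choice and not Blindfold behavior.

```python
from typing import Optional

# Entity types to redact per role (mirrors ROLE_ENTITIES above)
ROLE_ENTITIES: dict[str, Optional[list[str]]] = {
    "relationship_manager": ["social security number", "credit card number"],
    "risk_analyst": ["person", "email address", "phone number",
                     "social security number", "address"],
    "compliance_officer": [],   # full access, no redaction
    "external_auditor": None,   # strictest built-in policy
}


def resolve_tokenize_kwargs(role: str) -> Optional[dict]:
    """Return keyword arguments for tokenize(), or None to skip tokenization.

    Hypothetical helper. Unknown roles get the strictest policy instead of
    full access, so a typo in a role name can never expose raw data.
    """
    entities = ROLE_ENTITIES.get(role)  # None for unknown roles too
    if entities is None:
        return {"policy": "strict"}
    if not entities:
        return None  # empty list: full access
    return {"entities": list(entities)}
```

At the call site this becomes `kwargs = resolve_tokenize_kwargs(role)`, followed by `safe_prompt = prompt if kwargs is None else blindfold_client.tokenize(prompt, **kwargs).text`. A misspelled role such as `"compliance_oficer"` then receives the strict policy rather than a `KeyError` or, worse, full access.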

The Complete Query Function

Here is the full implementation. The query_financial_rag function takes a question and a role, applies the appropriate Blindfold policy, and returns an answer with only the data that role is permitted to see:

python

import os
import chromadb
from blindfold import Blindfold
from openai import OpenAI

# Initialize clients
blindfold_client = Blindfold(api_key=os.environ["BLINDFOLD_API_KEY"])
openai_client = OpenAI()

# Set up vector store with client records
chroma = chromadb.Client()
collection = chroma.get_or_create_collection("financial_records")

# Ingest client records into the vector store
for i, record in enumerate(CLIENT_RECORDS):
    collection.add(documents=[record], ids=[f"cr-{i}"])

# Role-based entity configurations
ROLE_ENTITIES = {
    "relationship_manager": ["social security number", "credit card number"],
    "risk_analyst": ["person", "email address", "phone number", "social security number", "address"],
    "compliance_officer": [],
    "external_auditor": None,
}

def query_financial_rag(question: str, role: str) -> str:
    # Retrieve relevant client records
    results = collection.query(
        query_texts=[question], n_results=3
    )
    context = "\n".join(results["documents"][0])

    # Combine context and question
    prompt = f"Client records:\n{context}\n\nQuestion: {question}"

    # Apply role-based PII protection
    entities = ROLE_ENTITIES[role]

    if role == "compliance_officer":
        # Full access: skip tokenization entirely
        safe_prompt = prompt
    elif entities is None:
        # External auditor: strict policy redacts all PII
        tokenized = blindfold_client.tokenize(prompt, policy="strict")
        safe_prompt = tokenized.text
    else:
        # Role-specific entity redaction
        tokenized = blindfold_client.tokenize(prompt, entities=entities)
        safe_prompt = tokenized.text

    # Send to LLM
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "You are a financial services assistant. Answer "
                           "questions using only the provided client records.",
            },
            {"role": "user", "content": safe_prompt},
        ],
    )

    # Return the response without detokenizing. Every token in it stands in
    # for an entity this role is not permitted to see, so restoring tokens
    # here would undo the role's policy (e.g. reveal SSNs to the
    # relationship manager).
    return completion.choices[0].message.content

Running the Query

With the function in place, each role can ask the same question and receive a response tailored to their access level:

python

# Same question, four different roles
question = "What is the status of the flagged wire transfer?"

for role in ROLE_ENTITIES:
    answer = query_financial_rag(question, role)
    print(f"\n[{role}]\n{answer}")

What Each Role Sees

When each role asks the same question — “What is the status of the flagged wire transfer?” — the RAG system retrieves the same client records but applies different privacy filters before the data reaches the LLM. Here is what each role sees:

Relationship Manager

The relationship manager sees client names and transaction details because they need this context to manage the client relationship. SSNs and credit card numbers are redacted:

Li Wei had a wire transfer of $850,000 on 2024-03-15 that was flagged for enhanced due diligence. The KYC review is currently pending. The source of funds was reported as a Guangzhou property sale. Account #AC-7735-2289 is the associated account. You may want to reach out to the client at li.wei@globalventures.cn to request additional documentation for the KYC review.

Risk Analyst

The risk analyst sees transaction amounts and risk flags but client identity is fully removed. Names, emails, phone numbers, and SSNs are all tokenized:

<Person_1> had a wire transfer of $850,000 flagged for enhanced due diligence. The KYC review is pending. The source of funds was reported as a property sale. The transfer amount of $850,000 exceeds the standard threshold, and the pending KYC status represents an elevated risk position.

Compliance Officer

The compliance officer sees everything — full names, contact information, account numbers, and all transaction details. Nothing is redacted because regulatory investigations require the complete picture:

Li Wei (li.wei@globalventures.cn, +86-138-0013-8000) has a wire transfer of $850,000 on 2024-03-15 flagged for enhanced due diligence on account #AC-7735-2289. The KYC review is pending. The source of funds is listed as a Guangzhou property sale. This transaction requires immediate attention under AML screening procedures.

External Auditor

The external auditor sees fully anonymized data. All PII — names, organizations, contact information, account numbers — is replaced with tokens. The auditor can evaluate transaction patterns and compliance procedures without identifying any client:

<Person_1> at <Organization_1> had a transfer of <Currency_1> flagged for enhanced due diligence. The KYC review is pending. The source of funds was a property sale. The flagging procedure appears to have been triggered correctly based on the transfer amount threshold.

Key insight: The same RAG pipeline, the same client records, the same question — but four entirely different responses tailored to each role's regulatory requirements. The Blindfold tokenization layer acts as a programmable privacy filter between your data and the LLM.

Chinese Walls and Information Barriers

In investment banking, information barriers (commonly called “Chinese walls”) are legally mandated separations between departments. An analyst in the M&A division who has access to non-public information about a pending acquisition must not see data from the trading desk — and vice versa. Violating these barriers can constitute insider trading and carries severe criminal penalties.

When a RAG system spans multiple departments, Blindfold policies can enforce these barriers at the data level. The M&A analyst's policy redacts trading-specific entity types (position sizes, trading account identifiers, order details) while preserving deal-relevant information (company names, deal structures, timelines). The trader's policy does the reverse — redacting non-public deal information while preserving market data.

python

# Information barrier policies for investment banking
BARRIER_POLICIES = {
    "ma_analyst": {
        # M&A analyst: redact trading desk data
        "entities": [
            "trading account",
            "order id",
            "position size",
        ],
    },
    "trader": {
        # Trader: redact non-public deal information
        "entities": [
            "deal name",
            "acquisition target",
            "deal value",
            "merger party",
        ],
    },
}

def query_with_barrier(question: str, department: str) -> str:
    # Retrieve documents across the shared knowledge base
    results = collection.query(
        query_texts=[question], n_results=5
    )
    context = "\n".join(results["documents"][0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}"

    # Apply information barrier
    policy = BARRIER_POLICIES[department]
    tokenized = blindfold_client.tokenize(
        prompt, entities=policy["entities"]
    )

    # The LLM never sees cross-barrier data
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": tokenized.text},
        ],
    )
    return completion.choices[0].message.content

This approach enforces information barriers at the data level rather than the infrastructure level. Traditional approaches require separate databases, separate networks, and separate RAG pipelines for each side of the wall. With Blindfold, a single RAG pipeline serves both departments — the privacy layer ensures each department sees only the data it is permitted to access.
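The symmetry described above, where each side redacts the entity types the other side owns, can be derived rather than maintained by hand. Deriving it keeps the lists consistent when a third department joins the shared pipeline. This is a minimal sketch: `DEPARTMENT_OWNED_ENTITIES` and `barrier_entities` are hypothetical names, and the entity labels are the illustrative ones from `BARRIER_POLICIES`.

```python
# Hypothetical: entity types that belong to each side of the wall
DEPARTMENT_OWNED_ENTITIES: dict[str, list[str]] = {
    "ma_analyst": ["deal name", "acquisition target", "deal value", "merger party"],
    "trader": ["trading account", "order id", "position size"],
}


def barrier_entities(department: str) -> list[str]:
    """Entity types to redact for a department: everything the others own."""
    if department not in DEPARTMENT_OWNED_ENTITIES:
        # Fail closed: no barrier policy means no access
        raise KeyError(f"No information-barrier policy for {department!r}")
    redact: list[str] = []
    for other, owned in DEPARTMENT_OWNED_ENTITIES.items():
        if other != department:
            redact.extend(e for e in owned if e not in redact)
    return redact
```

With this helper, the tokenize call in `query_with_barrier` becomes `blindfold_client.tokenize(prompt, entities=barrier_entities(department))`. Adding a department then requires listing only what it owns.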

PCI DSS Compliance

When your RAG system handles payment card data — credit card numbers, CVVs, expiration dates, cardholder names in payment contexts — PCI DSS compliance is mandatory. The standard requires that cardholder data is never stored, processed, or transmitted in plain text outside of a PCI-compliant environment. Sending unmasked card numbers to an LLM provider is a direct violation.

Blindfold's built-in pci_dss policy automatically detects and redacts credit card numbers, CVVs, expiration dates, and other payment data. Use this policy at minimum for any RAG pipeline that might encounter payment information:

python

# PCI DSS policy catches all payment card data automatically
transaction_record = (
    "Payment received from Robert Anderson. Card ending 4471, "
    "full number 4532-0151-2847-4471, CVV 892, exp 09/2027. "
    "Amount: $42,000. Reference: TXN-2024-88291."
)

# Apply PCI DSS policy
tokenized = blindfold_client.tokenize(
    transaction_record,
    policy="pci_dss",
)

print(tokenized.text)
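# Illustrative output (exact token labels depend on the policy). The last
# four digits alone are not a full card number and may be left as-is:
# "Payment received from Robert Anderson. Card ending 4471,
#  full number <Credit_Card_1>, CVV <CVV_1>, exp <Expiry_Date_1>.
#  Amount: $42,000. Reference: TXN-2024-88291."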

Important: The pci_dss policy is a minimum baseline. In practice, combine it with role-based entity configurations. A compliance officer using the PCI DSS policy still gets card data redacted — PCI DSS requirements override role-based access when payment data is involved.

For RAG pipelines that ingest both payment data and general client records, layer the PCI DSS policy on top of your role-based configuration. This ensures that even roles with broad access (like the compliance officer) do not inadvertently send raw card numbers to the LLM:

python

def query_with_pci_compliance(question: str, role: str) -> str:
    results = collection.query(
        query_texts=[question], n_results=3
    )
    context = "\n".join(results["documents"][0])
    prompt = f"Client records:\n{context}\n\nQuestion: {question}"

    # Step 1: Always apply the PCI DSS policy first
    safe_prompt = blindfold_client.tokenize(prompt, policy="pci_dss").text

    # Step 2: Layer the role-based policy on top
    entities = ROLE_ENTITIES[role]
    if entities is None:
        # External auditor: strict policy on top of PCI DSS
        safe_prompt = blindfold_client.tokenize(safe_prompt, policy="strict").text
    elif entities:
        # Relationship manager, risk analyst: role-specific entities
        safe_prompt = blindfold_client.tokenize(
            safe_prompt, entities=entities
        ).text
    # Compliance officer (empty list): PCI DSS protection only

    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": safe_prompt}],
    )

    # No detokenization: restoring tokens would hand card data and
    # role-redacted entities back to the user
    return completion.choices[0].message.content
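As an extra safeguard that is not a Blindfold feature, you could refuse to send any prompt that still contains something shaped like a card number after tokenization. This is a minimal sketch. It uses a regular expression to find 13 to 19 digit sequences and the Luhn checksum to keep only plausible card numbers. `contains_possible_pan` is a hypothetical helper. Real card numbers pass the Luhn check, but made-up numbers that fail it will not be flagged.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens
PAN_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum over a string of digits."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def contains_possible_pan(text: str) -> bool:
    """True if text contains a Luhn-valid 13-19 digit sequence."""
    for match in PAN_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group(0))
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            return True
    return False
```

One place to use it is on `safe_prompt`, just before the `chat.completions.create` call in `query_with_pci_compliance`. If it returns True, raise an error instead of sending the prompt. A false positive then blocks a single query, while a false negative in the tokenizer never reaches the LLM provider.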

Audit Trail and Regulatory Reporting

Every Blindfold API call is logged with metadata that includes the timestamp, the policy applied, the entity types detected, and the session identifier. Because the user's role is decided in your application, record it in an application-side audit entry alongside those logs. Together they form a complete audit trail of who queried what data, which privacy policy was applied, and what information was protected. That is exactly what regulators ask for during audits.

SOX Compliance

SOX requires that financial reporting controls are documented and auditable. When a RAG system is used for financial analysis or reporting, the audit trail demonstrates:

Which role accessed the system and when
Which client records were retrieved for each query
Which privacy policy was applied (and therefore which data was visible vs. redacted)
The complete input/output chain for every LLM interaction

MiFID II Record-Keeping

MiFID II requires that firms keep records of all communications and decisions related to investment advice. When a relationship manager uses the RAG system to prepare for a client meeting, the Blindfold audit trail records the exact data that was accessed and the privacy controls that were in place. This satisfies the record-keeping requirements without storing raw client PII in your application logs.

python

from datetime import datetime, timezone

# Every Blindfold call generates audit metadata
tokenized = blindfold_client.tokenize(
    prompt,
    entities=["social security number", "credit card number"],
)

# Log the audit trail (no PII in logs)
audit_entry = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "role": "relationship_manager",
    # Log the raw question only if it cannot contain client identifiers;
    # otherwise log a tokenized copy
    "query": "What is the status of the flagged wire transfer?",
    "policy_applied": "custom_entities",
    "entities_redacted": ["social security number", "credit card number"],
    "tokens_generated": len(tokenized.mapping),
    "records_retrieved": 3,
}

# This audit entry contains no PII, so it is safe to store in your SIEM,
# compliance database, or log aggregator. log_audit_event is your own
# application's logging function.
log_audit_event(audit_entry)

The critical property of this audit trail is that it contains no PII itself. You can store these entries in your SIEM system, compliance database, or log aggregator without creating an additional data protection liability. Regulators can review the complete access history without being exposed to client data in the process.
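One way to build such entries without ever copying prompt text into them is to count tokens by type in the tokenized prompt. This is a minimal sketch. It assumes tokens follow the `<Label_N>` format shown in the sample outputs (such as `<Person_1>`). It also assumes record IDs come from the vector store (in Chroma, `results["ids"][0]`) rather than from record contents. `build_audit_entry` is a hypothetical helper.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Matches tokens such as <Person_1> or <Credit_Card_2> (assumed format)
TOKEN_PATTERN = re.compile(r"<([A-Za-z]+(?:_[A-Za-z]+)*)_\d+>")


def build_audit_entry(role: str, policy: str, tokenized_prompt: str,
                      record_ids: list[str]) -> dict:
    """Build a PII-free audit record: distinct token counts per label only."""
    # Map each distinct token to its label, e.g. "<Person_1>" -> "Person"
    tokens = {m.group(0): m.group(1) for m in TOKEN_PATTERN.finditer(tokenized_prompt)}
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "policy_applied": policy,
        "tokens_by_type": dict(Counter(tokens.values())),
        "record_ids": list(record_ids),
    }
```

Counting distinct tokens means a client mentioned three times in the retrieved context is counted once. That matches what was protected rather than how often it appeared.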

GDPR and EU Client Data

When your financial RAG system processes data belonging to EU residents, GDPR adds an additional layer of requirements on top of financial regulations. The data minimization principle under Article 5(1)(c) aligns naturally with role-based access control — each role should see only the minimum data necessary for their function.

For EU client data, configure Blindfold with the region="eu" parameter to ensure all PII processing stays within the EU. This prevents cross-border data transfers during the tokenization step itself:

python

# EU-compliant configuration for financial services
blindfold_eu = Blindfold(
    api_key=os.environ["BLINDFOLD_API_KEY"],
    region="eu",  # All PII processing stays within the EU
)

# Use the GDPR policy for EU-specific entity detection
tokenized = blindfold_eu.tokenize(
    prompt,
    policy="gdpr_eu",
    entities=["social security number", "credit card number"],
)

# Detects EU-specific entities: IBAN, national IDs, EU tax IDs
# In addition to the standard PII categories

The gdpr_eu policy detects EU-specific entity types such as IBAN codes, national identity numbers, and EU tax identifiers, in addition to standard PII categories. Combined with role-based entity configurations, this supports GDPR's data minimization principle alongside the financial-regulation controls described above.
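To decide when the EU client is needed, you can route each request by the client's country of residence. This is a minimal sketch. `blindfold_region_for` is a hypothetical helper that returns `"eu"` or `None`, and `None` means "use the default client". It treats EEA states as EU because GDPR also applies in Iceland, Liechtenstein, and Norway. Treating unknown residency as EU is a deliberate fail-toward-stricter choice.

```python
from typing import Optional

# EU member states (ISO 3166-1 alpha-2) plus the EEA states where GDPR applies
EEA_COUNTRIES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
    "IS", "LI", "NO",
})


def blindfold_region_for(client_country: Optional[str]) -> Optional[str]:
    """Return "eu" for EEA clients and unknown residency, else None (default)."""
    if client_country is None:
        return "eu"
    return "eu" if client_country.upper() in EEA_COUNTRIES else None
```

In the pipeline, pick the client per request, for example `client = blindfold_eu if blindfold_region_for(country) == "eu" else blindfold_client`. This reuses only the two clients already shown above.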

Production Architecture

In a production financial RAG deployment, the role-based privacy layer sits between your retrieval system and your LLM. Here is how the components fit together:

Authentication layer — Verifies the user's identity and determines their role from your IAM system (Active Directory, Okta, internal RBAC).
RAG retrieval — Searches the vector database using the original question. Retrieval happens before any privacy filtering so search quality is maximized.
Blindfold privacy layer — Applies the role-specific tokenization policy to the combined context and question. This is the single point of enforcement for all access control decisions.
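LLM inference — Receives only the tokenized prompt. The LLM provider never sees the entities that the role's policy tokenizes. For roles with broad access, such as the compliance officer, the provider receives raw values unless you tokenize more broadly before inference.
Detokenization — Restores real values in the response using the token mapping, but only for entity types the role is permitted to see. Tokens for redacted entities stay in place, and only the end user's browser sees the restored data.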
Audit logging — Records the role, policy, entity counts, and timestamps for every query. No PII in the logs.

Single point of enforcement: The Blindfold privacy layer is the only place where access control decisions are made. This avoids the fragility of scattered access checks throughout your application code. If a new role is added, you define one new entity configuration — the rest of the pipeline is unchanged.
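If the LLM provider should see less than the user is entitled to, one option is to tokenize a broad set of entity types before inference. You then restore only the token types the role may see. This is a minimal sketch. It assumes the mapping is a dict from token to original value and that tokens use the `<Label_N>` format from the sample outputs. The labels `Email_Address`, `Phone_Number`, and `Social_Security_Number` are assumed. `detokenize_for_role` and `ROLE_VISIBLE_TYPES` are hypothetical and not part of the Blindfold SDK.

```python
import re

# Hypothetical: token labels each role may see restored in a response.
# Card data is never restored for any role (PCI DSS overrides role access).
ROLE_VISIBLE_TYPES: dict[str, set[str]] = {
    "relationship_manager": {"Person", "Email_Address", "Phone_Number"},
    "risk_analyst": set(),
    "compliance_officer": {"Person", "Email_Address", "Phone_Number",
                           "Social_Security_Number"},
    "external_auditor": set(),
}

# Matches tokens such as <Person_1> or <Email_Address_2>
TOKEN_PATTERN = re.compile(r"<([A-Za-z]+(?:_[A-Za-z]+)*)_\d+>")


def detokenize_for_role(text: str, mapping: dict[str, str], role: str) -> str:
    """Restore only the tokens whose label this role is permitted to see."""
    visible = ROLE_VISIBLE_TYPES.get(role, set())  # unknown role: restore nothing

    def restore(match: re.Match) -> str:
        token = match.group(0)
        if match.group(1) in visible and token in mapping:
            return mapping[token]
        return token  # hidden type or unknown token: keep the placeholder

    return TOKEN_PATTERN.sub(restore, text)
```

Given `mapping = {"<Person_1>": "Robert Anderson", "<Social_Security_Number_1>": "234-56-7890"}`, the call `detokenize_for_role("<Person_1> SSN <Social_Security_Number_1>", mapping, "relationship_manager")` returns `"Robert Anderson SSN <Social_Security_Number_1>"`. The name is restored and the SSN stays tokenized.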

Handling Multi-Turn Conversations

Financial advisory conversations are often multi-turn. A relationship manager might ask “What is Robert Anderson's portfolio allocation?” followed by “How does that compare to our recommended allocation for his risk profile?” The role-based policy must be applied consistently across all turns.

The key pattern is consistency: apply the same role policy on every turn and keep only tokenized text in the conversation history. Every token in this design stands in for an entity the role is not permitted to see, so responses are returned without detokenization:

python

class FinancialAdvisorSession:
    def __init__(self, role: str):
        self.role = role
        self.conversation_history = []

    def ask(self, question: str) -> str:
        # Retrieve relevant records
        results = collection.query(
            query_texts=[question], n_results=3
        )
        context = "\n".join(results["documents"][0])
        prompt = f"Client records:\n{context}\n\nQuestion: {question}"

        # Apply the same role-based tokenization on every turn
        entities = ROLE_ENTITIES[self.role]
        if self.role == "compliance_officer":
            safe_prompt = prompt
        elif entities is None:
            safe_prompt = blindfold_client.tokenize(prompt, policy="strict").text
        else:
            safe_prompt = blindfold_client.tokenize(
                prompt, entities=entities
            ).text

        # Build messages with conversation history
        messages = [
            {
                "role": "system",
                "content": "You are a financial services assistant.",
            },
            *self.conversation_history,
            {"role": "user", "content": safe_prompt},
        ]

        completion = openai_client.chat.completions.create(
            model="gpt-4o", messages=messages
        )
        ai_response = completion.choices[0].message.content

        # Store tokenized history only
        self.conversation_history.append(
            {"role": "user", "content": safe_prompt}
        )
        self.conversation_history.append(
            {"role": "assistant", "content": ai_response}
        )

        # No detokenization: the tokens stand in for entities this role
        # is not permitted to see
        return ai_response

# Usage
session = FinancialAdvisorSession(role="relationship_manager")
print(session.ask("What is Robert Anderson's portfolio allocation?"))
print(session.ask("How does that compare to our recommended allocation?"))

Notice that the conversation history stores only tokenized text. Your session storage, Redis cache, or database therefore never holds the entities that the role's policy tokenizes (for the external auditor, all detected PII), which significantly reduces your data protection surface area.
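A regression test that scans stored session state for identifiers the role should never persist is a cheap way to keep that property true over time. This is a minimal sketch that checks message dicts for US SSN-shaped strings. `find_ssn_like` is a hypothetical helper, and a pattern scan is only a spot check, not proof that no PII is present.

```python
import re

# US SSN shape: 3-2-4 digits separated by hyphens
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(history: list[dict]) -> list[str]:
    """Return SSN-shaped strings found in any message's content."""
    found: list[str] = []
    for message in history:
        found.extend(SSN_PATTERN.findall(message.get("content") or ""))
    return found
```

For roles whose policy tokenizes SSNs, a test can assert `not find_ssn_like(session.conversation_history)` after a few scripted turns.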

Compliance Summary

Here is how the role-based approach maps to each regulatory framework:

Regulation	Requirement	How Blindfold Addresses It
PCI DSS	Never transmit card data in plain text	`policy="pci_dss"` automatically detects and redacts all payment card data
SOX	Audit trail for financial reporting controls	Every API call is logged with role, policy, and entity metadata
GDPR	Data minimization, EU data residency	Role-based entities enforce minimization; `region="eu"` keeps processing in the EU
MiFID II	Record-keeping for investment advice	Complete audit trail of data access and privacy controls applied
Information Barriers	Strict separation between departments	Department-specific entity configurations enforce barriers at the data level

Try It Yourself

Ready to implement role-based PII control in your financial RAG system? Here are the resources to get started:

RBAC Cookbook Example (Python) — Complete working example with four financial roles, sample client data, and output comparison
RBAC Cookbook Example (Node.js) — Same example in TypeScript with Express middleware integration
PCI DSS Policy Documentation — Full reference for the built-in PCI DSS, GDPR, HIPAA, and SOX compliance policies
GDPR-Compliant AI — Deep dive into GDPR compliance for AI applications, including EU data residency and audit trails

The entire setup takes about twenty minutes. Define your role-entity mappings, wire up the query function with the appropriate Blindfold policy for each role, and every query through your financial RAG pipeline is automatically tailored to the user's access level. Your compliance team gets a complete audit trail, your LLM provider never sees the identifiers each role's policy tokenizes, and your information barriers are enforced at the data level rather than the infrastructure level.
