Privacy · February 26, 2026 · 12 min read

Role-Based Candidate Privacy in HR and Recruiting RAG Systems

Implement role-based PII control in HR and recruiting RAG pipelines. HR managers, recruiters, interviewers, and hiring committees each see only the candidate data their role permits, enabling blind hiring and GDPR compliance.

HR departments are building AI assistants that search across candidate applications, employee records, and performance reviews. These RAG systems are powerful — a recruiter can ask “find me senior engineers with distributed systems experience” and get instant, contextual answers from hundreds of applications. But different roles need different levels of access to candidate data. A recruiter needs skills and experience to source candidates. An interviewer should see the resume but not salary expectations. A hiring committee needs fully anonymized profiles to reduce unconscious bias.

Traditional RAG pipelines ignore these distinctions entirely. When a document is retrieved and injected into the LLM prompt, every piece of personal information — names, emails, salary history, demographics — flows to the model regardless of who is querying. This violates GDPR data minimization principles, creates bias risks, and exposes sensitive compensation data to people who should never see it.

This article shows you how to implement role-based PII control in an HR recruiting RAG system using Blindfold. Each role — HR manager, recruiter, interviewer, hiring committee — sees only the candidate data their function requires, with everything else redacted or tokenized before it reaches the LLM.

The Privacy Problem in HR AI

HR data is among the most sensitive information an organization handles. Candidate applications and employee records contain a dense concentration of personal data that creates multiple risk vectors when processed through AI systems.

What Candidate Applications Contain

A typical candidate application includes the applicant's full name, email address, phone number, home address, educational history, employment history with company names and dates, salary expectations, and sometimes demographic information like date of birth or nationality. Some applications include references with additional third-party contact information.

What Employee Records Contain

Employee records go further: Social Security numbers, bank account numbers for payroll, emergency contact information, health insurance details, performance reviews, and disciplinary records. When an HR AI assistant has access to this data for answering workforce planning questions, every query potentially exposes all of it.

Why Traditional RAG Fails Here

A standard RAG pipeline retrieves the top-k most relevant documents and injects them verbatim into the LLM prompt. There is no concept of “who is asking.” When a recruiter asks about a candidate's experience, the retrieved application also contains salary history, contact details, and potentially sensitive demographic information — all of which flows to the LLM provider.
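The failure is easy to demonstrate. Below is a minimal sketch of such a naive pipeline: the candidate string and the toy keyword retriever are stand-ins for a real vector store, but the core problem is faithful. Whatever the retriever returns is pasted into the prompt verbatim, regardless of who asked.

```python
# Toy illustration of a naive RAG pipeline with no role awareness.
# The documents and retriever are stand-ins for a real vector store.
documents = [
    "Emily Zhang (emily.zhang@gmail.com) — Senior Software Engineer, "
    "8 years Python and Go. Currently earning $185,000.",
    "James O'Brien (james.obrien@outlook.com) — Product Manager, "
    "5 years B2B SaaS. Currently earning $165,000.",
]

def retrieve(query, docs):
    # Crude keyword-overlap scoring in place of vector similarity
    terms = set(query.lower().split())
    return max(docs, key=lambda d: len(terms & set(d.lower().split())))

def build_prompt(question, docs):
    # Retrieved text is injected verbatim: no redaction, no role check
    context = retrieve(question, docs)
    return f"Answer from this context:\n\n{context}\n\nQ: {question}"

prompt = build_prompt("Which engineer has Python experience?", documents)
# The prompt now carries the candidate's email and salary to the LLM,
# no matter which role asked the question.
```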

GDPR Article 5(1)(c) — Data Minimization: Personal data shall be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed. This applies to internal processing too — a recruiter sourcing candidates does not need access to salary history or Social Security numbers.

Bias Reduction Through Data Isolation

Beyond compliance, there is a strong practical reason for role-based data access: reducing unconscious bias. Research consistently shows that hiding candidate names, gender indicators, age, and ethnicity from evaluators improves the fairness of hiring decisions. A hiring committee that sees “<Person_1> has 8 years of experience in Python, Go, and distributed systems” instead of a named candidate evaluates qualifications more objectively. Role-based PII control makes blind hiring a concrete, enforceable practice rather than a policy aspiration.

Role-Based Access with Blindfold Policies

Blindfold's entity-level redaction lets you define exactly which PII categories each role can see. By specifying different entity lists per role, the same candidate document produces four different outputs — each tailored to the information needs and privacy requirements of that role.

| Role | Sees | Redacted | Rationale |
| --- | --- | --- | --- |
| HR Manager | Full profile: names, contact, salary | SSN only | Needs the complete picture for offers and negotiations |
| Recruiter | Names, skills, experience, education | Contact info, salary, SSN, demographics | Sources candidates without compensation data |
| Interviewer | Skills, experience, education | Names, contact, salary, demographics | Reduces unconscious bias during evaluation |
| Hiring Committee | Skills and experience only | Everything personal | Enables fully blind evaluation of qualifications |

The key insight is that each role maps to a different set of entity types to redact. The HR manager sees almost everything; only the most sensitive identifiers, such as Social Security numbers, are removed. The recruiter loses contact info and salary. The interviewer loses names on top of that. The hiring committee uses policy="strict" for full anonymization: every personal entity is replaced with a token.

Implementation

Let's build a complete HR recruiting RAG system with role-based PII control. We start with sample candidate applications, define per-role entity configurations, and build a query function that applies the correct level of redaction based on who is asking.

Sample Candidate Data

Here are three candidate applications that we will index into our vector store. Each contains a mix of professional qualifications and personal information:

python
# Sample candidate applications
candidates = [
    "Application #APP-2024-001: Emily Zhang (emily.zhang@gmail.com, "
    "+1-415-555-0134) — Senior Software Engineer with 8 years "
    "experience in Python, Go, and distributed systems. Currently "
    "at TechCorp earning $185,000. Stanford University BS Computer "
    "Science 2016. Requesting $210,000.",

    "Application #APP-2024-002: James O'Brien (james.obrien@outlook.com, "
    "+1-212-555-0198) — Product Manager with 5 years experience "
    "leading B2B SaaS products. MIT MBA 2019. Currently at StartupCo "
    "earning $165,000. Requesting $195,000.",

    "Application #APP-2024-003: Priya Patel (priya.p@yahoo.com, "
    "+1-650-555-0267) — Data Scientist with 6 years experience in "
    "ML, NLP, and recommendation systems. UC Berkeley PhD 2018. "
    "Currently at DataInc earning $175,000. Requesting $200,000.",
]

Role Entity Configurations

Each role maps to a list of entity types that should be redacted. The HR manager has minimal redaction. The recruiter loses contact and financial data. The interviewer additionally loses names. The hiring committee uses policy="strict" which automatically redacts all detected PII — no entity list needed:

python
# Entity types to redact per role
# None means use policy="strict" for full anonymization
ROLE_ENTITIES = {
    "hr_manager": ["social security number"],  # sees almost everything
    "recruiter": [
        "email address",
        "phone number",
        "address",
        "social security number",
        "credit card number",
        "iban",
    ],
    "interviewer": [
        "person",
        "email address",
        "phone number",
        "address",
        "social security number",
        "date of birth",
    ],
    "hiring_committee": None,  # policy="strict" — fully anonymized
}

The HR RAG System

The core system indexes candidate applications into a vector store and applies role-appropriate redaction at query time. The query() method accepts a role parameter that determines which entity types are redacted before the context reaches the LLM:

python
import os
import chromadb
from blindfold import Blindfold
from openai import OpenAI

class HRRecruitingRAG:
    def __init__(self):
        self.blindfold = Blindfold(
            api_key=os.environ["BLINDFOLD_API_KEY"],
        )
        self.openai = OpenAI()
        self.collection = chromadb.Client().create_collection(
            "candidates"
        )

    def ingest(self, applications):
        # Index applications as-is into the vector store
        # Redaction happens at query time, per role
        for i, app in enumerate(applications):
            self.collection.add(
                documents=[app],
                ids=[f"app-{i}"],
            )

    def query(self, question, role):
        # Retrieve relevant candidate applications
        results = self.collection.query(
            query_texts=[question], n_results=3
        )
        context = "\n\n".join(results["documents"][0])

        # Apply role-based redaction
        entities = ROLE_ENTITIES.get(role)
        if entities is None:
            # Hiring committee: full anonymization
            tokenized = self.blindfold.tokenize(
                context, policy="strict"
            )
        elif len(entities) == 0:
            # No redaction needed
            tokenized = None
        else:
            # Selective redaction for this role
            tokenized = self.blindfold.tokenize(
                context, entities=entities
            )

        safe_context = tokenized.text if tokenized else context

        # Send redacted context to LLM
        messages = [
            {
                "role": "system",
                "content": (
                    "You are an HR assistant. Answer questions "
                    "about candidates based on this context:\n\n"
                    f"{safe_context}"
                ),
            },
            {"role": "user", "content": question},
        ]

        completion = self.openai.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        )
        response = completion.choices[0].message.content

        # Detokenize for the end user (restore real values)
        if tokenized and tokenized.mapping:
            restored = self.blindfold.detokenize(
                response, tokenized.mapping
            )
            return restored.text
        return response

Querying with Different Roles

Here is how different roles query the same system. The same question produces different levels of detail based on the caller's role:

python
# Initialize and ingest
rag = HRRecruitingRAG()
rag.ingest(candidates)

question = "Tell me about the senior engineer candidates"

# HR Manager — sees everything
hr_answer = rag.query(question, role="hr_manager")
print("HR Manager:", hr_answer)

# Recruiter — no contact info or salary
recruiter_answer = rag.query(question, role="recruiter")
print("Recruiter:", recruiter_answer)

# Interviewer — no names, no contact, no salary
interviewer_answer = rag.query(question, role="interviewer")
print("Interviewer:", interviewer_answer)

# Hiring Committee — fully anonymized
committee_answer = rag.query(question, role="hiring_committee")
print("Committee:", committee_answer)

Bias Reduction Through De-identification

The hiring_committee role uses policy="strict" which removes all identifying information. The committee sees skills and qualifications without knowing the candidate's name, gender, ethnicity, or age. Names are replaced with tokens like <Person_1>, organizations become <Organization_1>, and numbers are replaced with <Number_1>.

This is not just a nice-to-have feature — it is a concrete implementation of blind hiring. Instead of relying on HR staff to ignore names on printed resumes, the system enforces anonymization at the infrastructure level. There is no way for the hiring committee to see identifying information even if they wanted to, because the data is tokenized before it ever reaches the LLM that generates their answers.

Blind hiring at the infrastructure level: When the hiring committee asks about candidates, the LLM itself never sees real names or personal details. The model cannot leak information it never received. This is fundamentally stronger than instructing the model to “ignore names” via prompt engineering.

Studies on blind auditions in orchestras, name-blind resume reviews, and structured interview processes consistently show that removing identifying information leads to more equitable outcomes. With role-based tokenization, you can apply these principles systematically across your entire AI-assisted hiring pipeline.

What Each Role Sees

To make the differences concrete, here is the level of candidate detail each role's answer can contain when asking the same question: “Tell me about the senior engineer candidates.”

HR Manager View

The HR manager sees the complete picture, including compensation data needed for making offers:

Emily Zhang is a Senior Software Engineer with 8 years of experience in Python, Go, and distributed systems. She is currently at TechCorp earning $185,000 and is requesting $210,000. She holds a BS in Computer Science from Stanford University (2016). You can reach her at emily.zhang@gmail.com or +1-415-555-0134.

Recruiter View

The recruiter sees the candidate's name and qualifications but not contact details or salary information:

Emily Zhang is a Senior Software Engineer with 8 years of experience in Python, Go, and distributed systems. She holds a BS in Computer Science from Stanford University (2016). [Contact info and salary data redacted]

Interviewer View

The interviewer sees qualifications and experience but not the candidate's name, reducing unconscious bias:

<Person_1> is a Senior Software Engineer with 8 years of experience in Python, Go, and distributed systems. They hold a BS in Computer Science from Stanford University (2016).

Hiring Committee View

The hiring committee sees fully anonymized profiles. Names, organizations, numbers, and all personal identifiers are replaced with tokens:

<Person_1> is a Senior Software Engineer with <Number_1> years of experience in Python, Go, and distributed systems. They hold a BS in Computer Science from <Organization_1> (<Number_2>).

Notice how each successive role sees less personal information. The professional qualifications — skills, job titles, areas of expertise — remain visible across all roles because they are essential for evaluating candidates. Only the personal identifiers change.
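The layering can be simulated in a few lines. This sketch uses crude regex patterns as a stand-in for real entity detection (production systems like Blindfold use NER, not patterns like these), but it shows how stacking more entity types per role produces progressively more anonymous views of the same application:

```python
import re

# Rough regex stand-ins for entity detection — illustration only.
PATTERNS = {
    "person": r"Emily Zhang",  # toy: a fixed name instead of NER
    "email address": r"[\w.+-]+@[\w-]+\.\w+",
    "phone number": r"\+1-\d{3}-\d{3}-\d{4}",
    "salary": r"\$\d{1,3}(?:,\d{3})*",
}

# Each successive role redacts a superset of the previous one
ROLE_REDACTIONS = {
    "hr_manager": [],
    "recruiter": ["email address", "phone number", "salary"],
    "interviewer": ["person", "email address", "phone number", "salary"],
}

def redact(text, role):
    for entity in ROLE_REDACTIONS[role]:
        label = entity.title().replace(" ", "_")
        # Toy numbering: every match of a type gets the same "_1" token
        text = re.sub(PATTERNS[entity], f"<{label}_1>", text)
    return text

application = (
    "Emily Zhang (emily.zhang@gmail.com, +1-415-555-0134) — Senior "
    "Software Engineer earning $185,000."
)

for role in ROLE_REDACTIONS:
    print(role, "→", redact(application, role))
```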

Custom Policies for HR

Instead of hardcoding entity lists in your application, you can create named policies in the Blindfold dashboard. This provides centralized management: updating access levels requires no code changes, just a policy edit in the dashboard.

python
# Using named policies instead of hardcoded entity lists
# Create these in the Blindfold dashboard:
#   - "hr_manager"          → redacts SSN only
#   - "recruiter"            → redacts contact + financial
#   - "interviewer_panel"    → redacts names + contact + financial
#   - "external_auditor"     → strict policy, full anonymization

def query_with_policy(self, question, policy_name):
    results = self.collection.query(
        query_texts=[question], n_results=3
    )
    context = "\n\n".join(results["documents"][0])

    # Single line change: just pass the policy name
    tokenized = self.blindfold.tokenize(
        context, policy=policy_name
    )

    # Rest of the pipeline is identical
    messages = [
        {
            "role": "system",
            "content": (
                "You are an HR assistant. Answer based "
                "on this context:\n\n"
                f"{tokenized.text}"
            ),
        },
        {"role": "user", "content": question},
    ]

    completion = self.openai.chat.completions.create(
        model="gpt-4o-mini", messages=messages
    )
    return completion.choices[0].message.content

Named policies decouple access control from application code. When your legal team decides that recruiters should no longer see education details, you update the “recruiter” policy in the dashboard, with no deployment needed. You can also create specialized policies, such as external_auditor for third-party compliance audits where maximum anonymization is required, or interviewer_panel so that every member of an interview panel evaluates the same de-identified profile.

Centralized policy management: Create and update policies at docs.blindfold.sh/essentials/policies. Changes take effect immediately for all applications using that policy name.

GDPR and Employment Data

GDPR applies fully to employee and candidate data processing, including internal processing by AI systems. Article 88 specifically addresses processing in the employment context, and several provisions directly relate to how HR RAG systems handle personal data.

Data Minimization in Practice

Article 5(1)(c) requires that personal data be limited to what is necessary for the processing purpose. In a role-based system, this translates directly: a recruiter sourcing candidates does not need salary history, so the system redacts it. An interviewer evaluating technical skills does not need the candidate's home address, so the system removes it. Each role processes only the minimum data necessary for its function.

Right to Erasure and LLM Logs

When a candidate exercises their right to erasure under Article 17, you need to ensure their data is removed from all systems. With role-based tokenization, the LLM provider's logs contain only tokens like <Person_1> and <Email_Address_1> rather than real values. This significantly reduces your exposure: even if LLM provider logs are retained beyond your control, they contain no personal data that needs to be erased.
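A toy illustration of that token/mapping split (not the Blindfold API): the token-to-value mapping never leaves your infrastructure, so honoring an erasure request means deleting local mapping entries, while the provider-side log was never personal data to begin with.

```python
# The token-to-value mapping stays inside your infrastructure.
mapping = {
    "<Person_1>": "Emily Zhang",
    "<Email_Address_1>": "emily.zhang@gmail.com",
}

# This is the only form of the context an LLM provider could log:
provider_log = "<Person_1> (<Email_Address_1>) is a Senior Software Engineer."

def erase_subject(mapping, values):
    # Handling an Article 17 request: drop the subject's entries from
    # the local mapping; the tokens in provider logs become unresolvable.
    return {token: v for token, v in mapping.items() if v not in values}

mapping = erase_subject(mapping, {"Emily Zhang", "emily.zhang@gmail.com"})
```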

Cross-Border Considerations

Multinational companies process candidate data across borders. A US-based recruiter reviewing applications from EU candidates must still comply with GDPR. Role-based tokenization helps here because the data that crosses borders — the tokenized context sent to an LLM provider — contains no personal data. The tokens are meaningless without the mapping, and the mapping stays within your controlled infrastructure.

| GDPR Requirement | How Role-Based PII Control Addresses It |
| --- | --- |
| Data minimization (Art. 5(1)(c)) | Each role processes only the entity types required for its function |
| Purpose limitation (Art. 5(1)(b)) | Policies enforce that data is only used for the role's stated purpose |
| Right to erasure (Art. 17) | Tokenized LLM logs contain no real PII, reducing erasure scope |
| Cross-border transfers (Art. 44-49) | Tokenized data sent to LLM providers contains no personal data |
| Employment context (Art. 88) | Role-specific policies map to organizational access controls |

Important: Role-based tokenization is a technical control that supports GDPR compliance, but it does not replace legal obligations. You still need a lawful basis for processing (typically legitimate interest or consent for recruitment), a Data Protection Impact Assessment for AI-based hiring, and transparent privacy notices for candidates. Consult your DPO for a complete compliance strategy.

Production Considerations

When deploying role-based PII control in a production HR system, there are several architectural decisions to consider:

  • Authentication integration. Map your identity provider's roles (Okta, Azure AD, Auth0) to Blindfold policy names. When a user queries the HR assistant, their JWT claims determine which policy is applied.
  • Audit logging. Log which role accessed which candidate data and when. Blindfold API responses include metadata about detected entities, which you can store for compliance audits.
  • Ingestion strategy. For the examples in this article, we index raw applications and redact at query time. In high-volume systems, consider pre-computing redacted versions for each role at ingestion time to reduce query latency.
  • Multi-tenant isolation. If your HR system serves multiple departments or subsidiaries, combine role-based policies with tenant-scoped vector store collections to prevent cross-tenant data leakage.
  • Fallback behavior. Define a default policy (such as strict) for unknown or unauthenticated roles. The system should always fail toward more privacy, never less.
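The first and last points can be combined into a small dispatch helper. The sketch below uses illustrative claim and policy names (adapt them to your identity provider); the important property is that unrecognized roles fail closed to the strict policy:

```python
# Map identity-provider roles (e.g. from a JWT "roles" claim) to
# Blindfold policy names. Role and policy names are illustrative;
# the fail-closed default is the point.
ROLE_TO_POLICY = {
    "hr_manager": "hr_manager",
    "recruiter": "recruiter",
    "interviewer": "interviewer_panel",
    "hiring_committee": "strict",
}

DEFAULT_POLICY = "strict"  # unknown or unauthenticated callers

def policy_for(claims):
    # Pick the first recognized role from the token's claims; anything
    # unrecognized falls back to full anonymization, never full access.
    for role in claims.get("roles", []):
        if role in ROLE_TO_POLICY:
            return ROLE_TO_POLICY[role]
    return DEFAULT_POLICY
```

Wiring this in front of the query_with_policy method from the previous section gives you end-to-end enforcement: the identity provider asserts the role, the helper resolves the policy, and the system always degrades toward more privacy.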

Try It Yourself

Clone the complete RBAC example and see role-based PII control in action with your own candidate data:

Start protecting sensitive data

Free plan includes 500K characters/month. No credit card required.