Role-Based Candidate Privacy in HR and Recruiting RAG Systems
Implement role-based PII control in HR and recruiting RAG pipelines. HR managers, recruiters, interviewers, and hiring committees each see only the candidate data their role permits, enabling blind hiring and GDPR compliance.
HR departments are building AI assistants that search across candidate applications, employee records, and performance reviews. These RAG systems are powerful — a recruiter can ask “find me senior engineers with distributed systems experience” and get instant, contextual answers from hundreds of applications. But different roles need different levels of access to candidate data. A recruiter needs skills and experience to source candidates. An interviewer should see the resume but not salary expectations. A hiring committee needs fully anonymized profiles to reduce unconscious bias.
Traditional RAG pipelines ignore these distinctions entirely. When a document is retrieved and injected into the LLM prompt, every piece of personal information — names, emails, salary history, demographics — flows to the model regardless of who is querying. This violates GDPR data minimization principles, creates bias risks, and exposes sensitive compensation data to people who should never see it.
This article shows you how to implement role-based PII control in an HR recruiting RAG system using Blindfold. Each role — HR manager, recruiter, interviewer, hiring committee — sees only the candidate data their function requires, with everything else redacted or tokenized before it reaches the LLM.
The Privacy Problem in HR AI
HR data is among the most sensitive information an organization handles. Candidate applications and employee records contain a dense concentration of personal data that creates multiple risk vectors when processed through AI systems.
What Candidate Applications Contain
A typical candidate application includes the applicant's full name, email address, phone number, home address, educational history, employment history with company names and dates, salary expectations, and sometimes demographic information like date of birth or nationality. Some applications include references with additional third-party contact information.
What Employee Records Contain
Employee records go further: Social Security numbers, bank account numbers for payroll, emergency contact information, health insurance details, performance reviews, and disciplinary records. When an HR AI assistant has access to this data for answering workforce planning questions, every query potentially exposes all of it.
Why Traditional RAG Fails Here
A standard RAG pipeline retrieves the top-k most relevant documents and injects them verbatim into the LLM prompt. There is no concept of “who is asking.” When a recruiter asks about a candidate's experience, the retrieved application also contains salary history, contact details, and potentially sensitive demographic information — all of which flows to the LLM provider.
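To make the failure mode concrete, here is a minimal sketch of the naive pattern. The function name and prompt format are illustrative, not from any specific framework:

```python
# A naive RAG prompt builder: retrieved documents are injected
# verbatim, with no awareness of who is asking. Every field in the
# document, including contact details and salary, reaches the LLM.
def build_prompt(question: str, retrieved_docs: list[str]) -> str:
    context = "\n\n".join(retrieved_docs)
    return (
        "Answer based on this context:\n\n"
        f"{context}\n\nQuestion: {question}"
    )

doc = (
    "Emily Zhang (emily.zhang@gmail.com, +1-415-555-0134) "
    "earning $185,000, requesting $210,000."
)
prompt = build_prompt("Who has distributed systems experience?", [doc])
# The salary and contact details are now in the prompt, regardless
# of whether a recruiter, interviewer, or committee member asked.
print("$185,000" in prompt and "emily.zhang@gmail.com" in prompt)
```

Every downstream consumer of `prompt` — the LLM provider, its logs, any caching layer — now holds the candidate's full personal data.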
GDPR Article 5(1)(c) — Data Minimization: Personal data shall be adequate, relevant, and limited to what is necessary in relation to the purposes for which they are processed. This applies to internal processing too — a recruiter sourcing candidates does not need access to salary history or Social Security numbers.
Bias Reduction Through Data Isolation
Beyond compliance, there is a strong practical reason for role-based data access: reducing unconscious bias. Research consistently shows that hiding candidate names, gender indicators, age, and ethnicity from evaluators improves the fairness of hiring decisions. A hiring committee that sees “<Person_1> has 8 years of experience in Python, Go, and distributed systems” instead of a named candidate evaluates qualifications more objectively. Role-based PII control makes blind hiring a concrete, enforceable practice rather than a policy aspiration.
Role-Based Access with Blindfold Policies
Blindfold's entity-level redaction lets you define exactly which PII categories each role can see. By specifying different entity lists per role, the same candidate document produces four different outputs — each tailored to the information needs and privacy requirements of that role.
| Role | Sees | Redacted | Rationale |
|---|---|---|---|
| HR Manager | Full profile — names, contact, salary | Nothing (full access) | Needs complete picture for offers and negotiations |
| Recruiter | Names, skills, experience, education | Contact info, salary, SSN, demographics | Source candidates without compensation data |
| Interviewer | Skills, experience, education | Names, contact, salary, demographics | Reduce unconscious bias during evaluation |
| Hiring Committee | Skills + experience only | Everything personal | Fully blind evaluation of qualifications |
The key insight is that each role maps to a different set of entity types to redact. The HR manager sees almost everything, so only the most sensitive identifiers, such as Social Security numbers, are removed. The recruiter loses contact info and salary. The interviewer loses names on top of that. The hiring committee uses policy="strict" for full anonymization — every personal entity is replaced with a token.
Implementation
Let's build a complete HR recruiting RAG system with role-based PII control. We start with sample candidate applications, define per-role entity configurations, and build a query function that applies the correct level of redaction based on who is asking.
Sample Candidate Data
Here are three candidate applications that we will index into our vector store. Each contains a mix of professional qualifications and personal information:
```python
# Sample candidate applications
candidates = [
    "Application #APP-2024-001: Emily Zhang (emily.zhang@gmail.com, "
    "+1-415-555-0134) — Senior Software Engineer with 8 years "
    "experience in Python, Go, and distributed systems. Currently "
    "at TechCorp earning $185,000. Stanford University BS Computer "
    "Science 2016. Requesting $210,000.",
    "Application #APP-2024-002: James O'Brien (james.obrien@outlook.com, "
    "+1-212-555-0198) — Product Manager with 5 years experience "
    "leading B2B SaaS products. MIT MBA 2019. Currently at StartupCo "
    "earning $165,000. Requesting $195,000.",
    "Application #APP-2024-003: Priya Patel (priya.p@yahoo.com, "
    "+1-650-555-0267) — Data Scientist with 6 years experience in "
    "ML, NLP, and recommendation systems. UC Berkeley PhD 2018. "
    "Currently at DataInc earning $175,000. Requesting $200,000.",
]
```
Role Entity Configurations
Each role maps to a list of entity types that should be redacted. The HR manager has minimal redaction. The recruiter loses contact and financial data. The interviewer additionally loses names. The hiring committee uses policy="strict" which automatically redacts all detected PII — no entity list needed:
```python
# Entity types to redact per role
# None means use policy="strict" for full anonymization
ROLE_ENTITIES = {
    "hr_manager": ["social security number"],  # sees almost everything
    "recruiter": [
        "email address",
        "phone number",
        "address",
        "social security number",
        "credit card number",
        "iban",
    ],
    "interviewer": [
        "person",
        "email address",
        "phone number",
        "address",
        "social security number",
        "date of birth",
    ],
    "hiring_committee": None,  # policy="strict" — fully anonymized
}
```
The HR RAG System
The core system indexes candidate applications into a vector store and applies role-appropriate redaction at query time. The query() method accepts a role parameter that determines which entity types are redacted before the context reaches the LLM:
```python
import os

import chromadb
from blindfold import Blindfold
from openai import OpenAI


class HRRecruitingRAG:
    def __init__(self):
        self.blindfold = Blindfold(
            api_key=os.environ["BLINDFOLD_API_KEY"],
        )
        self.openai = OpenAI()
        self.collection = chromadb.Client().create_collection(
            "candidates"
        )

    def ingest(self, applications):
        # Index applications as-is into the vector store
        # Redaction happens at query time, per role
        for i, app in enumerate(applications):
            self.collection.add(
                documents=[app],
                ids=[f"app-{i}"],
            )

    def query(self, question, role):
        # Retrieve relevant candidate applications
        results = self.collection.query(
            query_texts=[question], n_results=3
        )
        context = "\n\n".join(results["documents"][0])

        # Apply role-based redaction
        entities = ROLE_ENTITIES.get(role)
        if entities is None:
            # Hiring committee: full anonymization
            tokenized = self.blindfold.tokenize(
                context, policy="strict"
            )
        elif len(entities) == 0:
            # No redaction needed
            tokenized = None
        else:
            # Selective redaction for this role
            tokenized = self.blindfold.tokenize(
                context, entities=entities
            )

        safe_context = tokenized.text if tokenized else context

        # Send redacted context to LLM
        messages = [
            {
                "role": "system",
                "content": (
                    "You are an HR assistant. Answer questions "
                    "about candidates based on this context:\n\n"
                    f"{safe_context}"
                ),
            },
            {"role": "user", "content": question},
        ]
        completion = self.openai.chat.completions.create(
            model="gpt-4o-mini", messages=messages
        )
        response = completion.choices[0].message.content

        # Detokenize for the end user (restore real values)
        if tokenized and tokenized.mapping:
            restored = self.blindfold.detokenize(
                response, tokenized.mapping
            )
            return restored.text
        return response
```
Querying with Different Roles
Here is how different roles query the same system. The same question produces different levels of detail based on the caller's role:
```python
# Initialize and ingest
rag = HRRecruitingRAG()
rag.ingest(candidates)

question = "Tell me about the senior engineer candidates"

# HR Manager — sees everything
hr_answer = rag.query(question, role="hr_manager")
print("HR Manager:", hr_answer)

# Recruiter — no contact info or salary
recruiter_answer = rag.query(question, role="recruiter")
print("Recruiter:", recruiter_answer)

# Interviewer — no names, no contact, no salary
interviewer_answer = rag.query(question, role="interviewer")
print("Interviewer:", interviewer_answer)

# Hiring Committee — fully anonymized
committee_answer = rag.query(question, role="hiring_committee")
print("Committee:", committee_answer)
```
Bias Reduction Through De-identification
The hiring_committee role uses policy="strict" which removes all identifying information. The committee sees skills and qualifications without knowing the candidate's name, gender, ethnicity, or age. Names are replaced with tokens like <Person_1>, organizations become <Organization_1>, and numbers are replaced with <Number_1>.
This is not just a nice-to-have feature — it is a concrete implementation of blind hiring. Instead of relying on HR staff to ignore names on printed resumes, the system enforces anonymization at the infrastructure level. There is no way for the hiring committee to see identifying information even if they wanted to, because the data is tokenized before it ever reaches the LLM that generates their answers.
Blind hiring at the infrastructure level: When the hiring committee asks about candidates, the LLM itself never sees real names or personal details. The model cannot leak information it never received. This is fundamentally stronger than instructing the model to “ignore names” via prompt engineering.
Studies on blind auditions in orchestras, name-blind resume reviews, and structured interview processes consistently show that removing identifying information leads to more equitable outcomes. With role-based tokenization, you can apply these principles systematically across your entire AI-assisted hiring pipeline.
What Each Role Sees
To make the differences concrete, here is what the LLM receives when each role asks the same question: “Tell me about the senior engineer candidates.”
HR Manager View
The HR manager sees the complete picture, including compensation data needed for making offers:
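An illustrative sketch using the first sample application: with the hr_manager entity list shown earlier (Social Security numbers only, which this application does not contain), the context passes through unchanged, assuming no other entities match:

```
Application #APP-2024-001: Emily Zhang (emily.zhang@gmail.com, +1-415-555-0134) — Senior Software Engineer with 8 years experience in Python, Go, and distributed systems. Currently at TechCorp earning $185,000. Stanford University BS Computer Science 2016. Requesting $210,000.
```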
Recruiter View
The recruiter sees the candidate's name and qualifications but not contact details or salary information:
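An illustrative sketch under the recruiter entity list; token labels follow the <Entity_N> pattern used elsewhere in this article, and the exact labels depend on Blindfold's detectors:

```
Application #APP-2024-001: Emily Zhang (<Email_Address_1>, <Phone_Number_1>) — Senior Software Engineer with 8 years experience in Python, Go, and distributed systems. Currently at TechCorp earning $185,000. Stanford University BS Computer Science 2016. Requesting $210,000.
```

Note that with the entity list shown earlier, compensation figures still pass through; fully hiding salary data, as the access table prescribes, would require an additional financial-amount entity type or a named dashboard policy that covers it.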
Interviewer View
The interviewer sees qualifications and experience but not the candidate's name, reducing unconscious bias:
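An illustrative sketch under the interviewer entity list, which adds "person" so the candidate's name is tokenized along with contact details (token labels assumed, as above):

```
Application #APP-2024-001: <Person_1> (<Email_Address_1>, <Phone_Number_1>) — Senior Software Engineer with 8 years experience in Python, Go, and distributed systems. Currently at TechCorp earning $185,000. Stanford University BS Computer Science 2016. Requesting $210,000.
```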
Hiring Committee View
The hiring committee sees fully anonymized profiles. Names, organizations, numbers, and all personal identifiers are replaced with tokens:
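An illustrative sketch of the strict-policy output. Exactly which spans are tokenized depends on Blindfold's detectors; this sketch follows the token examples given earlier in the article (names become <Person_1>, organizations <Organization_1>, numbers <Number_1>):

```
Application #<Number_1>: <Person_1> (<Email_Address_1>, <Phone_Number_1>) — Senior Software Engineer with <Number_2> years experience in Python, Go, and distributed systems. Currently at <Organization_1> earning <Number_3>. <Organization_2> BS Computer Science <Number_4>. Requesting <Number_5>.
```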
Notice how each successive role sees less personal information. The professional qualifications — skills, job titles, areas of expertise — remain visible across all roles because they are essential for evaluating candidates. Only the personal identifiers change.
Custom Policies for HR
Instead of hardcoding entity lists in your application, you can create named policies in the Blindfold dashboard. This provides centralized management: updating access levels requires no code changes, just a policy edit in the dashboard.
```python
# Using named policies instead of hardcoded entity lists
# Create these in the Blindfold dashboard:
#   - "hr_manager"        → redacts SSN only
#   - "recruiter"         → redacts contact + financial
#   - "interviewer_panel" → redacts names + contact + financial
#   - "external_auditor"  → strict policy, full anonymization

def query_with_policy(self, question, policy_name):
    results = self.collection.query(
        query_texts=[question], n_results=3
    )
    context = "\n\n".join(results["documents"][0])

    # Single line change: just pass the policy name
    tokenized = self.blindfold.tokenize(
        context, policy=policy_name
    )

    # Rest of the pipeline is identical
    messages = [
        {
            "role": "system",
            "content": (
                "You are an HR assistant. Answer based "
                "on this context:\n\n"
                f"{tokenized.text}"
            ),
        },
        {"role": "user", "content": question},
    ]
    completion = self.openai.chat.completions.create(
        model="gpt-4o-mini", messages=messages
    )
    return completion.choices[0].message.content
```
Named policies decouple access control from application code. When your legal team decides that recruiters should no longer see education details, you update the "recruiter" policy in the dashboard — no deployment needed. You can also create specialized policies like external_auditor for third-party compliance audits where maximum anonymization is required, or interviewer_panel for panel interviews where every interviewer must see the same consistently blinded view.
Centralized policy management: Create and update policies at docs.blindfold.sh/essentials/policies. Changes take effect immediately for all applications using that policy name.
GDPR and Employment Data
GDPR applies fully to employee and candidate data processing, including internal processing by AI systems. Article 88 specifically addresses processing in the employment context, and several provisions directly relate to how HR RAG systems handle personal data.
Data Minimization in Practice
Article 5(1)(c) requires that personal data be limited to what is necessary for the processing purpose. In a role-based system, this translates directly: a recruiter sourcing candidates does not need salary history, so the system redacts it. An interviewer evaluating technical skills does not need the candidate's home address, so the system removes it. Each role processes only the minimum data necessary for its function.
Right to Erasure and LLM Logs
When a candidate exercises their right to erasure under Article 17, you need to ensure their data is removed from all systems. With role-based tokenization, the LLM provider's logs contain only tokens like <Person_1> and <Email_Address_1> rather than real values. This significantly reduces your exposure: even if LLM provider logs are retained beyond your control, they contain no personal data that needs to be erased.
Cross-Border Considerations
Multinational companies process candidate data across borders. A US-based recruiter reviewing applications from EU candidates must still comply with GDPR. Role-based tokenization helps here because the data that crosses borders — the tokenized context sent to an LLM provider — contains no personal data. The tokens are meaningless without the mapping, and the mapping stays within your controlled infrastructure.
| GDPR Requirement | How Role-Based PII Control Addresses It |
|---|---|
| Data minimization (Art. 5(1)(c)) | Each role processes only the entity types required for its function |
| Purpose limitation (Art. 5(1)(b)) | Policies enforce that data is only used for the role's stated purpose |
| Right to erasure (Art. 17) | Tokenized LLM logs contain no real PII, reducing erasure scope |
| Cross-border transfers (Art. 44-49) | Tokenized data sent to LLM providers contains no personal data |
| Employment context (Art. 88) | Role-specific policies map to organizational access controls |
Important: Role-based tokenization is a technical control that supports GDPR compliance, but it does not replace legal obligations. You still need a lawful basis for processing (typically legitimate interest or consent for recruitment), a Data Protection Impact Assessment for AI-based hiring, and transparent privacy notices for candidates. Consult your DPO for a complete compliance strategy.
Production Considerations
When deploying role-based PII control in a production HR system, there are several architectural decisions to consider:
- Authentication integration. Map your identity provider's roles (Okta, Azure AD, Auth0) to Blindfold policy names. When a user queries the HR assistant, their JWT claims determine which policy is applied.
- Audit logging. Log which role accessed which candidate data and when. Blindfold API responses include metadata about detected entities, which you can store for compliance audits.
- Ingestion strategy. For the examples in this article, we index raw applications and redact at query time. In high-volume systems, consider pre-computing redacted versions for each role at ingestion time to reduce query latency.
- Multi-tenant isolation. If your HR system serves multiple departments or subsidiaries, combine role-based policies with tenant-scoped vector store collections to prevent cross-tenant data leakage.
- Fallback behavior. Define a default policy (such as "strict") for unknown or unauthenticated roles. The system should always fail toward more privacy, never less.
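The first and last points above can be sketched together: a hypothetical mapping from identity-provider role claims to policy names, failing closed to the strict policy for anything unrecognized. Both the mapping and the JWT claim shape are assumptions for illustration, not Blindfold or IdP APIs:

```python
# Hypothetical mapping from IdP role claims to Blindfold policy names.
# Adapt the claim shape to your identity provider's token format.
ROLE_TO_POLICY = {
    "hr_manager": "hr_manager",
    "recruiter": "recruiter",
    "interviewer": "interviewer_panel",
}

def policy_for_claims(claims: dict) -> str:
    # Pick the policy for the first recognized role claim.
    # Unknown or missing roles fall back to the strict policy,
    # so the system always fails toward more privacy.
    for role in claims.get("roles", []):
        if role in ROLE_TO_POLICY:
            return ROLE_TO_POLICY[role]
    return "strict"

print(policy_for_claims({"roles": ["recruiter"]}))   # recruiter
print(policy_for_claims({"roles": ["contractor"]}))  # strict
```

Wiring this into the query path means a user can never reach a more permissive policy than their authenticated role grants.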
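The ingestion-time alternative mentioned above can be sketched as a pure function that builds one redacted corpus per role. The redactor is injected as a callable standing in for a Blindfold tokenize call, which keeps the sketch self-contained; the toy redactor below is purely illustrative:

```python
from typing import Callable, Optional

# Stand-in signature for a redaction call: given text and an entity
# list (None meaning the strict policy), return redacted text.
Redactor = Callable[[str, Optional[list]], str]

def preredact_corpora(applications: list[str],
                      role_entities: dict,
                      redact: Redactor) -> dict:
    # Build one redacted copy of the corpus per role at ingestion
    # time, so query-time work is just retrieval from the right
    # role-scoped collection.
    return {
        role: [redact(app, entities) for app in applications]
        for role, entities in role_entities.items()
    }

# Toy redactor for illustration only.
def toy_redact(text: str, entities) -> str:
    if entities is None:
        return "<REDACTED>"  # strict: everything tokenized
    if "email address" in entities:
        return text.replace("emily.zhang@gmail.com", "<Email_Address_1>")
    return text

corpora = preredact_corpora(
    ["Emily Zhang (emily.zhang@gmail.com)"],
    {"recruiter": ["email address"], "hiring_committee": None},
    toy_redact,
)
print(corpora["recruiter"][0])  # Emily Zhang (<Email_Address_1>)
```

The trade-off is storage: each role's corpus is a full copy, but queries skip the tokenization round-trip entirely.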
Try It Yourself
Clone the complete RBAC example and see role-based PII control in action with your own candidate data:
- RBAC Cookbook Example (Python) — role-based PII control with ChromaDB and OpenAI
- RBAC Cookbook Example (Node.js) — same pattern in TypeScript
- Policies Documentation — create and manage named policies in the dashboard
- GDPR-Compliant AI — comprehensive guide to GDPR compliance with Blindfold
Start protecting sensitive data
Free plan includes 500K characters/month. No credit card required.