A Streamlit UI for two-way ranking of jobs and candidates over a Neo4j knowledge graph, using Rasch scores, TransR embeddings, and skill coverage. The app also integrates optional LLM helpers (OpenAI, Gemini, or local Ollama) for explanations and skill-coaching guidance.
This project is also referred to as HireMatch AI, highlighting its role as a two-way conversational job–candidate matching assistant built on knowledge graphs and statistical learning models.
- Upload resume (PDF/TXT)
- Link resume to candidate node
- View top-K job recommendations
- Identify missing skills for target jobs
- Optional AI explanations and coaching tips
- Search and select job postings
- Rank best-fit candidates
- Recruiter-friendly AI summaries
https://drive.google.com/drive/folders/1cZb0mlXbhNcTOQklmxNYmzxMt7WGBtkr?usp=sharing
Hiring platforms often rely on keyword matching, which fails to capture skill difficulty, candidate strength, and contextual relevance. This project aims to:
- Match candidates to relevant jobs and jobs to relevant candidates
- Use a Neo4j knowledge graph to model skills, jobs, candidates, companies, and locations
- Combine Rasch statistical modeling with TransR graph embeddings
- Provide interpretable and explainable rankings for recruiters and candidates
- Support optional LLM-based explanations and coaching
High-level pipeline:
- Data ingestion: Job postings and resumes are parsed to extract skills, locations, and metadata
- Knowledge graph construction: Entities and relationships are stored in Neo4j
- Metric computation: the Rasch model yields skill difficulty, candidate ability, and job difficulty; TransR yields relational graph embeddings
- Recommendation engine: Composite scoring using coverage, Rasch, and TransR
- Streamlit UI: Interactive candidate and recruiter experiences
The Neo4j graph represents:
- Nodes: JobPost, Candidate, Skill, Company, Location, Chunk
- Edges: HAS_SKILL, REQUIRES_SKILL, POSTED_BY, LOCATED_IN, etc.
This structure enables rich graph queries and embedding-based reasoning.
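As a sketch of the kind of query this structure enables, the following Cypher (using the node labels and relationship types listed above; `$candidateId` is a hypothetical parameter, not necessarily the app's actual property name) ranks jobs by how many of a candidate's skills they require:

```cypher
// Hypothetical query: jobs requiring skills a given candidate holds,
// ranked by the number of overlapping skills.
MATCH (c:Candidate {id: $candidateId})-[:HAS_SKILL]->(s:Skill)
      <-[:REQUIRES_SKILL]-(j:JobPost)
RETURN j.title AS job, count(s) AS sharedSkills
ORDER BY sharedSkills DESC
LIMIT 5
```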
The Rasch model converts skill rarity into interpretable difficulty and ability scores:
- Rare skills → higher difficulty
- Candidate ability increases with possession of difficult skills
- Job difficulty aggregates required skill difficulties
This allows principled comparison between candidates and jobs beyond raw skill counts.
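A minimal sketch of these three quantities, assuming difficulty is the logit of skill rarity and ability/job difficulty are simple averages (a fitted Rasch model estimates these jointly by maximum likelihood; the closed forms here are illustrative stand-ins):

```python
import math

def skill_difficulty(n_holders: int, n_candidates: int) -> float:
    """Logit of skill rarity: rarer skills get higher difficulty.
    Illustrative stand-in for a fitted Rasch item parameter."""
    p = n_holders / n_candidates          # share of candidates holding the skill
    p = min(max(p, 1e-6), 1 - 1e-6)       # clamp to avoid infinite logits
    return math.log((1 - p) / p)

def candidate_ability(held: set, difficulty: dict) -> float:
    """Ability rises with possession of difficult skills (mean held difficulty)."""
    if not held:
        return min(difficulty.values())
    return sum(difficulty[s] for s in held) / len(held)

def job_difficulty(required: set, difficulty: dict) -> float:
    """Job difficulty aggregates the difficulties of its required skills."""
    return sum(difficulty[s] for s in required) / len(required)
```

For example, a skill held by 5 of 100 candidates scores far higher difficulty than one held by 90 of 100, so two candidates with the same skill count can have very different abilities.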
TransR embeds entities and relations into relation-specific vector spaces:
- Captures latent semantic compatibility between jobs and candidates
- Supports similarity-based ranking when direct skill overlap is limited
- Trained offline and written back to Neo4j for fast retrieval
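The TransR scoring idea can be sketched as follows: a relation-specific matrix projects entity vectors into the relation's space, and plausibility is the (negated) distance between the projected head plus the relation vector and the projected tail. Names and shapes here are illustrative, not the project's actual training code:

```python
import numpy as np

def transr_score(h: np.ndarray, t: np.ndarray,
                 r_vec: np.ndarray, M_r: np.ndarray) -> float:
    """TransR plausibility of triple (h, r, t): project head and tail into the
    relation-specific space via M_r, then check how well h_r + r ~= t_r.
    Returns a negated distance, so higher means a more plausible link."""
    h_r = M_r @ h                         # head projected into relation space
    t_r = M_r @ t                         # tail projected into relation space
    return -float(np.linalg.norm(h_r + r_vec - t_r))
```

In the app, such embeddings are trained offline and stored back on the Neo4j nodes, so ranking at query time only needs the vector arithmetic above.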
Final ranking score is a weighted combination of:
- Skill Coverage (α) – overlap between candidate skills and job requirements
- Rasch Signal (β) – candidate ability vs. job difficulty
- TransR Similarity (γ) – embedding proximity
Weights are adjustable in the Streamlit UI.
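The blend above can be sketched in a few lines; the default weights here are placeholders (in the app they are set from the UI sliders):

```python
def skill_coverage(candidate_skills: set, required_skills: set) -> float:
    """Fraction of the job's required skills that the candidate holds."""
    if not required_skills:
        return 1.0
    return len(candidate_skills & required_skills) / len(required_skills)

def composite_score(coverage: float, rasch_signal: float, transr_sim: float,
                    alpha: float = 0.5, beta: float = 0.3,
                    gamma: float = 0.2) -> float:
    """Weighted combination of the three ranking signals (alpha, beta, gamma)."""
    return alpha * coverage + beta * rasch_signal + gamma * transr_sim
```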
- 124,000+ LinkedIn job postings (2023–2024)
- Originally scraped and stored as CSV
- Fields used include job title, salary range, employment type, location, and skills
- A mix of real (anonymized) and synthetic resumes
- Stored in JSON format
- Includes experience, education, skills, and embedded text chunks
All data is cleaned, normalized, converted to JSONL, and integrated into Neo4j.
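A minimal sketch of the normalize-and-serialize step (helper names are illustrative, not the project's actual pipeline functions):

```python
import json

def normalize_skills(raw: list) -> list:
    """Lowercase, strip, and de-duplicate skill strings, preserving order."""
    seen = []
    for s in raw:
        s = s.strip().lower()
        if s and s not in seen:
            seen.append(s)
    return seen

def to_jsonl(rows: list) -> str:
    """Serialize cleaned records as JSONL (one JSON object per line),
    the intermediate format ingested into Neo4j."""
    return "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in rows)
```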
Evaluation performed at K = 5:
- Precision@5: 0.86
- Recall@5: 0.82
- NDCG@5: 0.78
- MAP@5: 0.67
These results indicate strong ranking quality and effective blending of Rasch, TransR, and coverage signals.
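For reference, the reported metrics can be computed with binary-relevance helpers like these (MAP@K follows the same pattern); this is a sketch of the metric definitions, not the project's evaluation harness, and it does not reproduce the figures above:

```python
import math

def precision_at_k(recommended: list, relevant: set, k: int = 5) -> float:
    """Share of the top-k recommendations that are relevant."""
    return sum(1 for x in recommended[:k] if x in relevant) / k

def recall_at_k(recommended: list, relevant: set, k: int = 5) -> float:
    """Share of all relevant items that appear in the top-k."""
    return sum(1 for x in recommended[:k] if x in relevant) / len(relevant)

def ndcg_at_k(recommended: list, relevant: set, k: int = 5) -> float:
    """DCG of the top-k (binary gains) normalized by the ideal DCG."""
    dcg = sum(1 / math.log2(i + 2)
              for i, x in enumerate(recommended[:k]) if x in relevant)
    ideal = sum(1 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal else 0.0
```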
```bash
git clone <repo>
cd SWMP
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```