Research Publication · ICMLC 2026 (Accepted, to be presented Feb 2026)

When Graph Structure Hurts

Lightweight Path Ranking for Dense KG-RAG

Knowledge Graphs · RAG · PyTorch · GNN · MLP · Graph Neural Networks · Research

Abstract

Knowledge Graph-augmented Retrieval-Augmented Generation (KG-RAG) systems have emerged as a powerful approach for question answering over structured knowledge. The conventional wisdom suggests that Graph Neural Networks (GNNs) should excel at ranking paths in knowledge graphs due to their ability to aggregate neighborhood information.

However, our research demonstrates a counterintuitive finding: in dense academic citation graphs with high entity overlap, simple MLP-based scorers outperform complex GNN architectures while using 13× fewer parameters.

This paper was accepted to ICMLC 2026 and will be presented in February 2026.

The Research Question

Knowledge graph RAG systems typically follow a two-stage process:

  1. Path Discovery: Find potential paths in the KG connecting query entities to candidate answers
  2. Path Ranking: Score and rank these paths by relevance
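The two stages above can be sketched in a few lines. This is my illustrative sketch, not the paper's implementation: `discover_paths` does a bounded breadth-first search over an adjacency list, and the scorer passed to `rank_paths` stands in for any learned model (here, a trivial shorter-is-better heuristic).

```python
from collections import deque

def discover_paths(graph, start, goal, max_len=3):
    """Stage 1: bounded BFS returning all paths from start to goal
    with at most max_len edges."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) - 1 < max_len:
            for nxt in graph.get(path[-1], []):
                if nxt not in path:  # avoid cycles
                    queue.append(path + [nxt])
    return paths

def rank_paths(paths, score_path):
    """Stage 2: sort candidate paths by a scorer, highest score first."""
    return sorted(paths, key=score_path, reverse=True)

# Toy citation graph; the stand-in scorer simply prefers shorter paths.
graph = {"Q": ["A", "B"], "A": ["Ans"], "B": ["C"], "C": ["Ans"]}
paths = discover_paths(graph, "Q", "Ans")
ranked = rank_paths(paths, score_path=lambda p: -len(p))
```

In the paper's system the lambda is replaced by a trained scorer; the surrounding discover-then-rank skeleton is the same.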

For path ranking, GNNs (Graph Convolutional Networks and Graph Attention Networks) are the popular choice. They're designed to leverage graph structure by aggregating information from neighboring nodes.

But what if the graph is too dense? What if nearly every entity connects to nearly every other entity through short paths? Does graph structure still help, or does it become noise?

Experimental Setup

Dataset: Dense Academic Citation Graph

I constructed a knowledge graph from academic papers in machine learning and NLP.

The key characteristic: this graph is dense. Most papers are connected through citation paths of length 2 or 3. This density is representative of real-world academic KGs where foundational papers create hub nodes.

Models Compared

I compared three path ranking approaches:

  1. MLP Scorer: Simple feedforward network operating on path features (length, relation types, node features)
  2. GCN (Graph Convolutional Network): Multi-layer GCN aggregating neighborhood information
  3. GAT (Graph Attention Network): Attention-based GNN that learns to weight neighbors dynamically

Evaluation Metrics

Path-ranking quality is reported as AUC, alongside each model's parameter count.

Key Findings

Main Results

  • MLP Scorer: 93.9% AUC, 180K parameters
  • GCN: 90.8% AUC, 2.4M parameters
  • GAT: 87.4% AUC, 3.1M parameters

Simple MLP outperforms complex GNNs while using 13× fewer parameters

Why Do GNNs Fail in Dense Graphs?

Through ablation studies and error analysis, I identified two primary failure modes:

1. Hub-Noise Amplification

In dense graphs with high entity overlap, certain papers (like BERT, Attention Is All You Need, and ImageNet) become hub nodes cited by thousands of papers. Because GNNs aggregate features from all neighbors, every node near a hub inherits signal from thousands of loosely related citations, drowning out the few neighbors that actually matter for the query.

In contrast, the MLP scorer treats each path independently without aggregating across the full neighborhood, avoiding hub-induced noise.
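A toy numeric illustration of hub-noise amplification (illustrative numbers, not the paper's data): mean-aggregating a hub's 1,000 unrelated neighbors all but erases the hub's own distinctive feature, while a path scorer that reads the hub's feature directly keeps it intact.

```python
import numpy as np

rng = np.random.default_rng(0)

# The hub node's own feature is distinctive (unit mass on one dimension)...
hub_feature = np.array([1.0, 0.0, 0.0])
# ...but it has many neighbors with unrelated, noisy features.
neighbors = rng.normal(0.0, 1.0, size=(1000, 3))

# GCN-style mean aggregation mixes the hub with all of its neighbors:
aggregated = (hub_feature + neighbors.sum(axis=0)) / (1 + len(neighbors))

# The aggregated vector is close to the noise mean (near zero), so almost
# none of the hub's original signal survives the aggregation step.
signal_retained = np.linalg.norm(aggregated)   # vs. norm(hub_feature) == 1.0
drift = np.linalg.norm(hub_feature - aggregated)
```

With 1,000 noisy neighbors, `signal_retained` collapses toward zero while `drift` stays near the full magnitude of the original feature, which is exactly the failure mode the MLP sidesteps by never aggregating over the neighborhood.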

2. Over-Smoothing

GNNs with multiple layers suffer from over-smoothing: node representations become increasingly similar as information propagates through the graph. In dense graphs where most nodes are within 2-3 hops of each other, this effect is pronounced.

After 3 GNN layers, I observed that node embeddings had average cosine similarity of 0.87—nearly all papers looked the same to the model.
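The 0.87 figure comes from a measurement of this kind; the helper below computes average pairwise cosine similarity over a set of embeddings (the sample inputs are stand-ins, not the paper's embeddings).

```python
import numpy as np

def mean_pairwise_cosine(X):
    """Average cosine similarity over all distinct pairs of row vectors."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    n = len(X)
    # Sum the off-diagonal entries, divide by the number of ordered pairs.
    return (sims.sum() - n) / (n * (n - 1))

# Fully over-smoothed embeddings: every node identical, similarity 1.0.
collapsed = np.tile([1.0, 2.0, 3.0], (5, 1))
# Perfectly distinct embeddings: orthogonal rows, similarity 0.0.
distinct = np.eye(3)
```

A value near 1.0 on real node embeddings, as observed here after 3 GNN layers, means the model can no longer tell most papers apart.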

When Lightweight Scorers Win

Based on these findings, I hypothesize that lightweight scorers (like MLPs) outperform GNNs when:

  1. High density: Entity overlap exceeds ~75%
  2. Hub-dominated: A small number of nodes have extremely high degree
  3. Short paths: Most paths are length 2-4 (little benefit from multi-hop aggregation)
  4. Strong node features: Individual node embeddings (e.g., paper abstracts encoded with BERT) are already highly informative
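The four conditions can be checked mechanically before committing to an architecture. The helper below is my illustration of that decision rule; only the ~75% overlap threshold comes from the text above, the other cutoffs are assumptions.

```python
def prefer_mlp_scorer(entity_overlap, hub_degree_ratio, median_path_len,
                      strong_node_features):
    """Heuristic: favor a lightweight MLP scorer over a GNN when the graph
    matches the dense-KG regime described above. All thresholds except the
    ~0.75 entity overlap are illustrative assumptions."""
    return (entity_overlap > 0.75          # 1. high density
            and hub_degree_ratio > 0.10    # 2. hub-dominated (assumed cutoff)
            and median_path_len <= 4       # 3. short paths
            and strong_node_features)      # 4. informative node embeddings

# Dense citation graph with BERT-encoded abstracts -> MLP is the better bet.
dense = prefer_mlp_scorer(0.85, 0.30, 3, strong_node_features=True)
# Sparse graph with long paths -> multi-hop GNN aggregation may still help.
sparse = prefer_mlp_scorer(0.40, 0.01, 6, strong_node_features=False)
```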

Methodology Details

Hybrid Retrieval Pipeline

The full KG-RAG system combines:

  1. Semantic Retrieval: BERT-based embedding similarity to find candidate papers
  2. Dual-Path Discovery:
    • Citation paths (Paper A → cites → Paper B)
    • Author paths (Paper A → authored_by → Author X → authored → Paper B)
  3. Path Scoring: The lightweight MLP scorer ranks discovered paths
  4. Answer Generation: GPT-4 synthesizes final answer using top-ranked paths as context
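Step 2, dual-path discovery, amounts to two templated traversals over the KG's triples. The triple representation and helper functions below are my sketch of that step, with hypothetical entity names:

```python
# KG as (head, relation, tail) triples; names are illustrative.
triples = [
    ("paper_a", "cites", "paper_b"),
    ("paper_a", "authored_by", "author_x"),
    ("paper_c", "authored_by", "author_x"),
]

def citation_paths(triples, src, dst):
    """Template: Paper A -> cites -> Paper B."""
    return [(src, "cites", dst)
            for h, r, t in triples if (h, r, t) == (src, "cites", dst)]

def author_paths(triples, src, dst):
    """Template: Paper A -> authored_by -> Author X -> authored -> Paper B."""
    authors = {t for h, r, t in triples if h == src and r == "authored_by"}
    return [(src, "authored_by", a, "authored", dst)
            for h, r, a in triples
            if h == dst and r == "authored_by" and a in authors]
```

Both path types feed into the same MLP scorer in step 3, so adding a new path template only requires a new traversal function, not a new model.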

MLP Architecture

The winning MLP architecture is surprisingly simple.

Total parameters: 180K. Compare this to 2.4M for GCN and 3.1M for GAT.
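A numpy sketch of a scorer in this spirit: a feedforward network over a flat path-feature vector with two hidden ReLU layers. The layer widths are my guesses chosen to land near the reported parameter count (roughly 187K here, in the ballpark of the paper's 180K); the actual feature set and dimensions are in the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

# Assumed path-feature dimension (e.g. pooled node embeddings plus path
# length and relation-type one-hots); not the paper's exact numbers.
dims = [600, 256, 128, 1]

# Randomly initialized weights stand in for trained parameters.
weights = [rng.normal(0.0, 0.02, (d_in, d_out))
           for d_in, d_out in zip(dims, dims[1:])]
biases = [np.zeros(d_out) for d_out in dims[1:]]

def score_path(x):
    """Two hidden ReLU layers, then a scalar relevance score."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)        # ReLU
    return float((x @ weights[-1] + biases[-1])[0])

# Parameter count for this configuration: ~187K.
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
```

The point of the sketch is the scale: three small dense layers over path features, versus the 2.4M and 3.1M parameters the GCN and GAT need for their message-passing stacks.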

Implications

For Practitioners

For Researchers

Limitations & Future Work

This research has several limitations.

Future work directions include:

  1. Developing density-aware hybrid architectures that combine MLP and GNN approaches
  2. Exploring selective neighbor aggregation to mitigate hub-noise amplification
  3. Extending analysis to other dense KG domains beyond academic citations
  4. Investigating whether findings hold for very large graphs (10M+ nodes)

Code & Resources

The paper will be published with full reproducibility materials.

Presentation at ICMLC 2026

I'll be presenting this work at ICMLC 2026 in February. If you're attending or interested in discussing the research, I'd love to connect.

Get in Touch