CS 128 Code Integrity System
Multi-dimensional plagiarism detection for large-enrollment programming courses
Overview
Detecting academic dishonesty in large-enrollment programming courses requires more than pairwise code comparison. I built the CS 128 Code Integrity System to combine structural code analysis (i.e., winnowing fingerprinting and k-gram similarity) with edit volatility scoring across each student’s full submission history. The system integrates directly with PrairieLearn, ingests every submission attempt, and provides course staff with the evidence necessary to investigate integrity violations that would be undetectable through manual review or traditional similarity checking alone.
The Academic Integrity Challenge in Large-Enrollment Courses
As CS 128 is taught at scale, maintaining academic integrity is both critical and challenging. Traditional plagiarism detection tools designed for small classes fail in large-enrollment contexts where hundreds of submissions must be analyzed for potential violations. Simple pairwise similarity checking produces an overwhelming number of false positives from boilerplate code and common algorithmic patterns, while sophisticated forms of dishonesty (such as students submitting code obtained from external sources) can go entirely undetected.
The problem intensified with the proliferation of AI code generation tools. Students can now obtain complete solutions that superficially differ from their peers’ submissions, and traditional plagiarism checkers that rely solely on textual similarity cannot detect these violations because the code appears unique when compared to other students’ work.
We needed a system built for large-scale integrity monitoring that could identify multiple forms of academic dishonesty, present evidence clearly for investigation, and operate efficiently enough to analyze thousands of submissions across multiple assignments throughout the semester.
Multi-Dimensional Detection
The system’s core innovation lies in analyzing submissions across multiple independent dimensions, each designed to detect different forms of academic dishonesty:
Structural Similarity Analysis
The primary detection mechanism employs winnowing fingerprinting and k-gram analysis to identify code that shares structural similarities even when superficially different. The system tokenizes student code, normalizes identifiers to detect renamed variables and functions, and computes cryptographic hashes of k-gram sequences (contiguous token sequences of configurable length). By comparing fingerprint overlaps between submissions, the algorithm identifies students who have produced structurally equivalent solutions despite surface-level differences.
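To make the fingerprinting pipeline concrete, here is a minimal Python sketch of the winnowing approach described above. The tokenizer, keyword list, hash choice, placeholder token, and function names are illustrative simplifications and assumptions, not the production implementation.

```python
import hashlib
import re

# Small keyword set so common C++ keywords are not collapsed; illustrative only.
CPP_KEYWORDS = {
    "auto", "bool", "break", "case", "char", "const", "continue", "double",
    "else", "float", "for", "if", "int", "return", "struct", "void", "while",
}

def tokenize(source: str) -> list:
    """Very rough tokenizer: identifiers/keywords, integer literals, single symbols."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)

def normalize(tokens: list) -> list:
    """Collapse every non-keyword identifier to a placeholder so renamed
    variables and functions yield identical token streams."""
    out = []
    for t in tokens:
        if (t[0].isalpha() or t[0] == "_") and t not in CPP_KEYWORDS:
            out.append("ID")
        else:
            out.append(t)
    return out

def kgram_fingerprints(tokens: list, k: int, window: int) -> set:
    """Hash every contiguous k-token sequence, then winnow: keep the rightmost
    minimum hash in each sliding window (the standard winnowing selection rule)."""
    grams = ["\x1f".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]
    hashes = [int(hashlib.sha1(g.encode()).hexdigest()[:8], 16) for g in grams]
    if not hashes:
        return set()
    fingerprints = set()
    for i in range(max(len(hashes) - window + 1, 1)):
        win = hashes[i:i + window]
        low = min(win)
        fingerprints.add(win[max(j for j, h in enumerate(win) if h == low)])
    return fingerprints

def similarity(fp_a: set, fp_b: set) -> float:
    """Jaccard overlap of two fingerprint sets."""
    if not fp_a or not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)
```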
This approach detects traditional plagiarism where students copy code and attempt to disguise it through variable renaming, comment removal, or statement reordering. The configurable k-gram size allows instructors to calibrate sensitivity: smaller k-grams detect short copied segments, while larger k-grams identify longer structural patterns that indicate coordinated development rather than coincidental similarity.
The system filters out starter code patterns and common boilerplate to reduce false positives, focusing attention on substantive code unique to each assignment. Code that appears in more than 75% of submissions is automatically classified as common and excluded from similarity scoring. Similarity scores are computed for each student pair, with configurable thresholds determining which matches warrant investigation.
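A sketch of the common-code filter and pairwise scoring under the same assumptions, reusing the `similarity` helper from the previous sketch. The 75% share cutoff mirrors the description above; the function names and the data shape (a dict mapping student IDs to fingerprint sets) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def drop_common_fingerprints(per_student: dict, max_share: float = 0.75) -> dict:
    """Remove fingerprints shared by more than max_share of submissions,
    treating them as boilerplate or assignment-wide common patterns."""
    counts = Counter(h for fps in per_student.values() for h in fps)
    cutoff = max_share * len(per_student)
    common = {h for h, n in counts.items() if n > cutoff}
    return {sid: fps - common for sid, fps in per_student.items()}

def flag_pairs(per_student: dict, threshold: float) -> list:
    """Score every student pair on the filtered fingerprints and keep pairs
    at or above the configured similarity threshold, highest scores first."""
    filtered = drop_common_fingerprints(per_student)
    flagged = []
    for a, b in combinations(sorted(filtered), 2):
        score = similarity(filtered[a], filtered[b])
        if score >= threshold:
            flagged.append((a, b, score))
    return sorted(flagged, key=lambda t: t[2], reverse=True)
```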
Edit Volatility Analysis
The system ingests every submission attempt for every student—not just final submissions—enabling analysis of how each student’s code develops over time. For each pair of consecutive attempts, the system computes a volatility score that measures how dramatically the code changed. The score combines Levenshtein distance (measuring textual change) with token-based change ratios (measuring structural change), weighted by the time between attempts—rapid large changes score higher than gradual ones.
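One way to realize such a volatility score is sketched below. The equal weighting of textual and structural change and the logarithmic time discount are illustrative choices, not the deployed formula.

```python
import math
from difflib import SequenceMatcher

def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def volatility(prev_code: str, next_code: str,
               prev_tokens: list, next_tokens: list,
               minutes_between: float) -> float:
    """Blend textual change (normalized Levenshtein) with structural change
    (token-sequence dissimilarity), discounted by the gap between attempts so
    that rapid, large rewrites score higher than slow, gradual ones."""
    text_change = levenshtein(prev_code, next_code) / max(len(prev_code), len(next_code), 1)
    token_change = 1.0 - SequenceMatcher(None, prev_tokens, next_tokens).ratio()
    time_factor = 1.0 / (1.0 + math.log1p(minutes_between))  # 1.0 for back-to-back attempts
    return (0.5 * text_change + 0.5 * token_change) * (0.5 + 0.5 * time_factor)
```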
High volatility—where students make massive structural changes between submissions rather than incremental refinements—often indicates code obtained from external sources rather than developed through the student’s own iterative process. The system detects “volatility streaks”—sequences of three or more consecutive high-volatility attempts—a pattern inconsistent with genuine iterative development. Streak reports present autograder scores alongside volatility metrics, enabling reviewers to identify cases where low scores are followed by a sudden perfect solution without intermediate progress.
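A minimal streak detector over a student’s per-attempt volatility scores. The minimum length of three attempts mirrors the streak definition above; the 0.6 threshold and function name are placeholders.

```python
def volatility_streaks(scores: list, threshold: float = 0.6, min_length: int = 3) -> list:
    """Return (start, end) attempt-index ranges where min_length or more
    consecutive attempts exceed the volatility threshold."""
    streaks, start = [], None
    for i, s in enumerate(scores + [float("-inf")]):   # sentinel closes a trailing streak
        if s > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_length:
                streaks.append((start, i - 1))
            start = None
    return streaks
```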
Configurable Analysis Parameters
Recognizing that appropriate plagiarism detection standards vary by assignment type, the system provides the following per-run configuration options, sketched in code below:
- K-gram size: Adjusts the granularity of structural matching from short code fragments to longer algorithmic patterns
- Window size: Controls the winnowing window for fingerprint selection
- Similarity threshold: Controls how much overlap constitutes a potential match
- Common code detection: Toggle automatic exclusion of code patterns appearing in more than 75% of submissions
- Identifier normalization: Toggle normalization of variable and function names to detect renamed code
- Starter code exclusion: Filter out code provided to students as part of the assignment
Analysis configurations are saved with each run, allowing instructors to compare results across different parameter settings and build institutional knowledge about effective parameters for different types of programming problems.
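A minimal sketch of how such a per-run configuration might be represented and persisted; the field names, default values, and the `runs/<run_id>/config.json` layout are assumptions for illustration, not the production schema.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class AnalysisConfig:
    """Per-run analysis parameters (illustrative names and defaults)."""
    kgram_size: int = 12               # tokens per k-gram
    window_size: int = 8               # winnowing window
    similarity_threshold: float = 0.35
    exclude_common_code: bool = True   # drop patterns shared by >75% of submissions
    normalize_identifiers: bool = True
    exclude_starter_code: bool = True

def save_run_config(config: AnalysisConfig, run_id: str) -> None:
    """Persist the configuration alongside the run so results remain comparable."""
    with open(f"runs/{run_id}/config.json", "w") as f:
        json.dump(asdict(config), f, indent=2)
```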
Investigation and Reporting Workflow
Beyond detection, the system provides a structured workflow for investigating and documenting integrity violations:
Match Review
Each similarity match enters a review queue where course staff categorize it as dismissed, suspicious, or confident. The review interface presents side-by-side code comparisons with matched regions highlighted, allowing reviewers to quickly assess whether structural similarity reflects genuine plagiarism or coincidental convergence. Review progress tracking shows completion rates per analysis run, helping staff manage the investigation workload across a team.
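As an illustration, the review verdicts and per-run progress tracking could be modeled as simply as the sketch below; the status names follow the categories above, while everything else is hypothetical.

```python
from enum import Enum

class ReviewStatus(Enum):
    PENDING = "pending"
    DISMISSED = "dismissed"
    SUSPICIOUS = "suspicious"
    CONFIDENT = "confident"

def completion_rate(statuses: list) -> float:
    """Fraction of matches in an analysis run that have a non-pending verdict."""
    if not statuses:
        return 1.0
    reviewed = sum(1 for s in statuses if s is not ReviewStatus.PENDING)
    return reviewed / len(statuses)
```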
Report Generation
The system generates anonymized PDF reports suitable for submission to institutional academic integrity processes. Reports include selected similarity matches with overlap counts and similarity scores, optional submission timelines showing attempt progression, and volatility comparisons. Partner identities are automatically anonymized to comply with privacy requirements during the investigation process.
Educational Impact and Outcomes
Since deployment in Fall 2025, the CS 128 Code Integrity System has transformed academic integrity enforcement in the course:
AI Detection at Scale: In its first semester, the system flagged approximately 10% of enrolled students for potential AI-usage integrity violations through volatility analysis. Of those flagged, roughly half admitted to the allegation when presented with the evidence. The other half denied the allegation but were found to have committed an infraction through the formal review process. Only 10% of students who denied the allegation chose to appeal the finding, and none of those appeals were overturned, reflecting the strength of the evidence the system produces.
Reduced Investigation Burden: By filtering out starter code and common patterns and applying configurable thresholds, the system substantially reduces false-positive rates. The structured review workflow allows course staff to focus on high-confidence matches supported by multiple forms of evidence rather than manually reviewing hundreds of flagged pairs.
Objective Evidence for Investigations: When integrity violations are suspected, the system provides objective evidence for investigation. Generated reports present students and institutional reviewers with specific similarity scores and volatility metrics rather than subjective assessments of code similarity.
Broader Contributions to Computing Education
The question facing large-enrollment programming courses is whether academic integrity can be maintained as AI code generation tools become widely accessible. Our system demonstrates one approach: automated detection paired with structured human review and formal reporting. Neither purely manual review nor fully automated decision-making scales for courses of this size; the combination of both, with configurable detection parameters and clear evidentiary standards, has proven effective at the scale of CS 128.