CS 128 Code Integrity System

Multi-dimensional plagiarism detection for large-enrollment programming courses

Status: In Production
Duration: Fall 2025 - Present

Overview

The CS 128 Code Integrity System is a specialized platform for detecting academic dishonesty in programming assignments at scale. Our system integrates directly with PrairieLearn to ingest student submissions and applies multiple complementary detection techniques to identify plagiarism and suspicious code development patterns. By combining structural code analysis with edit volatility scoring across each student’s full submission history, the platform provides course staff with evidence-based tools for investigating academic integrity violations that would be impossible to detect through manual review or traditional similarity checking alone.

The Academic Integrity Challenge in Large-Enrollment Courses

As CS 128 is taught at scale, maintaining academic integrity is both critical and challenging. Traditional plagiarism detection tools designed for small classes fail in large-enrollment contexts where hundreds of submissions must be analyzed for potential violations. Simple pairwise similarity checking produces overwhelming numbers of false positives from boilerplate code and common algorithmic patterns, while sophisticated forms of dishonesty (such as students submitting code obtained from external sources) can go entirely undetected.

The problem intensified with the proliferation of AI code generation tools. Students can now obtain complete solutions that superficially differ from their peers’ submissions, and traditional plagiarism checkers that rely solely on textual similarity cannot detect these violations because the code appears unique when compared to other students’ work.

We needed a system built for large-scale integrity monitoring that could identify multiple forms of academic dishonesty, present evidence clearly for investigation, and operate efficiently enough to analyze thousands of submissions across multiple assignments throughout the semester.

Multi-Dimensional Detection

The system’s core innovation lies in analyzing submissions across multiple independent dimensions, each designed to detect different forms of academic dishonesty:

Structural Similarity Analysis

The primary detection mechanism employs winnowing fingerprinting and k-gram analysis to identify code that shares structural similarities even when superficially different. The system tokenizes student code, normalizes identifiers to detect renamed variables and functions, and computes cryptographic hashes of k-gram sequences (contiguous token sequences of configurable length). By comparing fingerprint overlaps between submissions, the algorithm identifies students who have produced structurally equivalent solutions despite surface-level differences.
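The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the keyword set, the hash truncation, and the placeholder `ID` token are all simplifying assumptions, and a real deployment would tokenize with the target language's actual lexer.

```python
import hashlib
import re

# Hypothetical keyword set; a production tokenizer would use the
# target language's real lexer and keyword list.
KEYWORDS = {"int", "for", "if", "else", "while", "return", "void"}

def tokenize(code: str) -> list[str]:
    """Split code into tokens, replacing identifiers with a placeholder
    so renamed variables and functions still match."""
    raw = re.findall(r"[A-Za-z_]\w*|[^\sA-Za-z_]", code)
    return [t if t in KEYWORDS or not re.match(r"[A-Za-z_]", t) else "ID"
            for t in raw]

def kgram_hashes(tokens: list[str], k: int) -> list[int]:
    """Hash every contiguous k-token sequence (the k-grams)."""
    return [
        int(hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()[:8], 16)
        for i in range(len(tokens) - k + 1)
    ]

def winnow(hashes: list[int], w: int) -> set[tuple[int, int]]:
    """Winnowing: keep the minimum hash in each window of w consecutive
    k-gram hashes; the selected (hash, position) pairs form the fingerprint."""
    fingerprint = set()
    for i in range(len(hashes) - w + 1):
        window = hashes[i:i + w]
        j = min(range(w), key=lambda x: window[x])
        fingerprint.add((window[j], i + j))
    return fingerprint
```

Because identifiers are normalized before hashing, two submissions that differ only in variable names produce identical token streams and therefore identical fingerprints.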

This approach detects traditional plagiarism where students copy code and attempt to disguise it through variable renaming, comment removal, or statement reordering. The configurable k-gram size allows instructors to calibrate sensitivity: smaller k-grams detect short copied segments, while larger k-grams identify longer structural patterns that indicate coordinated development rather than coincidental similarity.

The system filters out starter code patterns and common boilerplate to reduce false positives, focusing attention on substantive code unique to each assignment. Code that appears in more than 75% of submissions is automatically classified as common and excluded from similarity scoring. Similarity scores are computed for each student pair, with configurable thresholds determining which matches warrant investigation.
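The scoring step might look like the following sketch, assuming fingerprints are sets of (hash, position) pairs as a winnowing step would produce. The Jaccard-style overlap measure and the function names are illustrative assumptions, not the system's exact formula.

```python
from collections import Counter

def common_hashes(all_fingerprints: list[set[tuple[int, int]]],
                  cutoff: float = 0.75) -> set[int]:
    """Hashes appearing in more than `cutoff` of all submissions are
    treated as boilerplate and excluded from scoring."""
    counts = Counter(h for fp in all_fingerprints for h in {h for h, _ in fp})
    n = len(all_fingerprints)
    return {h for h, c in counts.items() if c / n > cutoff}

def similarity(fp_a: set[tuple[int, int]],
               fp_b: set[tuple[int, int]],
               common: set[int]) -> float:
    """Jaccard overlap of fingerprint hashes after removing common code."""
    a = {h for h, _ in fp_a} - common
    b = {h for h, _ in fp_b} - common
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Pairs whose score exceeds the configured threshold are queued for review; everything below it never reaches a human.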

Edit Volatility Analysis

The system ingests every submission attempt for every student—not just final submissions—enabling analysis of how each student’s code develops over time. For each pair of consecutive attempts, the system computes a volatility score that measures how dramatically the code changed. The score combines Levenshtein distance (measuring textual change) with token-based change ratios (measuring structural change), weighted by the time between attempts—rapid large changes score higher than gradual ones.
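A simplified version of that score can be sketched as follows. The 0.6/0.4 blend, the 30-minute time decay, and the whitespace-based token split are illustrative assumptions standing in for the system's actual weights and tokenizer.

```python
def levenshtein(a, b) -> int:
    """Classic edit distance; works on strings or on token lists."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def volatility(prev_code: str, curr_code: str, minutes_between: float) -> float:
    """Blend a character-level and a token-level change ratio, then weight
    by the gap between attempts so rapid large rewrites score highest.
    Weights and the time decay below are illustrative, not the real ones."""
    text_change = levenshtein(prev_code, curr_code) / max(
        len(prev_code), len(curr_code), 1)
    pt, ct = prev_code.split(), curr_code.split()  # crude token split
    token_change = levenshtein(pt, ct) / max(len(pt), len(ct), 1)
    magnitude = 0.6 * text_change + 0.4 * token_change
    time_weight = 1.0 / (1.0 + minutes_between / 30.0)  # faster -> heavier
    return magnitude * (1.0 + time_weight)
```

Under this formulation, an identical resubmission scores zero, and the same wholesale rewrite scores higher when it arrives minutes rather than hours after the previous attempt.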

High volatility, where students make massive structural changes between submissions rather than incremental refinements, often indicates code obtained from an external source rather than developed through the student’s own iterative process. The system also detects “volatility streaks”: sequences of three or more consecutive high-volatility attempts, a pattern inconsistent with genuine iterative development. The streak view presents autograder scores alongside volatility metrics, enabling reviewers to identify cases where low scores are followed by a sudden perfect solution without intermediate progress.
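Streak detection over a student's sequence of per-attempt volatility scores reduces to a run-length scan. The 0.8 threshold below is an illustrative placeholder for the configured cutoff.

```python
def volatility_streaks(scores: list[float],
                       threshold: float = 0.8,
                       min_len: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) attempt-index pairs for every run of at least
    `min_len` consecutive scores at or above `threshold`."""
    streaks, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                streaks.append((start, i - 1))
            start = None
    if start is not None and len(scores) - start >= min_len:
        streaks.append((start, len(scores) - 1))
    return streaks
```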

Configurable Analysis Parameters

Recognizing that appropriate plagiarism detection standards vary by assignment type, the system provides per-run configuration options:

  • K-gram size: Adjusts the granularity of structural matching from short code fragments to longer algorithmic patterns
  • Window size: Controls the winnowing window for fingerprint selection
  • Similarity threshold: Controls how much overlap constitutes a potential match
  • Common code detection: Toggle automatic exclusion of code patterns appearing in more than 75% of submissions
  • Identifier normalization: Toggle normalization of variable and function names to detect renamed code
  • Starter code exclusion: Filter out code provided to students as part of the assignment

Analysis configurations are saved with each run, allowing instructors to compare results across different parameter settings and build institutional knowledge about effective parameters for different types of programming problems.
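A saved run configuration could be modeled as a small immutable record like the sketch below. The field names and defaults are hypothetical, chosen to mirror the options listed above; only the 75% common-code cutoff comes from the text.

```python
from dataclasses import dataclass, asdict
import json

@dataclass(frozen=True)
class AnalysisConfig:
    """Per-run analysis parameters, serialized alongside each run so
    results from different settings can be compared later."""
    kgram_size: int = 5
    window_size: int = 4
    similarity_threshold: float = 0.4
    exclude_common_code: bool = True
    common_code_cutoff: float = 0.75
    normalize_identifiers: bool = True
    exclude_starter_code: bool = True

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

Freezing the dataclass and serializing it with the run makes each analysis reproducible: rerunning with the stored JSON yields the same matches.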

Investigation and Reporting Workflow

Beyond detection, the system provides a structured workflow for investigating and documenting integrity violations:

Match Review

Each similarity match enters a review queue where course staff categorize it as dismissed, suspicious, or confident. The review interface presents side-by-side code comparisons with matched regions highlighted, allowing reviewers to quickly assess whether structural similarity reflects genuine plagiarism or coincidental convergence. Review progress tracking shows completion rates per analysis run, helping staff manage the investigation workload across a team.

Report Generation

The system generates anonymized PDF reports suitable for submission to institutional academic integrity processes. Reports include selected similarity matches with overlap counts and similarity scores, optional submission timelines showing attempt progression, and volatility comparisons. Partner identities are automatically anonymized to comply with privacy requirements during the investigation process.

Educational Impact and Outcomes

Since deployment in Fall 2025, the CS 128 Code Integrity System has transformed academic integrity enforcement in the course:

AI Detection at Scale: In its first semester, the system flagged approximately 10% of enrolled students for potential AI-usage integrity violations through our volatility analysis. Of those flagged, roughly half admitted to the violation when presented with the evidence; the other half denied it but were found responsible through the formal review process. Only 10% of students who denied the allegation chose to appeal the finding, and none of those appeals were overturned, reflecting the strength of the evidence the system produces. Edit volatility analysis has proven particularly effective at flagging students whose submission histories show patterns inconsistent with authentic iterative development.

Reduced Investigation Burden: By filtering out starter code and common patterns and using configurable thresholds, the system dramatically reduces false positive rates. The structured review workflow allows course staff to focus investigation time on high-confidence matches supported by multiple forms of evidence rather than manually reviewing hundreds of flagged pairs.

Objective Evidence for Investigations: When integrity violations are suspected, the system provides objective, data-driven evidence to support investigations. Generated reports present students and institutional reviewers with specific similarity scores and volatility metrics rather than subjective assessments of code similarity. This evidence-based approach strengthens the integrity investigation process and helps distinguish genuine violations from coincidental similarities.

Broader Contributions to Computing Education

The analytical framework developed for this system represents a methodological contribution that extends beyond CS 128. As computing courses nationwide grapple with maintaining academic integrity in the era of AI code generation, the multi-dimensional detection approach demonstrated here provides a replicable model for other institutions.

The system demonstrates that large-enrollment programming courses can maintain rigorous academic integrity standards through sophisticated automated analysis combined with human investigation of flagged cases. This approach—automated detection paired with structured review workflows and formal reporting—scales more effectively than either purely manual review or fully automated decision-making, providing a framework applicable across institutions facing similar challenges in large programming courses.