Programming education continues to struggle with a gap between cohort scale and feedback quality. Manual code review is slow and inconsistent in large classes, while traditional automated tools emphasize compliance and defect detection rather than pedagogy. Recent Artificial Intelligence (AI) systems offer promise, but classroom-ready pipelines with transparent evaluation on accuracy, latency, and cost remain scarce.
We design and evaluate a classroom-oriented feedback pipeline for practical web programming in a .NET environment. The system integrates: (i) a large language model for explanations and improvement strategies, (ii) a static analyzer for precise rule-based findings, and (iii) retrieval-augmented access to curated course materials for grounding. The study uses 200 anonymized C# submissions collected across three consecutive semesters (Fall 2024, Spring 2025, Summer 2025). A fixed prompt template and a rubric aligned with formative assessment guide the generated feedback. We compare the pipeline against conventional tools and instructor workflows, and we run ablations to assess the effect of the retrieval and static-analysis signals. Primary endpoints are the technical accuracy of findings, the usefulness and specificity of comments, mean end-to-end latency, and estimated cost per submission.
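As a rough illustration of how the three components described above could fit together, the following Python sketch assembles a grounded prompt from analyzer findings and retrieved course material. It is not the authors' implementation: the rule identifier `EMPTY_CATCH`, the keyword-based retrieval, the prompt wording, and every function and class name are hypothetical stand-ins, and the call to the language model itself is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Finding:
    rule_id: str  # hypothetical analyzer rule identifier
    line: int
    message: str


@dataclass
class FeedbackRequest:
    source_code: str
    findings: List[Finding] = field(default_factory=list)
    course_snippets: List[str] = field(default_factory=list)


def run_static_analyzer(source_code: str) -> List[Finding]:
    """Toy stand-in for a real analyzer: flags empty catch blocks only."""
    findings = []
    for number, text in enumerate(source_code.splitlines(), start=1):
        if "catch" in text and "{ }" in text:
            findings.append(
                Finding("EMPTY_CATCH", number, "Empty catch block swallows exceptions.")
            )
    return findings


def retrieve_course_material(
    findings: List[Finding], knowledge_base: Dict[str, List[str]], top_k: int = 2
) -> List[str]:
    """Toy stand-in for retrieval: looks up snippets by rule id."""
    snippets: List[str] = []
    for finding in findings:
        snippets.extend(knowledge_base.get(finding.rule_id, []))
    return snippets[:top_k]


def build_prompt(request: FeedbackRequest) -> str:
    """Fill a fixed template so the model only sees grounded inputs."""
    finding_lines = [
        f"- line {f.line} [{f.rule_id}]: {f.message}" for f in request.findings
    ] or ["- (none)"]
    snippet_lines = [f"- {s}" for s in request.course_snippets] or ["- (none)"]
    return "\n".join(
        [
            "You are giving formative feedback on a C# submission.",
            "Only discuss issues supported by the analyzer findings below.",
            "Analyzer findings:",
            *finding_lines,
            "Course material:",
            *snippet_lines,
            "Submission:",
            request.source_code,
        ]
    )


def prepare_feedback_prompt(
    source_code: str, knowledge_base: Dict[str, List[str]]
) -> str:
    """Analyzer -> retrieval -> prompt; the LLM call would follow this step."""
    findings = run_static_analyzer(source_code)
    snippets = retrieve_course_material(findings, knowledge_base)
    return build_prompt(FeedbackRequest(source_code, findings, snippets))
```

In this sketch the analyzer findings constrain what the model is asked to discuss, and the retrieved snippets supply course-specific wording. An ablation of the kind mentioned above would correspond to passing an empty findings list or an empty knowledge base.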
The integrated pipeline achieved 89.08% technical accuracy on real student submissions, with a mean end-to-end latency of 54.6 seconds and an estimated cost of USD 0.0203 per submission. Retrieval increased concept linkage and assignment awareness relative to a large language model (LLM) baseline, while static-analysis signals reduced unsupported claims and tightened alignment with rubric criteria. Against conventional tools, the pipeline produced more specific, concept-linked guidance rather than checklist-style defect flags, and it met practical turnaround targets for routine classroom use.
Coupling a large language model with static analysis and retrieval over course materials delivers timely, specific, and learner-centered feedback at classroom scale while maintaining low operational cost. The article contributes an end-to-end, reproducible protocol and a documented design for prompts, grounding, and safety rules. Limitations include a single-institution .NET focus and the absence of long-term learning measures. Future work will examine generalization to other languages and course types, governance of the knowledge base, and impacts on equity and durable learning.