Grade essays with evidence.

Essay grading is not one problem. It includes source-based writing, literary analysis, argument essays, DBQs, short constructed responses, multilingual learner writing, and tutoring feedback. Each needs different evidence and a defensible rubric path.


Essay grading is several different jobs.

A generic essay score is not enough. The page, prompt, source set, and instructional context determine what evidence matters and what feedback is useful.

Source-based response

Students must use one or more passages, charts, documents, or evidence sets. The scoring challenge is judging whether the evidence is well selected, correctly interpreted, and connected to the prompt.

Argument essay

The response needs a defensible claim, line of reasoning, counterargument handling, and enough support to make the position credible.

Literary analysis

The response interprets theme, character, structure, language, or author craft. The hard part is grading interpretation depth rather than summary.

Document-based question

Students synthesize multiple documents and outside context. The rubric usually separates thesis, evidence use, sourcing, reasoning, and complexity.

Short constructed response

A few sentences can still require prompt fit, evidence, explanation, and conventions. These items need tight partial-credit rules.

Placement and proficiency writing

The goal is less a score against a single class rubric and more a judgment of level placement, language control, organization, and readiness.

Tutoring drafts

The score matters, but the next action matters more: what the student should revise, practice, or stop repeating.

Multilingual learner writing

Idea quality, source use, vocabulary, syntax, and conventions should be separated so language background does not hide reasoning quality.

What Evalysis grades.

The criteria below are intentionally analytic. A single holistic score is easy to output; the hard part is showing which evidence caused the score and what the student should improve next.

Prompt fit and task completion

Detects whether the response answers the actual task, addresses all required sources or documents, and avoids memorized or off-topic writing.

Claim, thesis, and position

Separates a defensible claim from a vague topic sentence, tracks whether the stance is sustained, and flags contradiction across paragraphs.

Evidence and source use

Checks whether quoted, paraphrased, or observed evidence is relevant, integrated, and sufficient for the claim being made.

Reasoning and commentary

Grades the bridge between evidence and claim: explanation, warrant, inference quality, counterargument, and depth of analysis.

Organization and development

Evaluates paragraphing, progression, cohesion, transitions, elaboration, and whether the essay builds rather than repeats.

Language control and conventions

Scores grammar, sentence control, vocabulary, register, clarity, and multilingual learner considerations as separate evidence lanes.

Originality, safety, and integrity

Routes suspected copied, template, unsafe, extremely short, or condition-code responses to human review instead of forcing a score.

Feedback usefulness

Produces criterion-level feedback students can act on: missing evidence, weak commentary, unclear thesis, or recurring convention errors.
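
To make the analytic approach above concrete, here is a minimal Python sketch of what a criterion-level result could look like. It is an illustration only, not Evalysis's actual data model: the names `CriterionResult`, `EssayResult`, and `review_flags` are hypothetical, and the 0-4 scale is an assumption. The idea it shows is that each score carries the evidence that caused it, and that a flagged response is routed to human review rather than given a forced total.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CriterionResult:
    """One analytic criterion: the score plus the evidence behind it (hypothetical shape)."""
    criterion: str
    score: int
    max_score: int          # assumed to be positive
    evidence: List[str]     # excerpts from the essay that justify the score
    feedback: str           # what the student should improve next


@dataclass
class EssayResult:
    criteria: List[CriterionResult] = field(default_factory=list)
    review_flags: List[str] = field(default_factory=list)  # e.g. "suspected copied"

    def needs_human_review(self) -> bool:
        return bool(self.review_flags)

    def total(self) -> Optional[int]:
        # A flagged response gets no forced score; it goes to a human instead.
        if self.needs_human_review():
            return None
        return sum(c.score for c in self.criteria)

    def weakest(self) -> Optional[CriterionResult]:
        # The criterion with the lowest share of available points.
        if not self.criteria:
            return None
        return min(self.criteria, key=lambda c: c.score / c.max_score)


essay = EssayResult(criteria=[
    CriterionResult("Claim, thesis, and position", 3, 4,
                    ["Paragraph 1 states a defensible claim."],
                    "Sustain the stance in the final paragraph."),
    CriterionResult("Reasoning and commentary", 1, 4,
                    ["Quotes are listed without explanation."],
                    "Explain how each quote supports the claim."),
])
flagged = EssayResult(review_flags=["extremely short"])
```

In this sketch, `essay.weakest()` points at the criterion whose feedback the student should act on first, while `flagged.total()` returns `None` so that no score is reported until a person has looked at the response.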

Anchor the rubric in the room where the essay is scored.

State-style extended constructed response

Students read a passage and write an evidence-based response. Evalysis checks claim, evidence selection, explanation, conventions, and condition codes, then reports score distributions and escalation rates.

AP / IB / international-school essays

Rubrics often value thesis, line of reasoning, evidence, sophistication, and commentary. Evalysis uses anchor examples to align severity and provides an item-level evidence report.

Tutoring-school weekly writing

Teachers need fast, formative feedback. The workflow clusters class-wide problems: weak thesis, insufficient evidence, mechanical errors, and repeated structural issues.

Multilingual learner writing

The model separates idea quality from language-control evidence, supports bilingual feedback, and makes fairness review visible when language background is relevant.
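
The reporting mentioned in the scenarios above (score distributions, escalation rates, and class-wide problem clusters) can be pictured with a short Python sketch. The record layout is an assumption made for illustration: each result is a dict with a `score` (set to `None` when the response was escalated to human review) and a list of `issues` tags. The function name `class_report` is hypothetical and does not describe a real Evalysis API.

```python
from collections import Counter


def class_report(results):
    """Summarize one class set: score distribution, escalation rate, common issues."""
    scored = [r["score"] for r in results if r["score"] is not None]
    escalated = sum(1 for r in results if r["score"] is None)
    issue_counts = Counter(tag for r in results for tag in r["issues"])
    return {
        # How many essays landed on each score point, in ascending score order.
        "score_distribution": dict(sorted(Counter(scored).items())),
        # Share of responses sent to human review instead of being scored.
        "escalation_rate": escalated / len(results) if results else 0.0,
        # Issue tags ranked by how many times they occur across the class.
        "common_issues": issue_counts.most_common(),
    }


results = [
    {"score": 3, "issues": ["weak thesis"]},
    {"score": 2, "issues": ["weak thesis", "insufficient evidence"]},
    {"score": None, "issues": []},  # escalated, for example a condition code
    {"score": 3, "issues": ["mechanical errors"]},
]
report = class_report(results)
```

A teacher-facing view would likely lead with the top entry of `common_issues`, since that is the problem most worth addressing with the whole class rather than one student at a time.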

Use the same essay workflow at different stakes.

Zero-shot

Use the rubric immediately for formative grading, draft feedback, and quick curriculum pilots.

Human-in-loop

Ask teachers to label a small set of representative essays, align severity, then scale feedback across the rest.

Fine-tuned / on-prem

Use approved anchors, scoring notes, and sample responses for high-stakes programs that need alignment reports and local data control.
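
For the human-in-loop tier, "aligning severity" can be made measurable. The sketch below is one possible way to do it, under stated assumptions: teachers and the model score the same essays on the same integer scale, and alignment is summarized as exact agreement, adjacent agreement (within one point), and mean signed difference. The function name `severity_alignment` and the choice of these three statistics are illustrative, not a description of how Evalysis computes its alignment reports.

```python
def severity_alignment(teacher_scores, model_scores):
    """Compare model scores with teacher labels on the same essays."""
    if not teacher_scores or len(teacher_scores) != len(model_scores):
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [m - t for t, m in zip(teacher_scores, model_scores)]
    n = len(diffs)
    return {
        # Share of essays where the model matches the teacher exactly.
        "exact_agreement": sum(d == 0 for d in diffs) / n,
        # Share of essays where the model is within one point of the teacher.
        "adjacent_agreement": sum(abs(d) <= 1 for d in diffs) / n,
        # Positive means the model is more lenient than the teacher on average.
        "mean_bias": sum(diffs) / n,
    }


teacher = [3, 2, 4, 1]
model = [3, 3, 4, 3]
alignment = severity_alignment(teacher, model)
```

A positive `mean_bias` on the labeled set would suggest the model is grading more leniently than the teachers, which is the kind of signal worth resolving before feedback is scaled to the rest of the essays.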
