Grade every kind of student work with decisions you can inspect.

Essays, handwritten proofs, exercise books, diagrams, lab images, speech, and multilingual short answers — scored with rationale, feedback, and deployment controls fit for both tutoring schools and high-stakes exams.


Input modalities

Onboarding modes

100+

Subject families

Multimodal

Text, handwriting, math, diagrams, images, speech, and languages.

A multimodal understanding layer normalizes student work before specialist raters score it.

Deployment

Cloud when speed matters. On-prem when control is non-negotiable.

Zero-shot, human-in-loop, and fine-tuned onboarding paths match the stakes of the exam.

Use cases

Essay grading, handwritten math proofs, and tutoring exercise books.

Dedicated workflows for real classroom and exam scenarios, not generic answer checking.

Open-the-box scoring, with an SME report.

Try the cloud path with a single student response or a batch upload.
The workflow is built around expert review: rubric alignment, subject reasoning, score rationale, feedback, and escalation notes.


From messy submissions to controlled scoring.

Evalysis is organized around four product decisions: what student work must be handled, which subject rules apply, how scores are challenged, and where the scoring run can legally operate.

Multimodal by default

Student work arrives as essays, handwriting, math notation, diagrams, lab photos, speech, short answers, tables, and mixed-language responses. Evalysis turns that into structured scoring inputs.

Text and handwriting
Math notation and proof structure
Images, diagrams, speech, and languages

Broad subject coverage

Score school, exam, tutoring, and professional subjects with item-specific rubrics and reporting views, without building a custom mini-product for every format.

Humanities and social sciences
STEM and lab work
Languages, arts, business, medicine, law, and vocational tracks

Multi-agent review

Independent raters, critics, adjudicators, calibrators, fairness reviewers, and audit loggers create a defensible scoring chain.

Double-rating and challenge
Confidence-based routing
Replayable decision trail
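
Confidence-based routing can be pictured with a short sketch. This is illustrative only, not the product's implementation: the `ScoredResponse` type, the `route` function, the route names, and both threshold values are hypothetical, and it assumes the confidence value is a number between 0 and 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredResponse:
    response_id: str
    score: int
    confidence: float  # assumed to be a calibrated value in [0, 1]


def route(item: ScoredResponse,
          auto_release: float = 0.90,
          second_read: float = 0.60) -> str:
    """Pick a review path for one scored response.

    Both thresholds are hypothetical defaults; a real program would set
    them per item from its own alignment data.
    """
    if not 0.0 <= item.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if item.confidence >= auto_release:
        return "release"        # confident enough to report as-is
    if item.confidence >= second_read:
        return "second_read"    # send to another rater
    return "human_review"       # escalate to a person


print(route(ScoredResponse("r-001", score=4, confidence=0.72)))
```

Lower-stakes programs might release more scores automatically, while a high-stakes exam would likely widen the band that goes to human review.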

Deployment for the stakes

Open-the-box cloud for pilots and tutoring schools; private cloud or on-prem for high-stakes exams with strict data residency.

Zero-shot
Human-in-loop
Fine-tuned alignment


Specialist panels for high-stakes scoring.

setup · multimodal intake · blind scoring · adjudication · QA

Setup

Setup agents read the rubric, build the anchor set, and link items to your knowledge graph.

Multimodal

Perception agents decode handwriting, speech, math notation, and lab imagery into a clean canonical form.

Rater Panel

The rater panel scores blindly — two independent personas, plus separate dimensions for conventions and process.

Adjudication

On disagreement, a critic challenges, an adjudicator decides, and a calibrator reports confidence + routing.
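
As a rough sketch of the double-rating step, assuming exact agreement is required and leaving out the critic's challenge for brevity: the `Decision` type, the `resolve` function, and the callable standing in for the adjudicator are all hypothetical names, not part of the product.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Decision:
    final_score: int
    adjudicated: bool
    trail: List[str] = field(default_factory=list)  # ordered, replayable steps


def resolve(score_a: int, score_b: int,
            adjudicate: Callable[[int, int], int]) -> Decision:
    """Combine two blind scores; call the adjudicator only on disagreement."""
    trail = [f"rater_a={score_a}", f"rater_b={score_b}"]
    if score_a == score_b:
        trail.append("agreement: no adjudication")
        return Decision(score_a, False, trail)
    final = adjudicate(score_a, score_b)  # stand-in for the adjudicator's call
    trail.append(f"adjudicator={final}")
    return Decision(final, True, trail)


# Hypothetical adjudicator that settles on the lower of the two scores.
decision = resolve(2, 4, adjudicate=min)
print(decision.final_score, decision.trail)
```

Recording each step in `trail` is one simple way to keep a decision replayable after the fact.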

QA & Fairness

QA agents inject validity papers mid-batch, monitor drift, audit subgroups, and back-read random samples.
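
Validity-paper injection can be sketched as follows, assuming validity papers are responses with a pre-agreed score that get mixed into a live batch and then checked against what the rater assigned. The function names and the shuffle-based placement are assumptions made for illustration.

```python
import random
from typing import Dict, List


def inject_validity_papers(batch: List[str], validity_ids: List[str],
                           seed: int = 0) -> List[str]:
    """Mix pre-scored validity papers into a batch at random positions."""
    rng = random.Random(seed)  # seeded so the run can be replayed
    mixed = list(batch) + list(validity_ids)
    rng.shuffle(mixed)
    return mixed


def validity_agreement(assigned: Dict[str, int],
                       expected: Dict[str, int]) -> float:
    """Fraction of validity papers whose assigned score matches the expected one."""
    if not expected:
        raise ValueError("at least one validity paper is required")
    hits = sum(1 for pid, score in expected.items()
               if assigned.get(pid) == score)
    return hits / len(expected)


queue = inject_validity_papers(["s1", "s2", "s3"], ["v1", "v2"], seed=1)
rate = validity_agreement({"v1": 3, "v2": 2, "s1": 4}, {"v1": 3, "v2": 4})
print(len(queue), rate)
```

A falling agreement rate on these papers during a batch is one signal a drift monitor could watch.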

Setup (stage 01)
Rubric Author: Item Writer / Rangefinder
Anchor Curator: Anchor Approver
Item Profiler: Content Specialist

Multimodal (stage 02)
Handwriting Reader: Diagrams & written work
Speech Reader: Oral responses
Experiment Vision: Lab Observer
Math Normalizer: Symbols & notation

Rater Panel (stage 03)
Rater A: Scorer 1 (strict)
Rater B: Scorer 2 (lenient)
Convention Rater: Language & Conventions
Process Rater: Step-by-step Reasoner

Adjudication (stage 04)
Critic: Devil's Advocate
Adjudicator: Table Leader
Calibrator: Chief Reader

QA & Fairness (stage 05)
Validity Injector: Calibration Master
Drift Detector: Score-Distribution Monitor
Bias Auditor: Fairness Reviewer
Backreader: QC Sampler

Multimodal scoring across what students actually do on a test.

Essay & short answer

writing evidence

Thesis & evidence chain
Rhetorical structure (claim → warrant → backing)
Domain-specific vocabulary recall
Cohesion, register, conventions
Legitimacy / off-task detection

Agents engaged

Rater A · Rater B · Convention Rater · Critic · Adjudicator

Essay · evidence-based response

rubric · 0–5 ECR

Item #482

The author argues that automated scoring is necessary because human capacity cannot scale with the new constructed-response volume. Two pieces of textual evidence are offered, but the second claim conflates cost with reliability, which the rubric treats as a partial-credit issue.

Development: 3 / 3
Conventions: 2 / 2
Overall: 4 / 5

Built for broad curriculum coverage, not a narrow essay demo.

A school or exam operator may start with one workflow and expand across academic, professional, vocational, language, and oral-response subjects. Below is a concrete sample of the spectrum.

Subject atlas

Coverage by submission type.

The useful question is not whether a subject is on a static list. It is whether the program can handle the work students submit and apply the right rubric, anchors, and review path.


Rubric + modality matrix

Where each family expands

sample pilot configuration

Written: essay · SCR · DBQ
Handwritten: steps · proofs
Visual: diagram · lab photo
Oral: speaking · listening
Structured: table · code · file

Language & writing (argument writing · source synthesis · literary analysis)
essays, short response, conventions; scanned short answers; presentation rubrics

Mathematics (algebra · geometry proofs · statistics)
explanations; worked steps, proofs; graphs, constructions; symbolic checks

Science & lab (physics · chemistry · biology labs)
CER response; calculations; lab photos, diagrams; tables, graphs

Humanities & social science (history DBQ · geography · economics)
argument, case analysis; paper booklet scans; maps, sources

Professional & vocational (law · medicine · teacher certification)
scenario judgment; workplace artifacts; interview exams; forms, tables, logs

Arts, languages & oral exams (translation · debate · portfolio critique)
reflection, commentary; portfolio evidence; speaking, listening

Example mix

Language & writing: 22%
Mathematics: 18%
Science & lab: 17%
Humanities & social science: 15%
Professional & vocational: 14%
Arts, languages & oral exams: 14%

Concrete subject labels

Language & writing

English composition · argument writing · source-based synthesis · literary analysis · reading short response · grammar and usage · ESL/EAL writing · Chinese composition · +4 more

Mathematics

arithmetic · pre-algebra · algebra · geometry proofs · trigonometry · precalculus · calculus · statistics · +4 more

Science & lab

biology · chemistry · physics · earth science · environmental science · lab notebooks · claim-evidence-reasoning · experimental design · +4 more

Humanities & social science

world history · US history · geography · economics · civics · psychology · sociology · philosophy · +4 more

Professional & vocational

business writing · accounting explanations · law hypotheticals · nursing scenarios · teacher certification · safety procedures · technical writing · coding explanations · +4 more

Arts, languages & oral exams

speaking tests · listening response · translation · interpretation · music theory · art critique · media studies · drama reflection · +4 more

Multi-language scoring and feedback

Rubrics, anchors, examples, and feedback can be localized. The goal is not merely translation; it is alignment to the scoring culture, language background, and classroom context of the program.

English · Chinese · Spanish · French · Japanese · Korean · Arabic · German · Portuguese · bilingual feedback · localized rubrics · regional examples

Fast cloud pilots, controlled on-prem scoring.

Different exams have different stakes. Evalysis supports both open-the-box cloud use and controlled on-prem deployments, with three onboarding modes that decide how much human alignment happens before scoring at scale.

Decision axis 01

Where it runs

Choose the deployment environment first. This determines data custody, integration boundaries, and operational controls.

Open the box

Cloud

Pilots, tutoring schools, formative scoring

Fastest path for pilots, tutoring schools, internal benchmarks, and formative feedback. Start in the Cloud Trial with a mock example or upload student work first, then review the inferred setup before scoring.

Managed isolation

Private cloud / VPC

Districts, institutions, assessment operators

For districts, institutions, and assessment operators that need SSO, role controls, private storage, API integration, and stricter data boundaries.

High-stakes control

On-prem

Sensitive exams and local audit custody

Built for high-stakes exams: keep sensitive responses inside your network, run scoring locally, and retain customer-controlled audit artifacts.

Decision axis 02

How it aligns

After the environment is chosen, pick the onboarding path. This is about rubric alignment, teacher input, and confidence thresholds.

No training samples

Zero-shot

Evalysis reads the rubric and grades immediately. Best for quick pilots, low-stakes practice, and formative feedback where speed matters.

Teacher calibration

Human-in-loop

The system selects representative samples for teachers or scoring leaders to label, aligns to those decisions, then grades the rest with escalation for uncertain cases.

Formal alignment

Fine-tuned

Tune on approved samples and anchors, then receive a comprehensive report covering item behavior, agreement, confidence, fairness, and routing thresholds.
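
The two decision axes can be written down as a small configuration sketch. The class and field names here are hypothetical and do not describe a real Evalysis API. The only rule encoded is the one stated above: zero-shot needs no training samples, while the other two paths rely on labeled or approved samples.

```python
from dataclasses import dataclass
from enum import Enum


class Deployment(Enum):      # decision axis 01: where it runs
    CLOUD = "cloud"
    PRIVATE_CLOUD = "private_cloud"
    ON_PREM = "on_prem"


class Onboarding(Enum):      # decision axis 02: how it aligns
    ZERO_SHOT = "zero_shot"
    HUMAN_IN_LOOP = "human_in_loop"
    FINE_TUNED = "fine_tuned"


@dataclass(frozen=True)
class PilotConfig:
    deployment: Deployment
    onboarding: Onboarding

    def needs_labeled_samples(self) -> bool:
        """Zero-shot starts from the rubric alone; the other paths need samples."""
        return self.onboarding is not Onboarding.ZERO_SHOT


pilot = PilotConfig(Deployment.CLOUD, Onboarding.ZERO_SHOT)
exam = PilotConfig(Deployment.ON_PREM, Onboarding.FINE_TUNED)
print(pilot.needs_labeled_samples(), exam.needs_labeled_samples())
```

Keeping the axes as separate fields mirrors the order of the decisions: choose the environment first, then the alignment path.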


Get started

Run a pilot on your own scoring data.

Bring a rubric, a sample set, and the deployment constraints that matter. We return sample traces, alignment recommendations, and the reporting shape your scoring team can review before scale.


FERPA-aligned · data isolation · VPC / on-prem option · Audit-by-replay built in