The infrastructure behind serious AI scoring.

Operators and technical buyers need to know how submissions enter, how rubrics and anchors shape scoring, what the APIs return, and how cloud, private cloud, and on-prem deployments differ.


What the platform actually does.

Evalysis turns messy student work into reviewable scoring data, feedback, analytics, and audit records that teams can operate at classroom, district, or exam-program scale.

Submission intake

Accept pasted responses, PDFs, scans, photos, worksheets, diagrams, speech files, and exported answer data. Normalize everything into a response package before scoring.
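A minimal sketch of what normalizing intake into a response package could look like. The `ResponsePackage` fields, the `normalize_submission` helper, and the extension-to-modality mapping are all illustrative assumptions, not the Evalysis schema.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical mapping from file extension to intake modality.
MODALITY_BY_SUFFIX = {
    ".txt": "text", ".pdf": "document", ".png": "image", ".jpg": "image",
    ".jpeg": "image", ".wav": "speech", ".mp3": "speech", ".csv": "answer_export",
}


@dataclass
class ResponsePackage:
    """One submission in a uniform shape, whatever form it arrived in."""
    student_id: str
    item_id: str
    modality: str
    source_name: Optional[str] = None   # original file name, if any
    text: Optional[str] = None          # filled for pasted text, or later by extraction
    needs_extraction: bool = False      # True when OCR or transcription is still pending


def normalize_submission(student_id, item_id, *, pasted_text=None, filename=None):
    """Wrap a pasted response or an uploaded file in a ResponsePackage."""
    if pasted_text is not None:
        return ResponsePackage(student_id, item_id, "text", text=pasted_text.strip())
    if filename is None:
        raise ValueError("provide either pasted_text or filename")
    suffix = Path(filename).suffix.lower()
    modality = MODALITY_BY_SUFFIX.get(suffix)
    if modality is None:
        raise ValueError(f"unsupported file type: {suffix!r}")
    return ResponsePackage(student_id, item_id, modality,
                           source_name=filename, needs_extraction=True)
```

Downstream scoring code then only has to handle one shape, regardless of whether the work arrived as text, a scan, or an audio file.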

Scoring workspace

Brings the rubric, anchors, calibration samples, response viewer, criterion-level scores, confidence, feedback, and human review queues together in one operations surface.

Agent orchestration

Specialist raters, critics, adjudicators, calibrators, fairness reviewers, and audit loggers are assembled according to item type and risk level.
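As an illustration of assembling a panel by item type and risk level, here is a small sketch. The role names echo the list above, but the `assemble_panel` function, the three risk tiers, and the specific rules are assumptions made for the example, not the platform's actual policy.

```python
# Hypothetical set of item types that need a perception step before rating.
MULTIMODAL_ITEM_TYPES = {"scan", "photo", "diagram", "speech"}


def assemble_panel(item_type: str, risk: str) -> list:
    """Return an ordered list of agent roles for one item (illustrative rules)."""
    if risk not in {"low", "medium", "high"}:
        raise ValueError(f"unknown risk level: {risk!r}")
    panel = []
    if item_type in MULTIMODAL_ITEM_TYPES:
        panel.append("multimodal_intake")
    panel.append("rater")
    if risk in {"medium", "high"}:
        # A second rater plus a critic and adjudicator to resolve disagreement.
        panel += ["rater", "critic", "adjudicator"]
    if risk == "high":
        panel += ["calibrator", "fairness_reviewer"]
    panel.append("audit_logger")  # every panel logs, regardless of risk
    return panel
```

Under these assumed rules a low-risk essay gets a single rater and an audit logger, while a high-risk essay gets the full panel.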

Reporting layer

Produce per-student feedback, class summaries, item analytics, confidence bins, score distributions, subgroup checks, and replayable audit artifacts.
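Two of these outputs, score distributions and confidence bins, can be sketched in a few lines. The numeric 0 to 1 confidence scale and the 0.5 / 0.8 bin edges below are assumptions for illustration; the page does not specify how confidence is represented internally.

```python
from collections import Counter


def score_distribution(scores, max_score):
    """Count how many responses received each score from 0 to max_score."""
    counts = Counter(scores)
    return {s: counts.get(s, 0) for s in range(max_score + 1)}


def confidence_bins(confidences, low_edge=0.5, high_edge=0.8):
    """Bucket numeric confidences into low / medium / high (assumed edges)."""
    bins = {"low": 0, "medium": 0, "high": 0}
    for c in confidences:
        if c < low_edge:
            bins["low"] += 1
        elif c < high_edge:
            bins["medium"] += 1
        else:
            bins["high"] += 1
    return bins
```

Including zero-count scores in the distribution keeps class summaries comparable across items that share a scale.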

Built for reviewers, scoring leaders, and technical teams.

The workspace should feel closer to an assessment operations console than a chatbot. Teachers and SMEs see the response and feedback; program teams see routing, item behavior, and quality control.

Workspace model

Response, rubric, decision, report.

Every workflow is organized around four objects: the original student response, the criteria being applied, the scored decision, and the reporting package that makes the decision usable.

Response viewer with original submission beside parsed response

Rubric criteria, score scale, anchors, and scoring notes

Panel votes, disagreements, confidence, and escalation reason

Student-facing feedback and teacher-facing misconception notes

Batch dashboard for item behavior, outliers, and routing

Export package for LMS, SIS, research, and technical reporting

Use the UI, the API, or both.

Some customers want a hosted reviewer console. Others want Evalysis embedded inside an LMS, exam platform, or internal scoring pipeline. The platform supports both while preserving the scoring contract.

Try Now API

The cloud trial follows the production contract: upload student work, pass rubric context, and receive a structured SME scoring report.

Batch scoring API

Submit a class set or exam batch, track job status, and retrieve scores, feedback, routing decisions, and audit IDs.
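The submit, track, retrieve flow usually comes down to polling a job until it reaches a terminal state. The sketch below shows only that loop. The status names and the `get_status` callable are hypothetical stand-ins; no real Evalysis endpoint or client is implied.

```python
import time

# Assumed status names; the real API may use different ones.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(get_status, job_id, *, poll_seconds=2.0, max_polls=30,
                 sleep=time.sleep):
    """Poll get_status(job_id) until the job finishes or max_polls is reached.

    get_status is any callable that returns the current status string,
    for example a thin wrapper around an HTTP client.
    """
    for _ in range(max_polls):
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `get_status` and `sleep` in as parameters keeps the loop testable without network access or real delays.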

Reporting API

Pull score distributions, item stats, confidence bands, human-review queues, and trace artifacts into your own systems.

Response contract

{
  "score": "4 / 5",
  "confidence": "Medium-high",
  "criteria": [{ "name": "Reasoning", "score": "3 / 4" }],
  "feedback": "Student-facing feedback",
  "escalation": ["Review source use"],
  "audit_id": "trace_..."
}
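A consumer of this contract will likely want the "earned / possible" strings as numbers. The sketch below parses the sample payload shown above; the `ScoringReport` class and helper names are illustrative assumptions, and the field set is taken only from that sample.

```python
import json
from dataclasses import dataclass


def parse_score(text):
    """Turn a string like '4 / 5' into the tuple (4, 5)."""
    earned, possible = (int(part) for part in text.split("/"))
    if not 0 <= earned <= possible:
        raise ValueError(f"score out of range: {text!r}")
    return earned, possible


@dataclass
class ScoringReport:
    score: tuple
    confidence: str
    criteria: dict      # criterion name -> (earned, possible)
    feedback: str
    escalation: list
    audit_id: str


def parse_report(raw):
    """Parse the JSON response contract into a typed report."""
    data = json.loads(raw)
    return ScoringReport(
        score=parse_score(data["score"]),
        confidence=data["confidence"],
        criteria={c["name"]: parse_score(c["score"]) for c in data["criteria"]},
        feedback=data["feedback"],
        escalation=list(data["escalation"]),
        audit_id=data["audit_id"],
    )


SAMPLE = """{
  "score": "4 / 5",
  "confidence": "Medium-high",
  "criteria": [{ "name": "Reasoning", "score": "3 / 4" }],
  "feedback": "Student-facing feedback",
  "escalation": ["Review source use"],
  "audit_id": "trace_..."
}"""

report = parse_report(SAMPLE)
```

Failing loudly on an out-of-range score is a deliberate choice here: a malformed score should stop an import rather than reach a gradebook.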

Cloud for speed, on-prem for control.

Deployment is a product decision, not a footnote. High-stakes exams often need stronger data boundaries, local audit custody, and program-controlled operations.

Fastest pilot

Cloud

For tutoring schools, internal benchmarks, curriculum teams, and quick pilots. Upload rubrics and responses, then use the hosted workspace and APIs.

Managed isolation

Private cloud / VPC

For institutions that need SSO, role controls, private storage, network boundaries, and controlled integration with existing data systems.

High-stakes control

On-prem

A major option for sensitive exams. Keep responses, rubrics, anchor sets, logs, and scoring outputs inside the customer-controlled environment.

Zero-shot

Use the rubric directly and grade immediately. Best for formative use, pilots, and lower-stakes feedback.

Human-in-loop

Select representative samples for teachers or scoring leaders to label, align the panel, then score the rest with confidence routing.
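Confidence routing can be pictured as a simple threshold rule. In the sketch below, the ordered label scale is an assumption (only "Medium-high" appears in the response contract), and the `route` function and its two destinations are hypothetical.

```python
# Assumed ordering of confidence labels, lowest to highest.
CONFIDENCE_ORDER = ["Low", "Medium", "Medium-high", "High"]


def route(report, min_confidence="Medium-high"):
    """Send a scored response to human review or release it automatically.

    report is a dict with at least 'confidence' and 'escalation' keys.
    """
    if report.get("escalation"):
        return "human_review"  # any escalation reason forces review
    rank = CONFIDENCE_ORDER.index
    if rank(report["confidence"]) < rank(min_confidence):
        return "human_review"
    return "auto_release"
```

Raising `min_confidence` sends more responses to reviewers, which is the lever a program would tighten as stakes increase.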

Fine-tuned

Tune to approved anchors and program-specific criteria, then deliver a full alignment and technical report.

Every score should be inspectable.

The trace is the connective tissue between product and validation: it shows what source material was reviewed, which agents disagreed, why the final score landed where it did, and what should be reviewed by a human.
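To make the idea concrete, a trace can be treated as an ordered list of agent events that is summarized into votes, disagreement, and a final score. The `TraceEvent` shape, the "vote" and "resolve" action names, and `summarize_trace` are assumptions for this sketch, not the platform's trace format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TraceEvent:
    agent: str                    # e.g. "rater_a", "adjudicator"
    action: str                   # assumed actions: "vote" or "resolve"
    score: Optional[int] = None
    note: str = ""


def summarize_trace(events):
    """Derive votes, disagreement, and the final score from a list of events."""
    votes = {e.agent: e.score for e in events if e.action == "vote"}
    disagreement = len(set(votes.values())) > 1
    resolutions = [e for e in events if e.action == "resolve"]
    if resolutions:
        final_score = resolutions[-1].score       # the last resolution wins
    elif votes and not disagreement:
        final_score = next(iter(votes.values()))  # unanimous panel
    else:
        final_score = None                        # unresolved
    return {
        "votes": votes,
        "disagreement": disagreement,
        "final_score": final_score,
        "needs_human_review": final_score is None,
    }
```

Because the summary is computed from the events rather than stored alongside them, replaying the same trace always yields the same explanation.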

Sample


Response intake · essay

Essay · evidence-based response (Grade 9 RLA)

Prompt: 'Drawing on the passage, explain whether the narrator's choice was justified.'

rubric · Development 0–3 · Conventions 0–2 · Total 0–5

Input: The narrator's choice is justified because the passage shows she had limited information and the consequences for waiting were greater than for acting. First, the author writes that 'no signal had come for three days,' which suggests action was overdue. Second, her hesitation in the earlier scene cost the village a harvest, so she had learned to act. Some readers might argue she was impulsive, but the text shows she weighed the options carefully.
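The sample rubric's arithmetic (Development 0–3 plus Conventions 0–2 gives a total of 0–5) can be enforced with a small check. The `RUBRIC` dictionary and `total_score` helper are illustrative assumptions, and the criterion scores in the usage line are made up for the example, not a verdict on the essay above.

```python
# Maximum points per criterion, taken from the sample rubric line.
RUBRIC = {"Development": 3, "Conventions": 2}
TOTAL_MAX = 5

# The criterion maxima should add up to the stated total.
assert sum(RUBRIC.values()) == TOTAL_MAX


def total_score(criterion_scores):
    """Validate per-criterion scores against the rubric and return their sum."""
    if set(criterion_scores) != set(RUBRIC):
        raise ValueError("scores must cover exactly the rubric criteria")
    for name, score in criterion_scores.items():
        if not 0 <= score <= RUBRIC[name]:
            raise ValueError(f"{name} score {score} is outside 0..{RUBRIC[name]}")
    return sum(criterion_scores.values())


# Hypothetical panel output for illustration only.
example_total = total_score({"Development": 3, "Conventions": 1})
```

Checking that every criterion is present catches a panel that silently skipped one before the total is reported.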

Panel state

Scoring Director

Intake

Multimodal Intake

Perception

Rater A · strict

Rater Panel

Rater B · lenient

Rater Panel

Critic

Adjudicator

Calibrator

