Validation progress · Midyear-2026

EvoMind is moving from benchmark evidence to reusable cognitive machinery.

The newest validation work shows a governed cognitive runtime operating at the local system level: desktop perception, memory, planning, browser operation, Word execution, MiniWoB++ diligence, evidence capture, and operator promotion are now converging into reusable autonomy primitives.

What is EvoMind AGI? EvoMind is presented as a governed cognitive runtime with strong internal validation evidence of generality. It is not presented as externally certified AGI.

What changed

New validation accomplishments

Capabilities that can be explained, demonstrated, reproduced, and generalized.

Desktop autonomy moved beyond text response.

EvoMind has demonstrated controlled operation across Windows applications, including Notepad, Calculator, File Explorer, Microsoft Word, browser workflows, and readback-verified document actions.

Desktop control · UIA · Readback
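
As a rough illustration of what "readback-verified" can mean, the sketch below runs an action and only treats it as complete once the resulting state has been read back and compared with the expected value. It is not EvoMind's implementation: `ActionResult`, `act_with_readback`, and the dictionary standing in for a document are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ActionResult:
    attempted: bool  # the action was sent to the application
    verified: bool   # the readback matched the expected state
    observed: str    # what the readback actually returned


def act_with_readback(
    perform: Callable[[], None],
    read_back: Callable[[], str],
    expected: str,
) -> ActionResult:
    """Run an action, then confirm it by reading the resulting state back."""
    perform()
    observed = read_back()
    return ActionResult(attempted=True, verified=(observed == expected), observed=observed)


# Hypothetical stand-in for a document surface: a dict acting as the "app".
fake_document = {"text": ""}


def type_hello() -> None:
    fake_document["text"] = "hello"


result = act_with_readback(type_hello, lambda: fake_document["text"], expected="hello")
print(result.verified)
```

The point of the design is that an attempted action and a verified action are separate facts, so a caller can retry or escalate when the readback disagrees.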

Browser-to-Word workflow closure.

Recent validation closed a browser-first research-to-Word lane with guarded prompt execution, source handling, downstream save/readback repair, and workflow completion under strict sequence constraints.

Cross-app · Research · Word execution

Memory became operational, not decorative.

The short-horizon memory fast path now supports direct remember-and-recall behavior without routing through heavy cognition, making conversational state usable inside the runtime.

Memory · Recall · Runtime fast path
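
One way to picture a memory fast path is a small router that answers simple remember-and-recall requests directly and hands everything else to a slower planner. The sketch below assumes two hypothetical utterance patterns ("remember that X is Y" and "what is X?") and a hypothetical `slow_path` callback; none of these come from the EvoMind codebase.

```python
import re
from typing import Callable, Dict

# Hypothetical utterance patterns used only for this illustration.
REMEMBER = re.compile(r"^remember that (?P<key>.+?) is (?P<value>.+)$", re.IGNORECASE)
RECALL = re.compile(r"^what is (?P<key>.+?)\??$", re.IGNORECASE)


class ShortHorizonMemory:
    """Answers remember/recall directly; everything else goes to slow_path."""

    def __init__(self, slow_path: Callable[[str], str]) -> None:
        self._facts: Dict[str, str] = {}
        self._slow_path = slow_path

    def handle(self, utterance: str) -> str:
        text = utterance.strip()
        match = REMEMBER.match(text)
        if match:
            # Store the fact without invoking the planner.
            self._facts[match["key"].lower()] = match["value"].rstrip(".")
            return "stored"
        match = RECALL.match(text)
        if match and match["key"].lower() in self._facts:
            # Known key: answer from memory without invoking the planner.
            return self._facts[match["key"].lower()]
        # Unknown key or unrelated request: defer to heavier cognition.
        return self._slow_path(text)


memory = ShortHorizonMemory(slow_path=lambda text: "SLOW:" + text)
print(memory.handle("Remember that the project code is Falcon."))
print(memory.handle("What is the project code?"))
```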

MiniWoB++ diligence produced reusable operators.

Benchmark work has been converted into generalizable EvoMind primitives: row discovery, target binding, disclosure handling, date selection, fallback cascades, click verification, and no-progress recovery.

Operator promotion · MiniWoB++ · Generalization
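
Three of the primitives named above (fallback cascades, click verification, and no-progress recovery) can be sketched together. The example assumes a hypothetical `run_cascade` helper that tries strategies in order, reads the state before and after each attempt, and records whether the attempt was verified, made no progress, or landed in the wrong state. The `page` dictionary is a stand-in for a real browser surface.

```python
from typing import Callable, List, Optional, Tuple

Strategy = Tuple[str, Callable[[], None]]


def run_cascade(
    strategies: List[Strategy],
    read_state: Callable[[], str],
    goal_state: str,
) -> Tuple[Optional[str], List[Tuple[str, str]]]:
    """Try strategies in order; return the winning name and an outcome log."""
    log: List[Tuple[str, str]] = []
    for name, attempt in strategies:
        before = read_state()
        attempt()
        after = read_state()
        if after == goal_state:
            log.append((name, "verified"))  # click verification succeeded
            return name, log
        # Distinguish "nothing happened" from "something else happened".
        outcome = "no_progress" if after == before else "wrong_state"
        log.append((name, outcome))
    return None, log


# Hypothetical page state standing in for a browser surface.
page = {"state": "idle"}


def click_by_selector() -> None:
    pass  # the selector misses, so nothing changes


def click_by_text() -> None:
    page["state"] = "submitted"


winner, log = run_cascade(
    [("selector", click_by_selector), ("text_match", click_by_text)],
    read_state=lambda: page["state"],
    goal_state="submitted",
)
print(winner, log)
```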

Evidence fabric connects observation to learning.

The runtime gained a governed evidence layer for raw observations, extracted paths, recurrence detection, adaptive compression, and planner feedback. This is the substrate for durable learning instead of one-off traces.

Evidence · Recurrence · Learning substrate
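
A minimal sketch of path extraction and recurrence detection, assuming raw observations are dictionaries with hypothetical `action` and `target` keys. The `RecurrenceIndex` class and its threshold are illustrative only; the actual evidence layer is not described at this level of detail.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Path = Tuple[str, ...]


def extract_path(observations: Iterable[Dict[str, str]]) -> Path:
    """Reduce raw observations to an ordered, hashable action path."""
    return tuple(f"{obs['action']}:{obs['target']}" for obs in observations)


class RecurrenceIndex:
    """Counts how often each extracted path has been seen."""

    def __init__(self, threshold: int = 2) -> None:
        self._counts: Counter = Counter()
        self._threshold = threshold

    def record(self, path: Path) -> bool:
        """Record one sighting; return True once the path counts as recurring."""
        self._counts[path] += 1
        return self._counts[path] >= self._threshold

    def recurring(self) -> List[Path]:
        return [path for path, seen in self._counts.items() if seen >= self._threshold]


run = [
    {"action": "click", "target": "compose"},
    {"action": "type", "target": "subject"},
]
index = RecurrenceIndex(threshold=2)
first = index.record(extract_path(run))
second = index.record(extract_path(run))
print(first, second, index.recurring())
```

A path that crosses the threshold is a candidate for compression or for feedback to a planner, which is the role the paragraph above assigns to recurrence.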

Governance stayed part of the execution path.

Validation has continued to separate capability from authority: bounded action, operator constraints, provenance, global gating, and claim boundaries remain central to how EvoMind acts.

Governance · Provenance · Bounded action
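
Separating capability from authority can be illustrated with a gate that sits between a request and its execution. In the sketch below, `ActionRequest`, `Policy`, `gate`, and `execute` are hypothetical names, and the three checks (provenance present, operator authorized, target in bounds) are assumptions about what such a gate might test.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class ActionRequest:
    operator: str    # which operator wants to act
    target: str      # which surface it wants to act on
    provenance: str  # where the request came from; empty means unknown


@dataclass(frozen=True)
class Policy:
    allowed_operators: FrozenSet[str]
    allowed_targets: FrozenSet[str]


def gate(request: ActionRequest, policy: Policy) -> Tuple[bool, str]:
    """Decide whether a request has authority to run, with a reason."""
    if not request.provenance:
        return False, "missing provenance"
    if request.operator not in policy.allowed_operators:
        return False, "operator not authorized"
    if request.target not in policy.allowed_targets:
        return False, "target out of bounds"
    return True, "allowed"


def execute(request: ActionRequest, policy: Policy, action: Callable[[], str]) -> str:
    """Run the action only if the gate allows it."""
    allowed, reason = gate(request, policy)
    if not allowed:
        return f"blocked: {reason}"
    return action()


policy = Policy(frozenset({"type_text"}), frozenset({"notepad"}))
ok = execute(ActionRequest("type_text", "notepad", "user:session-1"), policy, lambda: "typed")
denied = execute(ActionRequest("type_text", "registry", "user:session-1"), policy, lambda: "typed")
print(ok, "|", denied)
```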

No raw percentage headline: The focus is validated capability classes and differentiators.

No AGI certification claim: Architecture and validation are presented with disciplined claim boundaries.

No partner inflation: Technical progress and milestone updates.

No internal leakage: Sensitive IP is protected.

EvoMind’s recent progress is not just a set of new capabilities, emergent or governed. It is a pattern: perception, action, memory, evidence, recovery, and governance becoming a coherent runtime.

Why it matters

Benchmarks became engineering assets.

Many systems treat benchmark runs as static proof. EvoMind’s validation loop is different: successful behavior is extracted, named, tested, governed, and promoted into reusable operators that can support broader workflows.

That shift matters because the product is not a prompt. The product is the governed cognitive runtime that can preserve state, inspect environments, choose tools, verify action, recover from failure, and produce evidence.
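
The "extracted, named, tested, governed, and promoted" loop can be sketched as a registry that only accepts a candidate behavior once it passes its validation cases. `OperatorRegistry`, `promote`, and the date-normalizing candidate are hypothetical and exist only for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

Operator = Callable[[str], str]
Case = Tuple[str, str]  # (input, expected output)


class OperatorRegistry:
    """Holds named operators that passed their validation cases."""

    def __init__(self) -> None:
        self._operators: Dict[str, Operator] = {}

    def promote(self, name: str, candidate: Operator, cases: List[Case]) -> bool:
        """Register the candidate only if every validation case passes."""
        passed = bool(cases) and all(candidate(given) == expected for given, expected in cases)
        if passed:
            self._operators[name] = candidate
        return passed

    def get(self, name: str) -> Optional[Operator]:
        return self._operators.get(name)


def normalize_date(text: str) -> str:
    """Hypothetical extracted behavior: turn M/D/YYYY into YYYY-MM-DD."""
    month, day, year = text.split("/")
    return f"{int(year):04d}-{int(month):02d}-{int(day):02d}"


registry = OperatorRegistry()
promoted = registry.promote(
    "normalize_date",
    normalize_date,
    cases=[("3/7/2026", "2026-03-07"), ("12/31/2025", "2025-12-31")],
)
rejected = registry.promote("broken_echo", lambda text: text, cases=[("3/7/2026", "2026-03-07")])
print(promoted, rejected)
```

Requiring at least one case is a deliberate choice in this sketch: a candidate with no validation evidence is never promoted.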

Capability ledger

What the latest validation work supports.

Each row states a capability, the validation signal behind it, and the practical differentiator for SALT19/EvoMind.

Desktop perception and control · Validated interaction with live desktop windows, application state, UI controls, browser surfaces, and document workflows. · Operational

Readback verification · Actions are not treated as complete merely because they were attempted; output can be checked through readback and state confirmation. · Validated

Cross-application workflows · Browser-first research, document creation, save behavior, and downstream verification have been exercised as connected workflows. · Demonstrated

Short-horizon memory · Direct remember-and-recall behavior has been separated from heavy planning so memory can act as a practical runtime primitive. · Operational

Benchmark-to-operator promotion · MiniWoB++ discoveries are being generalized into reusable N1 operators instead of remaining trapped inside benchmark-specific scripts. · Differentiator

Failure recovery · Recent work added no-progress detection, browser session reuse, episode persistence, fallback cascades, and commitment verification. · Hardened

Evidence fabric · Observation packets, raw evidence storage, path extraction, recurrence indexing, adaptive compression, and planner feedback create a learning-grade evidence layer. · Built

Governed execution · Capability is bounded by safety gates, execution authority, provenance requirements, verification expectations, and rollback policy. · Governed

Embodied control lane · Arcade and visual-control validation helped exercise perception-action loops beyond static text generation. · Explored

Public research artifact · EvoMind v0.1.0 is anchored by a Zenodo DOI and architecture white paper for public reference and prior-art visibility. · Published

Architecture after validation

The runtime is becoming layered, auditable, and reusable.

The newest work strengthens EvoMind as a cognitive operating layer, not a thin wrapper around a model.

Perception cortex

Scene graphs, screen interpretation, screenshot grounding, UIA/DOM priority, and controlled OCR fallback.

Memory systems

Short-horizon recall, episodic state, persistent context, and workflow continuity.

Evidence fabric

Observation packets, raw evidence, recurrence indexing, path extraction, and planner feedback.

EvoMind Runtime

Governed cognition: observe, reason, plan, act, verify, learn, and preserve evidence.

N1 operators

Promoted cognitive operators from desktop, browser, document, and MiniWoB++ validation lanes.

Governance layer

Omega invariants, global gates, claim boundaries, execution authority, rollback, and audit trails.

Action substrates

Desktop apps, browser sessions, documents, controlled tools, and sandboxed execution pathways.
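
The perception cortex entry above mentions UIA/DOM priority with a controlled OCR fallback. A minimal sketch of that ordering follows, assuming hypothetical locator functions that return screen coordinates or `None`. The `allow_ocr` flag is an assumption about what "controlled" might mean in practice.

```python
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[int, int]
Locator = Callable[[str], Optional[Point]]


def locate(
    label: str,
    structured: List[Tuple[str, Locator]],
    ocr: Locator,
    allow_ocr: bool,
) -> Optional[Tuple[str, Point]]:
    """Prefer structured sources in order; use OCR only when permitted."""
    for source, locator in structured:
        hit = locator(label)
        if hit is not None:
            return source, hit
    if allow_ocr:
        hit = ocr(label)
        if hit is not None:
            return "ocr", hit
    return None


# Hypothetical lookup tables standing in for real UIA, DOM, and OCR queries.
uia_tree: Dict[str, Point] = {"Save": (40, 12)}
dom_tree: Dict[str, Point] = {"Search": (300, 80)}
ocr_text: Dict[str, Point] = {"Legacy button": (512, 400)}

sources = [("uia", uia_tree.get), ("dom", dom_tree.get)]
print(locate("Save", sources, ocr_text.get, allow_ocr=True))
print(locate("Legacy button", sources, ocr_text.get, allow_ocr=False))
```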

2026 - Midyear update · engineering narrative

From working behaviors to cognitive primitives.

This timeline frames the recent accomplishment stack without exposing internal IP.

Public reference.

EvoMind v0.1.0 and the architecture white paper established a public research artifact and DOI-backed reference point for the governed cognitive runtime.

Desktop proof lanes matured.

Core app control, readback, browser navigation, Word execution, and live conversation repairs moved the system toward visible, operator-trustworthy desktop autonomy.

MiniWoB++ became a private diligence engine.

Benchmark runs exposed brittle areas, validated targeted repairs, and produced reusable primitives for email rows, forms, date picking, target matching, fallback behavior, and click verification.

N1 generalization tightened.

Validated families are being promoted beyond the benchmark harness into a broader operator registry, supporting the doctrine that MiniWoB++ closure is proof evidence, not the final product boundary.

Evidence and learning infrastructure expanded.

The evidence fabric, observer swarms, world model hooks, and planner feedback pathways strengthened the loop from observation to adaptation.

Governance remained non-optional.

Recent work preserved the core thesis: an autonomous runtime must be capable, bounded, inspectable, and reversible.

Claim boundary

What this page claims.

EvoMind is an R&D-stage governed cognitive runtime with recent validation progress across desktop perception, browser workflow execution, short-horizon memory, governed tool use, trace capture, reusable operator promotion, and benchmark-driven workflow repair.

The recent work shows EvoMind moving from isolated demonstrations toward a reusable autonomy substrate: tasks are observed, decomposed, executed, checked, traced, and converted into governed operators that can be reused outside the original validation harness.

The strongest defensible public claim is not that EvoMind has solved autonomy. It is that SALT19 is building a disciplined autonomy stack where every capability is expected to produce evidence, respect execution boundaries, survive workflow variation, and improve the system’s future operating surface.

Validation posture

From task success to governed cognitive infrastructure.

EvoMind’s recent validation work shows a shift from isolated capability demos into an integrated autonomy stack. The system has demonstrated desktop perception, bounded interface control, browser workflow execution, short-horizon memory, tool-mediated reasoning, trace capture, workflow recovery, and operator promotion across increasingly complex task families.

The breakthrough is not a single benchmark number. It is the operating pattern now emerging inside the system: observe a task, decompose it, act through a governed interface, verify the result, preserve evidence, repair the failure mode, and promote successful behavior into a reusable cognitive primitive.

That turns validation into compounding infrastructure. Every closed workflow can become more than a test result; it can become a reusable operator that expands EvoMind’s future execution surface across desktop, browser, document, memory, and planning workflows.
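
The operating pattern described above (act, verify, preserve evidence, repair) can be sketched as a small episode loop. This is an illustration under stated assumptions rather than the runtime itself: `Episode`, `run_episode`, and the `act` callback (which performs a step and returns whether verification passed) are hypothetical, and "repair" is reduced to a bounded retry.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Episode:
    task: str
    trace: List[Dict[str, object]] = field(default_factory=list)  # preserved evidence
    succeeded: bool = False


def run_episode(
    task: str,
    steps: List[str],
    act: Callable[[str, int], bool],
    max_attempts: int = 2,
) -> Episode:
    """Execute decomposed steps, recording every attempt and its verification."""
    episode = Episode(task)
    for step in steps:
        verified = False
        for attempt in range(1, max_attempts + 1):
            verified = act(step, attempt)
            episode.trace.append({"step": step, "attempt": attempt, "verified": verified})
            if verified:
                break  # step closed; move on
        if not verified:
            return episode  # repair budget exhausted; stop with evidence intact
    episode.succeeded = True
    return episode


def flaky_act(step: str, attempt: int) -> bool:
    """Hypothetical executor: 'save' fails once, then succeeds on retry."""
    return not (step == "save" and attempt == 1)


episode = run_episode("write note", ["open", "type", "save"], flaky_act)
print(episode.succeeded, len(episode.trace))
```

Because every attempt lands in the trace, a failed episode still yields evidence that can inform later repair, and a successful one yields a record of what worked.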

SALT19 is building EvoMind for autonomy that can be inspected, constrained, recovered, audited, and improved. The public claim is therefore evidence-led: shipped architecture, DOI-backed publication, desktop execution proofs, memory validation, benchmark diligence, operator registry expansion, and a growing path from benchmark closure to real-world workflow generalization.

SALT19 EvoMind

From benchmark evidence to reusable governed autonomy.

EvoMind’s validation work is focused on a practical question: can autonomous software perceive a real interface, plan a bounded action sequence, execute through tools, preserve evidence, recover from variation, and turn successful workflows into reusable cognitive operators?

SALT19 can provide technical reviewers with a live demo, architecture walkthrough, validation summary, operator-promotion overview, and DOI-backed EvoMind v0.1.0 reference release.
