
FROM SIMULATED USERS TO HYBRID EVALUATION
The growing use of LLM‑based avatars as simulated users offers cost‑effective speed and scale compared to conventional methods. But these approaches have important limitations when used in isolation.
Purely synthetic users can smooth over variance, under‑representing human and contextual variability, adaptation over time, and the edge‑case behaviors that are critical in real‑world deployments. Today’s AI agents alone are not yet robust enough to serve as proxies or digital twins; building those still requires real interaction telemetry.
Hybrid approaches combining AI agents with human‑in‑the‑loop (HITL) participation and telemetry from real-world living labs can provide more representative foundations for quality audits, design studies, testing, and evaluation by better enabling:
Outcome Based Experience Design
- Universal design enabled through adaptive interactions and personalization for all users
- Dynamic user guidance and assistive design in context, including management of germane cognitive load
Evaluation and Validation
- Multivariate A/B stress‑testing of assumptions under realistic cognitive‑load conditions, supplemented by sentiment analysis
- Early detection of emergent behavior, hidden patterns and anomalies over time
Observability, Governability and Lifecycle Assurance
- Identification of failure modes that surface only during prolonged, real‑world use
- Traceability, diagnosis, predictive and preventive product and service maintenance
- Live front‑ and back‑stage service blueprints instrumented with analytics and coupled with value stream mapping (VSM) to explicitly correlate underlying operational processes with business logic and decision drivers
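As a concrete illustration of the A/B stress‑testing point above, here is a minimal sketch of a two‑proportion comparison between two interface variants under a high‑cognitive‑load condition. The variant names and success counts are hypothetical, chosen only to show the shape of the analysis; a real evaluation would draw these from instrumented telemetry.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic for comparing success rates in an A/B test."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled success rate under the null hypothesis of no difference
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: variant A, 180/240 tasks completed under load;
# variant B, 150/250 tasks completed under the same load condition.
z = two_proportion_z(180, 240, 150, 250)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a difference at the 5% level
```

In a multivariate setting the same logic extends to several factors at once, but even this simple test makes the "stress‑testing assumptions" idea operational: the claim that a design works under load becomes a measurable comparison rather than an opinion.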
Knowledge graphs are needed for dynamic system modeling (DSM) in context (entities, relationships, constraints, policies, dependencies, and provenance) to support testability, observability, and impact analysis, as well as predicting and understanding network effects.
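A minimal sketch of what such a graph looks like in practice, using only illustrative entity names (none are from the post): typed entities, labeled relationships carrying provenance, and a simple transitive impact query over dependency edges.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    relation: str
    target: str
    provenance: str  # where this fact came from (spec, audit, telemetry)

@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)  # name -> attributes
    edges: dict = field(default_factory=dict)     # name -> list[Edge]

    def add_entity(self, name, **attrs):
        self.entities[name] = attrs

    def relate(self, source, relation, target, provenance):
        self.edges.setdefault(source, []).append(Edge(relation, target, provenance))

    def impacted_by(self, name):
        """Everything that transitively depends on `name`, i.e. what an
        impact analysis should review if `name` changes."""
        reverse = {}
        for src, es in self.edges.items():
            for e in es:
                if e.relation == "depends_on":
                    reverse.setdefault(e.target, set()).add(src)
        seen, stack = set(), [name]
        while stack:
            node = stack.pop()
            for dependent in reverse.get(node, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    stack.append(dependent)
        return seen

# Hypothetical system model: two components depend on one policy.
kg = KnowledgeGraph()
kg.add_entity("checkout_agent", kind="service")
kg.add_entity("recommender", kind="model")
kg.add_entity("pricing_policy", kind="policy")
kg.relate("checkout_agent", "depends_on", "pricing_policy", provenance="arch doc")
kg.relate("recommender", "depends_on", "pricing_policy", provenance="telemetry")
print(kg.impacted_by("pricing_policy"))  # both dependents surface for review
```

The point of the sketch is the provenance field and the dependency query: once relationships are explicit and traceable, testability, observability, and network‑effect questions become graph queries rather than tribal knowledge.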

BEYOND JUST LANGUAGE: MORE HUMAN-LIKE MULTIMODAL
Conversational AI is no longer limited to language‑based models, because words alone do not fully capture human sensing and sense‑making. Intelligence increasingly encompasses multimodal, embodied, and spatial interaction, combining speech, visual representations, gesture, motion awareness, haptics, biosignals, and environmental context across both digital and physical environments.
Equipping AI with more human‑like capabilities is not about imitation for its own sake. It is about enabling systems to:
- Interpret more effectively
- Coordinate action across modalities
- Operate at human tempo rather than machine tempo
Human Scale AI, in this sense, is about humanizing technology: not by imitating humans to the point of feeling uncanny, but by making systems beneficial to the humans they are designed to respectfully serve.

CLOSING THOUGHTS
AI decisions now carry both business and human consequences. The critical leadership move is to define “ready for the real world” in measurable, operational terms, so that AI systems scale in service to people rather than scaling away from them and eroding trust and customer relationships.
This set of three posts builds on my earlier work on AI failure modes and singularity levels, extending the series from testing to evaluation and real‑world readiness, including the shift beyond language‑only systems toward multimodal interaction.
The core message is practical: many of the risks observed in AI‑first deployments are preventable when addressed as systems‑engineering challenges with explicit human‑factors requirements, rather than misattributed downstream to human error, employee performance, or training gaps, often once effective resolution is already costly or too late to make a difference.
Human‑Centered AI is therefore not only essential for risk reduction, but also a driver of product and service quality and a sustainable source of competitive advantage. Offerings that users value, and that work well for them, are easier to accept, adopt, and successfully roll out and evolve over time.
Achieving this does not depend on addressing only the most critical scenarios, but on treating Human‑Centered AI as a mainstream practice for delivering genuinely beneficial AI and shaping shared standards of quality and design culture in the process.

SERIES
- Barcelona’s CHI 2026 | AI Singularity Levels
- Barcelona’s CHI 2026 (2) | Testable AI: Use Case
- Barcelona’s CHI 2026 (3) | Hybrid Evaluation, Multimodal