
FROM SIMULATED USERS TO HYBRID EVALUATION
The growing use of LLM‑based avatars as simulated users offers cost‑effective speed and scale compared to conventional methods. But these approaches have important limitations when used in isolation.
Purely synthetic users can smooth over variance, under‑representing human and contextual variability, adaptation over time, and the edge‑case behaviors that are critical in real‑world deployments. Today’s AI agents alone are not yet robust enough to serve as proxies or digital twins; building those still requires real interaction telemetry.
Hybrid approaches combining AI agents with human‑in‑the‑loop (HITL) participation and telemetry from real-world living labs can provide more representative foundations for quality audits, design studies, testing, and evaluation by better enabling:
Outcome Based Experience Design
- Universal design enabled through adaptive interactions and personalization for all users
- Dynamic user guidance and assistive design in context, including management of germane cognitive load
Evaluation and Validation
- Multivariate A/B stress‑testing of assumptions under realistic cognitive‑load conditions, supplemented by sentiment analysis
- Early detection of emergent behavior, hidden patterns and anomalies over time
Observability, Governability and Lifecycle Assurance
- Identification of failure modes that surface only during prolonged, real‑world use
- Traceability, diagnosis, predictive and preventive product and service maintenance
- Live front‑ and back‑stage service blueprints instrumented with analytics and coupled with value stream mapping (VSM) to explicitly correlate underlying operational processes with business logic and decision drivers
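As a concrete illustration of the A/B stress‑testing point above, here is a minimal sketch of a two‑proportion comparison between two interface variants under a high‑cognitive‑load condition. The variant names and success counts are hypothetical, chosen only to show the shape of the analysis; a real evaluation would draw these from instrumented telemetry.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-statistic for comparing success rates in an A/B test."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled success rate under the null hypothesis of no difference
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Hypothetical counts: variant A, 180/240 tasks completed under load;
# variant B, 150/250 tasks completed under the same load condition.
z = two_proportion_z(180, 240, 150, 250)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a difference at the 5% level
```

In a multivariate setting the same logic extends to several factors at once, but even this simple test makes the "stress‑testing assumptions" idea operational: the claim that a design works under load becomes a measurable comparison rather than an opinion.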
Knowledge graphs are needed for dynamic system modeling (DSM) in context (entities, relationships, constraints, policies, dependencies, and provenance) to support testability, observability, and impact analysis, as well as predicting and understanding network effects.
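A minimal sketch of what such a graph looks like in practice, using only illustrative entity names (none are from the post): typed entities, labeled relationships carrying provenance, and a simple transitive impact query over dependency edges.

```python
from dataclasses import dataclass, field

@dataclass
class Edge:
    relation: str
    target: str
    provenance: str  # where this fact came from (spec, audit, telemetry)

@dataclass
class KnowledgeGraph:
    entities: dict = field(default_factory=dict)  # name -> attributes
    edges: dict = field(default_factory=dict)     # name -> list[Edge]

    def add_entity(self, name, **attrs):
        self.entities[name] = attrs

    def relate(self, source, relation, target, provenance):
        self.edges.setdefault(source, []).append(Edge(relation, target, provenance))

    def impacted_by(self, name):
        """Everything that transitively depends on `name`, i.e. what an
        impact analysis should review if `name` changes."""
        reverse = {}
        for src, es in self.edges.items():
            for e in es:
                if e.relation == "depends_on":
                    reverse.setdefault(e.target, set()).add(src)
        seen, stack = set(), [name]
        while stack:
            node = stack.pop()
            for dependent in reverse.get(node, ()):
                if dependent not in seen:
                    seen.add(dependent)
                    stack.append(dependent)
        return seen

# Hypothetical system model: two components depend on one policy.
kg = KnowledgeGraph()
kg.add_entity("checkout_agent", kind="service")
kg.add_entity("recommender", kind="model")
kg.add_entity("pricing_policy", kind="policy")
kg.relate("checkout_agent", "depends_on", "pricing_policy", provenance="arch doc")
kg.relate("recommender", "depends_on", "pricing_policy", provenance="telemetry")
print(kg.impacted_by("pricing_policy"))  # both dependents surface for review
```

The point of the sketch is the provenance field and the dependency query: once relationships are explicit and traceable, testability, observability, and network‑effect questions become graph queries rather than tribal knowledge.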

BEYOND JUST LANGUAGE: MORE HUMAN-LIKE MULTIMODAL
Conversational AI is no longer limited to language‑based models, because words alone do not fully capture human sensing and sense‑making. Intelligence increasingly encompasses multimodal, embodied, and spatial interaction, combining speech, visual representations, gesture, motion awareness, haptics, biosignals, and environmental context across both digital and physical environments.
Equipping AI with more human‑like capabilities is not about imitation for its own sake. It is about enabling systems to:
- Interpret more effectively
- Coordinate action across modalities
- Operate at human tempo rather than machine tempo
Human Scale AI, in this sense, is about humanizing technology: not by imitating humans to the point of feeling uncanny, but by making systems beneficial to the humans they are designed to respectfully serve.

CLOSING THOUGHTS
AI decisions now carry both business and human consequences. The critical leadership move is to define “ready for the real world” in measurable, operational terms, so that AI systems scale in service to people rather than scaling away from them and eroding trust and customer relationships.
This set of three posts builds on my earlier work on AI failure modes and singularity levels, extending the series from testing to evaluation and real‑world readiness, including the shift beyond language‑only systems toward multimodal interaction.
The core message is practical: many of the risks observed in AI‑first deployments are preventable when addressed as systems‑engineering challenges with explicit human‑factors requirements, rather than misattributed downstream to human error, employee performance, or training gaps, often once effective resolution is already costly or too late to make a difference.
Human‑Centered AI is therefore not only essential for risk reduction, but also a driver of product and service quality and a sustainable source of competitive advantage. Offerings that users value, and that work well for them, are easier to accept, adopt, and successfully roll out and evolve over time.
Achieving this does not depend on addressing only the most critical scenarios, but on treating Human‑Centered AI as a mainstream practice for delivering genuinely beneficial AI and shaping shared standards of quality and design culture in the process.

SERIES
- Barcelona’s CHI 2026 | AI Singularity Levels
- Barcelona’s CHI 2026 (2) | Testable AI: Use Case
- Barcelona’s CHI 2026 (3) | Hybrid Evaluation, Multimodal