Physicist · AI Developer · Data Scientist/ML

Eduardo Rdgz-Á

Models Compete On Benchmarks. Products Compete In Production.

I do harness engineering — the infrastructure that operates the model. Every output traceable. Every cost audited. Accountability stays human.

// Mexico City

Portrait of Eduardo Rodríguez Ávila

open to work

soft skills

  • Persuasion & Influence: I communicate in a way that connects technical ideas with human decisions.
  • Relationship-Building: I listen first to build the trust that projects need.
  • Sensemaking: I spot patterns in complexity to guide team decisions.
  • Abductive Reasoning: I generate plausible hypotheses where the data isn't yet enough.

01 / About

What's lived beats what's designed.

The dominant AI discourse in 2026 is still centered on the model: how big, how well benchmarked, how fast. But the AI products that deliver real value share something that never shows up in benchmarks: behind them is someone who knows the problem because they lived it, not because they researched it.

That conviction organizes my work. I come from three places: physics, eight years in the classroom, and building and operating an AI-native platform in production. All three tell me the same thing: the sustainable advantage comes from closing the distance between whoever designs the solution and whoever suffered the problem. The shorter that distance, the better the product.

What this means in practice:

  • An unoperated model is not a product. Governance doesn't live in a document: it lives at runtime — in the log, in the queue, in the versioned prompt, in who signs off on every output.
  • Most of what we call “educational problems” is bureaucracy stealing time from people trying to do their jobs. I learned it in the classroom; I confirmed it building ILC, where a user told me: “We have global coherence across all 250 pages of lesson planning.” The system works because it solves a problem I lived, not one I imagined.
  • The harness is human. Sustaining long chains of thought for hours — where the real breakthroughs appear — is human work. Responsibility for what the system generates is not delegated to the algorithm.

I'm drawn to teams building real DS/ML products with AI in production — with accountability anchored in a person, not in the algorithm.

Full resume →

02 / How I work

Four layers of practice. Four categories of responsibility.

// instances:

Four layers of responsibility when building with AI: 1. Experience: Immersion, Users, Constraints, Real metrics. 2. Harness: Context, Control, Orchestration, Validation. 3. Runtime: Observability, Costs, Traceability, Resilience. 4. Stewardship: Locus, Trade-offs, Audit, Incidents.

Experience: the context of the problem. Harness: the system's infrastructure. Runtime: the system's life in production. Stewardship: human responsibility over the output.

// where the AI Index 2026 classifies by principle, this network classifies by place.

Experience — Immersion

Being inside the problem before modeling it. The difference between who lived it and who researched it.

Experience — Users

Concrete people with concrete tasks. The metric that counts is what changes in their day.

Experience — Constraints

The real ones, not the comfortable ones. Time, regulation, cost, institutional friction.

Experience — Real metrics

Outcome over accuracy. What counts as success in the world, not in the paper.

Harness — Context

What the model needs to know to answer well. Retrieval, memory, embeddings.

Harness — Control

The model's invocation surface. Versioned prompts, inheritance, snapshots.

Harness — Orchestration

How calls are coordinated. Queues, priorities, dependencies.
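
As an illustration of queues, priorities, and dependencies working together, here is a minimal sketch using only the Python standard library. It is not ILC-HUB code: the function name `run_order`, the task names, and the priority numbers are hypothetical, and it assumes every dependency also appears in `tasks`.

```python
import heapq
from graphlib import TopologicalSorter


def run_order(tasks: dict[str, int], depends_on: dict[str, set[str]]) -> list[str]:
    """Return an execution order: dependencies first, then lowest priority number first."""
    sorter = TopologicalSorter({name: depends_on.get(name, set()) for name in tasks})
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    ready: list[tuple[int, str]] = []  # heap of (priority, task name)
    order: list[str] = []
    while sorter.is_active():
        # Move every newly unblocked task into the priority queue.
        for name in sorter.get_ready():
            heapq.heappush(ready, (tasks[name], name))
        # Run the most urgent unblocked task and mark it finished.
        _, name = heapq.heappop(ready)
        order.append(name)
        sorter.done(name)
    return order


# Hypothetical tasks: a lower number means more urgent.
tasks = {"outline": 1, "lesson_a": 2, "lesson_b": 1, "review": 0}
depends_on = {
    "lesson_a": {"outline"},
    "lesson_b": {"outline"},
    "review": {"lesson_a", "lesson_b"},
}
order = run_order(tasks, depends_on)
```

Note that `review` has the most urgent priority but still runs last, because a priority only decides between tasks whose dependencies are already satisfied.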

Harness — Validation

The human in the loop. What is kept, what is discarded, what is iterated.

Runtime — Observability

What the operator sees. Logs, traces, dashboards.

Runtime — Costs

What each output costs. Audited per API, optimized at runtime.
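
One way to audit cost per API is a small ledger that prices every call from its token counts. The sketch below is illustrative only: the model name, the prices, and the endpoint label are placeholders, not real provider rates or ILC-HUB code.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical prices in USD per 1,000 tokens; replace with the provider's real rates.
PRICES = {
    "example-model": {"input": Decimal("0.10"), "output": Decimal("0.40")},
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price a single call from its token counts."""
    price = PRICES[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


class CostLedger:
    """Accumulates cost per API endpoint so each one can be audited separately."""

    def __init__(self) -> None:
        self.by_api: dict[str, Decimal] = defaultdict(Decimal)

    def record(self, api: str, model: str, input_tokens: int, output_tokens: int) -> Decimal:
        cost = call_cost(model, input_tokens, output_tokens)
        self.by_api[api] += cost
        return cost


ledger = CostLedger()
first = ledger.record("/lessons", "example-model", input_tokens=2000, output_tokens=500)
second = ledger.record("/lessons", "example-model", input_tokens=1000, output_tokens=0)
```

`Decimal` is used instead of `float` so that small per-call amounts add up without rounding drift.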

Runtime — Traceability

From every output to its origin. Which prompt, which context, which version.
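
A trace record can make "which prompt, which context, which version" concrete. This is a minimal sketch under assumptions: the `TraceRecord` schema, the field names, and the example values are hypothetical, not the schema used in ILC-HUB.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TraceRecord:
    """Links one output back to what produced it (hypothetical schema)."""
    output_sha256: str
    prompt_name: str
    prompt_version: str
    context_sha256: str
    model: str


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def trace(output: str, prompt_name: str, prompt_version: str,
          context: str, model: str) -> TraceRecord:
    # Hash the output and context so the log identifies them without storing raw text.
    return TraceRecord(sha256_hex(output), prompt_name, prompt_version,
                       sha256_hex(context), model)


def to_log_line(record: TraceRecord) -> str:
    # One JSON object per line, with sorted keys so lines are easy to diff and search.
    return json.dumps(asdict(record), sort_keys=True)


record = trace(
    output="Lesson 3: conservation of energy",
    prompt_name="lesson-plan",
    prompt_version="v3",
    context="Syllabus unit 2",
    model="example-model",
)
log_line = to_log_line(record)
```

Given any output, hashing it and searching the log for that hash leads back to the prompt version, context, and model that produced it.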

Runtime — Resilience

What happens when something fails. Fallback, safe degradation, recovery.
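
Fallback and safe degradation can be sketched as a wrapper that tries providers in order and returns a fixed, harmless answer if all of them fail. The function `call_with_fallback` and the two stand-in providers are hypothetical; a real system would call actual model APIs and likely add retries and timeouts.

```python
from typing import Callable, Sequence


def call_with_fallback(
    providers: Sequence[Callable[[str], str]],
    prompt: str,
    degraded_answer: str,
) -> tuple[str, list[str]]:
    """Try each provider in order; return the first answer plus the errors seen."""
    errors: list[str] = []
    for provider in providers:
        try:
            return provider(prompt), errors
        except Exception as exc:  # broad on purpose: any failure triggers the fallback
            name = getattr(provider, "__name__", repr(provider))
            errors.append(f"{name}: {exc}")
    # Safe degradation: every provider failed, so return the fixed fallback answer.
    return degraded_answer, errors


def primary(prompt: str) -> str:
    # Stand-in for a provider that is currently failing.
    raise TimeoutError("primary provider timed out")


def backup(prompt: str) -> str:
    # Stand-in for a secondary provider that works.
    return f"[backup] {prompt}"


answer, errors = call_with_fallback(
    [primary, backup], "Summarize unit 2", "Service unavailable, try again later."
)
```

Returning the error list alongside the answer keeps the failure visible to the operator even when the user gets a valid response.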

Stewardship — Locus

The human who owns the output. Not the algorithm, not the provider.

Stewardship — Trade-offs

What is sacrificed and in exchange for what. Cost, latency, risk: decisions the human owns, not the model.

Stewardship — Audit

What gets recorded to be reviewed. An accessible audit trail.

Stewardship — Incidents

What happens when the output causes harm. Who answers, how it's fixed, what is learned.

03 / Projects

// Project write-ups are in Spanish.

Harness Engineering · Agentic-Native · EdTech

private repo · IP protected

ILC-HUB

Integrated Learning Core

A wrapper solves the API. A harness solves the problem. ILC-HUB is an AI-native EdTech platform built from a problem lived through eight years in the classroom: lesson planning that takes two to three weeks per subject and ends with four teachers teaching the same concept four different ways, with no shared system. The harness cuts that work to thirty or forty minutes and restores global coherence to two hundred fifty pages of planning. In operation with real users since early 2026, with responsibility for every output anchored in a person, not the algorithm.

Versions

  1. 09/2024

    v1

    The seed: first assisted-generation tests, still a prototype

  2. 01/2026

    v2

    First harness core: from prototype to a system with its own interface

  3. 04/2026

    v3

    In production and at scale: multi-subject, every output traceable, −30% costs

  4. in progress

    v4

    Next leap: evolving toward an agentic architecture

30–40 min

per subject (was 2–3 weeks)

−30%

in AI costs

250 pp.

of coherent planning

2+ yrs

in real production

  • Python
  • FastAPI
  • OpenAI API
  • MongoDB
  • React
  • TypeScript
  • Docker
  • Railway

Machine Learning · MLOps · Fintech

public repo

Fraud Detection

Real-time fraud scoring

Saying you know it is a résumé. Showing the code is evidence. Fraud Detection is a real-time bank-fraud scoring system, built as a public project to demonstrate the end-to-end cycle outside EdTech. Trained on a benchmark academic dataset: one million real transactions with the difficulty the literature recognizes as most demanding — a ninety-to-one imbalance between legitimate and fraudulent transactions. Every prediction comes with its explanation; served in production, with an open repository.
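
To see why a ninety-to-one imbalance is demanding, the short calculation below works through the arithmetic. It is a sketch using only the ratio stated above, not the project's code or its real counts; the function name is hypothetical. XGBoost exposes a `scale_pos_weight` parameter for this kind of reweighting, though whether this project uses it is not stated here.

```python
def imbalance_summary(n_legit: int, n_fraud: int) -> dict[str, float]:
    """Summarize what a class imbalance implies for evaluation and reweighting."""
    total = n_legit + n_fraud
    return {
        # Share of transactions that are fraudulent.
        "fraud_rate": n_fraud / total,
        # Accuracy of a useless model that labels everything as legitimate.
        "always_legit_accuracy": n_legit / total,
        # Negative-to-positive ratio, a common starting weight for the rare class.
        "pos_weight": n_legit / n_fraud,
    }


# Only the 90:1 ratio comes from the text; these are not real counts.
summary = imbalance_summary(n_legit=90, n_fraud=1)
```

A model that never flags fraud would score close to 99% accuracy here, which is why accuracy alone says little on a dataset like this.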

  • Python
  • FastAPI
  • Docker
  • Railway
  • XGBoost
  • Astro

Data Science · EDA · Education

repo in progress

MIT GTL ChiMIT · Sleep Clinic

Collaboration with MIT Global Teaching Labs: co-design and leadership of the Code Development Work Cell. EDA on ~180k sleep records with 25 students, guided by parameterized notebooks that explain the code line by line.

  • Python
  • pandas
  • Jupyter
  • matplotlib

04 / Contact

Looking for someone who builds AI products end to end? Get in touch.

Mexico City
