How AI Agents Work

The Foundation

What is an agent, exactly?

An agent is not a chatbot. A chatbot completes one turn and waits. An agent runs a loop — continuously — until a task is done or a human steps in.

Observe

Reads the current state — user input, tool outputs, memory, context

Decide

Chooses the next action — call a tool, ask a question, or finish

Act

Executes — calls an API, reads a database, drafts an email, writes a record

Repeat

Feeds the result back in and decides the next step. Goes again.
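The four steps above can be sketched as a plain loop. This is a minimal illustration rather than a production design: `decide` stands in for the model call, and `AgentState`, `run_agent`, the action names and the `max_steps` cap are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class AgentState:
    """Everything the agent can observe: the task plus results so far."""
    task: str
    history: list = field(default_factory=list)

def run_agent(state: AgentState,
              decide: Callable[[AgentState], tuple],
              tools: dict,
              max_steps: int = 10) -> str:
    """Observe, decide, act, repeat, until 'finish' or the step cap."""
    for _ in range(max_steps):
        action, arg = decide(state)                   # Decide (a model call in practice)
        if action == "finish":
            return arg
        result = tools[action](arg)                   # Act
        state.history.append((action, arg, result))   # Feed the result back in
    return "stopped: step limit reached, hand over to a human"

# Stand-in policy: look something up once, then finish with what was found.
def toy_decide(state):
    if not state.history:
        return ("lookup", state.task)
    return ("finish", f"answer: {state.history[-1][2]}")

tools = {"lookup": lambda q: q.upper()}
print(run_agent(AgentState("find lead"), toy_decide, tools))  # answer: FIND LEAD
```

The step cap is one simple way to make sure the loop ends even when the model never chooses to finish.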

Before building anything, three questions must be answered: What does the agent need to know? What tools does it have access to? And where does the human stay in the loop? Get those three right and you can automate almost anything responsibly.

Complexity Ladder

Four levels of agent capability

Most businesses are at Level 1 without realising it. The leverage — and the competitive moat — is at Levels 3 and 4.

Level 1

Structured Prompt

A single call to an AI model with a carefully engineered system prompt. One input, one output. No loop, no tools. This is what most "AI tools" actually are.

e.g. Generate a meeting summary. Draft a proposal. Classify a support ticket.
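As a rough sketch of Level 1, the ticket-classification example could look like the following. The `call_model` parameter is a hypothetical stand-in for whichever model client is in use, and the prompt wording, categories and priorities are assumptions made for illustration.

```python
import json

SYSTEM_PROMPT = (
    "You classify support tickets. Reply with JSON only: "
    '{"category": "billing|technical|other", "priority": "low|high"}'
)
CATEGORIES = {"billing", "technical", "other"}
PRIORITIES = {"low", "high"}

def classify_ticket(ticket_text, call_model):
    """One input, one output: no loop, no tools."""
    raw = call_model(system=SYSTEM_PROMPT, user=ticket_text)
    data = json.loads(raw)
    # Validate in code rather than trusting the model's formatting.
    if data.get("category") not in CATEGORIES:
        raise ValueError(f"unexpected category: {data.get('category')!r}")
    if data.get("priority") not in PRIORITIES:
        raise ValueError(f"unexpected priority: {data.get('priority')!r}")
    return data

# Stand-in for a real model call, so the sketch runs offline.
def fake_model(system, user):
    return '{"category": "billing", "priority": "high"}'

print(classify_ticket("I was charged twice this month.", fake_model))
```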

Level 2 — Real Agent

Tool-Use Loop

The model is given tools it can call — read a database, send a message, query an API. It decides which tool to use, gets the result, and decides the next step. This is the agent pattern.

e.g. Find all uncontacted leads this week, look up each one in the CRM, and draft personalised follow-up emails.
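One way to picture the tool side of this loop is a small registry plus a dispatcher that runs whatever tool the model asks for. The sketch below is illustrative only: the `tool` decorator, the in-memory `CRM` dictionary and the call format (`{"tool": ..., "args": ...}`) are hypothetical, not a specific vendor's API.

```python
TOOLS = {}

def tool(fn):
    """Register a function so the model can request it by name."""
    TOOLS[fn.__name__] = fn
    return fn

# Hypothetical stand-in for a real CRM.
CRM = {
    "ana@example.com": {"name": "Ana", "contacted": False},
    "bo@example.com": {"name": "Bo", "contacted": True},
}

@tool
def find_uncontacted_leads():
    return [email for email, rec in CRM.items() if not rec["contacted"]]

@tool
def lookup_lead(email):
    return CRM[email]

def dispatch(call):
    """Run one model-requested tool call; unknown tools are refused."""
    name, args = call["tool"], call.get("args", {})
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return {"result": TOOLS[name](**args)}

# A scripted sequence standing in for the model's decisions.
leads = dispatch({"tool": "find_uncontacted_leads"})["result"]
for email in leads:
    record = dispatch({"tool": "lookup_lead", "args": {"email": email}})["result"]
    print(f"Draft follow-up for {record['name']} <{email}>")
```

Keeping the registry explicit means the model can only reach the tools it has been given.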

Level 3 — Orchestration

Orchestrator + Specialists

One orchestrator agent breaks a complex task into sub-tasks and delegates to specialist agents — each with a narrow scope, purpose-built prompt, and specific tool access. Results are synthesised back into a final output.

e.g. Score every inbound sales call: transcription agent → scoring agent → CRM-writer agent.
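The call-scoring example can be sketched as an orchestrator that hands a job to each specialist in turn. Every function here is a hypothetical stand-in: real specialists would wrap a model call with their own prompt and tools, and the fixed score of 7 out of 10 is a placeholder, not a real rating.

```python
def transcription_agent(job):
    # Stand-in for a speech-to-text specialist.
    return {**job, "transcript": f"[transcript of {job['recording']}]"}

def scoring_agent(job):
    # Stand-in: a real specialist would have a model rate the transcript.
    return {**job, "score": 7}

def crm_writer_agent(job):
    # Stand-in: a real specialist would write to the CRM through its one tool.
    return {**job, "crm_note": f"call scored {job['score']}/10"}

PIPELINE = [transcription_agent, scoring_agent, crm_writer_agent]

def orchestrate(recording):
    """Delegate to each specialist in turn and collect the combined result."""
    job = {"recording": recording, "trace": []}
    for agent in PIPELINE:
        job = agent(job)
        job["trace"].append(agent.__name__)   # record who handled the job
    return job

result = orchestrate("call-0142.wav")
print(result["crm_note"], "| via:", " > ".join(result["trace"]))
```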

Level 4 — Production Fleet

Memory · Gates · Audit

Adds persistent memory (the agent knows your business across sessions), human-in-the-loop gates for irreversible actions, an append-only audit trail, and evaluation pipelines to catch drift.

e.g. The full CAO stack: every function automated, humans reviewing exceptions, board-level reporting generated automatically.

The services, as agent systems

Every live demo, mapped to the architecture

Each product below is the same pattern at a different level — deterministic code owns the facts, the model owns the judgement and language, a human owns every irreversible action. Here's what each one actually runs.

SnapCheck

Vision agent · Level 2

Property condition reporting. Computer vision grades each photo, writes the description, assigns severity and a trade.

Photo → Vision agent grades → Structured defect → Agent RFQ

Auto: grade + describe · In the showcase

ISO 11226 · clinical

MoveLens · HomeTask Ergo

Pose + deterministic · Level 2

Clinical movement analysis. Pose estimation extracts 33 landmarks; Python computes every clinical number; the AI only validates keyframes — never invents a figure.

Video → 33 landmarks → Python metrics → Clinician signs off

Assist: clinician reviews · Live demo
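To illustrate what "Python computes every clinical number" can mean in practice, here is a small sketch that derives a joint angle from three landmarks using the dot product. The function name and the sample coordinates are hypothetical; the product's actual metrics are not shown in this document.

```python
import math

def joint_angle(a, b, c):
    """Angle at b (degrees) between segments b->a and b->c, from 2D points."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0:
        raise ValueError("coincident landmarks: angle undefined")
    cos_t = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    # Clamp to [-1, 1] to absorb floating-point rounding before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))

# Hypothetical normalised (x, y) coordinates for shoulder, elbow, wrist.
shoulder, elbow, wrist = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)
print(round(joint_angle(shoulder, elbow, wrist), 1))  # 90.0
```

Because the figure comes from arithmetic on the landmarks, the same video always yields the same number.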


Kesher OS — Legal

Orchestrator + 5 specialists · Level 4

A family-law & mediation back office. A coordinator safety-screens every message before the model, then routes to intake, conflict & safety, drafting, scheduling or billing.

Enquiry → Safety gate (code) → Specialist → Practitioner decides

Critical: AI bypassed · 5-diagram case study
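A safety gate that runs in code before the model might look roughly like this. The trigger phrases, the `classify` callback (standing in for the model's routing decision) and the fallback to intake are all assumptions made for illustration; a real screen would be defined by the practice.

```python
import re

# Hypothetical trigger phrases; a real list would be set by the practice.
CRITICAL_PATTERNS = [r"\bin danger\b", r"\bthreaten", r"\bhurt (me|my)\b"]
SPECIALISTS = {"intake", "conflict_safety", "drafting", "scheduling", "billing"}

def safety_gate(message):
    """Plain-code screen that runs before any model sees the message."""
    text = message.lower()
    return any(re.search(pattern, text) for pattern in CRITICAL_PATTERNS)

def coordinate(message, classify):
    if safety_gate(message):
        # Critical messages bypass the model and go straight to a person.
        return {"route": "practitioner", "model_used": False}
    specialist = classify(message)
    if specialist not in SPECIALISTS:
        specialist = "intake"   # assumed fallback for an unknown label
    return {"route": specialist, "model_used": True}

print(coordinate("I am in danger tonight", lambda m: "billing"))
print(coordinate("Can I get a copy of my invoice?", lambda m: "billing"))
```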

Property Agent

3-agent fleet · Level 3

A property-management desk for Gold Coast & Bondi portfolios. Arrears, maintenance and renewals agents each draft work and queue it for one-click approval.

Portfolio data → 3 specialists draft → PM approves send

Nothing sends unapproved · Run the agents

BiffCoin Desk

Trading orchestrator · Level 3

Five specialist analysts (trend, range, breakout, accumulation, risk) vote on the market regime. The orchestrator decides in deterministic Python; advisory only.

Market snapshot → 5 specialists vote → Any live trade gated

Fails closed on bad data · In the showcase
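The vote-then-decide step could be sketched as a deterministic function like the one below. The regime labels, the three-of-five quorum and the snapshot fields are hypothetical; the point is only that bad data or a split vote yields no action.

```python
import math
from collections import Counter

REGIMES = {"trending", "ranging", "breakout"}   # hypothetical labels

def snapshot_ok(snapshot):
    """Reject missing, non-numeric, NaN, infinite or non-positive prices."""
    price = snapshot.get("price")
    return isinstance(price, (int, float)) and math.isfinite(price) and price > 0

def decide_regime(snapshot, votes, quorum=3):
    """votes maps analyst name to regime label; returns a regime or 'no_action'."""
    if not snapshot_ok(snapshot) or len(votes) != 5:
        return "no_action"                      # fail closed on bad data
    if any(label not in REGIMES for label in votes.values()):
        return "no_action"                      # fail closed on a malformed vote
    regime, count = Counter(votes.values()).most_common(1)[0]
    return regime if count >= quorum else "no_action"

votes = {"trend": "trending", "range": "ranging", "breakout": "trending",
         "accumulation": "trending", "risk": "ranging"}
print(decide_regime({"price": 101.5}, votes))           # trending
print(decide_regime({"price": float("nan")}, votes))    # no_action
```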

LiftAI'd

Pose coaching · Level 2

The gym-facing sibling of MoveLens. Counts reps, measures joint angles, scores form against reference patterns — the numbers are computed, the AI only coaches.

Lift video → Pose + angles → Form score → Coaching note

Auto: rep + angle metrics · See LiftAI'd
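Rep counting from a series of joint angles can be done with two thresholds, so that small wobbles near one threshold do not count as extra reps. This is a sketch under assumed threshold values (90 and 160 degrees) and a made-up angle series, not the product's actual algorithm.

```python
def count_reps(angles, down_below=90.0, up_above=160.0):
    """Count reps from a joint-angle series using two thresholds (hysteresis)."""
    reps, phase = 0, "up"
    for angle in angles:
        if phase == "up" and angle < down_below:
            phase = "down"      # reached the bottom of the rep
        elif phase == "down" and angle > up_above:
            phase = "up"        # back to the top: one full rep
            reps += 1
    return reps

# Hypothetical knee angles (degrees) across one short clip.
knee_angles = [170, 140, 85, 120, 165, 150, 80, 95, 170, 168]
print(count_reps(knee_angles))  # 2
```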

Diagnostic Tool

Single structured agent · Level 1

The simplest level: one well-structured call. Paste a business description; get the top 3 automation opportunities and a 90-day roadmap. The foundation everything else builds on.

Business description → Structured prompt → Diagnostic report

Operator reviews output · Run a diagnostic

Real Application

Inside a $50M business — four processes, rebuilt AI-first

In a $50M business, SG&A is typically $10–15M, much of it labour in repeatable functions. These four workflows are where the hours bleed.

Score every inbound sales call

Recording lands in storage. Transcription agent converts speech to text. Scoring agent rates the call against the firm's qualification criteria (budget, timeline, authority, need). CRM-writer agent logs the score and surfaces follow-up actions. Escalation email drafted — human approves before send.

Auto: transcribe + score + log · Gate: send escalation email

8–12 hrs/wk saved per sales team

Reconcile vendor invoices

Invoice arrives by email. PDF-parser agent extracts line items. Matcher agent cross-references against purchase orders in the accounting system. Anomaly-detector agent flags mismatches and exceptions. Matched invoices auto-approved for payment; unmatched invoices routed to finance with a summary of the discrepancy.

Auto: extract + match + approve clean invoices · Gate: flag anomalies for human review

15–20 hrs/wk saved in finance
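The matching step in this workflow can be expressed as deterministic code once the line items have been extracted. The sketch below assumes a simple data shape (a `po_number`, then `lines` keyed by SKU with `qty` and `unit_price`) and a one-cent price tolerance; all of these are hypothetical.

```python
from decimal import Decimal

def reconcile(invoice, purchase_orders, tolerance=Decimal("0.01")):
    """Return ('approved', []) or ('review', [reasons]) for one invoice."""
    po = purchase_orders.get(invoice["po_number"])
    if po is None:
        return "review", [f"no purchase order {invoice['po_number']}"]
    issues = []
    for sku, line in invoice["lines"].items():
        ordered = po["lines"].get(sku)
        if ordered is None:
            issues.append(f"{sku}: not on purchase order")
        elif line["qty"] != ordered["qty"]:
            issues.append(f"{sku}: qty {line['qty']} vs ordered {ordered['qty']}")
        elif abs(line["unit_price"] - ordered["unit_price"]) > tolerance:
            issues.append(f"{sku}: price {line['unit_price']} vs {ordered['unit_price']}")
    return ("review", issues) if issues else ("approved", [])

# Hypothetical sample data.
pos = {"PO-1": {"lines": {"A1": {"qty": 10, "unit_price": Decimal("4.50")}}}}
clean = {"po_number": "PO-1", "lines": {"A1": {"qty": 10, "unit_price": Decimal("4.50")}}}
short = {"po_number": "PO-1", "lines": {"A1": {"qty": 8, "unit_price": Decimal("4.50")}}}
print(reconcile(clean, pos))
print(reconcile(short, pos))
```

Using `Decimal` rather than floats avoids rounding surprises when comparing currency amounts.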

Triage and respond to support tickets

New ticket arrives. Classifier agent assigns priority and category. For tier-1 queries (FAQ-level), a responder agent drafts and sends the reply immediately. For tier-2, a draft is prepared and queued for human review before sending. Routing agent assigns unresolved tickets to the right team member with context pre-populated.

Auto: classify + tier-1 responses · Gate: tier-2 drafts need approval

12–16 hrs/wk saved in support
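The tier rules in this workflow reduce to a small routing function. In this sketch `classify` and `draft_reply` are hypothetical stand-ins for the classifier and responder agents, and the label format is an assumption.

```python
def triage(ticket, classify, draft_reply):
    """Route one ticket: tier 1 sends automatically, tier 2 waits for approval."""
    label = classify(ticket)    # assumed shape: {"tier": 1, "category": "billing"}
    if label["tier"] == 1:
        return {"action": "send", "reply": draft_reply(ticket), **label}
    if label["tier"] == 2:
        return {"action": "queue_for_review", "reply": draft_reply(ticket), **label}
    # Anything else goes to a person, with the classification attached as context.
    return {"action": "assign_to_team", "team": label["category"], **label}

# Stand-ins so the sketch runs without a model.
def fake_classify(ticket):
    return {"tier": 1 if "password" in ticket else 2, "category": "account"}

def fake_draft(ticket):
    return f"Re: {ticket}"

print(triage("reset my password", fake_classify, fake_draft)["action"])   # send
print(triage("refund dispute", fake_classify, fake_draft)["action"])      # queue_for_review
```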

Draft the monthly board update

Scheduled cron fires on the 1st of each month. Data-collector agent pulls KPIs from finance, sales, and ops systems. Analyst agent compares against targets and prior period, flags variances above threshold. Writer agent structures the board pack with commentary for each section. CEO receives a draft for review — edits and approves before distribution.

Assist: draft + variance commentary · Gate: CEO approves before send

2 days/mo saved for leadership
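The variance check the analyst agent relies on is simple arithmetic and can live in plain code. The sketch below assumes a 10% relative threshold and made-up KPI names and values, purely for illustration.

```python
def flag_variances(actuals, targets, threshold=0.10):
    """Return KPIs whose relative variance from target exceeds the threshold."""
    flagged = {}
    for kpi, target in targets.items():
        if target == 0 or kpi not in actuals:
            flagged[kpi] = None     # cannot compute: surface it for a human
            continue
        variance = (actuals[kpi] - target) / abs(target)
        if abs(variance) > threshold:
            flagged[kpi] = round(variance, 3)
    return flagged

# Hypothetical figures.
targets = {"revenue": 1000.0, "churn": 50.0, "tickets": 200.0}
actuals = {"revenue": 1040.0, "churn": 65.0, "tickets": 150.0}
print(flag_variances(actuals, targets))  # {'churn': 0.3, 'tickets': -0.25}
```

Revenue is 4% over target and stays below the threshold, so only churn (+30%) and tickets (-25%) reach the writer agent for commentary.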

All savings are indicative estimates based on typical SME staffing ratios. Actual impact depends on process complexity, data quality, and implementation approach.

System Architecture

How the pieces connect

Every production agent deployment follows this structure — from the trigger that starts the job to the audit trail that proves it was done right.

TRIGGER

Webhook · Scheduled cron · User action · Email arrival
Returns a job ID immediately. Agent processes asynchronously.
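The "return a job ID, process asynchronously" pattern can be sketched with the standard library alone. Here an in-memory dictionary and queue stand in for a real job store and broker, and `str.upper` stands in for the agent; all names are hypothetical.

```python
import queue
import threading
import uuid

jobs = {}                 # job_id -> {"status": ..., "result": ...}
work = queue.Queue()

def submit(payload):
    """Called by the trigger: record the job and return its ID at once."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    work.put((job_id, payload))
    return job_id

def worker(handle):
    """Background loop that runs the agent for each queued job."""
    while True:
        job_id, payload = work.get()
        try:
            jobs[job_id] = {"status": "done", "result": handle(payload)}
        except Exception as exc:
            jobs[job_id] = {"status": "failed", "result": str(exc)}
        finally:
            work.task_done()

threading.Thread(target=worker, args=(str.upper,), daemon=True).start()
job_id = submit("invoice received")
work.join()               # only for the demo: wait for the worker to finish
print(jobs[job_id])       # {'status': 'done', 'result': 'INVOICE RECEIVED'}
```

In a real deployment the caller would poll or be notified using the job ID instead of blocking on `work.join()`.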

↓

ORCHESTRATOR AGENT

Reads memory → breaks task into sub-tasks → delegates to specialists → synthesises final output

↓

Specialist A

Narrow scope. Focused system prompt. Specific tools only.

Specialist B

Narrow scope. Focused system prompt. Specific tools only.

Specialist C

Narrow scope. Focused system prompt. Specific tools only.

Human Gate

HIGH-risk actions pause here. Human approves or rejects.
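A human gate can be as small as a queue that parks risky actions until someone decides. In this sketch the `HIGH_RISK` set and the `Gate` class are hypothetical; which actions count as HIGH-risk is a policy decision for each deployment.

```python
from dataclasses import dataclass, field

HIGH_RISK = {"send_email", "pay_invoice", "delete_record"}   # hypothetical policy

@dataclass
class Gate:
    pending: list = field(default_factory=list)

    def request(self, action, payload, execute):
        """Run low-risk actions now; park HIGH-risk ones for a human."""
        if action not in HIGH_RISK:
            return execute(payload)
        self.pending.append((action, payload, execute))
        return "pending_approval"

    def review(self, index, approved):
        """A human approves or rejects one parked action."""
        action, payload, execute = self.pending.pop(index)
        return execute(payload) if approved else "rejected"

gate = Gate()
sent = []
print(gate.request("log_score", 8, lambda p: f"logged {p}"))   # logged 8
print(gate.request("send_email", "Hi Ana", sent.append))       # pending_approval
gate.review(0, approved=True)
print(sent)                                                    # ['Hi Ana']
```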

↓

Memory Store

Business context persists across sessions. Agent reads before acting, writes after.
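The read-before, write-after pattern can be illustrated with a JSON file as the store. This is a deliberately simple sketch: `MemoryStore`, `run_session` and the file layout are hypothetical, and a production system would more likely use a database.

```python
import json
from pathlib import Path

class MemoryStore:
    """Key-value business context persisted to a JSON file."""

    def __init__(self, path):
        self.path = Path(path)

    def read(self):
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def write(self, **facts):
        memory = self.read()
        memory.update(facts)
        self.path.write_text(json.dumps(memory, indent=2), encoding="utf-8")

def run_session(store, task):
    context = store.read()                      # read before acting
    result = f"{task} (known facts: {len(context)})"
    store.write(last_task=task)                 # write after
    return result
```

Calling `run_session` twice against the same file shows the second session starting with what the first one stored.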

Audit Log

Append-only. Every action logged: agent, tool, input, output, timestamp.
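A minimal audit log with the five fields listed above could be a JSON Lines file opened only in append mode. The `AuditLog` class is a hypothetical sketch; note that append mode in code does not by itself stop someone editing the file, so real append-only guarantees need storage-level controls.

```python
import json
from datetime import datetime, timezone

class AuditLog:
    """One JSON object per line; this class only ever appends."""

    def __init__(self, path):
        self.path = path

    def record(self, agent, tool, tool_input, tool_output):
        entry = {
            "agent": agent,
            "tool": tool,
            "input": tool_input,
            "output": tool_output,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        with open(self.path, "a", encoding="utf-8") as handle:   # append only
            handle.write(json.dumps(entry) + "\n")
        return entry

    def entries(self):
        with open(self.path, encoding="utf-8") as handle:
            return [json.loads(line) for line in handle]
```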

Guardrails

The 70% Principle — what AI must never own

AI should handle the 70% that is repetitive, rules-heavy, and judgement-light. The remaining 30% requires human accountability — not because the AI can't produce an output, but because the consequences of being wrong require a human to own them.

AI owns this (the 70%)

Data extraction and entry — parsing invoices, pulling CRM records, populating fields

Classification and scoring — leads, tickets, calls, documents

First-draft communication — emails, reports, proposals (human reviews before send)

Monitoring and alerting — watching for anomalies, flagging exceptions

Routine approvals — matched invoices, standard leave requests, tier-1 support

Humans must own this (the 30%)

Strategic direction — where the business is going and why

Key relationships — trust, empathy, negotiation, conflict resolution

Legal and compliance sign-off — final approval on contracts, regulatory filings

Performance management — hiring, firing, performance conversations

Crisis judgment — decisions under pressure with incomplete information

Getting Started

The 90-day playbook

Land one workflow. Prove the number. Build trust. Then scale. The technology is 30% of the job — rebuilding the workflow and getting people to use it is 70%.

Weeks 1–4

Diagnose & Quick Win

Map every manual, repetitive workflow
Baseline each: volume, time, error rate, cost
Pick one high-leverage, low-risk process
Deploy Level 1: structured prompt, first measurable result

Months 2–3

Foundation

Add tool connections: CRM, accounting, comms
Wire the tool-use loop (Level 2)
Implement audit logging from day one
Prove the ROI number — present with evidence

Months 3–6

Scale

Split into orchestrator + specialists (Level 3)
Add persistent memory across sessions
Human-in-the-loop gates for all HIGH-risk actions
Expand to second and third function

From chatbot to agent fleet — how AI actually works inside a business
