Chat Plus SaaS Is No Longer Enough. What an AI-First Operator Substrate Actually Looks Like.

Faraz Rizvi · 14 May 2026 · Piece 3 of 4

Faraz Rizvi is a UK operator-practitioner writing about the work between a research breakthrough and a fundable company. He runs SpinUp Forge.

Most UK academic spinouts in their first 18 months are running on chat access and SaaS subscriptions. A browser tab open to a large language model. Notion for the things that want to be documents but are not quite yet. Drive for the deck, the model, the data room. Excel for the financial model. Slack and email for everything in flight. I have seen this configuration in enough founder conversations, and recognised it from enough of my own early-stage experience, to know what it looks like from the inside. It looks functional. It probably was functional, in 2024. In 2026, it is a starting point, not a substrate. And the gap between those two things is now wide enough to cost a spinout its next round.

What the first 18 months actually kills

None of the failures that end spinouts in this window are science problems; every one of them is operational.

There is a specific cluster of failures that ends spinouts in the period between licence execution and institutional Series A, and none of them are science problems. I want to name them plainly, because most founders I speak to will recognise at least three without prompting.

Board pack inconsistency before the first institutional cheque is the first. A Series A-leading fund expects consistent KPI definitions across four quarters, a cash bridge that ties to the last bank statement, a risk register that matches the founder's narrative on the call. Most spinouts arrive having reinvented those documents three times in nine months. Numbers do not reconcile. The fund passes, politely, and the founder does not always understand why.

The financial model that cannot model the next round is the second. The model that closed the seed back-solves from the number the founders wanted. It cannot answer "what does an 18-month plan to a £6 million Series A on three named hires look like" without a rebuild from scratch. Every diligence cycle costs three weeks the founder does not have.

Customer-discovery cycles that decay come third. Post-ICURe, a strong cohort runs forty interviews. By month six post-licence, the rhythm has settled at two interviews a month, notes scattered across three Notion pages and an Otter transcript, synthesis last updated in the ICURe write-up. When a lead investor asks what the founding team has learned in ninety days, the honest answer ("we have been heads-down on technical milestones") is the wrong answer.

IP register drift follows. The licence schedule lists three patent families. By month ten there is a fourth invention disclosure the TTO has not seen, a paper draft mentioning a fifth, a contractor agreement that is silent on assignment. Nothing malicious. Nobody owns the register, so it drifts. The institutional context behind this drift is structural, not accidental. The Office for Students projects 45% of English providers in deficit in 2025–26, with TTOs across the sector cutting FTEs and shifting toward narrow IP triage rather than proactive scouting (OfS financial modelling). The TTO capacity that once caught register drift before it compounded is under sustained pressure. A founder who assumes TTO oversight will fill this gap is, increasingly, working from an outdated model of what the TTO can carry.

Hiring stalls at the role most needed. No job description. No compensation band. No interview rubric. The founder knows they need a head of commercial and an MLE; they have written neither. Six months pass. A warm-intro candidate takes another offer because the founder cannot articulate the role in writing within forty-eight hours.

And investor update cadence collapses by month four. Month one goes out on time. Month two is slightly late. Month three is two paragraphs at eleven at night. Month four does not happen. Investors notice. The trust signal (this team does what it says, on schedule) has degraded, and the next round becomes materially harder for a reason that has nothing to do with the science.

The shape of those failures is easier to read on a time axis than as a list. The diagram below puts the technical track and the operational substrate on the same eighteen-month window from licence execution to the next institutional conversation. The two line shapes are illustrative (Piece 3 is not measuring either track quantitatively), but the load-bearing point is the gap the investor reads at the right edge.

Figure: Spinout founding team's two tracks, licence execution to Series A conversation (illustrative). X-axis: months from licence execution (0 to 18). Y-axis: track completeness (0–100). Series: technical milestones; operational substrate. Annotation at the right edge: "gap the investor reads".
Illustrative: line shapes communicate trajectory, not measured data. The widening gap at the right edge is what the Series A conversation reads when the team slide and the artefact surface fail to corroborate each other.

The risk picture investors are actually pricing

The team slide is the primary proxy for execution; the substrate, built now, is a second and more durable signal.

Early-stage investment is, by nature, high-risk. A seed-stage spinout has approximately nine to eighteen months to address that risk before the next institutional conversation becomes consequential. In the typical UK spinout sequence, this window is clearly bounded. The academic research demonstrated the technology worked. The ICURe or proof-of-concept work evidenced that customers have a problem the technology could solve. Both of those things happened. What has not happened yet is any systematic demonstration that this team can operate the company. The execution risk and the governance risk remain entirely open. That is precisely what the seed-to-Series-A window is for.

Investors looking at a seed-stage spinout know this, even when they do not say it explicitly. A pitch deck demands a team slide because the team is the investor's primary proxy for execution-and-governance de-risking. The team slide asks the reader to believe: these are the people who will make the decisions, build the procedures, hit the milestones, and keep the cap table clean. In the absence of any operational track record, the investor is pricing headcount, credentials, and the vague signal of "founder quality." That is the only visible signal the team slide carries.

The argument the rest of this piece builds is that in 2026, the team slide is no longer the only visible signal. Operational substrate, built deliberately in the seed window, is a second signal of the same underlying thing (execution discipline and governance competence), and in some respects a more durable one. A procedure persists when a person leaves. A versioned IP register does not resign to take another offer. A board pack that has been produced consistently for six quarters tells the Series A fund something the team slide cannot: that the operating cadence is real, not promised. The failure modes named above are, precisely, the failure modes that erode investor confidence in the seed window. Building the substrate is not a productivity exercise. It is a de-risking exercise, conducted on the investor's timeline.

Why chat-plus-SaaS is sub-scale now

Chat-plus-SaaS produces output; it does not produce cadence, and cadence is what the 2027 investor will examine.

None of those failure modes are new. What is new is why the old tooling can no longer absorb them.

Three specific reasons, in order. Expected cadence has tightened. A 2026 spinout aiming for an institutional Series A in 2027 is expected to ship monthly investor updates with artefacts, quarterly board packs that read like a Series B company's, and a rolling customer-discovery synthesis, not a one-time write-up. Chat-plus-SaaS cannot produce that cadence without the founder becoming the integration layer full-time. Science slips.

Comparable founders are shipping it. Founding teams who have built named workflows on top of agentic tools are producing investor-grade artefacts at a cadence their chat-only peers cannot match on the same headcount. The investor sees the gap now. It shows up in the board pack quality before it shows up in anything the founder says. Innovate UK's Velocity account-management model, announced March 2026, makes this structural: spinouts entering the pipeline need board governance, financial controls, a product roadmap, and GTM clarity already in place before the account-management conversation begins (UKRI announcement). The artefact-quality gap is being assessed at programme level, not just by individual investors.

And the structural point: substrate drifts under chat-plus-SaaS because nothing accretes. Every prompt is rebuilt from scratch. Every artefact starts blank. There is no versioning, no evaluation, no procedure that survives the founder. Each new investor update is roughly as expensive to produce as the last. The tooling produces output. It does not produce cadence.

Chat-plus-SaaS is not failing because it is bad. It is failing because it is flat. It does not compound.

The structural reason chat-plus-SaaS underdelivers at cadence is architectural, and Garry Tan names it in the YC Lightcone "Tokenmaxxing" episode: "all of the difficulty in agentic engineering today is when people try to do things that should be in markdown in code, and it fails because code is brittle". His preferred shape is a thin orchestration core with the load-bearing work living in codified markdown plays, "a harness is the core loop … what we should be spending all our time doing is thinking about what markdown should there be" (YC Lightcone, 'Tokenmaxxing'). Read at the scale of a single founding team, this is the diagnostic for why bolting a chat interface onto a SaaS stack does not produce cadence: the procedures live in heads and in code, not in markdown that the model can re-execute. The substrate the next section names is the response to that diagnosis.

What an AI-first operator substrate actually is

It is four layers the founder builds and owns; this section names them without assuming prior technical knowledge.

"AI-first operator substrate" is my own formulation, not industry terminology, so it needs an immediate definition. It is four layers the founder builds and owns.

The first layer is named workflows: two to six on-demand procedures with typed inputs and outputs, written procedures, and named review gates. The second is a knowledge layer: a structured, agent-reachable store of spinout facts including IP position, cap table, financial model assumptions, customer notes, prior board packs, and the founder's own voice, indexed and versioned so retrieval is deterministic rather than approximate. Underneath the knowledge layer sits a sensor layer the four-layer model has been silently folding into "knowledge": the data-ingress mechanisms (bank feeds, transcripts, CRM events, telemetry, customer emails) that bring fresh evidence into the substrate without the founder copy-pasting it in. The third is eval and observability: golden tasks per workflow, token and latency budgets, stored traces, regressions visible before they reach a board meeting. Around the founder review gate sits a learning loop the four-layer model has also been silently folding into "review": the mechanism by which substrate-level feedback (which workflows are improving, which are drifting, which need re-codification) feeds the next quarter's substrate work. The fourth is on-demand skills and sub-agents: narrow specialists the orchestrator can call, from a financial-model validator to a prior-art scanner. The sensor layer and the learning loop are not additions to the model; they are primitives the four named layers already imply, surfaced here so the founder can name them when designing the substrate rather than discovering them in retrospect.
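To make the first layer concrete, here is a minimal Python sketch of one named workflow with required inputs, a stand-in procedure, a named review gate, and a stored trace. Every name in it (NamedWorkflow, WorkflowRun, board_pack) is hypothetical and chosen for illustration; it is one way the shape could be written down, not a description of any particular tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class WorkflowRun:
    workflow: str
    draft: str
    approved: bool = False
    trace: list = field(default_factory=list)  # kept for the observability layer


@dataclass
class NamedWorkflow:
    name: str
    required_inputs: tuple             # typed inputs, reduced here to required keys
    procedure: Callable[[dict], str]   # stand-in for the written procedure an agent runs
    reviewer: str                      # the named review gate

    def run(self, inputs: dict) -> WorkflowRun:
        missing = [key for key in self.required_inputs if key not in inputs]
        if missing:
            raise ValueError(f"{self.name}: missing inputs {missing}")
        run = WorkflowRun(self.name, self.procedure(inputs))
        run.trace.append(f"drafted, awaiting {self.reviewer}")
        return run

    def approve(self, run: WorkflowRun, signed_by: str) -> WorkflowRun:
        # Nothing is marked approved unless the named reviewer signs.
        if signed_by != self.reviewer:
            raise PermissionError(f"{signed_by} is not the review gate for {self.name}")
        run.approved = True
        run.trace.append(f"approved by {signed_by}")
        return run


# Hypothetical example: the board-pack workflow with a toy procedure.
board_pack = NamedWorkflow(
    name="board_pack_assembly",
    required_inputs=("kpi_feed", "cash_bridge", "risk_register", "last_pack"),
    procedure=lambda inputs: f"Draft pack built from {len(inputs)} inputs",
    reviewer="founder",
)
```

The point of the sketch is the separation: the procedure drafts, the trace records, and only the named reviewer can flip the run to approved.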

The distinction from chat-plus-SaaS is structural, not tonal. Chat-plus-SaaS is access: paste context, receive output, bespoke each time, nothing accumulates. The substrate is standing capability: procedure in writing, agent runs it, output lands in the right place, trace logged. The first model degrades under fatigue and founder-absence. The second gets cheaper to run as it learns the company.

The six workflows a spinout actually needs

The minimum spanning set for 18-month survival covers six procedures, each with a named eval check.

The minimum spanning set for 18-month survival cadence covers six procedures, with an optional seventh.

Board pack assembly takes a KPI feed, a cash bridge feed, a risk register, and last quarter's pack as inputs and produces a pre-meeting pack on a fixed schedule. The review gate is the founder reading and signing. The eval check confirms KPIs reconcile to cash and the risk register diffs against last quarter without unexplained gaps.
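The board-pack eval check can be sketched in a few lines of Python. This is a hedged illustration, not a prescribed implementation: the function names are hypothetical, amounts are assumed to be held as integer pence so the tie-out is exact, and the register is assumed to keep closed risks on file with a "closed" status rather than deleting them.

```python
def cash_bridge_reconciles(opening_pence: int, inflows_pence: int,
                           outflows_pence: int, bank_closing_pence: int) -> bool:
    """True when the cash bridge ties exactly to the closing bank balance."""
    return opening_pence + inflows_pence - outflows_pence == bank_closing_pence


def unexplained_risk_gaps(last_quarter: dict, this_quarter: dict) -> list:
    """Risk IDs that were open last quarter and are missing from this quarter.

    Assumption: a closed risk stays in the register with status "closed",
    so an ID that simply disappears is treated as an unexplained gap.
    """
    return sorted(
        risk_id for risk_id, risk in last_quarter.items()
        if risk["status"] == "open" and risk_id not in this_quarter
    )
```

A pack would pass this check only when the bridge reconciles and the list of unexplained gaps is empty; anything else goes back to the founder before the meeting, not after it.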

The investor update takes the month's KPI deltas, customer-discovery notes, hiring status, and IP register diffs, and outputs a monthly email on a named date. The eval checks word count, named-metric coverage, and plain-prose register.
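Two of those three checks (word count and named-metric coverage) are mechanical enough to sketch; plain-prose register is a judgment call and is left out. The function name and the 600-word ceiling below are assumptions for illustration only, not figures from this piece.

```python
def check_investor_update(text: str, required_metrics: list, max_words: int = 600) -> dict:
    """Report length and which required metric names never appear in the update.

    max_words defaults to a hypothetical ceiling; each team would set its own.
    """
    lowered = text.lower()
    word_count = len(text.split())
    return {
        "word_count": word_count,
        "within_length": word_count <= max_words,
        "missing_metrics": [m for m in required_metrics if m.lower() not in lowered],
    }
```

Matching on the metric name is deliberately crude: it catches an update that forgot to mention runway at all, not one that mentions it misleadingly.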

Financial-model maintenance pulls actuals from the bank feed and any assumption deltas, and outputs a rolled-forward model. The eval runs the closes-the-round-on-paper test and confirms sensitivities are reproducible.

Customer-discovery synthesis takes transcripts (Otter, Read.ai, raw notes) and produces a rolling synthesis updated weekly with named segments, objections, and willingness-to-pay signals. The eval confirms every claim traces to an interview identifier.
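The traceability check is simple to state in code. The sketch below assumes each synthesis claim is stored as a dictionary with a "text" field and a "sources" list of interview identifiers; that storage shape and the function name are hypothetical.

```python
def untraced_claims(claims: list, interview_ids: set) -> list:
    """Return the text of every claim that cites no interview, or an unknown one."""
    flagged = []
    for claim in claims:
        sources = set(claim.get("sources", []))
        # A claim passes only if it has sources and all of them are real interviews.
        if not sources or not sources <= interview_ids:
            flagged.append(claim["text"])
    return flagged
```

An empty result is the pass condition; any flagged claim is either removed from the synthesis or tied back to the interview it came from.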

The IP register pulls invention disclosures, paper drafts, contractor agreements, and the licence schedule, and outputs a single live register with assignment status per item. The eval runs a monthly diff with the TTO contact log.
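A minimal sketch of that monthly diff follows, assuming the register is a dictionary keyed by item identifier with an "assignment" status per item, and the TTO contact log reduces to the set of identifiers the TTO has seen. Both shapes, and the status value "assigned", are assumptions made for illustration.

```python
def ip_register_gaps(register: dict, tto_log_ids: set) -> dict:
    """List items the TTO has not seen and items whose assignment is not settled."""
    return {
        "not_seen_by_tto": sorted(i for i in register if i not in tto_log_ids),
        "assignment_open": sorted(
            i for i, item in register.items() if item["assignment"] != "assigned"
        ),
    }
```

Run against the drift described earlier, a fourth invention disclosure would surface under not_seen_by_tto and the silent contractor agreement under assignment_open, in month ten rather than in diligence.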

The hiring pipeline holds JD drafts, candidates, interview notes, and rubric scores, and outputs a live pipeline view plus a JD-on-demand for the next role. The eval measures time-to-JD from "we need this role" to "JD is live," with forty-eight hours as the target.
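The time-to-JD measure is the easiest of the six to compute. The sketch below uses hypothetical function names and assumes the two timestamps ("we need this role" and "JD is live") are recorded when they happen; only the forty-eight-hour target comes from the text above.

```python
from datetime import datetime, timedelta

JD_TARGET = timedelta(hours=48)  # target stated in the text


def time_to_jd(requested_at: datetime, live_at: datetime) -> timedelta:
    """Elapsed time from the role being requested to the JD going live."""
    if live_at < requested_at:
        raise ValueError("JD cannot go live before the role was requested")
    return live_at - requested_at


def meets_jd_target(requested_at: datetime, live_at: datetime,
                    target: timedelta = JD_TARGET) -> bool:
    return time_to_jd(requested_at, live_at) <= target
```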

The optional seventh is a milestone tracker: a single source of truth for cap-table commitments, board minutes, grant milestones, and data-room version, so the document the board minutes cite is the document in the data room.

A 2026-vintage investor conducting diligence will be looking for a specific set of objects alongside those procedures. The most discriminating is the named workflow before-and-after, best shown live. A prompt, agent, or skill repository under version control. An eval harness, even three golden tasks with pass/fail criteria per workflow. An observability and cost ledger: token-spend and tool-call traces per workflow, not just a tool-spend line in the cash flow. A hiring plan with explicit not-hired roles, mapped to the workflows carrying that work instead. One externally verifiable output, a working investor-update pipeline with data in, narrative out, and an auditable trail. And a trust and authority register: one page per workflow naming what the agent may write, send, or commit without human review.
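An eval harness of the "even three golden tasks" kind can be very small. The sketch below is an assumption-laden illustration: GoldenTask, run_golden_tasks, and the toy draft_update function are hypothetical names, and a real workflow would call an agent where this one formats a string.

```python
from typing import Callable, NamedTuple


class GoldenTask(NamedTuple):
    name: str
    inputs: dict
    passes: Callable[[str], bool]  # pass/fail criterion applied to the output


def run_golden_tasks(workflow: Callable[[dict], str], tasks: list) -> dict:
    """Run each golden task through the workflow and record pass or fail."""
    return {task.name: bool(task.passes(workflow(task.inputs))) for task in tasks}


def draft_update(inputs: dict) -> str:
    """Toy stand-in for an agent-run investor-update workflow."""
    return (f"Runway: {inputs['runway_months']} months. "
            f"Interviews this month: {inputs['interviews']}.")


SAMPLE = {"runway_months": 11, "interviews": 6}
GOLDEN_TASKS = [
    GoldenTask("mentions_runway", SAMPLE, lambda out: "Runway: 11 months" in out),
    GoldenTask("mentions_interviews", SAMPLE, lambda out: "Interviews this month: 6" in out),
    GoldenTask("stays_short", SAMPLE, lambda out: len(out.split()) <= 50),
]
```

Calling run_golden_tasks(draft_update, GOLDEN_TASKS) after every change to the procedure is what makes a regression visible before it reaches a board meeting; the stored results are also the object an investor can be shown.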

Tuesday at 10am, with and without

The same board-pack request costs forty-five minutes with the substrate or two working days without it.

The abstract argument earns itself in a single concrete contrast: Tuesday at 10am, drawn from founder conversations I have had this year (a composite, not one specific founder).

Without: the founder opens email on Tuesday morning and finds a note from the board chair (updated KPI summary needed by Thursday). The founder opens a chat tab, opens last quarter's pack, opens the bank app, opens the financial model. By midday, they have not yet started reconciling the KPI figures because the model's actuals line did not match what the bank shows. By four in the afternoon, they are rebuilding the cash bridge by hand. Wednesday is gone too. The science work planned for Tuesday and Wednesday slips to the weekend, again.

With: the founder opens the board-pack workflow on Tuesday morning. The workflow pulled the bank feed and the KPI feed overnight. The draft pack is waiting. The founder reads it, flags two items that need founder judgment (a customer reference to update and a hiring decision that has not yet been made), and signs the rest. Total founder time: forty-five minutes. Tuesday morning is back. Science work happens.

That contrast is not a productivity argument. It is a survival argument. The science has to be done. The board pack has to go out. In 2026, one of those can run on substrate; the other cannot. The question is whether the founder has built the substrate.

The arithmetic that makes it fractional

The gain is in the gap: work the founder was previously doing badly, inconsistently, or skipping entirely.

Piece 2 of this series cites METR's own 2025-to-2026 drift as currency-evidence of how fast the agentic landscape has moved. The consequence for spinout operating economics follows from that drift.

A senior operator working fractionally, well below the full-time COO cost a seed-stage spinout would have priced this work at three years ago, can deliver the operating cadence, provided three specific conditions hold: that integration across fragmented tools is the bottleneck (MCP has tightened this condition specifically, because agents now reach multiple sources without bespoke wiring that previously required an engineer); that the procedure is amenable to codification; and that the founder was previously doing the work badly, inconsistently, or skipping it entirely. For work the founder already did well, the delta is approximately zero. The gain is in the gap, not everywhere. By May 2026, the coding-agent surface is no longer scarce on any axis a one-to-three-person academic founding team is likely to hit first.

This is not a weekend project. Delivering that quality fractionally requires deliberate scaffolding built over a quarter: understanding the spinout's science, its IP position, its founding team's communication style, the institutional relationships it carries, the gaps in its commercial story. The scaffolding is the work. But once it exists, the ongoing maintenance (the board pack done before the meeting, the financial model updated when assumptions shift rather than reconstructed from scratch) is tractable at a fraction of the cost it previously demanded. Same team, larger company.

What compounds when you build it once

The knowledge layer underneath every workflow accretes context; by month nine, the substrate knows the company.

The reason to build the substrate rather than assembling good tools is not the tools. It is what accretes underneath them.

The substrate compounds because each codified workflow returns founder time, and the knowledge layer underneath every workflow accretes context: the company's KPIs, the founder's voice, the lead investor's stylistic preferences, the IP narrative. Tom Blomfield, in a 2026 YC talk on building self-improving companies, names the precondition for this accretion in one line: "every single thing that happens, if it is recorded, it happened to the AI. If it did not get recorded, it did not happen to your intelligence". The corollary for a founding team is operational rather than philosophical: an interaction that lives only in a Slack thread, an off-the-record call, or a founder's head does not become substrate. The discipline that turns a knowledge layer into something that compounds is the discipline of recording (transcripts, notes, decisions, exceptions) at the moment the work is done. That accretion makes the next workflow cheaper to stand up than the last. By month nine, a workflow that would have taken a fortnight to codify in month one is a Tuesday afternoon. The substrate has become the operator the founder cannot afford to hire.

What compounds is not abstract productivity. It is the substrate's growing knowledge of this specific company, making each subsequent workflow cheaper to build and faster to reach quality. By the third or fourth workflow, the knowledge layer already carries the context that the first workflow had to be told. That is the structural argument for building early: not that the first workflow pays back quickly (it may not), but that every subsequent workflow is cheaper than the one before it, and the gap between what the substrate can produce and what a fresh chat session can produce widens with each quarter.

A context-mismatch flag is worth naming explicitly. Blomfield's audience is post-product-market-fit YC companies; the SpinUp Forge audience is pre-diligence UK academic spinouts. The architectural convergence between this piece's substrate and Blomfield's five-layer framing is intellectual, not contextual. The "first eighteen months", the "one-to-three-person founding team", and the "TTO timeline" are the contextual anchors that hold; YC's contextual anchors (demo day, Series A, post-PMF scale) do not transfer. That said, the rate at which AI-native operating practice is spreading from US software companies to other founder cohorts is fast enough to make this architectural pattern a likely baseline investor expectation within the next twelve to eighteen months. A UK academic spinout that has not begun to build the substrate by then is not just lagging the YC cohort; it is unprepared for the next institutional conversation. The argument is therefore not that academic founders should copy YC's playbook; it is that the assessment standard the YC playbook reaches first is the assessment standard their own investors will reach by the time they raise.

Two named levers

Neither requires new legislation; both require a decision that the execution gap is partly a management problem.

Neither requires new legislation.

The first is for TTOs and universities. Publish a standing Operator-in-Residence licence template: pre-negotiated, board-approved, with a documented equity band calibrated against the USIT for Software framework's 5-10% range for the software equivalent, and a six-month review gate. A TTO that has done this work once, in writing, removes the ad hoc renegotiation that currently makes inserting an external operator between licence execution and CEO hire difficult in practice. The USIT for Software framework exists; it needs a single additional schedule.

The second is for Innovate UK, in the context of the Venture Builder Pilot (the EOI for which closed 22 May 2026; results are expected in early June 2026). Ring-fence at least 10%, and up to 25%, of the £150,000 unit (£15,000 to £37,500) for a named operator-cost line: explicit permission, in the grant terms, to spend that portion on a non-academic operator rather than further technical milestones. This is a small change in eligibility wording. The programme's own stated purpose (closing the gap between validated customer discovery and investment readiness) already implies it is partly a commercial-execution problem. The unit economics are in the document. The cost line is not yet there. The stage-2 programme is nine months of investability-building only, not technical milestones, not further research. If the operator-cost line were present in the grant terms, those nine months would be precisely when to build the substrate described in this piece, and the founding teams that completed the programme with that line funded would arrive at the investor conversation with artefacts rather than intentions.

ARIA's Activation Partners Cohort 2 (applications closed 21 May 2026) is funding AI-in-Science infrastructure at the research end of the pipeline. What it is not funding (and what has no equivalent at scale) is the venture-build end: the substrate that lets a spinout move from a signed licence to a legible company, systematically. I leave the gap where it lands.

Running a larger company than the headcount admits to

From outside, this spinout is a two-person team; from inside the board pack, it is a company that ships on cadence.

From the outside, this spinout looks like a one-to-three-person team. From inside the board pack, it looks like a company that ships investor updates monthly, maintains a versioned financial model, runs a rolling customer-discovery synthesis, keeps an IP register the TTO can read, and reaches each milestone on the date the cap table named. That cadence is not free. It still costs money. Done well, though, it costs a fraction of what an equivalent full-time operating function would have cost a seed-stage spinout three years ago. The arithmetic is conditional on three things: that integration across fragmented tools is the bottleneck, that the procedure is amenable to codification, and that the founder was previously doing the work badly, inconsistently, or skipping it entirely. For work the founder already did well, the delta is approximately zero. The gain is in the gap, not everywhere. That gap is also where the substrate compounds, because each codified workflow returns founder time and the knowledge layer underneath every workflow accretes context that makes the next workflow cheaper to stand up than the last.

The case is to recognise the substrate, and to invest in it deliberately, early. That investment compounds in one direction. The founding teams who do it first will not be ahead because they used a particular model. They will be ahead because they built documented procedures early enough to have somewhere to put each improvement, and the team those procedures support is running a larger company than its headcount admits to.

Sourcing and assumption notes: The USIT for Software signatory count ("over 50 UK universities" by November 2024) counts framework signatories, not an audited measure of equity-structuring practice. The Empirical Ventures £10 million BBB Regional Angels commitment is sourced from the Tech.eu March 2026 report. The Midlands Engine £10 million over five years / £2 million per year figure is from the GOV.UK announcement linked above. The two-track time-axis diagram is illustrative: Piece 3 does not measure either track quantitatively, and the load-bearing point is the widening gap at the investor-conversation edge, not the precise endpoint values. The 0.2 FTE-2026 fractional-operator estimate is a calibrated estimate conditional on the three criteria stated in the text. METR 2025 RCT and the February and May 2026 updates are discussed in detail in Piece 2 of this series; METR drift is cited here as currency-evidence of how fast the tool surface has moved, not as standalone productivity justification for the FTE arithmetic. The OfS 45% deficit projection is from the OfS 2025/26 financial-sustainability modelling. The Tom Blomfield and Garry Tan quotations are verbatim from the primary YC venues linked in Sources; Blomfield's audience is post-PMF YC companies and the surrounding prose names the context-mismatch flag explicitly so the architectural convergence is read as intellectual rather than contextual. TTBEO 2026 came into force 30 April 2026 per SI 2026/369; contextual reference only, not load-bearing for any specific claim here.
