A day at XT26 — what fintech engineers are actually saying about AI

So I spent Thursday at XT26 over at Tower Hill, and I’m finally getting round to writing it up — coffee in hand, brain still slightly fried from the day. If you haven't come across XT26 before, it's a one-day conference put on by JUXT, aimed squarely at engineers working in financial services. It was easily worth the trip, and the through-line across the day was so consistent that I wanted to get my thoughts down before they faded.

The short version: not every talk was about AI, but the ones that were kept circling the same things. How do you actually use it effectively? How do you work out whether you can trust it? And the dawning realisation that it doesn’t so much create new problems as amplify the ones we already had — governance, hiring, productivity measurement, the cost of putting anything into production. Nobody had tidy answers, and I think that’s the honest place to be right now.

It was lovely to catch up with my manager Chris Perry and an ex-Morgan Stanley colleague of mine, Rakesh Nair, who I hadn't seen in ages. Half the value of a conference like this is the chats in between the talks, and this was no exception.

AI in equities — the promise is real, so are the problems

The first talk I caught was Will Bassett from HSBC, who runs equities and cross-asset financing tech. He's got something like 1,200 engineers under him, so when he says something's working or not working, it carries a bit of weight. His framing was that the capability of agentic coding tools has genuinely moved — that bit isn't hype — but the gap between what the tools can do in a demo and what an enterprise can safely roll out is huge. Governance, cost, security, vendor relationships, training, hiring. None of those bits are sexy, none of them are solved.

The graduate-hiring question landed hard for me. If juniors aren't writing the easy code any more, where do the next generation of senior engineers come from? Nobody in the room had a good answer, and I don't think we should pretend that we do.

Close to the metal — modern low-latency development

Tom Dellmann from Chronicle Software gave a properly old-school talk on low-latency systems, and after a morning of AI strategy this was a lovely palate cleanser. His core message was that low-latency engineering is about thinking in percentiles, not averages. Your p50 looks fine and your p99 is eating your lunch. It's tail latency that costs you money.

He went deep on the stuff you don't normally hear at conferences any more — serialisation tradeoffs, off-heap memory, eliminating garbage collection, CPU pinning, kernel tuning. Memory really does have to be treated as a first-class concern. But the bit I keep coming back to is actually a sideways thought. A lot of the reason firms still build these systems on the JVM is just skillset — Java is what people already know, what's easy to hire for, what the existing codebases are written in. With AI in the mix, that calculus might genuinely change. We could finally start picking the right tool for the job — Rust for this bit, something else for that bit — rather than defaulting to whatever language the team is most comfortable with. That feels like a genuinely interesting unlock, and one I hadn't really considered before Tom's talk pushed me to think about it.

Spectroscopy for software

Henry Garner, JUXT's CTO, gave one of those talks where I scribbled half a page of notes and then stopped writing because I just wanted to listen. Early on he referenced the METR study, where developers using AI assistants reported feeling around 20% more productive but were measured as being roughly 19% slower. It's the kind of result that makes you stop and check your own assumptions about how much the tooling is actually helping.

His real argument, though, was about legacy code. He defined legacy code as code that nobody has a theory of any more, which is a definition I want to steal. He drew a line back through Peter Naur's old "Programming as Theory Building" essay and forward to the question of whether AI can help us recover a theory of a system from the code itself — a sort of spectroscopy for software, where the model gives you a new representation of the thing you're already running.

A big part of the talk was Allium, an open-source tool JUXT have been working on that puts that idea into practice. The bit that really got me was when Henry showed it being used on the Apollo Guidance Computer source code — yes, that Apollo — and it surfaced a bug nobody had spotted. I thought that was a brilliant example, and honestly a pretty cool way to demonstrate what the tool can do.

Platform as a product in a 300-year-old bank

Abby Bangser from Syntasso and Joel King from NatWest took us through what it actually looks like to introduce internal platforms in a bank founded in 1727. They had 135 different deployment patterns across the bank at one point. One hundred and thirty-five. The mental image of that alone made me laugh and then wince.

Their journey was the now-familiar one — centralised ops to a DevOps free-for-all to a more grown-up platform-engineering model — but Joel was refreshingly honest about how long it actually takes, and how much of the work isn't technical at all. Getting a new application deployed used to take 8 to 12 weeks. They've got it down to less than an hour. That's the kind of number that changes how a business behaves. Abby's broader point, drawn from her work on the CNCF platform engineering whitepaper, was that the platform itself has to be treated as a product, with users, a roadmap, and a feedback loop — not as a side-of-desk infrastructure project.

Why coding agents fail to boost productivity

Nik Tkachev from JetBrains had the unenviable job of being honest about why agentic coding hasn't translated into the productivity numbers everybody hoped for. His answer was uncomfortably good. The gains are real but marginal at the individual level; the costs, on the other hand, are visible and rising — per-developer pricing has gone from something you didn't notice to something a department lead has to justify.

The bit that landed for me was his point about "in-thinking" time. When you're driving an agent rather than writing code yourself, you're spending your attention budget evaluating its output, deciding whether to accept, reviewing diffs. That isn't free, and at the moment we're not even measuring it properly. His advice was evolutionary, not revolutionary — start with autocomplete, move to multi-step, only then think about full automation. Skip the steps and you build distrust faster than you build value.

The slide above stuck with me. Nik laid out a team maturity model — L1 Exploring (a few individuals experimenting), L2 Scaling (most of the team uses AI but there's no shared setup and no shared practice), L3 Practicing (shared workflows, someone owns them), L4 Optimising (agents as a team capability, metrics-driven). His point was that most teams, including a lot of pretty serious ones, are stuck at L2 right now. Honestly, my own team is somewhere in that L2 bucket too. Everyone's using something, no two people are using it the same way, and we're trying to figure out how to get to L3 without burning everyone out in the process. It was oddly reassuring to see that drawn out on a screen and realise we're not alone.

Extracting reliable software from LLM loops

River Keefer from Antithesis made what I thought was the most quietly radical case of the day. Property-based testing — the idea you write a spec describing what should be true across a whole family of inputs, rather than picking a handful of examples — turns out to be a really good way of catching the things LLMs get subtly wrong.

He showed a lovely example where he asked a model for a bin-packing algorithm, was given a clever first-fit approximation, and only realised it wasn't an exact solution when his property-based test failed on a generated input. He paraphrased Dijkstra's 1978 essay on the "foolishness of natural language programming" — and honestly, watching a packed room nodding along to a 47-year-old paper was its own kind of moment.

Dark modules, cobots, and architecting for AI

Sam Newman closed out the bit of the day I saw, and honestly he was the highlight for me. Great stage presence, a real personality up there, and the kind of speaker who can land a serious point with a one-liner. If you ever get the chance to see him live, take it.

His framing was that we keep talking about AI as a uniform thing — either you let it write all the code or you don't — when actually the more sensible question is which modules of your system you let it own. He borrowed from robotics. Industrial robots work behind a cage, alone, dangerous, fast. Cobots work alongside humans, slower, safer, collaborative. The argument was that the same range applies to AI in our codebases — some modules can be "dark factories" where the AI runs unsupervised, others want a surgical cobot relationship where you're working hand in glove, and some you keep as straight human work.

The bit I really enjoyed was when he turned his eye on the humble pull request. We've all been doing PRs for so long that we barely question them any more, but he actually went back to first principles and asked: what are PRs for? He listed four reasons.

Correctness, which is really a job for your tests rather than a code review. Shared learning, where reviewer and author both pick up something. Alignment to strategy and practice, making sure the code fits the wider direction of travel. And an auditable unit of work, which is genuinely important for compliance and traceability. The question he then put to the room was the awkward one. If a chunk of your code was written by an AI, which of those four still apply? The AI isn't learning in the human sense. The auditability is still useful. The alignment is on you. So is the PR process you're running over AI-generated code actually serving any of those purposes, or are you just performing the ritual? I genuinely don't know what the answer is for my own work yet, but I haven't been able to stop chewing on the question since.

Modular architecture, he argued, is the best hedge we've got against not knowing how this all plays out. The maturity that matters isn't your tooling — it's your team's ability to think critically about what you're actually doing.

What I came away with

If there was a single thread tying the day together, it was this. The interesting questions about AI in software aren't about what the models can do — they can do a lot, and the rest will improve. The interesting questions are about us. How we govern it, how we cost it, how we measure productivity honestly, how we structure our systems so the bits we want humans on stay with humans. Almost every speaker, in their own way, was making a case for clearer thinking and better boundaries rather than for raw acceleration.

I'd love to hear how other people are dealing with this where they work. Are your teams seeing real productivity wins from agentic tools, or are you stuck in the same evaluation purgatory the rest of us seem to be in? Drop me a comment below — I'd genuinely like to compare notes.

One thought on “A day at XT26 — what fintech engineers are actually saying about AI”

Prashant Gandhi says:

23 June 2026 at 16:58

Thanks for joining us at XT26 Jas. Sorry we didn’t get a chance to connect in person at the event but looks like you got a lot out of it. For me, the biggest takeaway was that organisations need to spend time in building the decision infrastructure alongside the velocity infrastructure.

Loading...