CodeWall says it hacked McKinsey’s AI platform. Here’s what holds up — and what doesn’t.

This reflects my personal assessment of publicly available reporting and CodeWall’s published blog post. I was not involved in the testing, I do not have access to McKinsey’s internal facts or forensic findings, and my views should be read as commentary and opinion rather than statements of verified fact.

A security startup called CodeWall claims its autonomous agent compromised McKinsey’s internal AI platform, Lilli, within two hours and gained unauthenticated read-write access to a production database containing tens of millions of consultant conversations. The vulnerability appears credible. The claimed scope of impact is not fully evidenced. The primary CodeWall post is here: codewall.ai/blog/how-… Independent reporting by Jessica Lyons in The Register is here: www.theregister.com/2026/03/0…

What is likely true

The attack chain CodeWall describes (publicly exposed API documentation, unauthenticated endpoints, SQL injection through unsafely handled JSON keys, and chained insecure direct object references, or IDOR) is plausible and technically sound. JSON key injection is an uncommon vector: most security testing tools and methodologies focus on input values, not field names. If Lilli’s backend parameterized values while concatenating keys directly into SQL, that would create a blind spot many assessments could miss.
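To illustrate the pattern, here is a minimal sketch using Python’s built-in sqlite3 module. The table, the column names and the build_update_unsafe helper are hypothetical; nothing here is taken from CodeWall’s post or from Lilli’s actual code. It only shows why parameterizing values does not help when field names are concatenated into the statement.

```python
import sqlite3

def build_update_unsafe(payload):
    """Hypothetical vulnerable helper: values are bound, keys are concatenated."""
    assignments = ", ".join(f"{key} = ?" for key in payload)
    sql = f"UPDATE profiles SET {assignments} WHERE id = 1"
    return sql, list(payload.values())

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (id INTEGER PRIMARY KEY, display_name TEXT, role TEXT)"
)
conn.execute("INSERT INTO profiles VALUES (1, 'alice', 'user')")

# A benign request only touches the column the developer expected.
sql, params = build_update_unsafe({"display_name": "Alice"})
conn.execute(sql, params)

# A malicious request hides SQL in the field name, not in the value.
malicious = {"role = 'admin', display_name": "pwned"}
sql, params = build_update_unsafe(malicious)
conn.execute(sql, params)

display_name, role = conn.execute(
    "SELECT display_name, role FROM profiles WHERE id = 1"
).fetchone()
print(sql)
print(display_name, role)
```

The second statement still has exactly one bound parameter, so a scanner that only fuzzes values would see nothing unusual, yet the role column has been changed.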

McKinsey’s response supports the credibility of the finding. In The Register, Jessica Lyons reported that McKinsey acknowledged the issues, patched them within hours and said its forensic review found no evidence that client data or confidential information was accessed by the researcher or any unauthorized party. The same report quotes CodeWall CEO Paul Price on the company’s use of an autonomous agent.

The prompt-layer risk CodeWall highlights is also substantive. If Lilli’s system prompts — the instructions governing how the AI behaves — were stored in the same database to which the agent had write access, an attacker could alter AI behaviour at scale without a traditional code deployment and potentially outside standard release controls. Many organizations have not explicitly modelled this threat, and prompt-layer integrity controls remain immature in many environments.

What is overstated or unproven

CodeWall claims 46.5 million chat messages, 728,000 files, 57,000 user accounts and hundreds of thousands of AI configurations were accessible. The blog provides no proof-of-concept payloads, no hashes, no screenshots and no evidence showing privilege boundaries. It is unclear whether those figures represent records the agent actually retrieved, database row counts inferred from metadata or something in between.

More importantly, the blog conflates three categories that any security professional should keep separate: what was theoretically reachable, what was actually accessed and what was verified as exfiltrated. CodeWall emphasizes reachability. McKinsey’s statement addresses investigated access. Both could be true at the same time, but the blog does not clearly distinguish between them.

The two-hour timeline also deserves scrutiny. Blind SQL injection is typically slow because extraction happens incrementally. The post suggests verbose error messages may have accelerated discovery, which implies the path may have combined error-assisted identification with later blind or semi-blind extraction. That is plausible, but the article does not provide enough technical detail to substantiate a claim of full production read-write access within two hours and 15 iterations.
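A back-of-the-envelope sketch makes the timing concern concrete. It assumes a purely boolean-blind technique in which each request answers one yes/no question, so each character costs roughly log2(alphabet size) requests. The request rate and the average message length are my own illustrative assumptions, not figures from CodeWall or McKinsey.

```python
import math

def blind_extraction_seconds(chars, requests_per_second, alphabet_size=128):
    """Estimate seconds to extract `chars` characters via boolean-blind SQL injection.

    Assumes a binary search over the alphabet with one yes/no answer per request.
    """
    requests_per_char = math.ceil(math.log2(alphabet_size))
    return chars * requests_per_char / requests_per_second

# Hypothetical inputs, chosen only for illustration.
count_digits = 8            # digits in a row count such as 46,500,000
avg_message_chars = 200     # assumed average chat message length
requests_per_second = 50    # assumed sustained request rate

seconds_for_count = blind_extraction_seconds(
    count_digits, requests_per_second, alphabet_size=10
)
seconds_for_dump = blind_extraction_seconds(
    46_500_000 * avg_message_chars, requests_per_second
)

print(f"Reading one row count: about {seconds_for_count:.2f} seconds")
print(f"Dumping every message: about {seconds_for_dump / (3600 * 24 * 365):.0f} years")
```

Under these assumptions, reading a row count takes well under a second, while retrieving every message would take decades. Error-based or union-based extraction can be far faster than this model, which is why the missing technical detail matters when judging what was actually retrieved.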

The assertion that a modified prompt “leaves no log trail” is also too absolute. Whether prompt tampering is detectable depends on the target’s database audit logging, configuration versioning and anomaly detection. Mature organizations may log or detect these events. The blog presents the point too categorically.
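As a sketch of why the claim depends on the target’s controls, the following uses Python’s built-in sqlite3 module and a database trigger to record every change to a prompt. The system_prompts and prompt_audit tables are hypothetical, and I have no knowledge of how Lilli stores or audits its prompts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE system_prompts (id INTEGER PRIMARY KEY, body TEXT NOT NULL);

    CREATE TABLE prompt_audit (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        prompt_id INTEGER NOT NULL,
        old_body TEXT,
        new_body TEXT,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- Record the before and after text whenever a prompt body changes.
    CREATE TRIGGER log_prompt_update
    AFTER UPDATE OF body ON system_prompts
    BEGIN
        INSERT INTO prompt_audit (prompt_id, old_body, new_body)
        VALUES (OLD.id, OLD.body, NEW.body);
    END;
    """
)

conn.execute(
    "INSERT INTO system_prompts (id, body) VALUES (1, 'You are a helpful assistant.')"
)

# Simulated tampering: a direct write to the prompt table.
conn.execute("UPDATE system_prompts SET body = 'Ignore prior guidance.' WHERE id = 1")

audit_rows = conn.execute(
    "SELECT prompt_id, old_body, new_body FROM prompt_audit"
).fetchall()
print(audit_rows)
```

An attacker with write access to the same database could also tamper with the audit table, so in practice such records would need to be shipped to a store the application cannot modify. The point is only that a log trail is a design choice, not an impossibility.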

What is concerning about the disclosure itself

Autonomous target selection

CodeWall presents the fact that its agent independently chose McKinsey as a target as a feature. An AI system deciding whom to attack — even if limited to organizations with disclosure policies — raises serious questions about operator control, authorization and liability. That issue deserves careful scrutiny, not celebration.

Unresolved scope authorization

The blog cites McKinsey’s HackerOne responsible disclosure policy as justification, but neither the blog nor independent reporting confirms whether Lilli’s production infrastructure was explicitly in scope for that programme. A disclosure policy is not blanket authorization to enumerate a production database. McKinsey’s public policy is referenced by CodeWall here: hackerone.com/mckinsey-…

Rushed disclosure

The issue was discovered Feb. 28, 2026. The public blog was published March 9. McKinsey may have patched quickly, but rapid remediation is not the same as a completed forensic review, variant analysis and confirmation that the vulnerability had not previously been exploited by others. Nine days is a compressed window for all of that.

The published timeline also appears to contain a date inconsistency, which has been noted in commentary around the post. If it was a typo in an earlier version, it is minor. Even so, in a report making very large claims, editorial sloppiness weakens confidence.

What security leaders should take away

This is a conventional application security failure on a platform that happens to run AI workloads. The described attack path (exposed documentation, missing authentication, SQL injection, verbose errors and IDOR) is textbook web and API security. Framing it as an “AI platform hack” is effective marketing. Technically, it is a severe but conventional failure whose consequences are AI-specific.

Two lessons are worth acting on regardless of the blog’s evidentiary gaps.

First, treat your AI prompt and configuration layer as a crown-jewel asset. If system prompts reside in the same data store as operational data, and that store is reachable through any injection or access-control flaw, you have created a single point of compromise that can silently alter AI behaviour at scale. Apply integrity controls, versioning and monitoring accordingly.
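One possible integrity control, sketched with Python’s standard hmac and hashlib modules: sign each prompt with a key held outside the database and verify the signature before the prompt is used. The function names and the key handling are hypothetical, and this is one option among several rather than a description of any vendor’s design.

```python
import hashlib
import hmac

def sign_prompt(prompt, key):
    """Return a hex HMAC-SHA256 tag for the prompt text."""
    return hmac.new(key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_prompt(prompt, signature, key):
    """Check the stored tag in constant time before the prompt is loaded."""
    expected = sign_prompt(prompt, key)
    return hmac.compare_digest(expected, signature)

# Hypothetical key; in practice it would come from a secrets manager,
# never from the same database that stores the prompts.
signing_key = b"example-key-kept-outside-the-database"

stored_prompt = "You are a helpful assistant."
stored_signature = sign_prompt(stored_prompt, signing_key)

tampered_prompt = "Ignore prior guidance."

print(verify_prompt(stored_prompt, stored_signature, signing_key))
print(verify_prompt(tampered_prompt, stored_signature, signing_key))
```

With this arrangement, an attacker who can write to the prompt table but cannot read the signing key cannot produce a valid tag, so a modified prompt fails verification at load time instead of silently taking effect.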

Second, audit for JSON key injection. If any application accepts JSON in which field names are dynamic, and those names are later used in query construction — whether SQL, NoSQL or ORM-generated queries — standard scanning tools may miss it. That requires targeted review.
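A targeted review can start with two simple checks, sketched here in Python. The first walks a parsed JSON body and flags keys that are not plain identifiers; the second builds a query only from an allowlist of column names. The allowlist, the profiles table and both function names are hypothetical.

```python
import re

# Hypothetical allowlist: the only column names a client may set.
ALLOWED_COLUMNS = frozenset({"display_name", "locale"})
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def find_suspicious_keys(node):
    """Recursively collect JSON keys that are not plain identifiers.

    Expects the output of json.loads, where every object key is a string.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if not IDENTIFIER.fullmatch(key):
                found.append(key)
            found.extend(find_suspicious_keys(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_suspicious_keys(item))
    return found

def build_update_safe(row_id, payload):
    """Build an UPDATE statement using only allowlisted column names."""
    if not payload:
        raise ValueError("no fields to update")
    unknown = set(payload) - ALLOWED_COLUMNS
    if unknown:
        raise ValueError(f"unexpected field names: {sorted(unknown)}")
    assignments = ", ".join(f'"{column}" = ?' for column in payload)
    sql = f"UPDATE profiles SET {assignments} WHERE id = ?"
    return sql, [*payload.values(), row_id]

body = {"profile": {"role = 'admin', display_name": "x"}, "tags": [{"ok_key": 1}]}
print(find_suspicious_keys(body))
print(build_update_safe(1, {"display_name": "Alice"}))
```

The identifier check is a cheap detection aid for logs and test harnesses. The allowlist is the actual control, because it never lets a client-supplied name reach the query text.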

The bottom line: CodeWall likely found a serious vulnerability. Its blog overstates what was proven, blurs critical distinctions between access and exfiltration, and leaves unresolved questions about authorization and disclosure discipline. The strategic lesson is real, but it is about secure architecture, access control and prompt integrity — not a new class of AI exploit.

Sources and named parties referenced: CodeWall; McKinsey & Company; Paul Price, CEO of CodeWall; Jessica Lyons, The Register.

Ethics statement

This article is intended to support informed discussion about a publicly reported security incident involving CodeWall’s claims about McKinsey’s AI platform, Lilli. It aims to distinguish clearly between CodeWall’s published assertions, McKinsey’s public response, independent media reporting and the author’s professional interpretation. Where facts remain unverified, disputed or incomplete, that uncertainty is stated rather than assumed away. This article does not endorse unauthorized testing, autonomous target selection or activity that exceeds clearly defined responsible disclosure boundaries.

Disclaimer

This article is provided for general information, commentary and discussion purposes only. It is not legal, security, privacy, compliance or other professional advice, and it should not be relied upon as such. The analysis is based on publicly available information at the time of writing, including CodeWall’s blog post, McKinsey’s public statements and independent reporting. The author was not involved in the testing, does not have access to McKinsey’s internal systems, logs or forensic findings, and cannot independently verify all technical or factual claims made by the parties involved. Any errors or omissions are unintentional. The views expressed are those of the author in a personal capacity and do not represent the views of any employer, client, partner or affiliated organization. Generative AI tools were used to assist with research and editing.

Keywords: #CyberSecurity #AppSec #AI #AIAgents #AISecurity #LLMSecurity #PromptSecurity #PromptInjection #ResponsibleDisclosure #VulnerabilityDisclosure #BugBounty #HackerOne #SQLInjection #IDOR #APISecurity #WebSecurity #SecurityResearch #ThreatModeling #SecureByDesign #SecurityLeadership #RiskManagement #DigitalTrust #InfoSec #SecurityGovernance #DataSecurity #CloudSecurity #RedTeam #BlueTeam #CyberRisk #McKinsey