TL;DR overview
- Claude Fable 5 built a capable Java module in 13 minutes with concurrency handling and its own test suite, but SonarQube Cloud's quality gate caught a HIGH-severity security vulnerability and insufficient coverage.
- The model defended against path traversal in filenames while missing an insecure temporary directory on the same upload feature, which is a gap between training-data patterns and OS-level domain knowledge.
- The findings included some of the same categories SonarQube catches in developer-written code: duplicated strings, deprecated APIs, insecure API calls. Existing quality infrastructure works for AI-generated pull requests without AI-specific configuration.
- AI coding agents introduce bugs non-deterministically; you can't predict which vulnerabilities will appear on a given run.
Claude Fable 5 launched on June 9, 2026, as Anthropic's most capable coding model, and we wanted to see what its output looks like when it works without a quality feedback loop. We gave it microsoft/gctoolkit, a real open-source Java codebase with JPMS modules, and asked it to build a REST API module from a single prompt while running no static analysis and no quality gate during the session. The model produced 1,222 lines of working code in roughly 13 minutes, and when we scanned the pull request with SonarQube Cloud, the quality gate failed because of a HIGH-severity security vulnerability and insufficient test coverage.
The experiment
We used Claude Code with the claude-fable-5 model in a clean session with no quality tools: no SonarQube MCP Server, no Agentic Analysis, and no CLAUDE.md rules file. The model worked entirely from training knowledge and what it read by browsing the codebase during the session, which uses JPMS modules that require precise dependency and export declarations. We gave it this prompt:
Add a REST API module to gctoolkit that lets users upload a GC log file via HTTP and get back the analysis results as JSON. Include endpoints for uploading a log, getting pause time stats, and heap occupancy data.
In about 13 minutes, the model consumed approximately 165,000 output tokens across 45 tool calls and produced 15 files (10 source, 2 test, 3 config/doc) totaling 1,222 lines. It read the codebase, chose a framework, built the module, wrote and ran tests, smoke-tested the running server with curl, and created the pull request in a single autonomous pass. This is one experiment with one task on one codebase, not a benchmark. We ran the task twice to test reproducibility, and both results are discussed under "Non-deterministic failure modes" below. Sonar's LLM Leaderboard evaluates code quality and security across models at larger scale.

The quality gate failed

An insecure temporary directory in the file upload handler (java:S5443, HIGH severity, security impact) drove the security rating to D. Coverage landed at 76.7% against an 80% threshold because the model wrote tests that validated behavior (does the API return the right JSON?) while leaving enough branches unexercised to miss the bar, and without quality gate feedback during the session it had no way to know the target. SonarQube Cloud found 10 issues total across the pull request.
What the model actually built
Fable 5 created a proper JPMS module (com.microsoft.gctoolkit.restapi) with correct requires, exports, and provides declarations. It discovered gctoolkit's existing Vert.x dependency and reused it at the same version (5.0.12) rather than introducing a new framework. It also built new aggregation classes that retain individual pause durations for percentile computation, because it recognized that the sample module's PauseTimeSummary only tracks a running total. It wrote nine tests across two classes, including a full end-to-end flow covering upload through query, and then went beyond the automated test suite by starting the server, uploading a real GC log via curl, and verifying the JSON responses before committing. The 0.0% duplication score on new code indicated that the module's 1,222 lines contained no copy-paste artifacts.
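To illustrate why a running total is not enough for percentiles, here is a minimal sketch of a percentile computed from retained pause durations. The class name PausePercentiles and the nearest-rank method are assumptions for illustration; the article does not show which percentile method the model's aggregation classes use.

```java
import java.util.Arrays;

/** Hypothetical helper: percentiles need every sample, not just a sum. */
public final class PausePercentiles {
    private PausePercentiles() {}

    /**
     * Nearest-rank percentile over retained pause durations.
     * @param pauseMillis individual pause durations (not modified)
     * @param p percentile in the range (0, 100]
     */
    public static double percentile(double[] pauseMillis, double p) {
        if (pauseMillis.length == 0) {
            throw new IllegalArgumentException("no pauses recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        double[] sorted = pauseMillis.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length); // 1-based rank
        return sorted[rank - 1];
    }
}
```

A summary that only accumulates a total can report a mean, but it cannot recover the sorted order this calculation depends on.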
The insecure temporary directory
In the upload handler, the model creates a staging directory for incoming GC log files before analysis:
// RestApiServer.java, lines 114-122
private void handleUpload(RoutingContext ctx) {
    Path workDirectory = null;
    try {
        workDirectory = Files.createTempDirectory("gctoolkit-restapi"); // ← S5443
        Path logFile = stageUploadedLog(ctx, workDirectory);
        AnalysisResult result = analysisService.analyze(logFile, logFile.getFileName().toString());
        ctx.response().putHeader("Location", "/api/logs/" + result.getId());
        respondJson(ctx, 201, describe(result));
    } catch (BadRequestException e) {
Files.createTempDirectory("gctoolkit-restapi") looks reasonable because the method name says "temporary" and the prefix includes the application name. However, the single-argument variant, java.nio.file.Files.createTempDirectory(String), always delegates to the operating system's default temporary directory (/tmp on Linux, /var/folders/... on macOS, %TEMP% on Windows). On Linux, /tmp is world-writable with the sticky bit set, meaning any local process can create files there, and a TOCTOU (time-of-check-time-of-use) race condition is possible for processes running as the same user. macOS and Windows use per-user temp directories by default, but containers and CI environments often share a single /tmp across processes, which is where the risk is highest.
Between the time the directory is created and the time the uploaded file is written into it, an attacker on the same host can exploit a race condition by creating a symlink at the expected file path before the application writes, redirecting the output to an attacker-readable location. GC logs can contain JVM command-line arguments including database connection strings and API keys. An attacker could also pre-populate the directory with a crafted file, causing GCToolKit to parse attacker-controlled content instead of the real upload. The risk is highest when the REST API runs on a shared host, in a container with a shared /tmp volume, or in a CI/CD environment where multiple processes share the temp directory.
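One common mitigation for the symlink and pre-population scenarios described above is to create the staged file exclusively, so the write fails if anything already exists at the target path. The sketch below is an illustration under assumptions, not code from the pull request: ExclusiveStaging and writeNew are hypothetical names, and the file name is assumed to have been sanitized already.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: stage an upload without overwriting or following an existing entry. */
public final class ExclusiveStaging {
    private ExclusiveStaging() {}

    public static Path writeNew(Path directory, String safeName, InputStream upload)
            throws IOException {
        Path base = directory.normalize();
        Path target = base.resolve(safeName).normalize();
        // Reject names that resolve outside the staging directory.
        if (!target.startsWith(base)) {
            throw new IOException("file name escapes the staging directory");
        }
        // CREATE_NEW fails with FileAlreadyExistsException if a file or symlink
        // is already present at the target path, instead of writing through it.
        try (OutputStream out = Files.newOutputStream(
                target, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            upload.transferTo(out);
        }
        return target;
    }
}
```

Exclusive creation narrows the window for a planted file, but it complements rather than replaces a staging directory that other users cannot write to.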
SonarQube Cloud flagged this as java:S5443 (HIGH severity, security impact), mapping it to OWASP Top 10 2021 A1 (Broken Access Control), CWE-377 (Insecure Temporary File), and CWE-379 (Creation of Temporary File in Directory with Insecure Permissions).

On the same upload flow, the model implemented sanitizeFileName() to handle null inputs, path traversal characters, whitespace, and degenerate cases like "...". It built a careful defense against malicious filenames (a thoroughly documented attack vector in training data) while missing the insecure temp directory on the same feature. The difference is that path traversal is a code-level input pattern the model has seen thousands of times, while OS-level race conditions in publicly writable directories require domain knowledge about how operating systems handle concurrent file access. Files.createTempDirectory("gctoolkit-restapi") compiles, passes every test, and would likely survive code review because the vulnerability only manifests under adversarial conditions on a shared host. SonarQube flags the createTempDirectory(String) call automatically because the single-argument variant defaults to the system temp directory.
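For the directory-creating call specifically, a hardened variant might look like the sketch below. It is an illustration under assumptions rather than the fix applied to the pull request: SecureStaging and the privateParent argument are hypothetical, and the POSIX permission attribute only works on file systems that support it (on Windows these calls throw UnsupportedOperationException).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical helper: owner-only staging directory under an application-controlled parent. */
public final class SecureStaging {
    private SecureStaging() {}

    public static Path createStagingDirectory(Path privateParent) throws IOException {
        FileAttribute<Set<PosixFilePermission>> ownerOnly =
            PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------"));
        // Create the parent (if missing) with owner-only access instead of relying on /tmp.
        Files.createDirectories(privateParent, ownerOnly);
        // The three-argument overload places the directory under the given parent.
        return Files.createTempDirectory(privateParent, "gctoolkit-restapi", ownerOnly);
    }
}
```

In the upload handler, the single-argument call would be replaced by something like SecureStaging.createStagingDirectory(configuredParent), where configuredParent is a path the deployment controls.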
The compliant fix from the S5443 rule definition uses either a secure parent directory or restrictive POSIX permissions:
// Compliant: specify a secure parent directory
File.createTempFile("prefix", "suffix", new File("/mySecureDirectory"));
// Compliant: set restrictive POSIX permissions
FileAttribute<Set<PosixFilePermission>> attr =
    PosixFilePermissions.asFileAttribute(PosixFilePermissions.fromString("rwx------"));
Files.createTempFile("prefix", "suffix", attr);
What the model got right about thread safety
GCToolKit's Javadoc warns that the API is not thread safe, and the model read this documentation and built a multi-layer concurrency strategy tailored to the library's specific contract rather than applying a generic synchronization wrapper. In its own summary:
Thread safety: GCToolKit's API isn't thread-safe and fills the registered aggregation instances in place, so AnalysisService creates fresh instances per request, serializes analyses (synchronized + ordered Vert.x blocking handler), and freezes results into immutable JSON at analysis time so reads are lock-free.
Thread safety required understanding a constraint that was explicitly documented in the codebase's Javadoc, and the model translated it into a concrete design. The insecure temp directory required understanding a constraint that lives in the operating system's security model, outside any source file the model could read during the session.
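The three layers in the model's summary (fresh instances per request, serialized analyses, results frozen at analysis time) can be sketched in plain Java. This is a simplified illustration, not the model's AnalysisService: SerializedAnalysisSketch and PauseCollector are hypothetical stand-ins, and Vert.x and GCToolKit are left out so the example stays self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class SerializedAnalysisSketch {

    /** Stand-in for a mutable aggregation that is not thread safe. */
    static final class PauseCollector {
        private final List<Double> pauses = new ArrayList<>();
        void record(double pauseMillis) { pauses.add(pauseMillis); }
        List<Double> freeze() { return List.copyOf(pauses); } // immutable snapshot
    }

    // Lock on a private final field so no outside code can contend on it.
    private final Object analysisLock = new Object();
    private final Map<String, List<Double>> frozenResults = new ConcurrentHashMap<>();

    /** Runs one analysis at a time, each with its own collector instance. */
    public void analyze(String id, double[] pauseMillis) {
        synchronized (analysisLock) {
            PauseCollector collector = new PauseCollector(); // fresh per request
            for (double pause : pauseMillis) {
                collector.record(pause);
            }
            frozenResults.put(id, collector.freeze());
        }
    }

    /** Reads take no lock because stored results never change after analysis. */
    public List<Double> pauses(String id) {
        return frozenResults.getOrDefault(id, List.of());
    }
}
```

The design trades analysis throughput for safety: only one analysis runs at a time, while any number of readers can query completed results concurrently.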
The remaining nine findings
Beyond S5443, SonarQube Cloud flagged nine code smells.
The duplicated string constants (S1192) would cause partial-update bugs if the JSON field name or API path ever changed, and extracting them into constants is a one-line fix per instance. The eager logger concatenation (S3457) matters most for the two calls at Level.FINE in the temp directory cleanup path, where string building runs on every failed file deletion even though FINE is disabled by default in java.util.logging and the message is never written, so the application allocates and discards string objects for zero benefit whenever cleanup hits a deletion failure under production load.
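The cost of eager concatenation at a disabled level can be shown with a small java.util.logging sketch. The class CleanupLogging and its counter are hypothetical and exist only to make the difference observable; the example assumes the default logging configuration, in which FINE is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public final class CleanupLogging {
    private static final Logger LOGGER = Logger.getLogger(CleanupLogging.class.getName());

    /** Counts how often the message string is actually built (illustration only). */
    public static int messagesBuilt = 0;

    private CleanupLogging() {}

    private static String describe(String path) {
        messagesBuilt++;
        return "Could not delete " + path;
    }

    /** Eager: the message is built before the logger checks whether FINE is enabled. */
    public static void eager(String path) {
        LOGGER.log(Level.FINE, describe(path));
    }

    /** Lazy: the Supplier only runs if FINE is enabled. */
    public static void lazy(String path) {
        LOGGER.log(Level.FINE, () -> describe(path));
    }

    /** Parameterized: the template is only formatted if the record is published. */
    public static void parameterized(String path) {
        LOGGER.log(Level.FINE, "Could not delete {0}", path);
    }
}
```

With FINE disabled, eager() still builds the string on every call, while lazy() and parameterized() skip that work.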
For S1874, the model explicitly read DateTimeStamp.java earlier in the session by grepping for the class definition, which means it presumably saw the @Deprecated annotation on getTimeStamp() before using the deprecated method in two places anyway. Models appear to process method signatures and return types more strongly than deprecation annotations when selecting which APIs to call, which is worth accounting for when reviewing AI-generated code that interacts with unfamiliar libraries.
Non-deterministic failure modes
We ran this experiment twice with the same model, codebase, and prompt. Both runs chose Vert.x, both failed the quality gate with 9-10 issues, and the maintainability findings (empty constructors, deprecated APIs, and string literal duplication) recurred across both runs. The higher-severity findings, however, did not overlap: the first run introduced a concurrency bug (java:S2445, synchronizing on a method parameter) while handling temp files safely, and the second run avoided the concurrency issue but introduced the insecure temp directory. Across these two runs, the model generated different higher-severity bugs each time while repeating the same structural patterns.
Every finding in this experiment falls into a category that tests alone cannot catch because the issues require conditions testing doesn't simulate: adversarial actors on a shared host for S5443, library deprecation cycles measured in months or years for S1874, production-scale request volume for the S3457 logger concatenation where a disabled log level turns string building into pure waste. A deterministic quality gate catches whatever comes through regardless of which specific vulnerabilities appear on a given run.
The missing feedback loop
Fable 5 didn't know the quality bar because it had no runtime access to the project's quality rules and no way to check its work against them. Training knowledge was strong enough to produce a working JPMS module with thoughtful concurrency handling and filename sanitization, but not specific enough to catch an insecure temp directory API that maps to OWASP A1. With a feedback loop through the SonarQube MCP Server or SonarQube Agentic Analysis, the model would have had the finding and the documented fix before the pull request.
For teams already running SonarQube, these are familiar findings. Duplicated string constants, insecure temp APIs, empty constructors, eager logger concatenation, and deprecated method calls are the same categories of issues that SonarQube catches in developer-written code every day, and the same quality gate caught them here without any AI-specific configuration.

