Pendraic Academy

Choosing an AI model for fiction writing: Claude, GPT-5, and Gemini compared

The honest answer to "which AI model is best for fiction writing" is: it depends on the task, and a serious workflow uses more than one. That is not a hedge. It is the single most useful insight a novelist can take into model selection. Drafting fresh prose is a different job from polishing a paragraph, which is a different job from auditing a 90,000-word manuscript for continuity, and the frontier models have different strengths at each. A workflow that uses Claude for everything pays for capabilities it does not need on the cheap tasks and gets slightly worse output on the structured ones. A workflow that uses GPT-5 for everything saves money on drafting and loses voice fidelity that matters.

Here is the comparison by task, with current versions and current pricing, so you can build the right portfolio.

The three models in scope

As of mid-2026, the three frontier options for fiction writing in BYO-AI workflows are:

Anthropic's Claude Sonnet 4.6. Roughly $3 input, $15 output per million tokens. 200,000-token context window.
OpenAI's GPT-5. Roughly $1.25 input, $10 output per million tokens. 400,000-token context window.
Google's Gemini 2.5 Pro. Pricing in the same range as GPT-5 for standard-length prompts, with a higher rate tier for very long prompts. 1,000,000-token context window.

All three are capable of producing fiction prose. None of them is the right choice for every part of the pipeline. Pricing moves; check the provider pages before committing. The relative strengths described below have held across the last few generations of frontier releases and are unlikely to reverse overnight.

Drafting fresh prose

This is the part most writers care about: cold-start scene generation from a scene plan and a voice profile.

Claude Sonnet 4.6 currently produces the most naturalistic prose with the best voice fidelity when given a strong voice prompt. The sentence rhythm tends to vary more out of the box. The model is willing to write quiet scenes without padding them. It handles ambiguity and emotional restraint better than the alternatives. It is also more willing to admit when a request is unclear, which on a long project is a feature, not a bug.

GPT-5 is faster on raw generation and produces more uniform output. The prose is clean and competent. It tends toward the middle distance: workmanlike sentence rhythm, safe vocabulary, fewer surprises in either direction. For genre fiction with plot-forward demands, this can be exactly what you want. For literary work where voice does most of the lifting, the uniformity can flatten things you do not want flattened.

Gemini 2.5 Pro sits between the two. The prose is reasonable, if slightly more clinical than Claude's. The model is occasionally more rigid in its interpretation of the scene plan, which produces faithful execution but sometimes loses texture you would have asked for had you thought to. Useful for tight plotting; less useful for atmosphere.

The honest summary for drafting: Claude is the default first pick for prose that should sound like a specific writer. GPT-5 is the right pick for plot-heavy work where uniform competence matters more than voice. Gemini is the right pick when you need to draft against a very long context (more on that below) and prose voice is secondary.

Polishing and line edits

A different job. You are not writing new material. You are tightening sentences, swapping verbs, fixing rhythm, removing AI-tells, restoring voice where the draft drifted.

GPT-5 is fast and decent here. The model is good at small surgical changes when given specific instructions. It is also cheaper, which matters for bulk operations like polishing every scene in a 90,000-word manuscript.

Claude is better when the polish involves voice preservation. If your polish brief is "remove em dashes, ban these phrases, vary the rhythm, but do not change the voice," Claude reads the voice instruction more reliably and is less likely to drift the prose toward its own default register.

Gemini is competent at line edits but not differentiated. If you are already using one of the other two for drafting, there is rarely a reason to introduce a third just for polish.

The right portfolio move here is GPT-5 for cheap structural polish (removing AI-tells, normalizing punctuation, fixing rhythm) and Claude for voice-preservation passes where the line edit has to stay close to the writer's own register.
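Whichever model runs the structural polish, its output is cheap to check mechanically before you accept it. A minimal Python sketch, assuming the scene is plain text and that you maintain your own ban list (the three phrases below are placeholders, not a recommended list): it reports every banned phrase, matched case-insensitively on word boundaries, and every em dash, with line and column.

```python
import re
from dataclasses import dataclass

# Placeholder ban list (hypothetical); replace with your own phrases.
BANNED_PHRASES = ["a testament to", "tapestry", "couldn't help but"]
EM_DASH = "\u2014"


@dataclass
class Hit:
    line: int     # 1-based line number
    column: int   # 1-based column where the match starts
    match: str    # the text that matched


def scan(text: str, phrases: list[str]) -> list[Hit]:
    """Report each banned phrase (case-insensitive, whole words) and each em dash."""
    patterns = [
        re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
        for phrase in phrases
    ]
    patterns.append(re.compile(EM_DASH))  # em dashes are flagged wherever they appear
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for pattern in patterns:
            for m in pattern.finditer(line):
                hits.append(Hit(line_no, m.start() + 1, m.group(0)))
    # Sort so the report reads top to bottom, left to right.
    hits.sort(key=lambda h: (h.line, h.column))
    return hits


if __name__ == "__main__":
    scene = "It was a testament to her patience\u2014or her fear.\nThe tapestry of the town."
    for hit in scan(scene, BANNED_PHRASES):
        print(hit)
```

Running the check over the polished scene, rather than trusting the model's own report of what it removed, means a pass that silently misses a phrase still gets caught.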

Structural and long-context audits

This is the task most writers underestimate: reading the whole manuscript and looking for continuity errors, contradictions, unfired Chekhov's guns, character drift, and timeline problems. A long-context model can do this in a single pass, holding the whole book in view at once in a way no human reader can.

Gemini 2.5 Pro has the longest context window in the comparison set. A 100,000-word manuscript fits comfortably in its working memory with room left for instructions and prior audit notes. For "read the whole book and tell me what is inconsistent," Gemini is currently the right choice on pure context-size grounds.

Claude is also strong here. Its 200,000-token window is enough for most novels in a single pass, and audit quality within that window is excellent. The model is willing to say "I could not find what you asked about," which on an audit pass is critical: a model that confabulates findings is worse than no audit at all.
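Whether a manuscript fits is simple arithmetic once you convert words to tokens. A rough sketch, assuming about 1.3 tokens per English word (a common rule of thumb; the real ratio depends on each provider's tokenizer and on your prose) and a hypothetical 20,000-token reserve for instructions, prior audit notes, and the model's reply. The window sizes are the ones quoted in this article; check provider documentation for current limits, including any separate caps on input and output.

```python
# Assumption: roughly 1.3 tokens per English word. Tokenizers differ by provider.
TOKENS_PER_WORD = 1.3

# Context windows as quoted in this article; verify against provider docs.
CONTEXT_WINDOWS = {
    "Claude Sonnet 4.6": 200_000,
    "GPT-5": 400_000,
    "Gemini 2.5 Pro": 1_000_000,
}


def estimate_tokens(word_count: int) -> int:
    """Rough token estimate for a manuscript of `word_count` words."""
    return round(word_count * TOKENS_PER_WORD)


def models_that_fit(word_count: int, reserve_tokens: int = 20_000) -> list[str]:
    """Models whose window holds the manuscript plus a reserve for
    instructions, prior audit notes, and the audit response itself."""
    needed = estimate_tokens(word_count) + reserve_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


if __name__ == "__main__":
    for words in (100_000, 180_000, 400_000):
        print(words, estimate_tokens(words), models_that_fit(words))
```

Under these assumptions a 100,000-word manuscript needs about 150,000 tokens including the reserve and fits all three windows, while a 180,000-word manuscript fits only GPT-5 and Gemini.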

GPT-5's 400,000-token window is twice the size of Claude's, but for novel-length audits the two are comparable in practice, and the model is solid at structured-finding output. It is slightly less willing than Claude to admit when it cannot find something, which is worth noting if you are running the audit unsupervised.

The portfolio move here: Gemini for the longest manuscripts where every chapter must fit in a single context, Claude for novels at standard length where audit quality and honest "I do not know" responses matter, GPT-5 as a fast second-opinion pass.

Cost-sensitive bulk operations

Some pipeline steps are not creative. Generating placeholder names. Extracting entities from prose to populate a Story Index. Running a final ban-list scan. These steps are bulk and tedious, and their output is structured or low-stakes.

GPT-5 wins on raw $/token for output-heavy work. Gemini is similar. Either is the right choice for bulk operations. Claude is overkill on cost for tasks that do not need its voice fidelity.

A workflow that uses Claude only for the high-voice tasks (drafting, voice-preservation polish, nuanced audits) and routes the bulk operations to GPT or Gemini will cost noticeably less than a Claude-only workflow without sacrificing where it matters.
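The saving is easy to estimate from the per-million-token prices quoted at the top of this article. A sketch with a hypothetical workload of 90 scenes: the token counts per call are invented for illustration, not measured, and Gemini is left out because only its relative pricing is given above.

```python
from typing import Callable

# USD per million tokens (input, output), as quoted in this article. Prices move.
PRICES = {
    "Claude Sonnet 4.6": (3.00, 15.00),
    "GPT-5": (1.25, 10.00),
}

# Hypothetical workload for one 90-scene manuscript. Token counts are illustrative.
# (task, calls, input tokens per call, output tokens per call, needs voice fidelity)
WORKLOAD = [
    ("drafting", 90, 6_000, 3_000, True),
    ("entity extraction", 90, 4_000, 800, False),
    ("ban-list scan", 90, 3_500, 300, False),
]


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the quoted per-million-token prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def workload_cost(choose_model: Callable[[bool], str]) -> float:
    """Total cost when choose_model(needs_voice) picks the model for each task."""
    return sum(
        calls * call_cost(choose_model(needs_voice), inp, out)
        for _task, calls, inp, out, needs_voice in WORKLOAD
    )


claude_only = workload_cost(lambda needs_voice: "Claude Sonnet 4.6")
routed = workload_cost(lambda needs_voice: "Claude Sonnet 4.6" if needs_voice else "GPT-5")

if __name__ == "__main__":
    print(f"Claude-only: ${claude_only:.2f}  routed: ${routed:.2f}")
```

With these numbers the routed workflow costs about $7.50 against about $9.18 for Claude-only. Drafting dominates the bill in this example, so the whole saving comes from the bulk steps, and it grows as those steps make up more of the pipeline.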

Structured output and tool calls

If your pipeline involves the model returning JSON, function calls, or structured artefacts that downstream code needs to parse, this matters.

GPT-5 is currently the most reliable for structured output. The model follows JSON schemas with low drift and rarely emits malformed output.

Claude is close behind. The model's structured output has improved noticeably across recent releases and is now production-viable for most fiction workflows.

Gemini occasionally drifts on structured output, especially under load. The model is fine for free-form work and slightly less predictable when the contract demands strict schema adherence.

For a writing platform that runs structured-output pipelines (planning calls, scene synopsis writes, Story Index updates), GPT-5 is the safe default. Claude is a close second. Gemini works but requires more validation and retry logic to be production-stable.
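The validation and retry logic mentioned above can be sketched in a few lines of Python. Everything here is hypothetical: `call_model` stands in for whichever provider client a step is routed to, and the two-key schema is an invented Story Index payload. Because the check runs on the reply text rather than on any provider-specific feature, the same wrapper works whichever model handles the step.

```python
import json
from typing import Callable

# Hypothetical Story Index payload: each key must map to a list of strings.
REQUIRED_LIST_KEYS = ("characters", "locations")


class StructuredOutputError(Exception):
    """Raised when no valid reply arrives within the retry budget."""


def validate(raw: str) -> dict:
    """Parse the reply and check its shape; raise ValueError on any mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("top level must be a JSON object")
    for key in REQUIRED_LIST_KEYS:
        value = data.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"'{key}' must be a list of strings")
    return data


def extract_with_retry(
    call_model: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> dict:
    """Call the model, validate the reply, and retry with the error fed back."""
    last_error = None
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_model(current_prompt)
        try:
            return validate(raw)
        except ValueError as err:
            last_error = err
            # Tell the model what was wrong so the next attempt can correct it.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({err}). "
                "Return only valid JSON."
            )
    raise StructuredOutputError(f"no valid reply after {max_attempts} attempts: {last_error}")
```

A fake `call_model` that returns truncated JSON first and a valid object second shows the flow: the first reply fails to parse, the prompt is resent with the error attached, and the second reply is returned as a dict. Capping attempts keeps a misbehaving model from looping forever; the raised error is the signal to fail over or flag the scene for review.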

When to use which: a concrete portfolio

A working novelist's BYO-AI portfolio, by task:

Voice profile capture and drafting of voice-sensitive prose: Claude Sonnet 4.6.
Plot-forward drafting where uniform competence matters more than voice: GPT-5.
Structural polish (rhythm, punctuation, ban-list passes): GPT-5.
Voice-preservation polish: Claude.
Long-context audits of full manuscripts for continuity: Gemini 2.5 Pro for the longest projects, Claude for standard novel length.
Structured-output pipelines (planning, registry updates, scene metadata): GPT-5, with Claude as fallback.
Bulk operations (entity extraction, naming, summary generation): GPT-5 or Gemini on cost grounds.

That mix uses each model where its strengths line up with the job and avoids paying premium rates for tasks that do not need them.
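Written down as data, the portfolio above becomes a small routing table. A sketch, assuming the task names and provider labels below (they are illustrative, not Pendraic's configuration format): each task lists models in order of preference, and the router picks the first one whose provider key you have connected. The audit line is split into two rows by manuscript length, and only the structured-output and bulk rows list a second choice, because those are the only rows where the portfolio names one.

```python
# Provider behind each model label (labels as used in this article).
PROVIDER = {
    "Claude Sonnet 4.6": "anthropic",
    "GPT-5": "openai",
    "Gemini 2.5 Pro": "google",
}

# Hypothetical task names; models in order of preference, transcribed from the portfolio.
PORTFOLIO = {
    "voice_drafting": ["Claude Sonnet 4.6"],
    "plot_drafting": ["GPT-5"],
    "structural_polish": ["GPT-5"],
    "voice_polish": ["Claude Sonnet 4.6"],
    "audit_long_manuscript": ["Gemini 2.5 Pro"],
    "audit_standard_novel": ["Claude Sonnet 4.6"],
    "structured_output": ["GPT-5", "Claude Sonnet 4.6"],
    "bulk": ["GPT-5", "Gemini 2.5 Pro"],
}


def route(task: str, connected_keys: set[str]) -> str:
    """Return the first preferred model for `task` whose provider key is connected."""
    for model in PORTFOLIO[task]:
        if PROVIDER[model] in connected_keys:
            return model
    raise LookupError(f"no connected provider key can serve task {task!r}")
```

For example, `route("structured_output", {"anthropic", "google"})` returns "Claude Sonnet 4.6" because no OpenAI key is connected. A task with no usable model raises rather than silently falling back to a model the portfolio did not choose for it.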

Why BYO-AI is the architecture that makes this work

A workflow that depends on three models cannot live inside a tool that supports only one. The fixed-tier subscription products in the fiction-writing space typically commit you to a single backend chosen by the platform. That is the wrong shape for a serious project.

BYO-AI (bring your own key) is the architecture that lets you pick the right model per job. You connect your Anthropic key, your OpenAI key, your Google key, and the platform routes each call to the model best suited for that step. You see actual provider costs. You can swap models when better versions ship. You keep control of where your prose goes and what gets billed.

Pendraic is built around this. The platform supports direct provider clients for Anthropic, OpenAI, Google, and OpenRouter, so a single project can route drafting to Claude, polish to GPT, and long-context audits to Gemini without leaving the application. The model choice is per task, configurable, and visible. The breakdown of how BYO-AI compares with Managed AI as a fallback is on the BYOK vs Managed AI page.

A note on what not to do

Do not commit to a single model "because it is best." The honest answer is that no single model is best at everything fiction writing requires. The hype cycles will continue to declare a new champion every six months. The actual practice does not change much. Use the right tool for the job and keep your workflow flexible enough to swap providers when the prices and capabilities shift.

Also: do not pick a model based on benchmark scores from non-fiction contexts. The benchmarks that get marketed (math, code, reasoning) tell you almost nothing about fiction prose quality. The only test that matters is running your own scene plans through the candidate models and reading the output. Five scenes is enough to feel the difference.
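That reading is easier to do honestly if you do not know which model wrote which version. A minimal sketch of a blind reading pack, assuming you have already collected each candidate's output per scene (the scene and model names are placeholders): versions are shuffled behind letter labels, and the answer key stays closed until you have ranked them.

```python
import random
from typing import Optional


def blind_pack(outputs: dict[str, dict[str, str]], seed: Optional[int] = None):
    """outputs[scene][model] holds that model's prose for the scene.

    Returns (reading_list, answer_key): reading_list is a list of (label, prose)
    pairs with model names hidden; answer_key maps each label back to its model.
    """
    rng = random.Random(seed)
    reading_list = []
    answer_key = {}
    for scene, by_model in outputs.items():
        models = list(by_model)
        rng.shuffle(models)  # fresh order per scene, so position gives nothing away
        for index, model in enumerate(models):
            label = f"{scene}-{chr(ord('A') + index)}"
            reading_list.append((label, by_model[model]))
            answer_key[label] = model
    return reading_list, answer_key
```

Print the reading list, rank each scene's versions, and only then open the answer key. Five scenes from three models gives fifteen labelled versions, which is a comfortable single sitting.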

Where Pendraic comes in

Pendraic's multi-backend architecture lets a single project use Claude, GPT-5, and Gemini concurrently, with each call routed by step. The drafting layer can hit Claude for voice. The polish layer can hit GPT for cost. The audit layer can hit Gemini for context length. The writer connects their own provider keys and pays providers directly. No double-billing layer, no rerouting through a gateway, no hidden mark-up.

If you want to run the portfolio approach on your own manuscript without wiring three APIs by hand, sign in and connect the keys you already have. The platform handles the routing. You see every call on your own provider dashboard.

The best model for fiction writing is the one that is correct for the step you are on. That answer requires the architecture to support it. Pick that architecture first, then pick the models.