Google Requires Gemini in Coding Interviews as AI Code Hits 75%

Published May 22, 2026

Three-quarters of all new code at Google is now generated by AI and approved by engineers. That figure, shared by chief executive Sundar Pichai in an April 22 blog post, is the premise behind a pilot rolling out in the second half of this year: software engineering candidates for junior and mid-level roles on select US teams will use Gemini during a live coding round, graded on a criterion the company calls “AI fluency.”

Meta, Canva, Shopify, and Rippling all moved first on AI-assisted interviews, and each gave candidates a choice of model. Google’s version requires one specific model. That design decision, its effect on candidate preparation, and whether it becomes the broader industry template are the questions the pilot’s first performance data will answer.

Three Rounds Redesigned at Once

The code comprehension round draws the headlines, but per an internal document reviewed by Business Insider, two other parts of Google’s interview loop are changing at the same time. Taken together, the three changes retire the old premium on memorized algorithmic patterns and substitute a premium on situated judgment.

Code comprehension with Gemini: Candidates work through a multi-file codebase in a 60-minute CoderPad session with three panels: a file explorer, a code editor, and an AI chat window. The model can respond in the chat but cannot directly edit files. The session progresses through bug fixing, core implementation, and optimization phases.
Redesigned behavioral round: Google’s “Googleyness and Leadership” interview, long built around personality and culture-fit questions, now incorporates a technical design conversation based on the candidate’s own prior engineering work. The shift moves the round from how someone thinks in theory to how they thought through a problem they already shipped.
Open-ended engineering challenge for junior hires: One traditional technical round is replaced by a session built around a deliberately ambiguous problem. Candidates must define scope and ask clarifying questions before writing any code. According to candidate accounts compiled in Exponent’s analysis of Google’s AI-assisted coding format, some new-graduate prompts consist of a single vague sentence, with the interviewer providing no additional context after presenting it.

Brian Ong, Google’s vice president of recruiting, confirmed the pilot’s direction to Business Insider. “We’re always evolving our interview processes to ensure we’re recruiting and hiring the best talent,” Ong said. “As a part of that, we’re rolling out a pilot for software engineering interviews to be more reflective of how our teams are operating in the AI era.” Google Cloud and the platforms and devices unit are first in the pilot queue.

Broader rollout depends on what early performance data shows, and Google has not given a timeline for that evaluation. The choice to start with cloud engineering teams is deliberate: those teams work with customers whose own production systems are increasingly AI-augmented, making an AI-fluency screen a practical fit before it becomes a universal hiring standard.

The 75 Percent Mandate

The number driving the format change is not a soft aspiration. Pichai’s April disclosure was specific: AI-generated code accounts for three-quarters of all new code produced internally at Google, up from roughly half that figure the previous autumn. OpenAI president Greg Brockman has placed his own company in similar territory, saying AI coding tools went from generating around 20% of code to around 80% “over the course of December” alone. At both organizations, and at every employer where the majority of initial code comes from a model rather than a developer typing, the productive engineer is the one who can direct, evaluate, and correct that output.

75% of new code at Google is AI-generated and engineer-approved, as of April 2026
H2 2026 is the planned launch window for the pilot across select US engineering teams
80% of code at OpenAI is AI-generated, per Greg Brockman at a Sequoia Capital event
8 in 10 hiring managers now prioritize AI skills, sometimes above additional years of experience, per multiple recruiting industry surveys

Gemini has been expanding rapidly across Google’s product lines, and Google I/O 2026 underlined that scope, shipping a new model generation across search, mobile, and a wave of new hardware categories. Requiring candidates to demonstrate proficiency with the company’s AI assistant in interviews fits neatly with requiring engineers to use it once they are on the payroll.

Google Arrived Last to Its Own Party

Google’s pilot is not the industry’s first AI-assisted interview format, and its competitors have run more open versions. Canva announced in June 2025 that backend, frontend, and machine learning engineering candidates are expected to use tools including Copilot, Cursor, or Claude, with questions designed so they “can’t be solved with a single prompt” and require iterative reasoning and requirement clarification. Meta launched an AI-enabled coding round in October 2025 through CoderPad, giving candidates access to GPT-4o mini, Claude Haiku, Gemini 2.5 Pro, and Llama 4 Maverick. Shopify and Rippling have allowed candidates to bring their preferred AI copilot into live sessions.

Company	AI Model(s) Permitted	Format Scope	Primary Evaluation Focus
Google (pilot, H2 2026)	Gemini only	Code comprehension, junior/mid-level, select US teams	AI fluency, prompt engineering, output validation
Meta (since October 2025)	GPT-4o mini, Claude Haiku, Gemini 2.5 Pro, Llama 4 Maverick (candidate’s choice)	Replaces one of two onsite coding rounds; software and ML engineering roles	Code quality, AI-assisted debugging, verification, communication
Canva (since June 2025)	Copilot, Cursor, Claude (candidate’s choice)	Backend, frontend, and machine learning engineering roles	Engineering judgment under ambiguity; iterative reasoning required
Shopify / Rippling	Candidate’s preferred AI copilot	Live coding sessions	Productivity and communication with chosen toolchain

Google’s weight in the engineering hiring market is the differentiating factor. When Canva changes its interview format, engineers who target Canva adjust their preparation. When Google changes, the question becomes whether the whole field adjusts with it. A Meta internal message, reported by Hello Interview, framed the rationale plainly: the AI-enabled format is “more representative of the developer environment that our future employees will work in, and also makes LLM-based cheating less effective.”

The model-choice policy is where the two approaches visibly diverge. Every other employer in the table above gives candidates agency over which AI system they use during the interview. Google does not.

The Gemini Lock-In and Who Pays for It

Two concerns sit beneath the progressive framing of Google’s pilot, and the company’s public statements address neither.

The model restriction is the sharper one. A candidate who has built two years of daily fluency with GitHub Copilot or Claude enters Google’s interview at a tool disadvantage that has nothing to do with the quality of their underlying engineering judgment. Familiarity with the model’s output tendencies, its calibration on ambiguous prompts, and its code-style defaults is a preparation variable distributed unevenly across the candidate pool. As one analysis from staffing firm Kore1 noted, tool familiarity shifts faster than hiring cycles: “A year ago everyone was on Copilot. Six months ago Cursor took the IDE share.” Locking the interview to one tool turns it into a snapshot test of one vendor’s position at a single moment in time, one that favors applicants who already happen to use that vendor’s product.

The structural incentive behind the choice is worth naming. The single-model requirement turns every candidate who prepares for the pilot into a Gemini user before they even apply. When pilot data eventually connects fluency scores to six-month on-the-job performance, the loop closes: preparation demands the company’s model, employment uses it, and the interview system reinforces both. Meta’s open-model approach does not generate that vendor-specific feedback loop for any single AI provider.

The integrity question runs parallel to the equity one. An open-ended AI-assisted format requires guardrails against candidates receiving outside guidance, passing proprietary coding problems to third parties, or having someone else steer their prompts in real time. Audit logs of model interactions, standardized candidate environments, and proctored sessions are the standard mitigation toolkit. Whether Google’s pilot infrastructure includes all three has not been confirmed in any public disclosure from the company.

LeetCode’s Obsolescence Problem

DS&A (Data Structures and Algorithms) prep culture built a substantial commercial layer around the gap between what software engineers do on the job and what traditional hiring asked them to demonstrate. LeetCode, the dominant algorithm-practice platform, reports over 35 million registered users and monetizes through subscriptions priced at roughly $35 a month or $159 a year, with company-specific problem sets organized by employer and difficulty driving the majority of premium conversions. Bootcamps made DS&A pattern recognition a core curriculum pillar. Books built on aggregated FAANG interview research sold on the premise that a finite, learnable pattern set mastered through repetition reliably passes the coding screen.

A rougher tier exists beneath the legitimate prep market. Products that advertise screen-share-proof AI overlays and hidden process names build their pitch entirely on the premise that AI use in interviews is banned. Their selling proposition is circumvention. When an approved AI model is already on the table, that pitch collapses: the scoring question shifts from whether a candidate used AI to how skillfully they used it, which is a question those products were never designed to answer.

DS&A knowledge does not disappear from relevance; algorithms still underlie the debugging and optimization work the code comprehension round tests. But grinding through hundreds of dynamic programming problems to recognize a pattern under whiteboard pressure is a different preparation from learning to direct, critique, and iterate on AI-generated code in real time, and Google’s format explicitly evaluates the latter.

What a Passing Score Looks Like

Defining “AI fluency” as a hiring criterion is harder to do than to announce. The rubric Google is reportedly building watches for four specific behaviors: whether candidates form explicit hypotheses before prompting; whether they write targeted queries rather than generic ones; whether they validate and critique AI output rather than accept it at face value; and whether they iterate when the first response is incomplete or wrong.

“I guess this is like asking a kid to take a math test without a calculator.”

Emily Cohen, head of people and operations at AI coding startup Cognition, made that remark to Business Insider about technical interviews that ban AI use entirely. The calculator analogy has real limits: a calculator does not hallucinate wrong outputs, and it does not produce plausible-looking code that fails on the second edge case. A language model does both. The candidate skill this format surfaces is treating AI output with the same skepticism a senior engineer applies to a junior colleague’s pull request: assume it mostly works, test for where it doesn’t, and own the final result regardless.

Meta’s interviewers watch every AI interaction during their version of this round in real time, seeing the exact prompts, responses, and how each candidate handles the output.

The format is built to surface a specific profile: someone who uses the model to generate a first-pass draft, manually tests edge cases, identifies where the output fails, and explains the reasoning behind each correction. If the pilot’s early cohort data connects those behaviors to measurably stronger six-month on-the-job performance, Google’s scale means every major tech employer faces pressure to build similar formats within a year, and the single-model requirement becomes the template rather than the outlier. If the data is thin, or the format proves easier to game than the rubric anticipates, the whiteboard era ends not with a clean new benchmark but with a prolonged argument over what engineering skill means when the tools change faster than the hiring processes designed to measure them.