Google Is Letting Software Engineers Use Gemini in Job Interviews

Published

May 22, 2026

Google is piloting an AI-assisted interview format for software engineering candidates, requiring them to use Gemini, the company’s own AI model, during a live coding round that tests how well they can direct the tool, not how well they can code without it. Beginning in the second half of this year, the pilot targets junior and mid-level roles on select US teams, and interviewers will grade candidates on a rubric the company calls “AI fluency.” The backdrop is a figure Sundar Pichai, Google’s chief executive officer, shared in an April 22 blog post: three-quarters of all new code written at Google is now AI-generated and approved by engineers, up from half that figure the previous autumn.

When Google redraws the hiring floor, the rest of the industry takes note. The pilot’s initial scope is narrow; its implications for how software engineers prepare, interview, and get hired are not.

Inside Google’s Code Comprehension Round

According to an internal document reviewed by Business Insider, candidates who enter Google’s AI-assisted pilot will use Gemini during the “code comprehension” round, a session dedicated to reading, debugging, and optimizing an existing codebase. Interviewers will not hand candidates a blank editor and a well-worn algorithmic problem. They will hand candidates live code and access to Gemini, then observe what the candidates do with both.

Brian Ong, Google’s vice president of recruiting, confirmed the pilot’s direction to Business Insider. “We’re always evolving our interview processes to ensure we’re recruiting and hiring the best talent,” Ong said. “As a part of that, we’re rolling out a pilot for software engineering interviews to be more reflective of how our teams are operating in the AI era.”

The internal document labels the format “human-led, AI-assisted.” Interviewers are being asked to score a new criterion: “AI fluency,” covering prompt engineering, output validation, and iterative debugging. According to early candidate accounts gathered in Exponent’s analysis of Google’s new AI-assisted coding round, candidates who appear to hand off their reasoning to Gemini rather than directing it have received negative feedback. Directing Gemini well is what earns a passing score; treating it as a ghostwriter does the opposite.

Google Cloud and the company’s platforms and devices unit are first in the pilot queue. Broader rollout depends on what early performance data shows, and Google has not given a timeline for that evaluation.

H2 2026 — planned launch window for the pilot across select US engineering teams
Junior and mid-level roles — the initial candidate cohort targeted by the new format
Gemini — the only AI model candidates are permitted to use during the coding round
“Human-led, AI-assisted” — Google’s internal label for the new interview structure

Three Interview Changes at Once

The AI-assisted coding round draws the most attention, but it arrives alongside two other structural changes to the interview loop, per the same internal document.

Code comprehension with Gemini: Candidates engage with an existing codebase, using the AI assistant to surface bugs, test hypotheses, and optimize logic. Evaluation centers on how candidates direct the model, not whether they can code without it.
Redesigned behavioral round: Google’s “Googleyness and Leadership” interview, long focused on personality-style questions, now incorporates a technical design conversation built around the candidate’s own prior project work.
Open-ended engineering challenge for junior hires: One traditional technical round is replaced by a session built around a deliberately ambiguous problem. Candidates must define scope and ask clarifying questions before writing any code.

According to candidate accounts gathered by Exponent, one new-graduate interview prompt consisted of a single vague sentence, and the interviewer provided no additional context after presenting it. The point was not whether the candidate could solve the problem immediately; it was whether they would slow down, scope the problem, and clarify constraints before touching the keyboard.

Taken together, the three changes remove the premium on memorized algorithm patterns and substitute a premium on situated judgment: knowing when to ask, when to use the AI assistant, and when to verify what it returned rather than assuming the output is correct.

Big Tech Was Already There

Canva, the web-based design platform, announced in June 2025 that it expects candidates for backend, frontend, and machine learning engineering roles to use tools including Copilot, Cursor, and Claude during technical interviews, redesigning its questions so they “can’t be solved with a single prompt” and require iterative reasoning and requirement clarification. Meta launched an AI-enabled coding round in October 2025 that gives candidates access to a range of AI models including GPT-4o mini, Claude Haiku, Gemini 2.5 Pro, and Llama 4 Maverick via a built-in CoderPad interface, replacing one of the two traditional algorithmic coding rounds at the onsite stage. Shopify and Rippling have explicitly allowed candidates to bring their preferred AI copilots into live coding sessions. A Meta internal message, reported by Hello Interview, described the rationale plainly: the new format is “more representative of the developer environment that our future employees will work in, and also makes LLM-based cheating less effective.”

Company	AI Model(s) Permitted	Scope	Primary Evaluation Focus
Google (pilot)	Gemini only	Code comprehension, junior/mid-level, select US teams	AI fluency, prompt engineering, output validation
Meta	GPT-4o mini, Claude (Haiku/Sonnet), Gemini, Llama 4 (candidate’s choice)	Replaces one of two onsite coding rounds since October 2025; SWE and ML roles	Code quality, AI-assisted debugging, verification, communication
Canva	Copilot, Cursor, Claude (candidate’s choice)	All engineering roles since June 2025	Engineering judgment under ambiguity; iterative, not single-prompt solutions
Shopify / Rippling	Candidate’s preferred AI copilot	Live coding sessions	Productivity and communication with chosen toolchain

Google did not originate this trend, but when it comes to setting candidate preparation expectations across the broader industry, its entry carries a weight in the engineering hiring market that Canva and even Meta do not.

Measuring AI Fluency, Not Recall

Defining “AI fluency” as a hiring criterion is harder to do than to announce. The rubric Google is reportedly building watches for four specific behaviors: whether candidates form explicit hypotheses before prompting, whether they write targeted queries rather than generic ones, whether they validate and critique AI output rather than simply accept it, and whether they iterate when the first response is incomplete or wrong.

A candidate who pastes a problem description into an AI assistant and submits whatever comes back will score poorly under this framework. A candidate who uses the model to generate a first-pass draft, manually tests edge cases, identifies where the output fails, and explains the reasoning behind each correction demonstrates the kind of AI fluency the format is built to surface. Using the model as a thinking partner, with the candidate’s own reasoning driving every decision, is the target behavior. Meta’s interviewers watch every AI interaction in real time, seeing the exact prompts, responses, and how the candidate handles each output.

“I guess this is like asking a kid to take a math test without a calculator.”

Emily Cohen, head of people and operations at AI coding startup Cognition, made that comment to Business Insider about technical interviews that ban AI use entirely. The calculator analogy has real limits: a calculator does not hallucinate wrong outputs or produce plausible-looking code that fails on the second edge case. But the directional argument is hard to dismiss.

Greg Brockman, president of OpenAI, has said that AI now generates around 80% of code at his company, placing OpenAI and Google in the same territory on what their engineers actually produce each day.

At both organizations, and across every employer where AI generates the majority of initial code, the productive engineer is the one who can direct, evaluate, and correct that output. Testing candidates on their ability to recall algorithm patterns from memory, in isolation, is mismatched with that working reality, and large employers are finding the mismatch increasingly difficult to defend.

The LeetCode Economy Under Pressure

DS&A — Data Structures and Algorithms — prep culture has built a substantial commercial layer around the gap between what software engineers do on the job and what they are traditionally asked to perform in a hiring interview. LeetCode, the dominant algorithm-practice platform, monetizes that gap through premium subscriptions and company-specific problem sets organized by difficulty and employer. Bootcamps have made DS&A pattern recognition a core curriculum pillar. Books built from research across FAANG interview experiences have sold millions of copies on the premise that there is a finite and learnable pattern set — mastered through repetition — that passes the coding screen.

A rougher tier exists beneath the legitimate prep market. Products that explicitly market “invisible” AI assistance for live coding sessions, advertising screen-share-proof overlays and hidden process names, have built their pitch entirely on the premise that AI use in interviews is banned. Their selling proposition is circumvention. Google’s format strips that premise away. When an approved AI model is already on the table, the scoring question changes from whether a candidate used AI to how well they used it.

Rote algorithmic pattern memorization loses value in a format where working alongside AI is the expectation, not the violation. DS&A knowledge does not disappear from relevance; algorithms still underlie the debugging and extension work. But grinding through hundreds of classic problems to recognize a dynamic programming pattern under whiteboard pressure is a different preparation than learning to direct, critique, and iterate on AI-generated code in real time, and the latter is now what Google says it will test.

Integrity, Equity, and the Gemini Lock-In

Two concerns sit below the progressive framing of Google’s pilot, and neither is addressed by the company’s public statements alone.

The integrity question is structural. An open-ended AI-assisted format requires guardrails to prevent candidates from receiving guidance beyond the permitted session, from passing proprietary coding problems to outside parties, or from having a third party steer their prompts in real time. Audit logs of model interactions, standardized candidate environments, and proctored sessions are the standard mitigation toolkit. Whether Google’s pilot infrastructure includes all three is not established in any public disclosure from the company.

The model restriction is the sharper concern. Meta gives candidates a selection of AI systems during its AI-enabled round. Google requires only Gemini. A candidate who has built two years of daily fluency with GitHub Copilot or Claude enters Google’s interview at a tool disadvantage, regardless of the quality of their underlying engineering judgment. Familiarity with Gemini’s output tendencies, its calibration on ambiguous prompts, and its code style defaults represents a preparation variable distributed unevenly across the candidate pool. Google’s stated goal is to hire for practical skills; requiring a specific proprietary model adds friction that practical skill alone does not clear. There is also a structural incentive worth naming: just as Canva’s engineering careers page made AI a condition of its interview, Google’s Gemini requirement turns every candidate who practices for the pilot into a Gemini user before they even apply.

If the pilot’s early cohort data connects AI fluency scores to meaningfully stronger six-month on-the-job performance, the question of an industry standard resolves fast at Google’s scale: expect every major tech employer to build similar formats around its own preferred models within a year. If the data is thin, or the format proves easier to game than the rubric anticipates, the whiteboard era does not end with a clean new benchmark; it ends with a prolonged argument over what “engineering skill” means when the tools keep changing.