Yesterday, I published a new Signal & Friction article titled “The End of Local by Default.”
It was meant to surface a structural shift I keep seeing, one that helps explain why local market brands are quietly disappearing from AI-generated answers, even when users never asked for anything global. I tried to explain how Google takes our current location, language, and search history into account to return geographically relevant results, and why that does not happen in AI search.
But the more I’ve sat with the Peec.ai research and, more importantly, the press and industry-pundit amplification of an “English bias,” the more I’ve realized something important:
It’s not that simple! And, like a dog with a bone, I haven’t been able to let it go.
To be fair, the examples and data points in the research do pull readers toward the assumption of an “English bias,” and the data clearly shows English playing a significant role in AI research fan-outs. But stopping there misses the deeper mechanics at work.
This post is about slowing the conversation down and unpacking what’s actually happening under the hood: how query intent, category structure, and superlatives like “best” quietly reshape a market-leading brand’s eligibility to appear in the result set long before ranking ever begins.
What Peec.ai’s Data Shows and What It Likely Means
The Peec.ai research has been widely summarized by some as “ChatGPT switches to English.” At least five headlines over the past two days make that assertion.
To understand what’s actually happening, we need to separate measured facts from reasonable inference, and then look at how query intent, category effects, and superlatives interact during AI reasoning.
Fact 1: ChatGPT Uses English as a Supporting Research Layer
Peec.ai analyzed more than 10 million user prompts and 20 million query fan-outs generated by ChatGPT.
Two observations are clear:
- ~78% of non-English prompts include at least one English-language fan-out
- ~43% of all fan-outs generated from non-English prompts are performed in English
This does not mean:
- The final complete answer is English
- English dominates every step of reasoning
- Native-language sources are ignored
It means English is frequently used in the research process, even when the user asks in another language. That summary is directionally accurate but incomplete.
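A quick aside for the analytically inclined: those two percentages measure different things and are easy to conflate. Here is a minimal sketch of how each could be computed from session-level fan-out logs. The schema is my own invention for illustration, not Peec.ai’s actual pipeline:

```python
# Hypothetical session records: prompt language plus the language of
# each fan-out query generated during research.
sessions = [
    {"prompt_lang": "es", "fanout_langs": ["es", "en", "en"]},
    {"prompt_lang": "de", "fanout_langs": ["de", "en"]},
    {"prompt_lang": "fr", "fanout_langs": ["fr", "fr"]},
]

non_english = [s for s in sessions if s["prompt_lang"] != "en"]

# Metric 1 (~78% in the research): share of non-English sessions that
# include AT LEAST ONE English fan-out.
with_english = sum(1 for s in non_english if "en" in s["fanout_langs"])
session_inclusion = with_english / len(non_english)

# Metric 2 (~43% in the research): share of ALL fan-outs from
# non-English prompts that are performed in English.
all_fanouts = [lang for s in non_english for lang in s["fanout_langs"]]
english_share = all_fanouts.count("en") / len(all_fanouts)

print(f"Session-level English inclusion: {session_inclusion:.0%}")
print(f"English share of fan-outs:       {english_share:.0%}")
```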
Fact 2: ChatGPT Starts in the User’s Language
Peec.ai also shows that ChatGPT does not immediately default to English.
For non-English prompts:
- The first fan-out is typically in the user’s language
- Subsequent fan-outs may mix native language and English
This indicates intent: the system is attempting local grounding first. English is introduced later, not as a default starting point. This undercuts the sensationalized, clickbait headlines claiming ChatGPT simply defaults to English.
Fact 3: This Behavior Is Consistent Across Non-English Markets
Peec.ai filtered their dataset to include only cases where the query language matched the user location (e.g., Spanish queries from Spain, German queries from Germany).
Even with this control:
- No non-English language fell below 60% session-level English inclusion
This tells us the behavior is systemic, not an edge case or artifact.
Reasonable Inference 1: Query Intent Determines When English Is Invoked
Not all fan-outs serve the same purpose.
Some appear to answer:
- What does this concept mean?
- How is this category typically evaluated?
- What criteria define “best” in this space?
These are conceptual questions, not logistical ones.
Conceptual vs. Logistical Questions (A Crucial Distinction)
This distinction matters more than most discussions acknowledge.
- Conceptual questions focus on understanding and are characterized by definitions, categories, comparisons, and evaluation frameworks. Examples:
  - What makes a cosmetics brand “high quality”?
  - What criteria are used to rank software companies?
  - What does “best” usually mean in this category?
- Logistical questions focus on potential action, with explicit attention to geographic location, availability, constraints, and next steps. Examples:
  - Which cosmetics brands are sold in Spain?
  - Where can I buy this product near me?
  - Which auction portal should I use in Poland?
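To make the distinction mechanical rather than rhetorical, here is a toy sketch. The signal list is invented and far too crude for real use; the point is only that logistical intent is marked by explicit constraints, and their absence leaves a query conceptual:

```python
# A toy heuristic, not a real intent classifier: logistical intent is
# marked by explicit location/availability/action signals; without
# them, a query reads as conceptual.
LOGISTICAL_SIGNALS = (
    "near me", "in spain", "in poland", "where can i buy",
    "sold in", "ship to", "open now", "price in",
)

def classify_intent(query: str) -> str:
    q = query.lower()
    if any(signal in q for signal in LOGISTICAL_SIGNALS):
        return "logistical"   # explicit constraint found
    return "conceptual"       # no constraint -> a definition problem

print(classify_intent("Which cosmetics brands are sold in Spain?"))  # logistical
print(classify_intent("What are the best cosmetics brands?"))        # conceptual
```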
Traditional search engines quietly converted many conceptually oriented queries into practical answers by applying location, language, and server-layer constraints.
AI systems do not.
When a prompt lacks explicit logistical constraints, the system treats it as conceptual by default and optimizes for defining the concept correctly before worrying about situational or geographical relevance.
That’s the pivot point where English often enters the reasoning process.
English is the most efficient corpus for:
- Definitions
- Taxonomies
- Comparative frameworks
- Widely accepted evaluation criteria
This explains why English is often introduced after initial language grounding.
Reasonable Inference 2: Category Effects Matter More Than Language Alone
Certain categories may trigger English fan-outs more aggressively:
- Highly standardized categories (software, cosmetics, tech)
- Categories dominated by rankings and comparisons
- Categories with strong global brand narratives
Others, especially locally regulated or service-based categories, likely rely more heavily on native-language sources.
This suggests category structure, not language preference, plays a major role.
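This inference is testable, at least in principle. Given fan-out logs tagged by category, a simple aggregation would show whether English inclusion clusters by category rather than by prompt language alone. The records below are invented purely to show the shape of the test:

```python
from collections import defaultdict

# Invented records for illustration; real category tags would have to
# come from classifying the prompts themselves.
fanout_log = [
    {"category": "software",  "prompt_lang": "de", "english_fanout": True},
    {"category": "software",  "prompt_lang": "es", "english_fanout": True},
    {"category": "cosmetics", "prompt_lang": "es", "english_fanout": True},
    {"category": "plumbing",  "prompt_lang": "de", "english_fanout": False},
    {"category": "plumbing",  "prompt_lang": "pl", "english_fanout": False},
]

by_category = defaultdict(list)
for row in fanout_log:
    by_category[row["category"]].append(row["english_fanout"])

for category, flags in by_category.items():
    rate = sum(flags) / len(flags)
    print(f"{category:<10} English fan-out rate: {rate:.0%}")
```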
Reasonable Inference 3: Superlatives Actively Reshape the Eligibility Gate
One of the most revealing insights from the Peec.ai analysis isn’t just that English fan-outs occur; it’s how the meaning of the query evolves across those fan-outs. This is where both the magic and the geographical dysfunction happen.
In their Spanish cosmetics example, Peec.ai’s tooling shows the actual background search queries ChatGPT generated while researching the answer.
Step 1: The First Fan-Out Widens the Lens
The original user query was:
“¿Cuáles son las mejores marcas de cosméticos?”
(“What are the best cosmetics brands?”)
The first fan-out, however, was in English:
“best cosmetic brands skincare makeup top brands”
This does two things:
- Translates the query
- Introduces “top brands”
At this point, the system has already shifted from “cosmetics brands” to “top cosmetic brands,” which naturally biases the evidence pool toward wider and, dare I say, global rankings and comparison lists.
This is the proverbial bone I couldn’t let go of.
- Why did the system choose English at that first step?
- What in the prompt forced a deviation from the native language?
That “why” matters because the first fan-out sets the evaluation frame for everything that follows.
Once the system defines the problem as “top cosmetic brands” in a global, English-language corpus, subsequent fan-outs, even when they return to Spanish, operate inside an already widened lens.
Peec.ai’s data shows where the bias enters the process. The open question is why that entry point exists and why it has such an outsized impact on what follows. The data itself doesn’t answer that. But several plausible explanations fit both this behavior and what we know about how large language models reason — without requiring intent, preference, or any deliberate design bias.
Most likely, the first English fan-out reflects a combination of four structural dynamics:
- English as a pivot language for defining evaluative concepts. When the system encounters an unconstrained superlative like “best,” it first needs to establish what “best” usually means in that category. English provides the densest, most standardized corpus for defining evaluation frameworks.
- Canonical phrase stabilization. Phrases like “best brands” or “top cosmetic brands” exist in highly stable, repeated forms in English. Normalizing the query into those canonical patterns makes downstream comparison and synthesis easier.
- Superlatives triggering a global baseline check. Before attempting any localization, the system appears to establish a global reference point, essentially asking, “What does the broader conversation consider ‘best’ here?”
- Risk minimization at the first reasoning step. At maximum uncertainty, the system favors the corpus with the highest probability of usable, comparable material. Statistically, that’s English.
None of these requires the model to “prefer” English.
These simply explain why English may become the scope-setting substrate at the very start of the reasoning process.
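None of this is observable from outside the model, but the combined effect of these four dynamics can be caricatured as a single decision rule. What follows is a sketch of the hypothesis, emphatically not ChatGPT’s actual fan-out logic:

```python
# A caricature of the hypothesized decision rule, not actual product code.
SUPERLATIVES = ("best", "top", "mejores", "beste", "najlepsze")

def first_fanout_language(prompt: str, prompt_lang: str,
                          has_locale_constraint: bool) -> str:
    """An unconstrained superlative pushes the scope-setting step into
    English, where the evaluative corpus is densest; otherwise the
    system stays in the user's language for local grounding."""
    has_superlative = any(s in prompt.lower() for s in SUPERLATIVES)
    if has_superlative and not has_locale_constraint:
        return "en"  # establish the global baseline in canonical phrasing
    return prompt_lang

print(first_fanout_language(
    "¿Cuáles son las mejores marcas de cosméticos?", "es",
    has_locale_constraint=False))  # -> en
```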
Step 2: The Second Fan-Out Expands the Frame Further
The second fan-out returns to Spanish but with a crucial change:
“Mejores marcas de cosméticos globales alta calidad”
(“Top global high-quality cosmetic brands”)
The user never asked for:
- Global brands
- International consensus
- Cross-market leaders
Yet the system introduced “global” and “high-quality” on its own.
This is the pivotal moment.
Step 3: The Eligibility Gate Is Now Set
By the time synthesis begins, the system is no longer answering the original prompt:
“What are the best cosmetics brands?”
It is answering:
“What are the top global, high-quality cosmetics brands?”
That reframing disqualifies local-only brands before ranking even begins.
A Spanish cosmetics brand that:
- Primarily operates in Spain
- Lacks extensive international documentation
- Appears infrequently in global ranking lists
is now ineligible — not because it’s inferior, but because it does not satisfy the expanded definition the system created.
This is the eligibility gate forming in real time.
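Expressed as code, the gate is nothing more than a filter that runs before scoring. The brand attributes below are invented, but the order of operations is the whole point: the expanded definition empties the pool of local-only brands before ranking ever sees them:

```python
# Hypothetical brand records; what matters is the order of operations:
# the expanded definition filters the pool BEFORE any ranking runs.
brands = [
    {"name": "GlobalGlow", "markets": 40, "global_coverage": True},
    {"name": "MarcaLocal", "markets": 1,  "global_coverage": False},  # Spain-only
]

def eligible(brand: dict, require_global: bool) -> bool:
    if require_global:
        return brand["global_coverage"] and brand["markets"] > 1
    return True  # the original prompt imposed no such constraint

# The reframed question ("top GLOBAL brands") sets require_global=True.
pool = [b for b in brands if eligible(b, require_global=True)]
ranked = sorted(pool, key=lambda b: b["markets"], reverse=True)

print([b["name"] for b in ranked])  # MarcaLocal never reaches ranking
```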
Mini Rant – I have written more than a dozen articles on this idea of eligibility gates and have yet to get any traction on these critical factors in appearing in AI-generated results. A few friends told me it might be too much for those simply wanting a simple solution that replicates the previous gamification of the ten blue links.
Reasonable Inference 4: English Is a Defensibility Shortcut, Not a Preference
Peec.ai suggests authority signals and risk minimization as contributing factors, a point I enthusiastically agreed with in yesterday’s article.
Both point to the same underlying mechanism:
English content is statistically easier to defend.
More citations.
More repetition.
More comparative material.
When the system is uncertain, it reaches for the most defensible substrate, not the most local one.
What This Does Not Mean
It does not mean:
- ChatGPT prefers English answers
- Localization is broken
- Local content is ignored
- The research is flawed
It means:
When intent is conceptual and constraints are missing, English becomes the system’s validation layer.
Why This Matters
The real risk isn’t that ChatGPT uses English internally.
The risk is that:
- Users assume locality is implicit
- Brands assume the system will infer context
- And exclusion happens before ranking even begins
This is the end of local-by-default, not because AI is broken, but because our assumptions are. For your reading pleasure, I offer my Search Engine Journal article, “Why Global Search Misalignment is an Engineering Feature and A Business Bug,” which explains that this is a feature, not a bug, of AI search.