OSINT · June 2026

3,736,742 Google internal URLs. We opened none of them.

We read the paths, not the contents. The mere existence of a URL is intelligence: what it names reveals a team, a system, a machine.

3,736,742 internal URLs
7,542 unique hosts
2,377 printers
917 engineers
Powered by RESONEO

No wall was crossed. Almost every one of these addresses answers with a redirect to an internal login portal: the content stays shut. But the path itself is public. And a path that reads superroot/youtube/youtube_controversial_query_blacklist already says plenty, even locked.

Table of contents

1 The principle: the URL is the intelligence
2 2024: what we had already taken apart
3 The lexicon of Google components
4 The engine, laid bare
5 The control plane: where a ranking change ships
6 Google's editorial blacklists, by name
7 The human machine behind ranking
8 AI Mode & Gemini, caught in pre-prod
9 The auction trading floor
10 2,377 printers that talk too much
11 The physical world, mapped from hostnames
12 917 engineers, 917 home pages
13 go/: the bookmarks that name the projects
14 google3, mapped by its paths

1

The principle: the URL is the intelligence

We don't read what the page says. We read its address.

A URL names things. Before any content, it announces a host, a folder, a file. ranklab.teams.x20web.corp.google.com reveals a ranking test bench; badurls_demoteindex reveals that "demote" is kept distinct from "remove". The page body can stay shut: the naming has already leaked.
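As a rough illustration of reading an address rather than a page, the sketch below splits a URL into the three things it announces: host, folder, file. The host is the one quoted above; the path /eval/run.html is a made-up placeholder, not a URL from the corpus. Nothing is fetched.

```python
import posixpath
from urllib.parse import urlsplit


def read_address(url):
    """Split a URL into host, folder and file without ever requesting it."""
    parts = urlsplit(url)
    folder, filename = posixpath.split(parts.path)
    return {"host": parts.hostname, "folder": folder, "file": filename}


# Host quoted in the text; the path is a hypothetical placeholder.
example = "https://ranklab.teams.x20web.corp.google.com/eval/run.html"
info = read_address(example)
print(info)
```

The point of the sketch is that every field comes from string parsing alone: the page body is never involved.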

That is the method's deliberate limit, and its strength. We work on public data with standard OSINT techniques. We do not know what these pages contain. We know they exist, and how their authors named them.

What we did NOT do

No internal access, no login, no bypass, no content opened. Nearly every address answers with a 302 redirect to login.corp. We stop at the door; we only read what is written on it.

2

2024: what we had already taken apart

The starting point: our reconstruction of how Search works, after the Google Leaks.

In 2024, following the Google Leaks, we reconstructed how Google Search works: the components, their codenames, the crawl → indexing → scoring → serving chain. Here is the infographic we published. It serves as the reading key for the next section: the lexicon.

RESONEO 2024 infographic: how Google Search works

Read the full analysis: Google Leak, part 6: How does Google Search work.

3

The lexicon of Google components

134 terms from the Google Leaks, checked against the 3.7M URLs and the code paths.

Each term's badge says what the 2026 extraction could confirm: present in the archived URLs, present only in code paths, known solely from the 2024 leak, or still to confirm.
Badges: seen in the archived URLs · seen only in the code · never public (2024 leak) · still to confirm

Infrastructure & development

Crawl, fetch, rendering

Indexing

Annotation, embedding, topicality

Fusion & information retrieval (IR)

Scoring & ranking

ML models & deep learning

Serving & front-end

Query understanding & expansion

Twiddlers (re-ranking)

Click-data & user signals

Evaluation & quality (Quality Raters)

Signals & attributes (PerDocData)

4

The engine, laid bare

The ranking components our 2026 paths confirm, one by one.

Ascorer, the main IR scorer, appears in plain text in a LIVE debug flag: eng-hip-ascorer. Above it, the Twiddlers (189 occurrences in the URLs) re-rank results under SuperRoot (474 occurrences). The GWS pipeline reads tier by tier through asdebugger: qrewrite → optq → sr_bns → gws_bns.

Two Googlebot lists coexist: badurls_spamindex (kill) and badurls_demoteindex (demote). A youtube_controversial_query_blacklist wired into SuperRoot attests to per-vertical editorial intervention. And hosts like signalslookup, ranklab.teams.x20web, raterhub (EWOK) or hc-ai-mode-staging name the ranking test benches.

Sample of ranking, scoring and serving paths
5

The control plane: where a ranking change ships

Experiments, kill-switches, launches: the lever above the engine.

Method note

For short tokens (sge, magi, ewok, sxs…) a plain grep is poisoned: the same few letters turn up by chance inside the base64 of upxsrf= auth tokens. Every count below is therefore anchored on host names, not substrings. That is what makes them defensible. That Google hid its rater platform behind noise-looking tokens is part of the story.
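The difference between the two ways of counting can be sketched as follows. This is a minimal illustration, not the pipeline we ran: the three URLs, the host sge.corp.google.com and the upxsrf= value are all hypothetical, and the sketch assumes each line of the list carries a scheme.

```python
from urllib.parse import urlsplit


def count_by_substring(urls, token):
    """Naive grep: counts any URL containing the token anywhere."""
    return sum(token in url for url in urls)


def count_by_host_label(urls, token):
    """Counts only URLs whose host has the token as a whole label
    (labels are split on dots and hyphens)."""
    hits = 0
    for url in urls:
        host = urlsplit(url).hostname or ""
        labels = host.replace("-", ".").split(".")
        if token in labels:
            hits += 1
    return hits


# Hypothetical URLs: the second one contains "sge" only inside a
# random-looking upxsrf= token, which is exactly the false positive.
urls = [
    "https://sge.corp.google.com/debug",
    "https://login.corp.google.com/?upxsrf=AKsgeQ9xZ",
    "https://ewok.corp.google.com/task",
]

naive = count_by_substring(urls, "sge")
anchored = count_by_host_label(urls, "sge")
print(naive, anchored)
```

On this toy list the naive count is inflated by the auth token, while the host-anchored count only sees the host that is actually named sge.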

Above the engine, one tier drives it all: A/B experiments and kill-switches. Mendel is the master platform (internal product Mendel Insights); Finch is its Chrome-side twin, where each study is a .gcl file filed by lifecycle state (launched/). There literally is a KillSwitchExample.gcl. Everything flows through launch.corp.google.com/launch/$1, the central tracker, whose real IDs we recovered.

In prod, rollouts.corp ramps changes in waves (Progressive Borg Rollout, pbr_view=stages). And a single querydebugger.corp.google.com/eval/unified URL holds a full Search experiment config frozen in a string: a pinned GWS binary, named flags, and the marker __data_rollout__…__launched__:true, the exact flag that says "this experiment is live in prod".

And these aren't abstractions: here are real, named, dated AI Mode experiments, gathered through a separate OSINT channel, not from the 3.7M-URL list. Each name gives away a feature in flight: actionable entities, a research agent, ground transportation.

Recovered AI Mode experiments: names and lifecycle states

Real AI Mode experiments, captured separately. We read their name and state: ::Launch (live), ::Experiment (under test), ::Control (control group), without ever opening the feature.
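Reading a name and a state off such a label takes no access at all. The sketch below assumes the label has the shape name::State, with the three states listed above; the experiment names are invented placeholders, not the recovered ones.

```python
# The three lifecycle suffixes described in the text.
STATES = {
    "Launch": "live",
    "Experiment": "under test",
    "Control": "control group",
}


def read_experiment(label):
    """Return (name, state meaning), or None if no known ::State suffix."""
    name, sep, state = label.rpartition("::")
    if not sep or state not in STATES:
        return None
    return name, STATES[state]


# Hypothetical labels, for illustration only.
labels = [
    "ExampleFeatureA::Launch",
    "ExampleFeatureB::Experiment",
    "ExampleFeatureB::Control",
    "no-state-here",
]

parsed = [read_experiment(label) for label in labels]
print(parsed)
```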

Control plane: Mendel, Finch, launches, rollouts
6

Google's editorial blacklists, by name

A leak meant to reveal only a filename reveals its edit history.

Some decisions aren't algorithms: they're hand-edited files dropped in google3/googledata/. Some URLs carry the edit history of youtube_controversial_query_blacklist: 42 distinct revisions whose tokens encode the terms being added (…?cl=…/40mandalay, /40shooter, /40paddock, /40crisis, /40bomber). The list fills up during the Las Vegas shooting (Oct 2017). The filename was meant to hide everything; it reveals the contents.

In the same folder sit two Googlebot lists, badurls_spamindex (kill) and badurls_demoteindex (demote), and a plain-HTTP manual-action workflow: spam-policy.prom.corp.google.com/{submit,approve,reject}. The anti-spam org chart shows: webspam.corp, adspam.corp, spamops.corp/QueryAide, fraudbin.corp/fraud/takeNextTicket.

The OSINT angle

Censorship leaves a version trail. Each cl=… revision is a word added by hand.

Blacklists, Googlebot lists and anti-spam workflow
7

The human machine behind ranking

16,000 raters, their platform and their verdicts, by the hosts.

The lexicon names EWOK and Needs-Met as concepts; here are the hosts that serve them. raterhub.corp.google.com is the Quality Raters' platform (the very one enforcing the public "Quality Rater Guidelines"). And eval-analytics.corp.google.com/querygroup?experimentId=…&queryGroupIndex=0 exposes the machinery: a named experiment ID sliced into query-groups, the exact unit raters score. The arrow is complete: rater → query-group → experiment verdict.
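The experiment and query-group references sit in the query string, so they can be read without opening anything. A minimal sketch, assuming the parameter names shown above; the experiment ID is elided in our citation, so EXAMPLE_ID below is a placeholder.

```python
from urllib.parse import parse_qs, urlsplit


def query_group_ref(url):
    """Extract host, experiment ID and query-group index from a URL."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    index = params.get("queryGroupIndex", [None])[0]
    return {
        "host": parts.hostname,
        "experiment": params.get("experimentId", [None])[0],
        "query_group": int(index) if index is not None else None,
    }


# EXAMPLE_ID is a placeholder: the real ID is not reproduced here.
url = (
    "https://eval-analytics.corp.google.com/querygroup"
    "?experimentId=EXAMPLE_ID&queryGroupIndex=0"
)
ref = query_group_ref(url)
print(ref)
```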

Around it orbit the human "golden set" (goldendata-geq.googleplex.com, gwstest-gold, a side-by-side test GWS), the annotation factory (merlot-ops-labelstream-ui, crowdcompute.googleplex.com) and satisfaction surveys (gutssurveys.corp, GUTS = Google User Satisfaction).

Rater platform, eval, golden set and annotation
8

AI Mode & Gemini, caught in pre-prod

The hottest SEO topic, leaked by its staging hosts.

A pre-prod hostname is pure intelligence: it attests the product exists, its internal codename and its dogfooding, while the body stays shut. hc-ai-mode-staging.corp.google.com/?entry_point_id=1 names "AI Mode", the conversational search surface reshaping SEO. Next to it, aistudio-preprod exposes a model slug (gemini-2-5-flash-image) and bundled demo apps.

You can even reconstruct the release pipeline from suffixes alone: geminidataanalytics exists as a clean five-rung ladder, dev → autopush → staging → preprod → prod. And bard.corp.google.com/sitemap.xml stays indexed after the Gemini rebrand: an OSINT fossil. In code, abuse/llm/agents/adi_agent names an LLM agent built to police LLM abuse.
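Rebuilding the ladder from suffixes can be sketched in a few lines. The assumptions are loud ones: the stage is taken to be the last hyphen-separated part of the first host label (as in hc-ai-mode-staging and aistudio-preprod), and the five geminidataanalytics host names below are written in that shape for illustration, not copied from the corpus.

```python
# Pipeline order as described in the text.
STAGES = ["dev", "autopush", "staging", "preprod", "prod"]


def release_ladder(hosts, product):
    """Return the stages observed for `product`, in pipeline order."""
    seen = set()
    for host in hosts:
        first_label = host.split(".")[0]
        base, _, suffix = first_label.rpartition("-")
        if base == product and suffix in STAGES:
            seen.add(suffix)
    return [stage for stage in STAGES if stage in seen]


# Illustrative host names (shape assumed, see lead-in).
hosts = [
    "geminidataanalytics-prod.corp.google.com",
    "geminidataanalytics-dev.corp.google.com",
    "geminidataanalytics-preprod.corp.google.com",
    "geminidataanalytics-autopush.corp.google.com",
    "geminidataanalytics-staging.corp.google.com",
    "aistudio-preprod.corp.google.com",
    "hc-ai-mode-staging.corp.google.com",
]

ladder = release_ladder(hosts, "geminidataanalytics")
print(" -> ".join(ladder))
```

A product with a single staging host, such as hc-ai-mode here, yields a one-rung ladder, which is itself a signal about how far along it is.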

AI Mode, AI Studio hosts and the Gemini ladder
9

The auction trading floor

For SEA: the reserve-price levers, seen from inside.

A single host, adxdashboard.corp.google.com, exposes the page tree of Ad Exchange's operator console: the reserve floor (/pub/reserve_price_opt_summary.html), the real-time auction (/rtb/rtb_dashboard.html) and publisher payout (/pub-mon-payout.html). SEA pros only see this loop's output; here is the machine.

The organic/paid seam is named: SearchAdsViewerRenderingUi is the component that paints ads onto the SERP; displayadssearch and unity-adsense-search are where ad-serving queries the search stack. And anti-spam has an ads twin: adspam.corp, with a verbatim signal, MobileAppQueryIpClickStats28Days (per-IP click stats over 28 days, click-fraud detection).

Ad Exchange console, ad-serving and fraud signals
10

2,377 printers that talk too much

All on *.printer.in.goog, hand-named by employees.

Every printer carries a name chosen by a team. Put end to end, 2,377 of them sketch an internal culture, and sometimes a geography. You meet the AIs out to destroy humanity (hal9000, skynet, glados, ultron, jarvis), video games (mario, zelda, sonic, tetris), James Bond (goldfinger, jaws), food (sushi, taco, croissant), and a nod to The IT Crowd: 01189998819991197253.

The OSINT angle: the names geolocate

Many printers encode a location: 24th-floor, 25th-floor, access-sf1, 3cc-reception. Cross-referenced, they let you map floors and buildings from the printers alone. There are even named cameras on the same domain (3ccsecurity-cam).
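Pulling the floor out of a printer name is a one-regex job. The sketch below only handles the "24th-floor" pattern quoted above; names such as access-sf1 or 3cc-reception would need their own rules, which we do not attempt here.

```python
import re

# Matches names like "24th-floor"; other location patterns are out of scope.
FLOOR = re.compile(r"^(\d+)(?:st|nd|rd|th)-floor$")


def floor_of(name):
    """Return the floor number encoded in a printer name, or None."""
    match = FLOOR.match(name)
    return int(match.group(1)) if match else None


# Names quoted in the text.
names = ["24th-floor", "25th-floor", "access-sf1", "3cc-reception", "hal9000"]
floors = {name: floor_of(name) for name in names}
print(floors)
```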

The 2,363 unique *.printer.in.goog names
11

The physical world, mapped from hostnames

Beyond printers: campuses, datacenters, badges, GPS.

Printers named floors; campusmaps goes further. 166 captured URLs expose a canonical address grammar COUNTRY-CITY-BUILDING-FLOOR-ROOM across 7 cities (Mountain View, NYC, Tokyo, Zürich, Sydney…), sometimes with latitude/longitude. The ctype= gives the place type away: CONFERENCE, UX_LAB, MICRO_KITCHEN. Worse: a query q=type:person+near:email@google.com resolves a named person to a seat, floor and coordinates.
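The address grammar lends itself to a simple parser. A minimal sketch, assuming exactly five hyphen-separated fields with no hyphen inside any of them; the building, floor and room values are invented, only the COUNTRY-CITY prefixes follow the codes we observed.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Place(NamedTuple):
    country: str
    city: str
    building: str
    floor: str
    room: str


def parse_place(code: str) -> Optional[Place]:
    """Parse COUNTRY-CITY-BUILDING-FLOOR-ROOM; None if the shape differs."""
    parts = code.split("-")
    if len(parts) != 5:
        return None
    return Place(*parts)


# Hypothetical place codes: building, floor and room are made up.
codes = ["US-MTV-EX1-2-201", "US-MTV-EX1-3-305", "CH-ZRH-EX2-1-110", "not-a-place"]
places = [p for p in map(parse_place, codes) if p is not None]

# Tally by COUNTRY-CITY, the same grouping as the per-city counts.
per_city = Counter(f"{p.country}-{p.city}" for p in places)
print(per_city)
```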

Structured printer hostnames map a datacenter's security topology: au-syd-erk1a-1-security-truck-entry (the truck gate), …-soc (security operations center), …-security-kiosk. You meet badge lobbies (nacho-badgelobby), cameras (3ccsecurity-cam) and YouTube HQ as a smart building (building/datacenter/kiosk/nest.corp.youtube.com).

US-MTV ×26
US-NYC ×7
JP-TOK ×3
US-SVL ×3
CH-ZRH ×2
US-SFO ×2
US-MSN ×2

The OSINT angle

Corporate email → building, floor and GPS. The internal map turns into a geolocated directory.

Campusmaps, place codes and security hostnames
12

917 engineers, 917 home pages

www.corp.google.com/~login: the unintentional directory.

Each engineer has a personal space under www.corp.google.com/~login. Prototypes, slides and notes get dropped there. We rebuilt the list of 917 of these spaces. Nothing was forced open: it all stayed in plain sight, you just had to know where to look. We cite the mechanism here (917 distinct logins), not the identities.
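Counting distinct spaces comes down to extracting the segment after the tilde and deduplicating it. The sketch below shows that mechanism on invented logins (alice, bob); it is an illustration of the counting step, not our extraction code.

```python
import re

# Captures the login that follows "/~" on the www.corp.google.com host.
HOME = re.compile(r"^https?://www\.corp\.google\.com/~([^/?#]+)")


def distinct_logins(urls):
    """Return the set of logins that own at least one /~login URL."""
    logins = set()
    for url in urls:
        match = HOME.match(url)
        if match:
            logins.add(match.group(1))
    return logins


# Hypothetical URLs with made-up logins.
urls = [
    "https://www.corp.google.com/~alice/public/notes/",
    "https://www.corp.google.com/~alice/slides.html",
    "https://www.corp.google.com/~bob/",
    "https://www.corp.google.com/other/page",
]

logins = distinct_logins(urls)
print(len(logins))
```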

A telling detail: there is ~daepark/public/mustang-suggest/. A home page that names Mustang, the central ranking system. The content stays shut; the path places the person inside the machine.

Sample of archived /~login home pages
13

go/: the bookmarks that name the projects

The internal URL shortener, the toolchain and the codename zoo.

Google's famous go/ links are internal bookmarks. Their slugs name projects, and some map straight onto the section-3 lexicon. Ranking/quality side:

  • go/wrs-render-quality (WRS = Web Rendering Service, the headless render for indexing)
  • go/ymyl-classifier-dd
  • go/spamtokens-dd
  • go/udr/superroot (UDR = formerly WebRef, already in the lexicon)
  • go/result-set-convergence
  • go/web-signal-joins
  • go/video-centroid-domain-score
  • go/attentional-entities
  • go/article-understanding-project
  • go/iql-shopping-ids (IQL = Intent Query Language, already in the lexicon)

Around them, the legendary toolchain shows host by host: critique (code review), cs / codesearch, cider (the IDE), buganizer, moma (internal Google). Plus the codename culture: 757 single-word hosts (bahamut, valkyrie, aristocat…), and the cherry: memegen.corp.google.com, a meme generator running as official corp infrastructure.

Cross-check with the lexicon

Several go/ slugs confirm lexicon terms: udr, iql, superroot, ymyl. An engineer's bookmark is existence-proof of the project.
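The cross-check itself is a set intersection between slug tokens and lexicon terms. A minimal sketch using the ten slugs listed above and only the four lexicon terms named here; the full 134-term lexicon is not reproduced.

```python
import re

# Only the four lexicon terms named in the text.
LEXICON = {"udr", "iql", "superroot", "ymyl"}

# The ranking/quality slugs listed in this section.
slugs = [
    "go/wrs-render-quality",
    "go/ymyl-classifier-dd",
    "go/spamtokens-dd",
    "go/udr/superroot",
    "go/result-set-convergence",
    "go/web-signal-joins",
    "go/video-centroid-domain-score",
    "go/attentional-entities",
    "go/article-understanding-project",
    "go/iql-shopping-ids",
]


def lexicon_hits(slug):
    """Return the lexicon terms that appear as whole tokens in a slug."""
    tokens = re.split(r"[/-]", slug)
    return sorted(LEXICON.intersection(tokens))


confirmed = {slug: lexicon_hits(slug) for slug in slugs if lexicon_hits(slug)}
for slug, terms in confirmed.items():
    print(slug, terms)
```

Matching on whole tokens rather than substrings keeps the check consistent with the host-anchored counting used elsewhere in this study.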

50 distinct go/ and goto/ slugs
Internal tooling hosts and codenames
14

google3, mapped by its paths

The monorepo tree, inferred folder by folder.

The archived code paths sketch the google3 tree. The top folders observed: third_party (213), googledata (149), file/colossus (89), quality (26). Under those roots, the names give the teams away.

google3/
├─ crawler/trawler/                       crawl & fetch
├─ indexing/crawler_id/scope/alexandria/  indexing
├─ docjoins/ascorer/                      IR scoring
├─ quality/
│  ├─ nsr/                                site quality (NSR)
│  ├─ navboost/craps/                     click signals
│  ├─ twiddler/                           re-ranking
│  ├─ realtime/boost/                     real-time freshness
│  └─ rankembed/mustang/                  ranking embeddings
├─ superroot/impls/gws/                   routing & serving
├─ repository/webref/                     entities, Knowledge Graph
├─ learning/multipod/pax/                 JAX framework (Gemini)
└─ abuse/llm/agents/adi_agent/            anti-abuse LLM agent

learning/multipod/pax points to Pax, the JAX framework behind Gemini; abuse/llm/agents/adi_agent names an anti-abuse LLM agent. The lever that drives everything else also shows: Mendel and Finch (A/B tests and kill-switches), via experiments.corp.google.com.
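Folder tallies like the ones quoted for third_party, googledata or quality come from counting path prefixes. The sketch below shows the idea on five paths taken from the tree; the depth parameter is our own convenience for two-level roots such as file/colossus, and the sample is far too small to reproduce the real counts.

```python
from collections import Counter


def top_folders(paths, depth=1):
    """Count paths by their first `depth` segments under google3/."""
    counts = Counter()
    for path in paths:
        segments = path.strip("/").split("/")
        if segments and segments[0] == "google3":
            segments = segments[1:]
        if segments and segments[0]:
            counts["/".join(segments[:depth])] += 1
    return counts


# A handful of paths from the tree, for illustration only.
paths = [
    "google3/quality/nsr/",
    "google3/quality/twiddler/",
    "google3/superroot/impls/gws/",
    "google3/learning/multipod/pax/",
    "google3/abuse/llm/agents/adi_agent/",
]

counts = top_folders(paths)
print(counts.most_common())
```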

Sample of the google3 tree

Download the full corpus

All 3,736,742 internal URLs, in plain text, in a single file. Only the paths are exposed; the bodies stay behind login.corp. We're releasing it to the SEO/OSINT community.

Download the .zip (79 MB)

3,736,742 lines · ~622 MB uncompressed
