We read the paths, not the contents. The mere existence of a URL is intelligence: what it names reveals a team, a system, a machine.
No wall was crossed. Almost every one of these addresses answers with a redirect to an internal login portal: the content stays shut. But the path itself is public. And a path that reads superroot/youtube/youtube_controversial_query_blacklist already says plenty, even locked.
We don't read what the page says. We read its address.
A URL names things. Before any content, it announces a host, a folder, a file. ranklab.teams.x20web.corp.google.com reveals a ranking test bench; badurls_demoteindex reveals that "demote" is kept distinct from "remove". The page body can stay shut: the naming has already leaked.
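As a minimal sketch of that reading, the snippet below splits an address into the three things it announces: host labels, folders, file. The function name `name_parts` and the host `host.example` are illustrative assumptions, not part of the dataset; only the Python standard library is used.

```python
from urllib.parse import urlsplit


def name_parts(url: str) -> dict:
    """Split an address into the parts that carry naming: host labels, folders, file."""
    # urlsplit only recognises a host when a scheme or a leading // is present.
    if "//" not in url:
        url = "//" + url
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    return {
        "host_labels": parts.hostname.split(".") if parts.hostname else [],
        "folders": segments[:-1],
        "file": segments[-1] if segments else None,
    }


# Host taken from the text above; it has no path, so only the labels speak.
bench = name_parts("ranklab.teams.x20web.corp.google.com")
print(bench["host_labels"])

# host.example is a placeholder host; the path is the one quoted in the text.
listing = name_parts(
    "https://host.example/superroot/youtube/youtube_controversial_query_blacklist"
)
print(listing["folders"], listing["file"])
```

Nothing here fetches anything: the address is treated as a string, which is the whole point of the method.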
That is the method's deliberate limit, and its strength. We work on public data with standard OSINT techniques. We do not know what these pages contain. We know they exist, and how their authors named them.
What we did NOT do
No internal access, no login, no bypass, no content opened. Nearly every address answers with a 302 redirect to login.corp. We stop at the door; we only read what is written on it.
The starting point: our reconstruction of how Search works, after the Google Leaks.
In 2024, following the Google Leaks, we reconstructed how Google Search works: the components, their codenames, the crawl → indexing → scoring → serving chain. Here is the infographic we published. It serves as the reading key for the next section: the lexicon.
Read the full analysis: Google Leak, part 6: How does Google Search work.
134 terms from the Google Leaks, checked against the 3.7M URLs and the code paths.
The ranking components our 2026 paths confirm, one by one.
Ascorer, the main IR scorer, appears in plain text in a LIVE debug flag: eng-hip-ascorer. Above it, the Twiddlers (189 occurrences in the URLs) re-rank results under SuperRoot (474 occurrences). The GWS pipeline reads tier by tier through asdebugger: qrewrite → optq → sr_bns → gws_bns.
Two Googlebot lists coexist: badurls_spamindex (kill) and badurls_demoteindex (demote). A youtube_controversial_query_blacklist wired into SuperRoot attests to per-vertical editorial intervention. And hosts like signalslookup, ranklab.teams.x20web, raterhub (EWOK) or hc-ai-mode-staging name the ranking test benches.
Experiments, kill-switches, launches: the lever above the engine.
Method note
For short tokens (sge, magi, ewok, sxs…) a plain grep is poisoned by base64 hidden in upxsrf= auth tokens. Every count below is anchored on host names, not substrings. That is what makes them defensible. That Google hid its rater platform behind noise-looking tokens is part of the story.
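The difference between the two counting strategies can be sketched as follows. This is an illustration under stated assumptions, not the pipeline actually used: the function names are ours, the `host.example` hosts are placeholders, and the `upxsrf=` value is a made-up string standing in for a real base64 token.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_token_counts(urls, tokens):
    """Count a token only when it is a whole dot- or hyphen-delimited piece of the host."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url if "//" in url else "//" + url).hostname or ""
        pieces = set(host.replace("-", ".").split("."))
        for token in tokens:
            if token in pieces:
                counts[token] += 1
    return counts


def substring_counts(urls, tokens):
    """Naive grep-style count: the token may match anywhere, including inside auth tokens."""
    counts = Counter()
    for url in urls:
        lowered = url.lower()
        for token in tokens:
            if token in lowered:
                counts[token] += 1
    return counts


# Hypothetical sample: one genuine host hit, one URL whose query string
# merely happens to contain the same letters.
urls = [
    "https://ewok.host.example/task",
    "https://portal.host.example/?upxsrf=AbCewokXyZsge",
]
tokens = ["ewok", "sge"]
print("host-anchored:", dict(host_token_counts(urls, tokens)))
print("substring:    ", dict(substring_counts(urls, tokens)))
```

On this sample the substring count reports a hit for sge that exists only inside the fake token, and double-counts ewok; the host-anchored count does neither.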
Above the engine, one tier drives it all: A/B experiments and kill-switches. Mendel is the master platform (internal product Mendel Insights); Finch is its Chrome-side twin, where each study is a .gcl file filed by lifecycle state (launched/). There literally is a KillSwitchExample.gcl. Everything flows through launch.corp.google.com/launch/$1, the central tracker, whose real IDs we recovered.
In prod, rollouts.corp ramps changes in waves (Progressive Borg Rollout, pbr_view=stages). And a single querydebugger.corp.google.com/eval/unified URL holds a full Search experiment config frozen in a string: a pinned GWS binary, named flags, and the marker __data_rollout__…__launched__:true, the exact flag that says "this experiment is live in prod".
And these aren't abstractions: here are real, named, dated AI Mode experiments, gathered through a separate OSINT channel, not from the 3.7M-URL list. Each name gives away a feature in flight: actionable entities, a research agent, ground transportation.
Real AI Mode experiments, captured separately. We read their name and state: ::Launch (live), ::Experiment (under test), ::Control (control group), without ever opening the feature.
A leak meant to reveal only a filename reveals its edit history.
Some decisions aren't algorithms: they're hand-edited files dropped in google3/googledata/. Some URLs carry the edit history of youtube_controversial_query_blacklist: 42 distinct revisions whose tokens encode the terms being added (…?cl=…/40mandalay, /40shooter, /40paddock, /40crisis, /40bomber). The list fills up during the Las Vegas shooting (Oct 2017). The filename was meant to hide everything; it reveals the contents.
In the same folder sit two Googlebot lists, badurls_spamindex (kill) and badurls_demoteindex (demote), and a plain-HTTP manual-action workflow: spam-policy.prom.corp.google.com/{submit,approve,reject}. The anti-spam org chart shows: webspam.corp, adspam.corp, spamops.corp/QueryAide, fraudbin.corp/fraud/takeNextTicket.
The OSINT angle
Censorship leaves a version trail. Each cl=… revision is a word added by hand.
16,000 raters, their platform and their verdicts, by the hosts.
The lexicon names EWOK and Needs-Met as concepts; here are the hosts that serve them. raterhub.corp.google.com is the Quality Raters' platform (the very one enforcing the public "Quality Rater Guidelines"). And eval-analytics.corp.google.com/querygroup?experimentId=…&queryGroupIndex=0 exposes the machinery: a named experiment ID sliced into query-groups, the exact unit raters score. The arrow is complete: rater → query-group → experiment verdict.
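Reading the experiment and query-group out of such an address takes nothing more than query-string parsing. In the sketch below the function name is ours, and `EXAMPLE_ID` is a placeholder: the real experiment ID is elided in the text and is not reconstructed here.

```python
from urllib.parse import parse_qs, urlsplit


def query_group_ref(url: str) -> dict:
    """Extract the experiment ID and query-group index named in the URL's query string."""
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    index = params.get("queryGroupIndex", [None])[0]
    return {
        "host": parts.hostname,
        "experiment_id": params.get("experimentId", [None])[0],
        "query_group_index": int(index) if index is not None else None,
    }


# EXAMPLE_ID is a placeholder, not a recovered identifier.
ref = query_group_ref(
    "https://eval-analytics.corp.google.com/querygroup"
    "?experimentId=EXAMPLE_ID&queryGroupIndex=0"
)
print(ref)
```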
Around it orbit the human "golden set" (goldendata-geq.googleplex.com, gwstest-gold, a side-by-side test GWS), the annotation factory (merlot-ops-labelstream-ui, crowdcompute.googleplex.com) and satisfaction surveys (gutssurveys.corp, GUTS = Google User Satisfaction).
The hottest SEO topic, leaked by its staging hosts.
A pre-prod hostname is pure intelligence: it attests the product exists, its internal codename and its dogfooding, while the body stays shut. hc-ai-mode-staging.corp.google.com/?entry_point_id=1 names "AI Mode", the conversational search surface reshaping SEO. Next to it, aistudio-preprod exposes a model slug (gemini-2-5-flash-image) and bundled demo apps.
You can even reconstruct the release pipeline from suffixes alone: geminidataanalytics exists as a clean five-rung ladder, dev → autopush → staging → preprod → prod. And bard.corp.google.com/sitemap.xml stays indexed after the Gemini rebrand: an OSINT fossil. In code, abuse/llm/agents/adi_agent names an LLM agent built to police LLM abuse.
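A sketch of the suffix reading, under two assumptions that the text does not confirm: that the stage is the last hyphen-separated piece of the first host label (as in hc-ai-mode-staging), and that a label with no stage suffix is the prod rung. The `host.example` hosts are placeholders, since the exact geminidataanalytics hostnames are not quoted above.

```python
from collections import defaultdict

# Rung order as given in the text.
LADDER = ["dev", "autopush", "staging", "preprod", "prod"]


def ladder_by_product(hosts):
    """Group hosts by base name and list the rungs observed, in ladder order."""
    seen = defaultdict(set)
    for host in hosts:
        label = host.split(".")[0]  # first DNS label
        base, _, suffix = label.rpartition("-")
        if base and suffix in LADDER:
            seen[base].add(suffix)
        else:
            # Assumption: no stage suffix means the production host.
            seen[label].add("prod")
    return {base: [r for r in LADDER if r in rungs] for base, rungs in seen.items()}


hosts = [
    "geminidataanalytics-dev.host.example",       # placeholder hosts
    "geminidataanalytics-autopush.host.example",
    "geminidataanalytics-staging.host.example",
    "geminidataanalytics-preprod.host.example",
    "geminidataanalytics.host.example",
    "hc-ai-mode-staging.corp.google.com",         # host quoted in the text
]
for product, rungs in ladder_by_product(hosts).items():
    print(product, rungs)
```

A product with all five rungs shows the full pipeline; one that only appears as staging tells you it exists and is being dogfooded, nothing more.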
For SEA: the reserve-price levers, seen from inside.
A single host, adxdashboard.corp.google.com, exposes the page tree of Ad Exchange's operator console: the reserve floor (/pub/reserve_price_opt_summary.html), the real-time auction (/rtb/rtb_dashboard.html) and publisher payout (/pub-mon-payout.html). SEA pros only see this loop's output; here is the machine.
The organic/paid seam is named: SearchAdsViewerRenderingUi is the component that paints ads onto the SERP; displayadssearch and unity-adsense-search are where ad-serving queries the search stack. And anti-spam has an ads twin: adspam.corp, with a verbatim signal, MobileAppQueryIpClickStats28Days (per-IP click stats over 28 days, click-fraud detection).
All on *.printer.in.goog, hand-named by employees.
Every printer carries a name chosen by a team. Put end to end, 2,377 of them sketch an internal culture, and sometimes a geography. You meet the AIs out to destroy humanity (hal9000, skynet, glados, ultron, jarvis), video games (mario, zelda, sonic, tetris), James Bond (goldfinger, jaws), food (sushi, taco, croissant), and a nod to The IT Crowd: 01189998819991197253.
The OSINT angle: the names geolocate
Many printers encode a location: 24th-floor, 25th-floor, access-sf1, 3cc-reception. Cross-referenced, they let you map floors and buildings from the printers alone. There are even named cameras on the same domain (3ccsecurity-cam).
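For the simplest of these patterns, the floor number, the extraction can be sketched as below. The regular expression is our assumption about how such names are shaped, derived only from the examples quoted above; names that do not fit it are left unclassified rather than guessed at.

```python
import re

# Matches names such as "24th-floor"; ordinal suffixes other than "th" are an assumption.
FLOOR = re.compile(r"^(\d+)(?:st|nd|rd|th)-floor")


def floor_of(printer_name: str):
    """Return the floor number encoded in a printer name, or None if there is none."""
    match = FLOOR.match(printer_name)
    return int(match.group(1)) if match else None


names = ["24th-floor", "25th-floor", "access-sf1", "3cc-reception"]
for name in names:
    print(name, floor_of(name))
```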
Beyond printers: campuses, datacenters, badges, GPS.
Printers named floors; campusmaps goes further. 166 captured URLs expose a canonical address grammar COUNTRY-CITY-BUILDING-FLOOR-ROOM across 7 cities (Mountain View, NYC, Tokyo, Zürich, Sydney…), sometimes with latitude/longitude. The ctype= gives the place type away: CONFERENCE, UX_LAB, MICRO_KITCHEN. Worse: a query q=type:person+near:email@google.com resolves a named person to a seat, floor and coordinates.
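The address grammar lends itself to a very small parser. The sketch below assumes exactly five hyphen-separated fields with no hyphens inside a field, which the real identifiers may not always respect; the sample identifier is made up, and the `Place` type is ours.

```python
from typing import NamedTuple, Optional


class Place(NamedTuple):
    country: str
    city: str
    building: str
    floor: str
    room: str


def parse_place(identifier: str) -> Optional[Place]:
    """Parse a COUNTRY-CITY-BUILDING-FLOOR-ROOM identifier; None if it does not fit."""
    fields = identifier.split("-")
    if len(fields) != 5 or not all(fields):
        return None
    return Place(*fields)


# Made-up identifier that follows the grammar, not a captured value.
print(parse_place("XX-CITY-B1-2-R215"))
print(parse_place("XX-CITY"))
```

Returning None for anything that does not fit keeps malformed or differently shaped identifiers out of the map instead of silently misplacing them.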
Structured printer hostnames map a datacenter's security topology: au-syd-erk1a-1-security-truck-entry (the truck gate), …-soc (security operations center), …-security-kiosk. You meet badge lobbies (nacho-badgelobby), cameras (3ccsecurity-cam) and YouTube HQ as a smart building (building/datacenter/kiosk/nest.corp.youtube.com).
The OSINT angle
Corporate email → building, floor and GPS. The internal map turns into a geolocated directory.
www.corp.google.com/~login: the unintentional directory.
Each engineer has a personal space under www.corp.google.com/~login. Prototypes, slides and notes get dropped there. We rebuilt the list of 917 of these spaces. Nothing was forced open: it all stayed in plain sight, you just had to know where to look. We cite the mechanism here (917 distinct logins), not the identities.
A telling detail: there is ~daepark/public/mustang-suggest/. A home page that names Mustang, the central ranking system. The content stays shut; the path places the person inside the machine.
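The counting mechanism (distinct logins, never identities) can be sketched as follows. The regular expression is our assumption about the URL shape, based on the www.corp.google.com/~login pattern described above, and the logins in the sample are placeholders.

```python
import re

# Captures the login that follows "/~" on www.corp.google.com; scheme is optional.
LOGIN = re.compile(r"^(?:https?://)?www\.corp\.google\.com/~([^/?#]+)")


def count_distinct_logins(urls) -> int:
    """Count distinct ~login spaces without returning the logins themselves."""
    logins = set()
    for url in urls:
        match = LOGIN.match(url)
        if match:
            logins.add(match.group(1))
    return len(logins)


# Placeholder logins, not real ones.
sample = [
    "https://www.corp.google.com/~user_a/public/notes/",
    "www.corp.google.com/~user_a",
    "https://www.corp.google.com/~user_b/slides.html",
    "https://www.corp.google.com/index.html",
]
print(count_distinct_logins(sample))
```

The function deliberately returns only a number: the set of logins never leaves it.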
The internal URL shortener, the toolchain and the codename zoo.
Google's famous go/ links are internal bookmarks. Their slugs name projects, and some map straight onto the section-3 lexicon. Ranking/quality side:
Around them, the legendary toolchain shows host by host: critique (code review), cs / codesearch, cider (the IDE), buganizer, moma (internal Google). Plus the codename culture: 757 single-word hosts (bahamut, valkyrie, aristocat…), and the cherry: memegen.corp.google.com, a meme generator running as official corp infrastructure.
Cross-check with the lexicon
Several go/ slugs confirm lexicon terms: udr, iql, superroot, ymyl. An engineer's bookmark is existence-proof of the project.
The monorepo tree, inferred folder by folder.
The archived code paths sketch the google3 tree. The top folders observed: third_party (213), googledata (149), file/colossus (89), quality (26). Under those roots, the names give the teams away.
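A tally of this kind can be sketched as a count over leading path segments. The helper below is an illustration, not the script behind the figures above: it strips an optional google3/ prefix and counts at a configurable depth, since some roots in the list are one segment deep and others (file/colossus) two. The placeholder sub-paths in the sample are invented.

```python
from collections import Counter


def top_folders(paths, depth: int = 1) -> Counter:
    """Count code paths by their first `depth` segments, ignoring a google3/ prefix."""
    counts = Counter()
    for path in paths:
        segments = [s for s in path.split("/") if s]
        if segments and segments[0] == "google3":
            segments = segments[1:]
        if segments:
            counts["/".join(segments[:depth])] += 1
    return counts


# Two paths are quoted in this article; the others use invented sub-paths.
paths = [
    "google3/googledata/placeholder/a",
    "googledata/placeholder/b",
    "third_party/placeholder/c",
    "learning/multipod/pax",
    "abuse/llm/agents/adi_agent",
]
print(top_folders(paths).most_common())
print(top_folders(paths, depth=2).most_common(1))
```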
learning/multipod/pax points to Pax, the JAX framework behind Gemini; abuse/llm/agents/adi_agent names an anti-abuse LLM agent. The lever that drives everything else also shows: Mendel and Finch (A/B tests and kill-switches), via experiments.corp.google.com.
All 3,736,742 internal URLs, in plain text, in a single file. Only the paths are exposed, the bodies stay behind login.corp. We're releasing it to the SEO/OSINT community.
Download the .zip (79 MB) · 3,736,742 lines · ~622 MB uncompressed