15 SEO Myths Busted by Leaked Google Data

  • Matthew Woodward
  • Updated on Jun 24, 2024

The most interesting thing about the leaked Google docs is being able to compare the leak with Google’s public statements.

For example:

Google stated that they do not use a spam score:

[Image: John Mueller tweet saying Google doesn’t use a spam score]

But that’s not strictly true…

…because we can see that a spam score does exist within the leaked docs:

[Image: spam score attribute in the leaked docs]

Weird, right?

And it doesn’t stop there!

Because I’ve used the leaked Google data to bust 15 other SEO myths below.

📈 Increase your traffic with the 28 Day SEO Challenge Now.

Myth #1: Google Doesn’t Use Chrome Click Data

Let’s kick it off with a big one…

Google has often denied that click data from Chrome users has been used to inform rankings in the SERPs.

In fact, they’ve been pretty vocal about it:

[Image: John Mueller tweet about CTR]

Not only this…

Gary Illyes went on to publicly slam both Rand Fishkin and the notion of CTR being used to influence search rankings in his Reddit AMA:

“Dwell time, CTR, whatever Fishkin’s new theory is, those are generally made up crap. Search is much more simple than people think.”
– Gary Illyes

These kinds of statements have been reinforced multiple times throughout the years by other senior members of Google’s team.

What The Leak Says About Click Data

But the leaked API docs seem to tell a different story…

Contrary to Google’s claims, the leak suggests that Google does use Chrome data to evaluate site-wide authority and user engagement metrics.

The API module “NavBoost” (which appears 50 times in the leaked API) is described as a “re-ranking based on click logs of user behaviour.”

This alone is pretty crazy.

It suggests that user interactions in the Chrome browser are indeed being tracked and used to evaluate websites.

Here are some more metrics and systems that could be using Chrome data for rankings:

  • NavBoost – (Mentioned above) Bundling queries based on click logs.
  • badClicks, clicks, goodClicks – Click data categorized as good, normal, or bad.
  • unsquashedMobileSignals – Mobile-specific click data.
  • signals – Utilizes user behavior signals (clicks, etc.) to adjust rankings.
  • SiteEngagementScores – Generalised user engagement scoring.
  • SiteClicks – Click-through rate for an individual site.
  • TopicClicks – Click-through rate for topic-specific pages.
  • ImageQualityClickSignals – Quality signals from image clicks.
  • chromeCounts – Collects interaction data from Chrome.
  • chromeInTotal – Tracking site-level Chrome views.
  • instantChromeViews – Tracking views of AMP pages through Chrome.

That’s a lot of click and engagement data being captured, and in case you missed it, they track mobile-specific click data too.
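
To make that a bit more concrete, here’s a rough Python sketch of how a NavBoost-style re-ranker could fold click logs into an existing score. To be clear: goodClicks and badClicks are names from the leak, but the Result class, the click_boost helper and the 0.9–1.1 multiplier are all my own invention for illustration – not Google’s actual maths.

```python
# A purely illustrative sketch of how a NavBoost-style re-ranker *might* use
# click logs to adjust an existing ranking. goodClicks/badClicks are names from
# the leak; the maths below is my own guess, not Google's.
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    base_score: float   # whatever the core algorithm scored the page
    good_clicks: int    # e.g. clicks followed by a long dwell time
    bad_clicks: int     # e.g. clicks followed by an instant bounce


def click_boost(r: Result) -> float:
    """Turn good/bad click counts into a small multiplier (hypothetical)."""
    total = r.good_clicks + r.bad_clicks
    if total == 0:
        return 1.0                    # no click data, no adjustment
    quality = r.good_clicks / total   # share of "good" clicks
    return 0.9 + 0.2 * quality        # multiplier between 0.9 and 1.1


def rerank(results: list[Result]) -> list[Result]:
    """Re-order candidates by base score adjusted with click behaviour."""
    return sorted(results, key=lambda r: r.base_score * click_boost(r), reverse=True)


candidates = [
    Result("example.com/a", base_score=0.82, good_clicks=40, bad_clicks=60),
    Result("example.com/b", base_score=0.80, good_clicks=90, bad_clicks=10),
]
for r in rerank(candidates):
    print(r.url, round(r.base_score * click_boost(r), 3))
```

In this toy example the page with the weaker base score but the better click behaviour ends up on top – which is exactly the kind of adjustment the leaked descriptions hint at.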

Here’s the takeaway:

Even though Google downplays the significance of click data, the leak suggests they’re using it (a lot) behind the scenes to tweak the search rankings.

Time for a new browser?

Myth #2: Google Doesn’t Have an Internal Site Authority Metric

Google has continuously denied the existence of a “domain authority” type metric.

They have publicly said that ONLY individual pages are evaluated based on their PageRank.

In the video below, John Mueller said they don’t have a website authority score:

Not only that…

On Twitter (in a now deleted Tweet) he also said:

[Image: John Mueller tweet about page authority]

Sounds pretty definitive, right?

But these statements aren’t exactly true…

What The Leak Says About Internal Site Authority

The Google algorithm leak confirms the existence of a metric called “siteAuthority“.

The siteAuthority metric seems to imply that Google ranks websites based on their perceived authority.

It may even play a significant role in Google’s ranking algorithm.

We don’t know for sure the impact (if any) siteAuthority has on rankings.

But again…

Google is clearly collecting data and, at the very least, measuring each website’s site authority.

To be clear – ‘siteAuthority’ is likely very different from Moz’s DA ‘Domain Authority’ and Ahrefs’ DR ‘Domain Rating’. Both of those authority metrics are heavily link-based.

It’s also worth mentioning that there is a page-level PageRank score as well as a site-level one.

Google’s ‘siteAuthority’ measurement is more likely to be a combination of factors rather than a single score. Here are some examples of possible components:

  • sitePr – PageRank or site authority.
  • pagerank2 – Another version of PageRank.
  • homepagePagerankNs – Showing a higher homepage ‘PageRank’ could be another authority signal.
  • IndexingDocjoinerDataVersion – Pulls in data from lots of pages for the site’s overall authority and expertise on a topic.
  • authorityFeedback – Uses knowledge graph and entities to grade authority.
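
Just to illustrate the idea (and nothing more), here’s a toy sketch of what a composite site-level score could look like if Google blended signals like these. The weights and the formula are completely made up; only the attribute names in the comments come from the leak.

```python
# Hypothetical sketch only: blending some of the leaked attribute names into a
# single site-level authority figure. The weights and the formula are invented.
def site_authority(site_pr: float, homepage_pagerank: float,
                   topical_coverage: float, authority_feedback: float) -> float:
    """Blend site-level signals (each assumed to be normalised 0-1) into one score."""
    return round(0.35 * site_pr                # sitePr
                 + 0.25 * homepage_pagerank    # homepagePagerankNs
                 + 0.25 * topical_coverage     # topic coverage across indexed pages
                 + 0.15 * authority_feedback,  # authorityFeedback / entity signals
                 3)


print(site_authority(0.6, 0.7, 0.5, 0.4))  # -> a blended 0-1 "authority" figure
```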

Myth #3: There Is No Sandbox

This myth was busted years ago through individual testing, so it came as no surprise. But it’s nice to see that the leak affirms SEOs were on the right track.

For context, the Google sandbox stops new websites from ranking until they have proven themselves trustworthy.

Most SEOs believe it lasts for about 3-6 months.

Google has always denied that any kind of “sandbox” exists.

Here’s what they’ve said in the past:

[Image: John Mueller tweet: “There is no sandbox”]

What’s more?

Check out John Mueller’s response here to a question about whether Google has a new sandbox in the algorithm.

His answer doesn’t exactly fill you with confidence, right?

And now we might know why!

What The Leak Says About The Sandbox

The leak revealed a metric called “hostAge.”

Not to be confused with your website being taken hostage, this metric seems to show that Google measures a domain’s age.

It also strongly suggests a sandbox mechanism has been built into the algorithm.
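
If you want to picture what a sandbox mechanism might look like in practice, here’s a tiny hedged sketch. The hostAge attribute is in the leak; the 180-day probation window and the dampening curve are pure assumptions on my part.

```python
# Illustrative only: how a "sandbox" could dampen rankings for young hosts.
# hostAge exists in the leak; the 180-day window and curve are assumptions.
def sandbox_multiplier(host_age_days: int, probation_days: int = 180) -> float:
    """Scale a page's score down while the host is younger than the probation window."""
    if host_age_days >= probation_days:
        return 1.0
    # Linearly earn back trust from 50% to 100% over the probation period.
    return 0.5 + 0.5 * (host_age_days / probation_days)


for age in (10, 90, 180, 400):
    print(age, round(sandbox_multiplier(age), 2))
```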

John Mueller has also answered “No” in the past when he was asked if domain age matters for rankings.

And that might be true.

But why would Google collect data about a domain’s age if they don’t use it?

Myth #4: Human Quality Raters Do Not Flag Websites

Human quality raters are hired by Google to evaluate the quality and relevance of search results.

They follow strict guidelines outlined in the Search Quality Evaluator Guidelines.

Think of these guidelines as a detailed document that tells quality raters how to evaluate websites correctly for search.

Google has always been upfront about the existence of search quality raters and their roles.

BUT…

In this Google Hangout, John Mueller said quality raters are not specifically flagging sites and that it’s “not something that we’d use there directly.”

Is that true, though?

Short answer: It doesn’t look that way.

What The Leak Says About Human Quality Raters

The term “EWOK” appears multiple times in the API, referencing the quality raters platform.

Not only that, but some elements show that information gathered by quality raters is stored, such as:

  • isCorrectRange
  • ratingSource
  • raterCanUnderstandTopic

And some big attributes uncovered during the leak show the existence of “Golden Documents”:

  • Golden – This is a flag that quality raters give specific website pages.
  • isReadable – Signals the quality of content shown to human raters, potentially influencing how algorithms check content quality for ranking.

This likely means that content can be labeled as high-quality and ultimately get preferential treatment in rankings.

Pretty crazy, right?

The bottom line here is this:

There is a good chance if a quality rater visits your website, their opinion could affect your rankings.

I actually published a post about the impact a search engine evaluator can have on your site back in 2019, but it was getting too much heat at the time, so I deleted it.

Myth #5: Google Does Not Favour Content Freshness

I’ve always believed that freshness played an important role in content rankings.

Almost every time I update old posts, they get a rankings boost.

Google has emphasized the importance of fresh content, but they’ve been a bit vague about how important it is.

What’s more?

They’ve gone as far as saying that Google doesn’t favour fresh content.

Check out this now deleted tweet from John Mueller:

[Image: John Mueller tweet saying Google doesn’t favour fresh content]

The truth is that freshness does seem to matter – maybe more than most of us realise. In fact, I have a module dedicated to content freshness in the 28 Day SEO Challenge.

What The Leak Says About Content Freshness

The leaked docs reveal that Google looks at 127 attributes linked to content freshness and date-based signals.

Here are some attributes backing up the idea that content should be maintained:

  • adjustmentInfo, date – Indicates when a page was last ‘substantially’ updated (e.g., larger blocks of content).
  • timeAnnotations – Likely used to track content freshness, a known factor in ranking.

This shows that Google measures when content was last updated.

That probably means that Google prioritises “fresh” content in the organic search results.

There appear to be layers to this algorithm that treat time-sensitive and news-specific content differently:

  • time sensitivity – Considers time-related topics, reducing a page’s time to rank.
  • NewspaperSpecific – Recent news articles from trusted sources might be favored in search results.
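
Here’s a quick sketch of how a freshness signal could work in practice, assuming Google stores a “last substantially updated” date like the adjustmentInfo/date attribute suggests. The half-life decay and the extra boost for time-sensitive queries are my assumptions, not anything confirmed in the docs.

```python
# A sketch of a freshness signal, assuming Google stores a "last substantially
# updated" date (cf. adjustmentInfo/date in the leak). The half-life decay and
# the time-sensitive boost are my own assumptions.
import math
from datetime import date


def freshness_score(last_updated: date, today: date, half_life_days: float = 365.0,
                    time_sensitive_query: bool = False) -> float:
    """Exponentially decay a freshness boost as content ages."""
    age_days = (today - last_updated).days
    score = math.exp(-math.log(2) * age_days / half_life_days)  # 1.0 fresh, 0.5 after one half-life
    if time_sensitive_query:
        score = min(1.0, score * 1.5)  # hypothetical: freshness matters more for hot topics
    return round(score, 3)


print(freshness_score(date(2024, 1, 15), date(2024, 6, 24)))
print(freshness_score(date(2024, 1, 15), date(2024, 6, 24), time_sensitive_query=True))
```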

Nothing really changes strategically for us based on this information. But it’s good to know that Google actively measures freshness.

My advice?

Keep your content fresh and update important pages regularly. Follow the process in the 28 Day SEO challenge to do that.

Myth #6: AI And Human Content With Great E-E-A-T Are Treated The Same

Google loves to say that content, whether produced by AI or humans, is evaluated and rewarded equally.

According to Google, the key is demonstrating great E-E-A-T:

  • Experience
  • Expertise
  • Authoritativeness
  • Trustworthiness

[Image: What is E-E-A-T in SEO]

Google has also said in the past that AI content is perfectly fine as long as it meets quality standards.

Here’s what their Search Guidance document says:

[Image: Google Search Central AI content creation guidelines]

This literally opened the floodgates to thousands of websites spamming the algorithm with AI-generated content.

The irony?

Google made the same mistake by publishing their own AI Overviews telling people to eat rocks:

[Image: AI Overview telling people to eat rocks]

But what people seem to forget is that Google can do a 180 at any time.

And it looks like they have done that with AI content.

They just didn’t tell anyone!

Look at the fallout from the March 2024 Core update – Websites using AI content were destroyed.

The leaked API documents might give us some insight into why that is…

What The Leak Says About AI Content

The term “golden” appears as a flag in the API leak, indicating a “gold-standard” document.

What does it mean?

The attribute notes indicate that the flag gives extra weight to human-written content over AI-written content.

It could also give preferential treatment when it comes to rankings.

To what extent, we don’t know. And how Google measures that difference we also don’t know.

What we do know is that Google has rolled back their own AI Overviews.

Myth #7: Google Does Not Whitelist Websites

Google has insisted for years that it doesn’t manually intervene in rankings.

The algorithm creates a level playing field. The best content will rank.

That’s why people trust Google’s search results so much.

Here’s what John Mueller has said in the past:

[Image: John Mueller tweet saying the web is not manually ranked]

But the leaks tell a different story…

What The Leak Says About Whitelists

The leak reveals that Google maintains whitelists for specific content types.

For example, certain trusted sites seem to have gotten special treatment during elections or for COVID-related information.

In the leaked API documents, there are two specific terms:

  • isElectionAuthority – Could boost ranking for election-related searches.
  • isCovidLocalAuthority – This may influence ranking for pandemic-related queries.

[Image: election whitelist attributes in the API leak]

These seem to indicate that whitelists for elections and COVID-related news existed. It’s possible these sites received more visibility and high rankings when covering these topics.

There are also attributes that suggest ranking multipliers could be applied to sites that have been vetted.

  • isEditorial – Checks if the content has been manually reviewed or curated.
  • authorityFeedback – Checks information from authoritative sources like the knowledge graph and known entity authorities.

Even though Google claims to be impartial, the leak shows that it has the power to whitelist any site it wants.

It’s likely they already have.

Scary stuff.

Myth #8: Backlinks Are Not Important

Backlinks have long been considered a top ranking factor.

But more recently, Google has consistently downplayed their importance.

Here are a couple of statements to chew on:

[Image: John Mueller tweet on links as a ranking factor]

Then two years on…

John Mueller stated the following in a live Q&A at the brightonSEO conference:

“But my guess is over time, [links] won’t be such a big factor as sometimes it is today. I think already, that’s something that’s been changing quite a bit.” – John Mueller

What The Leak Says About Backlinks

Links are important, and they’re here to stay.

This is particularly true for high-quality and diverse backlinks from authoritative sites. It also lines up with the continued relevance of key ranking elements like PageRank.

The leaked docs mention over 136 attributes focused on inbound links, some key attributes include:

  • linkWeight – Indicates a scoring system of a link’s strength in ranking.
  • PageRank – A score considering various factors, links being a significant part of it.
  • penguinPenalty – Confirms that the Penguin penalty is still grading link anchor text.

Here are some more snippets that likely highlight the importance of inbound links and their quality:

  • linkData – Links within body content surrounded by relevant text are a clear signal.
  • contentlink – Identifies links within external or internal content, which may be a key factor.
  • anchor, homepage – How Google works out the relevance and authority of links pointing to a website.
  • NumBackwardLinksInWoS – A count of the number of citations (backlinks).
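
If you’ve never seen how link-based scoring works under the hood, here’s a toy version of the classic PageRank iteration – the textbook algorithm, not Google’s current production formula – just to show why inbound links still carry so much weight.

```python
# A toy version of the textbook PageRank iteration - not Google's production
# formula - just to show why inbound links still carry so much weight.
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 20) -> dict[str, float]:
    """links maps each page to the pages it links out to."""
    pages = set(links) | {p for targets in links.values() for p in targets}
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            if not targets:
                continue
            share = rank[page] / len(targets)   # each outlink passes an equal share
            for target in targets:
                new_rank[target] += damping * share
        rank = new_rank
    return rank


graph = {"a.com": ["c.com"], "b.com": ["c.com"], "c.com": ["a.com"]}
for page, score in sorted(pagerank(graph).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```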

My guess is that backlinks will continue to play an important role in search rankings. How else will Google decipher quality?

Myth #9: Anchor Text Ratios Do Not Mean Good Rankings

Now we are getting technical.

But technical is important!

John Mueller claimed in this deleted Tweet that anchor text ratios, word count and link count are not quality indicators for search rankings.

[Image: John Mueller tweet on quality indicators and anchor text]

But that isn’t exactly true.

What The Leak Says About Anchor Text Ratios

The leak seems to show that diversity in backlinks is absolutely critical.

It’s not just the sheer number of backlinks that matters most, but the variety and quality.

Something quality link building services have been saying for a while.

Diverse backlinks from multiple domains signal to Google that the content is widely trusted and referenced.

That’s what a quality backlink profile looks like.

If numbers don’t indicate quality, why are there 44 attributes specifically measuring metrics around anchor counts like:

  • numIncomingAnchors – Number of incoming links (anchors).
  • AnchorSpamInfo – This module focuses on analyzing the anchor text profile of a document to find spammy patterns.
  • AnchorStatistics – These metrics provide more granular details about anchor text distribution.
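
Here’s a small sketch of the kind of anchor profile summary an AnchorStatistics-style module might produce. The attribute it mirrors (numIncomingAnchors) is from the leak; the “spammy” exact-match threshold is something I made up for the example.

```python
# Illustrative sketch: summarising an anchor text profile the way an
# AnchorStatistics-style module might. The "spammy" cut-off is invented.
from collections import Counter


def anchor_profile(anchors: list[str], money_terms: set[str]) -> dict:
    """Return simple distribution stats for a page's incoming anchor texts."""
    counts = Counter(a.lower() for a in anchors)
    total = sum(counts.values())
    exact_match = sum(n for text, n in counts.items() if text in money_terms)
    return {
        "num_incoming_anchors": total,      # cf. numIncomingAnchors in the leak
        "unique_anchor_texts": len(counts),
        "exact_match_ratio": round(exact_match / total, 2) if total else 0.0,
        "looks_spammy": total > 0 and exact_match / total > 0.5,  # hypothetical threshold
    }


anchors = ["best vpn", "best vpn", "best vpn", "acme.com", "click here", "best vpn"]
print(anchor_profile(anchors, money_terms={"best vpn"}))
```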

The leak contains a whole module featuring anchor text spam demotions – and it’s extensive to say the least.

It also indicates that anchor text plays a significant role in backlink quality, with over 76 anchor-based attributes.

My advice?

You should follow this tutorial to choose the right anchor text for your links.

Myth #10: Don’t Disavow Bad Links, We Handle It

I always get nervous when Google says, “Don’t worry, we’ll take care of it.”

They don’t have a great track record of following through. I prefer to take matters into my own hands.

And when it comes to disavowing bad links, maybe you should too…

Here’s what Google has said in the past:

[Image: John Mueller saying toxic links are made up by SEO tools]

That’s not all…

John Mueller also backed up his past statements in a ‘recently deleted’ comment on Reddit:

[Image: John Mueller’s deleted Reddit comment]

Those are pretty strong statements.

What The Leak Says About Bad Links

Contrary to Google’s public stance, the leak shows that bad links do matter.

There are multiple demotions for spammy links, along with scoring systems that could penalise sites with spammy outbound link signals and anchor text.

For example, the attributes “phraseAnchorSpamPenalty” and “spamrank” appear in the API docs.

These seem to indicate penalties for sites associated with spammy links and anchor text.

Not a place you want to be, others include:

  • anchorCount – Counting internal and external anchors for spammy patterns.
  • penguinPenalty – Penalties applied by the Penguin algorithm.
  • penguinEarlyAnchorProtected – Possibly some early protection against anchor-based link attacks.
  • spamProbability – Analyzing the anchor text profile of a page to find spammy patterns.

While Google tells you not to worry about spammy links, the leaks suggest that ignoring them could harm your site’s rankings.

I for one don’t want to rely on the “hope” strategy of Google catching it for me.

Myth #11: Getting Links From Old Authoritative Pages Is Better

Here’s a myth that might come as a surprise…

Google has often emphasised the importance of link quality and relevance.

They’ve also mentioned that links don’t expire – at worst they may be devalued over time.

In a Webmaster Central Office Hours, John Mueller was asked “Do links expire after a while?”

John Mueller immediately responded: “No, they don’t expire…”

That seems pretty straightforward. But it might not be the whole story.

What The Leak Says About Old Links

The leaks suggest that links from newer pages pack more punch than those from older, authoritative pages.

The leaked API documents show two key terms here:

  • Freshdocs – A link value multiplier that may indicate that links from newer web pages are more valuable than those inserted into older content.
  • creationDate – The creation date of a link.
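
Here’s a hedged sketch of what a Freshdocs-style multiplier could look like: full value for recent links, fading towards a floor as the link ages. The creationDate attribute is real; the 90-day and two-year windows and the floor value are my own guesses.

```python
# Hypothetical sketch of a Freshdocs-style multiplier: links on newer pages
# count a little more than links sitting in old content. creationDate is in
# the leak; the decay windows and the floor value below are assumptions.
from datetime import date


def link_value_multiplier(link_created: date, today: date, floor: float = 0.6,
                          full_value_days: int = 90, fade_days: int = 730) -> float:
    """Full value for recent links, fading towards a floor over roughly two years."""
    age = (today - link_created).days
    if age <= full_value_days:
        return 1.0
    if age >= fade_days:
        return floor
    progress = (age - full_value_days) / (fade_days - full_value_days)
    return round(1.0 - (1.0 - floor) * progress, 3)


print(link_value_multiplier(date(2024, 5, 1), date(2024, 6, 24)))   # recent link -> 1.0
print(link_value_multiplier(date(2021, 6, 24), date(2024, 6, 24)))  # old link -> 0.6 floor
```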

The key takeaway here is that as a page becomes less important to Google, the links can seemingly expire or lose their strength over time.

New links seem to have a more significant impact.

Myth #12: Linking Out To High Authority Sites Does Not Help With Your Rankings

Google has always highlighted the importance of citations and references.

And that’s good!

But they’ve never once indicated an internal system for tracking the accuracy and confidence levels of references you add to your content.

I mean, why would they?

They’ve only ever downplayed the idea that linking out to sources boosts SEO:

[Image: John Mueller tweet saying linking to high-quality sites doesn’t boost SEO]

Not only that…

John Mueller doubled down with this tweet:

[Image: John Mueller tweet]

What The Leak Says About Linking To High Authority Sites

The leaked documents suggest there is indeed an internal scoring system for citations and references, particularly in YMYL (Your Money, Your Life) niches.

They mention metrics like “outlinkScore” and “outlinkDomainRelationship” that appear to evaluate the quality and relevance of outbound links.

Strap in for this one…

Here’s what the modules and attributes suggest:

  • outlinkScore – Indicates the value of outbound links.
  • outlinkDomainRelationship – Evaluates the relationship between the linking site and the linked site.
  • indexingConverterLinkRelOutlinks – Analyzes outbound links on a webpage for quality and relevance to the page’s content.
  • webrefOutlinkInfos – A page’s outbound links could signal its authority and relevance, especially if they point to other high-quality sites.
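
To show what an outbound link score could look like in practice, here’s a small, purely illustrative sketch. outlinkScore is the leaked attribute name; the trusted-domain list, weights and relevance values are invented for the example.

```python
# Purely illustrative: an outlinkScore-style check that rewards pages whose
# outbound links point at trusted, topically relevant domains. The trust list,
# weights and relevance values are invented for the example.
def outlink_score(outlinks: list[dict], trusted_domains: set[str]) -> float:
    """Average a simple quality/relevance score over a page's outbound links."""
    if not outlinks:
        return 0.0
    total = 0.0
    for link in outlinks:
        trust = 1.0 if link["domain"] in trusted_domains else 0.3
        total += trust * link["topical_relevance"]  # relevance assumed to be 0-1
    return round(total / len(outlinks), 3)


links = [
    {"domain": "nih.gov", "topical_relevance": 0.9},
    {"domain": "random-blog.net", "topical_relevance": 0.4},
]
print(outlink_score(links, trusted_domains={"nih.gov", "who.int"}))
```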

What does this really mean?

They seem to show that linking to high-quality, relevant external sites CAN boost your page’s authority and trustworthiness.

This is directly opposite to what Google has said in the past.

I know, I know.

While this might not come as a big shock, it should be a wake-up call.

Google is tracking links in and out of your site. Ensure your website only links to reputable websites you would confidently recommend to a friend.

Myth #13: Google Doesn’t Use A Toxic Links Scoring System

Google has always said they don’t use a specific spam score for links.

[Image: John Mueller tweet saying Google doesn’t use a spam score]

When an SEO asked about Semrush’s “toxic link score” for their company website, John Mueller answered:

[Image: John Mueller tweet advising to ignore toxic spam scores]

Most SEOs know that content and quality links work hand in hand.

The leak supports this.

What The Leak Says About The Toxic Links Scoring System

Google seems to have multiple scoring systems for evaluating links.

This indicates a more aggressive approach to link evaluation than they’ve communicated publicly.

In fact, Google’s entire link graph is very sophisticated. Especially when it comes to toxic link scoring.

The leaked API documents show a whole bundle of metrics for link spam:

  • spamScore – Evaluates the spam level of links and is mentioned 20 times in the API docs.
  • spamrank – A measurement between 0 and 65535, with a higher number indicating a greater probability that a page contains links to spammy or low-quality sites.

With a variety of different labels for link penalties:

  • IndexingDocjoinerAnchorStatistics – This module contains several penalty-focused attributes, like badbacklinksPenalized and penguinPenalty.
  • phraseAnchorSpamDemoted – This attribute aims to detect and penalize spammy anchor text practices.
  • spamPenalty – Based on this analysis, a spam penalty may be applied to the final link calculation.
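
Since spamrank is documented as a 0–65535 value, the obvious first step for anyone working with it would be to normalise it. Here’s a minimal sketch; the 0.8 “review your links” cut-off is mine, not Google’s.

```python
# Minimal sketch: the leak describes spamrank as a value between 0 and 65535,
# so the first thing you'd do with it is normalise it. The 0.8 cut-off is mine.
def normalise_spamrank(raw: int, max_value: int = 65535) -> float:
    """Convert the raw 16-bit spamrank into a 0-1 probability-style score."""
    if not 0 <= raw <= max_value:
        raise ValueError("spamrank is documented as a value between 0 and 65535")
    return raw / max_value


for raw in (0, 6553, 58000):
    p = normalise_spamrank(raw)
    print(raw, round(p, 2), "review links" if p > 0.8 else "ok")
```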

Interesting, right?

I was pretty impressed with the lengths Google goes to in order to measure link quality. To what extent it affects backlinks, we don’t know.

But this clearly contradicts what they have said in the past about having no spam score.

Myth #14: We Don’t Use Authorship

Here’s another myth that’s been circulating for a while…

Google has claimed their ranking algorithms don’t use a specific “authorship score” to rank content.

Back in 2013, John Mueller said in this Webmaster Central Hangout:

“We don’t use Authorship for ranking.”

But this stance changed with the introduction of Schema…

In this 2021 Hangout, John Mueller advised that recognising the author is important for Schema.

Which one is true?

What The Leak Says About Authorship

The leaked documents show that Google tracks authorship information. Google stores authors as entities, and authors are an explicit feature in its systems.

  • OceanVolumeImprint – The author’s reputation and authority in their field could significantly impact search rankings.
  • OceanDataDocinfoWoodwingItemMetadata – Metadata about an article’s author and category could be used to assess expertise, authority, and topical relevance.
  • isAuthor, isPublisher – Signals the role of the entity in the document (author, publisher), potentially impacting credibility assessment.
  • documentAuthorPresence – Analyses document-level information like authorship and authoritativeness.

This means authorship is a clearly defined and intentionally integrated component within Google’s systems.

It also implies that Google’s algorithms specifically recognise and utilise information about authors as a factor in their ranking process.

This would explain how Google tracks the “expertise” in E-E-A-T.

If that’s true, the author of your content may impact how content is evaluated for quality and relevance.

Bottom line:

Authorship is back.

Myth #15: Domains And Subdomains Are Treated The Same

Google has long maintained that subdomains and domains are treated equally in their rankings.

In 2017, John Mueller said the following in response to a question about whether subdomains or subfolders are better for SEO:

“Google web search is fine with using either subdomains or subdirectories. […] We do have to learn how to crawl them separately, but for the most part, that’s just a formality for the first few days. So, in short, use what works best for your setup.” – John Mueller

These statements imply that Google views subdomains similarly to their main domains regarding SEO and crawling.

According to Google, there’s no significant difference in how they are treated for ranking purposes.

[Image: John Mueller on subdomain vs domain]

This is another classic Google “don’t worry, we’ll handle it for you” statement.

It’s like the toxic links in Myth #10.

But the leak seems to show that’s not actually the case.

What The Leak Says About Domains & Subdomains

There are two key things here:

  • Separate Evaluation: Google evaluates subdomains separately from the main domain.
  • Ranking Impact: This separate treatment could mean subdomains may not benefit from the authority of the main domain as much as we thought.

The second point is potentially a big one…

It could mean you would need to build the authority of your sub-domain AND main domain separately.
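
Here’s a simple way to picture what “evaluated separately” could mean in practice: authority gets keyed by hostname, so blog.example.com and example.com fill up their own buckets. This is just an illustration of the concept, not how Google actually stores it.

```python
# Sketch of what "evaluated separately" could mean in practice: authority is
# keyed by hostname, so blog.example.com and example.com accumulate their own
# scores. This illustrates the concept only, not Google's storage model.
from collections import defaultdict
from urllib.parse import urlparse


def host_key(url: str) -> str:
    """Use the full hostname, so subdomains get their own bucket."""
    return urlparse(url).hostname or ""


authority: dict[str, float] = defaultdict(float)

# Pretend each inbound link adds a little authority to whichever host was linked.
inbound_links = [
    "https://example.com/guide",
    "https://example.com/pricing",
    "https://blog.example.com/post-1",
]
for url in inbound_links:
    authority[host_key(url)] += 1.0

print(dict(authority))  # example.com and blog.example.com tracked separately
```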

That directly contradicts what Google has said in the past and is something to keep an eye on in the future.

Wrapping It up

The Google search leak has given us a rare glimpse inside Google’s systems.

The leak reveals a complex system with over 14,000 ranking attributes, which is incredible to dig through.

Especially when you contrast that with Google’s public statements!

From seemingly false statements to complete 180 degree turns, Google has gone back and forth like a ping pong ball over the years.

However, one thing is clear:

You should take what Google says with a huge pinch of salt!

In reality, it’s always best to test on your own, or at least follow the tried and tested processes from other SEOs around the world.

Want to learn more about the leak?

Download our complete Google search leak analysis, which unpacks 22 things that stood out to me most.
