May 31, 2024

What the Leaked Google Search API Means for SEOs

Thousands of pages of leaked Google Search API documentation have unveiled stunning insights into Google's search algorithm and confirmed several theories held by SEO professionals.

The leaked information spans over 2,500 pages and appears to come from Google's internal "Content API Warehouse." It was uploaded to GitHub on March 27, 2024, for unknown reasons. The Google GitHub repository containing the original documentation was taken down on May 7, but you can still view a copy of the leaked Google Search API hosted on HexDocs.

Erfan Azimi, founder of EA Eagle Digital, shared the documentation with Rand Fishkin on May 24. Although the source isn't affiliated with Google, he appears knowledgeable about the inner workings of the company, and the bombshells he dropped align with information from the DOJ v. Google antitrust trial.

Can we verify the authenticity of the API leak? The chances are slim to none. Google representatives released a statement on May 29 saying the documentation lacks context. We may also never know which API modules have been retired and which are still in use.

Update: In a statement on May 30, 2024, Google confirmed that the leaked API documentation is legitimate but urged caution against making assumptions about how the whole ranking system works.

What can we learn from the leaked information

Verified or not, the leaked Google Search API is a game-changer for SEO professionals. This key piece of information helps us better understand the Google search algorithm and evolve our SEO strategies.

Here are our key takeaways:

User behavior is a ranking factor

The leaked API documentation references features like:

[Image: NavBoost attributes from the leaked API documentation]

Where does Google get this data? Where else but from browser cookies and signed-in Chrome users. Based on the DOJ case testimony, the Google Chrome browser was built with the intent of collecting data for this purpose.

We don't know exactly how, but Google can filter clicks down to the ones it wants to count, measure how long those visits last, and use that data in its ranking systems. In plain terms, this means user intent, CTR, bounce rate, and time on site affect a website's rankings.

If you want your page to rank, share it on social media so people can see it and give Google behavioral data for that page.
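
To make this concrete, here's a minimal Python sketch of how dwell-time-based click classification could work. Everything here is our assumption: the Click record, the 30-second threshold, and the good/bad labels are invented for illustration, even though attributes like goodClicks and badClicks reportedly appear in the leak.

```python
from dataclasses import dataclass

# Hypothetical click record; Google's real NavBoost schema is not public.
@dataclass
class Click:
    url: str
    dwell_seconds: float  # time on the page before returning to the results

def aggregate_clicks(clicks: list[Click], long_click: float = 30.0) -> dict:
    """Bucket clicks per URL into 'good' (long dwell) and 'bad' (quick bounce)."""
    stats: dict[str, dict[str, int]] = {}
    for click in clicks:
        bucket = stats.setdefault(click.url, {"good": 0, "bad": 0})
        if click.dwell_seconds >= long_click:
            bucket["good"] += 1  # user stayed: a satisfied-click signal
        else:
            bucket["bad"] += 1   # pogo-sticking: a dissatisfaction signal
    return stats

print(aggregate_clicks([Click("https://example.com/a", 95.0),
                        Click("https://example.com/a", 4.0)]))
```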

Google grades your website

Contrary to statements by John Mueller and Gary Illyes, Google uses a siteAuthority attribute, which could be similar to Moz's Domain Authority. While we don't know its exact computation or how it's applied in search rankings, we now have proof that it exists.

Sitelinks are based on a website's most popular pages

Using data from signed-in Chrome browsers, Google counts clicks across a website and uses that data to determine which pages to include in the sitelinks feature.
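
For illustration only, here's a toy Python sketch of the idea: rank a site's pages by click count and treat the top few as sitelink candidates. The pages, counts, and four-link cap are all made up.

```python
from collections import Counter

# Hypothetical click counts per page (the real data would come from
# signed-in Chrome users, which outsiders can't access).
page_clicks = Counter({
    "/": 12_400,
    "/pricing": 5_300,
    "/blog": 4_100,
    "/about": 900,
    "/careers": 350,
})

# The most-clicked pages become the sitelink candidates.
sitelink_candidates = [page for page, _ in page_clicks.most_common(4)]
print(sitelink_candidates)  # ['/', '/pricing', '/blog', '/about']
```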

There really is a sandbox period for new websites

[Image: hostAge attribute from the leaked API documentation]
[Image: smallPersonalSite attribute from the leaked API documentation]

Domain age is a ranking factor, and Google doesn't favor small websites. Those are some of the reasons big brands dominate search results. One way for a new website to compete with legacy domains is to build a following and become a name in its niche, but that's easier said than done.

Author bio is necessary

[Image: isAuthor attribute from the leaked API documentation]

The author and isAuthor attributes are the only ones connected to E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness). That makes sense: there's no way for Google to accurately measure E-E-A-T's intangibles directly.

That being said, make sure your posts have an author bio and use sameAs schema so Google can track authorship across the web and within a website.
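
Here's a minimal Python sketch that generates Person schema with sameAs links. The name and profile URLs are placeholders; embed the JSON output in your page inside a <script type="application/ld+json"> tag.

```python
import json

# Person schema with sameAs profile links, following schema.org conventions.
author_schema = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",  # placeholder author
    "url": "https://example.com/author/jane-doe",
    "sameAs": [
        "https://www.linkedin.com/in/janedoe",
        "https://x.com/janedoe",
    ],
}

print(json.dumps(author_schema, indent=2))
```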

Data from human quality raters train Google algorithms

[Image: quality rater data referenced in the leaked API documentation]

We've known since 2013 that Google employs personnel tasked with grading websites on the internet; you can even read their guidelines.

Some modules suggest that the information gathered by these teams is used as training data. Heck, the raters' website grades may even be baked into the algorithm itself, but we'll never know.

Whatever the case, it's safe to say that a website that passes the human quality rater guidelines is probably ranking well on Google Search.

Redirecting a page to an irrelevant target never works

[Image: urlHistory attribute from the leaked API documentation]

Google stores and tracks changes to its indexed pages, much like the Wayback Machine. When an indexed page is redirected to an off-topic page, PageRank is not passed. We think this is a measure to counter expired-domain SEO tactics.
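
We can only guess at the mechanics, but a relevance gate might look like this toy Python sketch, where rank passes only if the old and new pages share enough topical overlap. The Jaccard similarity and the 0.2 threshold are our inventions, not anything from the leak.

```python
def topical_overlap(text_a: str, text_b: str) -> float:
    """Crude topical similarity: Jaccard overlap of lowercase word sets."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0

def passes_rank(old_page_text: str, target_page_text: str,
                threshold: float = 0.2) -> bool:
    """Treat the redirect as rank-passing only if the topics roughly match."""
    return topical_overlap(old_page_text, target_page_text) >= threshold

print(passes_rank("best pizza dough recipe flour yeast",
                  "payday loans fast cash approval"))  # False: off-topic
```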

Rankings of indirectly related keywords are intertwined

NavBoost examines user behavior during and after a query. For example, if users search for "Sergei Ivanov" but quickly refine their query to "Serphead" and click serphead.com in the search results, serphead.com will eventually rank for the "Sergei Ivanov" keyword.

Domains with exact-match keywords will never rank

Domain names that exactly match generic search queries (e.g., seo-link-building.com) are considered spam and are demoted outright.
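
As a toy illustration, spotting an exact-match domain is trivial to script. The query list and the matching rule below are invented for the example.

```python
# Hypothetical set of generic commercial queries.
GENERIC_QUERIES = {"seo link building", "buy cheap flights", "best vpn"}

def is_exact_match_domain(domain: str) -> bool:
    # "seo-link-building.com" -> "seo link building"
    name = domain.rsplit(".", 1)[0].replace("-", " ").lower()
    return name in GENERIC_QUERIES

print(is_exact_match_domain("seo-link-building.com"))  # True: demotion candidate
```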

Google can whitelist websites

During the COVID-19 pandemic, Google used whitelists to favor certain websites in search results for COVID-related queries. Likewise, during US elections, Google whitelisted specific sites for election-related information.

Whether this is a good or bad thing is not for us to decide, but it shows that Google is capable of tinkering with its algorithm to surface the results it wants.

Low-quality links are ignored

Based on click data, Google can label links as low, medium, or high quality. Links clicked hundreds or thousands of times are marked as high quality and pass ranking signals. A link's value is also determined by how much Google trusts the homepage of the linking domain.

On the other hand, links that users never click are marked as low quality and are ignored. Google also demotes links with mismatched anchor text.

This means John Mueller was telling the truth: low-quality links are a non-factor and don't have to be disavowed, and the "toxic link" score really is a made-up metric from some SEO tools.

If you're actively building links, aim for websites with actual human visitors instead of ones with high DA and low traffic. Use Semrush, Ahrefs, or SimilarWeb to estimate their monthly organic traffic.
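
To picture the idea (and this is emphatically not Google's formula), here's a toy Python sketch that buckets a backlink by observed clicks weighted by homepage trust. The thresholds and the 0-1 trust score are invented.

```python
def classify_link(click_count: int, homepage_trust: float) -> str:
    """Bucket a backlink by clicks, scaled by an assumed 0-1 trust score
    for the linking domain's homepage. Thresholds are made up."""
    weighted = click_count * homepage_trust
    if weighted >= 500:
        return "high"    # passes full ranking signals
    if weighted >= 50:
        return "medium"  # passes partial signals
    return "low"         # ignored entirely

print(classify_link(click_count=1200, homepage_trust=0.8))  # high
print(classify_link(click_count=3, homepage_trust=0.9))     # low
```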

Building too many links too fast is bad

[Image: anchor text spam attributes from the leaked API documentation]

Google identifies spikes in anchor text volume to counter spam and black-hat SEO attacks. Links acquired during such a spike are then demoted.

If you're building links for a website, do it gradually and consistently so the links are counted and bring value. Don't forget to diversify your anchor texts to avoid demotions.
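
Here's a hedged Python sketch of what spike detection could look like: flag days where new-link volume jumps far above the trailing average. The 3x factor is an invented threshold; Google's real logic is unknown.

```python
from statistics import mean

def spike_days(daily_new_links: list[int], factor: float = 3.0) -> list[int]:
    """Flag days whose new-link count exceeds `factor` times the trailing
    average. Links earned on flagged days would be the demotion candidates."""
    flagged = []
    for day, count in enumerate(daily_new_links):
        history = daily_new_links[:day]
        baseline = mean(history) if history else 0.0
        if baseline and count > factor * baseline:
            flagged.append(day)
    return flagged

# A steady profile, then a sudden burst on day 5.
print(spike_days([4, 5, 6, 5, 4, 60, 7]))  # [5]
```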

Google can tell how fresh content is using dates

[Image: semanticDate attribute from the leaked API documentation]

Content freshness really is a ranking factor, and Google can associate dates with pages. Ensure your content shows consistent dates in the page title, the body, the publish date, and the lastmod attribute in the XML sitemap.
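
A consistency check like that is easy to script yourself. This Python sketch compares the date signals a crawler might see; the function and its inputs are our own construction, not anything from the leak.

```python
from datetime import date

def dates_consistent(title_date: date, on_page_date: date,
                     published_meta: date, sitemap_lastmod: date) -> bool:
    """All visible dates should agree, and the sitemap lastmod should
    never predate the published date."""
    visible = {title_date, on_page_date, published_meta}
    return len(visible) == 1 and sitemap_lastmod >= published_meta

print(dates_consistent(date(2024, 5, 31), date(2024, 5, 31),
                       date(2024, 5, 31), date(2024, 5, 31)))  # True
```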

Link formatting is important

Google tracks the average weighted font size of link anchor text. We don't know how important this is, but it wouldn't hurt to change the font color or use bold or underline to make links stand out from the rest of the text.
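
For the curious, "average weighted font size" most likely means a character-weighted average, something like this toy Python calculation (the anchors and pixel sizes are made up):

```python
# Each tuple: (anchor text, rendered font size in px). Sample values only.
anchors = [("read the full study", 18), ("terms", 11), ("our pricing page", 16)]

# Character-weighted average: longer anchors count for more.
total_chars = sum(len(text) for text, _ in anchors)
avg_weighted_size = sum(len(text) * size for text, size in anchors) / total_chars
print(round(avg_weighted_size, 1))  # 16.3
```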

Poor website navigation and structure can cost you

[Image: navDemotion attribute from the leaked API documentation]

From the navDemotion attribute, we can deduce that pages with poor navigation will be outranked by pages with good UX.

There is no optimal word count

[Image: numTokens attribute from the leaked API documentation]

Google counts the number of tokens (roughly, words) in a document and truncates content that is too long. This means page content should be straight to the point; otherwise, the most important information may be ignored.

Let's say you're writing a pizza recipe: you don't have to write about who invented pizza or pad the piece with fluff to hit a word count suggested by SEO tools. Provide the juicy stuff right away and give users what they want.
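
To picture why front-loading matters, here's a toy Python sketch of token counting and truncation. The 20-token cap is absurdly small on purpose; the real limit is unknown.

```python
MAX_TOKENS = 20  # illustrative cap; Google's actual truncation limit is unknown

def truncate_tokens(text: str, limit: int = MAX_TOKENS) -> str:
    """Naive whitespace tokenization; everything past the cap is dropped,
    which is why key information should come first."""
    return " ".join(text.split()[:limit])

intro = ("Preheat the oven to 250C and stretch the dough thin. " * 3
         + "The pizza was invented in Naples.")
print(truncate_tokens(intro))  # the Naples trivia at the end never survives
```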

You get punished for keyword stuffing

[Image: keywordStuffingScore attribute from the leaked API documentation]

Google can tell when you're cramming keywords into your content and grades it accordingly. If you're using SurferSEO or other tools that encourage keyword stuffing, consider canceling your subscription, or at least ignore their keyword-frequency suggestions.
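
We don't know how keywordStuffingScore is computed, but a naive density check like this Python sketch captures the spirit. The 4% ceiling is an invented threshold.

```python
def keyword_density(text: str, keyword: str) -> float:
    """Fraction of words in the text that are the target keyword."""
    words = text.lower().split()
    return words.count(keyword.lower()) / len(words) if words else 0.0

def stuffing_flag(text: str, keyword: str, max_density: float = 0.04) -> bool:
    """Flag copy whose keyword density exceeds a sane ceiling."""
    return keyword_density(text, keyword) > max_density

copy = "pizza recipe pizza dough pizza oven pizza pizza best pizza"
print(stuffing_flag(copy, "pizza"))  # True: blatant stuffing
```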

Don't take statements from Google spokespersons as gospel

Lastly, details from the leaked Google Search API disprove past statements by company representatives. Google employees will always protect their trade secrets like a mama bear defends her cubs. Take their word with a grain of salt, especially when it comes to the ranking system.

Keep studying the Google Search API

The Serphead team will continue studying the leaked Google Search API and update this post regularly with new findings. We also encourage fellow SEO practitioners to do the same and share their insights with the SEO community.

But for now, let's follow a holistic approach to SEO. Focus not only on keywords and links but also on user experience, as UX signals seem to carry more weight these days.

Sergei Ivanov

Co-Founder and CEO of Serphead

Sergei has more than a decade of experience blending data-driven insights with SEO strategies to enhance online visibility and user engagement. He has been providing practical guidance for businesses and individuals navigating the digital landscape since 2012.