Imagine If We Let Google Rank Universities. . .

These days, many universities in Asia devote incredible amounts of energy to improving their position in world ranking systems.

The QS World University Ranking is particularly influential, and its annual announcement of the top 500 universities in Asia is of great importance to many university administrators across the region.

Recently, however, I have been learning about another type of ranking system: Google’s ranking of web pages and websites. I have been surprised to find that if we were to simply let Google rank universities based on the statistics it gathers from university websites, we could easily end up with the exact same results that emerge in the annual world university rankings, results that currently require massive commitments of time and human resources on the part of universities.


To understand this, we first need to get a sense of how Google evaluates webpages. At the most basic level, Google determines the value of a web page by the number of other webpages that link to it. In other words, a website that is linked to a lot is considered more valuable or authoritative than a website that is not linked to.

From that basic level, Google then considers many other factors. It considers, for instance, that “.edu” and “.gov” pages are more authoritative than, say, personal websites. Therefore, it will rank a website higher if it has a link from an established university than if it has a link from “liamkelleyisagenius.com.”

Google now also considers how many “domains” are directing links (also called “backlinks”) to a website. So, for instance, if a website receives 200,000 links from 100 domains, Google will consider that website less valuable than a website that receives 200,000 links from 100,000 domains.
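To make these ideas concrete, here is a toy sketch of link-based scoring. To be clear, Google’s real algorithm is secret; the weights, the domain-suffix rule, and the way diversity is rewarded below are all my own assumptions for illustration:

```python
# A toy, hypothetical link-scoring sketch -- NOT Google's actual algorithm.
# Each backlink is weighted by the (assumed) authority of its source domain,
# and the total is then scaled by domain diversity, so that 200,000 links
# from 100,000 domains beat 200,000 links from 100 domains.

AUTHORITY_WEIGHTS = {".edu": 3.0, ".gov": 3.0}  # assumed weights; default 1.0

def score_site(backlinks):
    """backlinks: list of referring domain names, one entry per link."""
    weighted = sum(
        AUTHORITY_WEIGHTS.get(domain[domain.rfind("."):], 1.0)
        for domain in backlinks
    )
    diversity = len(set(backlinks))  # number of distinct referring domains
    return weighted * diversity / max(len(backlinks), 1)
```

Under this sketch, a site whose links come from many distinct domains outscores one with the same number of links from a handful of domains, and a single link from a “.edu” domain is worth more than one from a personal site.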


While we know that this is the basic way that Google evaluates webpages (in order to determine how to return results in its search pages), the actual algorithm that Google uses remains a secret.

Further, Google used to make its ranking of webpages public, but it no longer does. There are, however, services that try to collect the same data that Google has and to produce rankings based on what is known about how Google’s algorithm works. While these rankings are not exactly the same as Google’s own, they can nonetheless get us close.

I’ve been using one such service, Majestic.com, and I entered the websites of the top 10 universities in the 2019 QS World University Ranking of the top 500 universities in Asia to see if there is any relation between how QS ranks universities and how a search engine like Google would rank their websites.

What I found is that the two are closely related, and that it would be easy to create an algorithm that would enable Google (or some other service) to calculate the exact same rankings as QS, based solely on existing data about university websites.


Majestic has two main metrics: trust flow and citation flow. Citation flow ranks the number of links pointing to a site. In calculating it, Majestic (like Google) discounts various types of links that it does not consider legitimate (this gets technical, and I’m going to skip the details here). So a website with 100,000 links to it might get a higher citation flow number than a site with 500,000 links, if Majestic considers many of the links to that second site not to be “real” links.

Trust flow, meanwhile, is a ranking of how reliable or authoritative a website is, based in part on how many links it gets from other authoritative websites, like .edu and .gov websites, news sites, Wikipedia, etc.

If we look at the top 10 universities in Asia, as ranked by QS, this is what their trust flow and citation flow numbers look like:

[Image: trust flow and citation flow scores for the top 10 universities]

And if we add those two numbers together, this is what we would get:

National University of Singapore (NUS) = 134

The University of Hong Kong = 132

Nanyang Technological University (NTU) = 123

Tsinghua University = 122

Peking University = 154

Fudan University = 136

The Hong Kong University of Science and Technology = 124

KAIST- Korea Advanced Institute of Science & Technology = 115

The Chinese University of Hong Kong (CUHK) = 117

Seoul National University = 119
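Summing the two scores and sorting can be sketched in a few lines, using the figures above (university names abbreviated for brevity):

```python
# Combined trust flow + citation flow scores from the list above.
combined = {
    "NUS": 134, "HKU": 132, "NTU": 123, "Tsinghua": 122,
    "Peking": 154, "Fudan": 136, "HKUST": 124, "KAIST": 115,
    "CUHK": 117, "SNU": 119,
}

# Sort university names by combined score, highest first.
ranked = sorted(combined, key=combined.get, reverse=True)
# Peking (154) and Fudan (136) come out above NUS (134), which is
# where this web-based ordering diverges from the QS ranking.
```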

This is pretty close to the QS ranking, but there are some problems. Peking and Fudan have higher trust and citation flow numbers than higher ranked universities, so what (if we were to use the web data here) would justify placing them lower on the list?

Similarly, Seoul National University has a higher trust flow number than the universities immediately above it on the list. So what data could we use to justify putting it below those universities?

Let’s look at some more numbers and see if we can come up with some ways to “deduct” points.

[Image: total links and referring domains for the top 10 universities]

We can start by looking at the ratio of the total number of links to the number of referring domains. Let’s imagine that we create an algorithm that deducts points from a university’s ranking if there is a high ratio between the number of links to its site and the number of domains referring those links.

Were we to do that, we would see the following:

National University of Singapore (NUS) = 93/1

The University of Hong Kong = 175/1

Nanyang Technological University (NTU) = 107/1

Tsinghua University = 295/1

Peking University = 489/1

Fudan University = 503/1

The Hong Kong University of Science and Technology = 113/1

KAIST- Korea Advanced Institute of Science & Technology = 221/1

The Chinese University of Hong Kong (CUHK) = 113/1

Seoul National University = 25,649/1

By looking at this ratio, we can see that Seoul National, Peking and Fudan all have high ratios. Let’s say that our algorithm was set up to deduct points from universities whose link-to-referring-domain ratio is higher than 300/1. Those three universities would all get pushed down in the rankings.
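Such a deduction rule could be sketched as follows. The 300/1 threshold comes from the discussion above, but the size of the penalty is an arbitrary choice for illustration:

```python
# Hypothetical penalty: deduct points when a site's links-per-referring-domain
# ratio exceeds 300:1. The penalty size (20 points) is an assumption.

RATIO_THRESHOLD = 300
PENALTY = 20

# Combined trust + citation flow scores and link-to-domain ratios from above.
scores = {"NUS": 134, "HKU": 132, "NTU": 123, "Tsinghua": 122,
          "Peking": 154, "Fudan": 136, "HKUST": 124, "KAIST": 115,
          "CUHK": 117, "SNU": 119}
link_ratio = {"NUS": 93, "HKU": 175, "NTU": 107, "Tsinghua": 295,
              "Peking": 489, "Fudan": 503, "HKUST": 113, "KAIST": 221,
              "CUHK": 113, "SNU": 25649}

adjusted = {u: s - PENALTY if link_ratio[u] > RATIO_THRESHOLD else s
            for u, s in scores.items()}
# Peking, Fudan, and Seoul National are pushed down; the order
# moves closer to the QS ranking.
```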

[Image: languages of the links referring to the top 10 universities]

Let’s now look at the languages of the referring links. The Singaporean universities are largely linked to in English; the Hong Kong universities and KAIST are linked to roughly half in English and half in Chinese or Korean, respectively; whereas Tsinghua, Peking, Fudan and Seoul National are linked to almost entirely in Chinese or Korean, respectively.

In our algorithm, we could treat English links as a sign of “international standing” and deduct points from universities that are heavily linked to in non-English languages.
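A language deduction of this kind could be sketched as below. The 50% threshold, the 10-point penalty, and the example English-share figure are all assumptions for illustration, not measured data:

```python
# Hypothetical deduction based on the share of referring links in English.
# Threshold and penalty values are arbitrary illustrative choices.

def language_penalty(score, english_share, threshold=0.5, penalty=10):
    """Deduct `penalty` points if fewer than `threshold` of a site's
    backlinks are in English; otherwise leave the score unchanged."""
    return score - penalty if english_share < threshold else score

# A university linked to almost entirely in a non-English language
# would lose points, while a heavily English-linked one would not.
```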

Everything that I have just said here is imaginary, but I think it should lead us to pause and think. What exactly is it that warrants the massive investment in time, energy, and human resources at universities across Asia and around the globe to facilitate the ranking process (or the ranking “industry”)? Existing website data suggests that all of this could be done just as well by running that data through a simple algorithm.

Then imagine all of the things that universities could do with the time and energy that this would free up. . . ?
