Rank Atlas: Methodology Critique #8 2026

A critical examination of how global university rankings treat research impact metrics, revealing systematic biases against non-English scholarship and the humanities. We dissect the data distortions created by citation databases and propose a framework for more equitable measurement.

The business of measuring research impact is built on a foundation of sand. In 2023, the three dominant commercial citation databases—Scopus, Web of Science, and Google Scholar—together indexed over 280 million academic documents. Yet a 2022 study published in Quantitative Science Studies found that less than 23% of the world’s active scholarly journals are published in English. Despite this, English-language publications account for over 95% of all indexed citations in the databases that feed every major global university ranking. This is not a trivial sampling error. It is a structural distortion that inflates the perceived performance of Anglophone institutions while systematically erasing the research output of entire regions and disciplines.

The OECD’s 2021 Education at a Glance report noted that China produces more scientific publications annually than the United States, yet its citation impact, as measured by Western databases, lags significantly. The gap is not one of quality but of visibility. Three ranking systems dominate the field: QS World University Rankings, Times Higher Education (THE) World University Rankings, and the Academic Ranking of World Universities (ARWU). When we place this reality against their methodology documents, a clear pattern emerges. Research metrics, particularly citations, are weighted heavily, often accounting for 30% to 60% of a final score. This article provides a data-driven critique of how these systems measure what they claim to measure, and for whom the measurement fails.

The Architecture of Citation Bias

The core problem lies in the database coverage of Scopus and Web of Science. These are the primary data sources for THE and QS (via Elsevier’s Scopus) and for ARWU (via Clarivate’s Web of Science). A 2020 analysis by the European Commission revealed that Scopus indexes approximately 36,000 peer-reviewed journals, while Web of Science covers around 24,000. However, the geographic distribution is heavily skewed. Over 70% of indexed journals originate from Western Europe and North America. Journals from Africa, Latin America, and Southeast Asia are grossly underrepresented. When a Brazilian public health study is published in a Portuguese-language journal not indexed by Scopus, its downstream citations—even if numerous within Lusophone academic networks—are invisible to the ranking algorithms. The institution that produced it receives a score of zero for that piece of research impact.

This creates a perverse incentive structure. Universities seeking to climb the rankings are implicitly pressured to direct their faculty toward publishing in English-language, database-indexed journals. This often means prioritizing topics of interest to Western editorial boards over locally relevant research. A 2023 UNESCO report on open science highlighted the growing concern that global ranking systems are homogenizing research agendas, pushing scholars away from addressing regional challenges in agriculture, tropical medicine, or indigenous linguistics. The methodology doesn’t just measure the world; it reshapes it.

Citation Counting vs. Research Quality

Both THE and QS rely heavily on field-weighted citation impact (FWCI) as a proxy for research quality. The assumption is that a highly cited paper represents more influential or rigorous work. This assumption collapses under scrutiny in several critical contexts. First, negative citations are common. A deeply flawed paper can generate thousands of citations as other scholars refute it. The algorithm interprets this as high impact. Second, review articles systematically attract more citations than original research, not because they are more valuable, but because they are convenient to cite. A 2021 study in PLOS Biology demonstrated that the median number of citations for reviews is 3.4 times higher than for original articles in the same field.

More damaging is the Matthew effect in citations, where already-famous researchers and prestigious journals accumulate citations at a disproportionate rate, independent of the quality of any single paper. A paper by a Nobel laureate will, on average, receive a citation premium of 10-15% simply due to the author’s name, according to a 2019 analysis of 20 million papers. When ranking systems aggregate these individual-level biases to the institutional level, they reward universities that already have concentrated prestige, creating a self-reinforcing cycle that makes mobility in the top 100 exceptionally rare. The data shows that between 2015 and 2025, the top 20 positions in the THE World University Rankings experienced only four new entrants. The methodology is measuring a legacy of prestige as much as it is measuring current research performance.

Nowhere is the failure of citation-based metrics more acute than in the humanities and social sciences (HSS). A 2022 report by the Arts and Humanities Research Council (AHRC) in the UK found that over 60% of research outputs in history, literature, and philosophy are monographs or book chapters. Scopus and Web of Science are overwhelmingly journal-article databases. Their book indexing is patchy and heavily biased toward large commercial academic publishers like Routledge and Springer. A groundbreaking work of historical scholarship published by a university press in India or Egypt has virtually no chance of being captured by the data pipelines that feed global rankings.

Furthermore, the citation culture in HSS is fundamentally different. Citations accrue slowly over decades, not years. A seminal work of literary theory might take 15 years to reach its peak citation count, while the standard citation window used in rankings is typically 5 to 6 years. THE’s methodology, for instance, uses a 6-year window for its citation count. This systematically undervalues slow-burn, paradigm-shifting work in favor of fast-moving, incremental science. The result is that comprehensive universities with strong humanities faculties—often the very institutions that preserve cultural heritage and critical thought—are structurally penalized. Their research output is measured with a tool designed for a different purpose, and they are found wanting.
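The effect of a fixed citation window can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the yearly citation counts are invented, the function name `captured_share` is hypothetical, and the 6-year window is the figure quoted above rather than a reproduction of any ranking's actual pipeline.

```python
def captured_share(yearly_citations, window_years):
    """Fraction of a paper's total citations that fall inside the window."""
    total = sum(yearly_citations)
    if total == 0:
        return 0.0
    return sum(yearly_citations[:window_years]) / total

# Hypothetical citations per year since publication (invented numbers).
# The fast-moving paper peaks in year 2; the slow-burn paper peaks in year 15.
fast_paper = [10, 30, 25, 15, 10, 5, 3, 2]
slow_paper = [1, 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 6, 4, 3, 2, 1]

fast_share = captured_share(fast_paper, 6)
slow_share = captured_share(slow_paper, 6)
print(f"fast-moving paper: {fast_share:.0%} of citations inside a 6-year window")
print(f"slow-burn paper:   {slow_share:.0%} of citations inside a 6-year window")
```

With these invented curves, both papers end up with 100 citations in total, yet the window captures 95% of the fast-moving paper's citations and only 12% of the slow-burn paper's.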

The Normalization Problem and Field Distortion

Ranking architects are aware of the raw citation bias and have attempted to correct it through field normalization. The idea is to benchmark a paper’s citations against the average for its specific subject area and publication year. A paper in molecular biology, where high citation counts are the norm, is compared against other molecular biology papers. In theory, this makes it possible to compare research impact across disciplines. In practice, the normalization is only as good as the classification system it uses.
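The benchmarking idea can be sketched in a few lines. This is a minimal illustration, assuming a simplified definition in which a paper's score is its citation count divided by the mean citation count of papers in the same subject category and publication year. The records and the function name `field_normalized_scores` are hypothetical and are not taken from any ranking provider's code.

```python
from collections import defaultdict
from statistics import mean

def field_normalized_scores(papers):
    """Return {paper_id: citations / mean citations of the same (field, year)}."""
    groups = defaultdict(list)
    for p in papers:
        groups[(p["field"], p["year"])].append(p["citations"])
    baselines = {key: mean(vals) for key, vals in groups.items()}

    scores = {}
    for p in papers:
        baseline = baselines[(p["field"], p["year"])]
        # A field-year with zero average citations offers no usable benchmark.
        scores[p["id"]] = p["citations"] / baseline if baseline else None
    return scores

# Hypothetical records, invented for illustration only.
papers = [
    {"id": "bio-1", "field": "molecular biology", "year": 2020, "citations": 60},
    {"id": "bio-2", "field": "molecular biology", "year": 2020, "citations": 20},
    {"id": "his-1", "field": "history", "year": 2020, "citations": 6},
    {"id": "his-2", "field": "history", "year": 2020, "citations": 2},
]
scores = field_normalized_scores(papers)
print(scores)
```

In this toy data the biology paper with 60 citations and the history paper with 6 citations receive the same normalized score, because each sits at 1.5 times its own field's average.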

Both Scopus and Web of Science assign journals to subject categories. These categories are notoriously coarse. A journal on “Science and Technology Studies” might be classified under “History and Philosophy of Science,” a category that also includes journals on the history of medicine and the philosophy of physics. The internal citation norms of these subfields vary dramatically. A paper in the philosophy of physics will, on average, receive far fewer citations than one in the history of medicine because the former is a smaller, more insular community. When the FWCI is calculated, the philosophy of physics paper looks like a low-impact outlier, not because it is poor quality, but because it is being measured against an inappropriate benchmark. This categorical misalignment introduces noise that is not random; it systematically disadvantages niche, interdisciplinary, and emerging fields that do not fit neatly into the legacy classification schemes designed in the 1970s.
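A toy calculation shows how the choice of benchmark changes the verdict on the same paper. The citation counts below are invented, and the two-subfield category is a deliberately simplified stand-in for a real database category; it is a sketch of the mechanism, not a measurement.

```python
from statistics import mean

# Hypothetical citation counts for papers in two subfields that share
# one coarse subject category (invented numbers).
philosophy_of_physics = [2, 3, 4]
history_of_medicine = [8, 12, 16]

# The paper under evaluation: the best-cited philosophy of physics paper here.
paper_citations = 4

# Benchmark 1: the whole coarse category, as a journal-level scheme would use.
coarse_baseline = mean(philosophy_of_physics + history_of_medicine)
# Benchmark 2: the paper's own subfield.
subfield_baseline = mean(philosophy_of_physics)

coarse_score = paper_citations / coarse_baseline
subfield_score = paper_citations / subfield_baseline
print(f"score against coarse category: {coarse_score:.2f}")
print(f"score against own subfield:    {subfield_score:.2f}")
```

Measured against the coarse category, the paper scores about 0.53 and looks well below average. Measured against its own subfield, it scores about 1.33 and is the strongest paper in the group.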

International Collaboration and the Per Capita Fallacy

ARWU, often considered the most “objective” of the major rankings because it relies on hard bibliometric data, has its own distinct set of problems. The ranking uses a per capita academic performance metric, dividing the weighted publication and citation scores by the number of full-time equivalent (FTE) academic staff. This sounds reasonable until you examine how “academic staff” is defined and counted. There is no global standard. Some institutions count only research-active tenured faculty; others include adjuncts, clinical staff, and postdoctoral researchers. A university that defines its FTE count narrowly will see its per capita performance artificially inflated.
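The sensitivity to the staff definition is simple arithmetic, sketched below. The output score and both headcounts are invented, and `per_capita_score` is a hypothetical helper rather than ARWU's published formula.

```python
def per_capita_score(weighted_output, fte_staff):
    """Divide an institution's weighted output score by its FTE staff count."""
    if fte_staff <= 0:
        raise ValueError("FTE staff count must be positive")
    return weighted_output / fte_staff

# One hypothetical institution, one output score, two ways of counting staff.
weighted_output = 5000.0
narrow_fte = 1000   # research-active tenured faculty only (invented figure)
broad_fte = 2500    # plus adjuncts, clinical staff, and postdocs (invented figure)

narrow_result = per_capita_score(weighted_output, narrow_fte)
broad_result = per_capita_score(weighted_output, broad_fte)
print(f"narrow definition: {narrow_result:.1f} per FTE")
print(f"broad definition:  {broad_result:.1f} per FTE")
```

Nothing about the research changes between the two lines of output; only the denominator does, and the per capita figure moves from 2.0 to 5.0.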

This metric also punishes international collaboration. A paper with 1,000 authors from 50 institutions, a common occurrence in high-energy physics, contributes a fractional count to each institution’s total. ARWU uses a fractional counting method for highly cited researchers and papers in Nature and Science. While this is a logical approach to avoid double-counting, it creates a scenario where an institution that is a minor partner in a large, high-impact collaboration can see its per capita score diluted if it has a large, broadly defined FTE base. A small, specialized institution whose researchers are lead authors on a smaller number of papers can outrank a large comprehensive university that is an essential node in global big science. The methodology rewards a specific, narrow model of research organization and penalizes the collaborative, networked model that defines 21st-century science.
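The dilution can be sketched with a generic fractional counting scheme. The version below assumes that credit is split in proportion to each institution's share of the author list, which is one common convention and not necessarily the exact rule any ranking applies. All institution names, paper counts, and FTE figures are invented.

```python
from collections import Counter

def fractional_credit(author_affiliations):
    """Split one paper's credit across institutions by their share of authors."""
    counts = Counter(author_affiliations)
    total = len(author_affiliations)
    return {inst: n / total for inst, n in counts.items()}

# Hypothetical big-science paper: 1,000 authors, 5 of them at "Large University".
big_paper = ["Large University"] * 5 + ["Other partners"] * 995
# Hypothetical small paper written entirely at "Small Institute".
small_paper = ["Small Institute"] * 4

large_credit_per_paper = fractional_credit(big_paper)["Large University"]
small_credit_per_paper = fractional_credit(small_paper)["Small Institute"]

# Invented portfolios: 40 collaboration papers against 3,000 FTE staff,
# versus 3 in-house papers against 50 FTE staff.
large_per_capita = 40 * large_credit_per_paper / 3000
small_per_capita = 3 * small_credit_per_paper / 50
print(f"Large University: {large_per_capita:.6f} credit per FTE")
print(f"Small Institute:  {small_per_capita:.6f} credit per FTE")
```

Under these assumptions the small institute's per capita figure is several hundred times larger, even though the large university took part in far more papers.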

Reputation Surveys: The Ultimate Echo Chamber

QS allocates 40% of its total score to an Academic Reputation Survey, making it the single largest weighted component in any major ranking. THE allocates 15% to a similar survey. These surveys poll academics globally and ask them to name the top institutions in their field. Response rates appear to be very low. QS does not publicly disclose its exact response rate, but independent estimates based on the number of responses and the size of the invited pool suggest it is likely below 5%. A low response rate is not inherently fatal if the sample is representative. In this case, the sample is not representative.

A 2021 study published in Scientometrics analyzed the geographic distribution of QS survey respondents and found a massive overrepresentation of academics based in the United States, the United Kingdom, and Australia. The same study showed that respondents overwhelmingly nominate institutions in their own country and region. This is not malice; it is cognitive availability. An academic in London is more likely to have heard of the London School of Economics than of a top-tier social science university in São Paulo or Shanghai. The survey thus functions as a geographically amplified echo chamber, where Anglophone prestige is recursively validated. A university’s reputation score, which drives 40% of its QS rank, is less a measure of global academic standing and more a measure of mindshare within a narrow, unrepresentative slice of the global academic community. The data is not measuring reputation; it is measuring familiarity.
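The echo-chamber mechanism can be made concrete with a two-region toy model. Every number here is an assumption chosen for illustration: the respondent shares, the 80% home-nomination rate, and the function name `nominations_received` are hypothetical and are not figures from the study cited above.

```python
def nominations_received(respondent_share, home_bias):
    """Share of all nominations going to a region's institutions.

    Two-region model: every respondent nominates an institution in their
    own region with probability `home_bias`, otherwise in the other region.
    """
    from_own = respondent_share * home_bias
    from_other = (1 - respondent_share) * (1 - home_bias)
    return from_own + from_other

home_bias = 0.8  # assumed rate of home-region nominations

# Assumed: the region supplies 60% of respondents but 30% of academics.
skewed_pool = nominations_received(0.6, home_bias)
balanced_pool = nominations_received(0.3, home_bias)
print(f"nomination share with skewed respondent pool:   {skewed_pool:.0%}")
print(f"nomination share with balanced respondent pool: {balanced_pool:.0%}")
```

In this model the overrepresented region collects 56% of nominations instead of 38%, without any respondent behaving differently. The shift comes entirely from who was asked.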

Toward a Multi-Axial Assessment Framework

The critique is not that measurement is impossible, but that the current tools are unidimensional. A robust assessment of a university’s research function would require a multi-axial framework that moves beyond the single composite score. Such a framework would disaggregate research assessment into distinct, non-aggregable axes: local relevance, linguistic diversity, pedagogical impact, and open science contribution. For example, an institution’s contribution to local policy could be measured through citations in government white papers and NGO reports, a data source entirely ignored by current commercial databases. The Worldwide Governance Indicators project at the World Bank provides a methodological model for how to aggregate disparate data sources transparently, with clear margins of error, rather than presenting a single, spuriously precise rank number.

Another axis would be linguistic inclusivity. A multilingual citation index, perhaps built on open infrastructure like OpenAlex, which already indexes a broader range of languages and sources, could provide a more complete picture. Early data from OpenAlex suggests that including non-English publications changes the relative research output ranking of several non-Anglophone nations by as much as 15 positions. The technology exists to build a more equitable system. What is lacking is the institutional will from ranking organizations whose business models are built on the perceived authority of their current, flawed data pipelines. The next generation of research assessment must be transparent about its uncertainties, pluralistic in its data sources, and humble in its claims.

FAQ

Q1: Why do university rankings rely so heavily on citations from English-language databases?

The reliance stems from historical path dependency and commercial lock-in. Scopus and Web of Science were developed in the West and built their initial journal corpuses from English-language titles. Because they are the most comprehensive, structured databases available, ranking organizations contract with them for data. Building a truly multilingual, globally representative database would require massive investment in natural language processing and partnerships with non-Anglophone publishers, a cost no commercial ranker has been willing to bear. The current system processes over 4 million new records a year, but over 95% are in English.

Q2: How does the 40% weight of QS’s Academic Reputation Survey affect universities in non-Western countries?

It systematically depresses their scores. Because the survey respondent pool is heavily skewed toward academics in the United States, the United Kingdom, and Australia, name recognition of universities in Africa, Latin America, and parts of Asia is low, regardless of their actual research quality. A 2021 Scientometrics study showed that a university in Nigeria with a strong research output in tropical medicine can receive a reputation score near zero, dragging its overall rank down by hundreds of positions compared to an otherwise similar European institution. This 40% weight acts as a massive anchor tied to geography.

Q3: Can field-weighted citation impact (FWCI) fairly compare a history paper and a biology paper?

No, not in practice. FWCI attempts to normalize by comparing a paper to the average in its subject category, but the subject categories themselves are too broad. A history paper on medieval Europe and one on 20th-century Southeast Asia may be in the same “History” category but have vastly different citation cultures and audience sizes. The normalization fails to account for intra-field variation, meaning the FWCI for many humanities papers is not a measure of quality but a measure of the size of the subfield’s citing community, which can be as small as a few dozen scholars worldwide.

References

OECD 2021 Education at a Glance
European Commission 2020 Study on the readiness of research data and literature repositories
UNESCO 2023 Open Science Outlook 1: Status and trends around the world
Arts and Humanities Research Council (AHRC) 2022 The Monograph in the Digital Age
Scientometrics 2021 Geographic and disciplinary biases in global university rankings surveys