Measuring Impact: A Brief Guide to Citation Indices



In practice, most mathematicians choose the journals to which they submit their research papers based on an informal sense of "reputation" that exists in the mathematical community. Essentially, one talks to friends and colleagues to get a sense of how journals fall into an "excellent", "good", "fair", or "bad" stratification, and one asks others which of two given journals they consider the better.

Despite the prevalence of this informal notion of reputation, there have also been efforts to quantify the quality of scientific journals in more objective terms, most notably through metrics based on citations to articles in the journal. The first attempt at doing so was the Impact Factor, proposed by Eugene Garfield in 1955 and first computed and published by Garfield's Institute for Scientific Information in 1964. (The Institute for Scientific Information was purchased by Thomson in 1992; its citation products are now owned by Clarivate Analytics, discussed below.) For decades after its inception, the Impact Factor was the exclusive metric used to evaluate scientific and mathematical journals. Despite its predominance, the Impact Factor has drawn several criticisms within the scientific community, including assertions that it exaggerates a journal's quality as well as concerns about corporate influence. Even so, essentially no alternatives challenged the Impact Factor's monopoly until the past decade. Once challenged, however, the floodgates opened, and in the past 10 years there has been a proliferation of metrics proposed and published as alternatives to the Impact Factor. Like the Impact Factor, all of these metrics are based solely on citations to articles published by the journal. However, they use different algorithms to compute how citations contribute to a journal's assigned score, and they draw on different sources for the data fed into those algorithms.

In 2005 Jorge Hirsch proposed one of the first alternatives to the Impact Factor, now known as the h-index in his honor, in a paper published in the Proceedings of the National Academy of Sciences. In 2007 Carl Bergstrom and Jevin West, professors at the University of Washington, established the Eigenfactor Project to compute and publish the Article Influence Score, a new metric based on the PageRank algorithm and using data provided by Thomson Reuters' Web of Science. In 2008 the company Scimago, in partnership with Elsevier, introduced and began publishing a new metric called the Scimago Journal Rank, which is computed similarly to the Article Influence Score, but using data from Elsevier's Scopus database. In December 2016 Elsevier launched a new metric called CiteScore as an alternative to the Impact Factor. In addition to all of these, the American Mathematical Society now publishes its own metric, called the Mathematical Citation Quotient, using an algorithm similar to the Impact Factor and CiteScore, but with data coming from the MathSciNet database.

Navigating and interpreting these metrics can be an exercise in confusion and frustration. The following is meant as a brief introduction to these metrics, including the algorithms used, the databases that supply citation information, and the various companies and other organizations that are involved.


What is a Citation Index?

A bibliometric index is a number produced from an algorithm for the purposes of comparing journals, books, or other publications. Scientific and mathematical journals are often assessed using a particular kind of bibliometric index, known as a citation index, which is based on the citations to articles published by the journal. Higher numbers are typically better, and the implicit idea is that the more a work is cited, the better that work is. (Of course, we can all think of specific situations where this is not true, and we can also expect there will be ways these numbers can be "gamed" and artificially inflated.)

It is important to remember that a citation index provides a single number intended to summarize "how much" a journal's articles are cited. Citation indices are often used as a proxy for the "impact", "quality", "influence", or even "excellence" of a journal. However, it is open to debate how closely a citation index correlates with these qualities. One should also keep in mind the adage "garbage in, garbage out" whenever working with algorithms. Unfortunately, it is common for people to overemphasize the usefulness of citation indices. This is due, in no small part, to the fact that our human minds prefer to reduce the complicated comparison of multidimensional attributes to simply determining which of two numbers is larger.

The reality of measuring the "excellence" of a journal is much more complicated and subtle than assigning a single number via an algorithm. While citation indices may be a measure of journal excellence, it is probably best to consider them as providing insight into one particular aspect of that excellence -- specifically, how many authors are aware of the work in the articles published by the journal and are influenced enough to cite it. Since no one citation index will give a completely accurate measure of this influence, it is a good idea to look at multiple citation indices in order to gain different perspectives and obtain a more holistic picture of a journal's influence as measured by citations.[1]

Finally, one should keep in mind that the numerical value of a particular citation index can vary dramatically from one discipline to another. This can be due to the nature of research in the discipline itself or a consequence of the "culture of citation" that has evolved in that discipline. Because of this, it is usually meaningless to compare the citation index of a mathematics journal to the citation index of a journal in another discipline, such as biology, chemistry, or physics. In general, mathematics journals tend to have very low citation indices compared to other disciplines. For example, the best mathematics journals tend to have impact factors that are an order of magnitude smaller than the best biology journals. Likewise, pure mathematics journals tend to have lower citation indices than applied mathematics journals. Consequently, one needs to take care even when comparing mathematics journals that publish different kinds of mathematics.


Three Things to Keep in Mind when Using a Citation Index

When analyzing and interpreting a particular citation index, there are (at least) three considerations to keep in mind:

  1. What entity is computing and reporting the citation index?

     One should always identify and evaluate the motivations of those providing information. This is particularly relevant for citation indices, since many are currently produced by companies with inherent biases or conflicts of interest. It is, of course, better if citation indices are calculated and reported by impartial entities. It is also preferable to have data that can be accessed by third parties (even after paying a fee) so that reported calculations can be reproduced and verified. One should never forget that the overriding goal of a company is to make money. When financial interests come into play, the truth can be compromised. In the pursuit of profit, companies may be incentivized to promote their particular metrics as being the best, charge for access to the citation indices they produce, or restrict access to their data. It is also not out of the question that some companies may lie. Much like the rise of predatory journals in recent years, disreputable companies have emerged that produce falsified citation indices.


  2. What database is used for computing the citation index?

     Where does the citation data used by the algorithm come from? Who collected this data, and who decided what is (and what is not) included in it? How complete and accurate is this database? Who has the ability to access the database? Can the data be independently confirmed?


  3. What is the algorithm used to produce the citation index?

     How exactly is this citation index calculated from citation data? It is best if one can read and understand the algorithm directly, and the more transparent the algorithm, the better. A "black box" that mysteriously produces a number from data fed into it is largely meaningless and also vulnerable to manipulation. Moreover, as the principle of Occam's razor asserts: in the absence of other factors, a simple algorithm is preferred over a complicated one.



A Summary of Commonly Used Citation Indices


We shall compare popular citation indices for mathematics journals, addressing the three considerations above as we do so. The sections that follow examine, in turn, the entities that compute and report the indices, the databases that supply the citation data, and the indices themselves together with their algorithms.









(1) Entities that Compute and Report Citation Indices

Thomson Reuters
Thomson Reuters is a Canadian multinational mass media company and one of the world's largest information companies. Previously known as the Thomson Corporation, it purchased the Reuters Group in 2008 to form Thomson Reuters.


Clarivate Analytics
Clarivate Analytics is a company focused on data and analytics, with an emphasis on data related to scientific publications. Clarivate Analytics was formerly the Intellectual Property and Science division of Thomson Reuters, but in 2016 Thomson Reuters struck a $3.55 billion deal in which they sold this division and spun it off into an independent company named Clarivate Analytics. Clarivate Analytics maintains a close working relationship with Thomson Reuters.


Google
Google is a well-known American multinational technology company that specializes in internet-related services and products.


Elsevier
Elsevier is a Dutch information and analytics company. Established in 1880 as a publishing company, it is now part of the RELX Group, which was known as Reed Elsevier until 2015. Elsevier also owns and publishes many science and mathematics journals and maintains various journal abstract and citation databases. Elsevier has high operating profit margins (37% in 2018)[2], and its copyright practices have been criticized by researchers.


Scimago
Scimago (sometimes stylized as SCImago) is an analytics company based in Spain that works closely with Elsevier. There is little public information on Scimago. The Scimago website provides few details on the company, internet searches produce mostly third-party information with few specifics, and there is not even a Wikipedia article dedicated to Scimago.


The Eigenfactor Project
The Eigenfactor Project is an academic research project founded by Jevin West and Carl Bergstrom at the University of Washington. They developed and publish metrics known as the Eigenfactor Score and the Article Influence Score on their website eigenfactor.org. The Eigenfactor Project receives no funding or other financial compensation from Thomson Reuters or Clarivate Analytics, but they do exchange data with these companies. In particular, all raw data used by the Eigenfactor Project to compute their metrics comes from Thomson Reuters and Clarivate Analytics. In return, the Eigenfactor Project provides Thomson Reuters and Clarivate Analytics with the metrics they have computed and a no-fee license for their use. The Eigenfactor Project also receives funding and data from a number of institutions and organizations, including the NSF, the NIH, and JSTOR.


American Mathematical Society (AMS)
The American Mathematical Society (AMS) is a professional society for mathematicians. The AMS supports a searchable online bibliographic database called MathSciNet, which it launched in 1996.


(2) Databases Used for Citation Indices

Web of Science (WoS) and Journal Citation Reports (JCR)
Web of Science (WoS) is a collection of databases and a scientific citation indexing service maintained by Clarivate Analytics. For a fee, subscribers may access multiple databases and perform citation searches. Journal Citation Reports (JCR) is an annual publication by Clarivate Analytics that provides certain information about journal citations, including Impact Factors, which are computed by Clarivate Analytics. Journal Citation Reports has been integrated with Web of Science and can be accessed by subscribers from within Web of Science collections. Although Web of Science access is restricted to subscribers, most academics can view Web of Science through their university or college libraries, which often purchase an institutional subscription. Web of Science began as the Science Citation Index, created and owned by the academic publishing service Institute for Scientific Information (ISI), which was founded in 1960 by Eugene Garfield, the creator of the Impact Factor. When the Institute for Scientific Information was acquired by Thomson in 1992, the Science Citation Index was combined with other databases and renamed Web of Knowledge. The name Web of Knowledge was later changed to Web of Science. Web of Science was subsequently part of Thomson Reuters until 2016, when it was transferred to the corporate spin-off Clarivate Analytics.


Scopus
Scopus is Elsevier's abstract and citation database, launched in 2004. Many users find Scopus easier to navigate than Web of Science (an opinion supported by an independent 2006 study). However, one advantage of Web of Science over Scopus is the depth of coverage, with the full Web of Science database going back to 1945, while Scopus only goes back to 1966. CiteScores (Elsevier's alternative to the Impact Factor) are computed by Elsevier using an algorithm similar to that used for the Impact Factor, but with data from Scopus rather than Web of Science. CiteScore values of journals are published on Scopus, and access to Scopus is restricted to subscribers. Much like Web of Science, most academics have access to Scopus through their university or college libraries, which often purchase an institutional subscription. In addition, the company Scimago uses the Scopus database to compute two metrics: the Scimago Journal Rank (SJR) and an h-index. Both the SJR and Scimago's h-index are freely available on the Scimago website.


MathSciNet
MathSciNet is a searchable online bibliographic database created and supported by the American Mathematical Society and launched in 1996. Access is by subscription only, and is not generally available to individual researchers who are not affiliated with a larger subscribing institution. Unlike other abstracting databases, MathSciNet is carefully maintained and provides an easily searchable database of reviews, abstracts, and bibliographic information. MathSciNet also takes great care to identify authors properly, and its author search allows the user to find publications associated with a given author record even when multiple authors have exactly the same name or the same person publishes under name variants. Mathematical Reviews personnel will sometimes even contact authors to ensure that MathSciNet has correctly attributed their papers. MathSciNet is supported by the AMS and subscription fees.


Google Scholar
Google Scholar is a database and accompanying search engine that indexes the full text and metadata of scholarly literature on the world wide web. Google Scholar was released by Google in 2004. It has a more extensive depth of coverage than Web of Science or Scopus, and it indexes most peer-reviewed online academic journals and books, conference papers, theses and dissertations, preprints, abstracts, and technical reports. However, this depth of coverage has been obtained at the cost of certain inaccuracies, which have led to criticism. Google Scholar has been accused of counting the same citation multiple times when the source is listed in different formats (e.g., the citation coming from a published paper as well as the arXiv preprint of that same paper). There are also documented problems with incorrect field detection, due to symbols and characters in titles (a common occurrence in mathematics papers) or because of authors using name variants (which can cause authors to be assigned to the wrong papers). Google Scholar has also been criticized for not vetting journals and for including predatory journals in its index, both of which inflate citation numbers. Google Scholar's search engine is freely available at scholar.google.com.


(3) Citation Indices and Their Algorithms

Impact Factor and 5-year Impact Factor
The Impact Factor (formally, the Journal Impact Factor) of a journal for a given year is the number of citations made that year, across all indexed articles, to the articles that journal published in the preceding two years, divided by the number of articles the journal published in those two years. For example, the 2017 Impact Factor of a journal J is equal to:

IF_2017 = (# of citations in 2017 to articles published by J during 2015--2016) / (# of articles published by J during 2015--2016)

The number IF_2017 may be interpreted as the average number of citations each article published by journal J in 2015 and 2016 received in the year 2017. Of course, the value of the Impact Factor depends on how one defines "article" as well as how one defines "citation". Details on these definitions can be found in the Wikipedia article on Impact Factor.
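
To make the arithmetic concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the function name impact_factor, the dictionary layout, and the sample counts. Actual Impact Factors are computed by Clarivate Analytics from its own database, as described below.

    def impact_factor(citations_to_year, articles_in_year, year, window=2):
        """Average citations received in `year` by items the journal
        published in the preceding `window` years (hypothetical helper)."""
        prior_years = range(year - window, year)
        total_citations = sum(citations_to_year.get(y, 0) for y in prior_years)
        total_articles = sum(articles_in_year.get(y, 0) for y in prior_years)
        return total_citations / total_articles if total_articles else 0.0

    # Hypothetical journal J: 25 + 15 = 40 citations in 2017 to its 2015--2016
    # articles, which numbered 30 + 20 = 50, giving IF_2017 = 40 / 50 = 0.8.
    citations_in_2017 = {2015: 25, 2016: 15}   # citations made in 2017, by cited year
    articles_published = {2015: 30, 2016: 20}  # articles J published in each year
    print(impact_factor(citations_in_2017, articles_published, 2017))  # 0.8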

Impact Factors are reported in the Journal Citation Reports (JCR), an annual publication by the company Clarivate Analytics. One can independently verify the number of citable items for a given journal on the Web of Science. However, the number of citations is extracted not from the Web of Science database, but from a dedicated JCR database, which is not accessible to the general public. Consequently, the commonly used "JCR Impact Factor" is a proprietary value, which is defined and calculated by Clarivate Analytics and cannot be verified by external users. To emphasize their ownership, Clarivate Analytics even requires a particular form for references to their proprietary values:

If you plan to cite JCR, we require that references be phrased as “Journal Citation Reports” and “Journal Impact Factor,” (not just “Impact Factor”) and, the first time they are mentioned, they must also be acknowledged as being from Clarivate Analytics. For example, “Journal Citation Reports (Clarivate Analytics, 2018).” --source: Clarivate Analytics announcement.

Clarivate Analytics also publishes 5-Year Impact Factors. The 5-Year Impact Factor is similar to the Impact Factor, except that it is calculated over five years rather than two. Specifically, one divides the number of citations in the given year to articles published by journal J during the preceding five years by the number of articles journal J published in that five-year period. For example, the 2017 5-Year Impact Factor of journal J is equal to:

5YR-IF_2017 = (# of citations in 2017 to articles published by J during 2012--2016) / (# of articles published by J during 2012--2016)

The number 5YR-IF_2017 may be interpreted as the average number of citations that articles published by journal J in the 5-year period 2012--2016 received in the year 2017. One can also view the usual Impact Factor as a 2-Year Impact Factor.
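
Continuing the hypothetical sketch above, the 5-Year Impact Factor is the same calculation over a five-year window (the counts below are again made up):

    # Hypothetical counts for journal J: citations made in 2017 to its articles
    # from 2012--2016, and the number of articles it published in each of those years.
    citations_in_2017 = {2012: 10, 2013: 12, 2014: 18, 2015: 25, 2016: 15}
    articles_published = {2012: 28, 2013: 25, 2014: 27, 2015: 30, 2016: 20}

    # Same as impact_factor(citations_in_2017, articles_published, 2017, window=5)
    # from the earlier sketch: 80 citations / 130 articles, roughly 0.62.
    print(sum(citations_in_2017.values()) / sum(articles_published.values()))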



CiteScore

CiteScore is computed in the same way as the Impact Factor, except that the average is taken over the prior three years and the citation data comes from Scopus (Elsevier's abstract and citation database) rather than Clarivate Analytics' JCR database. Thus the formula for the 2017 CiteScore of journal J is

CS_2017 = (# of citations in 2017 to articles published by J during 2014--2016) / (# of articles published by J during 2014--2016)

where citation data is taken from Scopus.

The CiteScore metric was launched in December 2016 by Elsevier as an alternative to the predominant JCR Impact Factor. Like the JCR Impact Factor, CiteScore is a proprietary value, and although one can independently verify the number of citable items for a given journal on Scopus, the number of citations comes from a private database. Since Clarivate Analytics and Elsevier are both companies with the goal of making money, it is fair to view the Impact Factor and CiteScore as competing products.

We've already mentioned two differences between the Impact Factor and CiteScore: (1) the Impact Factor uses citations collected from the previous 2-year and 5-year periods, whereas CiteScore uses citations from the previous 3-year period; (2) the Impact Factor uses Clarivate Analytics' JCR database, while CiteScore uses Elsevier's Scopus database. There is another important difference: the definition of the "number of publications", or "citable items". While JCR excludes certain items it considers to be minor because they make very few citations to other articles (e.g., editorials, notes, corrigenda, retractions, discussions), CiteScore counts all items without exception. As a result, CiteScore values are typically lower than Impact Factors, because the denominator is larger. In fact, it is not unusual for a CiteScore value to be less than half of an Impact Factor, owing to the many additional items counted as articles.
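
A quick hypothetical calculation shows how the denominator choice alone changes the result, independent of the window and database differences:

    # Hypothetical journal: 90 citations in the counting window, 60 research
    # articles, plus 40 editorials, corrigenda, and similar minor items.
    citations = 90
    research_articles = 60
    minor_items = 40

    jcr_style_value = citations / research_articles                        # 1.5, minor items excluded
    citescore_style_value = citations / (research_articles + minor_items)  # 0.9, all items counted
    print(jcr_style_value, citescore_style_value)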

CiteScore values of journals are calculated by Elsevier and published on Scopus. They are available to all Scopus subscribers.



Mathematical Citation Quotient (MCQ)

The Mathematical Citation Quotient (MCQ) is MathSciNet's alternative to the Impact Factor. The MCQ of a journal is computed over a 5-year period using the same formula as for the 5-Year Impact Factor, but with MathSciNet data in place of Clarivate Analytics' JCR database. MathSciNet is supported by the American Mathematical Society, and the citation data used to compute the MCQ is transparent and visible to anyone with access to MathSciNet. Searching for a journal on MathSciNet will display a page containing a profile for the journal that includes the MCQ with links to all the relevant citation data in the MathSciNet database.



Eigenfactor Score and Article Influence Score

The Eigenfactor Score is computed using a variant of PageRank, the same algorithm used by Google to rank websites for searches. The basic idea is to account for not only the quantity, but also the quality, of citations. In particular, a citation from an article in a journal with a high score contributes more than a citation from an article in a journal with a lower score. This algorithm can be made precise, and the theoretical background can be understood by anyone with undergraduate-level knowledge of linear algebra. (For a readable account, see Chapter 4 of "Google's PageRank and Beyond" by Amy N. Langville and Carl D. Meyer, or visit the Wikipedia article on PageRank.) Moreover, computing the score for journals amounts to calculating an eigenvector of a connectivity matrix determined by the network of citations; hence the name Eigenfactor Score.

Eigenfactor Scores are considered more robust than Impact Factors: a journal that receives only a few citations, but from articles in highly influential journals, may have a low Impact Factor while still earning a respectable Eigenfactor Score. Unlike a website, however, where simply adding more pages doesn't usually increase the number of incoming links, a larger journal will have a higher Eigenfactor Score simply because it publishes more articles. Eigenfactor Scores grow roughly linearly with the size of a journal (e.g., doubling the number of articles a journal publishes will roughly double its Eigenfactor Score). The Article Influence Score is meant to account for this: it is defined as the Eigenfactor Score divided by the number of articles published by the journal, multiplied by a fixed scaling factor. The Article Influence Score thus reflects the influence of the average paper in the journal. Eigenfactor Scores and Article Influence Scores are computed using citation data from the previous five years.
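
To illustrate the idea, here is a minimal Python/NumPy sketch of a PageRank-style calculation on a hypothetical three-journal citation network. It is not the actual Eigenfactor algorithm, which involves additional steps such as damping and the exclusion of journal self-citations, but it shows how a score emerges as an eigenvector of a citation matrix and how dividing by article counts yields a per-article influence in the spirit of the Article Influence Score.

    import numpy as np

    # Hypothetical 3-journal citation network: C[i, j] is the number of citations
    # from articles in journal j to articles in journal i.
    C = np.array([[0., 8., 2.],
                  [5., 0., 6.],
                  [1., 3., 0.]])
    articles = np.array([50., 80., 30.])   # articles each journal published

    # Column-normalize so each citing journal distributes one unit of influence.
    M = C / C.sum(axis=0)

    # Power iteration: the score vector converges to the leading eigenvector of M,
    # so journals cited by high-scoring journals end up with high scores.
    score = np.full(3, 1.0 / 3.0)
    for _ in range(100):
        score = M @ score
        score /= score.sum()

    # Dividing by the number of articles gives a per-article influence, in the
    # spirit of the Article Influence Score (fixed scaling factor omitted).
    per_article = score / articles
    print(score, per_article)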

The Eigenfactor Project computes and publishes Eigenfactor Scores and Article Influence Scores on the website eigenfactor.org, where they are freely available. The algorithm they use is described in full detail, with pseudocode and source code also available. The data that the Eigenfactor Project uses comes from Thomson Reuters' Web of Science and the JCR database owned by Clarivate Analytics.



Scimago Journal Rank (SJR)

The Scimago Journal Rank (SJR) is a metric similar to the Article Influence Score. It is computed by Scimago Lab, a for-profit company that works jointly with Elsevier. The SJR algorithm uses PageRank to determine an "average prestige per article" score, similar to how the Eigenfactor Project computes the Article Influence Score, and Scimago makes its algorithm publicly known. However, whereas the Article Influence Score uses the past five years of citations from the JCR database owned by Clarivate Analytics, Scimago instead uses the past three years of citations from Elsevier's Scopus database to compute the SJR. Scimago makes the SJR values freely available on the Scimago website.



h-index

The h-index is a count of the number of articles exceeding a certain citation threshold. Specifically, the h-index of a journal is defined to be the maximum value N for which the journal has published N papers that have each been cited at least N times. So, for example, suppose we want the h-index of journal J over a 3-year window. We first ask whether there is at least 1 publication in journal J during this window with at least 1 citation. If the answer is "yes", we then ask whether there are at least 2 publications in journal J during this window that each have at least 2 citations. Continuing, the last integer N for which we can answer "yes" to the question "Are there N articles in the journal, each with at least N citations?" is the h-index of journal J for this 3-year window. The h-index is also called the Hirsch index or Hirsch number, after its creator Jorge Hirsch, and it is sometimes capitalized as H-index. Hirsch first proposed the index in a 2005 paper published in the Proceedings of the National Academy of Sciences.
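
A minimal sketch of this counting procedure in Python, with hypothetical citation counts:

    def h_index(citation_counts):
        """Largest N such that at least N papers have at least N citations each."""
        counts = sorted(citation_counts, reverse=True)
        h = 0
        for n, c in enumerate(counts, start=1):
            if c >= n:
                h = n
            else:
                break
        return h

    # Hypothetical journal with seven papers in the chosen window:
    print(h_index([12, 9, 5, 4, 4, 1, 0]))  # 4, since four papers have at least 4 citations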

Many different entities calculate h-indices, and each entity makes its own choice for the window of time used as well as the database from which citation data is drawn. Typical values used for the window of time are 2 years, 3 years, or 5 years. Common choices for the database are Web of Science, Scopus, Google Scholar, and MathSciNet. The value of the h-index will, of course, depend on both the window chosen and which database is used. So one should use care when comparing h-indices computed by different sources.

Both Web of Science and Scopus allow users to search for a journal's h-index and set the desired publication window as part of the search. Scimago publishes a 3-year h-index calculated from Scopus data that is freely available on their website. Google Scholar publishes what it calls an h5-index, which is the h-index using a 5-year window with data drawn from the Google Scholar database.











FOOTNOTES



[1] Of course, the reality for a struggling academic is that when looking at citation indices, you may be doing so to maximize the perception of excellence, rather than attempting to measure excellence itself. For instance, if you want to impress a hiring, tenure, or promotion committee that places emphasis on certain metrics, it makes sense to maximize those metrics when choosing the journals you submit your work to --- particularly if doing so doesn't compromise any aspects of your research or its dissemination.



[2] See p.88 of the 2018 RELX Group Annual Report.