[Goanet] {Dilip's essays} What citation numbers can be tweaked to suggest

Dilip D'Souza Sun, 18 Dec 2022 11:48:44 -0800

Dec 9

Baba Ramdev is one of India's many "godmen", but one who has risen to power
and prominence in the last 10-12 years. He is politically close to the
party and people in power in India now, and who knows, maybe that's helped
him rise.

One of his endeavours is Patanjali, which produces all manner of ayurvedic
drugs, but also toothpastes and shampoos and so forth. Not long ago, our
newspapers carried a large Patanjali ad which made a startling claim ...
well, take a look at the attached image.

So I went digging to try to confirm this claim. What I found left me
somewhat confused, and perhaps it will confuse you too. But one thing is
not in doubt: the "top 2%" is wrong. More like "bottom 5%".

Take a look: What the number of citations can be made to suggest by
tweaking them,
https://www.livemint.com/opinion/columns/what-the-number-of-citations-can-be-made-to-suggest-by-tweaking-them-11669922167402.html

Let me know if you know something about citation indices and this
particular use of them. Let me know any thoughts about Patanjali.

cheers,
dilip

----

What citation numbers can be tweaked to suggest

Heard of the Erdös number? It's named for the great Hungarian mathematician
Paul Erdös. As I wrote in this space some years ago: "He worked with other
mathematicians ... and that's where the idea for the Erdös number came
from. If you collaborated with him on a paper, your Erdös number is 1.
(About 500 such mathematicians). If you collaborated with someone who had
collaborated with him, it's 2. (Over 9000). And so on. Your number measures
what you might call your 'collaborative distance' from the man. (Erdös
himself? 0, of course.)"
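
For the programmers among you, "collaborative distance" is just the shortest path in a graph of co-authorships, and a breadth-first search finds it. What follows is only a minimal sketch: the names and the little collaboration graph are invented for illustration, not taken from any real publication record.

```python
from collections import deque


def collaboration_distances(coauthors, source):
    """Breadth-first search: shortest co-authorship distance from source."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        person = queue.popleft()
        for partner in coauthors.get(person, ()):
            if partner not in distances:
                # First visit in BFS order is along a shortest path.
                distances[partner] = distances[person] + 1
                queue.append(partner)
    return distances


# Hypothetical graph: each key lists the people that author has written with.
coauthors = {
    "Erdös": ["Author A", "Author B"],
    "Author A": ["Erdös", "Author C"],
    "Author B": ["Erdös"],
    "Author C": ["Author A", "Author D"],
    "Author D": ["Author C"],
    "Author E": [],  # never collaborated with anyone in this graph
}

numbers = collaboration_distances(coauthors, "Erdös")
for name, number in numbers.items():
    print(name, number)
```

Anyone missing from the result, like the made-up "Author E" here, has no chain of collaborations leading back to Erdös at all.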

The number is really like a tongue-in-cheek tribute to the man, not a
serious measure of achievement. And yet it touches on a serious question:
is there a way to measure how good, or effective, a scientist is? A low
Erdös number means you have worked with some serious mathematicians, so
that does indicate that you have some worth as a mathematician yourself.
But still, it really is just in the nature of a tribute to a man who
touched so many.

There is, though, a more serious measure that's often used: how many times
a paper you have authored has been cited in other papers by other
scientists. You can probably tell that this "citation index" carries some
weight. For if another scientist cites something you have researched and
written up, it means that scientist found your work relevant and useful in
his work. And if several scientists cite your paper, it means you produced
something of some relatively wide relevance. If your paper continues to be
cited long after it is published, maybe even after you're dead and gone,
that speaks of the lasting impact of your findings.

Of course, this idea of an index can be tweaked. For example, what's the
calibre of the citations of your paper? Should a reference in Mint, for
example, count the same as a reference in Nature, or Scientific American?
Does a Mint reference mean that a wider audience than just academia is
reading your paper? If so, surely that suggests a broader understanding and
appeal? Besides, how good are the references you yourself cite - meaning,
how many of the important results in your field are you aware of while you
do your research?

Considerations like these go into the calculation of various metrics -
called h-index, g-index and more. The h-index, for example, is calculated
thus: of all a scientist's published papers, if some number "h" of them
each have h or more citations, and the rest have h or fewer citations, then
her h-index is h. So let's say researcher Sharvari has published 30 papers.
7 of them have been cited at least 7 times each; the other 23 are each
cited 7 or fewer times. Sharvari then has an h-index of 7.
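
If you like to see such things worked out, here is a short sketch of that calculation. The citation counts are made up to match the description of Sharvari's 30 papers; only the rule itself comes from the definition above.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break  # every later paper has even fewer citations
    return h


# Hypothetical counts: 7 papers cited at least 7 times each,
# and 23 others each cited 7 or fewer times.
sharvari = [25, 18, 12, 10, 9, 8, 7] + [7, 5, 4] + [2] * 10 + [0] * 10

print(len(sharvari))      # 30
print(h_index(sharvari))  # 7
```

Note that an eighth paper with exactly 7 citations does not lift the index to 8: for that, eight papers would each need at least 8 citations.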

To give you a quick idea, Erdös has an h-index of 76, Albert Einstein 92.
Einstein's E=mc^2 paper has been cited nearly 500 times; his special
relativity paper over 2000 times.

Like all statistical measures, these are used and interpreted in different
ways. Certainly Einstein should figure at or near the top of any ranking of
scientists. But considering the sheer number of scientific journals that
there are, some no doubt of dubious merit, it should hardly be a surprise
to find unknown or unexpected names on such rankings. These might be solid
scientists who are just not widely known; they might also be not-so-solid
ones who find ways to inflate their citation indices.

And in fact, one recent paper addresses exactly the use and misuse of such
citation metrics, apparently seeking a way to rationalize them ("September
2022 data-update for 'Updated science-wide author databases of standardized
citation indicators'", John PA Ioannidis, Elsevier, 10 October 2022,
https://elsevier.digitalcommonsdata.com/datasets/btchxktzyw/4). It looks at
citation data for about 200,000 scientists around the world and attempts to
rank them according to a composite - called the c-score - of various
citation indices like the h-index. As the paper explains, "The c-score
focuses on impact (citations) rather than productivity (number of
publications)."

It's not clear to me how many scientists, whether on this list or not, pay
attention to a list like this. I suspect the more serious ones simply go
about their work, uninterested in such rankings. But be that as it may, how
did Ioannidis choose those approximately 200,000 scientists?

Let me quote his paper: "The selection is based on the top 100,000
scientists by c-score ... or a percentile rank of 2% or above ... 200,409
scientists are included in the single recent year dataset."

Never mind "single recent year". The language seems calculated to obfuscate
this process: in particular, how does 100,000 become 200,409?

Perhaps it's the percentile rank of 2%? That may be, because that number
takes in those who are above the bottom 2% in the rankings (what "2
percentile" means); in other words, the 200,000 make up the top 98% of
scientists in the overall list. To me, this seems the most reasonable
interpretation of this paper.

This is all leading somewhere, I promise. I was prompted to do this digging
by a recent large newspaper ad for Patanjali products featuring this claim:
"Pujya Acharya Balkrishna ji and the Patanjali (PRI) among top 2%
scientists in the world." Patanjali's website sent me to the Ioannidis paper
above, and from there to two enormous Excel files which contain data about
all these scientists.

The paper's only mention of 2% is "percentile rank of 2% or above". So "top
2% scientists" in the ad should actually be "top 98% scientists". Or,
equivalently, "scientists ranked higher than the bottom 2%."

Neither of which, you'll agree, is quite as eye-catching as "top 2%".

But wait, where exactly in the rankings is Acharya Balkrishna? With a
c-score "rank" of 367,268, he appears in the 192,004th row, of a total of
200,409.

That is, 95.8% of these scientists are ranked higher than Acharya
Balkrishna. That is, his percentile rank is 4.2. That is, 4.2% of the
scientists rank below him. Give that a thought.

--
My book with Joy Ma: "The Deoliwallahs"
Twitter: @DeathEndsFun
Death Ends Fun: http://dcubed.blogspot.com
