How to get free programmatic access to citations counts for a given paper?

If you are an R user, you can use the scholar package. This package allows you to analyze data from Google Scholar and obtain citations, publication list and even perform predictions of the h-index. An example R code:

library(scholar)
id = "xJaxiEEAAAAJ" # Isaac Newton's id
cit=get_citation_history(id)
barplot(cit[,2],names.arg = cit[,1])

Isaac Newton's citation history


I was hunting around for something like this recently to create citation count "badges" for papers. I came across scholar.py by Christian Kreibich. It uses BeautifulSoup to parse Google Scholar HTML output.

  • Extracts publication title, most relevant web link, PDF link, number of citations, number of online versions, link to Google Scholar's article cluster for the work, Google Scholar's cluster of all works referencing the publication, and excerpt of content.
  • Python module
  • Command-line tool prints entries in CSV format, simple plain text, or in the citation export format.

For example:

./scholar.py -c 1 --author "Hutchison" -t -A "Avogadro"

     Title Avogadro: An advanced semantic chemical editor, visualization, and analysis platform.
       URL http://www.biomedcentral.com/content/pdf/1758-2946-4-17.pdf
      Year 2012
 Citations 743

Wikidata has some incomplete citation information and you can access it programmatically via XML dumps, RDF dumps, web API and the SPARQL endpoint called Wikidata Query Service.

In Scholia at https://tools.wmflabs.org/scholia/, we use Wikidata Query Service to generate citation counts and citation list, see an example for "The Alzheimer's disease-associated amyloid beta-protein is an antimicrobial peptide" here: https://tools.wmflabs.org/scholia/work/Q21090025

If you follow the link on the page you can get to the SPARQL queries. For the SPARQL query that generate the "Citations to the work" table with citation count, the SPARQL query currently reads:

#defaultView:Table
# List of works that is cited by the specified work
SELECT ?citations ?publication_date ?citing_work ?citing_workLabel 
WITH {
  SELECT (MIN(?date) AS ?publication_date) (COUNT(?citing_citing_work) AS ?citations) ?citing_work 
  WHERE {
    ?citing_work wdt:P2860 wd:Q21090025 .
    OPTIONAL {
      ?citing_work wdt:P577 ?datetime .
      BIND(xsd:date(?datetime) AS ?date)
    }
    OPTIONAL { ?citing_citing_work wdt:P2860 ?citing_work }
  }
  GROUP BY ?citing_work
} AS %result
WHERE {
  INCLUDE %result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,it,jp,nl,no,ru,sv,zh" . } 
} 
ORDER BY DESC(?citations) DESC(?date) 

You can obtain the information programmatically, e.g., in Python

import requests

query = """
#defaultView:Table
# List of works that is cited by the specified work
SELECT ?citations ?publication_date ?citing_work ?citing_workLabel 
WITH {
  SELECT (MIN(?date) AS ?publication_date) (COUNT(?citing_citing_work) AS ?citations) ?citing_work 
  WHERE {
    ?citing_work wdt:P2860 wd:Q21090025 .
    OPTIONAL {
      ?citing_work wdt:P577 ?datetime .
      BIND(xsd:date(?datetime) AS ?date)
    }
    OPTIONAL { ?citing_citing_work wdt:P2860 ?citing_work }
  }
  GROUP BY ?citing_work
} AS %result
WHERE {
  INCLUDE %result
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,da,de,es,fr,it,jp,nl,no,ru,sv,zh" . } 
} 
ORDER BY DESC(?citations) DESC(?date) 
"""

response = requests.get('https://query.wikidata.org/sparql',
                        params={'query': query, 'format': 'json'})
data = response.json()['results']['bindings']
format = lambda paper: paper['citations']['value'] + ' ' + paper['citing_workLabel']['value']

>>> print("\n".join([format(paper) for paper in data[:5]]))
75 The genetics of Alzheimer disease
25 Alzheimer's disease - a neurospirochetosis. Analysis of the evidence following Koch's and Hill's criteria
23 Mild cognitive impairment: pathology and mechanisms.
22 Immunotherapeutic approaches for Alzheimer's disease
18 Amyloid-β peptide: Dr. Jekyll or Mr. Hyde?