Automating the academic funding process with Python

This article gives an overview of possible ways to automate an academic subsidy process with Python. The intended readers are policy officers, program coordinators and other parties interested in the academic funding process. The article also covers the monitoring of relevant data that the board of a funding organization regularly requests.

Central question: “How can a funder make the funding process more efficient with automation?”

The automation possibilities below are described for a general funding process and are limited to a selection of use cases.

PLANNING PHASE

This phase describes examples of requests by policy officers, for example about text analysis and about possibilities for linking data with external parties.

Analyzing texts

The analysis of discussions, notes and meeting minutes during the planning phase is important for planning a subsidy round. This includes clustering sentences by topic and topic modeling based on the most common words. The board or the public can ask for an investigation into a specific theme (e.g. Covid); in such cases it is important to search for the relevant keywords in the subsidy provider's database. In some cases a bureaucrat or a politician might ask the funding agency to pay attention to a certain part of the country. Calculating scientific trends is likewise useful for new funding rounds, to identify emerging keywords.
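As a minimal sketch of trend detection, the following pure-Python function compares word counts between two rounds of texts to surface emerging keywords. The four-letter minimum and the simple count difference are illustrative choices, not a fixed method:

```python
from collections import Counter
import re

def keyword_counts(texts):
    """Count lowercase word occurrences across a list of texts (words > 3 letters)."""
    words = re.findall(r"[a-z]+", " ".join(texts).lower())
    return Counter(w for w in words if len(w) > 3)

def emerging_keywords(previous_round, current_round, top_n=5):
    """Return the words whose count grew most between two rounds."""
    old, new = keyword_counts(previous_round), keyword_counts(current_round)
    growth = {w: new[w] - old.get(w, 0) for w in new}
    return [w for w, _ in sorted(growth.items(), key=lambda kv: -kv[1])[:top_n]]
```

A real trend analysis would normalize by corpus size and round length; the sketch only shows the comparison idea.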

Connecting data with external parties

You can use a Python script to link the subsidy data from the subsidy provider's database with external sources (for example, the career history of applicants).
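A minimal sketch of such a link, assuming both the subsidy records and the external career records carry a shared `applicant_id` field (a hypothetical key name):

```python
def link_external(applications, career_records, key="applicant_id"):
    """Attach external career data to each application via a shared identifier."""
    by_id = {rec[key]: rec for rec in career_records}
    linked = []
    for app in applications:
        merged = dict(app)               # keep the original application fields
        merged.update(by_id.get(app[key], {}))  # add external fields when a match exists
        linked.append(merged)
    return linked
```

With larger datasets the same join is a one-liner in pandas (`DataFrame.merge`), but the principle is identical.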

SUBMISSION PHASE

This phase describes examples of retrieving an applicant's scientific output from a database, checking the similarity between two applications, and assigning a discipline to applications.

Check similarity between two applications: comparing applications is important for two reasons: to check for (self-)plagiarism and to cluster applications with (nearly) identical subjects.

Corpusdiff is a website built by Leipzig University where you can upload application summaries for comparison. You receive a score indicating how much the applications deviate from each other.
CompareD is a tool developed by the ‘text and data mining’ group of the Joint Research Center, European Commission. The tool compares subsidy applications submitted to various subsidy providers with the documents in the European database.
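Besides these external tools, a simple similarity score can also be computed in-house. A bag-of-words cosine similarity sketch using only the standard library (a rough stand-in for the deviation scores such tools report):

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Turn a text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two texts: 0.0 = no shared words, 1.0 = identical."""
    va, vb = vectorize(a), vectorize(b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

For production plagiarism checks, TF-IDF weighting or sentence embeddings would give more robust scores; the cosine measure itself stays the same.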

Assign a discipline to the applicants

An applicant must choose a discipline when submitting his/her application. The choices are not always accurate, yet they play a decisive role in the result. Having an algorithm divide applications into disciplines can reduce the workload of policy officers, who then only need to verify the suggested choices. Discipline assignments from previous funding rounds can serve as training data for the current round. A script can recommend the discipline of an application based on the texts in the title and summary, using spaCy or TensorFlow models. The accuracy of this method depends on the disciplines assigned in past rounds.
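The idea of reusing past rounds can be sketched without spaCy or TensorFlow: build a word profile per discipline from earlier, human-labelled applications and recommend the discipline with the largest word overlap. This is a deliberately simple stand-in for a trained classifier:

```python
import re
from collections import Counter, defaultdict

def build_profiles(past_applications):
    """past_applications: list of (text, discipline) pairs from earlier rounds."""
    profiles = defaultdict(Counter)
    for text, discipline in past_applications:
        profiles[discipline].update(re.findall(r"[a-z]+", text.lower()))
    return profiles

def recommend_discipline(text, profiles):
    """Recommend the discipline whose word profile overlaps most with the text."""
    words = Counter(re.findall(r"[a-z]+", text.lower()))
    scores = {d: sum(min(words[w], prof[w]) for w in words)
              for d, prof in profiles.items()}
    return max(scores, key=scores.get)
```

The recommendation inherits any bias in past labels, which is exactly why the article advises a final check by policy officers.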

Import scientific output

It is possible to import researchers' scientific output via digital object identifiers (DOIs), for example to detect conflicts of interest when searching for reviewers or selecting committee members.
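One way to retrieve output per DOI is the public Crossref REST API (`https://api.crossref.org/works/{doi}`). The sketch below fetches a work record and extracts author names for conflict-of-interest checks; the parsing assumes Crossref's `message.author` response structure:

```python
import json
import urllib.request

CROSSREF_URL = "https://api.crossref.org/works/{doi}"

def fetch_work(doi):
    """Fetch publication metadata for a DOI from the Crossref REST API."""
    with urllib.request.urlopen(CROSSREF_URL.format(doi=doi)) as resp:
        return json.load(resp)

def extract_authors(work):
    """Return 'Given Family' author names from a Crossref work record."""
    authors = work.get("message", {}).get("author", [])
    return [f"{a.get('given', '')} {a.get('family', '')}".strip() for a in authors]
```

The extracted names can then be matched against the reviewer or committee candidate list.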

ELIGIBILITY PHASE

This phase describes examples with regard to eligibility check or anonymity check of applications.

There are a number of submission rules for each funding round, and reading applications takes a lot of time and energy from policy officers. A policy officer must read the entire text to find irregularities: has the applicant stated his/her name correctly in the text, are the sums in the budget table correct, does the summary stay within the maximum number of words? A Python script can perform these checks automatically; even when such a check is only partially successful, it reduces the workload of a policy officer. Furthermore, a natural language processing (NLP) model can search for the applicant's name in the application, or answer questions such as which data repository is used per application, under which license, and who is responsible for the plan.
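A minimal sketch of such an automatic check, assuming each application is a record with hypothetical `summary`, `applicant_name`, `text`, `budget_items` and `budget_total` fields:

```python
def check_eligibility(application, max_summary_words=300):
    """Flag simple rule violations; a policy officer reviews the flags afterwards."""
    issues = []
    if len(application["summary"].split()) > max_summary_words:
        issues.append("summary exceeds word limit")
    if application["applicant_name"].lower() not in application["text"].lower():
        issues.append("applicant name not found in text")
    # small tolerance so rounding in the budget table does not trigger a false flag
    if abs(sum(application["budget_items"]) - application["budget_total"]) > 0.01:
        issues.append("budget items do not add up to the total")
    return issues
```

An empty list means no irregularities were detected; anything else goes to a policy officer for manual review.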

CLUSTERING PHASE

This phase describes automatic clustering of submitted grant applications.

Policy officers currently cluster grant applications manually, based on summaries or keywords, which takes a lot of time and energy. This clustering is also used to compose an assessment committee. One solution is to run a clustering algorithm on different parts of the application (title, summary, keywords or the full text) to group applications. It is recommended to first cluster the applications with an algorithm and then have policy officers check the result.
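As an illustration, a greedy threshold-based clustering over bag-of-words vectors; real deployments would more likely use scikit-learn's k-means or a topic model, but the principle of grouping similar summaries is the same:

```python
import math
import re
from collections import Counter

def _vec(text):
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cos(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_summaries(summaries, threshold=0.3):
    """Greedy clustering: each summary joins the first cluster it resembles enough."""
    clusters = []  # list of (centroid_word_counts, member_indices)
    for i, text in enumerate(summaries):
        v = _vec(text)
        for centroid, members in clusters:
            if _cos(v, centroid) >= threshold:
                centroid.update(v)   # grow the cluster's word profile
                members.append(i)
                break
        else:
            clusters.append((v, [i]))
    return [members for _, members in clusters]
```

The threshold is an assumption to tune per round; the resulting groups are the starting point the policy officers then verify.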

REVIEW PHASE

This phase describes how to find reviewers for an application with a Python script.

Link reviewers to application information

In the current situation, policy officers look for reviewers to assess applications manually or through professional parties. A Python script can make this process more efficient by matching the content of applications to the expertise of reviewers. You can select the reviewer information from the subsidy provider's database or from the database of the professional party the subsidy provider uses (e.g. Elsevier Expert Lookup). Policy officers then need no specific knowledge of the application's subject, but the reviewer information in the database must be complete. There are two options:

Option 1

  1. Classify applications by discipline (explained in the previous section)
  2. Classify available reviewers in the database per discipline
  3. Link applications with reviewers by discipline

Option 2

  1. Extract keywords from applications
  2. Extract keywords from reviewers’ CVs
  3. Compare the keywords from both extractions
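Option 2 can be sketched as a simple keyword-overlap ranking; the reviewer data structure (name mapped to a set of CV keywords) is hypothetical:

```python
def match_reviewers(application_keywords, reviewers):
    """Rank reviewers by the number of CV keywords shared with the application.

    reviewers: dict mapping reviewer name -> set of keywords from their CV.
    Returns reviewer names with at least one shared keyword, best match first.
    """
    app = {k.lower() for k in application_keywords}
    scored = [(len(app & {k.lower() for k in kws}), name)
              for name, kws in reviewers.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]
```

In practice the keywords would come from the extraction steps above rather than being hand-entered.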

Place grant applications information in Expert Lookup

Grant providers who use Elsevier Expert Lookup to search for referees for grant applications manually place the information for each grant round in Expert Lookup. Elsevier Expert Lookup provides a format in which the lump-sum proposal information can be uploaded. A Python script can convert the lump-sum proposal information into the required format, with a Tkinter graphical user interface or a Dash web application as front end.
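The exact Expert Lookup upload format is not reproduced here, so the sketch below only shows the conversion pattern: write proposal records into a CSV with a fixed, assumed column order. A Tkinter or Dash front end would simply call a function like this:

```python
import csv
import io

def to_upload_format(proposals, fieldnames=("title", "summary", "keywords")):
    """Render proposal records as CSV text with a fixed column order.

    The column names are an assumed example, not the real Expert Lookup schema;
    extra fields in a record are silently dropped.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, extrasaction="ignore")
    writer.writeheader()
    for proposal in proposals:
        writer.writerow(proposal)
    return buffer.getvalue()
```

Swapping in the real required columns is then a one-line change to `fieldnames`.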

POST AWARD ANALYSIS PHASE

This phase covers analyses of award percentages per discipline, subsidy round, year, specific theme, male/female ratio or institution. The board asks whether an award has been influenced by the nature of the applications and by the cooperation between the various applicants. Below are some examples:

Research areas

The board asks about research areas from specific disciplines.

  1. How often are the different areas used? The number of times each area is requested and its average weight are the most interesting figures.
  2. For each area, what are the top 3 areas that are requested most often with that area?
  3. How many applications cover multiple disciplines and how many fall within one discipline?
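Questions like these reduce to simple aggregations. A sketch for award percentages per discipline, assuming each application record carries a `discipline` label and an `awarded` flag (hypothetical field names):

```python
from collections import defaultdict

def award_rates(applications):
    """Award percentage per discipline from records with 'discipline' and 'awarded'."""
    totals, awarded = defaultdict(int), defaultdict(int)
    for app in applications:
        totals[app["discipline"]] += 1
        awarded[app["discipline"]] += bool(app["awarded"])
    return {d: 100.0 * awarded[d] / totals[d] for d in totals}
```

The same grouping pattern answers the per-round, per-year and per-institution variants by changing the grouping key.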

Mono- and multidisciplinary, intra- and inter-domain relationship

  1. Are there disciplines that do particularly well in a subsidy round that supports larger partnerships or, conversely, in a subsidy round that makes small projects possible?
  2. How many applications with disciplines from multiple research domains have been submitted? Distinguish between applications with two main disciplines and applications with more than two main disciplines, and distinguish between the different rounds.
  3. Can it be said which combinations of sub-disciplines occur more or less frequently within applications that deal with two or more main disciplines? And are the percentages easy to obtain in an overview?
  4. Comparison between two rounds for applications submitted and applications successful.
  5. Which (sub)disciplines, institutes and keywords can define a scientific field, and how strong is its link with other disciplines? This makes it possible to take stock of the breadth of the scientific domain.

Ratio applicant and application in the rounds

  1. How often do you receive applications from the same researcher? Based on the title or summary, indicate whether each application addresses a new topic or is more or less a resubmission of a previously rejected application.
  2. Look up researchers with specific keywords in the applications.
  3. A board asks to look up a keyword in the titles and summaries of submitted applications: which researchers are working on the keyword in question?
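The keyword lookup in item 3 can be sketched in a few lines; the `applicant`, `title` and `summary` field names are assumed:

```python
def find_researchers(applications, keyword):
    """Return sorted, deduplicated applicants whose title or summary mentions the keyword."""
    kw = keyword.lower()
    return sorted({a["applicant"] for a in applications
                   if kw in a["title"].lower() or kw in a["summary"].lower()})
```

For the resubmission question in item 1, the title/summary pairs of one researcher can additionally be scored with the similarity measure from the submission phase.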

Geographical origin and mobility of the main stakeholders

The main stakeholders are applicants, committee members and referees.

Reviewer burden and availability

A board asks for the acceptance rate of review invitations and the male/female ratio of reviewers per main discipline.
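Both figures can be computed in one pass over the invitation records; a sketch with hypothetical `discipline`, `gender` and `accepted` fields:

```python
from collections import defaultdict

def reviewer_stats(invitations):
    """Acceptance rate and female share per discipline.

    invitations: records with 'discipline', 'gender' ('f'/'m') and 'accepted' (bool).
    """
    raw = defaultdict(lambda: {"invited": 0, "accepted": 0, "female": 0})
    for inv in invitations:
        counts = raw[inv["discipline"]]
        counts["invited"] += 1
        counts["accepted"] += bool(inv["accepted"])
        counts["female"] += inv["gender"] == "f"
    return {d: {"acceptance_rate": c["accepted"] / c["invited"],
                "female_share": c["female"] / c["invited"]}
            for d, c in raw.items()}
```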