The Sustainable Development Goals (SDGs) of the United Nations address major societal problems that need urgent attention. We built an analysis that helps determine whether scientific work is related to these goals. What are the SDGs, why do they matter for science, and how does our tool help? In the words of the United Nations itself:
The 2030 Agenda for Sustainable Development, adopted by all United Nations Member States in 2015, provides a shared blueprint for peace and prosperity for people and the planet, now and into the future. At its heart are the 17 Sustainable Development Goals (SDGs), which are an urgent call for action by all countries - developed and developing - in a global partnership.
The 17 SDGs range from ‘zero hunger’ and ‘gender equality’ to ‘climate action’. They come with clearly defined targets and indicators to measure progress towards the goals. Although the agenda is a call for action by countries, the scientific community clearly has a big role to play. A lot of research has already been done in these 17 areas, and more will be needed. However, for policy makers and scientific institutions alike, it is often not clear whether a piece of research is related to an SDG. Knowing which research relates to which SDG would support informed decision making by policy makers, and it would also help scientists become aware of the potential societal impact of their work.
Going through large bodies of scientific work by hand to determine whether they relate to the SDGs would take far too long, so we need an automated approach. One option is to create a Boolean search query for every SDG that returns the related articles from a repository of scientific articles. For example, to find texts related to SDG1 (No Poverty), you could search for all texts that contain ‘poverty OR inequality OR low income’. This will find some texts related to the SDG, but it will also return texts about a ‘mathematical inequality’, and it will miss texts that use ‘salary’ instead of ‘income’. You then have to go back and refine the Boolean query, for instance by using ‘income inequality’ instead of just ‘inequality’ to exclude the mathematical texts.
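To make both failure modes of the SDG1 example concrete, here is a minimal Python sketch. It assumes that a query is just a flat list of OR-ed terms, each matched as a whole word or phrase regardless of case; real queries are usually richer and can also combine terms with AND and NOT. The function name `matches` and the sample texts are hypothetical.

```python
import re


def matches(text: str, any_of: list[str]) -> bool:
    """Return True if `text` contains at least one of the OR-ed terms.

    Simplifying assumption: each term is matched as a whole word or
    phrase, ignoring case. Real Boolean queries can also use AND and NOT.
    """
    lowered = text.lower()
    for term in any_of:
        pattern = r"\b" + re.escape(term.lower()) + r"\b"
        if re.search(pattern, lowered):
            return True
    return False


# First attempt for SDG1 (No Poverty), as in the example in the text.
naive_query = ["poverty", "inequality", "low income"]
# Refined attempt: "inequality" now only counts in the phrase "income inequality".
refined_query = ["poverty", "income inequality", "low income"]

# Hypothetical sample texts.
texts = {
    "a": "Rising income inequality in rural households",
    "b": "A sharp inequality for convex functions",
    "c": "Low salary workers and access to healthcare",
}

for key, text in texts.items():
    print(key, matches(text, naive_query), matches(text, refined_query))
```

Text "b" is a false positive for the naive query that the refined query removes. Text "c" is missed by both queries, because neither of them contains a synonym such as ‘salary’.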
Several groups have gone through this process of creating high-quality Boolean queries, with the help of university librarians and experts on the specific SDGs (see, for example, Jayabalasingham et al., 2019; Vanderfeesten et al., 2020). Interestingly, different queries for the same SDG can give very different results (Armitage et al., 2020). This shows that building such a query involves many choices, and that these choices strongly affect the outcome.
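One simple way to quantify how much two queries for the same SDG disagree is to compare the sets of documents they return. The sketch below uses the Jaccard index for this comparison. That is our own illustrative choice and not necessarily the measure used in the cited comparison. The document IDs are hypothetical.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Fraction of documents retrieved by either query that both queries retrieve."""
    if not a and not b:
        return 1.0  # two empty result sets are treated as identical
    return len(a & b) / len(a | b)


# Hypothetical document IDs returned by two independently built queries for one SDG.
query_x_hits = {"doc1", "doc2", "doc3", "doc4"}
query_y_hits = {"doc3", "doc4", "doc5", "doc6", "doc7", "doc8"}

print(jaccard(query_x_hits, query_y_hits))
```

Here only 2 of the 8 distinct documents are found by both queries, so the overlap is 0.25. A score close to 1 would mean the two queries largely agree.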
One advantage of the Boolean query method is transparency: the query makes it clear why a text is or isn't considered related to an SDG. Moreover, you can fairly easily adapt the query to include or exclude a group of papers by adding extra terms. On the downside, it is a labor-intensive process that requires expert knowledge. It is also difficult to make sure that a term in the query is only matched in the right context, and to include all synonyms of each term.
To deal with these last two problems, we turned to machine learning. Over the last few years, deep learning models have performed natural language processing tasks that seemed impossible ten years ago. Training such models from scratch requires huge amounts of data, but pretrained versions are now available that only need to be fine-tuned on a specific natural language processing task, which takes far less data. In collaboration with the Aurora Network, we used the data they obtained while creating their Boolean queries to fine-tune a deep learning model that classifies which SDGs a scientific text is related to.
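A single text can be related to several SDGs at once, so this is a multi-label classification problem. The sketch below shows one common way to prepare the fine-tuning targets. It assumes each training abstract is paired with the set of SDGs whose queries retrieved it; the exact format of the Aurora data is not described here. Each set is encoded as a 17-element 0/1 vector. The helper name and the examples are hypothetical.

```python
NUM_SDGS = 17


def to_multi_hot(sdg_numbers: set[int]) -> list[int]:
    """Encode the SDGs (numbered 1-17) linked to one abstract as a 0/1 vector."""
    vector = [0] * NUM_SDGS
    for sdg in sdg_numbers:
        if not 1 <= sdg <= NUM_SDGS:
            raise ValueError(f"SDG number out of range: {sdg}")
        vector[sdg - 1] = 1  # SDG n is stored at index n - 1
    return vector


# Hypothetical training examples: abstract text plus the SDGs whose queries retrieved it.
examples = [
    ("Rising income inequality in rural households", {1, 10}),
    ("Drought-resistant crops for smallholder farmers", {2, 13}),
]

targets = [to_multi_hot(sdgs) for _, sdgs in examples]
print(targets[0][:10])
```

With targets like these, a common setup gives the model one independent output per SDG, usually trained with binary cross-entropy, instead of a single softmax over all 17 goals. We do not claim that this is exactly how the Impacter model is configured.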
A big advantage of this deep learning model is that it takes synonyms and context into account. So if ‘salary’ is used instead of ‘income’, the model should still recognize the connection. The model can also learn associations that would be impossible to capture with a Boolean query alone. The downside is that it can be very unclear how the model reaches a decision: why does it include one text but exclude another? In short, the deep learning model is better at telling whether a text is related to an SDG, but its decision process is more opaque.
We have integrated the deep learning model into Impacter. The model analyses the text you provide and returns a list of the SDGs most related to it, together with the model's confidence for each. This makes it easy to see whether your work relates to an SDG and whether it might be worth mentioning this explicitly. All in all, the model can make useful suggestions about SDGs your work might be related to, helping you see connections between your work and bigger societal problems.
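The following sketch shows how per-SDG model scores could be turned into the ranked list with confidence levels described above. It assumes the model produces one raw score (logit) per SDG. A sigmoid converts each score to a confidence, weak candidates are dropped, and the rest are sorted. The `threshold` and `top_k` values are hypothetical, and this is not a description of Impacter's actual post-processing.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def suggest_sdgs(logits: list[float], threshold: float = 0.5,
                 top_k: int = 3) -> list[tuple[int, float]]:
    """Turn 17 per-SDG scores into a ranked list of (SDG number, confidence).

    `threshold` and `top_k` are hypothetical settings for this sketch.
    """
    confidences = [(i + 1, sigmoid(z)) for i, z in enumerate(logits)]
    confident = [pair for pair in confidences if pair[1] >= threshold]
    confident.sort(key=lambda pair: pair[1], reverse=True)
    return confident[:top_k]


# Hypothetical raw scores for one abstract: strongly SDG1, somewhat SDG10.
logits = [-4.0] * 17
logits[0] = 2.0  # SDG1
logits[9] = 0.5  # SDG10

for sdg, conf in suggest_sdgs(logits):
    print(f"SDG{sdg}: {conf:.2f}")
```

Running this prints `SDG1: 0.88` followed by `SDG10: 0.62`. Every other SDG falls below the threshold and is not suggested.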
Jayabalasingham, Bamini; Boverhof, Roy; Agnew, Kevin; Klein, L. (2019). Identifying research supporting the United Nations Sustainable Development Goals. Mendeley Data, V1. doi: http://doi.org/10.17632/87txkw7khs.1
Vanderfeesten, Maurice; Otten, René; Spielberg, Eike (2020, July 2). Search Queries for "Mapping Research Output to the Sustainable Development Goals (SDGs)" v5.0 (Version 5.0). Zenodo. doi: http://doi.org/10.5281/zenodo.3817445
Armitage, Caroline S.; Lorenz, Marta; Mikki, Susanne (2020). Mapping scholarly publications related to the Sustainable Development Goals: Do independent bibliometric approaches get the same results? Quantitative Science Studies 1 (3): 1092–1108. doi: https://doi.org/10.1162/qss_a_00071