Maximizing the impact of research data


Published on 2023-06-13 by Ana Rodrigues

OpenAI's ChatGPT is a prime example of how research data can yield remarkable outcomes. It builds on knowledge from many openly available machine learning models and was trained on huge text datasets scraped from the internet. However, the recent lawsuits faced by AI companies underscore the need to carefully consider how data is shared and repurposed. A crucial question arises: Do you want your research data to be used for purposes unrelated to its original intent?

Enabling Accessibility and Reusability

If your answer to the question above is yes, there are several steps you can take to make your research data more accessible and reusable:

1. Publishing Datasets on Dedicated Platforms: Use dedicated platforms such as Zenodo to publish your datasets with clear metadata. Comprehensive information about your datasets makes them easier to discover and more accessible to the wider research community.

2. Supporting Open Abstracts: Support the Initiative for Open Abstracts through your institution. It promotes the availability of freely accessible abstracts and articles beyond the confines of specific publishers' portals.

3. Sharing Research Code: Share your research code on platforms like GitHub, enabling transparency, reproducibility, and reusability. By making your code accessible, you empower other researchers to validate and build upon your work, fostering a culture of collaboration and advancement.

Datasets, abstracts, and code shared in these ways are valuable research outputs in their own right, and their usefulness can extend beyond the original intentions and scope of the research.
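As a rough illustration of what "clear metadata" can look like in practice, the sketch below assembles a small metadata record and writes it to a `.zenodo.json` file, which Zenodo can read when a GitHub repository is archived. The field names reflect our understanding of Zenodo's deposit metadata and should be checked against the current Zenodo documentation. The helper names `build_metadata` and `write_metadata`, and all example values, are hypothetical.

```python
import json
from pathlib import Path

# Fields this sketch treats as essential (an assumption, not Zenodo's official list).
REQUIRED_FIELDS = ("title", "description", "creators", "license", "upload_type")


def build_metadata(title, description, creators, license_id, keywords=()):
    """Assemble a metadata record and refuse to continue if a key field is empty."""
    record = {
        "title": title,
        "description": description,
        "creators": [{"name": name} for name in creators],
        "license": license_id,
        "upload_type": "dataset",
        "keywords": list(keywords),
    }
    missing = [field for field in REQUIRED_FIELDS if not record[field]]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    return record


def write_metadata(record, directory="."):
    """Write the record as .zenodo.json in the given directory and return its path."""
    path = Path(directory) / ".zenodo.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = build_metadata(
        title="Example survey responses",
        description="Anonymised responses collected for an example study.",
        creators=["Doe, Jane"],
        license_id="cc-by-4.0",  # assumed identifier; verify against Zenodo's license list
        keywords=["open data", "survey"],
    )
    print(write_metadata(example))
```

Validating the record before writing it is a deliberate choice: an empty title or a missing license is much easier to fix before publication than after.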

Licensing Considerations

The ChatGPT example highlights the importance of carefully selecting a license when publishing research data. By choosing an appropriate license, you define the terms of use and safeguard the integrity of your data, ensuring it is used in line with your original research intentions. A clear license also lets others reuse your work legally and ethically. For example, code published without any license cannot be used by others by default, because copyright applies automatically and all rights remain with the author.
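A simple safeguard before publishing is to check that a repository actually contains a license file. The sketch below looks for a few commonly used file names in the top level of a directory. The list of names and the function name `find_license_file` are assumptions made for illustration, not a standard.

```python
from pathlib import Path

# Commonly used license file names (an illustrative, non-exhaustive list).
LICENSE_NAMES = ("LICENSE", "LICENSE.txt", "LICENSE.md", "COPYING")


def find_license_file(repo_dir):
    """Return the first license file found in the top level of repo_dir, or None."""
    root = Path(repo_dir)
    for name in LICENSE_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_license_file(".")
    if found is None:
        print("No license file found: others cannot reuse this code by default.")
    else:
        print(f"License file found: {found}")
```

The check only confirms that a file exists. Whether the chosen license fits your intentions for the data or code still needs a human decision.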

Open Data at Impacter

At Impacter, we recognize the value of open datasets: we actively use them and try to contribute to them. For example, to compare research proposals with previous ones, we use CORDIS. To enhance our search algorithms, we use the Synergy dataset, which comprises fully labeled systematic review datasets. OpenAlex serves as a source of bibliometric information on research output. We try to give back by contributing to the codebase of ASReview, the organisation behind the Synergy dataset, and by reporting bugs and suggesting improvements to OpenAlex.

The manner in which research data is shared and utilized can have a profound impact on the research community and beyond. It is crucial to carefully consider how data is stored, shared, and licensed. By embracing openness, accessibility, and reproducibility, researchers can amplify the impact of their work.