
Building data science and AI skills for sustainable research development

Stakeholder needs assessment for the capacity building programs at Ersilia

Executive Summary

This report presents the key findings of a stakeholder needs assessment undertaken by the Ersilia Open Source Initiative (Ersilia), a non-profit organisation equipping laboratories in the Global South with AI tools for infectious disease research. The assessment was carried out as part of Ersilia's capacity building programs in Artificial Intelligence (AI), which focus on under-resourced research areas and the scientists who work in them.

Ersilia’s mission is three-fold:

  1. Develop open source AI research software.

  2. Support and contribute to infectious and neglected disease research.

  3. Provide training and capacity building activities on the use and application of AI methods to infectious and neglected disease research.

Here, we highlight the current status of drug discovery for infectious and neglected diseases, our main field of research within the broader Global Health landscape, with a particular focus on our target geographic area, Africa. We also explore how our work intersects with the Sustainable Development Goals (SDGs), advancing towards the goals of the SDG 2030 Agenda.

The key findings of our stakeholder engagement activities revealed a clear and urgent need for long-term AI training, supported by adequate infrastructure and economic resources for the sustainable adoption of these novel technologies.

This report has been made possible thanks to the Roddenberry Foundation Catalyst Award.

Introduction

The burden of disease

Africa suffers from a disproportionately high burden of disease and is the only continent in which communicable diseases claim more lives than non-communicable diseases. The most important infectious diseases in Africa are acute respiratory infections, diarrhea, HIV/AIDS, malaria and tuberculosis. Collectively, infections are responsible for over 60% of Africa’s total disease burden (measured in disability-adjusted life years, DALYs), causing more than 2.4 million deaths per year [1]. Such diseases are often referred to as “diseases of the poor”, as they affect the most deprived sectors of society, trapping them in a cycle of poverty. The high disease burden in Africa is compounded by projections that the continent will double its population by 2070, further challenging its already underfunded and limited healthcare systems [2].


Throughout the report, Africa refers to the WHO African Region, comprising 47 countries, 40 of which are classified as low- or lower-middle-income countries (LLMICs) by the World Bank.

In addition to these demographic pressures, this burden is intensified by climate variability, extreme weather events, and long-term climate changes, which intersect significantly with rapid urbanisation, widespread informal settlements, governance complexities, and limited resources, creating considerable health challenges in African urban contexts. Rising temperatures, a prominent climate hazard, are increasingly straining healthcare systems that are already facing substantial challenges from existing disease burdens. However, this risk remains poorly acknowledged within health and urban development plans across the continent [3,4]. Additionally, recent health crises such as COVID-19 and urban cholera outbreaks have highlighted the complex, interconnected nature of climate and health-related risks, emphasizing the urgent need for holistic, integrated approaches, including the development of new and affordable drugs.

Despite this urgent need, only 14% of the drugs currently in development globally target infectious diseases [1]. Although the situation has improved since the 90/10 gap was first described, drug discovery and development remain subject to market forces and are therefore focused on more profitable diseases affecting high-income countries, neglecting the needs of the more than 1 billion people living in Africa.

Addressing these intertwined issues demands a renewed commitment to research and development (R&D) within the countries experiencing the highest burdens. Developing and deploying novel AI and data science tools can accelerate sustainable research in low-resourced settings. By focusing on diseases that disproportionately affect impoverished communities, we not only target SDG 3 (Good Health and Well-being) but also actively contribute to other SDGs, such as SDG 4 (Quality Education), SDG 9 (Industry, Innovation and Infrastructure) and SDG 17 (Partnerships for the Goals) (Figure 1A). To ensure this potential is realized, targeted investment is needed not only in technology, but also in open science, equitable access, and skills development. These actions are key to enabling AI to meaningfully support global health research and progress towards the SDGs.


The 90/10 gap refers to the 1990 Global Forum on Health Research, which stated that less than 10% of worldwide resources devoted to health research were put towards the health problems of developing countries, where over 90% of all preventable deaths worldwide occurred.

The importance of scientific research

Scientific and technological research is one of the pillars of long-term economic growth [5]. While the return on investment for research is often indirect, it has been estimated that every dollar invested in neglected tropical disease R&D can yield up to $405 in economic returns [6].

Despite the growing disease burden and the critical need for investment in research, Africa lags significantly behind in building a strong local scientific ecosystem. The continent allocates an average of 0.6% of its GDP to research, compared to 2.22% in Europe. Strengthening in-country research is crucial for developing locally relevant solutions to the challenges faced by countries in the Global South. It is the only sustainable path toward equitable and long-term development.

Health research and AI

Today, scientific research cannot be decoupled from advances in AI, which has surged across several fields, promising to revolutionize R&D, boost productivity and automate labour-intensive tasks. In biomedical research alone, AI-backed initiatives raised billions in venture capital in 2024, and the 2024 Nobel Prize in Chemistry was co-awarded to the developers of AlphaFold, an AI tool for protein structure prediction. However, access to these advances remains profoundly unequal. The AI Preparedness Index [7], which measures a country’s ability to leverage AI based on skills, infrastructure, and policies, reveals stark global disparities, with African nations consistently ranking at the bottom (Figure 1B).

Figure 1. AI and the SDGs. A. Intersection of Ersilia's work with the SDGs advanced by our mission. B. AI Preparedness Index of countries grouped by WHO region. Box plots indicate the median; whiskers indicate the 25th and 75th percentiles.

We are now at a turning point. AI may be uniquely suited to solve the pressing health needs of the African continent, with its potential to accelerate innovation and research. Yet, if left unaddressed, the rise of AI risks widening the global divide, leaving lower-income economies behind, as previous development waves have done. Ensuring equitable access to AI tools and expertise is crucial to preventing this gap from deepening.

For example, in the field of Global Health research, AI could widen the existing disparities by:

  1. Exacerbating “helicopter research”. As AI becomes routinely incorporated into scientific research, neocolonialist practices that have long been denounced (and mitigated to some extent) risk resurfacing [9,10]. Scientists in well-funded Western research organisations benefit from in-house AI expertise that is unavailable to most researchers in the Global South. They are therefore better positioned to lead research design, priority setting and project goals, and can even use AI methods locked behind paywalls that are inaccessible to their local counterparts.

  2. Propagating biases in research focus. AI is inherently data-hungry, so new AI methods are expected to emerge first in well-studied research areas, where a wealth of data is already available. For instance, genomic databases are skewed towards Western populations, while African ancestry is severely underrepresented [11].

  3. Perpetuating funding inequity. Scientific research funding is primarily driven by the perceived innovative aspect of the proposed study. As AI becomes a centerpiece of many studies, scientists who can integrate AI expertise into their work will be much better positioned to obtain awards and grants and continue their lines of research.


Helicopter or parachute research refers to a practice in which researchers from wealthier countries conduct studies in lower-income countries with little involvement of local researchers or community members [8].


Needs Assessment

The Ersilia Open Source Initiative is a Catalan Foundation with the mission to equip laboratories in the Global South with AI tools for infectious disease research. In the last five years, we have established partnerships with leading research institutions across the African continent, co-developing successful drug discovery research programs in malaria, tuberculosis and other diseases. In parallel, we have prioritized capacity building and training on-site, ensuring the long-term sustainability of our initiatives. Our offerings currently include in-person week-long workshops on AI for drug discovery, designed for chemists and biologists, combined with ongoing online support. Through our experience in the field, we have recognised the strong demand and need for this kind of support. As a result, Ersilia is re-evaluating its training programs to better address the needs of researchers in the field.

We conducted a two-tiered stakeholder needs assessment to design a comprehensive, hands-on AI training program tailored to the needs of African drug discovery researchers. First, we gathered a broad overview of African biomedical researchers’ current AI training needs through an anonymous survey with 101 respondents. Next, we conducted six in-depth interviews with selected representatives of the user personas, as defined in the methodology section, to gain deeper insights. The findings from this consultation, detailed below, will shape the development of the future training programs at Ersilia.


Scope of the engagement. Ersilia’s core research programs are currently focused on drug discovery in low-resourced countries, shaping the focus of our stakeholder engagement. Our engagements, therefore, targeted researchers in Africa involved in various aspects of drug discovery, including chemists, biologists, and computer scientists. The user personas for the interviews were carefully designed to align with the scope of the survey, ensuring that the gathered information accurately reflects the needs of this research community.

Results

Survey findings

Demographics

The survey received 101 responses covering all African regions, with West African respondents outnumbering those from all other regions combined by 1.5-fold. To keep the analysis focused on the target audience, responses from non-African participants were excluded, resulting in a final sample size of 98. The gender distribution among respondents reflects global trends in research (~31% women) [12]. Most respondents were at MSc, PhD or postdoctoral level, indicating that the surveyed population had significant academic experience.

Current expertise in AI

All participants stated that they believed AI would increase the speed of their research. While 64.2% of the respondents had already used AI in some capacity, when explicitly asked about their research goals, all indicated that they were either planning to incorporate AI (45.5%) or already incorporating it (54.5%) in their research. To assess technical proficiency, participants were asked to rate both their knowledge of AI and their programming proficiency (in any language, for example Python or R) from 1 (non-expert) to 5 (able to work with several AI frameworks and develop and deploy my own models / excellent mastery of a programming language). Surprisingly, the median self-rated AI knowledge was 3, while the median programming proficiency was lower, at 2. Since strong programming skills are fundamental to AI development, this gap suggests that many respondents perceive AI use primarily as interacting with pre-built tools (e.g., chatbots) rather than developing AI models from scratch or using them at a more specialized level. Indeed, many of the examples of AI usage given by participants relate to Large Language Model (LLM)-based apps that enhance researchers' day-to-day activities, such as grant writing (ChatGPT) or literature review, rather than to scientific discoveries enabled by AI.

Over 60% of the respondents had previously attended some form of AI training. Yet lack of expertise remained the most frequently selected barrier to improving their AI usage (62%), followed by lack of infrastructure (limited computational resources, unstable internet connection, unreliable electricity, etc.) and cost constraints, reported by 48% and 44% of respondents, respectively (Figure 2A).

Figure 2. Barriers and needs to successfully implement AI for scientific research. A. Major barriers highlighted by survey respondents to use AI tools in their day-to-day research. B. Fields of training required by survey respondents to become competent AI users. C. Major areas of support required for successful application of AI tools in research projects according to survey respondents.

Training needs

All participants expressed interest in AI-based training programs and identified key knowledge gaps. The most pressing need was learning how to apply AI models in research (80%), followed by training their own AI models (78%) and developing foundational programming skills (71%) (Figure 2B). When asked about the types of support that would help them incorporate AI into their work, respondents most frequently selected workshops and further training (97%). However, other resources, such as access to free software (76%), networking opportunities (72%), easy-to-use AI tools (64%), and conference participation (60%), were also widely chosen, indicating a broad demand for technical education and professional development opportunities (Figure 2C).

Respondents were asked to describe their ideal workshop or training through a few multiple-choice questions and several open-ended questions. The open-ended answers confirmed the multiple-choice results: a six-month structured training program was preferred (39%), whereas one-day events were deprioritized. In the absence of longer-term programs, one-week and one-month-long opportunities also attracted the interest of survey respondents (Figure 3A). A clear outcome of the survey is that financial support is required to attend training (95%), mainly in the form of travel allowances (57%), but also to cover computational needs and salary support (Figure 3B). Finally, over 50% of the participants would prefer the training to be hybrid (online and in-person).

Figure 3. Extended information on best practices for AI skills development courses. A. Preferred length of training. B. Usage of financial support during the training.

A complete visualisation of aggregated survey results can be found here.

Interview findings

Demographics

Following the survey, six one-on-one interviews were conducted with researchers from African research institutions. The participants represented three distinct user personas, each with two representatives, reflecting diverse research backgrounds and AI expertise (see Methodology for more information). The group comprised two female and four male participants, with geographic representation from South Africa, Cameroon, Kenya, and The Gambia. Interviewees included both early-career and experienced researchers specializing in biomedical and computational sciences.

AI experts

From the perspective of researchers actively involved in developing and/or implementing AI tools in their research groups or communities, the level of AI expertise on the African continent remains critically low. There is an urgent need for foundational AI literacy so that researchers do not treat AI models as black boxes and can clearly understand their applications, thereby reducing the risk of misuse. Based on their experience, the most effective way to start building AI literacy is to focus on applying existing, user-friendly AI-based tools (such as ChatGPT, StarDrop, or the Ersilia Model Hub) rather than attempting to develop new AI models from scratch. This approach is likely to deliver faster, more meaningful results and help accelerate scientific discovery across the continent. Both interviewees also acknowledged that the lack of familiarity with basic computational elements, such as open source operating systems (Linux) or command-line interfaces (as opposed to graphical user interfaces), coupled with limited core programming expertise in the scientific community, would slow progress in AI development on the continent.

Training is essential but not sufficient to bridge this gap. Basic introductory skills development courses, which typically span two to seven days, should be reinforced with follow-up sessions as well as online support in between those sessions. A preference for hybrid learning formats was evident. Chris L. noted, "Ideally, participants should have an initial online session to understand the basics, followed by face-to-face sessions to practically apply these concepts." This preference aligns closely with the survey findings favoring immersive training programs over short workshops. To ensure long-term skills retention, support networks and peer-to-peer learning are valuable additions to capacity building activities, and both interviewees placed special emphasis on the availability of mentors or experts as reference figures to answer questions.

Finally, beyond these foundational aspects, several additional considerations can enhance training success. First, setting pre-course requirements ensures that all participants start from a similar baseline, accelerating the overall learning process. This can easily be achieved by pointing participants to entry-level online self-learning courses. Second, trainers need to understand the context in which students will continue their research and tailor their approaches accordingly, for example by using open source tools instead of licensed tools that might only be accessible during the training, or by providing hard drives for data storage instead of relying on an internet connection.

New AI Adopters

Both interviewees shared a similar profile: early-career researchers in malaria and tuberculosis biology and drug discovery who had taken part in training, including short research-based training, with AI-oriented organisations like Ersilia. This experience gave them a unique perspective on which strategies currently work and which gaps remain in AI-related training and skills development opportunities for scientists actively engaged in their projects. Both emphasised that using AI tools accelerated their research, enabling the formulation of scientific hypotheses that led to discoveries that would not have been possible without such tools.

Reflecting the survey outcomes, interviewees highlighted the significant benefits of learning to implement and interpret results from pre-built AI tools rather than delving into AI model building. Moreover, simply recommending free and open source alternatives to licensed tools during training sessions already made a difference in their day-to-day work. As Fatou F. put it: "We really don’t know all the alternatives [to expensive software], and discovering what is out there is difficult." The major barriers identified echo those shared by survey respondents and AI Experts: limited computing power, unstable internet access and lack of mentorship support, which require systemic solutions from funders and research organisations, supported by government policies. Finally, short-term training was highlighted as a good starting point but inadequate without follow-up or extended sessions; without them, the interviewees could not fully implement what they had learned. In addition, beyond mastering the course content, longer in-person training was crucial for early-career researchers to build professional networks.

Supportive AI Adopter

This third user persona aimed to capture the perspectives of principal investigators: research leaders who guide the scientific rationale and direct their teams towards the right tools, experiments or focus areas. Their primary interest is enhancing AI adoption across their portfolio of projects.

Participants acknowledged a widespread perception of low AI literacy among researchers in Africa, particularly among biologists. Many researchers lack fundamental data science training, basic programming skills, and familiarity with essential computational tools, limiting their ability to utilize AI effectively. As one participant stated, most biological researchers locally are "not exposed [...] to the basic tools needed to employ AI," highlighting significant foundational skills and knowledge gaps.

Participants strongly favored existing, user-friendly AI applications requiring minimal technical expertise. They supported intuitive interfaces, allowing users to input specific data requests without specialized training. For example, Justin K. illustrated this preference by describing an ideal scenario: "You have an interface where you could just have your [chemical] structures entered and then request [...] the potential activity, and the AI gives you all those without you having to know exactly how to maneuver the basics." Consistent with earlier personas, participants emphasized the need for mentorship to facilitate sustainable AI adoption and implementation.

Additionally, participants recognized AI's broad utility in accelerating research across multiple areas beyond drug discovery, including quality assurance and environmental health monitoring. Elizabeth K. noted, "Generally, it would be very beneficial in saving time, saving resources, and being precise in terms of whatever it is that one wants to achieve from AI." AI was seen as instrumental in enhancing productivity, accuracy, and timely outcomes across diverse research activities, such as evaluating herbal quality and detecting contaminants like pesticides.

Infrastructure emerged as another critical requirement. Reliable access to computing infrastructure, including hardware, servers, and sustainable power solutions such as solar energy, remains essential. Participants identified the lack of financial resources for infrastructure and training as a persistent challenge requiring coordinated solutions.

Conclusions

We set out to complete a stakeholder needs assessment for AI capacity building programs, targeting scientists working in low-resourced settings in Africa. The goal was to identify needs to inform the development of a flagship training program uniquely tailored to address the real and practical needs of local infectious disease researchers on the continent. The assessment was carried out in two phases: first, a comprehensive survey widely distributed across Ersilia’s network and partner organisations, followed by in-depth interviews with representatives of the user personas key to the successful implementation of the training programs.

As described in the introduction, AI is becoming an integral part of scientific research, and the insights from our survey and interviews underscore AI's significant potential to enhance research productivity across various scientific disciplines in Africa. Nevertheless, AI literacy among biologists, chemists, and pharmacists on the continent remains low, particularly in AI usage and interpretation and in more advanced computational and foundational AI skills (programming, statistics, and mathematics).

To truly realise the potential of AI in underfunded and under-resourced research projects, we must design structured programs that cover the foundational training gaps and overcome other identified barriers. The major issues that such a program should address to empower researchers to fully leverage AI tools, ultimately advancing scientific research and innovation in the region, include:

  1. The use of AI tools that require low or standard computational capacity, as access to clusters and high-performance computing (HPC) remains limited.

  2. The design of tools that are robust to internet outages (for example, tools that can run entirely locally) and electricity outages (for example, pipelines that can be set to run on remote systems to prevent disruptions).

  3. The showcase of open source alternatives to paid software to ensure continued and independent accessibility.

  4. The offering of organised, in-depth follow-up to introductory courses.

  5. The building of open communities where peers can find support and learn from each other.

In conclusion, there is an urgent and growing need for more comprehensive and effective programs in AI applied to drug discovery and infectious disease research. In response to this need, Ersilia, as a non-profit organization committed to supporting sustainable research practices across the Global South, is developing a tailored year-long training strategy. Conceived as an ‘Incubator’, this program will be designed to run consecutively with the support of local and regional partners in Africa. The Ersilia AI Incubator aims to create a collaborative environment where scientists can combine their domain expertise with AI, explore bold ideas, and apply these tools to real-world research challenges. Beyond individual skill-building, the program will foster lasting networks, strengthen regional collaborations, and help cultivate a vibrant AI community that can continue to grow and accelerate scientific innovation across the continent.

Methodology

Data collection methods

Data were collected through a mixed-methods approach combining surveys and semi-structured interviews to ensure a comprehensive understanding of AI training needs among African researchers.

Survey

An online survey was distributed openly via the following channels: Ersilia’s Monthly Newsletter, Ersilia’s LinkedIn and X (formerly Twitter) accounts, and Slack workspaces from Ersilia and Open Life Sciences. In addition, re-sharing via mission-aligned organisations like the H3D Foundation helped increase the reach. Previous course participants or course applicants who had given their consent were also contacted via email. The survey questions are provided in Annex I.

Interviews

Six interviewees were selected to represent three user personas, ensuring diverse perspectives on AI training needs. Each persona included two candidates:

  • AI Experts, an experienced researcher leading AI capacity-strengthening initiatives and seeking ways to enhance AI adoption in their research communities.

  • New AI Adopters, a researcher with foundational AI knowledge from previous Ersilia training, who aims to advance their AI skills for biomedical research.

  • Supportive AI Adopter, a research leader with limited AI expertise who is interested in foundational training, user-friendly tools, and mentorship for AI integration within their teams.

Interviews were conducted via Google Meet in a semi-structured format, allowing flexibility while maintaining consistency across key discussion topics. Questions were tailored to each participant’s persona, guided by initial survey results. Discussions explored AI experiences, challenges, skill gaps, preferred training formats, and mentorship needs. Interviews were recorded with consent and fully transcribed.

Data analysis

Survey

Survey responses were initially collected via Google Forms and later via an Airtable form. All results were consolidated into a single .csv file and analysed with custom Python scripts. For ease of use, an Airtable Interface displaying all single-choice and multiple-choice results has been created and can be found here.
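The analysis scripts themselves are not included in this report. As a minimal sketch of the kind of aggregation described in the Results (excluding non-African responses, taking medians of the 1 to 5 self-assessment scales, and computing the share of respondents selecting each option of a multiple-choice question), the snippet below uses only the Python standard library. The column names (`region`, `ai_knowledge`, `programming`, `barriers`), the semicolon-separated encoding of multiple-choice answers and the toy rows are all hypothetical and do not reflect Ersilia's actual survey export.

```python
import csv
import io
from collections import Counter
from statistics import median

# Hypothetical export: column names, encoding and values are illustrative
# only, not the real survey schema (see Annex I for the actual questions).
RAW_CSV = """region,ai_knowledge,programming,barriers
West Africa,3,2,expertise;infrastructure
East Africa,4,3,expertise;cost
Southern Africa,2,1,infrastructure;cost
Europe,5,5,cost
West Africa,3,2,expertise
"""

# Hypothetical region labels used to keep only African respondents.
AFRICAN_REGIONS = {"West Africa", "East Africa", "Central Africa",
                   "Southern Africa", "North Africa"}


def load_responses(text: str) -> list[dict]:
    """Parse the CSV and drop respondents based outside Africa."""
    rows = csv.DictReader(io.StringIO(text))
    return [row for row in rows if row["region"] in AFRICAN_REGIONS]


def median_rating(rows: list[dict], column: str) -> float:
    """Median of a 1-5 self-assessment scale."""
    return median(int(row[column]) for row in rows)


def option_percentages(rows: list[dict], column: str) -> dict[str, float]:
    """Share of respondents (%) selecting each option of a multi-select question."""
    counts = Counter()
    for row in rows:
        # A set ensures each respondent counts at most once per option.
        counts.update(set(filter(None, row[column].split(";"))))
    n = len(rows)
    return {option: round(100 * c / n, 1) for option, c in counts.items()}


responses = load_responses(RAW_CSV)
print(len(responses))
print(median_rating(responses, "ai_knowledge"), median_rating(responses, "programming"))
print(option_percentages(responses, "barriers"))
```

Because respondents can select several options, the percentages are computed against the number of respondents rather than the number of selections, which is why they can add up to more than 100%, as with the barriers reported in Figure 2A.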

Interview

Interviews were analyzed using Braun and Clarke’s thematic analysis approach to identify key themes across participant responses [13]:

  • Familiarization. Transcripts were reviewed multiple times to identify recurring insights, such as frequent mentions of foundational knowledge gaps.

  • Initial Coding. Relevant segments were systematically marked, highlighting statements related to practical AI skills, data accessibility issues, and resource constraints.

  • Theme Development. Codes were grouped into broader categories, including training needs, accessibility of AI tools, and mentorship support.

  • Interpretation. Themes were contextualized with representative participant quotes to reflect experiences and challenges accurately.

Data protection

Survey results have been aggregated and anonymised. Participants were asked for consent to publish the results. Interviewees provided explicit consent for their responses to be used, including direct quotes. Interviews were recorded with permission and fully transcribed.

Acknowledgements

This work has been possible thanks to a Roddenberry Foundation Catalyst Grant awarded to the Fundació Ersilia Open Source Initiative. We would also like to thank our interviewees Jason Hlozek, Justin Komguep, Mariscal Brice Tchatat, Elizabeth Kigondu, Chris Lennard and Fatou Faal for their earnest responses, as well as all survey participants who shared their insights with us.

Authors

The survey and interview design, data collection and analysis, as well as report writing, have been performed by Dr. Gemma Turon (employee of the Fundació Ersilia Open Source Initiative) and Ms. Alacia Armstrong (Wonder&Co).

References

1. WHO. Global Observatory on Health R&D. https://www.who.int/observatories/global-observatory-on-health-research-and-development (2022).

2. UN. World Population Prospects 2024. (United Nations, New York, NY, 2025).

3. Pasquini, L., van Aardenne, L., Godsmark, C. N., Lee, J. & Jack, C. Emerging climate change-related public health challenges in Africa: A case study of the heat-health vulnerability of informal settlement residents in Dar es Salaam, Tanzania. Sci. Total Environ. 747, 141355 (2020).

4. Nana, M., Coetzer, K. & Vogel, C. Facing the heat: initial probing of the City of Johannesburg’s heat-health planning. S. Afr. Geogr. J. 101, 253–268 (2019).

5. Barrett, P., Hansen, N.-J., Natal, J.-M. & Noureldin, D. Why basic science matters for economic growth. International Monetary Fund (2021).

6. Policy Cures Research. The Impact of Global Health R&D. Preprint at https://impact.impactglobalhealth.org/research-investment (2024).

7. IMF. AI Preparedness Index. International Monetary Fund https://www.imf.org/external/datamapper/AI_PI@AIPI/ADVEC/EME/LIC (2024).

8. Lambert, W. M., Camacho-Rivera, M., Boutin-Foster, C., Salifu, M. & Riley, W. J. Ending ‘domestic helicopter research’. Cell 187, 1823–1827 (2024).

9. Haelewaters, D., Hofmann, T. A. & Romero-Olivares, A. L. Ten simple rules for Global North researchers to stop perpetuating helicopter research in the Global South. PLoS Comput. Biol. 17, e1009277 (2021).

10. Adame, F. Meaningful collaborations can end ‘helicopter research’. Nature (2021) doi:10.1038/d41586-021-01795-1.

11. Turon, G., Njoroge, M., Mulubwa, M., Duran-Frigola, M. & Chibale, K. AI can help to tailor drugs for Africa — but Africans should lead the way. Nature 265–267 (2024).

12. UIS. UNESCO Institute for Statistics: Researchers (HC) % Female. UNESCO Institute for Statistics https://databrowser.uis.unesco.org (2025).

13. Braun, V., Clarke, V., Hayfield, N. & Terry, G. Thematic analysis. in Advanced Research Methods for Applied Psychology 238–248 (Routledge, London, 2024).
