Abstract

When designing evidence-based policies and programs, decision-makers must distill key information from a vast and rapidly growing literature base. Identifying relevant literature from raw search results is time- and resource-intensive and is often done by manual screening. In this study, we develop an artificial intelligence (AI) agent based on a bidirectional encoder representations from transformers (BERT) model and incorporate it into a human team designing an evidence synthesis product for global development. We explore the effectiveness of the human–AI hybrid team in accelerating the evidence synthesis process. To further improve team efficiency, we enhance the human–AI hybrid team through active learning (AL). Specifically, we explore different sampling strategies, including random sampling, least confidence (LC) sampling, and highest priority (HP) sampling, to study their influence on the collaborative screening process. Results show that, for identifying 80% of all relevant documents, incorporating the BERT-based AI agent into the human team reduces the human screening effort (the number of documents that humans need to screen) by 68.5% compared to no AI assistance and by 16.8% compared to the industry-standard approach of a frequency-based language model with a support vector machine-based classifier. With the HP sampling strategy, the human screening effort is reduced even further: by 78.3% compared to no AI assistance for the same 80% target. We apply the AL-enhanced human–AI hybrid teaming workflow in the design process of three evidence gap maps for the U.S. Agency for International Development and find it to be highly effective. These findings demonstrate how AI can accelerate the development of evidence synthesis products and promote timely evidence-based decision-making in global development.
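To make the three sampling strategies concrete, the following is a minimal sketch, not the authors' implementation. It assumes a binary relevant/irrelevant classifier that outputs a relevance probability for each unscreened document. It further assumes that LC sampling picks the documents whose probability is closest to 0.5 and that HP sampling picks the documents with the highest predicted relevance. The function name `select_batch` and the example probabilities are hypothetical.

```python
import random


def select_batch(probs, k, strategy, seed=0):
    """Pick up to k unscreened documents for the next round of human screening.

    probs[i] is the model's predicted probability that document i is relevant
    (an assumed interface; the abstract does not specify one).
    Returns a list of document indices.
    """
    indices = list(range(len(probs)))
    k = min(k, len(indices))

    if strategy == "random":
        # Baseline: ignore the model and draw documents uniformly at random.
        rng = random.Random(seed)
        return sorted(rng.sample(indices, k))

    if strategy == "least_confidence":
        # Assumed LC definition for binary labels: the model is least
        # confident where the probability is closest to 0.5.
        ranked = sorted(indices, key=lambda i: abs(probs[i] - 0.5))
    elif strategy == "highest_priority":
        # Assumed HP definition: screen the documents the model rates
        # as most likely to be relevant first.
        ranked = sorted(indices, key=lambda i: probs[i], reverse=True)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical relevance probabilities for six unscreened documents.
    probs = [0.95, 0.10, 0.52, 0.70, 0.45, 0.88]
    print(select_batch(probs, 2, "least_confidence"))  # [2, 4]
    print(select_batch(probs, 2, "highest_priority"))  # [0, 5]
    print(select_batch(probs, 2, "random", seed=1))
```

In an active learning loop, the selected documents would be labeled by human screeners, added to the training set, and used to update the model before the next batch is drawn. Under these assumptions, LC sampling targets documents that are most informative for the model, whereas HP sampling targets documents most likely to be relevant.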
