Browsing by Author "Uhlmann EL"
Now showing 1 - 3 of 3
- Item: Can large language models help predict results from a complex behavioural science study? (The Royal Society, 2024-09)
  Lippert S; Dreber A; Johannesson M; Tierney W; Cyrus-Lai W; Uhlmann EL; Emotion Expression Collaboration; Pfeiffer T
  We tested whether large language models (LLMs) can help predict results from a complex behavioural science experiment. In study 1, we investigated the performance of the widely used LLMs GPT-3.5 and GPT-4 in forecasting the empirical findings of a large-scale experimental study of emotions, gender, and social perceptions. We found that GPT-4, but not GPT-3.5, matched the performance of a cohort of 119 human experts, with correlations of 0.89 (GPT-4), 0.07 (GPT-3.5) and 0.87 (human experts) between aggregated forecasts and realized effect sizes. In study 2, providing participants from a university subject pool with the opportunity to query a GPT-4-powered chatbot significantly increased the accuracy of their forecasts. The results indicate promise for artificial intelligence (AI) to help anticipate, at scale and at minimal cost, which claims about human behaviour will find empirical support and which ones will not. Our discussion focuses on avenues for human-AI collaboration in science.
- Item: Examining the generalizability of research findings from archival data (PNAS, 2022-07-26)
  Delios A; Clemente EG; Wu T; Tan H; Wang Y; Gordon M; Viganola D; Chen Z; Dreber A; Johannesson M; Pfeiffer T; Generalizability Tests Forecasting Collaboration; Uhlmann EL
  This initiative systematically examined the extent to which a large set of archival research findings generalizes across contexts. We repeated the key analyses for 29 original strategic management effects in the same context (direct reproduction) as well as in 52 novel time periods and geographies; 45% of the direct reproductions returned results matching the original reports, as did 55% of tests in different spans of years and 40% of tests in novel geographies. Some original findings were associated with multiple new tests. Reproducibility was the best predictor of generalizability: for the findings that proved directly reproducible, 84% emerged in other available time periods and 57% emerged in other geographies. Overall, only limited empirical evidence emerged for context sensitivity. In a forecasting survey, independent scientists were able to anticipate which effects would find support in tests in new samples.
- Item: On the trajectory of discrimination: A meta-analysis and forecasting survey capturing 44 years of field experiments on gender and hiring decisions (Elsevier Inc, 2023-11)
  Schaerer M; Plessis CD; Nguyen MHB; van Aert RCM; Tiokhin L; Lakens D; Clemente EG; Pfeiffer T; Dreber A; Johannesson M; Clark CJ; Uhlmann EL; Abraham AT; Adamus M; Akinci C; Alberti F; Alsharawy AM; Alzahawi S; Anseel F; Arndt F; Balkan B; Baskin E; Bearden CE; Benotsch EG; Bernritter S; Black SR; Bleidorn W; Boysen AP; Brienza JP; Brown M; Brown SEV; Brown JW; Buckley J; Buttliere B; Byrd N; Cígler H; Capitan T; Cherubini P; Chong SY; Ciftci EE; Conrad CD; Conway P; Costa E; Cox JA; Cox DJ; Cruz F; Dawson IGJ; Demiral EE; Derrick JL; Doshi S; Dunleavy DJ; Durham JD; Elbaek CT; Ellis DA; Ert E; Espinoza MP; Füllbrunn SC; Fath S; Furrer R; Fiala L; Fillon AA; Forsgren M; Fytraki AT; Galarza FB; Gandhi L; Garrison SM; Geraldes D; Ghasemi O; Gjoneska B; Gothilander J; Grühn D; Grieder M; Hafenbrädl S; Halkias G; Hancock R; Hantula DA; Harton HC; Hoffmann CP; Holzmeister F; Hoŕak F; Hosch A-K; Imada H; Ioannidis K; Jaeger B; Janas M; Janik B; Pratap KC R; Keel PK; Keeley JW; Keller L; Kenrick DT; Kiely KM; Knutsson M; Kovacheva A; Kovera MB; Krivoshchekov V; Krumrei-Mancuso EJ; Kulibert D; Lacko D; Lemay EP
  A preregistered meta-analysis, including 244 effect sizes from 85 field audits and 361,645 individual job applications, tested for gender bias in hiring practices in female-stereotypical, gender-balanced, and male-stereotypical jobs from 1976 to 2020. A “red team” of independent experts was recruited to increase the rigor and robustness of our meta-analytic approach. A forecasting survey further examined whether laypeople (n = 499 nationally representative adults) and scientists (n = 312) could predict the results. Forecasters correctly anticipated reductions in discrimination against female candidates over time. However, both scientists and laypeople overestimated the continuation of bias against female candidates. Instead, selection bias in favor of male over female candidates was eliminated and, if anything, slightly reversed in sign starting in 2009 for mixed-gender and male-stereotypical jobs in our sample. Forecasters further failed to anticipate that discrimination against male candidates for stereotypically female jobs would remain stable across the decades.