nep-sog New Economics Papers
on Sociology of Economics
Issue of 2025–02–24
eight papers chosen by
Jonas Holmström, Axventure AB


  1. Can AI Solve the Peer Review Crisis? A Large-Scale Experiment on LLM's Performance and Biases in Evaluating Economics Papers By Pataranutaporn, Pat; Powdthavee, Nattavudh; Maes, Pattie
  2. Statistical reporting errors in economics By Bruns, Stephan; Herwartz, Helmut; Ioannidis, John P.A.; Islam, Chris-Gabriel; Raters, Fabian H. C.
  3. The Need for Equivalence Testing in Economics By Fitzgerald, Jack
  4. The Replication Database: Documenting the Replicability of Psychological Science By Röseler, Lukas; Kaiser, Leonard; Doetsch, Christopher Albert; Klett, Noah; Seida, Christian; Schütz, Astrid; Aczel, Balazs; Adelina, Nadia; Agostini, Valeria; Alarie, Samuel
  5. The role of results in deciding to publish By Muradchanian, Jasmine; Hoekstra, Rink; Kiers, Henk; van Ravenzwaaij, Don
  6. A scoping review on metrics to quantify reproducibility: a multitude of questions leads to a multitude of metrics By Heyard, Rachel; Pawel, Samuel; Frese, Joris; Voelkl, Bernhard; Würbel, Hanno; McCann, Sarah; Held, Leonhard; Wever, Kimberley E. PhD; Hartmann, Helena; Townsin, Louise
  7. Likelihood Ratio Test for Publication Bias – a Proof of Concept By Lenartowicz, Paweł
  8. Predicting Replication Rates with Z-Curve: A Brief Exploratory Validation Study Using the Replication Database By Röseler, Lukas

  1. By: Pataranutaporn, Pat (Massachusetts Institute of Technology); Powdthavee, Nattavudh (Nanyang Technological University, Singapore); Maes, Pattie (Massachusetts Institute of Technology)
    Abstract: We investigate whether artificial intelligence can address the peer review crisis in economics by analyzing 27,090 evaluations of 9,030 unique submissions using a large language model (LLM). The experiment systematically varies author characteristics (e.g., affiliation, reputation, gender) and publication quality (e.g., top-tier, mid-tier, low-tier, AI-generated papers). The results indicate that LLMs effectively distinguish paper quality but exhibit biases favoring prominent institutions, male authors, and renowned economists. Additionally, LLMs struggle to differentiate high-quality AI-generated papers from genuine top-tier submissions. While LLMs offer efficiency gains, their susceptibility to bias necessitates cautious integration and hybrid peer review models to balance equity and accuracy. (An illustrative sketch of such a factorial design follows this entry.)
    Keywords: Artificial Intelligence, peer review, large language model (LLM), bias in academia, economics publishing, equity-efficiency trade-off
    JEL: A11 C63 O33 I23
    Date: 2025–01
    URL: https://d.repec.org/n?u=RePEc:iza:izadps:dp17659
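    Illustration: a minimal Python sketch of a factorial prompt design that crosses author characteristics with paper tier, in the spirit of the experiment described above. The attribute levels, prompt wording, and cell counts are hypothetical placeholders, not the authors' materials.
      # Hypothetical sketch only: cross author attributes with paper tiers to
      # build evaluation conditions, as one might do to probe LLM reviewer bias.
      from itertools import product

      affiliations = ["top-ranked university", "mid-ranked university", "unknown institution"]
      genders = ["male", "female"]
      reputations = ["renowned economist", "early-career researcher"]
      tiers = ["top-tier", "mid-tier", "low-tier", "AI-generated"]

      def build_prompt(abstract, affiliation, gender, reputation):
          """Compose a referee prompt that discloses the (varied) author profile."""
          return (
              f"You are a referee for an economics journal. The author is a {gender} "
              f"{reputation} affiliated with a {affiliation}.\n"
              f"Abstract:\n{abstract}\n"
              "Rate the paper's publishability from 1 (reject) to 10 (accept)."
          )

      conditions = list(product(affiliations, genders, reputations, tiers))
      print(f"{len(conditions)} experimental cells")  # 3 x 2 x 2 x 4 = 48 cells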
  2. By: Bruns, Stephan; Herwartz, Helmut; Ioannidis, John P.A.; Islam, Chris-Gabriel; Raters, Fabian H. C.
    Abstract: We developed a tool that scrapes and interprets statistical values (DORIS) to analyze reporting errors, which occur when the eye-catcher indicating the level of statistical significance is inconsistent with the reported statistical values. Using 578,132 tests from the top 50 economics journals, we find that 14.88% of the articles have at least one strong error in the main tests. Our pre-registered analysis suggests that mandatory data and code availability policies reduce the prevalence of strong errors, while there is suggestive evidence of a reversed effect for top 5 journals. Integrating DORIS into the review process can help improve article quality. (An illustrative sketch of this consistency check follows this entry.)
    Date: 2023–09–06
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:mbx62_v1
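    Illustration: a minimal Python sketch (not DORIS itself) of the consistency check the abstract describes, assuming a two-sided z-test and the conventional star scheme of * p<0.10, ** p<0.05, *** p<0.01; the example coefficient and standard error are hypothetical.
      # Hypothetical sketch: does the reported eye-catcher (stars) match the
      # p-value implied by the reported coefficient and standard error?
      from scipy import stats

      def implied_p(coef, se):
          """Two-sided p-value implied by a coefficient and its standard error (z-test)."""
          return 2 * stats.norm.sf(abs(coef / se))

      def stars_consistent(coef, se, stars):
          """Check reported stars against the bands * p<0.10, ** p<0.05, *** p<0.01."""
          p = implied_p(coef, se)
          bands = {"***": (0.0, 0.01), "**": (0.01, 0.05), "*": (0.05, 0.10), "": (0.10, 1.01)}
          lo, hi = bands[stars]
          return lo <= p < hi

      # A coefficient of 0.21 with SE 0.10 implies p of about 0.036, so "**" is
      # consistent, while "***" would flag a (strong) reporting error.
      print(round(implied_p(0.21, 0.10), 3), stars_consistent(0.21, 0.10, "***"))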
  3. By: Fitzgerald, Jack (Vrije Universiteit Amsterdam)
    Abstract: Equivalence testing can provide statistically significant evidence that economic relationships are practically negligible. I demonstrate its necessity in a large-scale reanalysis of estimates defending 135 null claims made in 81 recent articles from top economics journals. 36-63% of estimates defending the average null claim fail lenient equivalence tests. In a prediction platform survey, researchers accurately predict that equivalence testing failure rates will significantly exceed levels which they deem acceptable. Obtaining equivalence testing failure rates that these researchers deem acceptable requires arguing that nearly 75% of published estimates in economics are practically equal to zero. These results imply that Type II error rates are unacceptably high throughout economics, and that many null findings in economics reflect low power rather than truly negligible relationships. I provide economists with guidelines and commands in Stata and R for conducting credible equivalence testing and practical significance testing in future research. (An illustrative sketch of the equivalence-testing logic follows this entry.)
    Date: 2025–02–05
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:d7sqr_v1
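    Illustration: a minimal Python sketch of the two one-sided tests (TOST) logic that underlies equivalence testing; the equivalence margin and the example estimate are hypothetical, and the paper itself supplies the actual Stata and R commands.
      # Hypothetical sketch of TOST: an effect is declared practically negligible
      # only if it is significantly above -delta AND significantly below +delta.
      from scipy import stats

      def tost_p(estimate, se, delta, df=1e6):
          """Return the TOST p-value: the larger of the two one-sided p-values."""
          t_lower = (estimate + delta) / se   # H0: effect <= -delta
          t_upper = (estimate - delta) / se   # H0: effect >= +delta
          p_lower = stats.t.sf(t_lower, df)
          p_upper = stats.t.cdf(t_upper, df)
          return max(p_lower, p_upper)

      # An estimate of 0.02 with SE 0.05 and margin 0.10 gives p of about 0.055,
      # so practical negligibility is not quite established at the 5% level.
      print(round(tost_p(0.02, 0.05, 0.10), 3))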
  4. By: Röseler, Lukas (University of Münster); Kaiser, Leonard; Doetsch, Christopher Albert; Klett, Noah; Seida, Christian; Schütz, Astrid (University of Bamberg); Aczel, Balazs (Eotvos Lorand University); Adelina, Nadia; Agostini, Valeria; Alarie, Samuel
    Abstract: In psychological science, replicability (repeating a study with a new sample and achieving consistent results; Parsons et al., 2022) is critical for affirming the validity of scientific findings. Despite its importance, replication efforts are few and far between in psychological science, with many attempts failing to corroborate past findings. This scarcity, compounded by the difficulty in accessing replication data, jeopardizes the efficient allocation of research resources and impedes scientific advancement. Addressing this crucial gap, we present the Replication Database (https://metaanalyses.shinyapps.io/replicationdatabase/), a novel platform hosting 1,239 original findings paired with replication findings. The infrastructure of this database allows researchers to submit, access, and engage with replication findings. The database makes replications visible, easily findable via a graphical user interface, and tracks replication rates across various factors, such as publication year or journal. This will facilitate future efforts to evaluate the robustness of psychological research. (An illustrative sketch of such a grouped replication-rate summary follows this entry.)
    Date: 2024–04–10
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:me2ub_v1
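    Illustration: a minimal pandas sketch of summarising replication rates by a grouping factor such as publication year; the column names and toy records are hypothetical placeholders, not the database's actual schema.
      # Hypothetical sketch: replication rate by publication year for a toy table.
      import pandas as pd

      findings = pd.DataFrame({
          "journal": ["J1", "J1", "J2", "J2", "J2"],
          "publication_year": [2010, 2012, 2012, 2015, 2015],
          "replicated": [True, False, True, True, False],  # outcome of each replication
      })

      rates_by_year = (
          findings.groupby("publication_year")["replicated"]
          .mean()
          .rename("replication_rate")
      )
      print(rates_by_year)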
  5. By: Muradchanian, Jasmine; Hoekstra, Rink; Kiers, Henk; van Ravenzwaaij, Don (University of Groningen)
    Abstract: Background: Publishing study results in scientific journals has been the standard way of disseminating science. However, getting results published may depend on their statistical significance. The consequence of this is that the representation of scientific knowledge might be biased. This type of bias has been called publication bias. The main objective of the present study is to get more insight into publication bias by examining it at the author, reviewer, and editor level. Additionally, we make a direct comparison between publication bias induced by authors, by reviewers, and by editors. We approached our participants by e-mail, asking them to fill out an online survey. Results: Our findings suggest that statistically significant findings have a higher likelihood of being published than statistically non-significant findings, because (1) authors (n = 65) are more likely to write up and submit articles with significant results compared to articles with non-significant results (median effect size 1.10, BF10 = 1.09*10^7); (2) reviewers (n = 60) give more favourable reviews to articles with significant results compared to articles with non-significant results (median effect size 0.58, BF10 = 4.73*10^2); and (3) editors (n = 171) are more likely to accept for publication articles with significant results compared to articles with non-significant results (median effect size 0.94, BF10 = 7.63*10^7). Evidence on differences in the relative contributions to publication bias by authors, reviewers, and editors is ambiguous (editors vs reviewers: BF10 = 0.31, reviewers vs authors: BF10 = 3.11, and editors vs authors: BF10 = 0.42). Discussion: One of the main limitations was that rather than investigating publication bias directly, we studied the potential for publication bias. Another limitation was the low response rate to the survey.
    Date: 2023–03–13
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:dgshk_v1
  6. By: Heyard, Rachel; Pawel, Samuel (University of Zurich); Frese, Joris; Voelkl, Bernhard; Würbel, Hanno (University of Bern); McCann, Sarah; Held, Leonhard; Wever, Kimberley E. PhD (Radboud university medical center); Hartmann, Helena (University Hospital Essen); Townsin, Louise
    Abstract: Background: Reproducibility is recognized as essential to scientific progress and integrity. Replication studies and large-scale replication projects, aiming to quantify different aspects of reproducibility, have become more common. Since no standardized approach to measuring reproducibility exists, a diverse set of metrics has emerged and a comprehensive overview is needed. Methods: We conducted a scoping review to identify large-scale replication projects that used metrics and methodological papers that proposed or discussed metrics. The project list was compiled by the authors. For the methodological papers, we searched Scopus, MedLine, PsycINFO and EconLit. Records were screened in duplicate against predefined inclusion criteria. Demographic information on included records and information on the reproducibility metrics used, suggested, or discussed was extracted. Results: We identified 49 large-scale projects and 97 methodological papers, and extracted 50 metrics. The metrics were characterized based on type (formulas and/or statistical models, frameworks, graphical representations, studies and questionnaires, algorithms), input required, and appropriate application scenarios. Each metric addresses a distinct question. Conclusions: Our review provides a comprehensive resource in the form of a “live”, interactive table for future replication teams and meta-researchers, offering support in selecting the metrics best aligned with their research questions and project goals.
    Date: 2024–11–26
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:apdxk_v1
  7. By: Lenartowicz, Paweł
    Abstract: Publication bias poses a serious challenge to the integrity of scientific research and meta-analyses. Persistent methodological obstacles hinder the estimation of this bias, especially in heterogeneous datasets where studies vary widely in methodologies and effect sizes. To address this gap, I propose a Likelihood Ratio Test for Publication Bias, a statistical method designed to detect and quantify publication bias in datasets of heterogeneous study results. I also present a proof-of-concept implementation developed in Python and simulations that evaluate its performance. The results demonstrate that this new method clearly outperforms existing methods like Z-Curve 2 and the Caliper test in estimating the magnitude of publication bias, showing higher precision and reliability. While inherent challenges in publication bias detection remain, such as the influence of different research practices and the need for large sample sizes, the Likelihood Ratio Test offers a significant advancement in addressing these issues. (An illustrative sketch of a selection-model likelihood-ratio test follows this entry.)
    Date: 2024–11–12
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:jt5zf_v1
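    Illustration: a minimal Python sketch of the general likelihood-ratio idea using a simple one-step Hedges-type selection model (not the author's implementation): nonsignificant z-statistics are assumed to be published with probability w, and a model with w estimated freely is compared against one with w fixed at 1 (no selection). The simulated data are hypothetical.
      # Hypothetical sketch of a likelihood-ratio test for selection on significance.
      import numpy as np
      from scipy import stats, optimize

      CRIT = 1.96  # two-sided 5% threshold for a z-statistic

      def neg_loglik(params, z):
          """Negative log-likelihood of a one-step selection model for published z-values."""
          mu, w = params
          dens = stats.norm.pdf(z, loc=mu)
          weights = np.where(np.abs(z) > CRIT, 1.0, w)
          # Normalising constant: P(significant) + w * P(nonsignificant) under N(mu, 1)
          p_sig = stats.norm.sf(CRIT - mu) + stats.norm.cdf(-CRIT - mu)
          return -np.sum(np.log(weights * dens / (p_sig + w * (1.0 - p_sig))))

      def lr_test(z):
          """LR statistic and (conservative, since w = 1 is on the boundary) chi2(1) p-value."""
          full = optimize.minimize(neg_loglik, x0=[0.5, 0.5], args=(z,),
                                   bounds=[(-5.0, 5.0), (1e-6, 1.0)])
          restricted = optimize.minimize(lambda m, z: neg_loglik([m[0], 1.0], z),
                                         x0=[0.5], args=(z,), bounds=[(-5.0, 5.0)])
          lr = 2.0 * (restricted.fun - full.fun)
          return lr, stats.chi2.sf(lr, df=1)

      # Toy usage: only 30% of nonsignificant results survive to publication.
      rng = np.random.default_rng(1)
      z = rng.normal(0.8, 1.0, size=2000)
      published = z[(np.abs(z) > CRIT) | (rng.random(2000) < 0.3)]
      print(lr_test(published))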
  8. By: Röseler, Lukas (University of Münster)
    Abstract: Concerns about replicability are widespread in the social sciences. As it is not feasible to replicate every published study, researchers have been developing methods to estimate replicability. I used the Replication Database (Röseler et al., 2023) to compare actual replication rates with replicability estimates provided by z-curve. After drawing stratified samples with actual replication rates that were uniformly distributed, z-curve’s replicability estimates had lower variance but correlated strongly with actual replication rates, r = .933. Using a linear model, predicted replication rates deviated from actual replication rates by <±16% when samples from 322 studies (2.5 and 97.5% quantiles) were drawn. I propose that z-curve is a valid and economical method for comparing replicability estimates across large sets of studies. Future studies of moderators in the context of z-curve or replicability should be conducted using replication databases. The study’s code and data are available online (https://osf.io/k4d6w/). (An illustrative sketch of the validation logic follows this entry.)
    Date: 2023–10–12
    URL: https://d.repec.org/n?u=RePEc:osf:metaar:ewb2t_v1
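    Illustration: a minimal Python sketch of the validation logic only, correlating hypothetical z-curve replicability estimates with observed replication rates and fitting a simple linear calibration; the numbers are placeholders, not data from the Replication Database.
      # Hypothetical sketch: how well do (made-up) z-curve estimates track
      # (made-up) observed replication rates?
      import numpy as np
      from scipy import stats

      zcurve_estimates = np.array([0.35, 0.45, 0.55, 0.60, 0.70, 0.80])
      observed_rates = np.array([0.20, 0.38, 0.50, 0.58, 0.75, 0.85])

      r, p = stats.pearsonr(zcurve_estimates, observed_rates)
      fit = stats.linregress(zcurve_estimates, observed_rates)
      predicted = fit.intercept + fit.slope * zcurve_estimates

      print(f"r = {r:.3f}, slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}")
      print("max absolute prediction error:", np.abs(predicted - observed_rates).max().round(3))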

This nep-sog issue is ©2025 by Jonas Holmström. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.