nep-big New Economics Papers
on Big Data
Issue of 2025–03–03
twenty papers chosen by
Tom Coupé, University of Canterbury


  1. An End-To-End LLM Enhanced Trading System By Ziyao Zhou; Ronitt Mehra
  2. Examining the Impact of Income Inequality and Gender on School Completion in Malaysia: A Machine Learning Approach Utilizing Malaysia's Public Sector Open Data By Muhammad Sukri Bin Ramli
  3. A Framework for Multimodal Document Intelligence and Fraud Prevention: Leveraging AI and Machine Learning-Enabled Device for Enhanced Decision-Making (Powered by DeepSeek-R1 and AI Agents) By Kamlakshya, Tikhnadhi; Hota, Ashish
  4. Making Time Count : A Machine Learning Approach to Predict Time Use in Low-Income Countries from Physical Activity Tracking Data By Mulder, Joris; Hocuk, Seyit; Kilic, Talip; Zezza, Alberto; Kumar, Pradeep
  5. Estimating Network Models using Neural Networks By Angelo Mele
  6. Promises and pitfalls of using LLMs to identify actor stances in political discourse By Walker, Viviane; Angst, Mario
  7. Who Rallies Round the Flag? The Impact of the US Sanctions on Iranians’ Attitude Toward the Government By RezaeeDaryakenari, Babak
  8. Using Large Language Models for Financial Advice By Christian Fieberg; Lars Hornuf; Maximilian Meiler; David J. Streich
  9. Can AI Solve the Peer Review Crisis? A Large Scale Experiment on LLM's Performance and Biases in Evaluating Economics Papers By Pat Pataranutaporn; Nattavudh Powdthavee; Pattie Maes
  10. Identification of an Expanded Inventory of Green Job Titles through AI-Driven Text Mining By Palinski, Michal; Asik, Günes; Gajderowicz, Tomasz; Jakubowski, Maciej; Efsan Nas Ozen; Dhushyanth Raju
  11. Yielding Insights : Machine Learning-Driven Imputations to Filling Agricultural Data Gaps By Ismael Yacoubou Djima; Marco Tiberti; Talip Kilic
  12. Data mining and NLP for Processing Social Offers of a National Aid Organization By Senst, Benjamin
  13. How the General Benefit Reform Emerged in Finland: A Critical Analysis Using Large Language Models in Policy Research By Moisio, Pasi; Mesiäislehto, Merita; Peltoniemi, Johanna; Pihlajamäki, Mika; Hiilamo, Heikki
  14. How Well Did Real-Time Indicators Track Household Welfare Changes in Developing Countries during the COVID-19 Crisis? By David Newhouse; Swindle, Rachel; Wang, Shun; Joshua David Merfeld; Utz Johann Pape; Kibrom Tafere; Michael Weber
  15. Evaluation of Door-to-Door Tax Enforcement Strategy in Indonesia By Antonacci, Paulo; Muhammad Khudadad Chattha
  16. AlphaSharpe: LLM-Driven Discovery of Robust Risk-Adjusted Metrics By Kamer Ali Yuksel; Hassan Sawaf
  17. COVID-19 Vaccine Diplomacy: A Computational Multimodal Analysis of the Neighborhood Effect in Bangladesh's Vaccine Roll-out Response By Juha, Sharmin Jahan; Mizan, Arefin
  18. Authorship Identity and Spatiality: Social Influences on Text Production By Alvero, AJ; antonio, anthony lising; Luqueño, Leslie; Pearman, Francis
  19. Twitter-based attention and the cross-section of cryptocurrency returns By Maître, Arnaud T.; Pugachyov, Nikolay; Weigert, Florian
  20. Afghanistan’s New Economic Landscape : Using Nighttime Lights to Understand the Civilian Economy after 2021 By Ivo Teruggi; Oscar Eduardo Barriga Cabanillas; Walker Kosmidou-Bradley; Silvia Redaelli; Eigo Tateishi

  1. By: Ziyao Zhou; Ronitt Mehra
    Abstract: This project introduces an end-to-end trading system that leverages Large Language Models (LLMs) for real-time market sentiment analysis. By synthesizing data from financial news and social media, the system integrates sentiment-driven insights with technical indicators to generate actionable trading signals. FinGPT serves as the primary model for sentiment analysis, ensuring domain-specific accuracy, while Kubernetes is used for scalable and efficient deployment.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.01574
  2. By: Muhammad Sukri Bin Ramli
    Abstract: This study examines the relationship between income inequality, gender, and school completion rates in Malaysia using machine learning techniques. The dataset utilized is from Malaysia's Public Sector Open Data Portal, covering the period 2016-2022. The analysis employs various machine learning techniques, including K-means clustering, ARIMA modeling, Random Forest regression, and Prophet for time series forecasting. These models are used to identify patterns, trends, and anomalies in the data, and to predict future school completion rates. Key findings reveal significant disparities in school completion rates across states, genders, and income levels. The analysis also identifies clusters of states with similar completion rates, suggesting potential regional factors influencing educational outcomes. Furthermore, time series forecasting models accurately predict future completion rates, highlighting the importance of ongoing monitoring and intervention strategies. The study concludes with recommendations for policymakers and educators to address the observed disparities and improve school completion rates in Malaysia. These recommendations include targeted interventions for specific states and demographic groups, investment in early childhood education, and addressing the impact of income inequality on educational opportunities. The findings of this study contribute to the understanding of the factors influencing school completion in Malaysia and provide valuable insights for policymakers and educators to develop effective strategies to improve educational outcomes.
    Date: 2025–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2501.18868
  3. By: Kamlakshya, Tikhnadhi (Citizens Bank); Hota, Ashish
    Abstract: This paper introduces a novel framework for multimodal document intelligence, designed to enhance fraud prevention across various sectors. The core innovation lies in the integration of advanced AI and ML techniques, including OCR, deep learning, and NLP, within a purpose-built computer device for multimodal data fusion, as detailed in the author's recently granted patent by www.gov.uk/ [Intellectual Property# 6419907]. This device facilitates the seamless integration of textual, visual, and metadata elements extracted from documents, enabling a holistic understanding of the document's veracity and intent. The escalating sophistication of fraudulent activities across industries necessitates advanced, adaptive security measures. This paper presents a novel framework for multimodal document intelligence, designed to enhance fraud prevention in sectors such as banking and finance, life science and healthcare, government, and the public sector. Grounded in a recently patented AI and ML-enabled computer device for multimodal data fusion, the framework leverages Optical Character Recognition (OCR), deep learning-based image analysis, and natural language processing (NLP). Furthermore, it integrates the capabilities of DeepSeek-R1, a high-performance Mixture-of-Experts (MoE) large language model (LLM), and autonomous AI Agents for advanced reasoning, contextual understanding, and decision-making. This integrated approach facilitates proactive fraud detection, improved risk assessment, and strengthened compliance adherence, while also achieving unprecedented cost-effectiveness in deployment and operation. The efficacy of the framework is demonstrated through illustrative use cases, highlighting its potential to mitigate financial losses and uphold data integrity. 
    Keywords: Salesforce, Salesforce Financial Cloud, RAG, Data Completeness, Finance, Sales, Campaign, Digital Engagement, Customer Data Platform (CDP), Data Cloud, DeepSeek-R1, Optical Character Recognition (OCR), deep learning-based image analysis, and natural language processing (NLP)
    Date: 2025–02–11
    URL: https://d.repec.org/n?u=RePEc:osf:osfxxx:g5hw7_v1
  4. By: Mulder, Joris; Hocuk, Seyit; Kilic, Talip; Zezza, Alberto; Kumar, Pradeep
    Abstract: Understanding men’s and women’s time use is a key factor in addressing issues and formulating policies related to division of labor, domestic work, and related gender disparities. However, obtaining data on individuals’ time use can be difficult and costly in the context of household surveys. Leveraging unique survey data collected in rural Malawi, this study investigates the possibility of predicting men’s and women’s time allocation to an extensive set of activities, using sensor signal data captured by accelerometers. Using machine learning techniques, the study builds a supervised classification model that is trained on the accelerometer data and a random subset of the time use survey data to predict individuals’ time allocation to 12 broad activity groups. The model can correctly classify each performed activity in 76 percent of the cases. The analysis shows that with 40 percent of the training data, this method can achieve 90 percent of the maximum level of predictive accuracy reached in the analysis. The findings prove the feasibility of this methodology and offer insights for enhancing both survey and accelerometer data collection processes to build better models. Using the method can improve the quality of costly and difficult to obtain time use surveys with cheaper, yet accurate, modeled estimates, obtained by combining objective data from wearable devices with time use data collected on smaller samples.
    Date: 2024–06–28
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:19735
  5. By: Angelo Mele
    Abstract: Exponential random graph models (ERGMs) are very flexible for modeling network formation but pose difficult estimation challenges due to their intractable normalizing constant. Existing methods, such as MCMC-MLE, rely on sequential simulation at every optimization step. We propose a neural network approach that trains on a single, large set of parameter-simulation pairs to learn the mapping from parameters to average network statistics. Once trained, this map can be inverted, yielding a fast and parallelizable estimation method. The procedure also accommodates extra network statistics to mitigate model misspecification. Some simple illustrative examples show that the method performs well in practice.
    Date: 2025–02
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.01810
  6. By: Walker, Viviane; Angst, Mario
    Abstract: Empirical research in the social sciences is often interested in understanding actor stances: the positions that social actors take regarding normative statements in societal discourse. In automated text analysis applications, the classification task of stance detection remains challenging. Stance detection is especially difficult due to semantic challenges such as implicitness or missing context but also due to the general nature of the task. In this paper, we explore the potential of Large Language Models (LLMs) to enable stance detection in a generalized (non-domain, non-statement specific) form. Specifically, we test a variety of different general prompt chains for zero-shot stance classifications. Our evaluation data consists of textual data from a real-world empirical research project in the domain of sustainable urban transport. For 1710 German newspaper paragraphs, each containing an organizational entity, we annotated the stance of the entity toward one of five normative statements. A comparison of four publicly available LLMs shows that they can improve upon existing approaches and achieve adequate performance. However, results heavily depend on the prompt chain method and the LLM, and vary by statement. Our findings have implications for computational linguistics methodology and political discourse analysis, as they offer a deeper understanding of the strengths and weaknesses of LLMs in performing the complex semantic task of stance detection. We strongly emphasise the necessity of domain-specific evaluation data for evaluating LLMs and considering trade-offs between model complexity and performance.
    Date: 2025–02–03
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:5a3k8_v1
  7. By: RezaeeDaryakenari, Babak (Leiden University)
    Abstract: While politicians often argue that economic sanctions can induce policy changes in targeted states by undermining elite and public support for the reigning government, the efficacy of these measures, particularly against non-democratic regimes, is debatable. We propose that, counterintuitively, economic sanctions can bolster rather than diminish support for the sanctioned government, even in non-democratic contexts. However, this support shift and its magnitude can differ across various political factions and depend on the nature of the sanctions. To empirically evaluate our theoretical expectations, we use supervised machine learning to scrutinize nearly 2 million tweets from over 1,000 Iranian influencers, assessing their responses to both comprehensive and targeted sanctions during Donald Trump’s presidency. Our analysis shows that comprehensive sanctions generally improved sentiments toward the Iranian government, even among its moderate oppositions, rendering them more aligned with the state's stance. Conversely, while targeted sanctions elicited a milder rally-around-the-flag response, the identity of the targeted entity plays a crucial role in determining the scale of this reaction.
    Date: 2024–08–29
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:r7ae4_v1
  8. By: Christian Fieberg; Lars Hornuf; Maximilian Meiler; David J. Streich
    Abstract: We study whether large language models (LLMs) can generate suitable financial advice and which LLM features are associated with higher-quality advice. To this end, we elicit portfolio recommendations from 32 LLMs for 64 investor profiles, which differ in their risk preferences, home country, sustainability preferences, gender, and investment experience. Our results suggest that LLMs are generally capable of generating suitable financial advice that takes into account important investor characteristics when determining market and risk exposures. The historical performance of the recommended portfolios is on par with that of professionally managed benchmark portfolios. We also find that foundation models and larger models generate portfolios that are easier to implement and more sensitive to investor characteristics than fine-tuned models and smaller models. Some of our results are consistent with LLMs inheriting human biases such as home bias. We find no evidence of gender-based discrimination, which can be found in human financial advice.
    Keywords: generative AI, artificial intelligence, large language models, financial advice, portfolio management
    JEL: G00 G11 G40
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:ces:ceswps:_11666
  9. By: Pat Pataranutaporn; Nattavudh Powdthavee; Pattie Maes
    Abstract: We investigate whether artificial intelligence can address the peer review crisis in economics by analyzing 27,090 evaluations of 9,030 unique submissions using a large language model (LLM). The experiment systematically varies author characteristics (e.g., affiliation, reputation, gender) and publication quality (e.g., top-tier, mid-tier, low-tier, AI generated papers). The results indicate that LLMs effectively distinguish paper quality but exhibit biases favoring prominent institutions, male authors, and renowned economists. Additionally, LLMs struggle to differentiate high-quality AI-generated papers from genuine top-tier submissions. While LLMs offer efficiency gains, their susceptibility to bias necessitates cautious integration and hybrid peer review models to balance equity and accuracy.
    Date: 2025–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.00070
  10. By: Palinski, Michal; Asik, Günes; Gajderowicz, Tomasz; Jakubowski, Maciej; Efsan Nas Ozen; Dhushyanth Raju
    Abstract: This study expands the inventory of green job titles by incorporating a global perspective and using contemporary sources. It leverages natural language processing, specifically a retrieval-augmented generation model, to identify green job titles. The process began with a search of academic literature published after 2008 using the official APIs of Scopus and Web of Science. The search yielded 1,067 articles, from which 695 unique potential green job titles were identified. The retrieval-augmented generation model used the advanced text analysis capabilities of Generative Pre-trained Transformer 4, providing a reproducible method to categorize jobs within various green economy sectors. The research clustered these job titles into 25 distinct sectors. This categorization aligns closely with established frameworks, such as the U.S. Department of Labor’s Occupational Information Network, and suggests potential new categories like green human resources. The findings demonstrate the efficacy of advanced natural language processing models in identifying emerging green job roles, contributing significantly to the ongoing discourse on the green economy transition.
    Date: 2024–09–16
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:10908
  11. By: Ismael Yacoubou Djima; Marco Tiberti; Talip Kilic
    Abstract: This paper addresses the challenge of missing crop yield data in large-scale agricultural surveys, where crop-cutting, the most accurate method for yield measurement, is often limited due to cost constraints. Multiple imputation techniques, supported by machine learning models, are used to predict missing yield data. This method is validated using survey data from Mali, which includes both crop-cut and self-reported yield information. The analysis covers several crops, providing insights into the importance of different predictors, including farmer-reported yields and geo-spatial variables, and the conditions under which the approach is valid. The findings show that machine learning-based imputations can provide accurate yield estimates, especially for crops with low intercropping rates and higher commercialization. However, survey-to-survey imputations are less accurate than within-survey imputations, suggesting limitations in extrapolating data across different survey rounds. The study contributes valuable insights into improving cost-efficiency in agricultural surveys and the potential of imputation methods.
    Date: 2024–11–04
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:10964
  12. By: Senst, Benjamin
    Abstract: For large organisations with numerous organisational units, it can be challenging to keep track of individual events. In a joint project by Data Science for Social Good Berlin e.V. and the Data Science Hub of the German Red Cross, social services were processed over several phases between summer 2022 and summer 2024 using new technologies such as web scraping, data engineering, and natural language processing, and their implementation in various user applications was tested. More than 600,000 web documents were collected and more than 30,000 offers were identified. The results of this automated method were compared with the existing data set. Web scraping and subsequent processing are suitable for at least supplementing the previous approach. Web scraping, NLP, and data engineering offer large organisations the opportunity to effectively gain an overview of local events.
    Date: 2024–09–06
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:3pd4s_v1
  13. By: Moisio, Pasi; Mesiäislehto, Merita; Peltoniemi, Johanna; Pihlajamäki, Mika; Hiilamo, Heikki
    Abstract: Utilizing Large Language Models (LLMs), this study investigates the evolution of an innovative social security policy idea, the General Benefit concept, into a policy reform proposal in Finland from 2007 to 2023. Drawing from ideational analysis, we hypothesize that political parties struggled over social security conditionality during the 2010s and that social security simplification was manipulated differently in relation to conditionality. Our primary data are election manifestos and governmental programs from 2007-2023. We employed LLMs, mainly a customized ChatGPT, for the text analysis of policy documents. Additionally, we conduct a critical human evaluation of the LLMs' analysis and publish our model in the GPT store for the open replication of analyses. Findings indicate that the weakening of the tripartite industrial relations system and the breaking of the “status quo of three big parties” allowed new parties to influence social policy in the 2010s. The General Benefit emerged as a response to calls for social security simplification and for countering (unconditional) basic income proposals. Adopted in 2023, the General Benefit concept aims to merge Finnish universal / residence-based social insurance benefits for the working-aged while preserving core principles like social risk categories and conditionality. Despite increased nativism from the rising True Finns party, and the adoption of universal / unconditional basic income by several parties, Finnish social policy trends from 2007 to 2023 continued to emphasize employment and public finance sustainability. Our study also contributes to methodological discussions on using LLMs in policy analysis. The “human evaluation”, performed by the authors, confirms that the LLM analysis accurately summarises the main features of the policy evolution. However, we also found that the LLM lacks the ability to recognise the nuances of “multidimensional” political language and is not very helpful in cross-sectional evaluation, which leaves the analysis partly shallow. Thus, we conclude that in qualitative policy analysis, LLMs in their current form are suitable for complementing rather than substituting human evaluation.
    Date: 2024–10–10
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:ab8mr_v1
  14. By: David Newhouse; Swindle, Rachel; Wang, Shun; Joshua David Merfeld; Utz Johann Pape; Kibrom Tafere; Michael Weber
    Abstract: This paper investigates the extent to which real-time indicators derived from internet search, cell phones, and satellites predict changes in household socioeconomic indicators across approximately 300 administrative level-1 regions in 20 countries during the COVID-19 crisis. Measures of changes in socioeconomic status in each region are taken from high-frequency phone surveys. When using the first wave of data, fielded between April and August 2020, models selected using the least absolute shrinkage and selection operator explain 37 percent of the cross-regional variation in the share of households reporting declines in total income and 34 percent of the share of respondents reporting work stoppages since the onset of the crisis. Real-time indicators explain a lower amount of the within-region variation in income losses and current employment over time, with an R2 of 15 percent for current employment and 22 to 26 percent for the prevalence of income declines. When limiting the sample to urban regions, real-time indicators are far more effective at explaining within-region variation in income losses and current employment, with R2 values of approximately 0.54 and 0.38, respectively. Income gains, self-reported food insecurity, social distancing behavior, and child school engagement are more difficult to predict, with R2 values ranging from 0.06 to 0.17. Google search terms related to food, money, jobs, and religion were the most powerful predictors of work stoppage and income declines in the first survey wave, while those related to food, exercise, and religion better tracked changes in income declines and employment over time. Google mobility measures are also strong predictors of changes in employment and the prevalence of specific types of income declines. In general, satellite data on vegetation, pollution, and nighttime lights are far less predictive. Google mobility and search data, and to a lesser extent vegetation and pollution data, can provide a meaningful signal of regional economic distress and recovery, particularly during the early phases of a major crisis such as COVID-19.
    Date: 2024–09–18
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:10916
  15. By: Antonacci, Paulo; Muhammad Khudadad Chattha
    Abstract: This paper presents an evaluation of a tax enforcement program conducted in Indonesia where officials from the tax authority visited properties to engage directly with owners about their property tax obligations. Through these visits, auditors explained outstanding debts and payment processes, aiming to improve tax compliance and revenue collection. The paper uses an administrative data set and a new set of machine learning–based techniques to assess the program’s effectiveness. The program was responsible for increasing tax compliance on the extensive margin by 4.3 percent and on the intensive margin by 5.1 percent in the first year it was implemented. These effects are particularly strong as they persist in the following period. The findings show that the visited properties had better compliance history, lower value, smaller area, and were more likely to have some construction on them. A key finding from the analysis is that higher-value properties are less sensitive to the visits. In other words, if a data-driven tax-enforcement strategy is to be applied, then it may focus resources on enforcing taxation at the poorest part of the population in this case. This opens up the discussion of the distributional consequences of an algorithm-based enforcement strategy, which is increasingly important as machine learning techniques are used by tax authorities.
    Date: 2024–09–06
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:10901
  16. By: Kamer Ali Yuksel; Hassan Sawaf
    Abstract: Financial metrics like the Sharpe ratio are pivotal in evaluating investment performance by balancing risk and return. However, traditional metrics often struggle with robustness and generalization, particularly in dynamic and volatile market conditions. This paper introduces AlphaSharpe, a novel framework leveraging large language models (LLMs) to iteratively evolve and optimize financial metrics. AlphaSharpe generates enhanced risk-return metrics that outperform traditional approaches in robustness and correlation with future performance metrics by employing iterative crossover, mutation, and evaluation. Key contributions of this work include: (1) an innovative use of LLMs for generating and refining financial metrics inspired by domain-specific knowledge, (2) a scoring mechanism to ensure the evolved metrics generalize effectively to unseen data, and (3) an empirical demonstration of 3x predictive power for future risk-return forecasting. Experimental results on a real-world dataset highlight the superiority of AlphaSharpe metrics, making them highly relevant for portfolio managers and financial decision-makers. This framework not only addresses the limitations of existing metrics but also showcases the potential of LLMs in advancing financial analytics, paving the way for informed and robust investment strategies.
    Date: 2025–01
    URL: https://d.repec.org/n?u=RePEc:arx:papers:2502.00029
  17. By: Juha, Sharmin Jahan; Mizan, Arefin
    Abstract: Bangladesh started its COVID-19 mass vaccination program from February 2020. According to the COVAX live (Live COVID-19 Vaccination Tracker), as of May 2022, 70.26% of the total population have been vaccinated. This is indeed an example of the Bangladesh government's astounding competence that Bangladesh is considered among the first few countries to start vaccinations. However, Bangladesh did experience occasional hiccups in steady vaccine roll-out due to disruptions in the supply chain. Experts condemned Bangladesh's diplomatic choice of relying on only India as a steady vaccine manufacturing source after India decided to temporarily halt vaccinations right before the administration of second doses. Bearing reference to the 'Neighborhood Effect' in International Politics, which implies that excessive dependency on geographical neighbors can cause similar levels of instability in both countries (neighbors), this paper examines Bangladesh's overall Diplomatic approach in its COVID-19 Vaccination program in comparison to its East Asian counterpart Mongolia. Mongolia secured a high-ranking position in COVID-19 mass vaccination using its strategic partnerships to pool vaccines from multiple sources as a result of its 2011 multi-pillars Foreign Policy (Third Neighborhood Policy) approach. Using a novel computational multimodal discourse analysis with machine learning-assisted techniques on two large hand-collected datasets, the paper delves into the practices implemented by Bangladesh's multi-level stakeholders from the early stages of the pandemic until January this year to find any signs or impacts of the Neighborhood Effect in its Vaccine Diplomacy. The paper later on makes policy-level suggestions on how to resolve this in case of future health crises, with occasional mention of and comparison to Mongolia's Third Neighborhood approach and its implacability in Bangladeshi context.
    Date: 2025–01–10
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:eg58k_v1
  18. By: Alvero, AJ; antonio, anthony lising; Luqueño, Leslie; Pearman, Francis
    Abstract: Computational text analysis has grown in popularity among social scientists due to the massive influx of digitized data. However, connecting text to authorship could be a boon for digital demography and expand the scope of computational text analysis from trends of what is being written toward social patterning of the people producing it. We explore this potential through examinations of a large corpus of college admissions essays (n = 254,820 essays submitted by 83,538 applicants) and show how personal identity markers and ZIP code-level social context data influence large scale processes of textual production. After generating numerical representations of the essays using computational methods, we model the relationships between different identity and spatial characteristics of applicants and their local communities. We find strong relationships between identity and spatial features with the essays. We also find that individuals whose personal identities are spatially unique--that is, demographically different from others in their immediate context--were most likely to be misclassified, indicating that writing is influenced both socially and spatially. This work clarifies how authorship characteristics shape large scale textual production processes, like college admissions, and complements other large scale analyses of text by focusing on authorship rather than purely textual patterns.
    Date: 2025–01–31
    URL: https://d.repec.org/n?u=RePEc:osf:socarx:pt6b2_v2
  19. By: Maître, Arnaud T.; Pugachyov, Nikolay; Weigert, Florian
    Abstract: This paper investigates how investors' abnormal attention affects the cross-section of cryptocurrency returns in the period from 2018 to 2022. We capture abnormal attention using the (log) number of Twitter posts on individual cryptocurrencies on the current day minus a 30-day average. Our results reveal that abnormal attention is positively associated with contemporaneous and one-day ahead crypto performance. Among the different Twitter tweets, return predictability arises due to Ticker-tweets from investors, but not due to tweets from the cryptocurrency channel. These Official-tweets, however, are able to forecast technological innovations on the blockchain.
    Keywords: Bitcoin, cryptocurrencies, Twitter attention, textual sentiment
    Date: 2025
    URL: https://d.repec.org/n?u=RePEc:zbw:cfrwps:311833
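
    Illustration for item 19: the abstract defines abnormal attention as the (log) number of Twitter posts on the current day minus a 30-day average. A minimal sketch of one possible reading follows. It assumes the average is taken over the logs of the 30 preceding days and that log(1 + count) is used to handle zero-tweet days; neither detail is stated in the abstract, and the function name `abnormal_attention` is hypothetical.

```python
import math


def abnormal_attention(daily_counts, window=30):
    """Log tweet count on day t minus the mean log count of the prior `window` days.

    Assumptions (not taken from the paper): log(1 + count) is used so that
    zero-tweet days are defined, and the baseline excludes the current day.
    Days without a full baseline window yield None.
    """
    logs = [math.log(1 + c) for c in daily_counts]
    result = []
    for t, log_count in enumerate(logs):
        if t < window:
            result.append(None)  # not enough history for a baseline
        else:
            baseline = sum(logs[t - window:t]) / window
            result.append(log_count - baseline)
    return result


# Hypothetical series: 31 quiet days followed by a spike in tweet volume.
counts = [10] * 31 + [100]
attention = abnormal_attention(counts)
```

    In this made-up series, the last value is positive because the final day's tweet volume exceeds its trailing baseline; a value near zero indicates attention in line with the recent average.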
  20. By: Ivo Teruggi; Oscar Eduardo Barriga Cabanillas; Walker Kosmidou-Bradley; Silvia Redaelli; Eigo Tateishi
    Abstract: This study uses nighttime lights to examine the evolution of economic activity in Afghanistan after the August 2021 regime change. A year later, nighttime luminosity had dropped by 20 percent, with two-thirds of this decline tied to the pre-planned international military withdrawal. To focus on local economic activity, the study filters out light emissions from foreign military installations, which accounted for up to 30 percent of lights over the past decade. Using civilian nighttime lights to understand the new economic reality in the country indicates a significant economic recovery concentrated in previously conflict-affected regions. By 2023/24, civilian luminosity had surpassed pre-2020/21 levels by 10.5 percent while, in contrast, official gross domestic product indicates an economy that is one-quarter smaller. The findings highlight changes in economic dynamics, including increased informality, shifts in the geographic distribution of activity, and improved security post-Taliban takeover.
    Date: 2024–11–06
    URL: https://d.repec.org/n?u=RePEc:wbk:wbrwps:10969

This nep-big issue is ©2025 by Tom Coupé. It is provided as is without any express or implied warranty. It may be freely redistributed in whole or in part for any purpose. If distributed in part, please include this notice.
General information on the NEP project can be found at https://nep.repec.org. For comments please write to the director of NEP, Marco Novarese at <director@nep.repec.org>. Put “NEP” in the subject, otherwise your mail may be rejected.
NEP’s infrastructure is sponsored by the School of Economics and Finance of Massey University in New Zealand.