LLM Coding Prompts and the Statistical Power of the Welch-Satterthwaite t-test: A Focused Simulation Study

George Johanson

Published on Aug 15, 2025

Abstract

A Large Language Model (LLM) needs a clear and comprehensive prompt to generate code that executes successfully, and this very requirement can help meet the need for replicable simulation studies. A carefully crafted and iteratively refined prompt, whether or not it is used to generate code, supports both understanding and replication of a study in a concise, readable, and readily applied form. The prompt developed for the present simulation of the Welch-Satterthwaite independent-samples t-test (WS) illustrated the costs and benefits of reporting such a coding request. The WS test is known to be robust against Type I error inflation when group variances are unequal and is often recommended by default over Student's t-test. WS is also known to be somewhat less statistically powerful than Student's t-test when group variances are equal. Two factors that affect the power of the WS test are the reliability of the dependent measure and design balance. This study examined the theoretical, applied, and didactic value of the relationship of these two factors, and of their possible interaction, to WS power. Results indicated that all three predictors (reliability, design balance, and their interaction) were statistically significant and together explained virtually all of the variation in power in a linear model, with design balance being the most important. When the linear model was restricted to more commonly observed values of reliability and design imbalance, the interaction disappeared and reliability was the most important predictor.
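The abstract does not reproduce the simulation code or the prompt, so the following is only a minimal Python sketch, under stated assumptions, of how WS power might be estimated by Monte Carlo as a function of reliability and design balance. It is not the author's code. Assumptions: true scores are normal with unit variance in both groups, the groups differ by a standardized true effect `d_true`, and measurement error follows classical test theory, so that error variance is (1 - reliability) / reliability. Design balance is represented simply by the two group sizes `n1` and `n2`. The function names `t_two_sided_p`, `welch_test`, and `simulated_power` are hypothetical, and the t-distribution p-value is computed from the regularized incomplete beta function so that only the standard library is needed.

```python
import math
import random


def _betacf(a: float, b: float, x: float) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-12:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_bt = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
              + a * math.log(x) + b * math.log(1.0 - x))
    bt = math.exp(log_bt)
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """Two-sided p-value of a t statistic: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def _mean_var(v: list[float]) -> tuple[float, float]:
    """Sample mean and unbiased (n - 1) sample variance."""
    n = len(v)
    m = sum(v) / n
    return m, sum((u - m) ** 2 for u in v) / (n - 1)


def welch_test(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Welch t statistic, Welch-Satterthwaite df, and two-sided p-value."""
    n1, n2 = len(x), len(y)
    m1, s1 = _mean_var(x)
    m2, s2 = _mean_var(y)
    v1, v2 = s1 / n1, s2 / n2  # squared standard errors of each mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df, t_two_sided_p(t, df)


def simulated_power(d_true: float, reliability: float, n1: int, n2: int,
                    alpha: float = 0.05, reps: int = 1000, seed: int = 1) -> float:
    """Monte Carlo rejection rate of Welch's test (hypothetical data-generating model)."""
    rng = random.Random(seed)
    # True-score variance is 1, so reliability = 1 / (1 + error variance).
    err_sd = math.sqrt((1.0 - reliability) / reliability)
    hits = 0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) + rng.gauss(0.0, err_sd) for _ in range(n1)]
        y = [rng.gauss(d_true, 1.0) + rng.gauss(0.0, err_sd) for _ in range(n2)]
        if welch_test(x, y)[2] < alpha:
            hits += 1
    return hits / reps
```

For example, `simulated_power(0.5, 0.8, 70, 30)` would estimate power for a true standardized difference of 0.5, a reliability of .80, and a 70/30 split of 100 cases. Looping over grids of reliability and allocation ratio and regressing the resulting power estimates on reliability, balance, and their product would roughly parallel the kind of linear model described above. In practice, `scipy.stats.ttest_ind(x, y, equal_var=False)` performs the same Welch test.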