On the Choice of Linear Regression Algorithms for Biological and Ecological Applications.

Model II regression (i.e. minimizing residuals obliquely) is the appropriate alternative to Model I regression by Ordinary Least Squares (i.e. minimizing residuals vertically) when there is no well-established dependence relationship or when x is measured with error. Yet it has no perfect solution. Determining the true slope from errors-in-variables models requires the errors in x and y to be estimated from higher-order moments. However, estimating these accurately requires enormous data sets, so such models are not applicable to most ecological problems. The alternative Reduced Major Axis (RMA) depends on a strict set of assumptions that are hardly ever met by real data, making it prone to bias, whereas Principal Components Analysis (PCA) becomes less reliable as the correlation decreases when x and y have similar variances. We used artificial data (allowing for the determination of the true slope) to demonstrate when RMA or PCA should be preferred.
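To make the three estimators concrete, the following minimal sketch computes the OLS, RMA and PCA slopes from sample moments. It assumes that the PCA line is the major axis, i.e. the first principal component of the unstandardized covariance matrix of x and y; the function names are illustrative and are not taken from the script accompanying this work.

```python
import numpy as np

def ols_slope(x, y):
    """Model I slope: minimizes vertical residuals (s_xy / s_xx)."""
    sxx, sxy, _, _ = np.cov(x, y).ravel()
    return sxy / sxx

def rma_slope(x, y):
    """Reduced Major Axis slope: sign(r) * s_y / s_x."""
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

def pca_slope(x, y):
    """Major axis slope (first principal component of the covariance matrix).

    Assumes a non-zero covariance between x and y; the closed form
    divides by s_xy.
    """
    sxx, sxy, _, syy = np.cov(x, y).ravel()
    d = syy - sxx
    return (d + np.sqrt(d**2 + 4.0 * sxy**2)) / (2.0 * sxy)

# Error-free data on a line of slope 2: all three estimators agree.
x = np.arange(10.0)
y = 2.0 * x + 1.0
slopes = (ols_slope(x, y), rma_slope(x, y), pca_slope(x, y))
```

With perfectly correlated data the three estimates coincide; they diverge once error is added to x, to y, or to both, which is the situation addressed here.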

Consequently, we propose using PCA whenever $r^2 + s_x^2/s_y^2$ is higher than 1.5. Otherwise, we suggest generating artificial data manipulated to match the structure of the original data and testing which method provides estimates closer to the input true slope. We provide a user-friendly script to perform this task. We tested RMA and PCA with real data on intraspecific and interspecific biomass-density relations in algae and seagrass, algal frond growth, crustacean and bird morphometry, and sardine fisheries, as well as social sciences data, commonly finding widely divergent slope estimates that lead to severely biased parameter estimations and model applications. These analyses support the approach to method selection summarized above.
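The selection procedure can be sketched as follows. This is not the script referred to above but a hypothetical illustration under stated assumptions: the criterion is read as the squared correlation plus the ratio of the sample variance of x to that of y, the true x values and the errors in x and y are drawn from normal distributions, and all function and parameter names (selection_statistic, compare_methods, err_x, err_y and so on) are invented for this example.

```python
import numpy as np

def rma_slope(x, y):
    r = np.corrcoef(x, y)[0, 1]
    return np.sign(r) * np.std(y, ddof=1) / np.std(x, ddof=1)

def pca_slope(x, y):
    # Major axis slope; assumes non-zero covariance between x and y.
    sxx, sxy, _, syy = np.cov(x, y).ravel()
    d = syy - sxx
    return (d + np.sqrt(d**2 + 4.0 * sxy**2)) / (2.0 * sxy)

def selection_statistic(x, y):
    """r^2 + s_x^2 / s_y^2, computed from sample moments."""
    r = np.corrcoef(x, y)[0, 1]
    return r**2 + np.var(x, ddof=1) / np.var(y, ddof=1)

def compare_methods(true_slope, n, sd_x, err_x, err_y, n_rep=200, seed=0):
    """Mean absolute slope error of RMA and PCA on artificial data.

    The caller is expected to choose n, sd_x, err_x and err_y so that
    the artificial data resemble the original data set (assumption).
    """
    rng = np.random.default_rng(seed)
    errors = {"RMA": [], "PCA": []}
    for _ in range(n_rep):
        x_true = rng.normal(0.0, sd_x, n)
        x_obs = x_true + rng.normal(0.0, err_x, n)               # error in x
        y_obs = true_slope * x_true + rng.normal(0.0, err_y, n)  # error in y
        errors["RMA"].append(abs(rma_slope(x_obs, y_obs) - true_slope))
        errors["PCA"].append(abs(pca_slope(x_obs, y_obs) - true_slope))
    return {name: float(np.mean(vals)) for name, vals in errors.items()}

def choose_method(x, y, threshold=1.5, **sim_kwargs):
    """Use PCA above the threshold; otherwise pick the method with the
    smaller simulated error (sim_kwargs are passed to compare_methods)."""
    if selection_statistic(x, y) > threshold:
        return "PCA"
    result = compare_methods(**sim_kwargs)
    return min(result, key=result.get)
```

For example, choose_method(x, y, true_slope=2.0, n=50, sd_x=1.0, err_x=0.2, err_y=0.2) returns "PCA" directly when the statistic exceeds 1.5 and otherwise falls back on the simulation; the simulation settings shown are placeholders, not recommended values.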