Most genomic selection models rely on linear regression and assume continuous, normally distributed phenotypes. Disease resistance, however, including stripe rust resistance, is commonly scored on ordinal scales or as percentages. Disease severity (SEV) and infection type (IT) therefore generally violate these assumptions and show distributions skewed by high levels of resistance.
When faced with data that violate these assumptions, researchers have four options: ignore the lack of normality, transform the phenotypes, use generalized linear mixed models (GLMMs), or use supervised learning algorithms and classification models that place no restriction on the distribution of the response variable.
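To make the first three options concrete, the sketch below (Python, on simulated data; the original analysis did not necessarily follow this code) contrasts using the raw phenotype, square-root transforming it, and adjusting it with a Poisson model. A plain GLM from statsmodels stands in for a full GLMM, and the year effect is a hypothetical fixed-effect surrogate for the random effects a GLMM would carry; the classification route is sketched after the next paragraph.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Hypothetical skewed severity scores (0-100%) for 452 lines:
# most lines are resistant, so scores pile up near zero.
sev = rng.gamma(shape=0.5, scale=20.0, size=452).clip(0, 100)

# Option 1: ignore non-normality and use the raw phenotype as-is.
y_raw = sev

# Option 2: square-root transform to reduce the right skew.
y_sqrt = np.sqrt(sev)

# Option 3 (simplified): fit a Poisson GLM and remove the fitted
# environmental effect. A true GLMM would treat year as random;
# dummy-coded year effects are used here as a stand-in.
year = rng.integers(0, 3, size=452)           # hypothetical 2013-2015 codes
X = sm.add_constant(np.eye(3)[year][:, 1:])   # intercept + year dummies
glm = sm.GLM(np.round(sev), X, family=sm.families.Poisson()).fit()
y_adj = sev - (glm.fittedvalues - glm.fittedvalues.mean())
```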
To compare the four options for genomic selection on both SEV and IT, we used 452 diversity panel lines evaluated from 2013 to 2015 and genotyped with 41,856 SNPs. We used rrBLUP as the regression model and compared unadjusted data, square-root-transformed data, and data adjusted with a generalized linear mixed model assuming a Poisson distribution. The unadjusted data had the highest mean accuracy for both SEV and IT regression. For classification, we used a support vector machine with a radial kernel and compared three scales: the full 0-9 scale, a reduced 0-2 scale, and a binary 0-1 scale. The reduced-scale and binary models resulted in high mean accuracies (0.62 and 0.72, respectively).
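The two model families can be mimicked in a few lines of Python, as sketched below on simulated data. Ridge regression stands in for rrBLUP (the study used the rrBLUP R package, not scikit-learn), an RBF-kernel SVC stands in for the radial-kernel support vector machine, and the marker matrix, effect sizes, and keep/discard threshold are all hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict, cross_val_score

rng = np.random.default_rng(1)

# Hypothetical stand-ins for the real data: 452 lines genotyped at
# many SNPs coded -1/0/1, with a simulated additive phenotype.
n_lines, n_snps = 452, 2000          # fewer SNPs than 41,856 to keep the demo fast
M = rng.integers(-1, 2, size=(n_lines, n_snps)).astype(float)
beta = rng.normal(0, 0.1, size=n_snps)
sev = np.clip(M @ beta + rng.normal(0, 1, n_lines), 0, None)

# Regression route: ridge regression is the penalized-regression
# analogue of rrBLUP; accuracy is the predicted-observed correlation.
pred = cross_val_predict(Ridge(alpha=1.0), M, sev, cv=5)
print("prediction accuracy (r):", np.corrcoef(pred, sev)[0, 1])

# Classification route: SVM with a radial (RBF) kernel on a binary
# keep/discard scale, mirroring the 0-1 class comparison.
y_bin = (sev > np.median(sev)).astype(int)
acc = cross_val_score(SVC(kernel="rbf"), M, y_bin, cv=5, scoring="accuracy")
print("classification accuracy:", acc.mean())
```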
Overall, there was no significant difference between traits or model types on average. However, with a reduced-scale or binary classification system, breeders can accurately decide whether to keep or discard lines based on disease resistance. This study demonstrates the validity of genomic selection for identifying lines with high levels of stripe rust resistance.