elocation-id: e3892
This study aimed to evaluate the performance of four durum wheat cultivars (Odysseo, Saragola, Irid and Maestrale) using two machine learning techniques: classification and regression trees and random trees. Classification and regression tree analysis showed that mean annual temperature is the dominant factor influencing yield in all cultivars. For the Saragola, Irid and Maestrale cultivars, yield increased significantly when the mean annual temperature exceeded 17.25 °C, particularly when emergence density was optimal. In contrast, the Odysseo cultivar showed sensitivity to both mean annual temperature and seeds per spike, with higher yields associated with a mean annual temperature above 17.25 °C and seeds per spike above 33.6. The random trees analysis confirmed the importance of mean annual temperature and emergence density, highlighting their strong predictive power. These ensemble models provided greater robustness and generalizability by reducing prediction variance, making them reliable tools for yield prediction. These findings highlight cultivar-specific responses to agroclimatic conditions, with Odysseo influenced by both mean annual temperature and seeds per spike, while Saragola, Irid and Maestrale demonstrate a critical interaction between mean annual temperature and emergence density. Integrating random trees models improves prediction accuracy and provides valuable information for developing precision agriculture strategies tailored to environmental conditions.
Triticum durum, decision tree analysis, machine learning, precision agriculture.
Durum wheat (Triticum durum) is a staple crop of global importance, and its production is significantly influenced by agroclimatic factors (Martínez-Moreno et al., 2022). Understanding the relationship between environmental conditions and yield is essential for improving productivity and ensuring food security, especially in the face of climate variability (Shewry et al., 2015). Key factors such as annual average temperature (AAT), precipitation, plant density, and seed characteristics play a crucial role in determining wheat yield (Kang et al., 2020).
Traditional statistical methods, such as linear regression and generalized linear models, have been widely used to predict crop yields. However, these approaches often fall short in capturing complex, nonlinear relationships between multiple variables (Sharma et al., 2021). Recent advances in machine learning (ML) provide more robust and adaptable models for analyzing such interactions. Decision tree-based models, including Classification and Regression Trees (CRT) and Random Forests (RF), are particularly well-suited for agricultural applications due to their ability to handle nonlinear relationships and rank variable importance (Breiman, 2001; Sarker et al., 2020).
Despite the increasing use of ML models in agriculture, few studies have compared the performance of CRT and RF for predicting wheat yield across multiple varieties. This study aims to address this gap by evaluating the predictive accuracy of these models for four durum wheat varieties, identifying the most influential agroclimatic factors, and establishing decision rules for yield optimization.
Four durum wheat cultivars (Odysseo, Saragola, Irid and Maestrale) were selected for this study based on their agronomic performance and adaptability. These cultivars are commercially recognized for their high yield potential, grain quality and stress tolerance (De Vita et al., 2007; Kabbaj et al., 2017). Field experiments were conducted during the 2020 growing season across three agroclimatic zones in Algeria: Annaba (Annaba), a coastal region with a humid Mediterranean climate; Ouled Rahmoune (Constantine), a semi-arid region with moderate rainfall; and Oued Zenati (Guelma), a dry region with limited water availability.
Each experimental site covered an area of 2 500 m2, and the trials were conducted using a randomized complete block design (RCBD) with three replications per cultivar. A seeding rate of 200 kg ha-1 was employed to achieve adequate plant density, promoting uniform emergence and crop establishment. Basal fertilization was carried out using monoammonium phosphate (MAP) applied at a rate of 150 kg ha-1 to provide essential nutrients for early growth. Additionally, crop protection measures included the application of fungicidal treatments such as Celest Xtra and Amistar Xtra, along with Acil, to safeguard the wheat plants against potential diseases and enhance crop performance.
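As an illustration of the randomized complete block layout described above, the following minimal sketch assigns each of the four cultivars once to every block, in an independently shuffled order. The function name `rcbd_layout` and the seed value are hypothetical; the source does not report how the randomization was actually generated.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block.

    Hypothetical helper for illustration only; not the procedure
    used in the study.
    """
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # copy so the caller's list is not mutated
        rng.shuffle(order)        # randomize plot order within this block
        layout.append(order)
    return layout


cultivars = ["Odysseo", "Saragola", "Irid", "Maestrale"]
layout = rcbd_layout(cultivars, n_blocks=3, seed=2020)  # arbitrary seed

for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {', '.join(order)}")
```

Every block contains all four cultivars exactly once, which is what makes the design "complete"; only the plot order varies between blocks.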
Agroclimatic and agronomic data were collected throughout the growing season, including annual average temperature (AAT) (°C), altitude, annual total precipitation (ATP) (mm), seeds per spike (count), emergence density (plants m-2), spikes m-2 (count), tillers per plant (count), thousand-kernel weight (TKW) (g), and practical wheat yield (q ha-1), which was used as the target variable. Meteorological data were obtained from the National Meteorological Office (Algeria), while agronomic parameters were measured following standardized field and laboratory procedures (Blum, 2011; Joia et al., 2025).
Two machine-learning approaches were applied using IBM SPSS Modeler 18.0 to predict wheat yield: classification and regression trees (CRT), a decision tree-based model that partitions data into homogeneous subsets based on the most significant variables (Breiman et al., 1984); and random trees regression (RT), an ensemble learning method that enhances predictive accuracy by averaging multiple decision trees (Liaw and Wiener, 2002). Model performance was assessed using root mean square error (RMSE), relative error (RE) and explained variance (EV) (Chlingaryan et al., 2018).
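To make the partitioning idea concrete, the sketch below searches for the single threshold on one predictor that leaves the two resulting groups as homogeneous as possible. It assumes the usual least-squares criterion for a continuous target (minimizing the within-group sum of squared deviations). It is written in plain Python rather than IBM SPSS Modeler, the function names are hypothetical, and the data are toy values, not measurements from this study.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Find the threshold on xs that minimizes the summed SSE of ys.

    Candidate thresholds are midpoints between consecutive distinct
    values of xs. Returns (threshold, left_mean, right_mean), or None
    when xs has fewer than two distinct values.
    """
    distinct = sorted(set(xs))
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]


# Toy predictor and target values (illustrative only).
toy_x = [10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
toy_y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
print(best_split(toy_x, toy_y))  # (16.0, 1.0, 5.0)
```

A full tree repeats this search over every predictor and then recursively within each resulting subset; a random trees ensemble averages many such trees grown on resampled data.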
Feature importance was evaluated using Gini impurity (CRT) and permutation importance (RT). The generated decision trees were analyzed for each wheat cultivar to identify key thresholds influencing wheat yield (Hastie et al., 2009).
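Permutation importance can be sketched in a few lines: shuffle one predictor at a time, re-score the model, and record how much the error grows. The version below is a generic plain-Python illustration, not the IBM SPSS Modeler implementation. The function names, the toy model and the toy data are all assumptions made for the example.

```python
import random


def mse(model, rows, targets):
    """Mean squared error of a callable model over rows of predictors."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling each predictor column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between column j and the target
            shuffled = [r[:j] + (v,) + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(model, shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy setup: the model uses only the first predictor and ignores the second.
toy_rows = [(float(i), float(i % 3)) for i in range(8)]
toy_targets = [2.0 * r[0] for r in toy_rows]


def toy_model(row):
    return 2.0 * row[0]


importances = permutation_importance(toy_model, toy_rows, toy_targets)
print(importances)
```

In this toy case the ignored predictor receives an importance of exactly zero, because shuffling it cannot change any prediction, while the predictor the model relies on receives a positive score.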
Results highlight significant variability in the agronomic performance of the four durum wheat cultivars across three distinct localities (Table 1). This variability is primarily attributed to environmental factors, particularly climatic conditions and agronomic practices, which are known to influence growth, yield and phenotypic traits of wheat cultivars (Kabbaj et al., 2017; Royo et al., 2020).
In Annaba, practical yields were highest for Saragola (54 ±5.66 q ha-1), which is consistent with findings from previous studies indicating that this cultivar adapts well to moderate conditions, particularly when temperature and soil moisture are adequate (Cséplő et al., 2024). The TKW values for Odysseo (49.5 ±0.71 g) and Irid (47.5 ±0.71 g) suggest good grain filling potential, which is a desirable trait for yield improvement (Maccaferri et al., 2011).
The Ouled Rahmoune locality demonstrated increased tillering and spike density across all cultivars, with Maestrale achieving the highest emergence density (292 ±0 plants m-2) and tillering rate (4.6 ±0 tillers per plant). This phenomenon can be attributed to favorable soil conditions that likely promoted tiller formation and spike emergence, as supported by Kabbaj et al. (2017), who reported that improved soil fertility enhances tiller production and consequently increases yield. However, practical yields were lower compared to Annaba, with Odysseo and Saragola recording the lowest yields (28 ±2.4 q ha-1 and 27.9 ±3.39 q ha-1, respectively). This suggests that yield potential may not solely depend on spike density but also on grain filling efficiency, which may have been compromised by suboptimal climatic conditions during the grain-filling period (Royo et al., 2020).
Oued Zenati exhibited the highest overall productivity, particularly for the Saragola cultivar, which achieved a practical yield of 41.5 ±4.67 q ha-1 with a TKW of 51 ±0 g. This locality also demonstrated superior tillering ability and spike density for all cultivars, with Odysseo reaching 703 ±0 spikes m-2 and 6 ±0 tillers per plant. Moreover, the high TKW values observed in this locality are indicative of favorable conditions for grain filling, a critical determinant of yield (Kabbaj et al., 2017).
The random trees regression model exhibited a strong predictive capability for wheat yield estimation, with an explained variance of 70.4%, suggesting that the selected agroclimatic and agronomic variables account for a substantial proportion of yield variability. The root mean square error (RMSE) was 7.395, indicating a moderate level of deviation between predicted and observed values. Furthermore, the relative error of 0.296 suggests a fairly reliable model performance (Table 2).
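The three reported metrics are related, which the sketch below illustrates. It assumes that relative error is defined as the residual sum of squares divided by the total sum of squares of the observed yields, so that explained variance equals 1 minus relative error; the reported values are consistent with that reading (1 - 0.296 = 0.704), but the source does not state the formula. The function names and the toy data are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def relative_error(observed, predicted):
    """Residual sum of squares over total sum of squares (assumed definition)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return ss_res / ss_tot


def explained_variance(observed, predicted):
    """Share of variance explained, taken as 1 - relative error."""
    return 1.0 - relative_error(observed, predicted)


# Toy data (illustrative only, not from the study).
toy_observed = [2.0, 4.0, 6.0, 8.0]
toy_predicted = [3.0, 3.0, 7.0, 7.0]
print(rmse(toy_observed, toy_predicted))
print(relative_error(toy_observed, toy_predicted))
print(explained_variance(toy_observed, toy_predicted))

# Consistency check on the values reported in the text.
reported_rmse, reported_re = 7.395, 0.296
implied_ev = 1.0 - reported_re
# Under the assumed definition, RMSE^2 / RE is the variance of observed yield
# (dividing by n), so its square root is the implied standard deviation.
implied_sd = reported_rmse / math.sqrt(reported_re)
print(round(implied_ev, 3), round(implied_sd, 2))
```

Under this assumption, the reported RMSE and relative error imply a standard deviation of observed yield of roughly 13.6 q ha-1, which gives a sense of scale for the 7.395 error.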
These results demonstrate the robustness of machine learning techniques in agricultural yield prediction, aligning with previous studies highlighting the effectiveness of decision tree-based models for predicting crop responses to environmental factors (Chlingaryan et al., 2018; López-Granados et al., 2020).
The CRT analysis revealed that AAT was the dominant variable influencing yield for the Saragola, Irid, and Maestrale cultivars, with emergence density also playing a significant role. In contrast, for the Odysseo cultivar, yield was mainly influenced by AAT and the number of seeds per spike.
For Odysseo, the CRT decision tree identified AAT as the primary determinant of yield variation. When AAT was ≤16.15 °C, the average yield was 28 q ha-1, representing a significant reduction due to suboptimal temperature conditions. For an AAT between 16.15 °C and 17.25 °C, yield increased to 34 q ha-1, showing a positive impact of higher temperatures on grain development. When the AAT exceeded 17.25 °C, yield reached 52.5 q ha-1, provided that seeds per spike exceeded 33.6. These findings suggest that the Odysseo cultivar responds favourably to warmer temperatures, with yield improving as AAT increases above 17.25 °C. The critical role of seeds per spike further highlights the importance of optimizing spike fertility under varying temperature regimes (Figure 1).
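The Odysseo thresholds reported above can be written as a small lookup function. This is only a transcription of the values stated in the text, with a hypothetical function name. The branch for AAT above 17.25 °C with 33.6 or fewer seeds per spike is not reported in the text, so the sketch returns `None` there instead of guessing a value.

```python
from typing import Optional


def odysseo_expected_yield(aat_c: float, seeds_per_spike: float) -> Optional[float]:
    """Mean yield (q ha-1) for Odysseo according to the reported tree rules.

    Returns None for the branch whose mean yield is not given in the text.
    """
    if aat_c <= 16.15:
        return 28.0
    if aat_c <= 17.25:
        return 34.0
    if seeds_per_spike > 33.6:
        return 52.5
    return None  # AAT > 17.25 and seeds per spike <= 33.6: not reported


print(odysseo_expected_yield(16.0, 30.0))  # 28.0
print(odysseo_expected_yield(17.0, 40.0))  # 34.0
print(odysseo_expected_yield(18.0, 35.0))  # 52.5
print(odysseo_expected_yield(18.0, 30.0))  # None
```

Decision rules of this form are what make a single tree easy to use for site-specific recommendations, in contrast to an ensemble, which has no single readable path.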
For Saragola, yield was highly sensitive to AAT and emergence density. When AAT was below 16.15 °C, yield dropped to 38 q ha-1, indicating a negative impact of lower temperatures on grain filling. When AAT exceeded 16.15 °C and emergence density was optimal, yield increased to 51 q ha-1, demonstrating the combined effect of temperature and agronomic management on productivity. These results indicate that the Saragola cultivar is less tolerant of low temperatures and requires warmer conditions for optimal yield expression. This aligns with previous reports on durum wheat varieties that show reduced grain development under cooler climates (Ferrise et al., 2019) (Figure 2).
The decision tree model for the Irid cultivar identified AAT and emergence density as the key yield determinants. When AAT was below 17.25 °C, yield remained low, suggesting that the Irid cultivar requires higher temperatures for grain development. When AAT exceeded 17.25 °C, yield increased significantly, particularly when plant density was high. This behaviour indicates that the Irid cultivar benefits from higher temperatures, but plant density also plays a crucial role in achieving high productivity. This finding is consistent with studies emphasizing the role of grain weight as a primary yield component in wheat (Lobell et al., 2017) (Figure 3).
The CRT analysis for the Maestrale cultivar indicated a strong dependency on AAT and emergence density. Yield remained low when AAT was below 17.25 °C, likely due to poor grain filling conditions. When AAT exceeded 17.25 °C, yield improved significantly, provided that emergence density was optimal. These results suggest that Maestrale requires both warm temperatures and adequate emergence density for optimal productivity. The interplay between temperature and plant density is well documented in wheat physiology, where poor emergence density can exacerbate the negative effects of suboptimal temperatures (Trnka et al., 2021) (Figure 4).
The Saragola and Irid cultivars are highly dependent on AAT and emergence density, with yield improving significantly when AAT exceeds 16.15 °C and 17.25 °C, respectively, and emergence density is optimal. The Maestrale cultivar behaves similarly to Irid, with yield enhancement linked to AAT exceeding 17.25 °C and favourable emergence density.
Odysseo cultivar demonstrated greater resilience to temperature fluctuations, particularly benefiting from higher temperatures. However, its productivity is strongly dependent on high emergence density, indicating the importance of dense and uniform sowing, especially in warmer regions.
Saragola, Irid, and Maestrale cultivars showed increased sensitivity to both temperature and emergence density, implying that these varieties require more precise seed rate calibration and adapted sowing schedules under changing climatic conditions to avoid yield penalties. Practical recommendations for improving wheat crop management include: 1) tailoring sowing density by cultivar and expected temperature regime: adopt higher seed rates for Odysseo in warm zones and fine-tune densities for other cultivars based on predictive emergence models; 2) integrating real-time agroclimatic data to adjust management practices, particularly in terms of sowing date and field preparation; and 3) employing site-specific management zones using decision rules derived from the models to optimize inputs (fertilizer, irrigation) where they will have the greatest effect on yield.
The comparison between CRT and RF models revealed their complementary strengths in predicting wheat yield and developing actionable decision rules. RF provided robust and generalizable insights due to its ensemble nature, making it a valuable tool for data-driven agronomic decision-making. Given the promising results, artificial intelligence (AI) tools, especially those based on machine learning and ensemble learning algorithms, offer significant potential for refining yield predictions and supporting adaptive agronomic decisions. AI-driven systems can dynamically integrate multi-source data (satellite, sensor, weather forecasts) to provide real-time, site-specific recommendations, fostering the transition toward precision agriculture and climate-resilient wheat production.
This research was funded by the General Directorate of Scientific Research and Technological Development (DGRSDT, for its French acronym), Algeria, and CRAPAST, Algeria. The authors express their gratitude to both institutions.
Cséplő, M.; Puskás, K.; Vida, G.; Mészáros, K.; Uhrin, A.; Tóth, V.; Ambrózy, Z.; Grausgruber, H.; Bonfiglioli, L.; Pagnotta, M. A.; Urbanavičiūtė, I.; Mikó, P. and Bányai, J. 2024. Performance of a durum wheat diversity panel under different management systems. Cereal Research Communications. 52(1):489-502.
Kabbaj, H.; Sall, A. T.; Al-Abdallat, A.; Geleta, M.; Amri, A.; Filali-Maltouf, A.; Belkadi, B.; Ortiz, R. and Bassi, F. M. 2017. Genetic diversity within a global panel of durum wheat (Triticum durum) landraces and modern germplasm reveals the history of alleles exchange. Frontiers in Plant Science. 8(1):1-13.
Maccaferri, M.; Sanguineti, M. C.; Demontis, A.; El-Ahmed, A.; Garcia-Moral, L.; Maalouf, F.; Nachit, M.; Nserallah, N.; Ouabbou, H.; Rhouma, S.; Royo, C.; Villegas, D. and Tuberosa, R. 2011. Association mapping in durum wheat grown across a broad range of water regimes. Journal of Experimental Botany. 62(2):409-438.