Alonso-Barrera, Lara-Viveros, and Reyes-Rosas: Estimation of Podosphaera xanthii in cucumber: machine learning techniques with digital images

Journal Metadata

Journal Identifier: remexca [journal-id-type=publisher-id]

Journal Title Group

Journal Title (Full): Revista mexicana de ciencias agrícolas

Abbreviated Journal Title: Rev. Mex. Cienc. Agríc [abbrev-type=publisher]

ISSN: 2007-0934 [pub-type=ppub]

Publisher

Publisher’s Name: Instituto Nacional de Investigaciones Forestales, Agrícolas y Pecuarias

Article Metadata

Article Identifier: 10.29312/remexca.v16i30.4039 [pub-id-type=doi]

Article Identifier: 00002 [pub-id-type=other]

Article Grouping Data

Subject Group [subj-group-type=heading]

Subject Grouping Name: Articles

Title Group

Article Title: Estimation of Podosphaera xanthii in cucumber: machine learning techniques with digital images

Contributor Group

Contributor [contrib-type=author]

Name of Person [name-style=western]

Surname: Alonso-Barrera

Given (First) Names: Berenice

X (cross) Reference [ref-type=aff; rid=aff1]

Superscript: 1

Contributor [contrib-type=author]

Name of Person [name-style=western]

Surname: Lara-Viveros

Given (First) Names: Francisco Marcelo

X (cross) Reference [ref-type=aff; rid=aff1]

Superscript: 1

X (cross) Reference [ref-type=corresp; rid=c1]

Superscript: §

Contributor [contrib-type=author]

Name of Person [name-style=western]

Surname: Reyes-Rosas

Given (First) Names: Audberto

X (cross) Reference [ref-type=aff; rid=aff1]

Superscript: 1

Affiliation [id=aff1]

Label (of an Equation, Figure, Reference, etc.): 1

Institution Name: in an Address: Centro de Investigación en Química Aplicada. Blvd. Enrique Reyna Hermosillo núm. 140, Saltillo, Coahuila, México. CP. 25294. Tel. 844 4389830. (berenice.alonso.ps@ciqa.edu.mx; audberto.reyes@ciqa.edu.mx). [content-type=original]

Institution Name: in an Address: Centro de Investigación en Química Aplicada [content-type=orgname]

Address Line

City: Saltillo

State or Province: Coahuila

Postal Code: 25294

Country: in an Address: Mexico [country=MX]

Email Address: berenice.alonso.ps@ciqa.edu.mx

Email Address: audberto.reyes@ciqa.edu.mx

Author Note Group

Correspondence Information: [^§] Corresponding author: francisco.lara@ciqa.edu.mx. [id=c1]

Publication Date [date-type=pub; publication-format=electronic]

Day: 15

Month: 10

Year: 2025

Publication Date [date-type=collection; publication-format=electronic]

Season: Sep-Oct

Year: 2025

Volume Number: 16

Issue Number: esp30

Electronic Location Identifier: e4039

History: Document History

Date [date-type=received]

Day: 00

Month: 02

Year: 2025

Date [date-type=accepted]

Day: 00

Month: 08

Year: 2025

Permissions

License Information [license-type=open-access; xlink:href=https://creativecommons.org/licenses/by-nc/4.0/; xml:lang=es]

This is an open-access article published under a Creative Commons license

Abstract

Title: Abstract

Phytopathogenic fungi pose a considerable threat to cucurbit crops, so early detection and accurate quantification of diseases are essential to reduce production losses. In this study, a methodology was developed to quantitatively estimate the damage caused by Podosphaera xanthii in cucumber leaves, using digital images and machine learning techniques. Convolutional neural networks were used to visually classify the degree of severity into six predefined categories, using sections of leaves with apparent symptoms of the fungus. Additionally, four supervised classification algorithms were trained and compared: K-NN, decision trees, random forests, and neural networks. The model that obtained the best performance was the random forest model, with an accuracy of 90%, whereas K-NN reached the lowest value (79%). These results position the model as a helpful tool for automated disease monitoring in the field, facilitating phytosanitary decision-making. In addition, the methodology provides a solid foundation for researchers interested in designing and implementing automatic plant disease classification systems, providing clear information on the performance of different classification architectures. The algorithm developed in R allows this solution to be adapted and scaled to different cultivation conditions and types of foliar diseases.

Keyword Group [xml:lang=en]

Title: Keywords

Keyword: algorithm

Keyword: disease

Keyword: powdery mildew

Counts

Figure Count [count=11]

Table Count [count=2]

Equation Count [count=0]

Reference Count [count=36]

Introduction

Powdery mildew, caused by fungi such as Golovinomyces cichoracearum, Erysiphe cichoracearum, Sphaerotheca fuliginea, Podosphaera xanthii and Podosphaera fusca (Mohamed et al., 1995; Morejón et al., 2010), is a widely distributed disease that seriously compromises the quality and yield of cucumber crops, generating significant economic losses (Sun et al., 2022). The disease first appears as a whitish coloration that later turns into creamy-yellow spots, mainly on the leaves (Rocha et al., 2023).

Given its impact, early and accurate detection is essential for timely management; this directly contributes to food security (Kaushik et al., 2023). In this context, artificial intelligence (AI) offers valuable tools to improve agricultural systems and the economy of farmers. Common supervised classification techniques include logistic regression, discriminant analysis, K-NN, neural networks, decision trees and random forests (Zapata et al., 2014; Paymode and Malode, 2022).

Automated plant disease estimation streamlines monitoring in large crops and enables early detection of symptoms. Different machine learning algorithms generate varying results, so it is crucial to identify the most suitable one for each specific problem. The K-NN technique is based on the proximity of similar objects in the feature space (Zhao and Yang, 2023).

Another widely used method is the decision tree, which applies a sequence of logical conditions derived from observations to represent and categorize the steps needed to solve a problem (Ramos et al., 2023). Likewise, Pacciorett et al. (2020) describe the random forest (RF) method as a model that uses sampling to construct multiple regression trees and combines them into a single predictive model.

Among the most widely used classification models are neural networks (CNN), computational models that offer solutions for pattern recognition and sequence validation as an extension of classical statistical methods (Hassoun, 1995). This method adapts to the demands of the environment, since it can combine techniques that process information in parallel (Figueredo and Ballesteros, 2016).

The objective was to develop a tool for the automated estimation of leaf damage caused by P. xanthii in cucumber leaves, using digital images and machine learning techniques. A methodology based on convolutional neural networks (CNN), decision trees, random forests (RF) and K-NN was proposed in order to compare their performance in classifying disease severity.

Materials and methods

Plant material

The plant material was obtained from a five-year-old commercial plot of French cucumber. Only a parthenocarpic variety has been cultivated in this plot, and symptoms associated with Podosphaera xanthii have been observed in every production cycle.

Identification of the microorganism

The microorganism was identified by preparing samples on slides from the leaf lesions. The observed morphology was contrasted with the descriptions published in the taxonomic manual of Erysiphaceae (Braun and Cook, 2012) and with the characterization provided by Cipriano and González (2022) (Figure 1).

Figure 1. a) ovoid conidia of P. xanthii seen under a microscope and b) chain conidia of P. xanthii.

Training data

For the analysis, random images were taken, and sections of leaves that exhibited obvious symptoms of the fungus were extracted. These segments were unified to create a 5 580 000-pixel image, which captured the distinctive features of the affected leaves. Similarly, the same process was performed to obtain a control image, cutting out areas of leaves with no visible signs of the disease (Figure 2).

Figure 2. Set of images intended for training. a) Image with symptoms of the fungus and b) Image without symptoms of the fungus.

In order to avoid bias in the training of the classification models, the dataset was balanced by selecting an equivalent number of pixels from both classes (healthy and diseased). This balance was achieved by uniform random sampling, ensuring each class was represented by the same number of pixels in the training set.
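The balancing step can be illustrated with a minimal sketch. It is written in Python for illustration only (the authors report working in R), and the function name `balance_pixels`, the seed, and the toy RGB values are assumptions rather than part of the study.

```python
import random

def balance_pixels(healthy, diseased, seed=0):
    """Undersample the larger class so both classes keep the same pixel count."""
    rng = random.Random(seed)
    n = min(len(healthy), len(diseased))
    # random.sample draws uniformly and without replacement
    return rng.sample(healthy, n), rng.sample(diseased, n)

# Toy (R, G, B) pixels; the values are illustrative, not measured data
healthy = [(40, 120 + i % 50, 35) for i in range(1000)]
diseased = [(200, 200, 180 + i % 40) for i in range(600)]

healthy_bal, diseased_bal = balance_pixels(healthy, diseased)
print(len(healthy_bal), len(diseased_bal))
```

Undersampling discards pixels from the larger class; it is shown here because the text describes selecting an equal number of pixels per class.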

Processing of sample images. In order to train and validate the machine learning models, the segmented images were grouped into three datasets, each with a different number of pixels: 4 464 000; 3 348 000 and 2 232 000. A stratified partition scheme was applied to each set, where 70% of the pixels were used for training and the remaining 30% for validation and testing.

This proportion was chosen to provide a balanced distribution and to help the models generalize without overfitting the input data. As shown in Figure 3, the total number of pixels used exclusively for training was 7 030 800, distributed as follows: 3 124 800 pixels from the first set, 2 343 600 from the second and 1 562 400 from the third.
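The pixel counts above can be checked with a short sketch, written in Python for illustration only. The 80%, 60% and 40% subset fractions are an assumption inferred from the reported sizes relative to the 5 580 000-pixel image, and `stratified_split` is a hypothetical helper, not the authors' code.

```python
import random

TOTAL_PIXELS = 5_580_000        # pooled image size reported in the text
SUBSET_PCT = (80, 60, 40)       # assumption: inferred from the reported subset sizes
TRAIN_PCT = 70                  # 70% training, 30% validation and testing

# Integer arithmetic keeps the counts exact
subset_sizes = [TOTAL_PIXELS * p // 100 for p in SUBSET_PCT]
train_sizes = [n * TRAIN_PCT // 100 for n in subset_sizes]
total_train = sum(train_sizes)

def stratified_split(labels, train_pct=TRAIN_PCT, seed=0):
    """Split indices so each class contributes train_pct% of its items to training."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, rest = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = len(indices) * train_pct // 100
        train.extend(indices[:cut])
        rest.extend(indices[cut:])
    return train, rest

labels = ["healthy"] * 50 + ["sick"] * 50
train_idx, rest_idx = stratified_split(labels)
print(subset_sizes, train_sizes, total_train)
```

Splitting within each class, rather than over the pooled pixels, keeps the healthy-to-sick ratio identical in the training and validation portions.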

Figure 3. Process of training and validation data partitioning for the detection of symptoms caused by the fungus.

Training of machine learning systems. The training data were used to fit machine learning models (K-NN, decision trees, random forests, and neural networks) that classify each pixel as ‘healthy’ or ‘sick’. The hyperparameters most commonly reported in the literature were considered during training (Table 1). The model inputs were the RGB channel values of each pixel.

Table 1. Hyperparameters reported in the literature for different machine learning techniques.

Classification model	Hyperparameter	Reference
K-nearest neighbors (K-NN)	Number of neighbors (K)	Zhang et al. (2019)
Decision tree	Depth; number of observations at each node	Demirović and Stuckey (2021)
Random forests (RF)	Number of trees	Benali et al. (2019)
Neural networks (CNN)	Layer depth; number of layers; activation function	Ma et al. (2018)

After fitting the machine learning models, a confusion matrix was generated (Table 2).

Table 2. Confusion matrix used to calculate the metrics of the machine learning models.

Class observed	Estimated: Healthy	Estimated: Sick
Healthy	True positive (TP)	False negative (FN)
Sick	False positive (FP)	True negative (TN)

The above data were used to calculate the accuracy using the following formula: $\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$. In all cases, accuracy is reported for the validation data only.
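As a worked illustration of the accuracy formula, the following Python sketch counts the four confusion-matrix cells from paired labels and applies the formula. The function names and the five example labels are hypothetical, and 'healthy' is assumed to be the positive class; accuracy itself does not depend on which off-diagonal cell is called FP or FN.

```python
def confusion_counts(observed, estimated, positive="healthy"):
    """Return (TP, TN, FP, FN), treating `positive` as the positive class."""
    tp = tn = fp = fn = 0
    for obs, est in zip(observed, estimated):
        if obs == positive and est == positive:
            tp += 1
        elif obs != positive and est != positive:
            tn += 1
        elif obs != positive and est == positive:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("the confusion matrix is empty")
    return (tp + tn) / total

# Hypothetical labels for five pixels
observed = ["healthy", "healthy", "sick", "sick", "sick"]
estimated = ["healthy", "sick", "sick", "sick", "healthy"]

counts = confusion_counts(observed, estimated)
print(counts, accuracy(*counts))
```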

Image processing for severity calculation

To carry out the segmentation, the images were transformed into the HSV color space. Subsequently, the thresholding method (Otsu, 1978) was applied to generate a binary image, which was multiplied by each of the RGB channels of the original image. This procedure allowed us to obtain a segmented image, which was later used as input for the classification system (Figure 4).
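The segmentation steps described above can be sketched as follows, in Python and with the standard library only, for illustration. The text does not state which HSV channel was thresholded, so using the saturation channel is an assumption, as are the function names and the toy pixels.

```python
import colorsys

def otsu_threshold(values, levels=256):
    """Return the level t that maximizes between-class variance (v <= t vs v > t)."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of values at or below t
    sum0 = 0    # sum of values at or below t
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def segment(pixels):
    """pixels: list of (R, G, B) ints in 0..255. Returns (masked pixels, binary mask)."""
    # Assumption: threshold the saturation channel of the HSV representation
    sat = [round(colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1] * 255)
           for r, g, b in pixels]
    t = otsu_threshold(sat)
    mask = [1 if s > t else 0 for s in sat]
    # Multiply the binary mask by each RGB channel of the original pixel
    masked = [(r * m, g * m, b * m) for (r, g, b), m in zip(pixels, mask)]
    return masked, mask

# Toy image: four gray background pixels followed by four green leaf pixels
pixels = [(200, 200, 200)] * 4 + [(40, 160, 40)] * 4
segmented, mask = segment(pixels)
print(mask)
```

Background pixels become (0, 0, 0) after the multiplication, so only the pixels kept by the mask carry color information into the classifier.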

Figure 4. Methodology of preprocessing and processing of cucumber leaves to estimate the severity of the fungus P. xanthii.

At the same time, each image was labeled with one of six visual severity classes defined by Mohamed et al. (1995) and modified by Hernández et al. (2007), which correspond to different ranges of percentage of foliar damage (0%, 10-29%, 30-49%, 50-69%, 70-89%, 90-100%) (Figure 5). Fifty images were selected per class, resulting in a balanced dataset for training and testing.
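One way to relate pixel-level classification to this scale is sketched below in Python, for illustration only. It assumes that severity is the percentage of leaf pixels classified as 'sick', that the percentage is rounded to the nearest integer before lookup, and that classes are numbered 0 to 5; none of these details is stated in the text. Percentages from 1 to 9 fall outside the published ranges, so the sketch returns `None` for them instead of guessing a class.

```python
# Published ranges of foliar damage (percent); the class index is hypothetical
SEVERITY_RANGES = [(0, 0), (10, 29), (30, 49), (50, 69), (70, 89), (90, 100)]

def severity_percent(leaf_labels):
    """Percentage of leaf pixels labeled 'sick' (background pixels excluded)."""
    leaf_labels = list(leaf_labels)
    if not leaf_labels:
        raise ValueError("no leaf pixels were provided")
    sick = sum(1 for label in leaf_labels if label == "sick")
    return 100 * sick / len(leaf_labels)

def severity_class(percent):
    """Return the index of the matching range, or None if no range covers it."""
    p = round(percent)  # assumption: the ranges are read as whole percentages
    for index, (low, high) in enumerate(SEVERITY_RANGES):
        if low <= p <= high:
            return index
    return None

leaf = ["sick"] * 300 + ["healthy"] * 700
pct = severity_percent(leaf)
print(pct, severity_class(pct))
```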

Figure 5. Severity scale used to classify P. xanthii damage in cucumber leaves visually.

Results and discussion

K-nearest neighbors (K-NN). The maximum accuracy was 0.85 (Figure 6), with a slight tendency to decrease as the number of data points used as predictors (neighbors) increased. The proportion of the population used, which ranged from 40% to 100%, did not significantly affect the model’s accuracy.

Figure 6. Accuracy of K-NN models for P. xanthii estimation.

Cruz et al. (2020) used the K-NN method because of its success in accelerating disease detection in agricultural studies. In the present research, images of selected cucumber leaves with and without disease were used to classify the pixels. Guaillazaca and Hernández (2020) used this classification model to obtain a code that identified three product categories (good, regular, and bad product) by color and shape.

The nearest-neighbor technique is widely used to classify foliar diseases with high accuracy (Sarkar et al., 2023). Some studies using this algorithm have achieved an accuracy of 0.9 with color and texture parameters (Zhang and Wallace, 2015); in this analysis, an accuracy of 0.85 was obtained. One of the main problems in training machine learning models is the optimization of hyperparameters (Ghawi and Pfeffer, 2019): in K-NN models, the class is estimated from the Euclidean distance to the closest observations, and the number of observations considered affects the final accuracy of the model (Torgo, 2014).

In this research, approximately 10 data points were used to estimate the class of each of the sample data. Suganya et al. (2020) used this same technique, with an accuracy greater than 0.9; however, the images they used in training the model were taken under controlled lighting conditions. In contrast, in this work, the training images were obtained directly in the field, which generated a decrease in the accuracy of the model due to the highly variable lighting conditions.
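The K-NN decision rule described above (Euclidean distance in RGB space followed by a majority vote among the closest observations) can be sketched in a few lines of Python, for illustration only. The default of k = 10 follows the approximate value reported in the text; the function name and the toy pixels are assumptions, and this brute-force search is not the authors' R implementation.

```python
import math
from collections import Counter

def knn_predict(train_pixels, train_labels, query, k=10):
    """Classify `query` (R, G, B) by majority vote among its k nearest training pixels."""
    distances = sorted(
        (math.dist(pixel, query), label)
        for pixel, label in zip(train_pixels, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Toy training set: greenish 'healthy' pixels and whitish 'sick' pixels
train_pixels = ([(40 + i, 120 + i, 35) for i in range(10)]
                + [(200 + i, 200 + i, 180) for i in range(10)])
train_labels = ["healthy"] * 10 + ["sick"] * 10

pred_sick = knn_predict(train_pixels, train_labels, (210, 210, 190))
pred_healthy = knn_predict(train_pixels, train_labels, (50, 130, 45))
print(pred_sick, pred_healthy)
```

Because every query is compared with every training pixel, the cost grows with the size of the training set, which is one reason the number of neighbors and the training fraction are tuned together.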

Decision tree. The accuracy in the validation tests was 0.79 when 80% of the data was used, and accuracy increased with the size of the population. One of the hyperparameters that significantly influences the accuracy of the model is the number of branches (Fernández, 2023). In the present research, accuracy did not improve when this parameter was increased beyond seven branches (Figure 7); this behavior is consistent with Ramos (2020), who used five branches and obtained an accuracy of 0.84.

Figure 7. Effect of classification tree size on model accuracy for P. xanthii estimation.

In this regard, Olivares et al. (2021) used random forests to determine the development of the incidence of banana wilt, obtaining an accuracy of 0.74, so they did not consider it an efficient model. The main disadvantage of this method is the high computational cost required for its implementation (Alaminos, 2023).

In this research, the symptoms visible in cucumber leaves caused changes in the value of the RGB channels, compared to the values present in normal leaves, which were used by the model to classify the pixels (Figure 8); these results coincide with Velázquez et al. (2011), who report that powdery mildew in rose can be detected through the color space with images taken at a close distance for better accuracy.

Figure 8. Representation of the decision rules generated by the random forest algorithm, for the classification of healthy cucumber leaves and leaves with the presence of disease.

The results obtained show that the random forest model allowed us to achieve a satisfactory classification of the classes analyzed. This approach has previously been used by various researchers in data analysis (Flores et al., 2016) due to its ability to adapt to different types and scales of databases. Its main advantage lies in the fact that it does not require assuming a normal distribution and offers remarkable flexibility to model nonlinear relationships between predictor variables and target classes.

Random forests. The number of trees and the percentage of the population used in the training data affected the accuracy of the model (Figure 9); the highest values were obtained when 80% of the population was used to train the model. In general, the accuracy did not increase after 100 trees, with which a maximum accuracy of 0.9 was achieved.

Figure 9. Effect of the number of trees on the accuracy of a model of random forests with different population proportions.

This machine learning method uses groups of random trees to estimate the class to which each observation belongs; the number of trees is therefore an important control parameter that significantly affects the final accuracy with which each class is estimated (Sujatha et al., 2021). This agrees with the present analysis, in which accuracy increased with the number of trees until it stabilized.

Other authors have used random forests to find nonlinear relationships between variables to detect diseases in plants employing images; for example, Wójtowicz et al. (2021) used training data to generate a model based on random forests, obtaining success rates above 90%, which is similar to what is reported in this work.

Neural networks

The neural network structure with the highest accuracy had three hidden layers of two neurons each and was trained with 80% of the total available population (Figure 10).

Figure 10. Structure of the neural network used for estimation.

Neural networks have been widely used in disease detection; for example, Ma et al. (2018) and Larijani et al. (2019) used this technique to detect diseases in rice leaves with Lab color values as input variables, whereas the present study used the RGB color space, so no conversion was necessary. Sujatha et al. (2021) used different neural network architectures to detect viruses in plants with an accuracy greater than 0.95, which is much higher than the accuracy obtained with neural networks in this work; however, viruses produce a very pronounced chlorosis that is relatively easy to detect with models based on color changes.

In the specific case of cucumbers, Zhang et al. (2019) tested different systems based on neural networks to identify different diseases in cucumber leaves based on a set of images under relatively controlled conditions, with accuracies greater than 0.95 and with training times of 6 and 14 h. In this research, training times of between 60 and 85 min were obtained, but the number of classes was significantly lower, so less processing power was required.

Image preprocessing and the acquisition of the training data are among the factors that most affect the performance of the mathematical models (Li et al., 2022). The images used to generate the training dataset in this work were obtained under uncontrolled conditions, which produced a great diversity of color and shape in the parts of the image that did not represent the leaf.

Of all the machine learning methods tested in this study, the random forest method showed the highest accuracy, so this model was used to compare the human classification of plants on a hedonic scale with the severity reported by the model. The results showed a relationship between the class assigned by a person using the hedonic scale and the severity estimated by the model (Figure 11).

Figure 11. Comparison of estimated hedonic scale of P. xanthii and severity using the random forest classification model.

Conclusions

The results show that machine learning algorithms are an effective tool for high-precision estimation of the severity of leaf damage caused by Podosphaera xanthii in cucumber leaves. In particular, the model based on random forests achieved an accuracy of 90%, standing out for its generalizability and its robustness against variable lighting and capture conditions in the field. This automated approach represents a viable alternative to the traditional visual assessment method, bringing objectivity, reproducibility, and efficiency to phytosanitary monitoring.

The validation of the model using a hedonic scale reveals a significant correspondence between computational predictions and human assessments, which supports its practical application in integrated disease management programs. Based on these findings, several future lines of work are proposed. It is recommended to expand the dataset to improve the generalizability of the model, incorporating images of different cucumber varieties, diverse phenological phases, and heterogeneous environmental conditions.

Likewise, the integration of the model into mobile platforms will allow real-time diagnostics to be carried out, directly in the agricultural environment. These actions could significantly contribute to the adoption of artificial intelligence technologies in precision agriculture, improving timely disease detection and decision-making in the field.

Bibliography

Alaminos, F. A. F. 2023. Árboles de decisión en R con Random Forest. Obets Ciencia Abierta. Alicante: Limencop. 47-50 pp.

Benali, L.; Notton, G.; Fouilloy, A.; Voyant, C.; Dizene, R.; Boum, H.; Ene, E.; Alia, E.; Ezzouar, B. and Algiers, A. 2019. Solar radiation forecasting using artificial neural network and random forest methods: application to normal beam, horizontal diffuse and global components. Renewable Energy. 132:871-884. https://doi.org/10.1016/j.renene.2018.08.044.

Braun, U. and Cook, A. R. T. 2012. Taxonomic manual of the erysiphales (powdery mildews). Centraalbureau voor Schimmelcultures. 11. CBS Biodiversity: 86-644 pp.

Cipriano, G. R. y González, D. 2022. Identificación molecular de los tipos de compatibilidad en poblaciones de Podosphaera xanthii (Erysiphaceae) infectando cucurbitáceas en Veracruz, México. Acta Botánica Mexicana. 128(129):1-11. https://doi.org/10.21829/abm129.2022.2068.

Cruz, S. H.; Sanchez, M. G.; Rivera, C. J. P. and Avila, G. H. 2020. Identification of phenological stages of sugarcane cultivation using Sentinel-2 images. Applications in Software Engineering: Proceedings of the 9th International Conference on Software Process Improvement, CIMPS. 110-116 pp. https://doi.org/10.1109/CIMPS52057.2020.9390095.

Demirović, D. and Stuckey, P. J. 2021. Optimal decision trees for nonlinear metrics. 3733-3744. http://www.aaai.org.

Fernández, A. 2023. Árboles de decisión en R con Random Forest. Obets Ciencia Abierta. Alicante: limencop. 134 p.

Figueredo, A. A. y Ballesteros, R. J. 2016. Identificación del estado de madurez de las frutas con redes neuronales artificiales, una revisión. Ciencia y Agricultura 13(1):117-132. https://www.redalyc.org/journal/5600/560062814010/html/.

Flores, P. G.; López, I. F.; Kemp, P. D.; Dörner, J. y Zhang, B. 2016. Modelo de árbol de decisión: una herramienta para el manejo de la pradera. Agro Sur. 44(2):3-10. https://doi.org/10.4206/agrosur.2016.v44n2-02.

Ghawi, R. and Pfeffer, J. 2019. Efficient hyperparameter tuning with grid search for text categorization using kNN approach with BM25 similarity. Open Computer Science. 9(1):160-180. https://doi.org/10.1515/COMP-2019-0011.

Guaillazaca, G. C. A. y Hernández, A. V. 2020. Clasificador de productos agrícolas para control de calidad basado en machine learning e industria 4.0. Revista Perspectivas. 2(2):21-28. https://doi.org/10.47187/perspectivas.vol2iss2.pp21-28.2020.

Hassoun, M. H. 1995. Fundamentals of artificial neural networks book. MIT Press. Cambridge, Massachusetts. 417-452 pp. https://kupdf.net/download/48375906-fundamentals-of-artificial-neural-networks-book-1-598b1ef3dc0d601b67300d18-pdf.

Hernández, F. Y.; González, Z. E.; Marrero, T. A. y Dueñas, G. M. J. 2007. Uso de escala para determinar severidad de enfermedades fungosas en híbridos de pepino bajo cultivo protegido. INIFAT. 11(31):49-51.

Kaushik, H.; Khanna, A.; Singh, D.; Kaur, M. and Lee, H. N. 2023. TomFusioNet: a tomato crop analysis framework for mobile applications using the multi-objective optimization based late fusion of deep models and background elimination. Applied Soft Computing. 133:1-24. https://doi.org/10.1016/j.asoc.2022.109898.

Larijani, M. R.; Asli-Ardeh, E. A.; Kozegar, E. and Loni, R. 2019. Evaluation of image processing technique in identifying rice blast disease in field conditions based on KNN algorithm improvement by K-means. Food Science and Nutrition. 7(12):3922-3930. https://doi.org/10.1002/FSN3.1251.

Li, S.; Li, K.; Qiao, Y. and Zhang, L. 2022. A multi-scale cucumber disease detection method in natural scenes based on YOLOv5. Computers and Electronics in Agriculture. 202:1-12. https://doi.org/10.1016/J.COMPAG.2022.107363.

Ma, J.; Du, K.; Zheng, F.; Zhang, L.; Gong, Z. and Sun, Z. 2018. A recognition method for cucumber diseases using leaf symptom images based on deep convolutional neural network. Computers and Electronics in Agriculture 154:1-7. https://doi.org/10.1016/j.compag.2018.08.048.

Mohamed, Y. F.; Bardin, M. N. P. C. and Pitrat, I. 1995. Causal agents of powdery mildew of cucurbits in Sudan. Plant Disease. 79(6):635-636. https://www.apsnet.org/publications/plantdisease/backissues/Documents/1995Articles/PlantDisease79n06-634.

Morejón, G. N.; Coca, M. B. y Martínez, I. D. 2010. Mildiu polvoriento en las cucurbitáceas. Revista de Protección Vegetal. 25(1):44-50. https://revistas.censa.edu.cu/index.php/RPV/article/view/282.

Olivares, B. O.; Vega, A.; Angélica, M.; Calderón, R.; Rey, J. C. and Lobo, D. 2021. Classification of areas affected by banana wilt: an application with machine learning algorithms in Venezuela. REICIT: Revista especializada de ingeniería y ciencias de la tierra https://revistas.up.ac.pa/index.php/REICTORCID.

Otsu, N. 1978. A threshold selection method from gray level histogram. IEEE Transactions on Systems, Man and Cybernetics. 9(1):62-66.

Pacciorett, P. A.; Kurina, F. G. y Balzarini, M. G. 2020. Muestreo de sitios a escala regional para mapeo digital basado en propiedades de suelo. Ciencia del Suelo. 38(2):310-320. https://www.scielo.org.ar/scielo.php?script=sciarttext&pid=S185020672020000200310&lng=es&tlng=es.

Paymode, A. S. and Malode, V. B. 2022. Transfer learning for multi-crop leaf disease image classification using convolutional neural Network VGG. Artificial Intelligence in Agriculture. 6:23-33. https://doi.org/10.1016/j.aiia.2021.12.002.

Ramos, R. T. V.; Castillo, A. P. J.; Ticona, J. B. y Velasco, B. J. G. 2023. Predicción del éxito del telemarketing bancario mediante el uso de árboles de decisión. Innovación y Software. 4(1):122-137. https://doi.org/10.48168/innosoft.s11.a84.

Rocha, J. F. L.; Reyes, D. Y.; Días, L. E.; Francisco, F. N. y Juárez, C. J. A. 2023. El mildiu polvoriento en calabaza: identificación y manejo bajo las condiciones de Tehuacán, México. Cultivos Tropicales. 44(2): https://cu-id.com/2050/v44n2e09. https://ediciones.inca.edu.cu/index.php/ediciones/article/view/1731.

Sarkar, C.; Gupta, D.; Gupta, U. and Hazarika, B. B. 2023. Leaf disease detection using machine learning and deep learning: review and challenges. Applied Soft Computing. 145:1-61. https://doi.org/10.1016/J.ASOC.2023.110534.

Suganya, D. K.; Srinivasan, P. and Bandhopadhyay, S. 2020. H2K-A robust and optimum approach for detection and classification of groundnut leaf diseases. Computers and Electronics in Agriculture. 178. https://doi.org/10.1016/j.compag.2020.105749.

Sujatha, R.; Chatterjee, J. M.; Jhanjhi, N. Z. and Brohi, S. N. 2021. Performance of deep learning vs machine learning in plant leaf disease detection. Microprocessors and Microsystems. 80:103615. https://doi.org/10.1016/J.MICPRO.2020.103615.

Sun, Z.; Hu, S. Y. and Wen, Y. 2022. Biological control of the cucumber downy mildew pathogen Pseudoperonospora cubensis. Horticulturae. 8:1-15. MDPI. https://doi.org/10.3390/horticulturae8050410.

Torgo L. 2014. Data mining using R: learning with case studies (CRC) Press, Ed. Second. Minneapolis, Minnesota, USA. 87-165 pp. ISBN: 9781439810187.

Velázquez, L. N; Sasaki, Y.; Nakano, K.; Mejía, M. J y Romanchik, K. E. 2011. Detección de cenicilla en rosa usando procesamiento de imágenes por computadora. Revista Chapingo Serie Horticultura. 17(2):151-160. http://sourceforge.net/projects/opencvlibrary.

Wójtowicz, A.; Piekarczyk, J.; Czernecki, B. and Ratajkiewicz, H. 2021. A random forest model for the classification of wheat and rye leaf rust symptoms based on pure spectra at leaf scale. Journal of Photochemistry and Photobiology B. Biology. 223:1-11. https://doi.org/10.1016/j.jphotobiol.2021.112278.

Zhang, S.; Zhang, S.; Zhang, C.; Wang, X. and Shi, Y. 2019. Cucumber leaf disease identification with global pooling dilated convolutional neural network. Computers and Electronics in Agriculture 162:1-9. https://doi.org/10.1016/j.compag.2019.03.012.

Zhang, Y. and Wallace, B. 2015. A sensitivity analysis and practitioners’ guide to convolutional neural networks for sentence classification. 256-263 pp. http://arxiv.org/abs/1510.03820.

Zhao, Y. and Yang, L. 2023. Distance metric learning based on the class center and nearest neighbor relationship. Neural Networks. 164:631-644. https://doi.org/10.1016/j.neunet.2023.05.004.

Zapata, T. A.; Pérez, L. S. y Mora, F. J. 2014. Método basado en clasificadores k-NN parametrizados con algoritmos genéticos y la estimación de la reactancia para localización de fallas en sistemas de distribución. Revista Facultad de Ingeniería Universidad de Antioquia. 70:220-232. http://www.scielo.org.co/scielo.php?script=sci-arttext&pid=S012062302014000100021&lng=pt&tlng=es.
