elocation-id: e4108
Coffee is one of the most important agricultural crops in Mexico because of its economic and social relevance and the income it generates in rural areas; the country ranks fourteenth worldwide as a producer. To identify the economic factors that determine national coffee production, an econometric study was conducted for Mexico covering the 1960-2024 period. A multiple linear regression model was applied to time series obtained from official sources such as INEGI, SADER, OIC and Banxico. The dependent variable was coffee production, explained by the cultivated area, international prices, real input prices and production lags. The results show a high level of fit (R²= 0.974; Fcalculated= 207.72 > Ftabulated= 2.15), indicating that the selected variables significantly explain the variations in production. The cultivated area, lagged production and two- and four-period lagged prices positively influence production, while international prices and some price lags have negative effects. In conclusion, Mexican coffee production depends mainly on the cultivated area and on production in previous years, reflecting the temporal persistence characteristic of this perennial crop.
coffee production, economic factors, elasticity of supply, Mexican agricultural sector, multiple linear regression.
Coffee is one of the most important agricultural crops in Mexico due to its economic, social, cultural, and ecological relevance. It is grown in more than twelve states, with annual production ranging from 4 to 5 million quintales and more than 480 000 production units, mainly in rural regions of the southeast, according to INAES (2019) and the 2022 Agricultural Census (INEGI, 2023).
Worldwide, Brazil, Vietnam, Indonesia, Colombia and Ethiopia account for more than 70% of coffee production. Mexico ranks fourteenth, with around 2% of the world’s volume and 1% of the world’s exports (FAOSTAT, 2025).
Coffee production is mainly concentrated in Chiapas, Veracruz, Puebla, and Oaxaca, which together account for about 90% of the national total. It is estimated that some 500 000 coffee growers participate in the 15 producing states, which reflects its economic and social relevance (SENASICA, 2020; SIAP-SIACON, 2025).
According to the Secretariat of Agriculture and Rural Development (SADER, 2018), coffee generates more than 1.5 million indirect jobs, and three million Mexicans depend on this activity. Nonetheless, the sector faces challenges such as international price volatility, rising input costs, and structural limitations of the cultivated area (OIC, 2024).
Several authors have analyzed the economic determinants of agricultural supply, highlighting the influence of relative prices, input costs, and structural factors on production. Tomek and Kaiser (2014) explain that producers allocate resources between crops based on expected prices and production costs, which generates substitution or complementarity relationships between agricultural products.
In the case of Mexico, Martínez and Salinas (2004) found that the coffee supply has low short-term price elasticity, due to structural restrictions and the crop’s perennial nature. Likewise, Terrones-Cordero and Sánchez-Torres (2013) document that in the Mexican agricultural sector, internal and external prices, together with input use, maintain dynamic relationships that influence production decisions.
In the same study, Martínez and Salinas (2004) report that, although supply responds weakly to prices in the short term, fertilizer prices partially influence production. The authors also point out that land and labor are the most elastic factors in agricultural production, which justifies the inclusion of variables associated with input use and productive efficiency.
The need to understand the factors that determine coffee production has led to the use of econometric models that identify the magnitude and direction of the relationships between socioeconomic and productive variables. In this context, the question addressed here is: what are the determining factors of coffee production in Mexico?
The research consisted of analyzing the economic, productive, and structural factors that determine the coffee production function in Mexico, in order to identify the variables that significantly influence the behavior of the national coffee supply.
To this end, a multiple linear regression model was estimated to quantify the effects of the real price of bananas, the international price of coffee, the prices of agricultural inputs, the harvested area, lagged coffee prices and lagged production on production levels.
The aim was to generate empirical evidence that deepens understanding of the performance of the coffee-growing sector and serves as a basis for designing public policies aimed at strengthening the productivity and sustainability of coffee cultivation in Mexico.
Coffee production is framed within microeconomic theory in the agricultural field, where the producer maximizes profits by combining land, labor and inputs, and the supply depends positively on the product’s price and negatively on input costs (Tomek and Kaiser, 2014).
In perennial crops such as coffee, the adjustment is gradual due to the rigidity of the area and the maturation period of the plantations, so dynamic partial adjustment models are used, in which production depends on past prices and past production levels (Nerlove, 1958). Thus, the area and the price of coffee are expected to have positive effects, while input prices are expected to have a negative effect, with lagged production capturing the sector's productive inertia.
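The partial adjustment logic can be written compactly. The following is a textbook sketch of the Nerlove mechanism, not the equation estimated in this study; the symbols $Q_t^{*}$ (desired output), $\gamma$ (speed of adjustment) and the single lagged price $P_{t-1}$ are illustrative assumptions:

```latex
\begin{align}
Q_t^{*} &= a + b\,P_{t-1} + u_t, \\
Q_t - Q_{t-1} &= \gamma\,\bigl(Q_t^{*} - Q_{t-1}\bigr), \qquad 0 < \gamma \le 1, \\
Q_t &= a\gamma + b\gamma\,P_{t-1} + (1-\gamma)\,Q_{t-1} + \gamma\,u_t .
\end{align}
```

The coefficient on lagged output, $1-\gamma$, measures inertia: the closer it is to one, the more slowly production moves toward its desired level.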
The formulation of the classical linear regression model is based on the classical theory of statistical inference, which is divided into two main areas: estimation and hypothesis testing. The purpose of regression analysis is to understand the relationship between a dependent variable and one or more independent variables (Gujarati, 2012).
The purpose is to estimate or predict the average value of the dependent variable in the population, based on known values of the explanatory variables, considering that these come from repeated observations or samples.
The econometric model was constructed by integrating economic theory, empirical evidence and statistical tools. It is based on the agricultural supply theory and the partial adjustment model of Nerlove (1958), in which production depends on prices, costs, area and previous production, incorporating the dynamics of perennial crops.
Empirically, official time series (1960-2024) on the sector's production, real prices and costs for Mexico were used. Statistically, the model was estimated by ordinary least squares (OLS) multiple linear regression, with diagnostic tests to validate the significance and the assumptions of the model.
The data were obtained from the statistical yearbook of national apparent consumption of agricultural products of INEGI (CNAPA, by its Spanish acronym) for the period from 1960 to 1980, and from the Agrifood Information Consultation System (SIACON, by its Spanish acronym) of the General Directorate of the Agrifood and Fisheries Information Service for data from 1980 to 2024.
Likewise, the value of the dollar from 1960 to 2024 was obtained from the Bank of Mexico (Banxico), the minimum wage from the National Minimum Wage Commission (CONASAMI, by its Spanish acronym), and international prices of coffee and fertilizers from the Food and Agriculture Organization Corporate Statistical Database (FAOSTAT). Nominal monetary values were converted into real values using the national consumer price index (INPC), base 2024.
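The conversion to real values can be illustrated with a minimal sketch. It assumes the usual deflation rule (nominal value times the base-year index divided by the index of the same year); the function name `to_real` and all figures are hypothetical and are not taken from the INPC or the price series used in the study.

```python
def to_real(nominal, cpi, base_year):
    """Deflate nominal values to constant prices of `base_year`.

    `nominal` and `cpi` are dicts keyed by year.
    """
    base_index = cpi[base_year]
    return {year: value * base_index / cpi[year] for year, value in nominal.items()}


# Hypothetical illustrative figures (not actual INPC or coffee price data).
cpi = {2022: 80.0, 2023: 90.0, 2024: 100.0}
nominal_price = {2022: 4000.0, 2023: 4500.0, 2024: 5500.0}

# Prices expressed in constant 2024 pesos.
real_price = to_real(nominal_price, cpi, base_year=2024)
```

In this toy series the nominal price rises between 2022 and 2023 only as fast as the index, so the real price is unchanged between those two years.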
The time series covered the 1960-2024 period (65 annual observations). Nevertheless, because some explanatory variables were lagged by up to five periods, the first five observations (1960-1964) could not be used in estimating the model, as there were no prior values for their construction.
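The loss of the first five observations follows directly from how lagged regressors are built. The sketch below assumes a placeholder annual series (the values are hypothetical) and a helper named `add_lags`, which is not part of the original workflow:

```python
import math


def add_lags(series, max_lag):
    """Return (year, value, [lag1, ..., lag_max]) for years with a full lag history."""
    years = sorted(series)
    rows = []
    for i, year in enumerate(years):
        if i < max_lag:
            continue  # not enough earlier observations to build every lag
        lags = [series[years[i - k]] for k in range(1, max_lag + 1)]
        rows.append((year, series[year], lags))
    return rows


# Placeholder log series for 1960-2024 (hypothetical values, 65 observations).
log_price = {year: math.log(100.0 + (year - 1960)) for year in range(1960, 2025)}

rows = add_lags(log_price, max_lag=5)
first_year, last_year = rows[0][0], rows[-1][0]
n_effective = len(rows)
```

With 65 annual observations and a maximum lag of five, the usable sample starts in 1965 and contains 60 rows.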
The econometric model was estimated with 60 effective observations (1965-2024). The information obtained from each database and used in the original model is as follows:
SIAP-SIACON: coffee production (CP), coffee harvested area (CHA), coffee yield obtained (CYO), banana production (BP), banana harvested area (BHA), banana yield obtained (ROP), average rural price of coffee (ARPC), average rural price of bananas (ARPB), cocoa production (COCP), cocoa harvested area (COCHA), cocoa yield obtained (COCYO), average rural price of cocoa (ARPCOC).
FAOSTAT: price of urea fertilizer (PUF), international price of coffee (IPC). CONASAMI: general minimum wage (GMW).
Real values with the INPC: real price of coffee (REALPCOFFEE), real price of bananas (REALPBAN), real price of cocoa (REALPCOC) and real price of fertilizer (REALPRICEFERT).
A database covering 1960 to 2024 was structured for the main product (coffee), the complementary product (bananas), the substitute product (cocoa) and the input (urea fertilizer), including harvested area, production, average rural prices and the corresponding real prices.
To obtain the best model, the values were converted to logarithms and lags were added for production and real prices; when the regressions were run, the model was refined by discarding variables such as the price of cocoa, the international price and the minimum wage, among others.
The multiple linear regression model was estimated using SAS Studio via the SAS OnDemand for Academics platform, developed by SAS Institute. This environment enables descriptive analysis, hypothesis testing, regressions, multivariate models, and time series analysis to be performed with the specialized procedures of SAS/STAT® and SAS/ETS®, which are widely used in econometric research (SAS Institute Inc., 2023).
The model formulated and estimated is:
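Written out from the variables listed in the estimation statement, the specification can be expressed as follows. The coefficient symbols $\beta_0,\dots,\beta_9$, the time subscripts and the error term $\varepsilon_t$ are notational assumptions; the ordering follows the MODEL statement:

```latex
\begin{equation}
\begin{aligned}
LP_t ={}& \beta_0 + \beta_1\,LHA_t + \beta_2\,LREALPBAN_t + \beta_3\,LREALPUF_t + \beta_4\,LIP_t \\
        & + \beta_5\,LRPC_{t-2} + \beta_6\,LRPC_{t-3} + \beta_7\,LRPC_{t-4} + \beta_8\,LRPC_{t-5} \\
        & + \beta_9\,LP_{t-1} + \varepsilon_t
\end{aligned}
\end{equation}
```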
The estimation procedure used is:
    /* Lag variables are written with underscores because SAS names cannot contain hyphens */
    PROC REG DATA=LOG14;
      MODEL LP = LHA LREALPBAN LREALPUF LIP
                 LRPC_LAG2 LRPC_LAG3 LRPC_LAG4 LRPC_LAG5 LP_LAG1
                 / SPEC VIF TOL DW DWPROB;
      /* Save the OLS residuals for the diagnostic tests */
      OUTPUT OUT=RESIDUAL RESIDUAL=RES;
    RUN;
    /* Normality tests and plots for the residuals */
    PROC UNIVARIATE DATA=RESIDUAL PLOTS NORMALTEST;
      VAR RES;
    RUN;
The dependent variable is LP, the logarithm of coffee production. The independent variables included in the final model are the logarithm of the harvested area (LHA), the logarithms of the lagged prices of coffee (LRPC-LAG2, LRPC-LAG3, LRPC-LAG4, LRPC-LAG5), the logarithm of lagged production (LP-LAG1), the logarithm of the real price of bananas (LREALPBAN), the logarithm of the real price of fertilizer (LREALPUF) and the logarithm of the international price (LIP).
To validate the reliability of the multiple linear regression model, various statistical tests were applied. The goodness of fit was evaluated using the R² and adjusted R² coefficients, which showed a high explanatory level for coffee production, confirming the adequacy of the model (Montgomery et al., 2012).
The individual and global significance of the parameters was verified using Student’s t-test and Fisher’s F-test, demonstrating that the explanatory variables exert statistically significant effects on production.
Likewise, residual diagnostic tests (Shapiro-Wilk, Kolmogorov-Smirnov and Durbin-Watson) were applied to assess the assumptions of normality and absence of autocorrelation in the errors, on which the validity of the inferences depends (Montgomery et al., 2012).
Finally, multicollinearity was assessed using the Variance Inflation Factor (VIF) in order to gauge the stability and precision of the estimated coefficients (Montgomery et al., 2012).
Because the estimated model includes a lagged dependent variable (LP-LAG1), the Durbin-Watson statistic is not appropriate for evaluating serial autocorrelation: its distribution is affected by the presence of dynamic regressors.
For this reason, the autocorrelation of the residuals was evaluated using the Breusch-Godfrey (LM) test, which tests for higher-order autocorrelation in models that include lagged variables. This test remains valid under dynamic specifications and is the recommended procedure in regression models with an autoregressive structure (Gujarati and Porter, 2012; Wooldridge, 2013).
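In its standard textbook form, the Breusch-Godfrey test regresses the OLS residuals on the original regressors and on $p$ lagged residuals. The notation below ($\hat{u}_t$ for the residuals, $\mathbf{x}_t$ for the regressor vector, $R^2_{aux}$ for the fit of the auxiliary regression) is assumed for illustration and does not appear in the original:

```latex
\begin{align}
\hat{u}_t &= \alpha_0 + \mathbf{x}_t'\boldsymbol{\alpha}
            + \rho_1\,\hat{u}_{t-1} + \dots + \rho_p\,\hat{u}_{t-p} + v_t, \\
H_0 &: \rho_1 = \dots = \rho_p = 0, \\
LM &= (n-p)\,R^2_{aux} \;\overset{a}{\sim}\; \chi^2_{p} .
\end{align}
```

A value of $LM$ above the $\chi^2_p$ critical value leads to rejecting the null hypothesis of no autocorrelation up to order $p$.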
Since the study uses long-term time series (1960-2024), the stationarity of the variables was evaluated using the augmented Dickey-Fuller (ADF) and Phillips-Perron (PP) unit root tests. Both tests take the existence of a unit root, that is, non-stationarity, as the null hypothesis.
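For reference, the ADF test in its most general form is based on the regression below. The symbols ($\mu$ for the intercept, $\delta$ for the trend coefficient, $k$ for the number of lagged differences) are assumed notation, and the lag length actually used in the study is not stated in the text:

```latex
\begin{align}
\Delta y_t &= \mu + \delta\,t + \gamma\,y_{t-1} + \sum_{i=1}^{k} \phi_i\,\Delta y_{t-i} + \varepsilon_t, \\
H_0 &: \gamma = 0 \ \text{(unit root)} \qquad \text{vs.} \qquad H_1 : \gamma < 0 \ \text{(stationarity)} .
\end{align}
```

The zero-mean variant sets $\mu = \delta = 0$, and the intercept-only variant sets $\delta = 0$.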
The results indicated that coffee production in logarithms (LP) is not stationary in levels, since, in all specifications (mean zero, intercept, and trend), the null hypothesis of a unit root (p > 0.1) was not rejected. This behavior is consistent with long-term macroeconomic and agricultural series.
However, after the model was estimated in logarithmic levels, the ADF test was applied again to the residual series. In this case, the results showed a strong rejection of the null hypothesis of a unit root (p < 0.001), indicating that the residuals are stationary.
According to the approach of Engle and Granger (1987), if the variables are I(1), but the residuals are stationary, there is cointegration, which confirms a long-term equilibrium relationship and rules out spurious regression; therefore, the estimation in logarithmic levels is consistent. Likewise, the inclusion of lags is based on the partial adjustment model of Nerlove (1958), which explains the gradual adjustment of production to price changes, especially in perennial crops such as coffee, where the effects are distributed over time.
The data set for the multiple linear regression model contained 65 observations, of which 60 were used in the estimation after excluding the five cases with missing lagged values. The analysis of variance (ANOVA) showed that the model has nine degrees of freedom and a sum of squares of 44.00693.
The Fcalculated value (207.72), which was higher than the Ftabulated value (2.15), confirmed the overall significance of the model, indicating that the explanatory variables collectively have a statistically significant effect on coffee production in Mexico.
In Table 1, the coefficient of determination (R²= 0.974) and its adjusted version (adjusted R²= 0.9693) indicate that 97.4% of the variability in coffee production is explained by the model’s independent variables, reflecting a high explanatory capacity and an adequate level of fit. This level of accuracy is supported by a coefficient of variation of 1.12% and a root mean square error (RMSE) of 0.15343, values that confirm the consistency and reliability of the estimates.
According to Montgomery et al. (2012), results of this magnitude are characteristic of well-specified models that are capable of adequately representing the phenomenon analyzed.
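The reported fit statistics can be cross-checked against one another with the usual OLS identities. The sketch below uses only figures quoted in the text; the variable names are illustrative, and small differences arise from rounding in the published values.

```python
# Figures quoted in the text.
n_obs = 60            # effective observations (1965-2024)
k_regressors = 9      # model degrees of freedom
model_ss = 44.00693   # model sum of squares from the ANOVA
rmse = 0.15343        # root mean square error
r2 = 0.974            # coefficient of determination

# Error degrees of freedom: n - k - 1 (one is lost to the intercept).
df_error = n_obs - k_regressors - 1

# F = (model mean square) / (error mean square), with MSE = RMSE^2.
mse = rmse ** 2
f_stat = (model_ss / k_regressors) / mse

# Adjusted R^2 penalizes the fit for the number of regressors.
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / df_error
```

Both quantities come out very close to the reported values (F of about 207.7 and adjusted R² of about 0.969), so the published statistics are internally consistent.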
Serial autocorrelation was evaluated using a fourth-order autoregressive specification (AUTOREG), with no statistical significance found in the autoregressive terms (p > 0.05), indicating the absence of autocorrelation up to that order. The stability of the coefficients and the high fit of the model (R²= 0.974) confirmed its consistency.
Although the Shapiro-Wilk test rejected the normality of the residuals (p = 0.002), other tests such as the Kolmogorov-Smirnov, Cramer-von Mises and Anderson-Darling tests did not show significant deviations.
Moreover, given the sample size (n= 60), the OLS estimators retain consistency and asymptotic normality under the Central Limit Theorem. Therefore, the statistical inference remains valid in approximate terms.
The collinearity is explained by the structural relationship between harvested area and production, which is expected in perennial crops such as coffee, where production depends on the cultivated area. It is therefore a characteristic of the sector rather than an anomaly. The fit indicators, a high R² together with a low residual variance, point to solid explanatory capacity by the criteria commonly applied in agricultural analysis.
The estimated coefficients reported in Table 2 revealed both positive and negative effects on coffee production in Mexico. The variables with a positive and statistically significant influence were the harvested area (LHA= 1.6875; p < 0.0001), the two-period (LRPC-LAG2= 0.2154; p = 0.0114) and four-period lagged prices (LRPC-LAG4= 0.5175; p < 0.0001), as well as lagged production (LP-LAG1= 0.6119; p < 0.0001).
The literature on agricultural supply indicates that the relative prices of alternative crops and input costs influence production decisions (Tomek and Kaiser, 2014; OECD-FAO, 2023).
These results confirm that the expansion of the cultivated area and the persistence of production are the main drivers of growth in the sector, in accordance with SAGARPA (2017).
On the contrary, negative effects were identified for the real price of bananas (LREALPBAN= -0.2363; p = 0.0145), the international price of coffee (LIP = -0.0275; p = 0.0188), and the three-period lagged price (LRPC-LAG3= -0.2690; p = 0.0187), which suggests contractionary responses of supply to price increases, typical behavior of perennial crops.
The international price coefficient (LIP) was negative and significant, contrary to the classical theory of supply. This may be due to imperfect price transmission, adjustment lags in perennial crops or interrelationships between variables. Therefore, it should be interpreted as evidence conditioned by the model and not as a structural relationship. Factors such as high multicollinearity (VIF > 10) and possible endogeneity between production and price under OLS may explain this result.
Variance inflation factor (VIF) values indicate the presence of high multicollinearity among some explanatory variables, particularly in LHA (VIF= 26.3). This level cannot be considered moderate, but high, which implies possible effects on the stability of the individual coefficients and inflation of their standard errors. Nonetheless, multicollinearity does not bias OLS estimators; rather, it reduces their precision.
Given that the estimated signs are economically coherent in most cases and that the overall significance of the model is high (F= 207.72; p < 0.001), it was decided to maintain the original specification for theoretical consistency.
Moreover, in open agricultural markets, the transmission of international prices to producers is neither immediate nor complete, which can distort the contemporaneous relationship. Consequently, the negative sign should be interpreted with caution, as a statistical relationship conditioned by the dynamic specification and structure of the market, and not as a contradiction of supply theory.
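What a VIF of 26.3 implies can be made concrete with the definition VIF = 1 / (1 − R²ⱼ), where R²ⱼ is the fit of an auxiliary regression of the variable on the other regressors. The sketch below only rearranges that identity; the variable names are illustrative.

```python
import math

vif_lha = 26.3  # VIF reported for the harvested area (LHA)

# Share of the variance of LHA explained by the remaining regressors.
aux_r2 = 1 - 1 / vif_lha

# Tolerance is the reciprocal of the VIF.
tolerance = 1 / vif_lha

# Factor by which the standard error of the LHA coefficient exceeds
# the value it would have if LHA were uncorrelated with the other regressors.
se_inflation = math.sqrt(vif_lha)
```

Under this identity, roughly 96% of the variation in LHA is shared with the other regressors, and the standard error of its coefficient is about five times larger than it would be without collinearity.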
Likewise, marginally significant variables were observed, such as the real price of fertilizer (LREALPUF= 0.0821; p = 0.058) and the five-period lagged price (LRPC-LAG5= -0.1633; p = 0.0967), which provide additional information on production expectations.
The coefficient of the real price of fertilizer (LREALPUF) was positive and marginally significant, contrary to what was theoretically expected. This may reflect a correlation with phases of agricultural expansion rather than a causal effect, where higher input prices coincide with higher demand and production. In addition, the absence of variables on technification or regional segmentation limits interpretation, so the result should be taken with caution and not as a positive structural relationship.
Since the variables were expressed in natural logarithms, the coefficients represent elasticities. In this sense, a 1% increase in the harvested area raises production by about 1.69%, other things being equal, which confirms its structural role as the main engine of coffee growth. The price of bananas showed a negative relationship with production, indicating competition for productive resources between the two crops in tropical areas, where farmers can partially substitute coffee with bananas in response to profitability increases (Tomek and Kaiser, 2014).
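The elasticity reading of a log-log coefficient is an approximation that holds for small changes. The sketch below computes the exact implied change for three of the reported coefficients; the helper `pct_change_in_output` is a hypothetical name, and the calculation holds all other regressors fixed.

```python
def pct_change_in_output(elasticity, pct_change_in_x):
    """Exact percent change in Y implied by a log-log coefficient."""
    return ((1 + pct_change_in_x / 100) ** elasticity - 1) * 100


# Coefficients reported in the text.
coefficients = {"LHA": 1.6875, "LREALPBAN": -0.2363, "LIP": -0.0275}

# Implied response of production to a 1% and a 10% increase in each regressor.
effects_1pct = {name: pct_change_in_output(b, 1.0) for name, b in coefficients.items()}
effects_10pct = {name: pct_change_in_output(b, 10.0) for name, b in coefficients.items()}
```

For a 1% change the exact response is practically equal to the coefficient itself, while for a 10% change in harvested area the implied response is somewhat larger than ten times the coefficient, because the relationship is multiplicative.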
The price of fertilizer exhibited a positive, albeit marginal, elasticity, associated with an intensification of production in areas with greater technification. This behavior aligns with what was reported by FAO (2023), pointing out that increases in input costs do not necessarily reduce production, as some producers compensate through technological improvements or greater efficiency.
For its part, the international price of coffee had a negative, low-magnitude effect, which shows partial price transmission to the domestic market due to the presence of intermediaries and local contracts that limit producers’ responsiveness to international quotations (Valencia, 2023).
Lagged prices showed a mixed pattern: positive effects in the two- and four-period lags and negative effects in three- and five-period lags. This behavior reveals that producers adjust their decisions according to past prices, generating intertemporal cycles of expansion and contraction characteristic of the coffee market.
Finally, lagged production (LP-LAG1= 0.6119) showed a positive and highly significant elasticity, confirming the existence of temporal persistence or productive inertia. This reflects that current production levels depend on previous volumes, a phenomenon typical of perennial crops, where the renewal of plantations occurs gradually (Rendón, 2016; CENICAFE, 2022).
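Read through the partial adjustment framework, the coefficient on lagged production also gives a rough idea of how quickly production adjusts. The sketch below is a mechanical illustration that assumes a simple first-order adjustment process; these derived quantities are not reported in the study and, given the additional price lags and the high collinearity, should be treated as indicative only.

```python
import math

persistence = 0.6119  # coefficient on LP-LAG1 reported in the text

# Share of the gap between desired and actual production closed each year.
adjustment_speed = 1 - persistence

# Years needed for half of a one-off shock to production to dissipate.
half_life_years = math.log(0.5) / math.log(persistence)

# Factor that scales a short-run coefficient into its long-run counterpart.
long_run_multiplier = 1 / adjustment_speed
```

Under these assumptions, a little under 40% of any gap is closed each year and half of a shock fades in about a year and a half, which is consistent with the gradual renewal of plantations described above.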
Overall, the results confirm that coffee production in Mexico responds mainly to structural factors, such as cultivated area and productive inertia, rather than to price stimuli. The coffee supply shows rigidity in the face of market variations, adapting slowly to economic conditions.
This behavior coincides with the empirical evidence reported by García (2004), Martínez and Salinas (2002), and Terrones-Cordero and Sánchez-Torres (2013), who highlight the dependence of production on physical factors and the low elasticity with respect to international prices.
The results obtained using ordinary least squares indicate that relative prices, input costs and production lags are significantly associated with Mexican coffee production during the 1960-2024 period. The estimated model shows high explanatory capacity (R²= 0.974) and global significance, suggesting internal coherence in the adopted specification.
Methodological limitations associated with high multicollinearity in some explanatory variables and the absence of strict normality of the residuals are identified, so the results should be interpreted with caution. In this sense, the findings constitute empirical evidence consistent with the theoretical framework of agricultural supply under the proposed dynamic structure; however, their scope is descriptive-conditional and does not imply definitive structural validation of the economic relations analyzed.