Abstract
Historically, Greece has faced harmful concentrations of air pollutants in several of its cities, partly due to rudimentary air pollution measurement and prevention systems. In this study, the Autoregressive Integrated Moving Average (ARIMA) model and Facebook Prophet were used to produce long-term air pollution forecasts for Athens, Greece, focusing solely on ozone (O3). Current air pollution forecasting methods mostly require large, continuously updated databases or complex meteorological parameters, making forecasting challenging. The dataset used in this study, containing three years of air quality data, was sourced from a dataset published for the 2024 IEEE International Geoscience and Remote Sensing Symposium (IEEE IGARSS 2024). To address these issues, this study explores and tests the value of meteorological parameters in the form of exogenous variables. With the best pre-processing settings, ARIMA and Prophet produced RMSEs of 3.3 μg/m3 and 2.7 μg/m3, MAEs of 2.8 μg/m3 and 2.3 μg/m3, and MAPEs of 5.4% and 4.6%, respectively. With the addition of exogenous meteorological variables and the best pre-processing settings, the best ARIMA and Prophet models produced RMSEs of 3.9 μg/m3 and 1.2 μg/m3, MAEs of 3.4 μg/m3 and 1.0 μg/m3, and MAPEs of 6.6% and 1.9%, respectively. In summary, relative to their respective RMSEs, ARIMA and Prophet showed only a marginal improvement that did not justify the increased data requirement. For future applications of time series forecasting, models that do not require meteorological parameters are recommended.
Introduction
Air pollution is the presence of toxic chemicals and compounds harmful to human health in a given environment. Since 2021, air pollution has been the second leading risk factor for death, with a reported 8.1 million deaths attributed to air pollution in 2021 alone1. In Greece, air pollution has already made a sizeable impact on the health of its residents. On average, Greek citizens are expected to lose 0.67 years of life expectancy2 due to air pollution levels exceeding World Health Organization (WHO) standards3. In recent years, Greece has also faced wildfires that, in addition to health risks, contribute to increased air pollutants such as PM2.5, PM10, O3, and CO2. Amidst the risk of wildfires and other natural disasters, more effective detection and forecasting methods must be introduced.
The monitoring and forecasting of air pollution have been heavily studied in the past decade. Nevertheless, current monitoring and prediction models have major flaws that limit their scalability to more diverse regions. Models like the United States Environmental Protection Agency's Community Multiscale Air Quality Modelling System (CMAQ) or the Weather Research and Forecasting (WRF) model rely on an extensive list of input sources that requires frequent updates to model air pollution accurately, so regions with little access to sufficient monitoring stations cannot implement them. In addition, approaches such as CMAQ cannot accurately model air pollutants in certain regions because complex topographical characteristics such as terrain and altitude influence wind speed, generating bias in concentrations across regions and inaccurate measurement of air pollutant data4. These limitations have prompted several studies and projects applying machine learning to forecast air pollution, as such models can predict values using exclusively historical data, eliminating the need for a constantly updated database. Studies have found that deep neural network models like the Long Short-Term Memory model (LSTM)5,6 produced accurate long-term and short-term air quality predictions. However, some deep-learning-based models still require real-time air pollutant and meteorological data. Microsoft established the Urban Air project in 2012, which uses machine learning models to monitor and forecast AQI up to 48 hours into the future; however, this model relies on inference from both real-time and historical data7.
The objective of this study is to use simpler time-series models, without an exhaustive list of hyperparameters, to predict ozone with acceptable precision over a near-term horizon (8 months). We explore the ARIMA and Prophet time-series models, including the effect of exogenous variables such as temperature, vegetation, and humidity.
Dataset
We used an air pollutant, meteorological, and environmental dataset covering European cities (Angelis, 2024), specifically the measurements for Athens, Greece. The measurements were collected hourly from May 1st, 2020, to May 29th, 2023; however, we resampled the data to average daily measurements for this study. Out of the 19 columns provided by the dataset, we use thirteen: Wind-Speed (U), Wind-Speed (V), Dew-Point Temp, Soil Temp, Total Precipitation, Vegetation (High), Vegetation (Low), Temp, Relative Humidity, PM10, PM2.5, NO2, and O3. Wind-Speed (U) is the wind component parallel to the longitude axis in meters per second (m/s), and Wind-Speed (V) is the wind component parallel to the latitude axis (m/s). Dew-Point Temp is the temperature at which the air can hold no more water vapor. Temp is the air temperature in Celsius (°C). Vegetation (High) and Vegetation (Low) measure high-level and low-level plant cover, respectively. Soil Temp is the average soil temperature in Celsius (°C). Total Precipitation is the flux of water equivalent (rain or snow) reaching the land surface (mm). Relative Humidity is the actual amount of water vapor in the air compared to the maximum amount that can exist at the current temperature (%).
Exploratory Data Analysis
This section describes the pre-processing methods used for an empirical analysis of the Athens dataset. First, we smooth the daily resampled O3 data to make it usable for empirical analysis. Then, a time-series decomposition is performed to analyse the dataset's trend and seasonality. Following this, a Pearson correlation matrix of the dataset is created to identify correlations with O3. Finally, we create several scatterplots to understand the relationship between O3 and other meteorological data.
First, we tested the daily resampled O3 data for stationarity using the Augmented Dickey-Fuller (ADF) test. It produced a statistic of -2.002546 with a p-value of 0.285458; the critical values were -3.436 at 1%, -2.864 at 5%, and -2.568 at 10%. The high p-value means the null hypothesis of a unit root cannot be rejected, so we concluded that the original resampled O3 data was not stationary. Since ARIMA models require the data to be stationary, this violates a key assumption for their application, and additional preprocessing, such as differencing or smoothing, is necessary to ensure stationarity before ARIMA can be appropriately utilized. After smoothing the daily resampled O3 data with a Weighted Moving Average (WMA), the test showed an ADF statistic of -5.217405 with a p-value of 0.000008 and very similar critical values of -3.437 at 1%, -2.864 at 5%, and -2.568 at 10%. Hence, we concluded that the smoothed resampled O3 data was stationary.
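As a minimal sketch, the stationarity check can be reproduced with statsmodels' `adfuller`; here `df` is assumed to be the hourly Athens DataFrame with an "O3" column (names are illustrative), and a simple 90-day rolling mean stands in for the WMA smoothing described later.

```python
# Stationarity check sketch: `df` is assumed to be the hourly dataset with an "O3" column.
import pandas as pd
from statsmodels.tsa.stattools import adfuller

o3_daily = df["O3"].resample("D").mean()            # resample to daily averages
o3_smooth = o3_daily.rolling(window=90).mean()      # stand-in for the WMA used in the study

for name, series in [("raw", o3_daily), ("smoothed", o3_smooth)]:
    result = adfuller(series.dropna())
    print(f"{name}: ADF statistic = {result[0]:.3f}, p-value = {result[1]:.6f}")
    print(f"  critical values: {result[4]}")
```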
Next, we decomposed our time-series O3 data to understand its trend and seasonality. Figure 1 illustrates a time series decomposition of the resampled and smoothed O3 data. We used a moving average with a window size of 90 days to smooth the O3 series. We applied the same time series decomposition to temperature, soil temperature, relative humidity, and high and low vegetation, observing similar trends and seasonality. The graph shows a roughly annual cycle in O3, peaking between August and October.
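A sketch of this decomposition with statsmodels is shown below; `o3_daily` is the daily-resampled series from the previous snippet, and a yearly period (365 days) is assumed for the seasonal component.

```python
# Trend/seasonality decomposition of the 90-day-smoothed O3 series (yearly period assumed).
from statsmodels.tsa.seasonal import seasonal_decompose

o3_smooth = o3_daily.rolling(window=90).mean().dropna()
decomposition = seasonal_decompose(o3_smooth, model="additive", period=365)
decomposition.plot()
```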
Figure 2 illustrates the autocorrelation and partial autocorrelation of the resampled and smoothed O3 data, used to determine the number of lag days needed for training the models. We found that the O3 autocorrelation stayed above the confidence interval (the shaded blue area) out to roughly 55 days, and the partial autocorrelation out to roughly eight days. We applied the same method to temperature, soil temperature, relative humidity, and high and low vegetation, which produced very similar values of 50 – 60 days and 6 – 10 days for autocorrelation and partial autocorrelation, respectively.
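The ACF/PACF inspection can be reproduced roughly as follows (the lag counts shown are illustrative):

```python
# Autocorrelation / partial autocorrelation plots used to bound the ARIMA lag search.
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
plot_acf(o3_smooth, lags=80, ax=axes[0])     # ACF stays significant out to ~55 lags
plot_pacf(o3_smooth, lags=30, ax=axes[1])    # PACF cuts off around lag 8
plt.show()
```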
Figure 3 illustrates the correlation heatmap between numeric features. We noted strong negative correlations of O3 with relative humidity and with NO2, and strong positive correlations of O3 with temperature and with soil temperature; we chose to explore these strong positive and negative correlations further. Both high and low vegetation show a slight positive correlation with O3. Two features, Soil Temperature and Dew-Point Temperature, are strongly related to Temperature, as indicated by their Pearson correlation coefficients of 0.90 and 0.66, respectively. These strong correlations suggest the potential presence of collinearity in the dataset.
Figure 4 illustrates the relationships between O3 and relative humidity, temperature, soil temperature, and low and high vegetation. Figure 4a (top left) illustrates a strong negative correlation between O3 and relative humidity. Figure 4b (top right) illustrates a positive correlation between O3 and temperature; it also shows that areas with temperatures between 12 °C and 21 °C have the most vegetation. Figure 4c shows a slight positive correlation between O3 and vegetation; it also shows that areas with a vegetation index of 1.350 – 1.425 have the most O3 and the highest temperature. Figure 4d illustrates a relatively strong positive correlation between O3 and soil temperature, and shows that soil temperature is negatively correlated with relative humidity. Overall, the figures underscore that more humid conditions are associated with lower O3, while higher-temperature environments generally have higher O3 and more vegetation.

Methods and Models
Data Pre-processing
In this study, we built the models using both the original resampled O3 data and several smoothed versions of it. The data was smoothed with a Weighted Moving Average (WMA) using three window sizes: 30 days, 60 days, and 90 days. This smoothing is needed to create a stationary series for training while reducing noise and making the trend and seasonality more pronounced in the predictions.
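The exact weighting scheme is not specified in the text; the sketch below assumes linearly increasing weights so that more recent days count more.

```python
# Weighted moving average smoother (linearly increasing weights assumed).
import numpy as np

def weighted_moving_average(series, window):
    weights = np.arange(1, window + 1)            # heavier weight on more recent days
    return series.rolling(window).apply(
        lambda values: np.dot(values, weights) / weights.sum(), raw=True
    )

o3_wma_30 = weighted_moving_average(o3_daily, 30)
o3_wma_60 = weighted_moving_average(o3_daily, 60)
o3_wma_90 = weighted_moving_average(o3_daily, 90)
```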
We implemented different time series approaches to model and predict O3 in the near future. All models use time series forecasting, in which historical data is used to produce predicted values over a set horizon. This study uses data from June 2021 to June 2023, with the last eight months (November 2022 – June 2023) reserved for testing so that observed and predicted values can be compared.
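A minimal version of this split, assuming the 90-day-smoothed series from the previous sketch and an illustrative cut-off date:

```python
# Hold out the final eight months (November 2022 – June 2023) as the test set.
import pandas as pd

split_date = pd.Timestamp("2022-11-01")           # illustrative cut-off
smoothed = o3_wma_90.dropna()
train = smoothed[smoothed.index < split_date]
test = smoothed[smoothed.index >= split_date]
```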
Models and Evaluation Metrics
We first used the Autoregressive Integrated Moving Average (ARIMA) time series model. As the name suggests, ARIMA can be split into three parts. The autoregressive (AR) component models each value as a linear combination of the values at previous time steps. The integrated (I) component applies differencing to keep the time series stationary, meaning that the mean, variance, and autocorrelation are constant over time. Lastly, the moving average (MA) component models the error term as a linear combination of previous forecast errors.
Moreover, when implementing ARIMA, the function takes three parameters: p, d, and q. p denotes the order of the autoregressive part, which we inferred directly from our empirical analysis of the autocorrelation in Figure 2 (a). d is the number of times the raw observations are differenced, which is required to make the time series stationary; in this study, very little differencing was needed because the data was stationary after smoothing. q is the order of the moving average part, which we inferred from the partial autocorrelation in Figure 2 (b).
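As a sketch, a single ARIMA configuration can be fitted with statsmodels as follows; the order shown is the study's reported setting for the 90-day-smoothed series, and `train`/`test` carry over from the split above.

```python
# Fit one ARIMA(p, d, q) configuration and forecast the held-out eight months.
from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(train, order=(6, 0, 6))             # (p, d, q) for the 90-day-smoothed data
fitted = model.fit()
forecast = fitted.get_forecast(steps=len(test))
predicted_mean = forecast.predicted_mean          # point forecast
conf_int = forecast.conf_int()                    # confidence interval shown in the plots
```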
In this study, we used RMSE (Root Mean Square Error), MAE (Mean Absolute Error), and MAPE (Mean Absolute Percentage Error) to evaluate model performance in predicting air quality dynamics in Greece. RMSE was chosen for its sensitivity to large errors, which are critical in time series and environmental forecasting, while MAE provided an intuitive measure of average prediction accuracy. MAPE complemented these metrics by offering a percentage-based evaluation, allowing for consistent comparisons across varying scales of air quality data. Together, these metrics ensured a comprehensive and clear assessment of model reliability and accuracy.
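For reference, the three metrics can be computed directly:

```python
# Evaluation metrics used throughout the study.
import numpy as np

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

def mae(y_true, y_pred):
    return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # percentage
```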
Hyperparameter Tuning
We performed hyperparameter tuning on all ARIMA models, testing a wide range of parameter values on the training dataset to find those that produce the most accurate model. The ranges of test values were taken from the autocorrelation and partial autocorrelation in Figure 2 and evaluated exhaustively in a grid-search program. Taking the last value above the confidence interval, we estimated a maximum ACF-based p value of 50 and a maximum PACF-based q value of 10. The d value was either 1 or 0, depending on whether the smoothed or unsmoothed data was stationary. The output showed the MSE and RMSE values for all tested orders of the p, d, and q parameters, along with plots of the predicted O3 values; we took the order with the lowest RMSE as the best result. In the graphs of each ARIMA model prediction, there are four distinct aspects displayed, labelled observed, train prediction, test prediction, and confidence interval. The observed aspect, coloured in blue, shows the actual data as a line plot across the span of several years. The train prediction, coloured in red, shows the model's prediction of the training data, and the test prediction, coloured in green, shows the model's average prediction of the test data; both are overlaid on the observed line to visualize the accuracy of each prediction. Lastly, the confidence interval, denoted by the grey area, gives the range in which the model estimates the values to lie.
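A simplified version of this grid search is sketched below; the p range is stepped here for brevity, whereas the study's search was exhaustive within the ACF/PACF bounds.

```python
# Grid search over (p, d, q), keeping the order with the lowest test RMSE.
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

best_order, best_rmse = None, np.inf
for p, d, q in itertools.product(range(0, 51, 5), [0, 1], range(0, 11)):
    try:
        fit = ARIMA(train, order=(p, d, q)).fit()
        pred = fit.get_forecast(steps=len(test)).predicted_mean
        score = rmse(test, pred)
        if score < best_rmse:
            best_order, best_rmse = (p, d, q), score
    except Exception:
        continue                                   # skip orders that fail to converge
print(best_order, best_rmse)
```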
Despite its compatibility with our chosen data, the ARIMA model still has several limitations. An ARIMA model requires the data to remain stationary throughout the dataset, without any change in mean or variance. Additionally, ARIMA assumes that the relationship between current and past values is linear. Most importantly, ARIMA does not consider the data's seasonality or any external factors, and therefore cannot capture the variables that drive changes in the O3 values or the patterns they follow.
Hence, to address ARIMA's weaknesses, we also used ARIMAX as the second model to predict O3 in the near future. In addition to the ARIMA process, ARIMAX uses exogenous parameters to make more accurate forecasts. We chose the exogenous parameters by observing the relationship each variable in the dataset had with O3. Based on the EDA, we narrowed the possible exogenous parameters to four specific features: temperature, soil temperature, vegetation, and relative humidity. Figure 4 visualizes these relationships.
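In statsmodels, the same ARIMA class accepts an exogenous matrix, so an ARIMAX sketch only needs the extra columns; `df_daily` is assumed to be the daily-resampled dataset and the column names are illustrative. Future exogenous values must be supplied at forecast time (here taken from the held-out rows).

```python
# ARIMAX sketch: ARIMA with exogenous regressors (column names illustrative).
from statsmodels.tsa.arima.model import ARIMA

exog_cols = ["Temp", "Soil Temp", "Vegetation (High)", "Relative Humidity"]
exog_train = df_daily.loc[train.index, exog_cols]
exog_test = df_daily.loc[test.index, exog_cols]

arimax = ARIMA(train, exog=exog_train, order=(37, 0, 9)).fit()
arimax_pred = arimax.get_forecast(steps=len(test), exog=exog_test).predicted_mean
```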
However, ARIMAX still has a few limitations that are crucial to forecast accuracy. Like ARIMA, it does not account for seasonality. Naturally, we would have chosen the straightforward upgrade to SARIMAX; however, after several attempts with different implementation approaches, we found SARIMAX to be too inefficient at producing accurate seasonal predictions. For these reasons, we chose Facebook's Prophet time series model as a good fit for the final models.
Prophet uses an additive decomposition model, basing predictions on trend, seasonality, and holiday components, and estimates these components for each observation. The trend component of Prophet relies on either a piecewise linear or logistic growth model. The linear model describes a constant rate of change over the time series with unbounded growth; in logistic models, there are set maximum (carrying capacity) and minimum values. Either model is applicable, but the linear growth model was used to ensure Prophet could account for the multiple outliers in the O3 dataset. The seasonality component uses the Fourier series, a mathematical tool for representing periodic functions as sums of sinusoidal waves, to capture the complex patterns in the time series data. In our models, we kept Prophet's default seasonalities, which are yearly and weekly. Lastly, the holidays component lets Prophet account for specific holidays that may affect O3 levels on a given day.
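A minimal Prophet fit looks like the following; Prophet expects a DataFrame with columns "ds" (date) and "y" (value), and depending on the installed version the package is imported as `prophet` or `fbprophet`. The series and split date carry over from the earlier sketches.

```python
# Basic Prophet fit with linear growth and the default yearly/weekly seasonalities.
from prophet import Prophet

prophet_df = o3_wma_90.dropna().reset_index()
prophet_df.columns = ["ds", "y"]

m = Prophet(growth="linear")                       # yearly/weekly seasonality enabled by default
m.fit(prophet_df[prophet_df["ds"] < split_date])
future = m.make_future_dataframe(periods=len(test), freq="D")
forecast = m.predict(future)                       # yhat, yhat_lower, yhat_upper
```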
Similar to ARIMA, we performed hyperparameter tuning over Prophet's adjustable parameters. For the Prophet model, there are four parameters to tune: the changepoint prior scale, which controls the flexibility of the trend; the seasonality prior scale, which controls the magnitude of the seasonal fluctuations; the holidays prior scale, which controls the impact that holidays have on the forecast; and the seasonality mode, which specifies whether the seasonal component is additive or multiplicative. Our final model combined Prophet with exogenous parameters, identical to those used for ARIMAX.
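A sketch of this tuning loop, with the exogenous regressors added via `add_regressor`, is shown below; `train_df` and `test_df` are assumed to hold the "ds" and "y" columns plus the regressor columns, and the grid values mirror those reported in the results.

```python
# Prophet hyperparameter grid search with exogenous regressors (assumed train_df/test_df).
import itertools
from prophet import Prophet

param_grid = {
    "changepoint_prior_scale": [0.01, 0.1, 1.0],
    "seasonality_prior_scale": [0.1, 1.0, 10.0],
    "holidays_prior_scale": [0.01, 0.1, 1.0],
    "seasonality_mode": ["additive", "multiplicative"],
}

best_params, best_rmse = None, float("inf")
for values in itertools.product(*param_grid.values()):
    params = dict(zip(param_grid.keys(), values))
    m = Prophet(growth="linear", **params)
    for col in exog_cols:                          # temperature, soil temp, vegetation, humidity
        m.add_regressor(col)
    m.fit(train_df)
    pred = m.predict(test_df)["yhat"]              # test_df must also carry the regressor columns
    score = rmse(test_df["y"], pred)
    if score < best_rmse:
        best_params, best_rmse = params, score
```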
Results and Discussion
Figure 5 illustrates the results of the four pre-processing settings (no smoothing and three smoothing windows) we used for the ARIMA models. Our ARIMA model forecast was trained on the entire O3 dataset except for the last eight months, with November 2022 – June 2023 used as the test set. Figure 5a illustrates the best ARIMA model for the resampled O3 data without any smoothing. We found that a model with 40 autoregressive terms (p) and three lagged forecast errors (q) produced the most accurate results, with a Root Mean Squared Error (RMSE) of 13.4, a Mean Absolute Error (MAE) of 10.7, and a Mean Absolute Percentage Error (MAPE) of 25.6%. Figures 5 b – d illustrate the ARIMA model forecasts for O3 data smoothed with window sizes of 30, 60, and 90, respectively. For the window size of 30, a model with eight autoregressive terms and six lagged forecast errors produced the most accurate results, with an RMSE of 10.6, an MAE of 8.0, and a MAPE of 18.2%. For the window size of 60, a model with 12 autoregressive terms and 7 lagged forecast errors produced the most accurate results, with an RMSE of 8.6, an MAE of 7.6, and a MAPE of 15.7%. Lastly, for the window size of 90, a model with 6 autoregressive terms and 6 lagged forecast errors produced the best results, with an RMSE of 3.3, an MAE of 2.8, and a MAPE of 5.4%.
Through comparison of RMSE, MAE, and MAPE, the ARIMA model with a smoothing window size of 90 is clearly shown to be the best. However, it is important to note that even the models with larger RMSE, MAE, and MAPE values proved to be somewhat accurate: all graphs have the observed data inside the forecast's confidence interval, and even the non-stationary model without smoothing does so quite well. Hence, we concluded that ARIMA produces the most accurate predictions of future air pollutant levels for the smoothed O3 data with a window size of 90 days.

Figure 6 illustrates the results of the four pre-processing settings we used for the ARIMAX models. Our ARIMAX model forecast relies on the entire O3 dataset except for the last eight months (November 2022 – June 2023), with temperature, relative humidity, soil temperature, and vegetation as exogenous parameters; the modelling procedure is otherwise the same as for ARIMA. The first ARIMAX model, Figure 6a, illustrates the best-fitting model for the resampled O3 data without smoothing: the most accurate model has 1 autoregressive term (p) and 2 lagged forecast errors (q), with a Root Mean Squared Error (RMSE) of 12.3, an MAE of 10.5, and a MAPE of 25.1%. Figures 6 b – d illustrate the ARIMAX model forecasts for O3 data smoothed with window sizes of 30, 60, and 90, respectively. For the window size of 30, a model with 9 autoregressive terms and 5 lagged forecast errors produced the most accurate results, with an RMSE of 5.8, an MAE of 4.0, and a MAPE of 9.4%. For the window size of 60, a model with 29 autoregressive terms and 10 lagged forecast errors produced the most accurate results, with an RMSE of 6.7, an MAE of 5.4, and a MAPE of 11.6%. Lastly, for the window size of 90, a model with 37 autoregressive terms and 9 lagged forecast errors produced the best general results, with an RMSE of 3.9, an MAE of 3.4, and a MAPE of 6.6%. Comparing RMSE values, a smoothing window size of 90 again proves most accurate for ARIMAX. It is also important to note that, despite the larger RMSE values for the other settings, all of the ARIMAX models were somewhat accurate: all graphs have the actual data inside the forecast's confidence interval, including the non-stationary model without smoothing. Hence, we have concluded that, as a whole, the ARIMAX model produces reasonably accurate predictions when forecasting future air pollutant levels over the next several months. In comparison to ARIMA, ARIMAX had a more accurate forecast for the unsmoothed, 30-day, and 60-day window sizes; however, ARIMA was superior for the most accurate window size of 90 days. Overall, despite most of the ARIMAX values being more accurate, the differences are somewhat marginal, with an average reduction of 1.8 in RMSE, 0.6 in MAE, and 2.8% in MAPE.

Figure 7 illustrates the results of the four pre-processing settings we used for the Prophet models. Our Prophet model functions in the same way as the models in Figures 5, 6, and 8: it is trained on all data except the last eight months, which are then predicted. Figure 7 a) displays the best Prophet model achieved through hyperparameter tuning; it has a prior changepoint scaling (c) and prior seasonality scaling (s) of 1.0, a prior holiday scaling (h) of 0.01, and a multiplicative mode for seasonality (m), producing an RMSE of 11.2, an MAE of 10.8, and a MAPE of 23.8%. Figures 7 b – d show Prophet models with window sizes of 30, 60, and 90, respectively. Figure 7 b) had a prior changepoint scaling of 0.01, a prior seasonality scaling of 0.1, a prior holiday scaling of 0.01, and an additive mode for seasonality, producing an RMSE of 5.2, an MAE of 4.5, and a MAPE of 9.0%. Figure 7 c) had a prior changepoint scaling of 0.01, a prior seasonality scaling of 10.0, a prior holiday scaling of 0.01, and an additive mode for seasonality, producing an RMSE of 3.6, an MAE of 3.1, and a MAPE of 6.2%. Figure 7 d) had a prior changepoint scaling of 0.01, a prior seasonality scaling of 0.1, a prior holiday scaling of 0.01, and an additive mode for seasonality, producing an RMSE of 2.7, an MAE of 2.3, and a MAPE of 4.6%.
Similar to the ARIMA / ARIMAX models, a smoothing window size of 90 provides the best result. Observing the Prophet graphs, we can see that they are all accurate to varying degrees. In comparison to ARIMA, Prophet proves better suited to forecasting, as nearly all RMSE, MAE, and MAPE values from each window are lower than those of ARIMA or ARIMAX. This superiority over the respective ARIMA methods stems from Prophet's better adaptation to strong and lengthy seasonal patterns. Moreover, Prophet takes into account all the components that ARIMA does alongside additional ones such as holidays. Overall, our results show that Prophet is better suited to predicting O3 values.

Figure 8 illustrates the results of the four pre-processing settings we used for the Prophet models with exogenous parameters. These models function in the same way as those in Figures 5, 6, and 7: they are trained on all data except the last eight months and predict the values for those eight months. Moreover, all models in Figure 8 use the same exogenous parameters as Figure 6. Figure 8 a) shows the best Prophet model achieved through hyperparameter tuning, with a prior changepoint scaling and prior seasonality scaling of 0.1, a prior holiday scaling of 0.01, and an additive mode for seasonality; it produced an RMSE of 10.8, an MAE of 8.8, and a MAPE of 19.2%. Figures 8 b – d show Prophet models with window sizes of 30, 60, and 90, respectively. Figure 8 b) had a prior changepoint scaling of 0.01, a prior seasonality scaling of 0.1, a prior holiday scaling of 0.01, and an additive mode for seasonality, producing an RMSE of 4.8, an MAE of 4.1, and a MAPE of 8.1%. Figure 8 c) had a prior changepoint scaling of 0.01, a prior seasonality scaling of 10.0, a prior holiday scaling of 0.01, and an additive mode for seasonality, producing an RMSE of 3.3, an MAE of 2.7, and a MAPE of 5.5%. Figure 8 d) had a prior changepoint scaling of 0.1, a prior seasonality scaling of 10.0, a prior holiday scaling of 0.01, and an additive mode for seasonality, with an RMSE of 1.2, an MAE of 1.0, and a MAPE of 1.9%.
Like the rest of the models, a smoothing window size of 90 provides the best result for Prophet with exogenous parameters. Interestingly, the exogenous Prophet graphs in Figures 8 b – d have large sections where the actual data does not fall within the confidence interval. Despite this, all exogenous Prophet models show greater accuracy and lower RMSE than the regular Prophet models. Compared to the results of the other models, Prophet with exogenous parameters exceeds each of them in accuracy. Again, this improvement can be attributed to including exogenous factors that inform the model's seasonality and forecasting. Among the best parameters for Prophet, the tuning showed that seasonality was the most pronounced component across each smoothing setting and that holidays had little influence on the forecast.

Overall, the RMSE shows that Prophet was superior in predicting O3 values both with and without exogenous parameters. Still, the time taken to tune and produce the most accurate model was an important consideration for both models. Compared to its non-exogenous counterpart, ARIMAX mostly shows only a marginal improvement in RMSE over the ARIMA models. For the unsmoothed, 30-day window, and 60-day window models, ARIMAX produced decreases in RMSE of 8.2%, 45.3%, and 22.1%, respectively; however, it produced an 18.1% increase for the 90-day window model. On average, the addition of exogenous parameters induced a 14.4% decrease in RMSE across the models. This trend can also be observed when comparing Prophet with and without exogenous parameters: with decreases in RMSE of 3.5%, 7.7%, 8.3%, and 55.6% for the exogenous models, Prophet shows an 18.8% decrease on average. Though its improvements are more consistent, Prophet shows a similar overall level of RMSE improvement when exogenous parameters are implemented.
In terms of Mean Absolute Error (MAE), the ARIMAX model demonstrated improvements over ARIMA with reductions of 1.87%, 50%, and 28.95% for the unsmoothed, 30-day window, and 60-day window models, respectively. However, in the case of the 90-day window (similar to RMSE comparisons), ARIMAX resulted in a 21.4% increase in MAE. In contrast, the Prophet model, which incorporated exogenous parameters, yielded reductions of 18.52%, 21.16%, 13.0%, and 67.75% for the unsmoothed, 30-day, 60-day, and 90-day window models, respectively. On average, ARIMAX achieved a 14.8% reduction in MAE compared to ARIMA, while Prophet achieved a 30.1% reduction compared to its non-exogenous counterpart.
Despite these findings, the contributions of exogenous parameters to metrics like MAE and RMSE are relatively minor compared to the non-exogenous models. Prophet's reductions, while noteworthy as percentages of the previous values, translate to only a 1–3% improvement relative to the scale of O3 concentrations (0–120 μg/m3). Given the extra computational resources required, the benefit of adding exogenous parameters is questionable for practical applications. This aligns with broader academic findings, which also indicate limited enhancements when including meteorological parameters in time series modelling.
One study, a 2023 analysis of integrated causal models for predicting air pollution and meteorological variables in Jakarta, found that the integration of external variables into LSTM and Gated Recurrent Unit (GRU) models produced significant improvements on some fronts and inconsistencies on others. Whilst the integrated models had significantly lower MAE and RMSE in predicting PM10, NO2, and SO2, they showed little to no improvement in predicting O3 values8.
Another study, published in 2024, analysed the impact of meteorological factors on air pollution prediction and found significant improvements through the use of external variables. Testing hourly predictions with a Random Forest model, the authors reported a 12.57-percentage-point improvement in accuracy, from 86.42% to 98.99%9.
Notably, a 2020 study exploring the usage of Facebook Prophet in South Korea claimed that the use of external meteorological variables would worsen the accuracy of predictions4; although this conflicts with our findings, both studies agree that implementing meteorological parameters is unnecessary, as it does not provide significant enough improvements in accuracy.
Prophet demonstrated superior performance compared to ARIMA/ARIMAX in this study, likely due to its inherent strengths in handling seasonality and modelling long-term trends. Given that our dataset spanned roughly two years of training data and required predictions over an 8-month horizon, Prophet's explicit treatment of seasonal components allowed it to capture recurring patterns in the air quality data more effectively than ARIMA, which does not model seasonality explicitly and is less flexible with such a dataset.
Model | Data Processing | Parameters | Test RMSE | Test MAE | Test MAPE |
ARIMA | No smoothing | p = 40, d = 1, q = 3 | 13.4 | 10.7 | 25.6%
ARIMA | Smoothing with 30 day window | p = 8, d = 0, q = 3 | 10.6 | 8.0 | 18.2% |
ARIMA | Smoothing with 60 day window | p = 12, d = 0, q = 7 | 8.6 | 7.6 | 15.7% |
ARIMA | Smoothing with 90 day window | p = 6, d = 0, q = 6 | 3.3 | 2.8 | 5.4% |
ARIMA + Exog | No smoothing | p = 1, d = 1, q = 2 | 12.3 | 10.5 | 25.1% |
ARIMA + Exog | Smoothing with 30 day window | p = 9, d = 0, q = 5 | 5.8 | 4.0 | 9.4% |
ARIMA + Exog | Smoothing with 60 day window | p = 29, d = 0, q = 10 | 6.7 | 5.4 | 11.6% |
ARIMA + Exog | Smoothing with 90 day window | p = 37, d = 0, q = 9 | 3.9 | 3.4 | 6.6% |
Prophet | No smoothing | c = 1.0, s = 1.0, h = 0.01, m = multiplicative | 11.2 | 10.8 | 23.8% |
Prophet | Smoothing with 30 day window | c = 0.01, s = 0.1, h = 0.01, m = additive | 5.2 | 4.5 | 9.0%
Prophet | Smoothing with 60 day window | c = 0.01, s = 10.0, h = 0.01, m = additive | 3.6 | 3.1 | 6.2% |
Prophet | Smoothing with 90 day window | c = 0.01, s = 0.1, h = 0.01, m = additive | 2.7 | 2.3 | 4.6% |
Prophet + Exog | No smoothing | c = 0.1, s = 0.1, h = 0.01, m = additive | 10.8 | 8.8 | 19.2% |
Prophet + Exog | Smoothing with 30 day window | c = 0.01, s = 0.1, h = 0.01, m = additive | 4.8 | 4.1 | 8.1%
Prophet + Exog | Smoothing with 60 day window | c = 0.01, s = 10.0, h = 0.01, m = additive | 3.3 | 2.7 | 5.5% |
Prophet + Exog | Smoothing with 90 day window | c = 0.1, s = 10.0, h = 0.01, m = additive | 1.2 | 1.0 | 1.9% |
Conclusions
The threat of high ozone (O3) concentrations to human health highlights the importance of developing a reliable and efficient method to predict such concentrations. These forecasts support scientific and governmental bodies in monitoring the climate and developing effective environmental policies rooted in the predictions. This study of ARIMA and Prophet models, with and without exogenous variables, applied to non-smoothed and smoothed O3 data further explores the value of meteorological factors in creating more precise forecasting models. Out of the four models used, Prophet with exogenous parameters produced the most accurate results.
The ARIMA and Prophet models produced RMSEs of 3.3 μg/m3 and 2.7 μg/m3 with the best pre-processing settings, respectively. With the addition of exogenous meteorological variables, the best ARIMA and Prophet models produced RMSEs of 3.9 μg/m3 and 1.2 μg/m3 with the best pre-processing settings, respectively. On average, ARIMAX and Prophet with exogenous parameters show an 18% reduction in RMSE and a 22.5% reduction in MAE compared to their non-exogenous counterparts. Nevertheless, although exogenous parameters produced more accurate models in most cases, many of the improvements were marginal compared to ARIMA with and without exogenous parameters and to Prophet without exogenous parameters. Considering computational costs, data availability, and the difficulty of data collection, time series forecasting models like ARIMA and Prophet produce solid forecasts regardless of whether exogenous parameters are included.
References
- State of Global Air Report 2024. Health Effects Institute (2024).
- The Air Quality Life Index (AQLI). Energy Policy Institute at the University of Chicago (EPIC) (2021).
- What Are the WHO Air Quality Guidelines? World Health Organization (2021).
- Shen J., Valagolam D., McCalla S. Prophet Forecasting Model: A Machine Learning Approach to Predict the Concentration of Air Pollutants (PM2.5, PM10, O3, NO2, SO2, CO) in Seoul, South Korea. PeerJ, vol. 8, e9961 (2020).
- Neo E., Hasikin K., Lai K., Mokhtar M., Azizan M., Hizaddin H., Razak S., Yanto. Artificial Intelligence-Assisted Air Quality Monitoring for Smart City Management. PeerJ Computer Science, vol. 9, e1306 (2023).
- Zhao Z., Wu J., Cai F., Zhang S., Wang Y. A Hybrid Deep Learning Framework for Air Quality Prediction with Spatial Autocorrelation during the COVID-19 Pandemic. Scientific Reports, vol. 13, no. 1 (2023).
- Zheng Y., Yi X., Li M., Li R., Shan Z., Chang E., Li T. Forecasting Fine-Grained Air Quality Based on Big Data. Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2015).
- Handhayani T. An Integrated Analysis of Air Pollution and Meteorological Conditions in Jakarta. Scientific Reports, vol. 13, no. 1 (2023).
- Nurchaerani K., Faisal M., Kurniawan F. Analysis of the Impact of Meteorological Factors on Predicting Air Quality in South Tangerang City Using Random Forest Method. Applied Information System and Management (AISM), vol. 7, no. 2 (2024).