All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Thanks to the authors for their efforts to improve the article. I believe it can be accepted now.
[# PeerJ Staff Note - this decision was reviewed and approved by Shawn Gomez, a PeerJ Section Editor covering this Section #]
The authors have carefully addressed my comments.
The authors have carefully addressed my comments.
The authors have carefully addressed my comments.
Please carefully consider the reviewers' comments and revise the paper accordingly. The revised version will then be evaluated again.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
1. Describe the datasets used for training, including their sources, characteristics, and relevance to the study.
2. Provide insights into the quality and quantity of the datasets. Are they sufficient for robust model training, or do they present limitations such as bias, sparsity, or incompleteness?
3. While the study compares different machine learning models, integrating physics-based models could reduce the need for extensive data while improving the interpretability and generalizability of the results. Consider discussing the feasibility of incorporating existing physics-based models; a minimal residual-learning sketch follows this list. For background on hybrid modeling and how it might benefit this study, see: Hybrid modeling of first-principles and machine learning: A step-by-step tutorial review for practical implementation.
4. Have the authors considered the extrapolation capability of the developed model? How was this tested, and how different were the testing conditions compared to those used in training? Providing insights into the model's robustness outside the training domain would strengthen the analysis.
5. Transformer models have been extensively applied to time-series data across various systems. For a more comprehensive literature review, discuss additional relevant studies on transformer-based models and their application to similar problems.
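To make the hybrid-modeling suggestion in point 3 concrete, here is a minimal serial (residual-learning) sketch in Python. The "physics" component is an idealized seasonal baseline and the data are synthetic placeholders, purely for illustration; a real application would substitute a first-principles model and the actual station data.

```python
# Minimal serial hybrid sketch: physics-style baseline + ML on residuals.
# The seasonal baseline and synthetic data are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
t = np.arange(365)
y = 15 + 10 * np.sin(2 * np.pi * t / 365) + 0.01 * t + rng.normal(0, 1, t.size)

# 1) Physics-style component: an idealized seasonal cycle.
physics_pred = 15 + 10 * np.sin(2 * np.pi * t / 365)

# 2) ML component: learn only what the baseline misses (here, trend + noise).
residuals = y - physics_pred
ml = RandomForestRegressor(n_estimators=100, random_state=0)
ml.fit(t.reshape(-1, 1), residuals)

# 3) The hybrid prediction combines both components.
hybrid_pred = physics_pred + ml.predict(t.reshape(-1, 1))
```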
NA
NA
Clear and Unambiguous English Usage: The manuscript is written in generally clear and professional English. However, some sections—particularly in the Introduction and Results—could benefit from more concise language and improved transitions between ideas. Minor grammatical errors and redundant phrases occasionally affect readability, and a careful editorial pass is recommended to enhance clarity.
Clarity: The abstract could be improved by explicitly stating the most significant performance metrics or gains to better reflect the contribution.
Language and Readability: While generally clear, several paragraphs could benefit from more concise writing and improved flow. Redundant or overly technical phrasing—particularly in the Introduction and Results—should be refined for clarity.
Introduction and Background – Context and Motivation: The Introduction provides a relevant overview of the growing importance of AI-driven weather forecasting, citing both foundational and recent literature. The motivation for the study is outlined—comparing transformer and RNN architectures in weather time series forecasting—but could be made more explicit. The specific research gap (e.g., lack of head-to-head model comparisons on real-world weather station data) is only briefly implied and should be more clearly stated.
Literature – Well Referenced and Relevant: The literature review is current and relevant, citing works up to 2024. Foundational models (e.g., Vaswani et al., 2017 for transformers; Graves, 2012 for RNNs) are properly referenced alongside more recent advancements such as Informer, FEDformer, and PatchTST. However, the section leans toward a listing of models and tools rather than an integrated synthesis. A more analytical comparison of prior work—highlighting trends, trade-offs, and limitations—would better frame the paper’s contribution. Additionally, the citation pool could be more diverse, as a few authors are cited repeatedly.
Specific Suggestions for the Literature Review:
1. Strengthen the analytical tone: Compare transformer and RNN models in terms of their empirical performance, data requirements, and strengths in different forecast horizons.
2. Clarify motivation and novelty: Clearly state the research gap—few studies compare these models head-to-head using real-world, single-station data within the same framework.
3. Diversify citations: A few authors (e.g., Stefenon, dos Santos) are cited frequently. Additional perspectives from other groups working on benchmark transformer models would enhance balance.
4. Tool justification: The use of the NeuralForecast library is appropriate but could benefit from a critical evaluation: are there constraints in using this framework for performance benchmarking? A minimal usage sketch follows below for reference.
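For reference, this is roughly how a head-to-head comparison is set up with NeuralForecast, based on my reading of the library's documented interface; the data frame contents, horizon, and model settings here are illustrative assumptions, not the authors' configuration.

```python
# Minimal NeuralForecast benchmarking sketch; column names follow the
# library's convention (unique_id, ds, y). All settings are illustrative.
import numpy as np
import pandas as pd
from neuralforecast import NeuralForecast
from neuralforecast.models import LSTM, Informer

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "unique_id": "station_1",
    "ds": pd.date_range("2020-01-01", periods=365, freq="D"),
    "y": 15 + 10 * np.sin(2 * np.pi * np.arange(365) / 365)
         + rng.normal(0, 1, 365),
})

horizon = 7
models = [
    LSTM(h=horizon, max_steps=100),
    Informer(h=horizon, input_size=4 * horizon, max_steps=100),
]
nf = NeuralForecast(models=models, freq="D")
nf.fit(df=df)
forecasts = nf.predict()  # one forecast column per model
```

A useful point for the authors to address is whether the framework's shared training loop and defaults constrain architecture-specific tuning, which could bias such a comparison.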
[Lines 26–41]
Comment:
The introduction outlines the relevance of neural networks in weather forecasting but would benefit from a more analytical tone. Consider synthesizing key findings from cited works rather than listing them. For example, how do GraphCast and Pangu-Weather differ in terms of input structure or forecast range?
[Lines 42–51]
Comment:
The authors explain the transformer architecture well but could contrast its strengths and limitations with RNNs more explicitly. For example, mention that RNNs may perform better with smaller datasets or short-range tasks due to their simplicity.
[Lines 57–68]
Comment:
While this section compares RNNs and transformers, it reads more like a textbook overview. Consider integrating empirical insights from cited studies (e.g., performance metrics or dataset characteristics) to show how these models perform differently in real-world applications.
[Lines 69–84]
Comment:
The motivation for using the NeuralForecast library is solid, but more critical evaluation is needed. Are there any limitations or biases in this toolkit that might affect the comparison of models?
[Lines 85–98]
Comment:
The transition into the study’s purpose is clear, but this section would benefit from explicitly identifying the gap in prior work. For example, many studies evaluate these models in isolation — few compare them head-to-head on local weather station data. Make this distinction clearer to emphasize the novelty.
Citation Diversity:
Multiple references (e.g., Stefenon et al., dos Santos et al.) appear several times throughout the literature review. While these works are relevant, try to diversify the sources to reflect a broader consensus in the field.
Missing References Suggestion:
Consider adding recent benchmark studies in weather forecasting with transformers, such as:
Lam et al. (2022) on GraphCast
Bi et al. (2022) on Pangu-Weather
Wu et al. (2023) on Autoformer vs. Informer comparisons.
This would better situate your contribution relative to current leading models.
Structure and Formatting – PeerJ and Discipline Norms: The manuscript follows the expected structural conventions for PeerJ Computer Science and the machine learning discipline. It includes standard sections (Abstract, Introduction, Methods, Results, Discussion, Conclusion), with figures and tables appropriately placed and labeled. No major deviations from format are noted. However, figure captions could be more descriptive, and some tables (e.g., Table 2 and Table 3) would benefit from clearer integration and explanation within the narrative.
Formal Definitions and Results: While this paper is empirical in nature and does not contain formal mathematical theorems or proofs, it defines all key modeling terms, architectures, and metrics with sufficient clarity. The methodological descriptions, including forecasting types (recursive vs. direct), model configurations, and error metrics (e.g., RMSE, RMSPE), are generally well presented. A table summarizing model configurations is included and helpful, though a clearer explanation of hyperparameter tuning strategies would improve transparency.
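For concreteness, these are the standard definitions of the two error metrics mentioned above, as I assume the manuscript applies them (the function names are mine):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def rmspe(y_true, y_pred):
    """Root mean squared percentage error (%); assumes y_true has no zeros."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2)) * 100.0)
```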
Hyperparameter tuning: The manuscript mentions “automatic hyperparameter adjustment” but lacks details on the tuning process (e.g., grid search, Bayesian optimization). This is important given the complexity of transformer models.
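For example, a Bayesian search could be documented along these lines. This is a minimal Optuna sketch with a stand-in model, synthetic data, and an illustrative search space, not the authors' actual setup:

```python
# Minimal Bayesian hyperparameter search with Optuna.
# The MLP and the synthetic data are stand-ins for illustration only.
import numpy as np
import optuna
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 4)), rng.normal(size=200)
X_val, y_val = rng.normal(size=(50, 4)), rng.normal(size=50)

def objective(trial):
    lr = trial.suggest_float("learning_rate_init", 1e-4, 1e-2, log=True)
    hidden = trial.suggest_categorical("hidden_size", [32, 64, 128])
    model = MLPRegressor(hidden_layer_sizes=(hidden,),
                         learning_rate_init=lr,
                         max_iter=200, random_state=0)
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    return float(np.sqrt(mean_squared_error(y_val, pred)))  # validation RMSE

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```

Reporting the search space, budget (number of trials), and selection criterion in this form would make the tuning process reproducible.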
Cross-validation setup: The paper states that k-fold cross-validation was used, but doesn't clarify how temporal dependencies were respected. Time-based validation (e.g., walk-forward) may be more appropriate for time series data.
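As an illustration of what a leakage-safe protocol looks like, here is a minimal walk-forward split using scikit-learn (the series is a synthetic placeholder):

```python
# Walk-forward (expanding-window) splits: every test window lies strictly
# after its training window, so no future information leaks into training.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

series = np.arange(100).reshape(-1, 1)  # stand-in univariate series
tscv = TimeSeriesSplit(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(tscv.split(series)):
    print(f"fold {fold}: train [0..{train_idx[-1]}], "
          f"test [{test_idx[0]}..{test_idx[-1]}]")
```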
Variable selection: While air temperature was chosen as the target, a fuller rationale for why other strongly correlated variables (such as max/min temperature or radiation) were not used in multivariate setups would be helpful.
Statistical significance: The paper does not report confidence intervals or statistical tests (e.g., Wilcoxon signed-rank test) to validate the significance of observed differences. Adding these would greatly strengthen the comparative claims.
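For instance, per-fold errors from two models can be compared with a paired test along these lines (the RMSE values below are illustrative placeholders, not results from the paper):

```python
# Paired Wilcoxon signed-rank test on per-fold RMSEs of two models.
# The numbers are illustrative placeholders only.
import numpy as np
from scipy.stats import wilcoxon

rmse_transformer = np.array([1.02, 0.98, 1.05, 0.99, 1.01, 1.03, 0.97, 1.00])
rmse_rnn         = np.array([1.10, 1.04, 1.08, 1.03, 1.07, 1.09, 1.02, 1.06])

stat, p_value = wilcoxon(rmse_transformer, rmse_rnn)
print(f"Wilcoxon statistic = {stat:.3f}, p = {p_value:.4f}")
```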
Overfitting control: Although early stopping is mentioned, there is limited discussion of how overfitting was evaluated, especially on smaller models vs. larger transformer architectures.
Generality: The conclusions are drawn from data collected at a single station. While this is acknowledged, a more detailed discussion on generalizability (e.g., across regions or climates) is encouraged.
Refine the literature review to make it more comparative and critical.
Clarify and elaborate on experimental details—especially tuning strategies and validation protocols.
Consider running significance tests to validate key model differences.
Explicitly discuss the scalability of results to different regions or time series domains.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.