All reviews of published articles are made public. This includes manuscript files, peer review comments, author rebuttals and revised materials. Note: This was optional for articles submitted before 13 February 2023.
Peer reviewers are encouraged (but not required) to provide their names to the authors when submitting their peer review. If they agree to provide their name, then their personal profile page will reflect a public acknowledgment that they performed a review (even if the article is rejected). If the article is accepted, then reviewers who provided their name will be associated with the article itself.
Both reviewers have confirmed that the authors have addressed their comments.
[# PeerJ Staff Note - this decision was reviewed and approved by Mehmet Cunkas, a PeerJ Section Editor covering this Section #]
no comment
no comment
no comment
no comment
The overall presentation after revision is better.
The revised version looks good and can be accepted.
The paper looks more compact now.
Please see both detailed reviews. The reviews highlight that while the manuscript presents a novel bimodal classification model (BTSCM) with a well-structured approach, it suffers from several technical weaknesses. Key issues include incomplete or ambiguous mathematical formulations (e.g., 3D convolution, LSTM gates, MFCC DCT), insufficient experimental details (e.g., dataset balancing, alignment protocols, ablation studies), and missing validation metrics (e.g., statistical significance, confusion matrices). The reviewers also note a lack of clarity in architectural choices (e.g., asymmetric pooling strides, DQN's role) and call for comparative analyses to justify design decisions. Addressing these gaps would strengthen the paper's reproducibility, validity, and impact.
**PeerJ Staff Note:** Please ensure that all review and editorial comments are addressed in a response letter and that any edits or clarifications mentioned in the letter are also inserted into the revised manuscript where appropriate.
**Language Note:** The review process has identified that the English language must be improved. PeerJ can provide language editing services - please contact us at [email protected] for pricing (be sure to provide your manuscript number and title). Alternatively, you should make your own arrangements to improve the language quality and provide details in your response letter. – PeerJ Staff
The proposed method has merit, but a significant revision is necessary to address technical ambiguities.
The manuscript mentions various pooling strides across layers (e.g., temporal stride = 1, spatial = 2 for early layers). However, no ablation or analysis is provided to validate why this asymmetric sampling improves temporal sensitivity. A comparative test against uniform stride configurations (e.g., (2,2,2)) would be illuminating.
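As a hedged illustration of the comparison being requested (PyTorch-style sketch; layer shapes are assumptions, not taken from the manuscript):

```python
import torch
import torch.nn as nn

# Dummy clip: (batch, channels, time, height, width)
clip = torch.randn(1, 64, 16, 56, 56)

# Asymmetric pooling as described for the early layers:
# temporal stride = 1 (time preserved), spatial stride = 2.
asym_pool = nn.MaxPool3d(kernel_size=(1, 3, 3), stride=(1, 2, 2), padding=(0, 1, 1))

# Uniform baseline the review asks to compare against: stride (2, 2, 2).
uniform_pool = nn.MaxPool3d(kernel_size=(3, 3, 3), stride=(2, 2, 2), padding=(1, 1, 1))

print(asym_pool(clip).shape)     # torch.Size([1, 64, 16, 28, 28]) - temporal resolution kept
print(uniform_pool(clip).shape)  # torch.Size([1, 64, 8, 28, 28])  - temporal resolution halved
```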
The equation includes undefined variables such as “the center frequency of the first Meier filter” — possibly a typographical error for “Mel filter.” Additionally, the precise construction of the triangular filter bank (e.g., number of bands, bandwidth overlap) is missing.
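For reference, the standard Mel-scale mapping and the resulting center frequencies of a triangular Mel filter bank are typically defined as follows (standard definitions, not taken from the manuscript):

```latex
m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right), \qquad
f(m) = 700\left(10^{m/2595} - 1\right)
% With M triangular filters spanning [f_min, f_max], the k-th center frequency is
f_c(k) = f\!\left( m(f_{\min}) + k \cdot \frac{m(f_{\max}) - m(f_{\min})}{M+1} \right),
\qquad k = 1, \dots, M
```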
The equation does not define variables such as C_n, N, or the exact form of the DCT. Please provide a full expression of the DCT formula used (type-II or type-III), as this affects MFCC reproducibility and comparability with other work.
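For context, the type-II DCT most commonly used for MFCCs takes the following form (a standard definition; the manuscript should state which variant and normalization it actually uses):

```latex
C_n = \sum_{m=0}^{N-1} \log\!\big(S_m\big)\,
      \cos\!\left[ \frac{\pi n}{N}\left(m + \tfrac{1}{2}\right) \right],
\qquad n = 0, 1, \dots, N-1
```

where S_m is the output energy of the m-th Mel filter and N is the number of filters.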
The manuscript briefly states that MFCC features are synchronized with video frames, but omits the alignment protocol. Are MFCCs aggregated across fixed frame windows (e.g., 30 ms per frame)? How are unequal sampling rates between video (e.g., 25 fps) and audio (e.g., 16 kHz) resolved?
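One common way to resolve the rate mismatch, shown here only as a hedged sketch (librosa-style parameters are assumptions, not the authors' protocol), is to choose the MFCC hop length so that one MFCC frame corresponds to one video frame:

```python
import librosa

sr = 16000        # audio sampling rate (Hz), assumed
fps = 25          # video frame rate, assumed
hop = sr // fps   # 640 samples per hop -> one MFCC frame per video frame

audio, _ = librosa.load("clip.wav", sr=sr)
mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13, hop_length=hop, n_fft=1024)
# mfcc.shape == (13, n_frames); n_frames now tracks the 25 fps video timeline,
# so MFCC column t can be paired directly with video frame t.
```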
Equation (8) for 1D FCN is vague. How is this applied to temporal sequences? Are the convolutions over time, feature dimensions, or both?
The notation does not match standard FCN terminology (e.g., use of kernel width, dilation). Please re-specify with complete dimensions.
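To make the requested specification concrete, a minimal sketch of a 1D temporal convolution over a feature sequence is given below (dimensions are assumptions for illustration; the authors should state their actual kernel width, stride, and dilation):

```python
import torch
import torch.nn as nn

# Sequence of per-frame feature vectors: (batch, feature_dim, time)
features = torch.randn(8, 256, 100)

# Convolution runs along the time axis only; feature_dim acts as input channels.
temporal_conv = nn.Conv1d(in_channels=256, out_channels=128,
                          kernel_size=3, stride=1, padding=1, dilation=1)

out = temporal_conv(features)   # (8, 128, 100): time length preserved, channels mixed
```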
No details are provided about the number of videos, duration per clip, annotation process, or class distribution. This makes it difficult to assess the generalizability or real-world applicability of the system to art platforms.
The paper uses precision, recall, and F1-score but omits accuracy and class-wise confusion matrices, which are especially important for unbalanced or multi-class tasks.
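For instance, accuracy and class-wise confusion matrices can be reported alongside precision/recall/F1 with a few lines (scikit-learn sketch; the label arrays are placeholders):

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Placeholder labels; in practice these come from the test split of the action dataset.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 0])

print("Accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))                 # rows = true classes, cols = predicted
print(classification_report(y_true, y_pred, digits=3))  # per-class precision/recall/F1
```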
The model uses a 2-layer LSTM and FCN32s, but no justification is given and no alternatives are tested. Varying these architectures (e.g., FCN8s vs FCN32s) may yield insights into optimal depth/width trade-offs.
The manuscript presents a novel Bimodal Time Series Classification Model (BTSCM) that integrates I3D, MFCC, FCN, LSTM, and DQN to perform video-based multimodal classification, particularly for art platform applications. The work is well-structured and combines state-of-the-art techniques in feature extraction and fusion. However, several technical weaknesses, ambiguities, and missing justifications reduce the strength of the contribution.
The 3D convolution formula is referenced but not fully expressed. Please explicitly define the indexing and tensor dimensions of inputs, weights, and outputs. Without a complete form, readers cannot reconstruct the implementation or understand the kernel’s temporal behavior.
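As a reference point, one fully indexed form of a 3D convolution is (standard formulation, not copied from the manuscript; the authors should align their notation with whichever indexing they use):

```latex
y_{c',\,t,\,i,\,j} \;=\; b_{c'} \;+\;
\sum_{c=1}^{C}\sum_{\tau=0}^{K_t-1}\sum_{p=0}^{K_h-1}\sum_{q=0}^{K_w-1}
w_{c',\,c,\,\tau,\,p,\,q}\; x_{c,\; t+\tau,\; i+p,\; j+q}
```

where x ∈ R^{C×T×H×W} is the input clip, w ∈ R^{C'×C×K_t×K_h×K_w} the kernel tensor, and y the output feature map (stride 1, no padding, for simplicity).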
Although Figure 1 is referenced, the exact inflation mechanics from 2D to 3D (i.e., filter reuse across time) are not specified. It would be valuable to clarify whether pre-trained 2D kernels are used and then inflated (as in Carreira & Zisserman, 2017), or if the model is trained from scratch.
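For clarity on what such an inflation would look like, here is a hedged sketch of the standard I3D recipe (this is the Carreira & Zisserman mechanism, not necessarily the authors' implementation):

```python
import torch

def inflate_2d_kernel(w2d: torch.Tensor, time_dim: int) -> torch.Tensor:
    """Inflate a pretrained 2D kernel (out, in, kh, kw) to 3D (out, in, t, kh, kw).

    Following the I3D recipe (Carreira & Zisserman, 2017): repeat the 2D filter
    along the temporal axis and rescale by 1/t so activations keep the same scale
    on a "boring" video made of one image repeated over time.
    """
    w3d = w2d.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)
    return w3d / time_dim

w2d = torch.randn(64, 3, 7, 7)     # e.g., an ImageNet-pretrained conv1 kernel
w3d = inflate_2d_kernel(w2d, 7)    # (64, 3, 7, 7, 7)
```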
Equations (10)–(12) omit several key terms. The forget gate and cell state equations, which are core to LSTM, are not provided. This reduces clarity regarding how the temporal states evolve.
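For completeness, the standard LSTM update, including the forget gate and cell state that the review asks for, is (standard formulation; the manuscript should restate it in its own notation):

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
i_t &= \sigma\!\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
o_t &= \sigma\!\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
\tilde{c}_t &= \tanh\!\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh\!\left(c_t\right)
\end{aligned}
```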
It's unclear whether the LSTM operates on concatenated audio+visual embeddings or processes modalities sequentially.
The paper selects 50 actions from each dataset but does not explain the criteria or whether these subsets are balanced across classes. This could lead to bias in reported performance metrics.
The paper should also report statistical significance (e.g., standard deviation across runs or confidence intervals) to validate performance improvements.
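For example, mean ± standard deviation and a normal-approximation 95% confidence interval over repeated runs can be reported with a few lines (illustrative numbers only):

```python
import numpy as np

# Placeholder F1-scores from, e.g., five runs with different random seeds.
f1_runs = np.array([0.861, 0.874, 0.858, 0.869, 0.866])

mean = f1_runs.mean()
std = f1_runs.std(ddof=1)                   # sample standard deviation
ci95 = 1.96 * std / np.sqrt(len(f1_runs))   # normal-approximation 95% CI half-width

print(f"F1 = {mean:.3f} ± {std:.3f} (95% CI ±{ci95:.3f}, n={len(f1_runs)})")
```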
Figure 9 suggests performance gains from DQN, but the mechanism remains unclear. What does the DQN “action” represent in this classification context? How does it enhance the LSTM’s memory updates?
Please provide numerical values or significance testing for the ablation experiments. Bar plots alone are insufficient.
All text and materials provided via this peer-review history page are made available under a Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.