Background. Digital marketing widely relies on Google Analytics 4 (GA4) and Google Tag Manager (GTM), yet the conditions under which event data are lost remain underexplored because reverse engineering of platform internals is prohibited. We aimed to characterize missing-data patterns in GA4 using only publicly observable components and a reproducible, policy-compliant methodology.
Methods. We built a bot-driven Three-Layer Monitoring System that simultaneously records (i) GTM-recognized data layer events, (ii) request payloads to GA4 collection endpoints, and (iii) GA4 user-interface (UI) counts. Experiments were run on Windows 10 with Google Chrome (Chromium v139) using a Python 3.10 bot implemented with Playwright. A minimal test page (HTML + GTM) generated click events under three interaction types—in_page (no navigation), same_tab (navigate in the same tab), and new_tab (navigate in a new tab)—and three dwell times (1 s, 2 s, 3 s). Each condition was repeated 100 times (900 trials in total), with a 1-s inter-trial interval. Both GA4 key events and custom events were measured. Page-view events within the same experiment were also evaluated.
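The experimental grid and the payload layer's request filter can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function and constant names are invented here, and the endpoint pattern assumes GA4's standard /g/collect collection path (including regional subdomains).

```python
import itertools
from urllib.parse import urlparse

# Condition grid: 3 interaction types x 3 dwell times = 9 conditions,
# each repeated 100 times (900 trials in total).
INTERACTIONS = ("in_page", "same_tab", "new_tab")
DWELL_SECONDS = (1, 2, 3)
REPS = 100

CONDITIONS = list(itertools.product(INTERACTIONS, DWELL_SECONDS))
TOTAL_TRIALS = len(CONDITIONS) * REPS


def is_ga4_collect(url: str) -> bool:
    """Heuristic filter for GA4 collection hits in captured traffic.

    GA4 typically dispatches hits to .../g/collect on
    google-analytics.com (possibly a regional subdomain such as
    region1.google-analytics.com); the payload layer would apply a
    check like this to every request the browser bot observes.
    """
    parsed = urlparse(url)
    return (
        parsed.hostname is not None
        and parsed.hostname.endswith("google-analytics.com")
        and parsed.path == "/g/collect"
    )
```

In a Playwright-based bot, such a predicate would be applied inside a network-request listener, with each matching hit timestamped and logged alongside the corresponding data layer event and the GA4 UI count observed later.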
Results. Distinct missingness patterns were observed across the three layers, but no time-order inversions occurred (i.e., we never observed a payload or GA4 UI count without the corresponding upstream data layer event). GA4 key events and custom events yielded identical counts despite being transmitted as separate payloads. Missingness was minimal at 2–3 s dwell (≥90% retention across layers) but pronounced at 1 s dwell, especially for in_page interactions; same_tab exhibited more loss than new_tab, particularly in the payload and GA4 UI layers. Page-view events showed missingness comparable to click events within the same runs.
Conclusions. Short dwell times and interaction types that provide little processing window are primary drivers of GA4 event loss, consistent with timing constraints on tag execution and request dispatch. The proposed Three-Layer Monitoring System offers a reproducible, policy-compliant framework for evaluating measurement reliability without reverse engineering. Practically, designs should avoid near-immediate navigations or very short dwell times before critical events and ensure sufficient time for analytics requests to be sent. Limitations include a minimal page, a single OS/browser, and synthetic traffic; future work should test heavier pages, multiple devices/browsers, and controlled network/CPU conditions with formal statistical comparisons.