Google’s documentation on data retention in GA4 can be kind of confusing. If you just search “how long does GA4 store data” you’ll get a featured snippet from a Google support page telling you it’s a maximum of 2 or 14 months, but that’s not the complete answer.
<div class="post-note">GA4 only stores data in exploration reports for 2 months (default) or 14 months (if you change it). But data used in GA4 standard reports is retained indefinitely.</div>
So why is GA4 data stored for different time periods for different types of reports?
According to Google’s data retention documentation:
<blockquote class="quote">The retention period applies to user-level and event-level data associated with cookies, user-identifiers (e.g., User-ID), and advertising identifiers (e.g., DoubleClick cookies, Android’s Advertising ID [AAID or AdID], Apple’s Identifier for Advertisers [IDFA]).</blockquote>
GA4 explorations use raw event and user-level data which includes those identifiers listed above, while the GA4 standard reports use daily aggregated tables. (Here’s a quick rundown on those data terms if you need a primer.)
The aggregated data used in standard reports has all the user-identifying information stripped out, so the data retention period doesn’t apply.
<div class="post-alert">No matter what your data retention settings are, a two-month retention limit applies to all user-level data that includes age, gender, and interests. But honestly, how reliable are those user-provided personal details anyway? I know I don’t fill in all the fields truthfully (or at all) when creating Google accounts for work and personal use.</div>
Why GA4 Limits Data Retention
Universal Analytics data retention settings let you select “do not automatically expire” for all your data. So why is GA4 much more limited? Two reasons:
- Compliance. Google added a lot more data privacy protection to the latest version of Analytics because of the EU’s relatively recent data protection law, the General Data Protection Regulation (GDPR).
- Money. Data storage is expensive, and GA4 is a free platform. You can access user-level raw data for as long as you want if you pay Google to store it in BigQuery.
GDPR doesn’t say user-level data has to expire after 14 months. It just says data can only be retained for as long as it takes to “achieve the purpose for which the information was collected” and that companies have to “document and justify” how long they store personal data. This is obviously open to interpretation, so why is 14 months the limit in GA4? Why not 12 months, or 24?
Nobody knows! If you know, please tell me. Best guess is because with 14 months of data you can comfortably do ad hoc year-over-year reporting in explorations.
Since GA4 isn’t fully GDPR compliant anyway, putting limits on data retention seems like an opportunistic move to use burgeoning privacy compliance measures as a way to charge monthly fees for BigQuery data transfer and storage.
Maximize Raw Data Access in GA4
If a BigQuery subscription isn’t in the cards, there are things you can do to maximize your access to raw data from your GA4 property.
Max out data retention
<div class="post-action">Go to Admin > Data Settings (under the Property column) > Data Retention and select 14 months from the dropdown.</div>
Reset user data on new activity
While you’re in there, make sure “Reset user data on new activity” is toggled on. When this is enabled, the retention period for a user’s identifier is reset each time that user has new activity on your site. So if a user comes back to your website at least once every 14 months, their user data doesn’t expire.
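To make the effect of that toggle concrete, here is a minimal Python sketch of the date arithmetic. It assumes the 14-month setting and counts whole calendar months from either the first visit (toggle off) or the most recent activity (toggle on). The function names and example dates are hypothetical, and the actual deletion schedule is handled by Google, so treat this as an approximation rather than GA4's exact behavior.

```python
from datetime import date
import calendar

RETENTION_MONTHS = 14  # the maximum retention setting discussed above


def add_months(d: date, months: int) -> date:
    """Shift a date forward by whole calendar months, clamping the day if needed."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def expiry_date(first_visit: date, last_activity: date, reset_on_new_activity: bool) -> date:
    """Approximate the date after which a user's data becomes eligible to expire."""
    # Assumption: with the toggle on, the clock restarts at the latest activity;
    # with it off, the clock runs from the first visit.
    start = last_activity if reset_on_new_activity else first_visit
    return add_months(start, RETENTION_MONTHS)


# Hypothetical user: first seen in January 2023, came back in February 2024.
first_visit = date(2023, 1, 15)
last_activity = date(2024, 2, 10)

print(expiry_date(first_visit, last_activity, reset_on_new_activity=False))  # 2024-03-15
print(expiry_date(first_visit, last_activity, reset_on_new_activity=True))   # 2025-04-10
```

With the toggle off, this returning user would age out about a month after their second visit. With it on, the clock restarts and they get another 14 months.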
Export raw GA4 data regularly
You can export your raw events from GA4 to BigQuery, but from there you’re on your own: I’ve never used it.
You can also export raw data from GA4 explorations as a TSV or CSV file.
If you throw that data in a Google Sheet with the proper formatting you can use the sheet as a data source and visualize it in Looker Studio. That way you’d theoretically be able to compare raw data over multiple years, as long as you keep up with the exports. Choose a time period (one month, three months, six months) and set a reminder to export your data periodically so you don’t forget.
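If your periodic exports overlap (say, you export the last three months every two months), you’ll want to combine them without double counting. Here’s a rough Python sketch of that step. It assumes each export has already been cleaned down to a header row plus data rows, and the column names in the example are hypothetical, not what GA4 will necessarily give you.

```python
import csv
import io


def merge_exports(csv_texts):
    """Combine several exported CSV texts into one table, dropping exact duplicate rows."""
    header = None
    seen = set()
    merged = []
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        file_header = next(reader, None)
        if file_header is None:
            continue  # skip empty exports
        if header is None:
            header = file_header
        elif file_header != header:
            # Exports built from different exploration setups should not be mixed.
            raise ValueError("exports have different columns")
        for row in reader:
            key = tuple(row)
            if key not in seen:
                seen.add(key)
                merged.append(row)
    return header, merged


# Two hypothetical exports that overlap on 20240131.
january = "date,event_name,event_count\n20240101,page_view,10\n20240131,page_view,12\n"
february = "date,event_name,event_count\n20240131,page_view,12\n20240201,page_view,9\n"

header, rows = merge_exports([january, february])
print(header)     # ['date', 'event_name', 'event_count']
print(len(rows))  # 3
```

In practice you’d read each export from disk (for example with `pathlib.Path.read_text`) and write the merged rows back out with `csv.writer` before loading the result into your Google Sheet.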
GA4 Data Thresholds
A related data availability topic is data thresholding in GA4. Thresholding is when GA4 won’t show you some data because the user count is so low that you might be able to infer the identity of individual users based on demographic information and other signals.
Because of thresholding, data can be withheld from both explorations and standard GA4 reports. GA4 data thresholds cannot be changed.
You can tell that thresholding has been applied to your data when the data indicator icon changes to an orange triangle.
There are three instances when GA4 applies data thresholds:
- Google signals is enabled and the user count is low for the selected date range
- Your GA4 reporting identity relies on device ID and there aren’t enough total users
- Your exploration or standard report includes search queries and there aren’t enough total users
You’re more likely to encounter data thresholds when using a narrow date range, so try expanding the date range and see if that helps.
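Here’s a toy Python illustration of why a wider date range can help. The cutoff of 10 users and the age-bracket numbers are made up for the example. GA4’s real thresholds are system-defined and Google doesn’t let you see or change them, so this only shows the general idea of withholding rows with low user counts.

```python
MIN_USERS = 10  # hypothetical cutoff, NOT GA4's real (undisclosed) threshold


def apply_threshold(rows, min_users=MIN_USERS):
    """Split rows into those shown and those withheld, based on user count."""
    shown = {label: users for label, users in rows.items() if users >= min_users}
    withheld = sorted(label for label, users in rows.items() if users < min_users)
    return shown, withheld


# Hypothetical user counts per age bracket for a narrow and a wide date range.
one_week = {"18-24": 4, "25-34": 12}
three_months = {"18-24": 40, "25-34": 150}

print(apply_threshold(one_week))      # ({'25-34': 12}, ['18-24'])
print(apply_threshold(three_months))  # ({'18-24': 40, '25-34': 150}, [])
```

Over one week the 18-24 row falls under the cutoff and disappears. Over three months both rows have enough users to be shown.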
That’s the end of this article - below is a data terminology primer to provide context if needed. Keep reading about how to create and report on conversion events in GA4.
<div class="post-note-cute">If you need help with GA4 implementation, reporting audits, dashboard configuration, or if you have questions about anything analytics related, don't hesitate to reach out: <a href="mailto:firstname.lastname@example.org">email@example.com</a></div>
Data Jargon Explained
Here’s a quick rundown on the types of data mentioned in this article:
User-level data vs event-level data
User-level data is associated with a specific user and includes information about demographics (age, gender, location), preferences (communication, products, privacy) and more (recency, frequency, operating system).
Event-level data is associated with events (actions) that users take on a website or app and includes information about the type of event, when and where the event happened, and other details (such as session duration and conversions).
User-level data and event-level data are analyzed together to provide a complete picture of user interactions with web and app properties. After all, you can’t have events without users! Learn more about using GA4 sessions and views to inform SEO strategy.
In GA4, both explorations & standard reports use user-level and event-level data. The difference is that the data is raw in explorations and aggregated in standard reports. There are also differences between GA4 user metrics vs UA user metrics that are important to understand.
Raw data vs aggregated data
Raw data is unfiltered and unorganized. Raw data can include user-identifiable information like advertising identifiers and event timestamps for all user activities on your website.
Explorations use raw event and user-level data, which is why you can mix and match segments, dimensions, and metrics to analyze the data in custom ways.
Aggregated data can’t be tied to any individual because it’s made up of individual user data that has been combined (aggregated) at a high level. It does not include a user ID or timestamps for events. Aggregated data is basically a summary of user activities. For example, the Pages and screens report shows the average engagement time for each page, but it won’t show you how long an individual user spent on a page.
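If it helps to see the difference, here’s a small Python sketch that rolls a few raw rows up into a per-page average. The user IDs, pages, and durations are invented, and this is not how GA4 actually computes average engagement time. It just shows how the user-level detail disappears once the data is summarized.

```python
from collections import defaultdict

# Hypothetical raw rows: (user_id, page, engagement_seconds)
raw_events = [
    ("user_a", "/pricing", 30),
    ("user_b", "/pricing", 90),
    ("user_a", "/blog", 45),
]


def average_engagement_by_page(events):
    """Aggregate raw rows into an average per page, discarding the user IDs."""
    totals = defaultdict(lambda: [0, 0])  # page -> [total seconds, row count]
    for _user_id, page, seconds in events:
        totals[page][0] += seconds
        totals[page][1] += 1
    return {page: total / count for page, (total, count) in totals.items()}


print(average_engagement_by_page(raw_events))
# {'/pricing': 60.0, '/blog': 45.0}
```

From the aggregated output you can tell that /pricing averages 60 seconds, but you can no longer tell that user_b spent 90 of them.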
Standard GA4 reports use aggregated event and user-level data, which is why you are limited to each built-in report format. BUT you can customize GA4 detail reports as well as create brand-new detail reports from scratch - just one of the reasons why GA4 is better than UA!