Understanding the User ID in Google Analytics 4
At first glance the User ID seems simple enough, but it can get a bit complex when you start pulling reports. The purpose of this post is to explain how the User ID is used to generate reports. If you need help installing the User ID and making sure that it works properly, take a look at my post on Properly Setting the User ID.
Migrated from ken-williams.com to dive.team and made some minor edits.
Migrated to new CMS and made minor updates.
User ID in Summary
The User ID is a unique identifier that you can pass to Google Analytics to identify an authenticated user. It is typically stored in your customer database and passed to Google Analytics through a data layer, and it should be consistent across browsers and devices. To explain how the User ID works in Google Analytics 4, I’m going to break it down into two categories: data collection and reporting identity.
Data Collection
On the client side, a mobile app persists Google Analytics data differently than a website does. On a website you must explicitly set the User ID with every event that fires, but in a mobile app the User ID persists automatically once it has been set. In this way, mobile apps treat the User ID like a user-scoped custom dimension that continues to be sent with every event after it has been set.
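To make the difference concrete, here is a toy Python model of the two behaviors. `WebTracker` and `AppTracker` are hypothetical classes invented for illustration; they are not the gtag.js or Firebase APIs, and they only mimic which User ID gets attached to each event.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WebTracker:
    """Toy model of a web page: user_id is sent only when you pass it in."""
    sent: list = field(default_factory=list)

    def log_event(self, name: str, user_id: Optional[str] = None) -> None:
        # Nothing carries over between calls, so every event must include user_id.
        self.sent.append({"event_name": name, "user_id": user_id})


@dataclass
class AppTracker:
    """Toy model of an app SDK: user_id sticks once it has been set."""
    sent: list = field(default_factory=list)
    user_id: Optional[str] = None

    def set_user_id(self, user_id: Optional[str]) -> None:
        # Stored once and attached to every later event, like a user-scoped dimension.
        self.user_id = user_id

    def log_event(self, name: str) -> None:
        self.sent.append({"event_name": name, "user_id": self.user_id})


web = WebTracker()
web.log_event("page_view", user_id="12345")
web.log_event("scroll")  # user_id not passed again
print([e["user_id"] for e in web.sent])  # ['12345', None]

app = AppTracker()
app.set_user_id("12345")
app.log_event("screen_view")
app.log_event("tap")
print([e["user_id"] for e in app.sent])  # ['12345', '12345']
```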
Reporting Identity
In Google Analytics 4, "Reporting Identity" refers to the user identifier(s) that are used to generate reports. You can configure this under your property settings (I've written more about what this means in a separate post), but what matters for this article is that Google prefers to identify users by the user_id because it is considered the most reliable identifier.
All events that are generated for a user (with a User ID or not) will include a "user pseudo ID" (sometimes this is displayed as the "Device ID" or "App-Instance ID" in reports). On the web, this is supplied by a first-party cookie, and was known as the "client ID" in previous versions of Google Analytics. For Android and iOS apps, this is set to the App-Instance ID.
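On the web, the client ID that becomes the User Pseudo ID can usually be read from the `_ga` cookie. The sketch below assumes the common `GA1.<n>.<random>.<timestamp>` cookie layout, in which the last two dot-separated parts form the client ID. The function name is hypothetical, and Google does not guarantee that this cookie format stays stable.

```python
def client_id_from_ga_cookie(cookie_value: str) -> str:
    """Return the client ID portion of a _ga cookie value.

    Assumes the layout "GA1.<n>.<random>.<timestamp>": a version prefix,
    a domain-depth number, then the two parts that make up the client ID.
    """
    parts = cookie_value.split(".")
    if len(parts) < 4 or not parts[0].startswith("GA"):
        raise ValueError(f"Unexpected _ga cookie format: {cookie_value!r}")
    # The client ID is the last two dot-separated components.
    return ".".join(parts[-2:])


print(client_id_from_ga_cookie("GA1.1.2222.3333"))  # 2222.3333
```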
Stitching the User ID and User Pseudo ID
If you prefer a video, check out this clip from our course Prep Google Analytics Data for Reporting in BigQuery where we discuss much of the same information provided in this post, but we also explain how to build an identity graph for more sophisticated identity stitching in BigQuery.
If the user lands on a website unauthenticated, and then authenticates and begins passing a User ID in a later event, Google Analytics 4 will use the User Pseudo ID to attribute the User ID to previous events where the User ID was not set.
In the example above, the User Explorer report will display only one “Effective user ID” (the identifier that is used to create the report), set to “12345”, for both events. However, if the same User Pseudo ID later sends more events without the User ID set, those events will not be attributed to the User ID.
In this example, the User Explorer report will show you two Effective user IDs: “12345” and “abc”.
This next part can be counterintuitive: the first two events can be attributed to both of these Effective user IDs. If you drill down into “12345” you will see two events, but if you drill down into “abc” you will see all three events.
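The per-event attribution in this scenario can be approximated with a short Python function. It is a simplified model written for illustration, not Google's actual stitching logic: it handles a single User Pseudo ID, backfills only the unauthenticated events that come before the first User ID, and does not reproduce the drill-down overlap where “abc” still lists all three events.

```python
from typing import Optional


def effective_ids(events: list) -> list:
    """Simplified model of GA4 backfill for ONE user_pseudo_id (not Google's code).

    Events must be in chronological order. Unauthenticated events before the
    first user_id are attributed to that user_id; unauthenticated events after
    it fall back to the user_pseudo_id.
    """
    first_uid: Optional[str] = next(
        (e["user_id"] for e in events if e.get("user_id")), None
    )
    seen_uid = False
    result = []
    for e in events:
        uid = e.get("user_id")
        if uid:
            seen_uid = True
            result.append(uid)
        elif first_uid is not None and not seen_uid:
            result.append(first_uid)  # backfilled from a later event
        else:
            result.append(e["user_pseudo_id"])
    return result


events = [
    {"user_pseudo_id": "abc", "user_id": None},     # lands unauthenticated
    {"user_pseudo_id": "abc", "user_id": "12345"},  # logs in
    {"user_pseudo_id": "abc", "user_id": None},     # later event without user_id
]
ids = effective_ids(events)
print(ids)               # ['12345', '12345', 'abc']
print(sorted(set(ids)))  # ['12345', 'abc']
```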
Using the User ID Across Sessions & Streams
Let’s walk through a more detailed example. Say that a user follows these steps to create 4 events over 3 unique sessions:
The user opens your iOS app and views the home screen without being authenticated.
The user authenticates and views a second screen (User ID is set to “ItsMyNewDevice”).
The user closes the app and returns 2 hours later without authenticating again (no User ID is set).
Finally, the same user opens your website where she is already authenticated and views the homepage (User ID is set again).
Here’s an approximation of what these raw events will look like in BigQuery:
Event Number | Session Number | Event Name | Platform | User_ID | User Pseudo ID |
---|---|---|---|---|---|
1 | 1 | Home_Screen_View | iOS | null | 11111 |
2 | 1 | Second_Screen_View | iOS | ItsMyNewDevice | 11111 |
3 | 2 | Home_Screen_View | iOS | ItsMyNewDevice | 11111 |
4 | 3 | Home_Page_View | WEB | ItsMyNewDevice | 2222.3333 |
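One way to reason about how many users this scenario represents is to treat user_id values and User Pseudo IDs as nodes in a graph and link each pseudo ID to any user_id it sent. The Python sketch below does this with a union-find structure over the table rows. It is a generic identity-graph technique chosen for illustration, not GA4's internal algorithm.

```python
EVENTS = [
    # (event_number, session_number, event_name, platform, user_id, user_pseudo_id)
    (1, 1, "Home_Screen_View", "iOS", None, "11111"),
    (2, 1, "Second_Screen_View", "iOS", "ItsMyNewDevice", "11111"),
    (3, 2, "Home_Screen_View", "iOS", "ItsMyNewDevice", "11111"),
    (4, 3, "Home_Page_View", "WEB", "ItsMyNewDevice", "2222.3333"),
]


def count_users(events) -> int:
    """Count connected groups of pseudo IDs linked through shared user_ids."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    for *_, user_id, pseudo_id in events:
        node = "pseudo:" + pseudo_id
        find(node)  # register every device, even unauthenticated ones
        if user_id is not None:
            union(node, "uid:" + user_id)
    return len({find(x) for x in parent})


print(count_users(EVENTS))  # 1
```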
Data Collection
First, let's look at how the data is collected. There are two things to notice in this chart:
The User Pseudo ID is different on each device. This is because the User Pseudo ID is really a device identifier, not a user identifier. (Again, you can learn more about the various identifiers in my post on Setting the User ID.)
The User ID will persist across sessions on a mobile app automatically. Recall that the User ID was not set in event #3, yet it still appears in the data because it persisted on the mobile device. My recommended best practice is still to set the User ID once per authenticated session, but as long as the app's stored data is not deleted, the User ID will persist in this way.
Reporting Identity
Now let's look at how GA4 will use these identifiers to generate your reports. The question is: How many users will the example scenario create in my reports?
Standard Reports
The standard reports will all show a single user. You can verify this by creating a test property and sending only these example events for a single date.
Exploration Report
If you open a "User Explorer" report, you will see a single Effective User ID listed (the screenshot below labels it "App-Instance ID" because Google changed the name in the UI).
For testing, you can copy the user_id to a user scoped custom dimension (I call it "uid" in the upcoming screenshot), but do not deploy this to your live users or your reports will be subject to high cardinality issues.
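If you would rather generate test events from a script than from a tag, the GA4 Measurement Protocol (a POST to `https://www.google-analytics.com/mp/collect` with your `measurement_id` and `api_secret` as query parameters) accepts a JSON body like the one built below. This is a minimal sketch for a web stream, assuming you have registered "uid" as a user-scoped custom dimension. `build_mp_payload` is a hypothetical helper, and real tests may need extra event parameters for the events to appear correctly in session-based reports. As with the tag approach, use it only against a test property.

```python
import json
from typing import Optional


def build_mp_payload(client_id: str, event_name: str,
                     user_id: Optional[str] = None) -> dict:
    """Build a GA4 Measurement Protocol request body for one web-stream event.

    When user_id is provided it is also copied into a "uid" user property,
    the testing-only trick described above. Do not use this on live traffic.
    """
    payload: dict = {
        "client_id": client_id,  # the web client ID (user_pseudo_id)
        "events": [{"name": event_name, "params": {}}],
    }
    if user_id is not None:
        payload["user_id"] = user_id
        payload["user_properties"] = {"uid": {"value": user_id}}
    return payload


body = build_mp_payload("2222.3333", "Home_Page_View", user_id="ItsMyNewDevice")
print(json.dumps(body, indent=2))
```

Only the authenticated event carries the "uid" property, which is what lets you see in the User Explorer which events actually had the user_id set.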
If you do this and then click on the Effective user ID, you will see a list of events generated by this user. The "uid" User Property tells you whether the user_id was set with each event. You can see in the screenshot below that it was not set on my initial events.
So, this confirms that Google Analytics 4 was smart enough to attribute the unauthenticated events to the correct user_id, despite the fact that the user_id was not set with the first few events.
You can also apply "uid" as a dimension in the User Explorer view. In the screenshot below you can see that during my test I sent 6 events without the user_id and then 3 with the user_id, all in a single session. The "Effective user ID" was correctly applied retroactively to the first 6 events, as we saw before, and you can also see that only 1 session was counted.
WARNING |
---|
If you replicate this test, you might be tempted to filter on your user_id to find yourself, but don't. The filter would exclude every event where the User ID was null, so those events would not appear in the detailed User Activity report. |
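The effect of such a filter is easy to see on a few hypothetical rows. Here `effective_user_id` stands in for the backfilled identifier that the User Explorer reports; the column names are illustrative and do not match any GA4 export schema.

```python
# Hypothetical event rows: two unauthenticated events, then a login.
events = [
    {"event_name": "page_view", "user_id": None, "effective_user_id": "12345"},
    {"event_name": "scroll", "user_id": None, "effective_user_id": "12345"},
    {"event_name": "login", "user_id": "12345", "effective_user_id": "12345"},
]

# Filtering on the raw user_id drops every event where it was null.
by_user_id = [e for e in events if e["user_id"] == "12345"]

# Filtering on the effective identifier keeps the backfilled events too.
by_effective_id = [e for e in events if e["effective_user_id"] == "12345"]

print(len(by_user_id), len(by_effective_id))  # 1 3
```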