Standardising Google Analytics data for improved analysis
Elina Kiukas
Senior Analytics Specialist
Have you ever tried to analyse, visualise or combine your web analytics data, only to realize that the data is messy, confusing and needs a lot of transformation before it is usable? Data transformations are often unavoidable, but we can reduce the time they take by designing the data collection right. In this blog post I’ll share some of the benefits of standardising your Google Analytics data and go through topics to consider before deploying new measurement.
Usable data is accurate, consistent and accessible. It is measured by its reliability and ease of analysis. A while back my colleague Chris wrote a great post about the importance of ensuring data accuracy and reliability in Google Analytics. Accuracy is the foundation for analysis, but even accurate data can be messy: naming conventions differ, the same data exists in different formats, or it is tracked with different identifiers and indexes.
Incoherent and hard-to-use data often appears in Google Analytics when setups are built up over time without real commitment, and when multiple parties are involved in implementation without common documentation and processes in place.
So what can you do to make life much easier for your data engineers, analysts and anyone else who needs to make sense of your Google Analytics data, while potentially also unlocking new analysis potential? Read on to learn why standardisation is important, what measures to take to standardise, and how to support standardisation.
Why standardise your data?
There are many more advanced topics in web analytics than data standardisation. However, before getting to the funkier stuff, it is important to make sure that your data is of high quality and highly usable. Standardisation helps your teams understand and handle data better and faster, which unlocks those cool analyses you want to perform.
Standardisation is especially important when you are managing multiple properties across markets, brands or applications (web & app) but it is no less useful when talking about only one view or one property with multiple views. Here are a few reasons why:
- While analyzing markets and brands separately is necessary, you also want to make it possible to generate insights across all of them. Especially in the Nordics where data volumes are smaller than in many other countries, aggregating data across markets may unlock new analysis potential.
- In order to do meaningful comparisons you need data that is comparable.
- You want to save time and effort in understanding and handling your data, and reduce the transformations needed to combine it and get it into a comparable format.
- Perhaps Google Analytics data needs to sync with other systems that require the data in a certain format.
In practice the need for standardised data is often felt when creating a roll-up property in Google Analytics, when creating a dashboard template in Google Data Studio (or a similar tool) or when creating any type of data export from Google Analytics. All data extraction and combination in these situations rely on the naming conventions, formatting and indexing of the data.
Below I’ll go through what should be taken into account when making sure Google Analytics data and setups are aggregable and comparable. I’ll also discuss some measures to support standardisation in an organization.
What measures to take in Google Analytics?
Use coherent naming conventions
Good naming conventions are coherent and descriptive. They help users understand, organize and search data. Templates help future implementations follow the same naming logic.
- Create a naming convention plan and template for your audiences, events, campaign tags, custom parameters, channel and content groupings and others.
- For audiences, specify at least the brand/market, the audience condition and the duration. These Google support articles describe the anatomy of naming the event fields and the campaign parameters.
- If you’re handling both websites and apps you may want to try out the recently launched Google Analytics App + Web property that unifies event and parameter measurement between the two.
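As a rough illustration of the audience naming advice above, here is a minimal Python sketch that builds and validates audience names from the three parts mentioned (brand/market, condition, duration). The pattern `<market>_<condition>_<duration>d` and the function names are assumptions made for this example, not a Google Analytics requirement; adapt them to your own naming plan.

```python
import re

# Hypothetical convention: <market>_<condition-slug>_<duration>d, all lowercase.
AUDIENCE_NAME = re.compile(r"^[a-z]{2}_[a-z0-9]+(?:-[a-z0-9]+)*_\d+d$")


def build_audience_name(market: str, condition: str, duration_days: int) -> str:
    """Compose an audience name from market, condition and membership duration."""
    # Turn free text such as "Cart abandoners" into a slug: "cart-abandoners".
    slug = re.sub(r"[^a-z0-9]+", "-", condition.lower()).strip("-")
    return f"{market.lower()}_{slug}_{duration_days}d"


def is_valid_audience_name(name: str) -> bool:
    """Check an existing name against the (assumed) convention."""
    return AUDIENCE_NAME.fullmatch(name) is not None


print(build_audience_name("FI", "Cart abandoners", 30))
print(is_valid_audience_name("Cart Abandoners FI"))
```

A check like this can be run against a list of existing audience names during an audit to spot the ones that do not follow the template.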
Be wary of data formats
What may complicate analysis is when the same data exists in multiple letter cases and languages, or your sales data is in different currencies.
- On the view level, use the letter case filters available in Google Analytics for event fields, campaign parameters, ecommerce data, internal search terms, page URLs and many more, which transform all values into the same case.
- Track important fields, such as product names and categories, in English, or both in English and a local language in order to combine them across markets.
- If you need to compare sales across markets but also keep sales in local currencies, you can use calculated metrics in the GA UI to derive a secondary currency, or you can send a secondary currency alongside the local currency in dedicated dimensions and metrics in your transaction hit.
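If data has already been collected in mixed formats, the same ideas can be applied after export. The sketch below is one possible approach in Python: it lowercases a few text fields and adds a revenue value in a common currency. The field names (`event_category`, `product_name`, `revenue`, `currency`) and the fixed exchange rates are hypothetical and only serve the example.

```python
from decimal import Decimal

# Hypothetical fixed rates to a common reporting currency (EUR).
# In practice these would come from a finance or exchange-rate source.
RATES_TO_EUR = {
    "EUR": Decimal("1"),
    "SEK": Decimal("0.10"),
    "DKK": Decimal("0.13"),
}

# Text fields that should share one letter case (assumed field names).
TEXT_FIELDS = ("event_category", "event_action", "product_name")


def normalise_row(row: dict) -> dict:
    """Return a copy of an exported row with lowercased text and EUR revenue."""
    out = dict(row)
    for field in TEXT_FIELDS:
        if field in out:
            out[field] = out[field].strip().lower()
    # Keep the local revenue as-is and add a comparable value next to it.
    out["revenue_eur"] = Decimal(str(row["revenue"])) * RATES_TO_EUR[row["currency"]]
    return out


row = {"event_category": " Newsletter ", "revenue": "1000", "currency": "SEK"}
print(normalise_row(row))
```

Keeping the local value and adding the converted one side by side mirrors the idea of a secondary currency: local teams still see their own numbers, while cross-market reports use the common field.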
Pay attention to data indexes
Goals and custom dimensions & metrics are queried by index number when exported via the Google Analytics API. Goals also rely on index numbers in Data Studio templates. Say you want to use the goal completion of a newsletter subscription in a Data Studio template that lets you apply data from multiple GA views. As in the example below, however, the same goal is in index 2 in the Finnish view and in index 1 in the Swedish view. In a Data Studio template you will need two different fields for them, and you will need to remember that goal index 1 in the Finnish view stands for a different goal than newsletter subscription.
- To reduce the chances of accidentally combining data that does not belong together use the same indexes for the same data across your setup.
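An index audit along these lines can be automated. The following Python sketch takes a mapping of views to their goal indexes and reports goals that sit in different indexes in different views. The input structure and the "Contact form" goal are made up for the example; the newsletter goal follows the Finnish/Swedish case described above. The same check would work for custom dimensions and metrics.

```python
from collections import defaultdict


def find_index_conflicts(views: dict) -> dict:
    """Return goals whose index differs between views.

    `views` maps a view name to {index: goal name}.
    The result maps a goal name to {view name: index}.
    """
    positions = defaultdict(dict)
    for view, goals in views.items():
        for index, name in goals.items():
            # Compare names case-insensitively so casing differences don't hide a match.
            positions[name.strip().lower()][view] = index
    return {
        name: by_view
        for name, by_view in positions.items()
        if len(set(by_view.values())) > 1
    }


views = {
    "fi": {1: "Contact form", 2: "Newsletter subscription"},
    "se": {1: "Newsletter subscription"},
}
print(find_index_conflicts(views))
```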
Use settings that enable aggregating and splitting data flexibly
When planning your setup, make use of standard and custom dimensions that enable you to both aggregate and split your data flexibly.
- Design your event fields (category, action, label) and your campaign tagging (source, medium, campaign, term, content) so that you are able to both aggregate similar data together as well as go more granular in your analysis, for example from channel to ad level in your campaign analysis.
- Track custom dimensions that enable you to do the same. For example, page type (home, product, article, register etc.) helps you to aggregate pageviews. On the other hand, identifiers for the market/brand, for example, enable you to filter data in your aggregated datasets.
- Also collect custom dimensions that enable you to parse data on visitor, session and hit level: GA client ID, session ID and hit timestamp, as discussed in this blog post by Simo Ahava.
- Take advantage of channel and content groupings to aggregate your traffic sources and page URLs, while keeping the raw source and page level data available.
- Prevent pageviews of the same page from splitting across multiple rows due to unnecessary URL parameters (e.g. precisdigital.com vs. precisdigital.com?random_id=123) by removing those parameters in the view settings. If you need those parameters for analysis purposes, use content grouping to aggregate those pageviews.
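When the parameters were not excluded at collection time, the same clean-up can be done on exported page URLs. Below is a minimal Python sketch using the standard library's `urllib.parse`; the set of excluded parameters is an assumption for the example (only `random_id` comes from the case above).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Parameters that carry no analytical value (hypothetical list).
EXCLUDED_PARAMS = {"random_id"}


def strip_params(url: str, excluded: set = EXCLUDED_PARAMS) -> str:
    """Remove excluded query parameters and keep everything else unchanged."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in excluded
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_params("precisdigital.com?random_id=123"))
print(strip_params("/products?page=2&random_id=123"))
```

After this step, both variants of the same page collapse into one row when pageviews are grouped by URL.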
How to support standardisation?
Create a measurement plan and make sure someone owns it
A measurement plan is written documentation of data collection that lays out what is being tracked, in which format tracking takes place and how it is named. A measurement plan document can also function as a template and reference for future setups. When creating a measurement plan, the needs of different markets, brands, teams and other stakeholders should be mapped as well as possible in order to create a setup that serves most needs in an efficient manner. Another benefit of documenting a tracking setup is that users can refer back to it to understand the setup, and new people are easier to onboard to the tracking. Assign a person or a team that is responsible for owning, executing, updating and communicating the measurement plan in the organization.
Accommodate domain-specific customizations
A drawback of standardisation may be that local and domain-specific reporting and analysis needs get less consideration, and that implementing them becomes inflexible and slow.
To accommodate both cross-property/view standardisation and customization, you can leave space for custom settings in the measurement plan. For example, reserve the first 15 goals for company-wide settings and the remaining 5 for view-specific ones.
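A reservation rule like this is easy to check in a script. The Python sketch below assumes the 15 + 5 split from the example above and flags goals that sit in the wrong range; the goal names and the `shared_names` set are hypothetical.

```python
SHARED_SLOTS = range(1, 16)   # goals 1-15: company-wide
LOCAL_SLOTS = range(16, 21)   # goals 16-20: view-specific


def misplaced_goals(goals: dict, shared_names: set) -> list:
    """List goals of one view that sit outside the range reserved for them.

    `goals` maps goal index to goal name; `shared_names` holds the
    company-wide goal names from the measurement plan.
    """
    problems = []
    for index, name in sorted(goals.items()):
        if name in shared_names and index not in SHARED_SLOTS:
            problems.append(f"{name!r} is company-wide but uses slot {index}, outside 1-15")
        elif name not in shared_names and index not in LOCAL_SLOTS:
            problems.append(f"{name!r} is view-specific but uses slot {index}, outside 16-20")
    return problems


view_goals = {
    1: "newsletter subscription",
    3: "store locator click",
    16: "local campaign signup",
}
print(misplaced_goals(view_goals, shared_names={"newsletter subscription"}))
```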
Make use of tools for bulk implementation
In the Google Analytics interface, some setups such as audiences, goals and channel groupings have a sharing function which enables you to copy, for example, a certain set of audiences and apply them to another property. Furthermore, you can create scripts to upload settings in bulk to properties/views via the Google Analytics API. Scripts can also be used for custom dimensions and metrics, which don’t have the built-in sharing setting.
On the data collection side Google Tag Manager comes in handy in implementing tracking in bulk. Ideally you can use a single GTM container (or as few containers as possible) on all of your domains when the measurement is standardised. Not only will this help to ensure data standardisation is maintained across domains but it also speeds up the implementation process for any new tags.
Communicate the value of following standards
Following standards may feel boring or restricting to users and topics such as naming conventions and data indexes may seem very nitty gritty. Help everyone involved in collecting data, building Google Analytics settings and tagging campaigns understand the benefits of standardisation and how the details contribute to the larger picture in using data.
Moving on
There are definitely more things you can and should take into account when standardising your Google Analytics data, but the ones above are those I think are the most essential. As first steps, I recommend the following:
- Audit your current tracking setup with the above in mind and see if you have messy data issues that make data comparisons, aggregations or integrations hard.
- Make sure your organization has a person or a team that is responsible for measurement.
- Start mapping and documenting requirements on what you should be able to do with your data. These you can later turn into technical requirements in a measurement plan.