A Python tool for formatting GA4 data to match and be backfilled with historical GA3 data in BigQuery.

Overview

GA3toGA4

A Python tool for formatting GA4 data to match and be backfilled with historical GA3 data in BigQuery.

Open In Colab

About

Welcome to GA3 to GA4 tool

⚠️ Warning: This tool should be considered alpha software and likely to break. It has been tested on a handful of sites and is shared as a proof-of-concept. Use at your own risk. While we will not support this open-source tool, please submit issues so we can make it better.

GA3 to GA4 is a tool for creating and filling a Bigquery table with available GA4 data and formatting it to be compatible with historical data pulled from the Google Analytics Reporting API v4 for GA3 data. This tool has been designed to run in google colab and to automate a majority of the process.

This notebook aims to provide 3 main functions

  1. Create table with select GA4 data formatted to GA3 data style.
  2. Set up a scheduled query for a daily recurring pull of new GA4 data.
  3. Backfill formatted GA4 data with historical GA3 data



Pulled GA3 & GA4 Data

GA3 GA4
ga:date date
ga:landingPagePath landing_page
ga:country country
ga:region region
ga:city city
ga:source utm_source
ga:medium utm_medium
ga:campaign utm_campaign
ga:users users
ga:newUsers new_users
ga:entrances entrances
ga:sessions sessions
ga:uniquePageviews unique_page_views
ga:timeOnPage engagment_time_sec_per_session
ga:conversion_event conversions
ga:transactionRevenue ecommerce_revenue
ga:transactions ecommerce_transactions

The table above shows what GA3 data is matched with what GA4 data by default. You can edit these values by changing ga4-to-ga3-query.sql and process_ga3.py.

Note: ga:conversion_event is only a placeholder for whatever GA3 conversion event you wish to pull.




How To Use

Google Cloud project set-up

This tool assumes that you have a Google Cloud project created already. This project will contain your from dataset (raw GA4 data from a daily export) and your to dataset (select GA4 data formatted to GA3 style) that will be created by this tool.

Ensure that BigQuery Daily Export is set up for your chosen GA4 account to your desired Google Cloud project and that you have a table with your raw GA4 data. If you don’t, follow the sets here to set it up. This should be set up without Streaming Export, and should just be a Daily Export.

Part 1: GA3 to GA4 SQL Workflow

Install BigQuery Data Transfer

This is needed to set up the reoccuring queries.

You will need to restart the runtime of this notebook after running this cell.

This can be done by going to Runtime > Restart runtime or pressing CTRL + M + .

Restart runtime location

Load needed libraries

Run cell to load in the libraries needed for this tool.

Sign in and Authenticate

Run the cells and input the project name of your Google Cloud Project. You will be asked to sign in. Choose the account with your Google Cloud project on, which should be the same account you are using in colab.

These cells give access to the Google Drive of the account you sign into, as well as setting up a service account and enabling the needed api permissions within Google Cloud.

You will have to add the created service account with read access to your GA3 data.

You do this by going to Google Analytics, selecting the GA3 view that you want to use, clicking Admin in the bottom right, looking in the property settings and selecting property access management. Once there, select the blue button on the top right to add another user. There, you add the service account email with viewer permissions.

Adding service account to GA3 property visual

Specify configuration parameters for the Source and Target Datasets

For the BigQuery From Table (Source Dataset) you need to input the following:

  • from_dataset_id : name of the dataset in your project with your GA4 data.
  • from_table_prefix : prefix of the tables in source dataset. Most of the time will be “events_”.
  • from_conversion_event : Optional, set conversion event from GA4 you want to pull.
  • from_initial_pull_prior_days : How many days back from initial pull do you want.

For the BigQuery To Table (Target Dataset) you will need to input the following:

  • to_dataset_id : Name of the dataset you want created/to use in your project to hold modified GA4 data.
  • to_table_name : Name of the table you want created in the target dataset.

Authenticate BigQuery

Run cell to authenticate BigQuery Client.

Built Initial Table

Run to pull GA4 data and create the target dataset and table, will error if table is already created under specified name and location.

Set Up Daily Reoccuring Pull

Run to set up the daily scheduled query in Google Cloud. If it errors, check that you have restarted your runtime after installing BigQuery Data Transfer.

Review Data

Run to pull data from the target dataset for review.

Part 2: Backfil GA3 data

Specify configuration parameters for the GA3 View

You need to input the following:

  • ga3_view_id : The GA3 view ID that you want to use to backfill your GA4 data with.
  • pull_start_date : The beginning of the timeframe of GA3 data that you want to pull.
  • website_url : Full URL of the website for the view, including any leading https://. This will be added to the landing_page pulled for GA3 data in order to match the format of GA4 data.

Backfill GA3 Data

Run to backfil GA3 data to your GA4 data.

Note: If you want to add a conversion event, you will need to add it to the metrics list and to the order dictionary in in lib/process_ga3.py.

Review Data

Run to pull data from the target dataset for review.




Acknowledgements

  • Thank you to JR, Joe, and Savannah for the work on the creation of this tool.
  • Thanks to Derek Perkins of Nozzle for SQL code review.
  • This tool was created and released open for the SEO community by Locomotive.agency.




Feedback

If any errors, concerns, or areas for improvement are found, please open an issue for it on this repo. If you want to change something, please make a pull request. We would love to feature dashboards users have made with the data. Please let us know if you have any to share!

You might also like...

Something tiny but historical~Classic Cryptography!

Classic-Cryptography Something tiny but historical~Classic Cryptography! Here are some tools, representing achievements of classic cryptography. These

May 16, 2022

A python algorithm that generates a round robin match scheduling using the circle method.

RROBIN-PY | Round Robin Match Scheduling Generator This generates a round robin match scheduling with an input of N number of teams. This uses the cir

Apr 7, 2022

Python package that provides a utility function to recursively get files that match a pattern.

Get Files List Python package that provides a utility function to recursively get files that match a pattern. Installation Install using pip: pip inst

Jul 31, 2022

Match drivers, flats and refugee groups

Installation guide This is a pretty standard django app, install should be like this (tested on Ubuntu 20.04): # Install python3-venv if not available

Mar 25, 2022

Formats your Image to match with r/place's pixelated form and color palette.

reddit-Place-Image-Formatter Formats your Image to match with r/place's pixelated form and color palette. HOW TO USE: run.py, Console Menu: To create

Apr 16, 2022

Explore-and-Match: A New Paradigm for Temporal Video Grounding with Natural Language

Explore and Match: A New Paradigm for Temporal Video Grounding with Natural Language Implementation of "Explore and Match: A New Paradigm for Temporal

Dec 16, 2022

Code for Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Code for Watch and Match: Supercharging Imitation with Regularized Optimal Transport

Watch and Match: Supercharging Imitation with Regularized Optimal Transport This is a repository containing the code for the paper "Watch and Match: S

Dec 21, 2022

The script reads a JSON file (or folder) and generates an ELAN file to match.

JSON to ELAN The script looks in a folder, and generates an ELAN file to match each JSON file. JSON format It has been written for the JSON output fro

Aug 4, 2022

Check external IP address against current IP address and if these are not a match, update GoDaddy Records.

External IP Check Check external IP address of the host against the currently stored IP address and if these are not a match, update GoDaddy DNS Recor

Aug 23, 2022
Comments
  • Table GA3 does not exist

    Table GA3 does not exist

    Hi again my friend!

    I have managed to set up all the steps you indicate in the notebook but I am stuck in step 4: backfill GA3 data

    My question is the following: is it necessary to have a table created in bigquery with the GA3 data version? If yes, is it possible to export the GA3 data to a bigquery table? because I have only seen that it can be done with the GA4 version.

    Thank so much :)

    Best. Rai

    question 
    opened by raismartup 4
  • notebook.ipynb exist???

    notebook.ipynb exist???

    Hello!

    Congralutations for your project!!! It´s fantastic.

    I have a question about this. In the repository you talk about a notebook but I don't see one notebok.ipynb, should there really be one?

    Thank so much :)

    documentation question 
    opened by raismartup 1
Owner
LOCOMOTIVE®
An enterprise technical SEO agency.
LOCOMOTIVE®
Formatting support for Python using the Black formatter

Formatter extension for Visual Studio Code using black A Visual Studio Code extension with support for the black formatter. The extension ships with b

Microsoft 67 Dec 31, 2022
Minecraft Formatting Codes for Python 3.x

MCFC Minecraft Formatting Codes for Python 3.x Install pip install mcfc Functions colors(): Show information and available colors. f_print(text: str):

WoidZero 3 Nov 24, 2022
You can take notes in raw python. This is a very simple program without lots of formatting choices.

You can take notes in raw python. This is a very simple program without lots of formatting choices. However, it's a fun project to have an alternative to the boring Word :)

Matthew Tam 2 Aug 16, 2022
Sanitization/Formatting server for Lead CSV files

Sanitization Server LMS needs to load data from various sources. This server cleans that data and loads them in a specific format. Usage Run the pytho

null 2 May 27, 2022
Epix API is an open-source Application Programming Interface for simple text formatting

Epix API is an open-source Application Programming Interface for simple text formatting. Such As making inputs non case sensitive. This will be simple and targeted towards newer programmers like myself.

Eveldun 1 Oct 28, 2022
GitHub repository for GenErode, a Snakemake pipeline for the analysis of whole-genome sequencing data from historical and modern samples to study patterns of genome erosion.

GenErode pipeline (C) Jonas Söderberg GitHub repository for GenErode, a Snakemake pipeline for the analysis of whole-genome sequencing data from histo

NBIS -- National Bioinformatics Infrastructure Sweden 12 Nov 26, 2022
Find support & resistance zones in historical data

find-SRs This program detects support & resistance zones in historical [forex] data. It's a project of the reddit r/algotrading community. It's intend

null 5 Aug 16, 2022
Investing method to download last 30 days of financial historical data from Investing.com

Investing.com Financial Historical Data Retriever Tired of "403 Error Retry later" error from InvestPy package. Decided to create this tiny script tha

Amir C 2 Nov 4, 2022
Implementation of 'An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale' ICLR 2020 and a documentation about its historical and technical background.

Vision Transformer Implementation of 'An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale' ICLR 2020 (arXiv, PDF) and a documen

Yu Geunhyeok 1 Aug 25, 2022
All of 7451 AvengerRobotics' Match Scouting Data

Match Scouting Data All of 7451 AvengerRobotics' raw match scouting data 2022-present. Accessing Data The data in this repository can be viewed by nav

AvengerRobotics 1 Nov 17, 2022