Building a Data Agency

How you can offer "data pipelines as a service" on Google BigQuery, following our end-to-end process

What You'll Learn

Down with chaos! 

At CIFL, we've built data pipelines (mostly in BigQuery) for a wide array of businesses.

What we've found is that coding is the easy part.

The hard part, as usual, is getting your team (or your clients' teams) on the same page throughout the development process - so that what gets built actually gets used when it's done.

This course focuses on the processes that can ensure a good result - whether you're building a data pipeline internally for your company, or offering data pipelining as a service to your clients.

We'll dive into:

  1. How to decide if you even need a data pipeline at all
  2. How to roadmap a pipeline, to get buy-in + validation upfront
  3. How to break a pipeline roadmap into bite-sized development sprints to deliver small wins
  4. What roles are required to actually build a reporting pipeline, and how to hire for them
  5. How to validate data to build trust in the outcome

This course is a labor of love - these are the lessons we've learned over a few years deep in the weeds of building data pipelines for clients.

Hope you enjoy it as much as we enjoy sharing it.

Many thanks to Supermetrics and Stitch for providing demo accounts for use in building the course.

Mahalo,
David
Commissioner, Coding is for Losers

What's included?

57 videos · 3 files · 6 text files

Contents

Getting Started + FAQs
***ALL THE TEMPLATE LINKS***
**GETTING HELP**
The business of data, end-to-end
8 mins
Why'd we republish this course?
3 mins
Who is this for, and what will you learn?
2 mins
What is a data pipeline?
[THROWBACK ALERT] WTF is ADP?
Additional (FREE) CIFL Courses
Getting Started with BigQuery SQL
Data Studio the Lazy Way
0.1 The Sales Flow
0.1.1 Finding your niche
8 mins
0.1.2 Building an inbound content strategy
8 mins
0.1.3 Why we arrived at the sprint pricing model
7 mins
0.1.4 Why we publish pricing
4 mins
0.1.5 The initial sales call
6 mins
0.1.6 The roadmapping process
7 mins
0.1.7 Deal closing + contracts
6 mins
0.2 Staffing & Resourcing
0.2.1 The sprint flow + roles
6 mins
0.2.2 Hiring reporting + data modeling analysts
5 mins
COMING SOON - Making a staffing plan & budget
1.1 Planning Development Sprints
1.1.0 Meet the Tracking Plan
5 mins
1.1.1 Breaking the roadmap into a Tracking Plan
6 mins
1.1.2 Mapping our raw data source requirements
5 mins
1.1.3 What you'll do with data source schemas
3 mins
1.1.4 Mapping out data source schemas
8 mins
1.1.5 Populating key starter questions for reporting
4 mins
1.1.6 Scoping out each Site or Client
5 mins
2.1 Data Feeds - Getting Started
2.1.1 Intro to data feeds
2 mins
2.1.2 BigQuery initial setup
4 mins
2.1.3 Setting up your BigQuery tables
7 mins
2.1.4 Pushing data from Sheets to BigQuery
6 mins
2.1.5 Supermetrics quickstart for beginners
3 mins
2.1.6 Pulling data from unsupported APIs into the Tracking Plan
9 mins
2.2 Data Feeds - Stitch
2.2.1 Stitch initial setup
3 mins
2.2.2 Pulling GA data using Stitch
7 mins
2.2.3 Pulling Adwords data using Stitch
4 mins
2.2.4 Pulling FB Ads data using Stitch
4 mins
3.1 Intro to dbt
3.1.1 Intro to dbt for SQL data modeling
4 mins
3.1.2 Planning your Data Models
7 mins
3.1.3 Creating your dbt project
8 mins
3.1.4 Connect your BigQuery database to dbt
8 mins
3.1.5 Managing your dbt project via Github
7 mins
3.2 Data modeling with dbt
3.2.1 Writing your 'processing' level SQL queries
9 mins
3.2.2 Writing your 'join' level SQL models
6 mins
3.2.3 Sidenote: on debugging dbt models
7 mins
3.2.4 Pro tip: standardizing URL structure
3 mins
3.2.5 Pro tip: using dbt macros
6 mins
3.2.6 Writing your 'admin' level SQL models
3 mins
3.2.7 Writing your 'math' and 'visualization' level SQL models
8 mins
COMING SOON - Data documentation in dbt
COMING SOON - Data + schema testing in dbt
3.3 Productionalizing your dbt project
3.3.1 Intro to productionalizing your pipeline
2 mins
3.3.2 Using dbt cloud to run your SQL models on a schedule
5 mins
3.3.3 Scheduling your data pipeline orchestrations
5 mins
3.3.4 Testing changes to your data pipeline
3 mins
3.3.5 QCing data using Supermetrics as a check
5 mins
4.1 Visualizations in Data Studio
4.1.1 The "PDA" reporting design framework
4 mins
4.1.2 Designing reports in the Tracking Plan
4 mins
4.1.3 Executing the reporting build
5 mins
4.1.4 Reviewing reporting
4 mins
4.1.5 Pulling data from BigQuery into Sheets
4 mins
5.1 Sprint wrapup + review
5.1.1 Conditions for closing out a sprint
4 mins
5.1.2 Wiring up reporting with live data
6 mins
5.1.3 Sharing draft models + visualizations with clients
5 mins
5.1.4 Transitioning to support mode
5 mins
Congrats!
Wapow! You made it.
2 mins
Interested in working with CIFL?
2 mins

Mastering Google Sheets, Data Studio and BigQuery

Helping you wrangle your data + automate your work, without (hardly ever) leaving the Google stack.

FAQ

How much does the toolbelt cost to build a pipeline?

The tools required to build data pipelines are:

  1. Stitch ($100 / month for base plan) or Supermetrics ($89 / month) for pulling data from APIs
  2. Google Sheets + Apps Script (Free)
  3. Google BigQuery (or similar database) for data warehousing (free 12-month trial + $300 credit; cost varies by usage after that)
  4. dbt to model data using SQL (Free + open-source)
  5. dbt cloud to run dbt models on a schedule (free for one seat, $50/seat after that)
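As a rough sketch, the list above can be turned into a simple monthly estimate. The prices below are the base-plan figures quoted in this FAQ (they may have changed), and the helper function is purely illustrative - it's not part of the course materials:

```python
# Hypothetical monthly cost estimate for the toolbelt above.
# Prices are the base-plan figures quoted in this FAQ and may have changed.
STITCH_BASE = 100          # Stitch base plan, $/month
SUPERMETRICS = 89          # Supermetrics, $/month
DBT_CLOUD_PER_SEAT = 50    # dbt cloud, $/seat/month after the first free seat

def monthly_tool_cost(use_stitch: bool, use_supermetrics: bool,
                      dbt_cloud_seats: int, bigquery_usage: float) -> float:
    """Sum the recurring tool costs for one pipeline.

    bigquery_usage: estimated BigQuery spend in $/month (varies with data
    scanned and stored; can be $0 on the free tier).
    """
    cost = 0.0
    if use_stitch:
        cost += STITCH_BASE
    if use_supermetrics:
        cost += SUPERMETRICS
    # Google Sheets, Apps Script, and dbt itself are free.
    cost += max(0, dbt_cloud_seats - 1) * DBT_CLOUD_PER_SEAT
    cost += bigquery_usage
    return cost

# A small pipeline entirely on free tiers:
print(monthly_tool_cost(False, False, 1, 0))   # 0.0
# Both connectors, 2 dbt cloud seats, ~$200/month of BigQuery usage:
print(monthly_tool_cost(True, True, 2, 200))   # 439.0
```

The second example lands in the $400-500 range we typically see for client pipelines, as noted below.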

So the bottom line is: the cost of your pipeline depends on the size of your pipeline. If you're pulling + storing a small amount of data, it may be completely free.

If you're pulling a large amount of data, it will cost more. The raw tool cost for most of our data pipeline clients is around $400-500 per month.