Getting Started with Snowflake: A Pizza Challenge

Recently, I found an interesting guided project at Coursera: ‘Snowflake for Beginners’. The guided project gives a better understanding of how Snowflake features work. Throughout the course, I focus on the Rare Burger restaurant compared to its peers in New York City. First of all, I retrieve the data from the ‘Marketplace’. After, I create a data warehouse, and I explore the data by using SQL. Lastly, I make a dashboard with Snowflake. After following the course, I have some time left with my free trial account. So, let’s practice some more with Snowflake!

In 2018, I visited New York City. I had Pizza at Joe’s Pizza located at Times Square. Joe’s Pizza is a fast-food restaurant where you can buy slices of pizza, either to take away or eat at the restaurant. When you decide to eat at the restaurant, you can eat your pizza at a cozy place with pictures of celebrities on the wall! As in the guided project, I create a data warehouse with the New York City restaurant data. After, I create a nice dashboard with Snowflake, focusing on Pizza restaurants in New York City.

Retrieving Restaurant Data from the Snowflake Marketplace

At the Snowflake Marketplace, lots of datasets are available. I search for the topic ‘restaurants’ in the search bar. Then, I select some criteria: I would like to see all restaurant data which is available for free and instantly accessible. I choose the sample dataset that is provided by SafeGraph. I click on the tile to open more information on the dataset. The dataset ‘Cross Shopping Insights – NYC Restaurants’ has free data available on restaurants in New York. It contains customer spending insights for restaurants in New York City. Interesting questions that arise when analyzing this data include:

  • What is the best location to open a new pizza restaurant, based on the address and postal code?
  • What location attracts the most customers?
  • How much do customers spend on average?
  • What is the median spend per customer?

Restaurant Data

After getting the data from the ‘Marketplace’, the sample database is visible in the data menu. The data menu contains all databases requested from the ‘Marketplace’. You can create your own database as well. Let’s focus on creating a database from the sample restaurant data that is available in the ‘Marketplace’. Each database has an information schema and a public schema (also: data feeds). The information schema provides access to the metadata; this is the system catalog or data dictionary. On the other hand, the public schema contains the data tables with actual information. I create a new database by clicking “+ Database” in the right corner of the screen. I name the newly created database ‘Pizza Challenge‘.

After adding data to the database, there is an option to preview the data. The data preview shows the first rows of the data table. Each database can have multiple tables. I have created 2 tables within the pizza challenge database. I primarily focus on the monthly revenue table.  You can also view the table details and the columns. The table details include the variables and variable types.

Worksheets

A worksheet in Snowflake enables users to run SQL queries or Python code as well as perform data loading and manipulation tasks. I choose to run a SQL query. First, I would like to add 2 tables to the Pizza Challenge database. I create the tables by using the following SQL query:

USE ROLE ACCOUNTADMIN;
CREATE DATABASE IF NOT EXISTS PIZZACHALLENGE;
USE DATABASE PIZZACHALLENGE;
USE SCHEMA PUBLIC;

CREATE OR REPLACE TABLE MONTHLY_REVENUE (
  "LOCATION_NAME" varchar(50),
  "REGION" varchar2(50),
  "POSTAL_CODE" varchar2(255),
  "STREET_ADDRESS" varchar2(255) ,
  "RAW_TOTAL_SPEND" FLOAT ,
  "RAW_NUM_TRANSACTIONS" number,
  "RAW_NUM_CUSTOMERS" number,
  "ONLINE_SPEND" FLOAT ,
  "MEDIAN_SPEND_PER_TRANSACTION" FLOAT ,
  "MEDIAN_SPEND_PER_CUSTOMER" FLOAT
);


CREATE OR REPLACE TABLE DAILY_SALES (
	BRAND VARCHAR(50),
	REGION VARCHAR(50),
	DATE DATE,
	SALES FLOAT
);

After, I add the data to the database. I use the dataset on restaurants in New York City that I retrieved from the ‘Marketplace‘. Now, I only include the pizza restaurants in New York City. After adding the data to the table, the data is visible within the tables in the database.

select LOCATION_NAME, REGION, POSTAL_CODE, STREET_ADDRESS, RAW_TOTAL_SPEND, RAW_NUM_TRANSACTIONS, RAW_NUM_CUSTOMERS, ONLINE_SPEND, MEDIAN_SPEND_PER_TRANSACTION, MEDIAN_SPEND_PER_CUSTOMER from MONTHLY_REVENUE

UNION

select LOCATION_NAME, REGION, POSTAL_CODE, STREET_ADDRESS, RAW_TOTAL_SPEND, RAW_NUM_TRANSACTIONS, RAW_NUM_CUSTOMERS, ONLINE_SPEND, MEDIAN_SPEND_PER_TRANSACTION, MEDIAN_SPEND_PER_CUSTOMER from FREE_SAMPLE_CROSS_SHOPPING_INSIGHTS__NYC_RESTAURANTS.PUBLIC.SPEND_CROSS_SHOPPING_SAMPLE 
where CATEGORY_TAGS like '%Pizza%';

Creating a Data Warehouse

Next, I create a virtual warehouse in Snowflake. Warehouses are required to run queries and perform data manipulation activities, such as loading data into tables and altering data, i.e. updating or removing rows and/ or columns. Compared to warehouses, a database in the Snowflake data platform is a virtual cloud to store data. In the trial account, there has been created a warehouse for you named “Compute_WH”.

I create a new warehouse, and I name it “PizzaChallenge_WH”. I choose the size of the warehouse. A larger warehouse costs more credits. Therefore, I pick the small warehouse which is 2 credits/ hour. In the advanced options, you can select the warehouse type, auto-resume, auto-suspend, and the number of minutes until the warehouse suspends.

Classic Console

After creating a new data warehouse, the warehouse must be filled with data. One way to do this is to switch to the classic console and load the data into the database in the classic console. I load the data from a CSV file into the table. Now, a new warehouse has been created. Let’s go back to Snowsight and create a nice dashboard.

Snowflake Dashboard

Let’s create a Snowflake Dashboard. Creating a dashboard is quite simple. Click on ‘Dashboard’ in the menu on the left and then click on “+ dashboard” in the right corner on the top. Give the dashboard a name. I name the dashboard “NYC-Pizza”. Add a new tile to get started.

Now, let’s add some graphs. I start with exploring the data. When you create a new tile to get started, a new worksheet opens. I use SQL to do some analysis. The dataset includes the following columns: location, postal code, street address, online spending, number of customers, median spend per customer, median spend per transaction, total spending, and the number of transactions.

Let’s focus on online spending and the number of customers first. I filter the data on street addresses. New York is a huge city, and therefore, I only focus on 2 famous streets in New York: Lexington Avenue and Broadway.

Online Spending

First of all, I create a bar plot with the top 10 pizza restaurants with the largest online spending. I write the query, and I run the query to get the results (also: the data I want to display in the graph). After running the query, I click on ‘chart’ to create a graph. You can select the chart type in the menu on the right. Snowflake has 5 types of charts: line charts, bar charts, scatterplots, heatgrids, and scorecards. I select a bar chart to display the results. Also, I select some criteria for appearance: the orientation of the bar chart is horizontal and I limit the number of bars to 10.

select LOCATION_NAME, ONLINE_SPEND from MONTHLY_REVENUE 
where STREET_ADDRESS like '%Broadway%' or STREET_ADDRESS like '%Lexington%'
order by ONLINE_SPEND desc ;

Marinara Pizza has the most online spending by customers. Joe’s Pizza has the 8th largest online spending.

Number of Customers

I duplicate the tile in the dashboard. This time, I add the total number of customers to the bar chart. The chart shows that Joe’s Pizza has the largest number of customers in the area.

select LOCATION_NAME, STREET_ADDRESS, RAW_NUM_CUSTOMERS from MONTHLY_REVENUE 
where STREET_ADDRESS like '%Broadway%' or STREET_ADDRESS like '%Lexington%'
order by RAW_NUM_CUSTOMERS desc;
Number of Customers Compared to Competition

Also, I compare the number of customers by location vs competitors. Joe’s Pizza is doing really well compared to its competitors.

select LOCATION_NAME, STREET_ADDRESS, RAW_NUM_CUSTOMERS, POSTAL_CODE from MONTHLY_REVENUE 
where STREET_ADDRESS like '%Broadway%' or STREET_ADDRESS like '%Lexington%';
select LOCATION_NAME, STREET_ADDRESS, RAW_NUM_CUSTOMERS, POSTAL_CODE from MONTHLY_REVENUE where POSTAL_CODE > 10002;
Heatgrids and Scorecards

Aside from bar charts, you can create heatgrids and scorecards as well. A scorecard is a good indicator of whether a goal is on track or not. Within companies, you can quickly focus on objects that are falling behind.

I create 3 headgrids, containing the online spending, the median spend per customer, and the total number of customers, including each location of Joe’s Pizza. Joe’s Pizza has 4 restaurants in New York City. Joe’s Pizza located at 1435 Broadway has the largest number of customers compared to the other Joe’s Pizza restaurant locations. Also, I create scorecards including the average online and total spend per customer of all locations.

select * from MONTHLY_REVENUE where LOCATION_NAME like '%Joe''s Pizza%';
select LOCATION_NAME, RAW_TOTAL_SPEND, RAW_NUM_CUSTOMERS, RAW_TOTAL_SPEND / RAW_NUM_CUSTOMERS as AVERAGE_SPEND from MONTHLY_REVENUE where LOCATION_NAME like '%Joe''s Pizza%';
select LOCATION_NAME, ONLINE_SPEND, RAW_NUM_CUSTOMERS, ONLINE_SPEND / RAW_NUM_CUSTOMERS as AVERAGE_SPEND_ONLINE from MONTHLY_REVENUE where LOCATION_NAME like '%Joe''s Pizza%';
Snowflake Dashboard: Pizza Restaurants in New York City

The Snowflake dashboard is complete. This is how you build a dashboard with Snowflake.

Connect Snowflake with Power Bi

Another option is to connect Snowflake with Power BI Desktop. Most companies store their data in data warehouses. Power BI is a powerful tool to create reports to gain insightful business information. I got curious and I open my Power BI Desktop.

You can load multiple sources of data into Power BI. I choose to load the Snowflake data into Power BI. Therefore, I copy the link from the classic console and put it in the box below ‘server’ in Power BI. The warehouse is the pizza challenge data warehouse I created in Snowflake. I pick the monthly revenue table from the pizza database to load into a Power BI table.