Learn more
How To

Enhancing Data Analysis with Google BigQuery's AI/ML Functions

Google BigQuery is a powerful, fully-managed data warehouse that now integrates advanced ML and AI functionalities directly into its platform. This blog post provides an overview of the key AI/ML functions in BigQuery, enabling analysts to build, train, and deploy models using SQL commands, making sophisticated data analysis more accessible. Discover how BigQuery ML simplifies machine learning workflows and elevates data analysis capabilities.

Britton Stamper
July 22, 2024
Enhancing Data Analysis with Google BigQuery's AI/ML Functions
Table of Contents

Google BigQuery is a powerful, fully-managed data warehouse that enables fast SQL queries and interactive analysis of large datasets. To further empower data analysts, Google has integrated advanced machine learning (ML) and artificial intelligence (AI) functionalities directly into BigQuery. These tools allow analysts to build, train, and deploy ML models using simple SQL commands, making sophisticated data analysis more accessible. This blog post provides an overview of the key AI/ML functions in BigQuery that analysts should be aware of to elevate their work.

BigQuery ML: Overview

BigQuery ML (BQML) allows users to create and execute machine learning models in BigQuery using standard SQL queries. This integration simplifies the process of adding machine learning capabilities to your data analysis workflows without needing to move data or learn new programming languages.

Key AI/ML Functions in BigQuery

1. Creating Models with CREATE MODEL

Use-Case: Building and training machine learning models directly in BigQuery.

Description: The CREATE MODEL statement in BigQuery ML is used to create various types of machine learning models, such as linear regression, logistic regression, k-means clustering, and more. This command simplifies the model-building process by allowing users to define and train models using SQL.

Example Implementation:

-- Create a linear regression model to predict sales
CREATE OR REPLACE MODEL my_dataset.sales_model
OPTIONS(model_type='linear_reg') AS
SELECT
    total_spent,
    num_items,
    customer_age,
    customer_income,
    sales
FROM
    my_dataset.sales_data;

2. Training Models with ML.TRAINING_INFO

Use-Case: Monitoring the training process of machine learning models.

Description: The ML.TRAINING_INFO function provides detailed information about the training process of a machine learning model, including metrics such as loss, accuracy, and other relevant parameters.

Example Implementation:

-- Retrieve training information for the sales model
SELECT
    iteration,
    loss,
    evaluation_loss
FROM
    ML.TRAINING_INFO(MODEL my_dataset.sales_model);

3. Evaluating Models with ML.EVALUATE

Use-Case: Evaluating the performance of trained machine learning models.

Description: The ML.EVALUATE function computes evaluation metrics for a trained machine learning model, such as mean absolute error (MAE), mean squared error (MSE), and R2 score for regression models, and precision, recall, and accuracy for classification models.

Example Implementation:

-- Evaluate the sales model
SELECT
    *
FROM
    ML.EVALUATE(MODEL my_dataset.sales_model, (
        SELECT
            total_spent,
            num_items,
            customer_age,
            customer_income,
            sales
        FROM
            my_dataset.sales_data
    ));

4. Making Predictions with ML.PREDICT

Use-Case: Generating predictions using trained machine learning models.

Description: The ML.PREDICT function applies a trained machine learning model to new data, generating predictions based on the model's learned patterns.

Example Implementation:

-- Make predictions using the sales model
SELECT
    total_spent,
    num_items,
    customer_age,
    customer_income,
    predicted_sales
FROM
    ML.PREDICT(MODEL my_dataset.sales_model, (
        SELECT
            total_spent,
            num_items,
            customer_age,
            customer_income
        FROM
            my_dataset.new_sales_data
    ));

5. Hyperparameter Tuning with ML.TUNE

Use-Case: Optimizing hyperparameters to improve model performance.

Description: The ML.TUNE function helps in tuning hyperparameters of machine learning models, allowing users to find the best set of parameters that enhance model performance.

Example Implementation:

-- Hyperparameter tuning for the sales model
CREATE OR REPLACE MODEL my_dataset.tuned_sales_model
OPTIONS(
    model_type='linear_reg',
    max_iterations=100,
    learn_rate=0.1,
    l2_reg=0.01,
    early_stop=True
) AS
SELECT
    total_spent,
    num_items,
    customer_age,
    customer_income,
    sales
FROM
    my_dataset.sales_data;

6. Clustering with CREATE MODEL (k-means)

Use-Case: Segmenting data into clusters based on feature similarity.

Description: Using k-means clustering in BigQuery ML helps identify distinct groups within the data, making it useful for customer segmentation, anomaly detection, and other clustering tasks.

Example Implementation:

-- Create a k-means clustering model
CREATE OR REPLACE MODEL my_dataset.customer_segments
OPTIONS(
    model_type='kmeans',
    num_clusters=3
) AS
SELECT
    total_spent,
    num_items,
    customer_age,
    customer_income
FROM
    my_dataset.customer_data;

7. Time Series Forecasting with ARIMA_PLUS

Use-Case: Forecasting future values based on historical time series data.

Description: The ARIMA_PLUS model in BigQuery ML is used for time series forecasting, which can be applied to predict sales, stock levels, demand, and other time-dependent variables.

Example Implementation:

-- Create an ARIMA_PLUS model for sales forecasting
CREATE OR REPLACE MODEL my_dataset.sales_forecast
OPTIONS(model_type='arima_plus') AS
SELECT
    date,
    sales
FROM
    my_dataset.sales_data;

8. Deep Learning with TensorFlow Models

Use-Case: Integrating TensorFlow models with BigQuery for advanced deep learning tasks.

Description: BigQuery ML supports importing TensorFlow models, enabling users to leverage pre-trained models or custom deep learning models within BigQuery.

Example Implementation:

-- Import and use a TensorFlow model in BigQuery
CREATE OR REPLACE MODEL my_dataset.tf_model
OPTIONS(
    model_type='tensorflow',
    model_path='gs://my_bucket/tf_model'
) AS
SELECT
    *
FROM
    my_dataset.input_data;

-- Make predictions with the TensorFlow model
SELECT
    *
FROM
    ML.PREDICT(MODEL my_dataset.tf_model, (
        SELECT
            *
        FROM
            my_dataset.new_data
    ));

Conclusion

Google BigQuery's integration of AI and ML functionalities provides a powerful toolkit for data analysts. From creating and evaluating models to making predictions and optimizing hyperparameters, BigQuery ML simplifies the machine learning workflow by allowing analysts to use SQL for advanced data analysis tasks. By leveraging these functions, analysts can unlock deeper insights from their data, streamline their workflows, and drive more informed decision-making processes. Whether you're forecasting sales, segmenting customers, or applying deep learning models, BigQuery ML offers the tools you need to elevate your data analysis capabilities.

We're here to help!

Get the Semantic Layer Guide!

Everything that a data leader needs to understand and deploy metrics at scale

Download The Full Guide

Core Semantic Layer Concepts

Benefits and ROI

Implementation Steps

Get started with the next generation of data applications

Create an account to connect your business and elevate how your operate.

ABOUT THE AUTHOR
Britton Stamper

Britton is the CTO of Push.ai and oversees Product, Design, and Engineering. He's been a passionate builder, analyst and designer who loves all things data products and growth. You can find him reading books at a coffee shop or finding winning strategies in board games and board rooms.

Enjoyed this read?

Stay up to date with the latest product updates and insights sent straight to your inbox!

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.