The landscape of data analytics is rapidly evolving, with new tools and capabilities continuously being developed to enhance what analytics teams can do. Snowflake Cortex has introduced several machine learning (ML) and artificial intelligence (AI) features over the past couple of years, making it easier for teams to implement and scale their AI/ML workflows directly within the Snowflake environment. This blog post explores these new capabilities and how they can be leveraged for various use cases.
Creating Custom Models
To use custom models with Snowflake, you typically follow these steps:
1. Data Preparation
Collect and prepare your dataset. Ensure it is cleaned and formatted correctly. Store your dataset in a Snowflake table to make it accessible for the next steps.
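As a minimal sketch, one way to land a cleaned dataset in a Snowflake table is the write_pandas helper from snowflake-connector-python; the connection parameters, file name, and table name below are placeholders, and auto_create_table requires a reasonably recent connector version:
import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas
# Placeholder connection parameters -- replace with your own account details
conn = snowflake.connector.connect(
    account='your_account', user='your_user', password='your_password',
    warehouse='your_warehouse', database='your_database', schema='your_schema'
)
# Load and clean the raw data with pandas before loading it into Snowflake
emails = pd.read_csv('raw_emails.csv')
emails = emails.dropna(subset=['text', 'label'])
# Write the cleaned dataset to a Snowflake table (EMAIL_DATA is an illustrative name)
write_pandas(conn, emails, 'EMAIL_DATA', auto_create_table=True)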
2. Model Training
Export your dataset from Snowflake to a machine learning environment such as Jupyter Notebooks, AWS SageMaker, Google Colab, or any other platform that supports Python and machine learning libraries like TensorFlow, PyTorch, or scikit-learn. Train your machine learning model using this environment.
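For example, a hedged sketch of pulling the prepared table into a pandas DataFrame with the Snowflake Python connector (the table name carries over from the data preparation sketch above, and fetch_pandas_all requires pyarrow):
import snowflake.connector
# Reuse the same placeholder connection parameters as in the data preparation step
conn = snowflake.connector.connect(
    account='your_account', user='your_user', password='your_password',
    warehouse='your_warehouse', database='your_database', schema='your_schema'
)
# Pull the prepared dataset into a pandas DataFrame
cursor = conn.cursor()
cursor.execute('SELECT * FROM EMAIL_DATA')
email_data = cursor.fetch_pandas_all()
# Normalize column names to the lowercase names used during training
email_data.columns = [c.lower() for c in email_data.columns]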
3. Model Export
Once your model is trained, save it in a format that can be easily loaded by your deployment target. Common choices include ONNX, PMML, or any other serialized model format that is compatible with the environment you plan to deploy to.
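As one option, a scikit-learn model can be converted to ONNX with the skl2onnx package; this is only a sketch, and trained_model and n_features are placeholders for your own model and its number of input columns:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
# Describe the model's input: a batch of rows with n_features float columns
onnx_model = convert_sklearn(trained_model, initial_types=[('input', FloatTensorType([None, n_features]))])
# Serialize the converted model to disk
with open('model.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())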
4. Model Deployment
Deploy your model to an environment accessible by Snowflake, such as an external REST API, AWS Lambda, Google Cloud Function, or an Azure Function. This deployment allows Snowflake to call the model for inference, enabling integration with Snowflake’s data processing capabilities.
Using Custom Models with Snowflake Cortex
Functionality: ML_CLASSIFY
Use-Case: Classifying emails as spam or not spam.
Implementation Steps:
1. Model Training
Export email data from Snowflake to your machine learning environment. Train a model using a library like scikit-learn:
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
import joblib
# Load your dataset
email_data = load_email_data() # Assume this function loads data from Snowflake
# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(email_data['text'], email_data['label'], test_size=0.2)
# Create a pipeline that vectorizes the data then applies Naive Bayes
model = make_pipeline(CountVectorizer(), MultinomialNB())
# Train the model
model.fit(X_train, y_train)
# Save the model
joblib.dump(model, 'email_classification_model.pkl')
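Before exporting, it can be worth a quick sanity check of the pipeline against the held-out split:
# Evaluate the trained pipeline on the held-out test set
accuracy = model.score(X_test, y_test)
print(f'Held-out accuracy: {accuracy:.3f}')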
2. Model Deployment
Deploy the saved model (`email_classification_model.pkl`) to an environment accessible by Snowflake. For example, you could deploy it as an AWS Lambda function behind Amazon API Gateway, which acts as the proxy service that Snowflake external functions call:
import json
import joblib
# Load the model once, outside the handler, so it is reused across invocations
model = joblib.load('email_classification_model.pkl')
def lambda_handler(event, context):
    # Snowflake external functions POST a batch of rows as {"data": [[row_number, email_text], ...]}
    rows = json.loads(event['body'])['data']
    # Classify the whole batch at once, then echo each row number back with its prediction
    predictions = model.predict([row[1] for row in rows])
    results = [[row[0], str(pred)] for row, pred in zip(rows, predictions)]
    # Return the predictions in the shape Snowflake expects: {"data": [[row_number, result], ...]}
    return {
        'statusCode': 200,
        'body': json.dumps({'data': results})
    }
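A quick way to sanity-check the handler locally is to invoke it with a payload shaped the way Snowflake sends it; the example texts are made up:
# Simulate a Snowflake external function request with two rows
sample_event = {
    'body': json.dumps({'data': [[0, 'Win a free prize now!!!'], [1, 'Agenda for tomorrow attached']]})
}
print(lambda_handler(sample_event, None))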
3. Using the Model in Snowflake
Use Snowflake's external function feature to call the deployed model. External functions require an API integration object that authorizes Snowflake to call the proxy endpoint (Amazon API Gateway in this example); the integration name and URL below are placeholders.
CREATE OR REPLACE EXTERNAL FUNCTION classify_email(email_text STRING)
RETURNS STRING
RETURNS NULL ON NULL INPUT
COMMENT = 'Classify email as spam or not spam using a custom model deployed on AWS Lambda'
API_INTEGRATION = my_api_integration
AS 'https://your_lambda_function_url.amazonaws.com/dev/classify_email';
-- Use the function in a query
SELECT
email_id,
classify_email(email_content) AS classification
FROM
email_data;
Functionality: ML_INFER
Use-Case: Running inference using custom pre-trained models for predictive analytics.
Implementation Steps:
1. Model Training
Similar to the email classification example, train your model in a suitable environment and save it in an appropriate format.
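For instance, a simple regression model with two numeric features could be trained and saved as follows; training_data and its column names are placeholder assumptions chosen to line up with the predict_outcome function defined below:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
import joblib
# Placeholder: assume training_data is a DataFrame with two numeric features and a numeric target
X = training_data[['feature1', 'feature2']]
y = training_data['outcome']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train a simple regression model as an illustrative stand-in for any custom model
model = LinearRegression()
model.fit(X_train, y_train)
# Save the model for deployment
joblib.dump(model, 'outcome_prediction_model.pkl')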
2. Model Deployment
Deploy your model in a way that Snowflake can access, such as an AWS Lambda function behind API Gateway or another RESTful service fronted by a supported proxy.
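A Lambda handler for this model can follow the same pattern as the email classifier, returning one float prediction per input row; the model file name matches the training sketch above:
import json
import joblib
# Load the regression model saved in the training step
model = joblib.load('outcome_prediction_model.pkl')
def lambda_handler(event, context):
    # Snowflake sends {"data": [[row_number, feature1, feature2], ...]}
    rows = json.loads(event['body'])['data']
    # Predict the whole batch at once and pair each prediction with its row number
    predictions = model.predict([[row[1], row[2]] for row in rows])
    results = [[row[0], float(pred)] for row, pred in zip(rows, predictions)]
    return {
        'statusCode': 200,
        'body': json.dumps({'data': results})
    }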
3. Using the Model in Snowflake
Call the external function from Snowflake to get predictions.
CREATE OR REPLACE EXTERNAL FUNCTION predict_outcome(feature1 FLOAT, feature2 FLOAT)
RETURNS FLOAT
RETURNS NULL ON NULL INPUT
COMMENT = 'Predict outcome using a custom model deployed on AWS Lambda'
-- Assumes an API integration like the one created for the email classification example
API_INTEGRATION = my_api_integration
AS 'https://your_lambda_function_url.amazonaws.com/dev/predict_outcome';
-- Use the function in a query
SELECT
data_point_id,
predict_outcome(feature1, feature2) AS prediction
FROM
new_data;
Conclusion
By leveraging custom models with Snowflake Cortex, teams can utilize powerful machine learning models directly within their Snowflake environment. This integration enables advanced analytics capabilities, such as classification and predictive modeling, using custom-trained models. The ability to call these models via external functions allows for seamless incorporation into existing data workflows, providing robust and scalable solutions for various business needs.
For more detailed information on setting up and using external functions with Snowflake, refer to the Snowflake Documentation.