
Using LLMs with Python, Jupyter and PostgreSQL

Leverage the capabilities of large language models (LLMs) directly within PostgreSQL workflows by integrating with external APIs or using pre-trained models via the transformers library. This approach enables advanced NLP tasks like text summarization and sentiment analysis. This blog post provides an overview and examples of how to implement these integrations to enhance your PostgreSQL-based data analysis workflows.

Britton Stamper
July 22, 2024

To leverage the capabilities of large language models (LLMs) directly within your PostgreSQL workflows, you can integrate with external APIs or use pre-trained models via the transformers library. Here's how you can apply this approach:

Local Text Summarization with transformers

What is the transformers Library?

The transformers library is an open-source Python library developed by Hugging Face. It provides a unified API for using pre-trained transformer models, such as BERT, GPT-2, and T5, for a range of natural language processing (NLP) tasks, including text summarization, sentiment analysis, and translation. The library supports both TensorFlow and PyTorch, making it flexible for different machine learning environments.
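As a quick taste of that unified API, here is a minimal sketch (the default model is downloaded on the first run, and the exact score will vary with the model version):

from transformers import pipeline

# Create a pipeline for a task; a sensible default pre-trained model is
# downloaded on first use. Pass model=... to pin a specific checkpoint.
classifier = pipeline("sentiment-analysis")
print(classifier("PostgreSQL and transformers work well together."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]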

Use Case: Summarizing Customer Feedback

This example demonstrates how to extract customer feedback data from PostgreSQL, summarize the text using a transformer model, and store the summarized text back into PostgreSQL.

Steps:

1. Install the necessary libraries:

pip install psycopg2-binary transformers

2. Python Script to Summarize Text:

import psycopg2
from transformers import pipeline

# Connect to PostgreSQL
conn = psycopg2.connect(
    dbname="your_db",
    user="your_user",
    password="your_password",
    host="your_host",
    port="your_port"
)
cur = conn.cursor()

# Extract data
cur.execute("SELECT feedback_id, feedback_text FROM customer_feedback")
rows = cur.fetchall()

# Summarize text using Hugging Face transformers
summarizer = pipeline("summarization")
summarized_feedback = []

for row in rows:
    feedback_id, feedback_text = row
    summary = summarizer(feedback_text, max_length=50, min_length=25, do_sample=False)[0]['summary_text']
    summarized_feedback.append((summary, feedback_id))

# Insert summarized text back into PostgreSQL
cur.executemany("UPDATE customer_feedback SET feedback_summary = %s WHERE feedback_id = %s", summarized_feedback)
conn.commit()

# Close connection
cur.close()
conn.close()
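Note that the UPDATE assumes customer_feedback already has a feedback_summary column. If your table doesn't have one yet (an assumption about your schema), you can add it once before running the script:

import psycopg2

# One-time schema change: add a text column to hold the summaries.
conn = psycopg2.connect(
    dbname="your_db", user="your_user", password="your_password",
    host="your_host", port="your_port"
)
cur = conn.cursor()
cur.execute("ALTER TABLE customer_feedback ADD COLUMN IF NOT EXISTS feedback_summary TEXT")
conn.commit()
cur.close()
conn.close()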

Example: Sentiment Analysis with transformers

Use Case: Sentiment Analysis on Customer Reviews

This example shows how to use a pre-trained sentiment analysis model to analyze customer reviews and update the results in PostgreSQL.

Steps:

1. Install the necessary libraries:

pip install psycopg2-binary transformers

2. Python Script for Sentiment Analysis:

import psycopg2
from transformers import pipeline

# Connect to PostgreSQL
conn = psycopg2.connect(
    dbname="your_db",
    user="your_user",
    password="your_password",
    host="your_host",
    port="your_port"
)
cur = conn.cursor()

# Extract data
cur.execute("SELECT review_id, review_text FROM customer_reviews")
rows = cur.fetchall()

# Perform sentiment analysis using Hugging Face transformers
sentiment_analyzer = pipeline("sentiment-analysis")
sentiments = []

for row in rows:
    review_id, review_text = row
    sentiment = sentiment_analyzer(review_text)[0]
    sentiments.append((sentiment['label'], sentiment['score'], review_id))

# Insert sentiment analysis results back into PostgreSQL
cur.executemany(
    "UPDATE customer_reviews SET sentiment_label = %s, sentiment_score = %s WHERE review_id = %s", 
    sentiments
)
conn.commit()

# Close connection
cur.close()
conn.close()
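Calling the pipeline once per row works but is slow on larger tables. The pipeline also accepts a list of strings, so a batched variant could replace the loop above (a sketch; truncation=True guards against reviews longer than the model's maximum input length):

# Batched scoring: one pipeline call over all texts instead of one per row.
texts = [review_text for _, review_text in rows]
results = sentiment_analyzer(texts, truncation=True)
sentiments = [
    (result["label"], result["score"], review_id)
    for (review_id, _), result in zip(rows, results)
]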

Using the OpenAI API

1. Install the necessary libraries: You will need the psycopg2 library to connect to PostgreSQL and the requests library to interact with the OpenAI API. Install these using pip:

pip install psycopg2-binary requests

2. Set up your OpenAI API key: Ensure you have an OpenAI API key; you can create one on the OpenAI website. One common way to keep it out of your code is shown below.
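A minimal sketch (our convention, not something the API requires) is to export the key as an environment variable and read it in Python:

import os

# Read the key from the environment so it never appears in source control.
# Assumes you've run: export OPENAI_API_KEY="sk-..."
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]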

Example Code

Step 1: Extract Data from PostgreSQL

First, extract data from your PostgreSQL database. In this example, we’ll assume you have a table named text_data with columns id and text.

import psycopg2

def fetch_data():
    conn = psycopg2.connect(
        dbname="your_db",
        user="your_user",
        password="your_password",
        host="your_host",
        port="your_port"
    )
    cur = conn.cursor()
    cur.execute("SELECT id, text FROM text_data")
    rows = cur.fetchall()
    cur.close()
    conn.close()
    return rows
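One thing to note: fetchall() loads every row into memory at once. For large tables, an alternative sketch uses psycopg2's named (server-side) cursors, which stream rows from PostgreSQL in batches:

import psycopg2

# Streaming variant: a named cursor keeps the result set on the server
# and fetches rows in batches of itersize as you iterate.
def fetch_data_streaming(batch_size=500):
    conn = psycopg2.connect(
        dbname="your_db", user="your_user", password="your_password",
        host="your_host", port="your_port"
    )
    cur = conn.cursor(name="text_data_cursor")
    cur.itersize = batch_size
    cur.execute("SELECT id, text FROM text_data")
    for row in cur:
        yield row
    cur.close()
    conn.close()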

Step 2: Send Request to OpenAI API

Next, send the extracted data to the OpenAI API and get the response.

import os

import requests

def query_openai_api(text):
    # The chat completions endpoint replaces the older, deprecated
    # engines/completions endpoints.
    url = "https://api.openai.com/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"
    }
    data = {
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": text}],
        "max_tokens": 50
    }
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
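The API returns HTTP 429 when you exceed your rate limit, so a retry wrapper is worth having for longer runs. A minimal sketch with exponential backoff, wrapping the query_openai_api function above:

import time

import requests

def query_with_retry(text, attempts=5):
    # Retry on rate limiting (HTTP 429) with exponential backoff: 1s, 2s, 4s, ...
    for attempt in range(attempts):
        try:
            return query_openai_api(text)
        except requests.HTTPError as exc:
            if exc.response is not None and exc.response.status_code == 429 and attempt < attempts - 1:
                time.sleep(2 ** attempt)
            else:
                raise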

Step 3: Store Response Back into PostgreSQL

Finally, store the response from OpenAI back into PostgreSQL.

import psycopg2

def store_response(id, response):
    conn = psycopg2.connect(
        dbname="your_db",
        user="your_user",
        password="your_password",
        host="your_host",
        port="your_port"
    )
    cur = conn.cursor()
    cur.execute("UPDATE text_data SET response = %s WHERE id = %s", (response, id))
    conn.commit()
    cur.close()
    conn.close()

Main Script

Combine the above functions in the main script to process the data.

if __name__ == "__main__":
    # Step 1: Fetch data from PostgreSQL
    rows = fetch_data()

    # Step 2: Process each row and send a request to OpenAI API
    for row in rows:
        id, text = row
        response = query_openai_api(text)

        # Step 3: Store the response back into PostgreSQL
        store_response(id, response)
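Because store_response opens and closes a connection for every row, a large table will spend much of its time reconnecting. A variant sketch that reuses a single connection for the whole run, committing once at the end:

import psycopg2

# Single-connection variant: fetch, call the API, and update over one
# connection. Uses query_openai_api from Step 2.
if __name__ == "__main__":
    conn = psycopg2.connect(
        dbname="your_db", user="your_user", password="your_password",
        host="your_host", port="your_port"
    )
    cur = conn.cursor()
    cur.execute("SELECT id, text FROM text_data")
    for row_id, text in cur.fetchall():
        cur.execute(
            "UPDATE text_data SET response = %s WHERE id = %s",
            (query_openai_api(text), row_id),
        )
    conn.commit()
    cur.close()
    conn.close()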

Explanation

  1. fetch_data(): Connects to PostgreSQL and fetches data from the text_data table.
  2. query_openai_api(): Sends the text to the OpenAI API and returns the generated response.
  3. store_response(): Stores the OpenAI API response back into PostgreSQL.
  4. Main Script: Fetches data, processes each row by sending it to the OpenAI API, and stores the responses back in the database.

Conclusion

Using LLM APIs and the transformers library with PostgreSQL allows you to perform advanced NLP tasks such as text summarization and sentiment analysis directly within your existing data workflows. This integration leverages the strengths of powerful transformer models to enhance the capabilities of your PostgreSQL database, enabling more sophisticated data analysis and insights.


ABOUT THE AUTHOR
Britton Stamper

Britton is the CTO of Push.ai and oversees Product, Design, and Engineering. He's been a passionate builder, analyst and designer who loves all things data products and growth. You can find him reading books at a coffee shop or finding winning strategies in board games and board rooms.
