Integrate with Azure Blob Storage

This guide describes how to integrate Azure Blob Storage with your Charmed Kubeflow deployment.

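Requirements

To follow this guide, you need a Charmed Kubeflow deployment with access to the Kubeflow dashboard, and an Azure account with an existing storage account.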

Assign permissions

Before starting to work with Azure Blob Storage, ensure that you have the necessary permissions to access the storage account. Follow this guide to self-assign the Storage Blob Data Contributor role.
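If your account is allowed to create role assignments, the role can also be assigned with the Azure CLI once you are signed in with az login (covered later in this guide). The sketch below only builds and prints an az role assignment create command so you can review it before running it in a terminal. The subscription ID, resource group, storage account name and assignee are hypothetical placeholders, and the helper names are illustrative, not part of any Azure library.

```python
import shlex

# Hypothetical placeholders: replace them with your own values
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
STORAGE_ACCOUNT = "<storageaccountname>"
ASSIGNEE = "<user@example.com>"


def storage_account_scope(subscription_id: str, resource_group: str, account: str) -> str:
    """Build the Azure resource ID of a storage account, used as the assignment scope."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def role_assignment_command(assignee: str, scope: str) -> list[str]:
    """Return an `az role assignment create` invocation as an argument list."""
    return [
        "az", "role", "assignment", "create",
        "--role", "Storage Blob Data Contributor",
        "--assignee", assignee,
        "--scope", scope,
    ]


# Print a shell-quoted command for review; nothing is executed here
scope = storage_account_scope(SUBSCRIPTION_ID, RESOURCE_GROUP, STORAGE_ACCOUNT)
print(shlex.join(role_assignment_command(ASSIGNEE, scope)))
```

Scoping the assignment to the storage account, rather than to the whole subscription or resource group, keeps the granted access as narrow as this guide needs.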

Create and connect to a new notebook

From the Kubeflow dashboard, go to Notebooks and click New Notebook. Select the JupyterLab environment, then connect to the newly created notebook.

Install required packages

On the JupyterLab launcher, click Terminal to start a new terminal session. Then install the Azure CLI and the Azure Python client libraries needed to sign in to your Azure account and work with Blob Storage:

pip install azure-cli azure-storage-blob azure-identity

The installation may take a few minutes to complete.
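Once the installation finishes, a quick way to confirm it from a notebook cell is to check that the client libraries can be imported and that the az command is on the PATH. This is a minimal sketch using only the standard library; check_modules is a hypothetical helper name, and the module names assume the import paths used later in this guide.

```python
import importlib.util
import shutil

# Import paths of the Python client libraries installed above
REQUIRED_MODULES = ["azure.storage.blob", "azure.identity"]


def check_modules(names: list[str]) -> dict[str, bool]:
    """Return a mapping of module name to whether it can be imported."""
    status = {}
    for name in names:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package (for example `azure`) is missing
            status[name] = False
    return status


for module, available in check_modules(REQUIRED_MODULES).items():
    print(f"{module}: {'OK' if available else 'missing'}")

# The azure-cli package is used through the `az` command, so check the PATH instead
print("az CLI:", shutil.which("az") or "not found on PATH")
```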

Sign in to your Azure account

Sign in to Azure through the Azure CLI using the following command:

az login

Confirm that you have successfully logged in to your account:

az account show
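If you prefer to confirm the active account from Python, a sketch like the following runs az account show and summarizes its JSON output. It assumes az is on the PATH of the notebook server and reads the id, name and user.name fields of that output; current_account and summarize_account are hypothetical helper names.

```python
import json
import subprocess


def summarize_account(raw_json: str) -> str:
    """Summarize the JSON printed by `az account show` in one line."""
    account = json.loads(raw_json)
    user = account.get("user", {}).get("name", "<unknown user>")
    return f"Signed in as {user} (subscription: {account['name']}, id: {account['id']})"


def current_account() -> str:
    """Run `az account show` and summarize the active account.

    Raises subprocess.CalledProcessError if you are not signed in.
    """
    result = subprocess.run(
        ["az", "account", "show", "--output", "json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return summarize_account(result.stdout)
```

Calling print(current_account()) in a notebook cell should report the same account that az account show prints in the terminal.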

Connect to your Azure account via the Python client

Open a new tab in JupyterLab and create a new Python 3 notebook. In a notebook cell, run the following code to create a client for your storage account, authenticated with your Azure CLI sign-in:

import os
import uuid

from azure.identity import AzureCliCredential
from azure.storage.blob import BlobServiceClient

# Replace <storageaccountname> with the name of your storage account
account_url = "https://<storageaccountname>.blob.core.windows.net"

# Reuse the credentials from the `az login` session started in the terminal
credential = AzureCliCredential()

# Client used to manage the containers and blobs of the storage account
blob_service_client = BlobServiceClient(account_url, credential=credential)

Replace the <storageaccountname> token with the storage account name you want to interact with.
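To catch a mistyped account name before any request is sent, you could build the URL with a small helper that checks the storage account naming rules (3 to 24 characters, lowercase letters and digits only). This is a sketch; is_valid_account_name and blob_account_url are hypothetical names, and the endpoint suffix assumes the Azure public cloud, since national clouds use different suffixes.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Check a storage account name against the Azure naming rules."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


def blob_account_url(account_name: str) -> str:
    """Build the Blob service endpoint of a storage account in the Azure public cloud."""
    if not is_valid_account_name(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.blob.core.windows.net"
```

With a hypothetical account called mystorageaccount, the client would then be created with BlobServiceClient(blob_account_url("mystorageaccount"), credential=AzureCliCredential()).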

Create a new blob container

Create a new blob container, then write a text file to a local data directory and upload it to the container as a blob:

# Create a container with a random name
container_name = str(uuid.uuid4())
container_client = blob_service_client.create_container(container_name)

# Create a local data directory and a text file to upload
local_path = "./data"
os.makedirs(local_path, exist_ok=True)

local_file_name = str(uuid.uuid4()) + ".txt"
upload_file_path = os.path.join(local_path, local_file_name)

with open(file=upload_file_path, mode="w") as file:
    file.write("Hello, World!")

# Upload the file as a blob with the same name as the local file
blob_client = blob_service_client.get_blob_client(container=container_name, blob=local_file_name)

print("\nUploading to Azure Storage as blob:\n\t" + local_file_name)

with open(file=upload_file_path, mode="rb") as data:
    blob_client.upload_blob(data)

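The local_file_name variable sets the name of the local file, which is also used as the blob name, and the container_name variable sets the name of the new container. Both are generated with uuid.uuid4() so that repeated runs do not collide with existing names.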

See Naming and Referencing Containers, Blobs, and Metadata for more information about naming containers.
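If you choose your own container names instead of UUIDs, a local check against the container naming rules can save a failed request. The sketch below encodes the rules as documented for Blob Storage: 3 to 63 characters, only lowercase letters, digits and hyphens, starting and ending with a letter or digit, and no consecutive hyphens. is_valid_container_name is a hypothetical helper name.

```python
import re
import uuid

# A letter or digit, followed by groups of "optional single hyphen + letter or digit",
# which rules out leading, trailing and consecutive hyphens
_CONTAINER_NAME = re.compile(r"[a-z0-9](?:-?[a-z0-9])*")


def is_valid_container_name(name: str) -> bool:
    """Check a name against the Blob Storage container naming rules."""
    return 3 <= len(name) <= 63 and _CONTAINER_NAME.fullmatch(name) is not None


# A UUID4 string (lowercase hex with single hyphens) always satisfies the rules
print(is_valid_container_name(str(uuid.uuid4())))
```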

List the blobs in a container

You can list all blobs in a specified container as follows:

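print("\nListing blobs...")

# Get a client for the container created earlier
container_client = blob_service_client.get_container_client(container=container_name)

blob_list = container_client.list_blobs()
for blob in blob_list:
    print("\t" + blob.name)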
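The items returned by list_blobs carry more than the name; each one also has a size attribute in bytes. The sketch below formats a listing with sizes and a total. summarize_blobs is a hypothetical helper, and the SimpleNamespace objects are local stand-ins so it can be tried without Azure access.

```python
from types import SimpleNamespace


def summarize_blobs(blobs) -> tuple[list[str], int]:
    """Return one 'name<TAB>size bytes' line per blob and the total size in bytes.

    `blobs` can be any iterable of objects with `name` and `size` attributes,
    such as the items yielded by ContainerClient.list_blobs().
    """
    lines = []
    total = 0
    for blob in blobs:
        lines.append(f"{blob.name}\t{blob.size} bytes")
        total += blob.size
    return lines, total


# Local stand-ins for the objects returned by list_blobs()
sample = [SimpleNamespace(name="a.txt", size=13), SimpleNamespace(name="b.txt", size=7)]
lines, total = summarize_blobs(sample)
print("\n".join(lines))
print(f"Total: {total} bytes")
```

Against a real container, you could pass container_client.list_blobs(name_starts_with="reports/"), where the name_starts_with argument limits the listing to blobs whose names begin with a prefix; "reports/" is a hypothetical prefix.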

Download blobs

You can download blobs and save them to your local file system. Use the following code to download the blob uploaded earlier, identified by its name:

download_file_path = os.path.join(local_path, local_file_name.replace(".txt", "DOWNLOAD.txt"))
container_client = blob_service_client.get_container_client(container=container_name)
print("\nDownloading blob to \n\t" + download_file_path)

# Download the blob by name and write its contents to the local file
with open(file=download_file_path, mode="wb") as download_file:
    download_file.write(container_client.download_blob(local_file_name).readall())
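To check that the round trip preserved the data, you can compare SHA-256 digests of the uploaded and downloaded files. This is a stdlib-only sketch; sha256_stream and files_match are hypothetical helper names. Hashing in fixed-size chunks keeps memory use flat for large files.

```python
import hashlib
from typing import BinaryIO


def sha256_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def files_match(path_a: str, path_b: str) -> bool:
    """Return True if two local files contain identical bytes."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return sha256_stream(a) == sha256_stream(b)
```

After the download step, print(files_match(upload_file_path, download_file_path)) should print True.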

Clean up resources

Clean up the resources created throughout this guide by running the following code:

print("\nPress the Enter key to begin cleanup")
input()

print("Deleting blob container...")
container_client.delete_container()

print("Deleting the local source and downloaded files...")
os.remove(upload_file_path)
os.remove(download_file_path)
os.rmdir(local_path)

print("Done")

Alternatively, you can delete these resources with the Azure CLI by following this guide.
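For the CLI route, the container could be removed with az storage container delete. The sketch below only builds and prints the command; delete_container_command is a hypothetical helper, the container and account names are placeholders, and --auth-mode login makes the CLI use your Microsoft Entra ID (formerly Azure AD) sign-in, and therefore the Storage Blob Data Contributor role, instead of account keys.

```python
import shlex


def delete_container_command(container_name: str, account_name: str) -> list[str]:
    """Build an `az storage container delete` call that uses your az login session."""
    return [
        "az", "storage", "container", "delete",
        "--name", container_name,
        "--account-name", account_name,
        # Authenticate with your signed-in identity rather than an account key
        "--auth-mode", "login",
    ]


# Hypothetical values: use the container created earlier and your own account name
print(shlex.join(delete_container_command("<container-name>", "<storageaccountname>")))
```

To run it from Python instead of a terminal, pass the list to subprocess.run(..., check=True).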