Azure: Blob Storage

Overview

Azure Blob Storage is a cloud file storage service. Storage accounts organize containers (similar to “buckets” in other storage providers), and containers hold arbitrary files referred to as “blobs”. This Parsons integration currently implements only block blobs, not page or append blobs.

Note

Authentication

This connector requires authentication credentials for an Azure Blob Storage storage account. The connector is built on the azure-storage-blob library, and that library's documentation includes examples of how to create and use multiple types of credentials.

Quickstart

This class requires a credential argument and either an account_name argument or an account_url argument that includes the account name. You can store these as environment variables (AZURE_CREDENTIAL, AZURE_ACCOUNT_NAME, and AZURE_ACCOUNT_URL, respectively) or pass them in as arguments:

Instantiate class

from parsons import AzureBlobStorage

# First approach: Use API credentials via environment variables
azure_blob = AzureBlobStorage()

# Second approach: Pass API credentials as arguments
azure_blob = AzureBlobStorage(account_name='my_account_name', credential='1234')

List containers and blobs

# Get all container names for a storage account
container_names = azure_blob.list_containers()

# Get all blob names for a storage account and container
blob_names = azure_blob.list_blobs(container_names[0])

Create a blob from a file or Table

# Upload a CSV file from a local file path and set the content type
# Arguments are: container name, blob name, local file path
azure_blob.put_blob('container_name', 'test1.csv', './test1.csv', content_type='text/csv')

# Create a Table and upload it as a JSON blob
from parsons import Table
table = Table([{'first': 'Test', 'last': 'Person'}])
azure_blob.upload_table(table, 'container_name', 'test2.json', data_type='json')

Download a blob

# Download to a temporary file path (arguments are: container name, blob name)
temp_file_path = azure_blob.download_blob('container_name', 'test.csv')

# Download to a specific file path
azure_blob.download_blob('container_name', 'test.csv', local_path='/tmp/test.csv')

API

class parsons.AzureBlobStorage(account_name=None, credential=None, account_domain='blob.core.windows.net', account_url=None)

Instantiate AzureBlobStorage Class for a given Azure storage account.

Args:
account_name: str

The name of the Azure storage account to use. Not required if AZURE_ACCOUNT_NAME environment variable is set, or if account_url is supplied.

credential: str

An account shared access key with access to the Azure storage account, an SAS token string, or an instance of a TokenCredentials class. Not required if AZURE_CREDENTIAL environment variable is set.

account_domain: str

The domain of the Azure storage account, defaults to “blob.core.windows.net”. Not required if AZURE_ACCOUNT_DOMAIN environment variable is set or if account_url is supplied.

account_url: str

The account URL for the Azure storage account including the account name and domain. Not required if AZURE_ACCOUNT_URL environment variable is set.

Returns:

AzureBlobStorage
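
How account_name, account_domain, and account_url relate to each other can be sketched as follows. This is a minimal illustration, not the code Parsons actually runs: the helper name resolve_account_url, its env parameter, and the precedence order (explicit argument, then environment variable, then the documented default) are assumptions. The environment variable names and the default domain come from the argument descriptions above.

```python
import os


def resolve_account_url(account_name=None, account_domain=None,
                        account_url=None, env=None):
    """Hypothetical helper: derive the storage account URL from the inputs.

    Assumed precedence: explicit argument, then environment variable,
    then the documented default domain.
    """
    env = os.environ if env is None else env

    # A full URL already contains both the account name and the domain.
    account_url = account_url or env.get("AZURE_ACCOUNT_URL")
    if account_url:
        return account_url

    account_name = account_name or env.get("AZURE_ACCOUNT_NAME")
    if not account_name:
        raise ValueError("Provide either account_name or account_url")

    account_domain = (
        account_domain
        or env.get("AZURE_ACCOUNT_DOMAIN")
        or "blob.core.windows.net"
    )
    return f"https://{account_name}.{account_domain}"
```

Under these assumptions, supplying only an account name yields a URL on the default domain, while a full account_url is used as given.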

list_containers()

Returns a list of container names for the storage account

Returns:
list[str]

List of container names

container_exists(container_name)

Verify that a container exists within the storage account

Args:
container_name: str

The name of the container

Returns:

bool

get_container(container_name)

Returns a container client

Args:
container_name: str

The name of the container

Returns:

ContainerClient

create_container(container_name, metadata=None, public_access=None, **kwargs)

Create a container

Args:
container_name: str

The name of the container

metadata: Optional[dict[str, str]]

A dict of metadata to associate with the container.

public_access: Optional[Union[PublicAccess, str]]

The public access setting for the container. If not None, it can be ‘container’ or ‘blob’.

kwargs:

Additional arguments to be supplied to the Azure Blob Storage API. See Azure Blob Storage SDK documentation for more info.

Returns:

ContainerClient
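
A common pattern is to create a container only when it is missing. The sketch below combines container_exists and create_container for that purpose. The helper name ensure_container is hypothetical, and azure_blob is assumed to be an AzureBlobStorage instance (or any object exposing the same two methods).

```python
def ensure_container(azure_blob, container_name, metadata=None):
    """Hypothetical helper: create the container only if it is missing.

    Returns True if a container was created, False if it already existed.
    """
    if azure_blob.container_exists(container_name):
        return False
    azure_blob.create_container(container_name, metadata=metadata)
    return True
```

The check and the creation are two separate calls, so another process could create the container in between. Code that runs concurrently may still need to handle an error from create_container.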

delete_container(container_name)

Delete a container.

Args:
container_name: str

The name of the container

Returns:

None

list_blobs(container_name, name_starts_with=None)

List all of the names of blobs in a container

Args:
container_name: str

The name of the container

name_starts_with: Optional[str]

A prefix to filter blob names

Returns:
list[str]

A list of blob names
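
Because name_starts_with filters only on a prefix, selecting by file extension requires a second, local filter. The following is a minimal sketch of that combination. The helper name list_blobs_by_extension and its default extension are assumptions for illustration, and azure_blob is assumed to be an AzureBlobStorage instance.

```python
def list_blobs_by_extension(azure_blob, container_name, prefix, extension=".csv"):
    """Hypothetical helper: list blobs under a prefix with a given extension."""
    # The prefix filter is passed through to list_blobs.
    names = azure_blob.list_blobs(container_name, name_starts_with=prefix)
    # The extension filter is applied locally to the returned names.
    return [name for name in names if name.endswith(extension)]
```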

blob_exists(container_name, blob_name)

Verify that a blob exists in the specified container

Args:
container_name: str

The container name

blob_name: str

The blob name

Returns:

bool

get_blob(container_name, blob_name)

Get a blob object

Args:
container_name: str

The container name

blob_name: str

The blob name

Returns:

BlobClient

get_blob_url(container_name, blob_name, account_key=None, permission=None, expiry=None, start=None)

Get a URL with a shared access signature for a blob

Args:
container_name: str

The container name

blob_name: str

The blob name

account_key: Optional[str]

An account shared access key for the storage account. Will default to the key used on initialization if one was provided as the credential, but required if it was not.

permission: Optional[Union[BlobSasPermissions, str]]

Permissions associated with the blob URL. Can be either a BlobSasPermissions object or a string where ‘r’, ‘a’, ‘c’, ‘w’, and ‘d’ correspond to read, add, create, write, and delete permissions respectively.

expiry: Optional[Union[datetime, str]]

The datetime when the URL should expire. The timezone defaults to UTC.

start: Optional[Union[datetime, str]]

The datetime when the URL should become valid. The timezone defaults to UTC. If start is None, the URL becomes active as soon as it is created.

Returns:
str

URL with shared access signature for blob
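
As an illustration of the permission and expiry arguments, the sketch below requests a read-only URL that expires after a fixed number of hours. The helper name read_only_url and its hours parameter are hypothetical. azure_blob is assumed to be an AzureBlobStorage instance initialized with an account shared access key, so that account_key can be omitted.

```python
from datetime import datetime, timedelta, timezone


def read_only_url(azure_blob, container_name, blob_name, hours=1):
    """Hypothetical helper: get a read-only blob URL valid for `hours` hours."""
    # Use a timezone-aware UTC datetime so the expiry is unambiguous.
    expiry = datetime.now(timezone.utc) + timedelta(hours=hours)
    return azure_blob.get_blob_url(
        container_name,
        blob_name,
        permission="r",  # 'r' corresponds to read permission
        expiry=expiry,
    )
```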

put_blob(container_name, blob_name, local_path, **kwargs)

Puts a blob (i.e., a file) in a container

Args:
container_name: str

The name of the container to store the blob

blob_name: str

The name of the blob to be stored

local_path: str

The local path of the file to upload

kwargs:

Additional arguments to be supplied to the Azure Blob Storage API. See Azure Blob Storage SDK documentation for more info. Any keys that belong to the ContentSettings object will be provided to that class directly.

Returns:

BlobClient
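
Since content settings such as content_type are passed as keyword arguments, one option is to derive the content type from the file name before uploading. This is a minimal sketch under stated assumptions: the helper name upload_file, the choice to default the blob name to the file's base name, and the "application/octet-stream" fallback are all illustrative, not Parsons behavior. azure_blob is assumed to be an AzureBlobStorage instance.

```python
import mimetypes
import os


def upload_file(azure_blob, container_name, local_path, blob_name=None):
    """Hypothetical helper: upload a file with a guessed content type."""
    # Default the blob name to the file name without its directory.
    blob_name = blob_name or os.path.basename(local_path)

    # guess_type returns (type, encoding); the type is None if unknown.
    content_type, _ = mimetypes.guess_type(local_path)

    return azure_blob.put_blob(
        container_name,
        blob_name,
        local_path,
        content_type=content_type or "application/octet-stream",
    )
```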

download_blob(container_name, blob_name, local_path=None)

Downloads a blob from a container into the specified file path or a temporary file path

Args:
container_name: str

The container name

blob_name: str

The blob name

local_path: Optional[str]

The local path where the file will be downloaded. If not specified, a temporary file is created and its path is returned; that file is removed automatically when the script finishes running.

Returns:
str

The path of the downloaded file
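
Because download_blob returns a file path rather than the file contents, reading a blob takes one more step. The sketch below downloads to a temporary path and reads the file as text. The helper name read_blob_text and the UTF-8 default are assumptions, and azure_blob is assumed to be an AzureBlobStorage instance.

```python
def read_blob_text(azure_blob, container_name, blob_name, encoding="utf-8"):
    """Hypothetical helper: download a blob and return its contents as text."""
    # With no local_path, the blob is written to a temporary file.
    path = azure_blob.download_blob(container_name, blob_name)
    with open(path, encoding=encoding) as f:
        return f.read()
```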

delete_blob(container_name, blob_name)

Delete a blob in a specified container.

Args:
container_name: str

The container name

blob_name: str

The blob name

Returns:

None
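
delete_blob removes one blob at a time, so clearing out a group of blobs means listing them first. The following sketch pairs list_blobs with delete_blob. The helper name delete_blobs_with_prefix is hypothetical, as is the guard against an empty prefix, and azure_blob is assumed to be an AzureBlobStorage instance.

```python
def delete_blobs_with_prefix(azure_blob, container_name, prefix):
    """Hypothetical helper: delete every blob whose name starts with `prefix`.

    Returns the list of blob names that were deleted.
    """
    # Guard against accidentally deleting the whole container's contents.
    if not prefix:
        raise ValueError("Refusing to delete without a non-empty prefix")

    names = list(azure_blob.list_blobs(container_name, name_starts_with=prefix))
    for name in names:
        azure_blob.delete_blob(container_name, name)
    return names
```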

upload_table(table, container_name, blob_name, data_type='csv', **kwargs)

Load the data from a Parsons table into a blob.

Args:
table: obj

A Parsons Table

container_name: str

The container name to upload the data into

blob_name: str

The blob name to upload the data into

data_type: str

The file format to use when writing the data. One of: csv or json

kwargs:

Additional keyword arguments to supply to put_blob

Returns:

BlobClient
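
The data_type argument is independent of the blob name, so the two can disagree (for example, CSV data stored under a name ending in .json). One way to keep them aligned is to derive data_type from the name, as in the sketch below. The helper name upload_table_by_extension and the fallback to 'csv' for any other extension are assumptions. azure_blob is assumed to be an AzureBlobStorage instance and table a Parsons Table.

```python
def upload_table_by_extension(azure_blob, table, container_name, blob_name, **kwargs):
    """Hypothetical helper: choose data_type from the blob name's extension."""
    # Only 'csv' and 'json' are documented, so anything else falls back to csv.
    data_type = "json" if blob_name.lower().endswith(".json") else "csv"
    return azure_blob.upload_table(
        table, container_name, blob_name, data_type=data_type, **kwargs
    )
```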