Client

Client represents a higher-level interface to the datasets API.

class abeja.datasets.Client(organization_id: typing.Union[str, NoneType] = None, credential: typing.Union[typing.Dict[str, str], NoneType] = None, timeout: typing.Union[int, NoneType] = None) → None

A high-level client for the Dataset API

from abeja.datasets import Client

client = Client()
datasets

Get datasets object

Request syntax:
datasets = client.datasets
Returns:
Datasets
get_dataset(dataset_id: str) → abeja.datasets.dataset.Dataset

Get dataset for specific dataset_id

Request syntax:
response = client.get_dataset(dataset_id='1234567890123')
Params:
  • dataset_id (str): dataset id
Return type:
Dataset

Dataset

class abeja.datasets.dataset.Dataset(api: abeja.datasets.api.client.APIClient, organization_id: str, dataset_id: str, name: typing.Union[str, NoneType] = None, type: typing.Union[str, NoneType] = None, props: typing.Union[dict, NoneType] = None, total_count: typing.Union[int, NoneType] = None, created_at: typing.Union[str, NoneType] = None, updated_at: typing.Union[str, NoneType] = None, **kwargs) → None

a model class for a dataset

Properties:
  • organization_id (str)
  • dataset_id (str)
  • name (str)
  • type (str)
  • props (dict)
  • total_count (int)
  • created_at (datetime)
  • updated_at (datetime)
dataset_items

Get the DatasetItems object for this dataset

Request syntax:
dataset = client.get_dataset(dataset_id='1410805969256')
dataset_items = dataset.dataset_items
Returns:
DatasetItems object

Datasets

class abeja.datasets.dataset.Datasets(api: abeja.datasets.api.client.APIClient, organization_id: str) → None

a class for handling datasets

create(name: str, type: str, props: dict) → abeja.datasets.dataset.Dataset

create a dataset

API reference: POST /organizations/<organization_id>/datasets/

Request Syntax:
name = "test-dataset"
dataset_type = "classification"
props = {
    "categories": [
        {
            "labels": [
                {
                    "label_id": 1,
                    "label": "dog"
                },
                {
                    "label_id": 2,
                    "label": "cat"
                },
                {
                    "label_id": 3,
                    "label": "others"
                }
            ],
            "category_id": 1,
            "name": "cats_dogs"
        }
    ]
}
response = datasets.create(name, dataset_type, props)
Params:
  • name (str): dataset name
  • type (str): dataset type, e.g. classification or detection
  • props (dict): properties of dataset
Return type:
Dataset object
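The `props` structure in the request syntax above is repetitive to write by hand. The following is a minimal sketch of a helper that builds it from a list of label names. `build_classification_props` is a hypothetical name, not part of the SDK, and the sketch assumes label IDs are numbered sequentially from 1, as in the example.

```python
from typing import Dict, List


def build_classification_props(category_name: str, labels: List[str],
                               category_id: int = 1) -> Dict:
    """Build a `props` dict shaped like the example above (hypothetical helper)."""
    return {
        "categories": [
            {
                # Assumption: label IDs start at 1 and follow list order.
                "labels": [
                    {"label_id": i, "label": label}
                    for i, label in enumerate(labels, start=1)
                ],
                "category_id": category_id,
                "name": category_name,
            }
        ]
    }


props = build_classification_props("cats_dogs", ["dog", "cat", "others"])
# The result can then be passed on, e.g. datasets.create(name, dataset_type, props)
```

The helper only assembles a plain dict, so it can be checked locally before any request is sent.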
delete(dataset_id: str) → abeja.datasets.dataset.Dataset

delete a dataset

Request syntax:
response = datasets.delete(dataset_id='1377232365920')
Params:
  • dataset_id (str): dataset id
Return type:
Dataset object
get(dataset_id: str) → abeja.datasets.dataset.Dataset

get a dataset

Request syntax:
response = datasets.get(dataset_id='1410805969256')
Params:
  • dataset_id (str): dataset id
Return type:
Dataset object
list() → typing.List[abeja.datasets.dataset.Dataset]

Get dataset list

Request syntax:
response = datasets.list()
Return type:
list of Dataset objects
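Because `list()` returns plain Dataset objects, a dataset can be looked up by its `name` property on the client side. A minimal sketch follows; `find_by_name` is a hypothetical helper, and `SimpleNamespace` objects stand in for Dataset objects so the snippet runs without the SDK.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def find_by_name(dataset_list: Iterable, name: str) -> Optional[object]:
    """Return the first object whose `name` matches, or None (hypothetical helper)."""
    return next((d for d in dataset_list if d.name == name), None)


# Stand-ins for the objects returned by datasets.list(); only the
# `dataset_id` and `name` properties documented above are used.
stub_datasets = [
    SimpleNamespace(dataset_id="1377232365920", name="test-dataset"),
    SimpleNamespace(dataset_id="1410805969256", name="cats_dogs"),
]

match = find_by_name(stub_datasets, "cats_dogs")
missing = find_by_name(stub_datasets, "no-such-dataset")
```

With the real client, `stub_datasets` would be replaced by the result of `datasets.list()`.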

Dataset Item

class abeja.datasets.dataset_item.DatasetItem(api: abeja.datasets.api.client.APIClient, organization_id: str, dataset_id: str, dataset_item_id: str, **kwargs) → None

a model class for DatasetItem

Properties:
  • organization_id (str)
  • dataset_id (str)
  • dataset_item_id (int)
  • attributes (dict)
  • created_at (datetime)
  • updated_at (datetime)
  • source_data (list)
asdict()

Dataset Items

class abeja.datasets.dataset_item.DatasetItems(api: abeja.datasets.api.client.APIClient, organization_id: str, dataset_id: str) → None

a class for handling the items of a dataset

from abeja.datasets import Client

client = Client()
dataset = client.get_dataset(dataset_id='1410805969256')
dataset_items = dataset.dataset_items
bulk_update(bulk_attributes: dict) → abeja.datasets.dataset_item.DatasetItem

Update dataset items in bulk.

Request syntax:
bulk_attributes = [
    {
        "dataset_item_id": 1111111111111,
        "attributes": {
            "classification": [
                {
                    "category_id": 1,
                    "label_id": 1
                }
            ],
            "custom_format": {
                "anything": "something"
            },
            "detection": [
                {
                    "category_id": 1,
                    "label_id": 2,
                    "rect": {
                        "xmin": 22,
                        "ymin": 145,
                        "xmax": 140,
                        "ymax": 220
                    }
                }
            ]
        }
    }
]

response = dataset_items.bulk_update(bulk_attributes=bulk_attributes)
Params:
  • bulk_attributes (list): one dict per dataset item, each with a dataset_item_id and the attributes to apply.
Return type:
the updated dataset items, as a list of DatasetItem objects
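When many items need the same kind of change, the `bulk_attributes` list can be generated rather than written out. The following is a minimal sketch, assuming only the request shape shown above; `build_bulk_classification` is a hypothetical helper, and the second item ID is made up for illustration. Whether attribute keys that are left out (such as `detection`) are preserved on update is not stated in this reference, so treat that as unknown.

```python
from typing import Dict, List


def build_bulk_classification(labels_by_item: Dict[int, int],
                              category_id: int = 1) -> List[dict]:
    """Turn {dataset_item_id: label_id} into a bulk_attributes list (hypothetical helper)."""
    return [
        {
            "dataset_item_id": item_id,
            "attributes": {
                "classification": [
                    {"category_id": category_id, "label_id": label_id}
                ]
            },
        }
        for item_id, label_id in labels_by_item.items()
    ]


# 2222222222222 is an illustrative ID, not a real item.
bulk_attributes = build_bulk_classification({1111111111111: 1, 2222222222222: 2})
# Then: dataset_items.bulk_update(bulk_attributes=bulk_attributes)
```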
create(source_data: typing.List[dict], attributes: dict) → abeja.datasets.dataset_item.DatasetItem

create an item in a dataset

Request syntax:
source_data = [
    {
        "data_type": "image/jpeg",
        "data_uri": "datalake://1200123803688/20170815T044617-f20dde80-1e3b-4496-bc06-1b63b026b872",
        "height": 500,
        "width": 200
    }
]

attributes = {
    "classification": [
        {
            "category_id": 1,
            "label_id": 1,
        }
    ],
    "detection": [
        {
            "category_id": 1,
            "label_id": 2,
            "rect": {
                "xmin": 22,
                "ymin": 145,
                "xmax": 140,
                "ymax": 220
            }
        }
    ],
    "custom": [
        {
            "anything": "something"
        }
    ]
}

response = dataset_items.create(source_data=source_data, attributes=attributes)
Params:
  • source_data (list): list of source data stored in external storage.
  • attributes (dict): metadata annotated to the source data.
Return type:
DatasetItem object
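Detection annotations carry a `rect` with four coordinates, which are easy to get in the wrong order. The following is a minimal sketch of a builder that rejects inverted boxes before the request is made. `make_detection` is a hypothetical helper, and the rule that `xmin < xmax` and `ymin < ymax` is an assumption based on the example values, not something this reference states.

```python
from typing import Dict


def make_detection(category_id: int, label_id: int,
                   xmin: int, ymin: int, xmax: int, ymax: int) -> Dict:
    """Build one `detection` entry, rejecting inverted boxes (hypothetical helper)."""
    # Assumption: a valid rect has strictly increasing min/max coordinates.
    if xmin >= xmax or ymin >= ymax:
        raise ValueError("rect must satisfy xmin < xmax and ymin < ymax")
    return {
        "category_id": category_id,
        "label_id": label_id,
        "rect": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax},
    }


attributes = {"detection": [make_detection(1, 2, 22, 145, 140, 220)]}
# Then: dataset_items.create(source_data=source_data, attributes=attributes)
```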
delete(dataset_item_id: str) → abeja.datasets.dataset_item.DatasetItem

Delete a dataset item.

Request syntax:
response = dataset_items.delete(dataset_item_id=0)
Params:
  • dataset_item_id (int): dataset item id
Return type:
DatasetItem object (the deleted dataset item)
get(dataset_item_id: str) → abeja.datasets.dataset_item.DatasetItem

get an item in a dataset

Request syntax:
response = dataset_items.get(dataset_item_id=0)
Params:
  • dataset_item_id (int): dataset item id
Return type:
DatasetItem object
list(next_page_token: typing.Union[str, NoneType] = None, limit: typing.Union[int, NoneType] = None, prefetch: bool = False) → abeja.datasets.dataset_item.DatasetItemIterator

Iterate over all dataset items in a dataset

Request syntax:
dataset_item_iter = dataset_items.list()

# list all dataset items
dataset_items = list(dataset_item_iter)

# or get the first dataset item
dataset_item = next(dataset_item_iter)
Params:
  • next_page_token (str): token for fetching the next page of items. [optional]
  • limit (int): limit of items. [optional]
  • prefetch (bool): False by default. If True, the source_data of all dataset items is downloaded concurrently (so the order of dataset items may change) and saved under the path given by the ABEJA_STORAGE_DIR_PATH environment variable, or in the current directory if it is not set. [optional]
Return type:
DatasetItemIterator object
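For large datasets it is often enough to look at the first few items rather than materialising the whole list. The following is a minimal sketch using `itertools.islice`, assuming the returned iterator follows the standard Python iterator protocol, as the `list()` and `next()` calls above suggest. `take` and `fake_item_iter` are hypothetical names; the generator stands in for `dataset_items.list()` so the snippet runs without the SDK.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def take(items: Iterable, n: int) -> List:
    """Return at most the first n items without consuming the rest."""
    return list(islice(items, n))


def fake_item_iter() -> Iterator[dict]:
    """Stand-in for dataset_items.list(); yields placeholder items."""
    for i in range(5):
        yield {"dataset_item_id": i}


item_iter = fake_item_iter()
first_two = take(item_iter, 2)   # consumes the first two items only
remaining = list(item_iter)      # continues with whatever is left
```

Note that a plain iterator is consumed as it is read, so items taken once are not yielded again.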
update(dataset_item_id: str, attributes: dict) → abeja.datasets.dataset_item.DatasetItem

Update a dataset item.

Request syntax:
attributes = {
    "classification": [
        {
            "category_id": 1,
            "label_id": 1,
        }
    ],
    "detection": [
        {
            "category_id": 1,
            "label_id": 2,
            "rect": {
                "xmin": 22,
                "ymin": 145,
                "xmax": 140,
                "ymax": 220
            }
        }
    ],
    "custom": [
        {
            "anything": "something"
        }
    ]
}

response = dataset_items.update(dataset_item_id=0, attributes=attributes)
Params:
  • dataset_item_id (int): dataset item id
  • attributes (dict): metadata annotated to the source data.
Return type:
DatasetItem object (the updated dataset item)
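A common pattern is to change a single label and send the full `attributes` dict back. The following is a minimal sketch of a helper that returns a modified copy and leaves the original untouched. `with_classification` is a hypothetical name, and the sketch assumes at most one classification entry per `category_id`, which this reference does not state.

```python
import copy
from typing import Dict


def with_classification(attributes: Dict, category_id: int, label_id: int) -> Dict:
    """Return a copy of `attributes` with the label for one category set (hypothetical helper)."""
    updated = copy.deepcopy(attributes)
    entries = updated.setdefault("classification", [])
    for entry in entries:
        if entry["category_id"] == category_id:
            entry["label_id"] = label_id  # overwrite the existing label
            break
    else:
        # No entry for this category yet, so add one.
        entries.append({"category_id": category_id, "label_id": label_id})
    return updated


current = {"classification": [{"category_id": 1, "label_id": 1}]}
new_attributes = with_classification(current, category_id=1, label_id=3)
# Then: dataset_items.update(dataset_item_id=0, attributes=new_attributes)
```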