Client represents a higher-level interface to the Datasets API.

abeja.datasets.Client(organization_id: Optional[str] = None, credential: Optional[Dict[str, str]] = None, timeout: Optional[int] = None)

A high-level client for the Dataset API.
from abeja.datasets import Client
client = Client()
get_dataset(dataset_id: str) → abeja.datasets.dataset.Dataset

Get the dataset for a specific dataset_id.
response = client.get_dataset(dataset_id='1234567890123')
dataset_id (str): dataset id
abeja.datasets.dataset.Dataset(api: abeja.datasets.api.client.APIClient, organization_id: str, dataset_id: str, name: Optional[str] = None, type: Optional[str] = None, props: Optional[dict] = None, total_count: Optional[int] = None, created_at: Optional[str] = None, updated_at: Optional[str] = None, **kwargs)

A model class for a dataset.

Properties:
organization_id (str)
dataset_id (str)
name (str)
type (str)
props (dict)
total_count (int)
created_at (datetime)
updated_at (datetime)
dataset_items

Get a DatasetItems object for this dataset.
dataset = client.get_dataset(dataset_id='1410805969256')
dataset_items = dataset.dataset_items
Return type: DatasetItems object
abeja.datasets.dataset.Datasets(api: abeja.datasets.api.client.APIClient, organization_id: str)

A class for handling datasets.
create(name: str, type: str, props: dict) → abeja.datasets.dataset.Dataset

Create a dataset.

API reference: POST /organizations/<organization_id>/datasets/
name = "test-dataset"
dataset_type = "classification"
props = {
"categories": [
{
"labels": [
{
"label_id": 1,
"label": "dog"
},
{
"label_id": 2,
"label": "cat"
},
{
"label_id": 3,
"label": "others"
}
],
"category_id": 1,
"name": "cats_dogs"
}
]
}
response = datasets.create(name, dataset_type, props)
name (str): dataset name
type (str): dataset type, e.g. "classification" or "detection"
props (dict): properties of the dataset
Return type: Dataset object
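The props structure shown above can also be built programmatically instead of written out by hand. A minimal sketch, assuming a single classification category; the helper name is ours, not part of the SDK:

```python
# Hypothetical helper (not part of the SDK) that builds the `props`
# payload for a single-category classification dataset.
# label_ids are assigned sequentially starting from 1.
def build_classification_props(category_id, category_name, labels):
    return {
        "categories": [
            {
                "category_id": category_id,
                "name": category_name,
                "labels": [
                    {"label_id": i, "label": label}
                    for i, label in enumerate(labels, start=1)
                ],
            }
        ]
    }

props = build_classification_props(1, "cats_dogs", ["dog", "cat", "others"])
```

The resulting dict matches the structure accepted by datasets.create() in the example above.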
delete(dataset_id: str) → abeja.datasets.dataset.Dataset

Delete a dataset.
response = datasets.delete(dataset_id='1377232365920')
dataset_id (str): dataset id
Return type: Dataset object
get(dataset_id: str) → abeja.datasets.dataset.Dataset

Get a dataset.

response = datasets.get(dataset_id='1410805969256')
dataset_id (str): dataset id
Return type: Dataset object
list() → List[abeja.datasets.dataset.Dataset]

Get a list of datasets.
response = datasets.list()
Return type: list of Dataset objects
abeja.datasets.dataset_item.DatasetItem(api: abeja.datasets.api.client.APIClient, organization_id: str, dataset_id: str, dataset_item_id: str, **kwargs)

A model class for a dataset item.

Properties:
organization_id (str)
dataset_id (str)
dataset_item_id (int)
attributes (dict)
created_at (datetime)
updated_at (datetime)
source_data (list)
asdict()

Return the dataset item as a dict.

abeja.datasets.dataset_item.DatasetItems(api: abeja.datasets.api.client.APIClient, organization_id: str, dataset_id: str)

A class for handling dataset items.
from abeja.datasets import Client
client = Client()
dataset = client.get_dataset(dataset_id='1410805969256')
dataset_items = dataset.dataset_items
bulk_update(bulk_attributes: dict) → abeja.datasets.dataset_item.DatasetItem

Update dataset items in bulk.
bulk_attributes = [
{
"dataset_item_id": 1111111111111,
"attributes": {
"classification": [
{
"category_id": 1,
"label_id": 1
}
],
"custom_format": {
"anything": "something"
},
"detection": [
{
"category_id": 1,
"label_id": 2,
"rect": {
"xmin": 22,
"ymin": 145,
"xmax": 140,
"ymax": 220
}
}
]
}
}
]
response = dataset_items.bulk_update(bulk_attributes=bulk_attributes)
bulk_attributes (list): list of attributes, one entry per dataset item.
Returns: the updated dataset item list
Return type: DatasetItem object
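The bulk_attributes payload above can be generated from simpler data. A minimal sketch for the single-label classification case; the helper name and the input triples are ours, not part of the SDK:

```python
# Hypothetical helper (not part of the SDK) that turns simple
# (dataset_item_id, category_id, label_id) triples into the
# bulk_attributes payload expected by bulk_update().
def to_bulk_attributes(classifications):
    return [
        {
            "dataset_item_id": item_id,
            "attributes": {
                "classification": [
                    {"category_id": category_id, "label_id": label_id}
                ]
            },
        }
        for item_id, category_id, label_id in classifications
    ]

bulk_attributes = to_bulk_attributes([
    (1111111111111, 1, 1),
    (2222222222222, 1, 2),
])
```

The result can be passed directly as dataset_items.bulk_update(bulk_attributes=bulk_attributes).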
create(source_data: List[dict], attributes: dict) → abeja.datasets.dataset_item.DatasetItem

Create an item in the dataset.
source_data = [
{
"data_type": "image/jpeg",
"data_uri": "datalake://1200123803688/20170815T044617-f20dde80-1e3b-4496-bc06-1b63b026b872",
"height": 500,
"width": 200
}
]
attributes = {
"classification": [
{
"category_id": 1,
"label_id": 1,
}
],
"detection": [
{
"category_id": 1,
"label_id": 2,
"rect": {
"xmin": 22,
"ymin": 145,
"xmax": 140,
"ymax": 220
}
}
],
"custom": [
{
"anything": "something"
}
]
}
response = dataset_items.create(source_data=source_data, attributes=attributes)
source_data (list): list of source data stored in external storage.
attributes (dict): metadata to annotate the source data with.
Return type: DatasetItem object
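Malformed rect coordinates in a detection attribute are easy to catch client-side before calling create() or update(). A minimal sketch of such a check, under the assumption that a valid rect needs xmin < xmax and ymin < ymax; the function is ours, not part of the SDK:

```python
# Hypothetical client-side check (not part of the SDK): verify that a
# detection rect has all four coordinates and that they are ordered.
def is_valid_rect(rect):
    required = {"xmin", "ymin", "xmax", "ymax"}
    return (
        required <= rect.keys()          # all four keys present
        and rect["xmin"] < rect["xmax"]  # left edge before right edge
        and rect["ymin"] < rect["ymax"]  # top edge before bottom edge
    )

assert is_valid_rect({"xmin": 22, "ymin": 145, "xmax": 140, "ymax": 220})
assert not is_valid_rect({"xmin": 140, "ymin": 145, "xmax": 22, "ymax": 220})
```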
delete(dataset_item_id: str) → abeja.datasets.dataset_item.DatasetItem

Delete a dataset item.
response = dataset_items.delete(dataset_item_id=0)
dataset_item_id (int): dataset item id
Returns: the deleted dataset item
Return type: DatasetItem object
get(dataset_item_id: str) → abeja.datasets.dataset_item.DatasetItem

Get an item in the dataset.
response = dataset_items.get(dataset_item_id=0)
dataset_item_id (int): dataset item id
Return type: DatasetItem object
list(next_page_token: Optional[str] = None, limit: Optional[int] = None, prefetch: bool = False) → abeja.datasets.dataset_item.DatasetItemIterator

Generate all dataset items in a dataset.
dataset_item_iter = dataset_items.list()
# list all dataset items
dataset_items = list(dataset_item_iter)
# or get the first dataset item
dataset_item = next(dataset_item_iter)
next_page_token (str): token to get the next page of items. [optional]
limit (int): maximum number of items to return. [optional]
prefetch (bool): False by default. If True, the source_data of all dataset items are downloaded concurrently (so the order of dataset items can change) and saved under the path given by the ABEJA_STORAGE_DIR_PATH environment variable, or under the current directory by default. [optional]
Return type: DatasetItemIterator object
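DatasetItemIterator handles next_page_token paging for you, but the underlying control flow can be sketched in plain Python. Here fetch_page is a stand-in for the paged API call and is NOT part of the SDK; the fake pages exist only to illustrate the loop:

```python
# Sketch of next_page_token-style paging, assuming each page response
# carries an "items" list and a "next_page_token" (None on the last page).
def iterate_items(fetch_page, limit=None):
    token = None
    while True:
        page = fetch_page(next_page_token=token, limit=limit)
        for item in page["items"]:
            yield item
        token = page.get("next_page_token")
        if not token:
            break  # no more pages

# Fake in-memory pages, keyed by the token used to request them.
pages = {
    None: {"items": [1, 2], "next_page_token": "t1"},
    "t1": {"items": [3], "next_page_token": None},
}

def fake_fetch(next_page_token=None, limit=None):
    return pages[next_page_token]

items = list(iterate_items(fake_fetch))  # → [1, 2, 3]
```

In practice you simply iterate the object returned by dataset_items.list(), as shown above.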
update(dataset_item_id: str, attributes: dict) → abeja.datasets.dataset_item.DatasetItem

Update a dataset item.
attributes = {
"classification": [
{
"category_id": 1,
"label_id": 1,
}
],
"detection": [
{
"category_id": 1,
"label_id": 2,
"rect": {
"xmin": 22,
"ymin": 145,
"xmax": 140,
"ymax": 220
}
}
],
"custom": [
{
"anything": "something"
}
]
}
response = dataset_items.update(dataset_item_id=0, attributes=attributes)
dataset_item_id (int): dataset item id
attributes (dict): metadata to annotate the dataset item with.
Returns: the updated dataset item
Return type: DatasetItem object