Client¶

Client provides a higher-level interface to the Training API.

class abeja.training.Client(organization_id: Optional[str] = None, credential: Optional[Dict[str, str]] = None, timeout: Optional[int] = None, max_retry_count: Optional[int] = None)¶

A high-level client for the Training API.

from abeja.training import Client

client = Client(organization_id='1234567890123')

Params:

organization_id (str): The organization ID. Taken from os.environ['ABEJA_ORGANIZATION_ID'] if omitted.
credential (dict): [optional] This parameter will be passed to its underlying APIClient. See the section Client Parameter for more details about how to specify this parameter.
timeout (int): [optional] This parameter will be passed to its underlying APIClient.
max_retry_count (int): [optional] This parameter will be passed to its underlying APIClient.
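
The fallback described for organization_id can be pictured with the minimal sketch below. It only mirrors the documented behavior and is not the SDK's actual implementation; the helper name resolve_organization_id and the error raised when nothing is set are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_organization_id(
    organization_id: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Return the explicit ID, else fall back to ABEJA_ORGANIZATION_ID.

    Hypothetical helper: it mirrors the documented fallback only.
    """
    if organization_id:
        return organization_id
    try:
        return environ["ABEJA_ORGANIZATION_ID"]
    except KeyError:
        # Raising ValueError here is an assumption, not documented SDK behavior.
        raise ValueError(
            "organization_id was not given and ABEJA_ORGANIZATION_ID is not set"
        ) from None
```

The resolved value could then be passed explicitly as Client(organization_id=...).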

job_definitions() → abeja.training.job_definition.JobDefinitions¶

Get an adapter object for handling training job definitions in the organization.

Request syntax:

adapter = client.job_definitions()
definition = adapter.get(job_definition_name)

Return type:

JobDefinitions object

Entity classes¶

JobDefinition¶

class abeja.training.JobDefinition(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, name: str, version_count: int, model_count: int, notebook_count: int, tensorboard_count: int, versions: Optional[List[abeja.training.job_definition_version.JobDefinitionVersion]], jobs: Optional[list], archived: bool, created_at: str, modified_at: str)¶

Training job definition object.

property archived¶: Get whether this job definition is archived or not.

property created_at¶: Get the created date string (ISO 8601) of this job definition.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: Dict[str, Any]) → abeja.training.job_definition.JobDefinition¶

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
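
The "no validation, fill with defaults" behavior in the note can be illustrated with a small stand-in class. This is a sketch of the general pattern only: DefinitionSummary is a hypothetical class, and the response keys and default values shown are assumptions rather than the SDK's real ones.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class DefinitionSummary:
    """Hypothetical stand-in entity, not an SDK class."""

    job_definition_id: str
    name: str
    archived: bool

    @classmethod
    def from_response(cls, response: Dict[str, Any]) -> "DefinitionSummary":
        # No validation: missing keys silently become default values.
        return cls(
            job_definition_id=response.get("job_definition_id", ""),
            name=response.get("name", ""),
            archived=bool(response.get("archived", False)),
        )
```

A practical consequence is that an empty or malformed response still yields an object, so callers that need strict checks have to validate the response themselves.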

property job_definition_id¶: Get the job definition ID of this job definition.

job_definition_versions() → abeja.training.job_definition_version.JobDefinitionVersions¶

Return an adapter object for handling training job definition versions under this job definition.

Request syntax:

adapter = definition.job_definition_versions()
version = adapter.get(job_definition_version_id=1)

Return type:

JobDefinitionVersions object

jobs() → abeja.training.job.Jobs¶

Return an adapter object for handling training jobs under this job definition.

Request syntax:

adapter = definition.jobs()
job = adapter.get(job_id='1234567890123')

Return type:

Jobs object

property model_count¶: Get the model count of this job definition.

models() → abeja.training.model.Models¶

Return an adapter object for handling training models under this job definition.

Request syntax:

adapter = definition.models()
model = adapter.get(model_id='1234567890123')

Return type:

Models object

property modified_at¶: Get the modified date string (ISO 8601) of this job definition.

property name¶: Get the name of this job definition.

property notebook_count¶: Get the notebook count of this job definition.

property organization_id¶: Get the organization ID of this job definition.

property tensorboard_count¶: Get the tensorboard count of this job definition.

property version_count¶: Get the version count of this job definition.

property versions¶: Get the versions of this job definition.
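
As a rough illustration of how the JobDefinition properties listed above fit together, the sketch below builds a one-line summary from them. The function summarize_definition is hypothetical and relies only on attribute names documented in this section, so any object exposing those attributes would work.

```python
from typing import Any


def summarize_definition(definition: Any) -> str:
    """Build a one-line summary from the documented JobDefinition properties."""
    state = "archived" if definition.archived else "active"
    return (
        f"{definition.name} ({definition.job_definition_id}, {state}): "
        f"{definition.version_count} versions, "
        f"{definition.model_count} models, "
        f"{definition.notebook_count} notebooks, "
        f"{definition.tensorboard_count} tensorboards"
    )
```

With a real client, the argument would typically be the object returned by client.job_definitions().get(job_definition_name).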

JobDefinitionVersion¶

class abeja.training.JobDefinitionVersion(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, handler: str, image: abeja.common.docker_image_name.DockerImageName, environment: Dict[str, str], description: str, archived: bool, created_at: str, modified_at: str, job_definition: Optional[abeja.training.job_definition.JobDefinition] = None)¶

Training job definition version object.

property archived¶: Get whether this job definition version is archived or not.

property created_at¶: Get the created date string (ISO 8601) of this job definition version.

property description¶: Get the description of this job definition version.

property environment¶: Get the environment variables of this job definition version.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: Dict[str, Any], job_definition: Optional[abeja.training.job_definition.JobDefinition] = None) → abeja.training.job_definition_version.JobDefinitionVersion¶

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

property handler¶: Get the handler of this job definition version.

property image¶: Get the DockerImageName of this job definition version.

property job_definition¶: Get the job definition of this job definition version.

property job_definition_id¶: Get the job_definition ID of this job definition version.

property job_definition_version_id¶: Get the version number of this job definition version.

property modified_at¶: Get the modified date string (ISO 8601) of this job definition version.

property organization_id¶: Get the organization ID of this job definition version.

Job¶

class abeja.training.Job(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, job_id: str, instance_type: abeja.common.instance_type.InstanceType, exec_env: abeja.common.exec_env.ExecEnv, environment: Dict[str, str], statistics: Optional[abeja.training.statistics.Statistics], status_message: Optional[str], status: abeja.training.job_status.JobStatus, description: str, datasets: Dict[str, str], creator: Optional[abeja.user.user.User], archived: bool, start_time: str, completion_time: str, created_at: str, modified_at: str, job_definition: Optional[abeja.training.job_definition.JobDefinition] = None, job_definition_version: Optional[abeja.training.job_definition_version.JobDefinitionVersion] = None)¶

Training job object.

property archived¶: Get whether this job is archived or not.

property completion_time¶: Get the completion time of this job.

property created_at¶: Get the created datetime string of this job.

property creator¶: Get the creator of this job.

property datasets¶: Get the datasets of this job.

property description¶: Get the description of this job.

property environment¶: Get environment variables of this job.

property exec_env¶: Get the execution environment in which this job runs.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: Dict[str, Any], job_definition: Optional[abeja.training.job_definition.JobDefinition] = None, job_definition_version: Optional[abeja.training.job_definition_version.JobDefinitionVersion] = None) → abeja.training.job.Job¶

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

property instance_type¶: Get the instance type of this job.

property job_definition¶: Get the job definition of this job.

property job_definition_id¶: Get the job definition id of this job.

property job_definition_version¶: Get the job definition version of this job.

property job_definition_version_id¶: Get the job definition version id of this job.

property job_id¶: Get the id of this job.

property modified_at¶: Get the modified datetime string of this job.

property organization_id¶: Get the organization id of this job.

property start_time¶: Get the start time of this job.

property statistics¶: Get the statistics of this job.

property status¶: Get the current status of this job.

property status_message¶: Get the status_message of this job.

Model¶

class abeja.training.Model(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_id: Optional[str], model_id: str, description: Optional[str], metrics: Dict[str, Any], environment: Dict[str, str], exec_env: abeja.common.exec_env.ExecEnv, creator: Optional[abeja.user.user.User], archived: bool, created_at: str, modified_at: str, job_definition: Optional[abeja.training.job_definition.JobDefinition] = None, job: Optional[abeja.training.job.Job] = None)¶

Training model object.

Training model object is a representation of a machine learning model file.

A training job can generate one or more training models.
You can also upload model files from your local machine.

property archived¶: Get whether this model is archived or not.

property created_at¶: Get the created_at of this model.

property creator¶: Get the creator of this model.

property description¶: Get the description of this model.

property environment¶: Get the environment of this model.

property exec_env¶: Get the exec_env of this model.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: Dict[str, Any], job_definition: Optional[abeja.training.job_definition.JobDefinition] = None, job: Optional[abeja.training.job.Job] = None) → abeja.training.model.Model¶

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

property job¶: Get the job of this model.

property job_definition¶: Get the job definition of this model.

property job_definition_id¶: Get the job definition id of this model.

property job_id¶: Get the Job ID of this model. Returns None if the model doesn’t have a back reference to a job.

property metrics¶: Get the metrics of this model.

property model_id¶: Get the model_id of this model.

property modified_at¶: Get the modified_at of this model.

property organization_id¶: Get the organization id of this model.

Adapter classes¶

JobDefinitions¶

class abeja.training.JobDefinitions(api: abeja.training.api.client.APIClient, organization_id: str)¶

The training job definition adapter class.

archive(name: str)¶

Archive a training job definition.

Request Syntax:

definitions.archive(name=job_definition_name)

Params:

name (str): The identifier of a training job definition. It can be either name or job_definition_id.

create(name: str) → abeja.training.job_definition.JobDefinition¶

Create a new training job definition.

Request Syntax:

definition = definitions.create(name)

Params:

name (str): training job definition name

Return type:

JobDefinition object

delete(name: str)¶

Delete a training job definition.

Request Syntax:

definitions.delete(name=job_definition_name)

Params:

name (str): The identifier of a training job definition. It can be either name or job_definition_id.

get(name: str, include_jobs: Optional[bool] = False) → abeja.training.job_definition.JobDefinition¶

Get a training job definition.

Request Syntax:

definition = definitions.get(name=job_definition_name)

Params:

name (str): The identifier of a training job definition. It can be either name or job_definition_id.
include_jobs (bool): If True, also returns training jobs in response. (Default: False)

Return type:

JobDefinition object

list(filter_archived: Optional[bool] = None, offset: Optional[int] = None, limit: Optional[int] = None) → abeja.training.common.SizedIterable[abeja.training.job_definition.JobDefinition]¶

Returns an iterator object that iterates over the training job definitions under this object.

This method returns an instance of SizedIterable, so you can get the total number of training job definitions.

Params:

filter_archived (bool): [optional] If true, include archived job definitions, otherwise exclude them. (default: false)
offset (int): [optional] paging offset.
limit (int): [optional] paging limit.

Return type:

SizedIterable[JobDefinition]
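
The offset and limit parameters suggest manual paging, which can be sketched as below. This assumes that offset counts items rather than pages and that limit caps the size of one page; whether the returned SizedIterable already fetches further pages on its own is not stated here, so treat the helper iter_all_definitions as a hypothetical illustration.

```python
from typing import Any, Iterator


def iter_all_definitions(definitions: Any, page_size: int = 50) -> Iterator[Any]:
    """Yield every job definition by walking list() page by page.

    Assumes `offset` counts items (not pages) and `limit` caps the page size.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = list(definitions.list(offset=offset, limit=page_size))
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return
        offset += page_size
```

Here definitions stands for the adapter returned by client.job_definitions().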

property organization_id¶: Get the organization ID.

unarchive(name: str)¶

Unarchive a training job definition.

Request Syntax:

definitions.unarchive(name=job_definition_name)

Params:

name (str): The identifier of a training job definition. It can be either name or job_definition_id.

JobDefinitionVersions¶

class abeja.training.JobDefinitionVersions(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition)¶

The training job definition version adapter class.

archive(job_definition_version_id: int)¶

Archive a training job definition version.

Request Syntax:

versions.archive(job_definition_version_id=5)

Params:

job_definition_version_id (int): the version number

create(source: Union[List[str], IO], handler: str, image: abeja.common.docker_image_name.DockerImageName, environment: Optional[Dict[str, Any]] = None, description: Optional[str] = None)¶

Create a new training job definition version.

Request Syntax:

from abeja.common.docker_image_name import ALL_GPU_19_10

version = versions.create(
    source=['train.py'],
    handler='train:handler',
    image=ALL_GPU_19_10,
    environment={'key': 'value'},
    description='new version')

Params:

source (List[str] | IO): an input source for training code. It is one of:
- a zip or tar.gz archived file-like object.
- a list of file paths.
handler (str): the training handler, e.g. 'train:handler'
image (DockerImageName): runtime environment
environment (Optional[dict]): user defined parameters set as environment variables
description (Optional[str]): description

Return type:

JobDefinitionVersion object
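
Since source may also be a zip archived file-like object, one way to prepare such an object with the standard library is sketched below. The helper build_zip_source is hypothetical, and whether the SDK treats an in-memory buffer exactly like an open file is an assumption.

```python
import io
import zipfile
from typing import Mapping


def build_zip_source(files: Mapping[str, bytes]) -> io.BytesIO:
    """Pack {archive_name: content} into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    # Rewind so a consumer reads the archive from the beginning.
    buffer.seek(0)
    return buffer
```

The returned buffer could then be passed as source=... in place of the list of file paths used in the request syntax above.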

delete(job_definition_version_id: int)¶

Delete a training job definition version.

Request Syntax:

versions.delete(job_definition_version_id=5)

Params:

job_definition_version_id (int): the version number

get(job_definition_version_id: int) → abeja.training.job_definition_version.JobDefinitionVersion¶

Get a training job definition version.

Request Syntax:

version = versions.get(job_definition_version_id=5)

Params:

job_definition_version_id (int): the version number

Return type:

JobDefinitionVersion object

property job_definition_id¶: Get the job definition ID.

property job_definition_name¶: Get the job definition name.

list(filter_archived: Optional[bool] = None) → abeja.training.common.SizedIterable[abeja.training.job_definition_version.JobDefinitionVersion]¶

Returns an iterator object that iterates over the training job definition versions under this object.

This method returns an instance of SizedIterable, so you can get the total number of training job definition versions.

Params:

filter_archived (bool): [optional] If true, include archived job definition versions, otherwise exclude them. (default: false)

Return type:

SizedIterable[JobDefinitionVersion]

property organization_id¶: Get the organization ID.

unarchive(job_definition_version_id: int)¶

Unarchive a training job definition version.

Request Syntax:

versions.unarchive(job_definition_version_id=5)

Params:

job_definition_version_id (int): the version number

update(job_definition_version_id: int, description: str) → abeja.training.job_definition_version.JobDefinitionVersion¶

Update a training job definition version.

Request Syntax:

version = versions.update(job_definition_version_id=5, description='new version')

Params:

job_definition_version_id (int): the version number
description (str): description

Return type:

JobDefinitionVersion object

Jobs¶

class abeja.training.Jobs(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition)¶

The training jobs adapter class.

archive(job_id: str) → None¶

Archive a training job.

Request Syntax:

jobs.archive(job_id)

Params:

job_id (str): Job ID

create(job_definition_version_id: int, instance_type: abeja.common.instance_type.InstanceType, datasets: Optional[Dict[str, str]] = None, environment: Optional[Dict[str, Any]] = None, description: Optional[str] = None, export_log: Optional[bool] = None) → abeja.training.job.Job¶

Create a new training job.

Request Syntax:

job = jobs.create(
    job_definition_version_id=5,
    instance_type=InstanceType.parse('gpu-1'))

Params:

job_definition_version_id (int): training job version
instance_type (InstanceType): instance type of running environment
datasets (dict): [optional] datasets, combination of alias and dataset_id
environment (dict): [optional] user defined parameters set as environment variables
description (str): [optional] description of this job
export_log (bool): [optional] If true, include the log in the model. This feature is only available with 19.04 or later images. (default: false)

Return type:

Job object

get(job_id: str) → abeja.training.job.Job¶

Get a training job.

Request Syntax:

job = jobs.get(job_id)

Params:

job_id (str): Job ID

Return type:

Job object
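
A common use of jobs.get() is polling until a job finishes. The sketch below is one possible approach, not an SDK feature: wait_for_job is a hypothetical helper, and it assumes that job.status is an enum member whose .name matches the JobStatus names listed in this reference.

```python
import time
from typing import Any, Callable

# Member names taken from the JobStatus list in this reference.
TERMINAL_STATUS_NAMES = frozenset({"STOPPED", "COMPLETE", "FAILED"})


def wait_for_job(
    jobs: Any,
    job_id: str,
    interval: float = 10.0,
    max_polls: int = 360,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll jobs.get(job_id) until the job reaches a terminal status.

    Assumes job.status is an enum member whose .name is one of the
    documented JobStatus names.
    """
    for attempt in range(max_polls):
        job = jobs.get(job_id)
        if job.status.name in TERMINAL_STATUS_NAMES:
            return job
        if attempt < max_polls - 1:
            # Wait before the next poll, but not after the final one.
            sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

The sleep function is injectable so the loop can be exercised without real delays; the default interval and poll count are arbitrary choices.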

get_artifacts(job_id: str) → abeja.training.job.JobArtifacts¶

Get artifacts object of this job.

Request Syntax:

artifacts = jobs.get_artifacts(job_id)

Params:

job_id (str): Job ID

property job_definition_id¶: Get the job definition ID.

property job_definition_name¶: Get the job definition name.

list(filter_archived: Optional[bool] = None, offset: Optional[int] = None, limit: Optional[int] = None) → abeja.training.common.SizedIterable[abeja.training.job.Job]¶

Returns an iterator object that iterates over the training jobs under this object.

This method returns an instance of SizedIterable, so you can get the total number of training jobs.

Params:

filter_archived (bool): [optional] If true, include archived jobs, otherwise exclude archived jobs. (default: false)
offset (int): [optional] paging offset.
limit (int): [optional] paging limit.

Return type:

SizedIterable[Job]

property organization_id¶: Get the organization ID.

stop(job_id: str) → None¶

Stop a training job.

Request Syntax:

jobs.stop(job_id)

Params:

job_id (str): Job ID

unarchive(job_id: str) → None¶

Unarchive a training job.

Request Syntax:

jobs.unarchive(job_id)

Params:

job_id (str): Job ID

update_statistics(job_id: str, statistics: Optional[abeja.training.statistics.Statistics]) → Optional[abeja.training.job.Job]¶

Notify ABEJA Platform of a job's statistics.

Request Syntax:

from abeja.training import Statistics

statistics = Statistics(num_epochs=10, epoch=1)
statistics.add_stage(name=Statistics.STAGE_TRAIN, accuracy=0.90, loss=0.10)
statistics.add_stage(name=Statistics.STAGE_VALIDATION, accuracy=0.75, loss=0.07)

jobs.update_statistics(job_id, statistics)

Params:

job_id (str): Job ID
statistics (Statistics): statistics

Return type:

Job object
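
Inside a training loop, the statistics call above is usually repeated once per epoch. The sketch below wraps that step in a hypothetical helper, report_epoch. The Statistics class is passed in as make_statistics so the helper has no hard dependency on the SDK, and the stage names "train" and "validation" are the values of the documented STAGE_TRAIN and STAGE_VALIDATION constants.

```python
from typing import Any, Callable, Optional


def report_epoch(
    jobs: Any,
    job_id: str,
    make_statistics: Callable[..., Any],
    epoch: int,
    num_epochs: int,
    train: dict,
    validation: Optional[dict] = None,
) -> Any:
    """Send one epoch's metrics through jobs.update_statistics().

    `make_statistics` is expected to be abeja.training.Statistics; it is
    injected so the helper itself has no hard dependency on the SDK.
    """
    statistics = make_statistics(num_epochs=num_epochs, epoch=epoch)
    # "train" and "validation" are the documented STAGE_* constant values.
    statistics.add_stage(name="train", **train)
    if validation is not None:
        statistics.add_stage(name="validation", **validation)
    return jobs.update_statistics(job_id, statistics)
```

A call might look like report_epoch(jobs, job_id, Statistics, epoch=1, num_epochs=10, train={'accuracy': 0.9, 'loss': 0.1}).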

Models¶

class abeja.training.Models(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition)¶

The training models adapter class.

archive(model_id: str) → None¶

Archive a training model.

Request Syntax:

models.archive(model_id)

Params:

model_id (str): Model ID

create(model_data: IO, job_id: Optional[str] = None, environment: Optional[Dict[str, Any]] = None, metrics: Optional[Dict[str, Any]] = None, description: Optional[str] = None) → abeja.training.model.Model¶

Create a new training model.

Request Syntax:

model = models.create(
    model_data,
    environment={'BATCH_SIZE': 32, 'EPOCHS': 50},
    metrics={'acc': 0.76, 'loss': 1.99})

Params:

model_data (IO): An input source for the ML model. It must be a zip-archived file-like object.
job_id (str): [optional] job identifier
environment (dict): [optional] user defined parameters set as environment variables
metrics (dict): [optional] user defined metrics for this model
description (str): [optional] description

Return type:

Model object

get(model_id: str) → abeja.training.model.Model¶

Get a training model.

Request Syntax:

model = models.get(model_id)

Params:

model_id (str): Model ID

Return type:

Model object

get_download_uri(model_id: str) → str¶

Get download URL for training model.

Request Syntax:

uri = models.get_download_uri(model_id)

Params:

model_id (str): Model ID

Return type:

str
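
Once a download URI has been obtained, the file can be fetched with the standard library. The sketch below assumes the URI can be retrieved with a plain GET and no additional headers, which is not confirmed by this reference; download_to is a hypothetical helper.

```python
import shutil
import urllib.request
from pathlib import Path
from typing import Union


def download_to(uri: str, destination: Union[str, Path]) -> Path:
    """Stream the resource at `uri` into `destination` and return its path.

    Assumes the URI can be fetched with a plain GET and no extra headers.
    """
    destination = Path(destination)
    with urllib.request.urlopen(uri) as response, destination.open("wb") as out:
        # Copy in chunks so large model archives are not held in memory.
        shutil.copyfileobj(response, out)
    return destination
```

For example, download_to(models.get_download_uri(model_id), 'model.zip') would save the archive next to the script.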

property job_definition_id¶: Get the job definition ID.

property job_definition_name¶: Get the job definition name.

list(filter_archived: Optional[bool] = None) → abeja.training.common.SizedIterable[abeja.training.model.Model]¶

Returns an iterator object that iterates over the training models under this object.

This method returns an instance of SizedIterable, so you can get the total number of training models.

Params:

filter_archived (bool): [optional] If true, include archived models, otherwise exclude archived models. (default: false)

Return type:

SizedIterable[Model]
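
Because each Model exposes a metrics dict, the listed models can be compared directly. The sketch below picks the best model for a given metric; best_model is a hypothetical helper, and the metric key 'acc' is simply the one used in the create() example above, since metric names are user defined.

```python
from typing import Any, Iterable, Optional


def best_model(
    models: Iterable[Any],
    metric: str = "acc",
    higher_is_better: bool = True,
) -> Optional[Any]:
    """Return the model with the best value for `metric`, or None.

    Models whose metrics dict lacks the key are skipped.
    """
    candidates = [m for m in models if metric in (m.metrics or {})]
    if not candidates:
        return None
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda m: m.metrics[metric])
```

For a loss-like metric, pass higher_is_better=False, for example best_model(models.list(), metric='loss', higher_is_better=False).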

property organization_id¶: Get the organization ID.

unarchive(model_id: str) → None¶

Unarchive a training model.

Request Syntax:

models.unarchive(model_id)

Params:

model_id (str): Model ID

update(model_id: str, description: str) → abeja.training.model.Model¶

Update a training model.

Request Syntax:

model = models.update(model_id, 'description')

Params:

model_id (str): Model ID
description (str): description

Return type:

Model object

Subresources¶

Statistics¶

class abeja.training.Statistics(num_epochs: Optional[int] = None, epoch: Optional[int] = None, progress_percentage: Optional[float] = None, **kwargs)¶

STAGE_TRAIN = 'train'¶

STAGE_VALIDATION = 'validation'¶

add_stage(name: str, accuracy: Optional[float] = None, loss: Optional[float] = None, **kwargs) → None¶

Add stage information.

Params:

name (str): name of the stage. STAGE_TRAIN and STAGE_VALIDATION are provided as constants, but you can use any string.
accuracy (float): accuracy rate; the value must be between 0 and 1.
loss (float): loss rate; the value must be between 0 and 1.

Returns:

None

Raises:

ValueError

classmethod from_response(response: Optional[Dict[str, Any]]) → Optional[abeja.training.statistics.Statistics]¶

get_statistics() → dict¶

Get the statistics information.

Return type:

dict

Returns:

Response Syntax:

{
    'num_epochs': 10,
    'epoch': 1,
    'progress_percentage': 90,
}

JobStatus¶

class abeja.training.JobStatus(value)¶

Set of job statuses indicating whether a job is pending, running, failed, and so on.

PENDING: The resources necessary to run the job are being prepared
ACTIVE: The job is actively running
STOPPED: The job was stopped by the user
COMPLETE: The job completed successfully
FAILED: The job failed for some reason
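
To get an overview of many jobs, their statuses can be tallied by name. This is a minimal sketch assuming that job.status is an enum member exposing .name, as the JobStatus(value) signature suggests; count_by_status is a hypothetical helper.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_by_status(jobs: Iterable[Any]) -> Dict[str, int]:
    """Count jobs per status name, e.g. {'ACTIVE': 2, 'FAILED': 1}.

    Assumes job.status is an enum member exposing .name.
    """
    return dict(Counter(job.status.name for job in jobs))
```

It accepts any iterable of job objects, such as the result of the list() method on the Jobs adapter.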

JobArtifacts¶

class abeja.training.JobArtifacts(download_uri: str)¶

property download_uri¶: Return the download URI of the artifacts archive file.

classmethod from_response(response: Dict[str, Any]) → abeja.training.job.JobArtifacts¶
