Client

Client provides a higher-level interface to the Training API.

class abeja.training.Client(organization_id: str = None, credential: typing.Dict[str, str] = None, timeout: typing.Union[int, NoneType] = None, max_retry_count: typing.Union[int, NoneType] = None) → None

A high-level client for the Training API.

from abeja.training import Client

client = Client(organization_id='1234567890123')
Params:
  • organization_id (str): The organization ID. Taken from os.environ['ABEJA_ORGANIZATION_ID'] if omitted.
  • credential (dict): [optional] This parameter is passed to the underlying APIClient. See the section Client Parameter for more details about how to specify this parameter.
  • timeout (int): [optional] This parameter is passed to the underlying APIClient.
  • max_retry_count (int): [optional] This parameter is passed to the underlying APIClient.
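As a minimal sketch of the fallback described for organization_id, the ID can be resolved before constructing the client, so that a missing environment variable produces a clear error. The helper name resolve_organization_id is hypothetical and not part of the SDK. The Client call is left as a comment because it needs the SDK and valid credentials.

```python
import os


def resolve_organization_id(explicit_id=None):
    """Return explicit_id, or fall back to ABEJA_ORGANIZATION_ID.

    Hypothetical helper, not part of the SDK.
    """
    if explicit_id is not None:
        return explicit_id
    try:
        return os.environ['ABEJA_ORGANIZATION_ID']
    except KeyError:
        raise RuntimeError(
            'Pass organization_id or set ABEJA_ORGANIZATION_ID') from None


# With the SDK installed, the result could be passed on like this:
# client = Client(organization_id=resolve_organization_id())
```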
job_definitions() → abeja.training.job_definition.JobDefinitions

Get an adapter object for handling training job definitions in the organization.

Request syntax:
adapter = client.job_definitions()
definition = adapter.get(job_definition_name)
Return type:
JobDefinitions object

Entity classes

JobDefinition

class abeja.training.JobDefinition(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, name: str, version_count: int, model_count: int, notebook_count: int, tensorboard_count: int, versions: typing.Union[typing.List[_ForwardRef('job_definition_version.JobDefinitionVersion')], NoneType], jobs: typing.Union[list, NoneType], archived: bool, created_at: str, modified_at: str) → None

Training job definition object.

archived

Get whether this job definition is archived or not.

created_at

Get the created date string (ISO 8601) of this job definition.
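Since created_at and modified_at are returned as strings, converting them to datetime objects is usually the first step before comparing or sorting them. The sketch below assumes the string is an ISO 8601 timestamp that may end in 'Z'. The sample value is illustrative, not taken from a real response, and parse_timestamp is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Parse an ISO 8601 string into a datetime.

    datetime.fromisoformat() only accepts a trailing 'Z' from
    Python 3.11 on, so it is rewritten as an explicit UTC offset.
    """
    if value.endswith('Z'):
        value = value[:-1] + '+00:00'
    return datetime.fromisoformat(value)


# Illustrative value; real responses may use a different precision.
created = parse_timestamp('2019-10-01T12:34:56Z')
print(created.year, created.tzinfo == timezone.utc)
```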

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any]) → abeja.training.job_definition.JobDefinition

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
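Because from_response does not validate its input, a caller that needs strictness may want to check the response first. The following is only a sketch: missing_keys is a hypothetical helper, and the key names are assumed to mirror the constructor parameters rather than taken from a documented response schema.

```python
def missing_keys(response, required):
    """Return the required keys that are absent from, or None in, response."""
    return [key for key in required if response.get(key) is None]


# Assumed key names, mirroring the constructor parameters.
REQUIRED = ('job_definition_id', 'name')

response = {'job_definition_id': '1234567890123'}  # illustrative payload
problems = missing_keys(response, REQUIRED)
if problems:
    print('response is missing:', ', '.join(problems))

# Only when nothing is missing:
# definition = JobDefinition.from_response(api, organization_id, response)
```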

job_definition_id

Get the job definition ID of this job definition.

job_definition_versions() → abeja.training.job_definition_version.JobDefinitionVersions

Return an adapter object for handling training job definition versions under this job definition.

Request syntax:
adapter = definition.job_definition_versions()
version = adapter.get(job_definition_version_id=1)
Return type:
JobDefinitionVersions object
jobs() → abeja.training.job.Jobs

Return an adapter object for handling training jobs under this job definition.

Request syntax:
adapter = definition.jobs()
job = adapter.get(job_id='1234567890123')
Return type:
Jobs object
model_count

Get the model count of this job definition.

models() → abeja.training.model.Models

Return an adapter object for handling training models under this job definition.

Request syntax:
adapter = definition.models()
model = adapter.get(model_id='1234567890123')
Return type:
Models object
modified_at

Get the modified date string (ISO 8601) of this job definition.

name

Get the name of this job definition.

notebook_count

Get the notebook count of this job definition.

organization_id

Get the organization ID of this job definition.

tensorboard_count

Get the tensorboard count of this job definition.

version_count

Get the version count of this job definition.

versions

Get the versions of this job definition.

JobDefinitionVersion

class abeja.training.JobDefinitionVersion(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, handler: str, image: abeja.common.docker_image_name.DockerImageName, environment: typing.Dict[str, str], description: str, archived: bool, created_at: str, modified_at: str, job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None) → None

Training job definition version object.

archived

Get whether this job definition version is archived or not.

created_at

Get the created date string (ISO 8601) of this job definition version.

description

Get the description of this job definition version.

environment

Get the environment variables of this job definition version.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any], job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None) → abeja.training.job_definition_version.JobDefinitionVersion

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

handler

Get the handler of this job definition version.

image

Get the DockerImageName of this job definition version.

job_definition

Get the job definition of this job definition version.

job_definition_id

Get the job definition ID of this job definition version.

job_definition_version_id

Get the version number of this job definition version.

modified_at

Get the modified date string (ISO 8601) of this job definition version.

organization_id

Get the organization ID of this job definition version.

Job

class abeja.training.Job(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, job_id: str, instance_type: abeja.common.instance_type.InstanceType, exec_env: abeja.common.exec_env.ExecEnv, environment: typing.Dict[str, str], statistics: typing.Union[abeja.training.statistics.Statistics, NoneType], status_message: typing.Union[str, NoneType], status: abeja.training.job_status.JobStatus, description: str, datasets: typing.Dict[str, str], creator: typing.Union[abeja.user.user.User, NoneType], archived: bool, start_time: str, completion_time: str, created_at: str, modified_at: str, job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job_definition_version: typing.Union[_ForwardRef('job_definition_version.JobDefinitionVersion'), NoneType] = None) → None

Training job object.

archived

Get whether this job is archived or not.

completion_time

Get the completion time of this job.

created_at

Get the created datetime string of this job.

creator

Get the creator of this job.

datasets

Get the datasets of this job.

description

Get the description of this job.

environment

Get environment variables of this job.

exec_env

Get the execution environment in which this job runs.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any], job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job_definition_version: typing.Union[_ForwardRef('job_definition_version.JobDefinitionVersion'), NoneType] = None) → abeja.training.job.Job

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

instance_type

Get the instance type of this job.

job_definition

Get the job definition of this job.

job_definition_id

Get the job definition id of this job.

job_definition_version

Get the job definition version of this job.

job_definition_version_id

Get the job definition version id of this job.

job_id

Get the id of this job.

modified_at

Get the modified datetime string of this job.

organization_id

Get the organization id of this job.

start_time

Get the start time of this job.

statistics

Get the statistics of this job.

status

Get the current status of this job.

status_message

Get the status_message of this job.

Model

class abeja.training.Model(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_id: typing.Union[str, NoneType], model_id: str, description: typing.Union[str, NoneType], metrics: typing.Dict[str, typing.Any], environment: typing.Dict[str, str], exec_env: abeja.common.exec_env.ExecEnv, creator: typing.Union[abeja.user.user.User, NoneType], archived: bool, created_at: str, modified_at: str, job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job: typing.Union[_ForwardRef('job.Job'), NoneType] = None) → None

Training model object.

Training model object is a representation of a machine learning model file.

  • A training job can generate one or more training models.
  • You can also upload model files from your local machine.
archived

Get whether this model is archived or not.

created_at

Get the created_at of this model.

creator

Get the creator of this model.

description

Get the description of this model.

environment

Get the environment of this model.

exec_env

Get the exec_env of this model.

classmethod from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any], job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job: typing.Union[_ForwardRef('job.Job'), NoneType] = None) → abeja.training.model.Model

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

job

Get the job of this model.

job_definition

Get the job definition of this model.

job_definition_id

Get the job definition id of this model.

job_id

Get the Job ID of this model. Returns None if the model doesn’t have a back reference to a job.

metrics

Get the metrics of this model.

model_id

Get the model_id of this model.

modified_at

Get the modified_at of this model.

organization_id

Get the organization id of this model.

Adapter classes

JobDefinitions

class abeja.training.JobDefinitions(api: abeja.training.api.client.APIClient, organization_id: str) → None

The training job definition adapter class.

archive(name: str)

Archive a training job definition.

Request Syntax:
definitions.archive(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.
create(name: str) → abeja.training.job_definition.JobDefinition

Create a new training job definition.

Request Syntax:
definition = definitions.create(name)
Params:
  • name (str): training job definition name
Return type:
JobDefinition object
delete(name: str)

Delete a training job definition.

Request Syntax:
definitions.delete(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.
get(name: str, include_jobs: typing.Union[bool, NoneType] = False) → abeja.training.job_definition.JobDefinition

Get a training job definition.

Request Syntax:
definition = definitions.get(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.
  • include_jobs (bool): If True, also returns training jobs in response. (Default: False)
Return type:
JobDefinition object
list(filter_archived: typing.Union[bool, NoneType] = None, offset: typing.Union[int, NoneType] = None, limit: typing.Union[int, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.job_definition.JobDefinition]

Returns an iterator object that iterates training job definitions under this object.

This method returns an instance of SizedIterable, so you can get the total number of training job definitions.

Params:
  • filter_archived (bool): [optional] If true, include archived job definitions, otherwise exclude archived job definitions. (default: false)
  • offset (int): [optional] paging offset.
  • limit (int): [optional] paging limit.
Return type:
SizedIterable[JobDefinition]
organization_id

Get the organization ID.

unarchive(name: str)

Unarchive a training job definition.

Request Syntax:
definitions.unarchive(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.

JobDefinitionVersions

class abeja.training.JobDefinitionVersions(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition) → None

The training job definition version adapter class.

archive(job_definition_version_id: int)

Archive a training job definition version.

Request Syntax:
versions.archive(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number
create(source: typing.Union[typing.List[str], typing.IO[~AnyStr]], handler: str, image: abeja.common.docker_image_name.DockerImageName, environment: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, description: typing.Union[str, NoneType] = None)

Create a new training job definition version.

Request Syntax:
from abeja.common.docker_image_name import ALL_GPU_19_10

version = versions.create(
    source=['train.py'],
    handler='train:handler',
    image=ALL_GPU_19_10,
    environment={'key': 'value'},
    description='new version')
Params:
  • source (List[str] | IO): an input source for training code. It is either a zip or tar.gz archive given as a file-like object, or a list of file paths.
  • handler (str): the handler of the training code, written as in the example above ('train:handler')
  • image (DockerImageName): runtime environment
  • environment (Optional[dict]): user defined parameters set as environment variables
  • description (Optional[str]): description
Return type:
JobDefinitionVersion object
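When source is passed as an archive rather than a list of paths, the file-like object can be built in memory. This is a minimal sketch using only the standard library; make_source_archive is a hypothetical helper, and whether an in-memory io.BytesIO buffer is accepted by create is an assumption based on the "file-like object" wording.

```python
import io
import tarfile


def make_source_archive(files):
    """Pack a {archive_name: bytes} mapping into an in-memory tar.gz."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)  # size must be set before adding
            archive.addfile(info, io.BytesIO(data))
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


source = make_source_archive({'train.py': b'def handler(context):\n    pass\n'})

# The buffer could then be handed over, for example:
# version = versions.create(source=source, handler='train:handler', image=ALL_GPU_19_10)
```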
delete(job_definition_version_id: int)

Delete a training job definition version.

Request Syntax:
versions.delete(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number
get(job_definition_version_id: int) → abeja.training.job_definition_version.JobDefinitionVersion

Get a training job definition version.

Request Syntax:
version = versions.get(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number
Return type:
JobDefinitionVersion object
job_definition_id

Get the job definition ID.

job_definition_name

Get the job definition name.

list(filter_archived: typing.Union[bool, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.job_definition_version.JobDefinitionVersion]

Returns an iterator object that iterates training job definition versions under this object.

This method returns an instance of SizedIterable, so you can get the total number of training job definition versions.

Params:
  • filter_archived (bool): [optional] If true, include archived job definition versions, otherwise exclude archived job definition versions. (default: false)
Return type:
SizedIterable[JobDefinitionVersion]
organization_id

Get the organization ID.

unarchive(job_definition_version_id: int)

Unarchive a training job definition version.

Request Syntax:
versions.unarchive(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number
update(job_definition_version_id: int, description: str) → abeja.training.job_definition_version.JobDefinitionVersion

Update a training job definition version.

Request Syntax:
version = versions.update(job_definition_version_id=5, description='new version')
Params:
  • job_definition_version_id (int): the version number
  • description (str): description
Return type:
JobDefinitionVersion object

Jobs

class abeja.training.Jobs(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition) → None

The training jobs adapter class.

archive(job_id: str) → None

Archive a training job.

Request Syntax:
jobs.archive(job_id)
Params:
  • job_id (str): Job ID
create(job_definition_version_id: int, instance_type: abeja.common.instance_type.InstanceType, datasets: typing.Union[typing.Dict[str, str], NoneType] = None, environment: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, description: typing.Union[str, NoneType] = None, export_log: typing.Union[bool, NoneType] = None) → abeja.training.job.Job

Create a new training job.

Request Syntax:
from abeja.common.instance_type import InstanceType

job = jobs.create(
    job_definition_version_id=5,
    instance_type=InstanceType.parse('gpu-1'))
Params:
  • job_definition_version_id (int): training job version
  • instance_type (InstanceType): instance type of running environment
  • datasets (dict): [optional] datasets, combination of alias and dataset_id
  • environment (dict): [optional] user defined parameters set as environment variables
  • description (str): [optional] description of this job
  • export_log (bool): [optional] If true, include the log in the model. This feature is only available with 19.04 or later images. (default: false)
Return type:
Job object
get(job_id: str) → abeja.training.job.Job

Get a training job.

Request Syntax:
job = jobs.get(job_id)
Params:
  • job_id (str): Job ID
Return type:
Job object
get_artifacts(job_id: str) → abeja.training.job.JobArtifacts

Get the artifacts object of a job.

Request Syntax:
artifacts = jobs.get_artifacts(job_id)
Params:
  • job_id (str): Job ID
job_definition_id

Get the job definition ID.

job_definition_name

Get the job definition name.

list(filter_archived: typing.Union[bool, NoneType] = None, offset: typing.Union[int, NoneType] = None, limit: typing.Union[int, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.job.Job]

Returns an iterator object that iterates training jobs under this object.

This method returns an instance of SizedIterable, so you can get the total number of training jobs.

Params:
  • filter_archived (bool): [optional] If true, include archived jobs, otherwise exclude archived jobs. (default: false)
  • offset (int): [optional] paging offset.
  • limit (int): [optional] paging limit.
Return type:
SizedIterable[Job]
organization_id

Get the organization ID.

stop(job_id: str) → None

Stop a training job.

Request Syntax:
jobs.stop(job_id)
Params:
  • job_id (str): Job ID
unarchive(job_id: str) → None

Unarchive a training job.

Request Syntax:
jobs.unarchive(job_id)
Params:
  • job_id (str): Job ID
update_statistics(job_id: str, statistics: typing.Union[abeja.training.statistics.Statistics, NoneType]) → typing.Union[abeja.training.job.Job, NoneType]

Notify ABEJA Platform of a job's statistics.

Request Syntax:
from abeja.training import Statistics

statistics = Statistics(num_epochs=10, epoch=1)
statistics.add_stage(name=Statistics.STAGE_TRAIN, accuracy=0.90, loss=0.10)
statistics.add_stage(name=Statistics.STAGE_VALIDATION, accuracy=0.75, loss=0.07)

jobs.update_statistics(job_id, statistics)
Params:
  • job_id (str): Job ID
  • statistics (Statistics): statistics
Return type:
Job object

Models

class abeja.training.Models(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition) → None

The training models adapter class.

archive(model_id: str) → None

Archive a training model.

Request Syntax:
models.archive(model_id)
Params:
  • model_id (str): Model ID
create(model_data: typing.IO[AnyStr], job_id: typing.Union[str, NoneType] = None, environment: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, metrics: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, description: typing.Union[str, NoneType] = None) → abeja.training.model.Model

Create a new training model.

Request Syntax:
model = models.create(
    model_data,
    environment={'BATCH_SIZE': 32, 'EPOCHS': 50},
    metrics={'acc': 0.76, 'loss': 1.99})
Params:
  • model_data (IO): An input source for the ML model. It must be a zip archive given as a file-like object.
  • job_id (str): [optional] job identifier
  • environment (dict): [optional] user defined parameters set as environment variables
  • metrics (dict): [optional] user defined metrics for this model
  • description (str): [optional] description
Return type:
Model object
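The model_data argument has to be a zip archive exposed as a file-like object. One way to prepare it without writing a temporary file is sketched below; make_model_archive is a hypothetical helper, the file name is illustrative, and passing an io.BytesIO buffer is an assumption based on the "file-like object" wording.

```python
import io
import zipfile


def make_model_archive(files):
    """Pack a {archive_name: bytes} mapping into an in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode='w',
                         compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer


model_data = make_model_archive({'model.bin': b'\x00\x01\x02'})

# The buffer could then be handed over, for example:
# model = models.create(model_data, metrics={'acc': 0.76, 'loss': 1.99})
```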
get(model_id: str) → abeja.training.model.Model

Get a training model.

Request Syntax:
model = models.get(model_id)
Params:
  • model_id (str): Model ID
Return type:
Model object
get_download_uri(model_id: str) → str

Get the download URL for a training model.

Request Syntax:
uri = models.get_download_uri(model_id)
Params:
  • model_id (str): Model ID
Return type:
str
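Once the URI is known, the model file can be fetched with the standard library. The sketch below assumes the returned URI can be downloaded directly, without additional authentication headers; that is not confirmed here. download_to is a hypothetical helper.

```python
import shutil
import urllib.request


def download_to(uri, destination):
    """Stream the content behind uri into the file at destination."""
    with urllib.request.urlopen(uri) as response, \
            open(destination, 'wb') as out:
        shutil.copyfileobj(response, out)  # copies in chunks
    return destination


# With the SDK, usage might look like:
# uri = models.get_download_uri(model_id)
# download_to(uri, 'model.zip')
```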
job_definition_id

Get the job definition ID.

job_definition_name

Get the job definition name.

list(filter_archived: typing.Union[bool, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.model.Model]

Returns an iterator object that iterates training models under this object.

This method returns an instance of SizedIterable, so you can get the total number of training models.

Params:
  • filter_archived (bool): [optional] If true, include archived models, otherwise exclude archived models. (default: false)
Return type:
SizedIterable[Model]
organization_id

Get the organization ID.

unarchive(model_id: str) → None

Unarchive a training model.

Request Syntax:
models.unarchive(model_id)
Params:
  • model_id (str): Model ID
update(model_id: str, description: str) → abeja.training.model.Model

Update a training model.

Request Syntax:
model = models.update(model_id, 'description')
Params:
  • model_id (str): Model ID
  • description (str): description
Return type:
Model object

Subresources

Statistics

class abeja.training.Statistics(num_epochs: int = None, epoch: int = None, progress_percentage: float = None, **kwargs) → None
STAGE_TRAIN = 'train'
STAGE_VALIDATION = 'validation'
add_stage(name: str, accuracy: float = None, loss: float = None, **kwargs) → None

Add stage information.

Params:
  • name (str): name of the stage. STAGE_TRAIN and STAGE_VALIDATION are provided as constants, but you can use any string.
  • accuracy (float): accuracy rate. The value must be between 0 and 1.
  • loss (float): loss rate. The value must be between 0 and 1.
Returns:
None
Raises:
  • ValueError
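The exact condition under which add_stage raises ValueError is not stated here. Assuming it is the documented 0 to 1 range, values tracked as percentages can be converted and checked before they are passed in. Both helpers below are hypothetical and not part of the SDK.

```python
def to_rate(percent):
    """Convert a percentage such as 90.0 into a rate such as 0.9."""
    return percent / 100.0


def check_rate(label, value):
    """Return value unchanged if it is None or lies within [0, 1]."""
    if value is not None and not 0.0 <= value <= 1.0:
        raise ValueError('%s must be between 0 and 1, got %r' % (label, value))
    return value


accuracy = check_rate('accuracy', to_rate(90.0))
loss = check_rate('loss', 0.10)

# The checked values could then be passed on:
# statistics.add_stage(name=Statistics.STAGE_TRAIN, accuracy=accuracy, loss=loss)
```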
classmethod from_response(response: typing.Union[typing.Dict[str, typing.Any], NoneType]) → typing.Union[_ForwardRef('Statistics'), NoneType]
get_statistics() → dict

Get stage information.

Return type:
dict
Returns:

Response Syntax:

{
    'num_epochs': 10,
    'epoch': 1,
    'progress_percentage': 90,
}

JobStatus

class abeja.training.JobStatus

Set of job statuses that indicate whether a job is pending, running, failed, and so on.

  • PENDING: The resources necessary for running the job are being prepared
  • ACTIVE: The job is actively running
  • STOPPED: The job was stopped by the user
  • COMPLETE: The job completed successfully
  • FAILED: The job failed for some reason
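A common use of these statuses is to poll a job until it stops changing. The sketch below treats STOPPED, COMPLETE and FAILED as terminal, which is an assumption drawn from the descriptions above, and assumes the status object exposes an enum-style name attribute. wait_for_job, FakeStatus and fake_get are hypothetical; the stand-ins let the example run without the SDK.

```python
import time
from enum import Enum
from types import SimpleNamespace

# Statuses after which a job is assumed not to change any more.
TERMINAL_STATUSES = {'STOPPED', 'COMPLETE', 'FAILED'}


def wait_for_job(get_job, job_id, interval=10.0, max_polls=360,
                 sleep=time.sleep):
    """Poll get_job(job_id) until the job reaches a terminal status."""
    for _ in range(max_polls):
        job = get_job(job_id)
        # Assumes an enum-like status; falls back to its string form.
        name = getattr(job.status, 'name', str(job.status))
        if name in TERMINAL_STATUSES:
            return job
        sleep(interval)
    raise TimeoutError('job %s did not finish after %d polls'
                       % (job_id, max_polls))


# Stand-ins so the sketch runs without the SDK.
class FakeStatus(Enum):
    PENDING = 'pending'
    ACTIVE = 'active'
    COMPLETE = 'complete'


_sequence = iter([FakeStatus.PENDING, FakeStatus.ACTIVE, FakeStatus.COMPLETE])


def fake_get(job_id):
    return SimpleNamespace(job_id=job_id, status=next(_sequence))


finished = wait_for_job(fake_get, '1234567890123', sleep=lambda seconds: None)
print(finished.status.name)
```

With the SDK, get_job would be the adapter's bound method, as in wait_for_job(jobs.get, job_id).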

JobArtifacts

class abeja.training.JobArtifacts(download_uri: str) → None
download_uri

Return the download URI of the artifacts archive file.

classmethod from_response(response: typing.Dict[str, typing.Any]) → abeja.training.job.JobArtifacts