Client

Client provides a higher-level interface to the Training API.

class abeja.training.Client(organization_id: str | None = None, credential: Dict[str, str] | None = None, timeout: int | None = None, max_retry_count: int | None = None)

A high-level client for the Training API.

from abeja.training import Client

client = Client(organization_id='1234567890123')
Params:
  • organization_id (str): The organization ID. Taken from os.environ['ABEJA_ORGANIZATION_ID'] if omitted.

  • credential (dict): [optional] This parameter is passed to the underlying APIClient. See the section Client Parameter for more details about how to specify this parameter.

  • timeout (int): [optional] This parameter is passed to the underlying APIClient.

  • max_retry_count (int): [optional] This parameter is passed to the underlying APIClient.
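
The fallback described for organization_id can be pictured with the sketch below. It is not the SDK's implementation: resolve_organization_id is a hypothetical helper, and raising ValueError when neither source provides an ID is an assumption made for illustration.

```python
import os
from typing import Mapping, Optional


def resolve_organization_id(
        explicit: Optional[str] = None,
        env: Optional[Mapping[str, str]] = None) -> str:
    """Return the explicit ID if given, else ABEJA_ORGANIZATION_ID from env.

    Hypothetical helper: it only illustrates the documented fallback order.
    """
    if env is None:
        env = os.environ
    organization_id = explicit or env.get('ABEJA_ORGANIZATION_ID')
    if not organization_id:
        # Assumption: fail loudly when neither source provides an ID.
        raise ValueError('organization_id is not set')
    return organization_id
```

Passing a mapping as env makes the helper easy to exercise without touching the real process environment.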

job_definitions() JobDefinitions

Get an adapter object for handling training job definitions in the organization.

Request syntax:
adapter = client.job_definitions()
definition = adapter.get(job_definition_name)
Return type:

JobDefinitions object

Entity classes

JobDefinition

class abeja.training.JobDefinition(api: APIClient, organization_id: str, job_definition_id: str, name: str, version_count: int, model_count: int, notebook_count: int, tensorboard_count: int, versions: List[JobDefinitionVersion] | None, jobs: list | None, archived: bool, created_at: str, modified_at: str)

Training job definition object.

property archived: bool

Get whether this job definition is archived or not.

property created_at: str

Get the created date string (ISO 8601) of this job definition.
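
The date properties are plain ISO 8601 strings, so comparing or sorting by time usually means parsing them first. The following is a minimal sketch using only the standard library; parse_timestamp is a hypothetical helper, and because the exact string layout returned by the API is not stated here, it assumes either an explicit offset, a trailing 'Z', or no offset at all (treated as UTC).

```python
from datetime import datetime, timezone


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 string into a timezone-aware datetime."""
    # datetime.fromisoformat() rejects a trailing 'Z' before Python 3.11,
    # so normalise it to an explicit UTC offset first.
    if value.endswith('Z'):
        value = value[:-1] + '+00:00'
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed
```

With such a helper, a list of definitions could be ordered with sorted(definitions, key=lambda d: parse_timestamp(d.created_at)).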

classmethod from_response(api: APIClient, organization_id: str, response: Dict[str, Any]) JobDefinition

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
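
Because from_response does not validate its input, a caller that needs strictness may want to check the response before constructing an object. A minimal sketch follows; missing_keys is a hypothetical helper, and the key names in the usage comment are assumptions rather than a documented response schema.

```python
from typing import Any, Dict, Iterable, List


def missing_keys(response: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from, or None in, response."""
    return [key for key in required if response.get(key) is None]


# Hypothetical usage (key names are assumed, not taken from the API schema):
#
#     missing = missing_keys(response, ['job_definition_id', 'name'])
#     if missing:
#         raise ValueError('incomplete response, missing: {}'.format(missing))
#     definition = JobDefinition.from_response(api, organization_id, response)
```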

property job_definition_id: str

Get the job definition ID of this job definition.

job_definition_versions() JobDefinitionVersions

Return an adapter object for handling training job definition versions under this job definition.

Request syntax:
adapter = definition.job_definition_versions()
version = adapter.get(job_definition_version_id=1)
Return type:

JobDefinitionVersions object

jobs() Jobs

Return an adapter object for handling training jobs under this job definition.

Request syntax:
adapter = definition.jobs()
job = adapter.get(job_id='1234567890123')
Return type:

Jobs object
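
The adapters chain naturally: a client yields a JobDefinitions adapter, a job definition yields a Jobs adapter, and so on. The sketch below strings together only the calls shown in the request-syntax snippets; fetch_job is a hypothetical convenience function, not part of the SDK.

```python
def fetch_job(client, job_definition_name, job_id):
    """Walk from a client to a single training job.

    `client` is expected to behave like abeja.training.Client; only
    job_definitions(), get(), and jobs() as documented here are used.
    """
    definition = client.job_definitions().get(job_definition_name)
    return definition.jobs().get(job_id=job_id)
```

Since the function relies only on duck typing, it can be tested with small stand-in objects instead of a live API connection.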

property model_count: int

Get the model count of this job definition.

models() Models

Return an adapter object for handling training models under this job definition.

Request syntax:
adapter = definition.models()
model = adapter.get(model_id='1234567890123')
Return type:

Models object

property modified_at: str

Get the modified date string (ISO 8601) of this job definition.

property name: str

Get the name of this job definition.

property notebook_count: int

Get the notebook count of this job definition.

property organization_id: str

Get the organization ID of this job definition.

property tensorboard_count: int

Get the tensorboard count of this job definition.

property version_count: int

Get the version count of this job definition.

property versions: List[JobDefinitionVersion] | None

Get the versions of this job definition.

JobDefinitionVersion

class abeja.training.JobDefinitionVersion(api: APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, handler: str, image: DockerImageName, environment: Dict[str, str], description: str, archived: bool, created_at: str, modified_at: str, job_definition: JobDefinition | None = None)

Training job definition version object.

property archived: bool

Get whether this job definition version is archived or not.

property created_at: str

Get the created date string (ISO 8601) of this job definition version.

property description: str

Get the description of this job definition version.

property environment: Dict[str, str]

Get the environment variables of this job definition version.

classmethod from_response(api: APIClient, organization_id: str, response: Dict[str, Any], job_definition: JobDefinition | None = None) JobDefinitionVersion

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

property handler: str

Get the handler of this job definition version.

property image: DockerImageName

Get the DockerImageName of this job definition version.

property job_definition: JobDefinition
property job_definition_id: str

Get the job_definition ID of this job definition version.

property job_definition_version_id: int

Get the version of this job definition version.

property modified_at: str

Get the modified date string (ISO 8601) of this job definition version.

property organization_id: str

Get the organization ID of this job definition version.

Job

class abeja.training.Job(api: APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, job_id: str, instance_type: InstanceType, exec_env: ExecEnv, environment: Dict[str, str], statistics: Statistics | None, status_message: str | None, status: JobStatus, description: str, datasets: Dict[str, str], creator: User | None, archived: bool, start_time: str, completion_time: str, created_at: str, modified_at: str, job_definition: JobDefinition | None = None, job_definition_version: JobDefinitionVersion | None = None)

Training job object.

property archived: bool

Get whether this job is archived or not.

property completion_time: str

Get the completion time of this job.

property created_at: str

Get the created datetime string of this job.

property creator: User | None

Get the creator of this job.

property datasets: Dict[str, str]

Get the datasets of this job.

property description: str

Get the description of this job.

property environment: Dict[str, str]

Get environment variables of this job.

property exec_env: ExecEnv

Get the execution environment in which this job runs.

classmethod from_response(api: APIClient, organization_id: str, response: Dict[str, Any], job_definition: JobDefinition | None = None, job_definition_version: JobDefinitionVersion | None = None) Job

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

property instance_type: InstanceType

Get the instance type of this job.

property job_definition: JobDefinition

Get the job definition of this job.

property job_definition_id: str

Get the job definition id of this job.

property job_definition_version: JobDefinitionVersion

Get the job definition version of this job.

property job_definition_version_id: int

Get the job definition version id of this job.

property job_id: str

Get the id of this job.

property modified_at: str

Get the modified datetime string of this job.

property organization_id: str

Get the organization id of this job.

property start_time: str

Get the start time of this job.

property statistics: Statistics | None

Get the statistics of this job.

property status: JobStatus

Get the current status of this job.

property status_message: str | None

Get the status_message of this job.

Model

class abeja.training.Model(api: APIClient, organization_id: str, job_definition_id: str, job_id: str | None, model_id: str, description: str | None, metrics: Dict[str, Any], environment: Dict[str, str], exec_env: ExecEnv, creator: User | None, archived: bool, created_at: str, modified_at: str, job_definition: JobDefinition | None = None, job: Job | None = None)

Training model object.

Training model object is a representation of a machine learning model file.

  • Training Job can generate single or multiple training models.

  • You can also upload model files from your local machine.

property archived: bool

Get whether this model is archived or not.

property created_at: str

Get the created_at of this model.

property creator: User | None

Get the creator of this model.

property description: str | None

Get the description of this model.

property environment: Dict[str, str]

Get the environment of this model.

property exec_env: ExecEnv

Get the exec_env of this model.

classmethod from_response(api: APIClient, organization_id: str, response: Dict[str, Any], job_definition: JobDefinition | None = None, job: Job | None = None) Model

Construct an object from API response.

NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.

property job: Job | None

Get the job of this model.

property job_definition: JobDefinition

Get the job definition of this model.

property job_definition_id: str

Get the job definition id of this model.

property job_id: str | None

Get the Job ID of this model. Returns None if the model doesn’t have a back reference to a job.
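
Since job_id may be None, code that groups or looks up models by job should handle that case explicitly. A minimal sketch, where group_models_by_job is a hypothetical helper that only reads the job_id property described above:

```python
from collections import defaultdict


def group_models_by_job(models):
    """Map each job_id (or None for models without a job) to its models."""
    grouped = defaultdict(list)
    for model in models:
        # job_id is None when the model has no back reference to a job.
        grouped[model.job_id].append(model)
    return dict(grouped)
```

Models without a job end up under the None key, so they can be handled separately from models generated by training jobs.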

property metrics: Dict[str, Any]

Get the metrics of this model.

property model_id: str

Get the model_id of this model.

property modified_at: str

Get the modified_at of this model.

property organization_id: str

Get the organization id of this model.

Adapter classes

JobDefinitions

class abeja.training.JobDefinitions(api: APIClient, organization_id: str)

The training job definition adapter class.

archive(name: str)

Archive a training job definition.

Request Syntax:
definitions.archive(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.

create(name: str) JobDefinition

Create a new training job definition.

Request Syntax:
definition = definitions.create(name)
Params:
  • name (str): training job definition name

Return type:

JobDefinition object

delete(name: str)

Delete a training job definition.

Request Syntax:
definitions.delete(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.

get(name: str, include_jobs: bool | None = False) JobDefinition

Get a training job definition.

Request Syntax:
definition = definitions.get(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.

  • include_jobs (bool): If True, also returns training jobs in response. (Default: False)

Return type:

JobDefinition object

list(filter_archived: bool | None = None, offset: int | None = None, limit: int | None = None) SizedIterable[JobDefinition]

Returns an iterator object that iterates training job definitions under this object.

This method returns an instance of SizedIterable, so you can get the total number of training job definitions.

Params:
  • filter_archived (bool): [optional] If true, include archived job definitions, otherwise exclude archived job definitions. (default: false)

  • offset (int): [optional] paging offset.

  • limit (int): [optional] paging limit.

Return type:

SizedIterable[JobDefinition]
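
A common use of the returned SizedIterable is to read the total and look at the first few items. The sketch below assumes that the total is exposed through len(), which the name suggests but this page does not state explicitly; preview is a hypothetical helper. It deliberately avoids offset and limit, because how they interact with iteration is not spelled out here.

```python
from itertools import islice


def preview(adapter, count=5):
    """Return (total, first `count` items) from an adapter's list()."""
    items = adapter.list()
    # Assumption: SizedIterable reports the total number through len().
    total = len(items)
    return total, list(islice(items, count))
```

The same helper would apply to any adapter on this page whose list() returns a SizedIterable.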

property organization_id: str

Get the organization ID.

unarchive(name: str)

Unarchive a training job definition.

Request Syntax:
definitions.unarchive(name=job_definition_name)
Params:
  • name (str): The identifier of a training job definition. It can be either name or job_definition_id.

JobDefinitionVersions

class abeja.training.JobDefinitionVersions(api: APIClient, job_definition: JobDefinition)

The training job definition version adapter class.

archive(job_definition_version_id: int)

Archive a training job definition version.

Request Syntax:
versions.archive(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number

create(source: List[str] | IO, handler: str, image: DockerImageName, environment: Dict[str, Any] | None = None, description: str | None = None)

Create a new training job definition version.

Request Syntax:
from abeja.common.docker_image_name import ALL_GPU_19_10

version = versions.create(
    source=['train.py'],
    handler='train:handler',
    image=ALL_GPU_19_10,
    environment={'key': 'value'},
    description='new version')
Params:
  • source (List[str] | IO): an input source for training code. It is one of:

      - a zip or tar.gz archive as a file-like object
      - a list of file paths

  • image (DockerImageName): runtime environment

  • environment (Optional[dict]): user defined parameters set as environment variables

  • description (Optional[str]): description

Return type:

JobDefinitionVersion object
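
When source is given as an archive rather than a list of paths, the archive can be built in memory with the standard library. This is a minimal sketch: build_zip_source is a hypothetical helper, and it assumes that an io.BytesIO positioned at the start counts as an acceptable file-like object for source.

```python
import io
import zipfile
from typing import Dict


def build_zip_source(files: Dict[str, bytes]) -> io.BytesIO:
    """Pack {archive_name: content} into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, 'w', zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    # Rewind so that a consumer reads the archive from the beginning.
    buffer.seek(0)
    return buffer
```

The resulting buffer could then be passed as source in place of the ['train.py'] list used in the request syntax.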

delete(job_definition_version_id: int)

Delete a training job definition version.

Request Syntax:
versions.delete(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number

get(job_definition_version_id: int) JobDefinitionVersion

Get a training job definition version.

Request Syntax:
version = versions.get(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number

Return type:

JobDefinitionVersion object

property job_definition_id: str

Get the job definition ID.

property job_definition_name: str

Get the job definition name.

list(filter_archived: bool | None = None) SizedIterable[JobDefinitionVersion]

Returns an iterator object that iterates training job definition versions under this object.

This method returns an instance of SizedIterable, so you can get the total number of training job definition versions.

Params:
  • filter_archived (bool): [optional] If true, include archived job definition versions, otherwise exclude archived job definition versions. (default: false)

Return type:

SizedIterable[JobDefinitionVersion]

property organization_id: str

Get the organization ID.

unarchive(job_definition_version_id: int)

Unarchive a training job definition version.

Request Syntax:
versions.unarchive(job_definition_version_id=5)
Params:
  • job_definition_version_id (int): the version number

update(job_definition_version_id: int, description: str) JobDefinitionVersion

Update a training job definition version.

Request Syntax:
version = versions.update(job_definition_version_id=5, description='new version')
Params:
  • job_definition_version_id (int): the version number

Return type:

JobDefinitionVersion object

Jobs

class abeja.training.Jobs(api: APIClient, job_definition: JobDefinition)

The training jobs adapter class.

archive(job_id: str) None

Archive a training job.

Request Syntax:
jobs.archive(job_id)
Params:
  • job_id (str): Job ID

create(job_definition_version_id: int, instance_type: InstanceType, datasets: Dict[str, str] | None = None, environment: Dict[str, Any] | None = None, description: str | None = None, export_log: bool | None = None) Job

Create a new training job.

Request Syntax:
job = jobs.create(
    job_definition_version_id=5,
    instance_type=InstanceType.parse('gpu-1'))
Params:
  • job_definition_version_id (int): training job version

  • instance_type (InstanceType): instance type of running environment

  • datasets (dict): [optional] datasets, combination of alias and dataset_id

  • environment (dict): [optional] user defined parameters set as environment variables

  • description (str): [optional] description of this job

  • export_log (bool): [optional] If true, include the log in the model. This feature is only available with 19.04 or later images. (default: false)

Return type:

Job object
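
The environment argument is described as user-defined parameters set as environment variables. Environment variables are strings, so training code typically converts them back to numbers. The sketch below shows one way to do that; read_hyperparameters is a hypothetical helper, and the variable names and default values are assumptions chosen for illustration.

```python
import os
from typing import Mapping, Optional


def read_hyperparameters(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read typed hyperparameters from environment variables.

    BATCH_SIZE, EPOCHS and LEARNING_RATE are hypothetical names chosen
    for this sketch, as are their default values.
    """
    if env is None:
        env = os.environ
    return {
        'batch_size': int(env.get('BATCH_SIZE', '32')),
        'epochs': int(env.get('EPOCHS', '50')),
        'learning_rate': float(env.get('LEARNING_RATE', '0.001')),
    }
```

A job created with environment={'BATCH_SIZE': '64'} would then, under these assumptions, see a batch size of 64 and the defaults for the other values.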

get(job_id: str) Job

Get a training job.

Request Syntax:
job = jobs.get(job_id)
Params:
  • job_id (str): Job ID

Return type:

Job object

get_artifacts(job_id: str) JobArtifacts

Get artifacts object of this job.

Request Syntax:
artifacts = jobs.get_artifacts(job_id)
Params:
  • job_id (str): Job ID

property job_definition_id: str

Get the job definition ID.

property job_definition_name: str

Get the job definition name.

list(filter_archived: bool | None = None, offset: int | None = None, limit: int | None = None) SizedIterable[Job]

Returns an iterator object that iterates training jobs under this object.

This method returns an instance of SizedIterable, so you can get the total number of training jobs.

Params:
  • filter_archived (bool): [optional] If true, include archived jobs, otherwise exclude archived jobs. (default: false)

  • offset (int): [optional] paging offset.

  • limit (int): [optional] paging limit.

Return type:

SizedIterable[Job]

property organization_id: str

Get the organization ID.

stop(job_id: str) None

Stop a training job.

Request Syntax:
jobs.stop(job_id)
Params:
  • job_id (str): Job ID

unarchive(job_id: str) None

Unarchive a training job.

Request Syntax:
jobs.unarchive(job_id)
Params:
  • job_id (str): Job ID

update_statistics(job_id: str, statistics: Statistics | None) Job | None

Notify ABEJA Platform of a job's statistics.

Request Syntax:
from abeja.training import Statistics

statistics = Statistics(num_epochs=10, epoch=1)
statistics.add_stage(name=Statistics.STAGE_TRAIN, accuracy=0.90, loss=0.10)
statistics.add_stage(name=Statistics.STAGE_VALIDATION, accuracy=0.75, loss=0.07)

jobs.update_statistics(job_id, statistics)
Params:
  • job_id (str): Job ID

  • statistics (Statistics): statistics

Return type:

Job object

Models

class abeja.training.Models(api: APIClient, job_definition: JobDefinition)

The training models adapter class.

archive(model_id: str) None

Archive a training model.

Request Syntax:
models.archive(model_id)
Params:
  • model_id (str): Model ID

create(model_data: IO, job_id: str | None = None, environment: Dict[str, Any] | None = None, metrics: Dict[str, Any] | None = None, description: str | None = None) Model

Create a new training model.

Request Syntax:
model = models.create(
    model_data,
    environment={'BATCH_SIZE': 32, 'EPOCHS': 50},
    metrics={'acc': 0.76, 'loss': 1.99})
Params:
  • model_data (IO): An input source for the ML model. It must be a zip-archived file-like object.

  • job_id (str): [optional] job identifier

  • environment (dict): [optional] user defined parameters set as environment variables

  • metrics (dict): [optional] user defined metrics for this model

  • description (str): [optional] description

Return type:

Model object
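
Since model_data must be a zip archive, it can help to check a local file before uploading it. A minimal sketch using the standard library follows; open_model_archive is a hypothetical helper, and the usage comment assumes a models adapter obtained as shown elsewhere on this page.

```python
import zipfile
from typing import IO


def open_model_archive(path: str) -> IO[bytes]:
    """Open `path` for binary reading after checking that it is a zip file."""
    if not zipfile.is_zipfile(path):
        raise ValueError('{} is not a zip archive'.format(path))
    # The caller is responsible for closing the returned file object.
    return open(path, 'rb')


# Hypothetical usage:
#
#     with open_model_archive('model.zip') as model_data:
#         model = models.create(model_data, description='uploaded model')
```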

get(model_id: str) Model

Get a training model.

Request Syntax:
model = models.get(model_id)
Params:
  • model_id (str): Model ID

Return type:

Model object

get_download_uri(model_id: str) str

Get download URL for training model.

Request Syntax:
uri = models.get_download_uri(model_id)
Params:
  • model_id (str): Model ID

Return type:

str
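
The returned URI can be fetched with any HTTP client. The sketch below streams it to a local file with the standard library; download_to is a hypothetical helper, and it assumes the URI can be retrieved with a plain GET request without additional authentication headers, which this page does not confirm.

```python
import shutil
import urllib.request


def download_to(uri: str, destination: str) -> str:
    """Stream the content at `uri` into the file `destination`."""
    # Assumption: the URI can be fetched with a plain GET and needs no
    # additional authentication headers.
    with urllib.request.urlopen(uri) as response, \
            open(destination, 'wb') as out:
        shutil.copyfileobj(response, out)
    return destination
```

Streaming with shutil.copyfileobj avoids loading a large model archive into memory at once.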

property job_definition_id: str

Get the job definition ID.

property job_definition_name: str

Get the job definition name.

list(filter_archived: bool | None = None) SizedIterable[Model]

Returns an iterator object that iterates training models under this object.

This method returns an instance of SizedIterable, so you can get the total number of training models.

Params:
  • filter_archived (bool): [optional] If true, include archived models, otherwise exclude archived models. (default: false)

Return type:

SizedIterable[Model]

property organization_id: str

Get the organization ID.

unarchive(model_id: str) None

Unarchive a training model.

Request Syntax:
models.unarchive(model_id)
Params:
  • model_id (str): Model ID

update(model_id: str, description: str) Model

Update a training model.

Request Syntax:
model = models.update(model_id, 'description')
Params:
  • model_id (str): Model ID

  • description (str): description

Return type:

Model object

Subresources

Statistics

class abeja.training.Statistics(num_epochs: int | None = None, epoch: int | None = None, progress_percentage: float | None = None, **kwargs)
STAGE_TRAIN = 'train'
STAGE_VALIDATION = 'validation'
add_stage(name: str, accuracy: float | None = None, loss: float | None = None, **kwargs) None

Add stage information.

Params:
  • name (str): name of the stage. STAGE_TRAIN and STAGE_VALIDATION are provided as constants, but you can use an arbitrary string.

  • accuracy (float): accuracy. The value must be between 0 and 1.

  • loss (float): loss. The value must be between 0 and 1.

Returns:

None

Raises:
  • ValueError

classmethod from_response(response: Dict[str, Any] | None) Statistics | None
get_statistics() dict

Get the statistics as a dict.

Return type:

dict

Returns:

Response Syntax:

{
    'num_epochs': 10,
    'epoch': 1,
    'progress_percentage': 90,
}
stages: Dict[str, Dict[str, Any]]

JobStatus

class abeja.training.JobStatus(value)

Set of job statuses indicating whether a job is pending, running, failed, and so on.

  • PENDING: The resources necessary for running the job are being prepared

  • ACTIVE: The job is actively running

  • STOPPED: The job was stopped by user

  • COMPLETE: The job was successfully completed

  • FAILED: The job failed for some reason
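
A typical use of these statuses is to poll a job until it stops changing. The sketch below treats STOPPED, COMPLETE, and FAILED as terminal, which follows from the descriptions above. wait_for_job is a hypothetical helper, not an SDK function, and it assumes that job.status is an Enum member whose name matches the list above.

```python
import time

# Names of the statuses after which, per the list above, a job no longer runs.
TERMINAL_STATUS_NAMES = frozenset({'STOPPED', 'COMPLETE', 'FAILED'})


def wait_for_job(jobs, job_id, poll_interval=10.0, timeout=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll jobs.get(job_id) until the job reaches a terminal status.

    Assumption: job.status is an Enum member whose .name is one of the
    status names listed in this section.
    """
    deadline = clock() + timeout
    while True:
        job = jobs.get(job_id)
        if job.status.name in TERMINAL_STATUS_NAMES:
            return job
        if clock() >= deadline:
            raise TimeoutError(
                'job {} did not finish in time'.format(job_id))
        sleep(poll_interval)
```

The sleep and clock parameters exist only so that the loop can be tested without real waiting; in normal use the defaults apply.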

JobArtifacts

class abeja.training.JobArtifacts(download_uri: str)
property download_uri: str

Return the download URI of the artifacts archive file.

classmethod from_response(response: Dict[str, Any]) JobArtifacts