Client represents a higher-level interface to the Training API.
abeja.training.Client(organization_id: str = None, credential: typing.Dict[str, str] = None, timeout: typing.Union[int, NoneType] = None, max_retry_count: typing.Union[int, NoneType] = None) → None
A high-level client for the Training API.
from abeja.training import Client
client = Client(organization_id='1234567890123')
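organization_id (str): Taken from os.environ['ABEJA_ORGANIZATION_ID'] if omitted.
credential (dict): Passed to the underlying APIClient. See the section Client Parameter for more details about how to specify this parameter.
timeout (int): Passed to the underlying APIClient.
max_retry_count (int): Passed to the underlying APIClient.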
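The environment-variable fallback described for organization_id can be sketched as follows. This is only an illustration of the documented behavior, not the SDK's actual implementation; the helper name resolve_organization_id and the error it raises are assumptions.

```python
import os
from typing import Optional


def resolve_organization_id(organization_id: Optional[str] = None) -> str:
    """Return the explicit ID, or fall back to ABEJA_ORGANIZATION_ID.

    Hypothetical helper: it mirrors the documented fallback only and is
    not part of the abeja package.
    """
    if organization_id is not None:
        return organization_id
    try:
        return os.environ['ABEJA_ORGANIZATION_ID']
    except KeyError:
        # Assumption: fail loudly when neither source provides an ID.
        raise ValueError(
            'organization_id was not given and '
            'ABEJA_ORGANIZATION_ID is not set'
        ) from None
```

An explicit argument always wins over the environment variable in this sketch.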
job_definitions() → abeja.training.job_definition.JobDefinitions
Get an adapter object for handling training job definitions in the organization.
adapter = client.job_definitions()
definition = adapter.get(job_definition_name)
Return type: JobDefinitions object

abeja.training.JobDefinition(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, name: str, version_count: int, model_count: int, notebook_count: int, tensorboard_count: int, versions: typing.Union[typing.List[_ForwardRef('job_definition_version.JobDefinitionVersion')], NoneType], jobs: typing.Union[list, NoneType], archived: bool, created_at: str, modified_at: str) → None
Training job definition object.
archived
Get whether this job definition is archived or not.
created_at
Get the creation date string (ISO 8601) of this job definition.
from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any]) → abeja.training.job_definition.JobDefinition
Construct an object from an API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
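The note above means that a malformed or partial response does not raise an error. A minimal sketch of that pattern follows, using a hypothetical stand-in class; the response keys and the default values chosen here are assumptions, not the SDK's actual ones.

```python
from typing import Any, Dict


class DefinitionSketch:
    """Hypothetical stand-in for illustration; not the SDK's JobDefinition."""

    def __init__(self, job_definition_id: str, name: str,
                 version_count: int, archived: bool) -> None:
        self.job_definition_id = job_definition_id
        self.name = name
        self.version_count = version_count
        self.archived = archived

    @classmethod
    def from_response(cls, response: Dict[str, Any]) -> 'DefinitionSketch':
        # No validation: a missing key silently becomes a default value.
        return cls(
            job_definition_id=response.get('job_definition_id', ''),
            name=response.get('name', ''),
            version_count=response.get('version_count', 0),
            archived=bool(response.get('archived', False)),
        )


# A response with most keys missing still yields a usable object.
partial = DefinitionSketch.from_response({'name': 'my-definition'})
print(partial.name, partial.version_count, partial.archived)
```

Callers that need guarantees about a field should therefore check it themselves rather than rely on from_response to reject bad input.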
job_definition_id
Get the job definition ID of this job definition.
job_definition_versions() → abeja.training.job_definition_version.JobDefinitionVersions
Return an adapter object for handling training job definition versions under this job definition.
adapter = definition.job_definition_versions()
version = adapter.get(job_definition_version_id=1)
Return type: JobDefinitionVersions object
jobs() → abeja.training.job.Jobs
Return an adapter object for handling training jobs under this job definition.
adapter = definition.jobs()
job = adapter.get(job_id='1234567890123')
Return type: Jobs object
model_count
Get the model count of this job definition.
models() → abeja.training.model.Models
Return an adapter object for handling training models under this job definition.
adapter = definition.models()
model = adapter.get(model_id='1234567890123')
Return type: Models object
modified_at
Get the modified date string (ISO 8601) of this job definition.
name
Get the name of this job definition.
notebook_count
Get the notebook count of this job definition.
organization_id
Get the organization ID of this job definition.
tensorboard_count
Get the tensorboard count of this job definition.
version_count
Get the version count of this job definition.
versions
Get the versions of this job definition.

abeja.training.JobDefinitionVersion(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, handler: str, image: abeja.common.docker_image_name.DockerImageName, environment: typing.Dict[str, str], description: str, archived: bool, created_at: str, modified_at: str, job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None) → None
Training job definition version object.
archived
Get whether this job definition version is archived or not.
created_at
Get the creation date string (ISO 8601) of this job definition version.
description
Get the description of this job definition version.
environment
Get the environment variables of this job definition version.
from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any], job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None) → abeja.training.job_definition_version.JobDefinitionVersion
Construct an object from an API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
handler
Get the handler of this job definition version.
image
Get the DockerImageName of this job definition version.
job_definition
job_definition_id
Get the job definition ID of this job definition version.
job_definition_version_id
Get the version ID of this job definition version.
modified_at
Get the modified date string (ISO 8601) of this job definition version.
organization_id
Get the organization ID of this job definition version.

abeja.training.Job(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_definition_version_id: int, job_id: str, instance_type: abeja.common.instance_type.InstanceType, exec_env: abeja.common.exec_env.ExecEnv, environment: typing.Dict[str, str], statistics: typing.Union[abeja.training.statistics.Statistics, NoneType], status_message: typing.Union[str, NoneType], status: abeja.training.job_status.JobStatus, description: str, datasets: typing.Dict[str, str], creator: typing.Union[abeja.user.user.User, NoneType], archived: bool, start_time: str, completion_time: str, created_at: str, modified_at: str, job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job_definition_version: typing.Union[_ForwardRef('job_definition_version.JobDefinitionVersion'), NoneType] = None) → None
Training job object.
archived
Get whether this job is archived or not.
completion_time
Get the completion time of this job.
created_at
Get the creation datetime string of this job.
creator
Get the creator of this job.
datasets
Get the datasets of this job.
description
Get the description of this job.
environment
Get the environment variables of this job.
exec_env
Get the execution environment in which this job runs.
from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any], job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job_definition_version: typing.Union[_ForwardRef('job_definition_version.JobDefinitionVersion'), NoneType] = None) → abeja.training.job.Job
Construct an object from an API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
instance_type
Get the instance type of this job.
job_definition
Get the job definition of this job.
job_definition_id
Get the job definition ID of this job.
job_definition_version
Get the job definition version of this job.
job_definition_version_id
Get the job definition version ID of this job.
job_id
Get the ID of this job.
modified_at
Get the modified datetime string of this job.
organization_id
Get the organization ID of this job.
start_time
Get the start time of this job.
statistics
Get the statistics of this job.
status
Get the current status of this job.
status_message
Get the status message of this job.

abeja.training.Model(api: abeja.training.api.client.APIClient, organization_id: str, job_definition_id: str, job_id: typing.Union[str, NoneType], model_id: str, description: typing.Union[str, NoneType], metrics: typing.Dict[str, typing.Any], environment: typing.Dict[str, str], exec_env: abeja.common.exec_env.ExecEnv, creator: typing.Union[abeja.user.user.User, NoneType], archived: bool, created_at: str, modified_at: str, job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job: typing.Union[_ForwardRef('job.Job'), NoneType] = None) → None
Training model object.
A training model object is a representation of a machine learning model file.
archived
Get whether this model is archived or not.
created_at
Get the creation datetime string of this model.
creator
Get the creator of this model.
description
Get the description of this model.
environment
Get the environment variables of this model.
exec_env
Get the execution environment of this model.
from_response(api: abeja.training.api.client.APIClient, organization_id: str, response: typing.Dict[str, typing.Any], job_definition: typing.Union[_ForwardRef('job_definition.JobDefinition'), NoneType] = None, job: typing.Union[_ForwardRef('job.Job'), NoneType] = None) → abeja.training.model.Model
Construct an object from an API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object filled with default values.
job
Get the job of this model.
job_definition
Get the job definition of this model.
job_definition_id
Get the job definition ID of this model.
job_id
Get the job ID of this model. Returns None if the model does not have a back reference to a job.
metrics
Get the metrics of this model.
model_id
Get the model ID of this model.
modified_at
Get the modified datetime string of this model.
organization_id
Get the organization ID of this model.

abeja.training.JobDefinitions(api: abeja.training.api.client.APIClient, organization_id: str) → None
The training job definition adapter class.
archive(name: str)
Archive a training job definition.
definitions.archive(name=job_definition_name)
create(name: str) → abeja.training.job_definition.JobDefinition
Create a new training job definition.
definition = definitions.create(name)
Return type: JobDefinition object
delete(name: str)
Delete a training job definition.
definitions.delete(name=job_definition_name)
get(name: str, include_jobs: typing.Union[bool, NoneType] = False) → abeja.training.job_definition.JobDefinition
Get a training job definition.
definition = definitions.get(name=job_definition_name)
include_jobs (bool): If True, also returns training jobs in the response. (Default: False)
Return type: JobDefinition object
list(filter_archived: typing.Union[bool, NoneType] = None, offset: typing.Union[int, NoneType] = None, limit: typing.Union[int, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.job_definition.JobDefinition]
Returns an iterator object that iterates over the training job definitions under this object.
This method returns an instance of SizedIterable, so you can get the total number of training job definitions.
filter_archived (bool): If true, include archived job definitions, otherwise exclude them. (default: false)
organization_id
Get the organization ID.
unarchive(name: str)
Unarchive a training job definition.
definitions.unarchive(name=job_definition_name)
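The list() method above is described as returning a SizedIterable that also exposes the total count. The following sketch shows one way such an object could behave, with lazy paging by offset and limit. SizedPages and fetch_page are hypothetical names, and reporting the total through len() is an assumption suggested by the name SizedIterable; the SDK's real class may differ.

```python
from typing import Callable, Generic, Iterator, List, Tuple, TypeVar

T = TypeVar('T')


class SizedPages(Generic[T]):
    """Hypothetical stand-in for a sized, lazily paged iterable."""

    def __init__(self,
                 fetch_page: Callable[[int, int], Tuple[List[T], int]],
                 page_size: int = 50) -> None:
        # fetch_page(offset, limit) must return (items, total).
        self._fetch_page = fetch_page
        self._page_size = page_size

    def __len__(self) -> int:
        # Fetch a single item just to learn the reported total.
        _, total = self._fetch_page(0, 1)
        return total

    def __iter__(self) -> Iterator[T]:
        offset = 0
        while True:
            items, total = self._fetch_page(offset, self._page_size)
            yield from items
            offset += len(items)
            # Stop on an empty page or once every item has been seen.
            if not items or offset >= total:
                return


# Fake backend standing in for the remote API.
names = ['def-%d' % i for i in range(5)]


def fake_fetch(offset: int, limit: int) -> Tuple[List[str], int]:
    return names[offset:offset + limit], len(names)


result = SizedPages(fake_fetch, page_size=2)
print(len(result), list(result))
```

The point of the design is that the total is available without iterating, while the items themselves are only fetched page by page as the caller consumes them.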

abeja.training.JobDefinitionVersions(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition) → None
The training job definition version adapter class.
archive(job_definition_version_id: int)
Archive a training job definition version.
versions.archive(job_definition_version_id=5)
create(source: typing.Union[typing.List[str], typing.IO[~AnyStr]], handler: str, image: abeja.common.docker_image_name.DockerImageName, environment: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, description: typing.Union[str, NoneType] = None)
Create a new training job definition version.
from abeja.common.docker_image_name import ALL_GPU_19_10
version = versions.create(
    source=['train.py'],
    handler='train:handler',
    image=ALL_GPU_19_10,
    environment={'key': 'value'},
    description='new version')
Return type: JobDefinitionVersion object
delete(job_definition_version_id: int)
Delete a training job definition version.
versions.delete(job_definition_version_id=5)
get(job_definition_version_id: int) → abeja.training.job_definition_version.JobDefinitionVersion
Get a training job definition version.
version = versions.get(job_definition_version_id=5)
Return type: JobDefinitionVersion object
job_definition_id
Get the job definition ID.
job_definition_name
Get the job definition name.
list(filter_archived: typing.Union[bool, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.job_definition_version.JobDefinitionVersion]
Returns an iterator object that iterates over the training job definition versions under this object.
This method returns an instance of SizedIterable, so you can get the total number of training job definition versions.
filter_archived (bool): If true, include archived job definition versions, otherwise exclude them. (default: false)
organization_id
Get the organization ID.
unarchive(job_definition_version_id: int)
Unarchive a training job definition version.
versions.unarchive(job_definition_version_id=5)
update(job_definition_version_id: int, description: str) → abeja.training.job_definition_version.JobDefinitionVersion
Update a training job definition version.
version = versions.update(job_definition_version_id=5, description='new version')
Return type: JobDefinitionVersion object

abeja.training.Jobs(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition) → None
The training jobs adapter class.
archive(job_id: str) → None
Archive a training job.
jobs.archive(job_id)
create(job_definition_version_id: int, instance_type: abeja.common.instance_type.InstanceType, datasets: typing.Union[typing.Dict[str, str], NoneType] = None, environment: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, description: typing.Union[str, NoneType] = None, export_log: typing.Union[bool, NoneType] = None) → abeja.training.job.Job
Create a new training job.
from abeja.common.instance_type import InstanceType
job = jobs.create(
    job_definition_version_id=5,
    instance_type=InstanceType.parse('gpu-1'))
export_log (bool): If true, include the log in the model. This feature is only available with 19.04 or later images. (default: false)
Return type: Job object
get(job_id: str) → abeja.training.job.Job
Get a training job.
job = jobs.get(job_id)
Return type: Job object
get_artifacts(job_id: str) → abeja.training.job.JobArtifacts
Get the artifacts object of this job.
artifacts = jobs.get_artifacts(job_id)
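Since create() returns immediately and get() returns the job with its current status, a caller typically polls until the job finishes. The sketch below is one possible approach, not an SDK feature: wait_for_job is a hypothetical helper, and because the members of JobStatus are not listed in this reference, the caller supplies the is_finished predicate.

```python
import time
from typing import Any, Callable


def wait_for_job(jobs: Any, job_id: str,
                 is_finished: Callable[[Any], bool],
                 interval: float = 10.0, timeout: float = 3600.0,
                 sleep: Callable[[float], None] = time.sleep) -> Any:
    """Poll jobs.get(job_id) until is_finished(job.status) is true.

    jobs is expected to be a Jobs adapter (anything with a get method
    returning an object with a status attribute). Raises TimeoutError
    if the job has not finished once timeout seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = jobs.get(job_id)
        if is_finished(job.status):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(
                'job %s did not finish within %.0f seconds'
                % (job_id, timeout))
        sleep(interval)
```

The sleep parameter exists only so the loop can be exercised without real delays; in normal use the default time.sleep is appropriate.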
job_definition_id
Get the job definition ID.
job_definition_name
Get the job definition name.
list(filter_archived: typing.Union[bool, NoneType] = None, offset: typing.Union[int, NoneType] = None, limit: typing.Union[int, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.job.Job]
Returns an iterator object that iterates over the training jobs under this object.
This method returns an instance of SizedIterable, so you can get the total number of training jobs.
filter_archived (bool): If true, include archived jobs, otherwise exclude archived jobs. (default: false)
organization_id
Get the organization ID.
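Because list() iterates over Job objects and each Job exposes a status property, a quick overview of a job definition can be built by tallying statuses. This is a small sketch; count_jobs_by_status is a hypothetical helper, and converting each status with str() is an assumption made so that the result does not depend on how JobStatus is implemented.

```python
from collections import Counter
from typing import Any, Dict


def count_jobs_by_status(jobs: Any) -> Dict[str, int]:
    """Tally the jobs yielded by jobs.list() by their status.

    jobs is expected to be a Jobs adapter, i.e. any object whose list()
    method yields objects with a status attribute.
    """
    return dict(Counter(str(job.status) for job in jobs.list()))
```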
stop(job_id: str) → None
Stop a training job.
jobs.stop(job_id)
unarchive(job_id: str) → None
Unarchive a training job.
jobs.unarchive(job_id)
update_statistics(job_id: str, statistics: typing.Union[abeja.training.statistics.Statistics, NoneType]) → typing.Union[abeja.training.job.Job, NoneType]
Notify ABEJA Platform of a job's statistics.
from abeja.training import Statistics
statistics = Statistics(num_epochs=10, epoch=1)
statistics.add_stage(name=Statistics.STAGE_TRAIN, accuracy=90.0, loss=0.10)
statistics.add_stage(name=Statistics.STAGE_VALIDATION, accuracy=75.0, loss=0.07)
jobs.update_statistics(job_id, statistics)
statistics (Statistics): The statistics to report.
Return type: Job object

abeja.training.Models(api: abeja.training.api.client.APIClient, job_definition: abeja.training.job_definition.JobDefinition) → None
The training models adapter class.
archive(model_id: str) → None
Archive a training model.
models.archive(model_id)
create(model_data: typing.IO[AnyStr], job_id: typing.Union[str, NoneType] = None, environment: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, metrics: typing.Union[typing.Dict[str, typing.Any], NoneType] = None, description: typing.Union[str, NoneType] = None) → abeja.training.model.Model
Create a new training model.
model = models.create(
    model_data,
    environment={'BATCH_SIZE': 32, 'EPOCHS': 50},
    metrics={'acc': 0.76, 'loss': 1.99})
Return type: Model object
get(model_id: str) → abeja.training.model.Model
Get a training model.
model = models.get(model_id)
Return type: Model object
get_download_uri(model_id: str) → str
Get the download URL for a training model.
uri = models.get_download_uri(model_id)
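The string returned by get_download_uri() still has to be fetched by the caller. A minimal sketch using only the standard library follows. download_model is a hypothetical helper, and it assumes the returned URL can be fetched directly without extra authentication headers, which this reference does not confirm.

```python
import shutil
import urllib.request
from typing import Any


def download_model(models: Any, model_id: str, dest_path: str) -> str:
    """Save the model file behind models.get_download_uri() to dest_path.

    models is expected to be a Models adapter. The response body is
    streamed to disk so large model files are not held in memory.
    """
    uri = models.get_download_uri(model_id)
    with urllib.request.urlopen(uri) as response, \
            open(dest_path, 'wb') as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

If the URL is time-limited, it is safer to request it right before downloading rather than storing it for later use.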
job_definition_id
Get the job definition ID.
job_definition_name
Get the job definition name.
list(filter_archived: typing.Union[bool, NoneType] = None) → abeja.training.common.SizedIterable[abeja.training.model.Model]
Returns an iterator object that iterates over the training models under this object.
This method returns an instance of SizedIterable, so you can get the total number of training models.
filter_archived (bool): If true, include archived models, otherwise exclude archived models. (default: false)
organization_id
Get the organization ID.
unarchive(model_id: str) → None
Unarchive a training model.
models.unarchive(model_id)

abeja.training.Statistics(num_epochs: int = None, epoch: int = None, progress_percentage: float = None, **kwargs) → None
STAGE_TRAIN = 'train'
STAGE_VALIDATION = 'validation'
add_stage(name: str, accuracy: float = None, loss: float = None, **kwargs) → None
Add stage information.
from_response(response: typing.Union[typing.Dict[str, typing.Any], NoneType]) → typing.Union[_ForwardRef('Statistics'), NoneType]
get_statistics() → dict
Get stage information.
Response Syntax:
{
    'num_epochs': 10,
    'epoch': 1,
    'progress_percentage': 90,
}
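The accuracy and loss values passed to add_stage() have to be computed by the training code. The sketch below shows one way to derive them from raw counts. stage_metrics is a hypothetical helper, and treating accuracy as a percentage is an assumption based on the example values (90.0, 75.0) used for update_statistics in this reference.

```python
from typing import Dict, Sequence


def stage_metrics(correct: int, total: int,
                  losses: Sequence[float]) -> Dict[str, float]:
    """Summarize one stage as accuracy (in percent) and mean loss."""
    if total <= 0 or not losses:
        raise ValueError('need at least one sample and one loss value')
    return {
        'accuracy': 100.0 * correct / total,
        'loss': sum(losses) / len(losses),
    }


train = stage_metrics(correct=90, total=100, losses=[0.12, 0.08])
# The result can be unpacked into add_stage, for example:
# statistics.add_stage(name=Statistics.STAGE_TRAIN, **train)
```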

abeja.training.JobStatus
Set of job statuses that indicate whether a job is pending, running, failed, and so on.