Client represents a high-level interface to the Training API.
from abeja.training import Client
client = Client(organization_id='1234567890123')
organization_id (str): The organization ID. Taken from os.environ['ABEJA_ORGANIZATION_ID'] if omitted.
credential (dict): [optional] This parameter will be passed to its underlying APIClient.
See the section Client Parameter for more details about how to specify this parameter.
timeout (int): [optional] This parameter will be passed to its underlying APIClient.
max_retry_count (int): [optional] This parameter will be passed to its underlying APIClient.
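The fallback described for `organization_id` can be sketched as follows. `resolve_organization_id` is a hypothetical helper written for illustration; it is not part of the SDK, and raising `ValueError` when neither source is set is an assumption, not documented SDK behavior.

```python
import os
from typing import Optional


def resolve_organization_id(organization_id: Optional[str] = None) -> str:
    """Hypothetical helper: prefer the explicit argument, else fall back
    to the ABEJA_ORGANIZATION_ID environment variable."""
    if organization_id is not None:
        return organization_id
    try:
        return os.environ['ABEJA_ORGANIZATION_ID']
    except KeyError:
        # Assumption: failing loudly when neither source is available.
        raise ValueError('organization_id is not given and '
                         'ABEJA_ORGANIZATION_ID is not set') from None
```

An explicit argument always wins, so `resolve_organization_id('999')` returns `'999'` even when the environment variable is set.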
Get an adapter object for handling training job definitions in the organization.
adapter = client.job_definitions()
definition = adapter.get(job_definition_name)
JobDefinitions object
Training job definition object.
Get whether this job definition is archived or not.
Get the created date string (ISO 8601) of this job definition.
Construct an object from API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object, filling any missing fields with default values.
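The "no validation, default-filled" behavior of `from_response` can be illustrated with a small dataclass. `JobDefinitionSketch` is a hypothetical stand-in, and the response keys it reads are assumptions modeled on the property names listed in this section, not the actual API payload.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class JobDefinitionSketch:
    """Hypothetical stand-in for a JobDefinition-like object."""
    job_definition_id: str = ''
    name: str = ''
    version_count: int = 0
    archived: bool = False

    @classmethod
    def from_response(cls, response: Dict[str, Any]) -> 'JobDefinitionSketch':
        # No validation: missing keys fall back to defaults and
        # unknown keys are silently ignored.
        return cls(
            job_definition_id=response.get('job_definition_id', ''),
            name=response.get('name', ''),
            version_count=response.get('version_count', 0),
            archived=response.get('archived', False),
        )
```

The trade-off is that a malformed response never raises here; problems surface later as empty or zero-valued fields.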
Get the job definition ID of this job definition.
Return an adapter object for handling training job definition versions under this job definition.
adapter = definition.job_definition_versions()
version = adapter.get(job_definition_version_id=1)
JobDefinitionVersions object
Return an adapter object for handling training jobs under this job definition.
adapter = definition.jobs()
job = adapter.get(job_id='1234567890123')
Jobs object
Get the model count of this job definition.
Return an adapter object for handling training models under this job definition.
adapter = definition.models()
model = adapter.get(model_id='1234567890123')
Models object
Get the modified date string (ISO 8601) of this job definition.
Get the name of this job definition.
Get the notebook count of this job definition.
Get the organization ID of this job definition.
Get the tensorboard count of this job definition.
Get the version count of this job definition.
Get the versions of this job definition.
Training job definition version object.
Get whether this job definition version is archived or not.
Get the created date string (ISO 8601) of this job definition version.
Get the description of this job definition version.
Get the environment variables of this job definition version.
Construct an object from API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object, filling any missing fields with default values.
Get the handler of this job definition version.
Get the DockerImageName of this job definition version.
Get the job_definition ID of this job definition version.
Get the version of this job definition version.
Get the modified date string (ISO 8601) of this job definition version.
Get the organization ID of this job definition version.
Training job object.
Get whether this job is archived or not.
Get the completion time of this job.
Get the created datetime string of this job.
Get the creator of this job.
Get the datasets of this job.
Get the description of this job.
Get environment variables of this job.
Construct an object from API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object, filling any missing fields with default values.
Get the instance type of this job.
Get the job definition of this job.
Get the job definition id of this job.
Get the job definition version of this job.
Get the job definition version id of this job.
Get the id of this job.
Get the modified datetime string of this job.
Get the organization id of this job.
Get the start time of this job.
Get the statistics of this job.
Get the status_message of this job.
Training model object.
A training model object represents a machine learning model file.
A training job can generate one or more training models.
You can also upload model files from your local machine.
Get whether this model is archived or not.
Get the created_at of this model.
Get the creator of this model.
Get the description of this model.
Get the environment of this model.
Construct an object from API response.
NOTE: For convenience, this method DOES NOT validate the input response and always returns an object, filling any missing fields with default values.
Get the job definition of this model.
Get the job definition id of this model.
Get the job ID of this model. Returns None if the model doesn’t have a back reference to a job.
Get the metrics of this model.
Get the model_id of this model.
Get the modified_at of this model.
Get the organization id of this model.
The training job definition adapter class.
Archive a training job definition.
definitions.archive(name=job_definition_name)
name (str): The identifier of a training job definition. It can be either name or job_definition_id.
Create a new training job definition.
definition = definitions.create(name)
name (str): training job definition name
JobDefinition object
Delete a training job definition.
definitions.delete(name=job_definition_name)
name (str): The identifier of a training job definition. It can be either name or job_definition_id.
Get a training job definition.
definition = definitions.get(name=job_definition_name)
name (str): The identifier of a training job definition. It can be either name or job_definition_id.
include_jobs (bool): If True, also return training jobs in the response. (Default: False)
JobDefinition object
Returns an iterator object that iterates training job definitions under this object.
This method returns an instance of SizedIterable, so you can get the total number of training job definitions.
filter_archived (bool): [optional] If true, include archived jobs, otherwise exclude archived jobs. (default: false)
offset (int): [optional] paging offset.
limit (int): [optional] paging limit.
SizedIterable[JobDefinition]
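The SizedIterable contract (iterate lazily, yet still report a total count) can be sketched with a page-fetching callback. `SizedIterableSketch` and the `(offset, limit) -> (items, total)` callback shape are hypothetical; they illustrate the idea behind `offset`/`limit` paging and are not the SDK's implementation.

```python
from typing import Callable, Generic, Iterator, List, Optional, Tuple, TypeVar

T = TypeVar('T')


class SizedIterableSketch(Generic[T]):
    """Hypothetical sized iterable that pages through results lazily."""

    def __init__(self, fetch_page: Callable[[int, int], Tuple[List[T], int]],
                 page_size: int = 2) -> None:
        self._fetch_page = fetch_page  # (offset, limit) -> (items, total)
        self._page_size = page_size
        self._total: Optional[int] = None

    def __len__(self) -> int:
        if self._total is None:
            # Fetch a single item just to learn the total count.
            _, self._total = self._fetch_page(0, 1)
        return self._total

    def __iter__(self) -> Iterator[T]:
        offset = 0
        while True:
            items, total = self._fetch_page(offset, self._page_size)
            self._total = total
            yield from items
            offset += len(items)
            # Stop on an empty page or once every item has been seen.
            if not items or offset >= total:
                break
```

With an in-memory backend such as `lambda offset, limit: (data[offset:offset + limit], len(data))`, `len()` answers without walking every page, while iteration requests only one page at a time.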
Get the organization ID.
Unarchive a training job definition.
definitions.unarchive(name=job_definition_name)
name (str): The identifier of a training job definition. It can be either name or job_definition_id.
The training job definition version adapter class.
Archive a training job definition version.
versions.archive(job_definition_version_id=5)
job_definition_version_id (int): the version number
Create a new training job definition version.
from abeja.common.docker_image_name import ALL_GPU_19_10
version = versions.create(
source=['train.py'],
handler='train:handler',
image=ALL_GPU_19_10,
environment={'key': 'value'},
description='new version')
source (List[str] | IO): an input source for training code. It’s one of:
- a zip or tar.gz archived file-like object;
- a list of file paths.
image (DockerImageName): runtime environment
environment (Optional[dict]): user defined parameters set as environment variables
description (Optional[str]): description
JobDefinitionVersion object
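When `source` is passed as a file-like object, it must be a zip or tar.gz archive. A minimal standard-library sketch that builds a tar.gz archive in memory follows; `make_source_archive` is a hypothetical helper, and storing each file under its base name at the archive root is an assumption about the layout your handler expects.

```python
import io
import os
import tarfile
from typing import List


def make_source_archive(paths: List[str]) -> io.BytesIO:
    """Hypothetical helper: pack files into an in-memory tar.gz archive."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode='w:gz') as tar:
        for path in paths:
            # Store each file under its base name at the archive root.
            tar.add(path, arcname=os.path.basename(path))
    buffer.seek(0)  # rewind so the consumer reads from the start
    return buffer
```

The resulting object could then be passed as `source=make_source_archive(['train.py'])`; passing the list of paths directly is the simpler option when no custom packaging is needed.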
Delete a training job definition version.
versions.delete(job_definition_version_id=5)
job_definition_version_id (int): the version number
Get a training job definition version.
version = versions.get(job_definition_version_id=5)
job_definition_version_id (int): the version number
JobDefinitionVersion object
Get the job definition ID.
Get the job definition name.
Returns an iterator object that iterates training job definition versions under this object.
This method returns an instance of SizedIterable, so you can get the total number of training job definition versions.
filter_archived (bool): [optional] If true, include archived jobs, otherwise exclude archived jobs. (default: false)
SizedIterable[JobDefinitionVersion]
Get the organization ID.
Unarchive a training job definition version.
versions.unarchive(job_definition_version_id=5)
job_definition_version_id (int): the version number
Update a training job definition version.
version = versions.update(job_definition_version_id=5, description='new version')
job_definition_version_id (int): the version number
JobDefinitionVersion object
The training jobs adapter class.
Archive a training job.
job = jobs.archive(job_id)
job_id (str): Job ID
Create a new training job.
job = jobs.create(
job_definition_version_id=5,
instance_type=InstanceType.parse('gpu-1'))
job_definition_version_id (int): training job version
instance_type (InstanceType): instance type of running environment
datasets (dict): [optional] datasets, combination of alias and dataset_id
environment (dict): [optional] user defined parameters set as environment variables
description (str): [optional] description of this job
export_log (bool): [optional] If true, include the log in the model. This feature is only available with 19.04 or later images. (default: false)
Job object
Get a training job.
job = jobs.get(job_id)
job_id (str): Job ID
Job object
Get artifacts object of this job.
artifacts = jobs.get_artifacts(job_id)
job_id (str): Job ID
Get the job definition ID.
Get the job definition name.
Returns an iterator object that iterates training jobs under this object.
This method returns an instance of SizedIterable, so you can get the total number of training jobs.
filter_archived (bool): [optional] If true, include archived jobs, otherwise exclude archived jobs. (default: false)
offset (int): [optional] paging offset.
limit (int): [optional] paging limit.
SizedIterable[Job]
Get the organization ID.
Stop a training job.
job = jobs.stop(job_id)
job_id (str): Job ID
Unarchive a training job.
job = jobs.unarchive(job_id)
job_id (str): Job ID
Notify ABEJA Platform of the statistics of a job.
from abeja.training import Statistics
statistics = Statistics(num_epochs=10, epoch=1)
statistics.add_stage(name=Statistics.STAGE_TRAIN, accuracy=0.90, loss=0.10)
statistics.add_stage(name=Statistics.STAGE_VALIDATION, accuracy=0.75, loss=0.07)
jobs.update_statistics(job_id, statistics)
job_id (str): Job ID
statistics (Statistics): statistics
Job object
The training models adapter class.
Archive a training model.
model = models.archive(model_id)
model_id (str): Model ID
Create a new training model.
model = models.create(
model_data,
environment={'BATCH_SIZE': 32, 'EPOCHS': 50},
metrics={'acc': 0.76, 'loss': 1.99})
model_data (IO): An input source for the ML model. It must be a zip-archived file-like object.
job_id (str): [optional] job identifier
environment (dict): [optional] user defined parameters set as environment variables
metrics (dict): [optional] user defined metrics for this model
description (str): [optional] description
Model object
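Because `model_data` must be a zip-archived file-like object, a minimal standard-library sketch for building one in memory follows. `make_model_zip` and its `{archive_name: content}` input are hypothetical, chosen only for illustration.

```python
import io
import zipfile
from typing import Dict


def make_model_zip(files: Dict[str, bytes]) -> io.BytesIO:
    """Hypothetical helper: build an in-memory zip from {archive_name: content}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode='w', compression=zipfile.ZIP_DEFLATED) as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    buffer.seek(0)  # rewind before handing the object to an uploader
    return buffer
```

Closing the `ZipFile` writes the central directory but leaves the caller-supplied `BytesIO` open, so the buffer can still be read afterwards.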
Get a training model.
model = models.get(model_id)
model_id (str): Model ID
Model object
Get download URL for training model.
uri = models.get_download_uri(model_id)
model_id (str): Model ID
str
Get the job definition ID.
Get the job definition name.
Returns an iterator object that iterates training models under this object.
This method returns an instance of SizedIterable, so you can get the total number of training models.
filter_archived (bool): [optional] If true, include archived models, otherwise exclude archived models. (default: false)
SizedIterable[Model]
Get the organization ID.
Unarchive a training model.
model = models.unarchive(model_id)
model_id (str): Model ID
Add stage information.
name (str): name of the stage. STAGE_TRAIN and STAGE_VALIDATION are provided as constants, but any string can be used.
accuracy (float): accuracy rate; the value must be between 0 and 1.
loss (float): loss rate; the value must be between 0 and 1.
None
ValueError
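The documented range check on `accuracy` and `loss` can be sketched with a small stand-in class. `StatisticsSketch`, its `stages` attribute, the string values of its stage constants, and treating both metrics as optional are all assumptions made for illustration; this is not the SDK's `Statistics` class.

```python
from typing import Dict, Optional


class StatisticsSketch:
    """Hypothetical stand-in illustrating add_stage's 0-to-1 range check."""

    # Assumption: the real constant values may differ.
    STAGE_TRAIN = 'train'
    STAGE_VALIDATION = 'validation'

    def __init__(self, num_epochs: int, epoch: int) -> None:
        self.num_epochs = num_epochs
        self.epoch = epoch
        self.stages: Dict[str, Dict[str, float]] = {}

    def add_stage(self, name: str, accuracy: Optional[float] = None,
                  loss: Optional[float] = None) -> None:
        # Validate everything first so a bad value leaves no partial stage behind.
        for label, value in (('accuracy', accuracy), ('loss', loss)):
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError('%s must be between 0 and 1, got %r' % (label, value))
        stage: Dict[str, float] = {}
        if accuracy is not None:
            stage['accuracy'] = accuracy
        if loss is not None:
            stage['loss'] = loss
        self.stages[name] = stage
```

Under this sketch, a percentage such as `accuracy=90.0` is rejected; pass `0.90` instead.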
Get stage information.
dict
Response Syntax:
{
'num_epochs': 10,
'epoch': 1,
'progress_percentage': 90,
}
Set of job statuses indicating whether a job is pending, active, stopped, complete, or failed.
PENDING: The resources needed to run the job are being prepared
ACTIVE: The job is actively running
STOPPED: The job was stopped by user
COMPLETE: The job was successfully completed
FAILED: The job failed for some reason
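A common use of these statuses is waiting until a job stops changing. The sketch below mirrors the documented statuses in a Python `Enum` and polls a callback; `JobStatusSketch`, `wait_for_job`, and the `fetch_status` callback are hypothetical, and treating STOPPED, COMPLETE, and FAILED as terminal is an assumption based on the descriptions above.

```python
import time
from enum import Enum, auto
from typing import Callable


class JobStatusSketch(Enum):
    """Hypothetical mirror of the documented job statuses."""
    PENDING = auto()
    ACTIVE = auto()
    STOPPED = auto()
    COMPLETE = auto()
    FAILED = auto()

    @property
    def is_terminal(self) -> bool:
        # Assumption: a stopped, complete, or failed job no longer changes status.
        return self in (JobStatusSketch.STOPPED, JobStatusSketch.COMPLETE,
                        JobStatusSketch.FAILED)


def wait_for_job(fetch_status: Callable[[], JobStatusSketch],
                 interval: float = 5.0, timeout: float = 3600.0) -> JobStatusSketch:
    """Poll fetch_status until it reports a terminal status or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.is_terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError('job did not finish within %.0f seconds' % timeout)
        time.sleep(interval)
```

`fetch_status` would be a function you write on top of whatever status information your client exposes; keeping it as a callback leaves the polling loop independent of the SDK.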
Return the download URI of the artifacts archive file.