After you have launched a notebook, import the following libraries. We're using XGBoost as the example here:
import sagemaker
import boto3
from sagemaker.predictor import csv_serializer # Converts strings for HTTP POST requests on inference
import numpy as np # For performing matrix operations and numerical processing
import pandas as pd # For manipulating tabular data
from time import gmtime, strftime
import os

region = boto3.Session().region_name
smclient = boto3.Session().client('sagemaker')

from sagemaker import get_execution_role

# The IAM role that you created when you created your notebook instance.
# You pass the role to the tuning job.
role = get_execution_role()
print(role)

bucket = 'sagemaker-MyBucket' # Replace with the name of your S3 bucket
prefix = 'sagemaker/DEMO-automatic-model-tuning-xgboost-dm'
Next, download the data and do some exploratory data analysis (EDA).
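As a minimal sketch of that step (assuming the raw data has already been downloaded locally as a CSV named bank-clean.csv, a hypothetical filename, and that bucket and prefix are defined as above), you might load it with pandas, take a quick look, split it, and stage the train/validation CSVs at the S3 locations the tuning job will read from:

# Hypothetical sketch: load a local CSV, inspect it, split it, and stage the
# splits in S3 under the bucket/prefix defined above.
data = pd.read_csv('bank-clean.csv')  # hypothetical local file name
print(data.shape)
print(data.head())
print(data.describe())

# Simple random split into ~70% train / 30% validation.
# Note: the built-in XGBoost algorithm expects CSV input with the target in
# the first column and no header row.
train_data = data.sample(frac=0.7, random_state=42)
validation_data = data.drop(train_data.index)
train_data.to_csv('train.csv', index=False, header=False)
validation_data.to_csv('validation.csv', index=False, header=False)

# Upload to the prefixes the tuning job's input channels will point at
s3 = boto3.Session().resource('s3')
s3.Bucket(bucket).Object(os.path.join(prefix, 'train', 'train.csv')).upload_file('train.csv')
s3.Bucket(bucket).Object(os.path.join(prefix, 'validation', 'validation.csv')).upload_file('validation.csv')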
Hyperparameter Tuning
Hyperparameter tuning job specifications can be found here.
from sagemaker.amazon.amazon_estimator import get_image_uri
training_image = get_image_uri(boto3.Session().region_name, 'xgboost')

s3_input_train = 's3://{}/{}/train'.format(bucket, prefix)
s3_input_validation = 's3://{}/{}/validation/'.format(bucket, prefix)

tuning_job_config = {
"ParameterRanges": {
"CategoricalParameterRanges": [],
"ContinuousParameterRanges": [
{
"MaxValue": "1",
"MinValue": "0",
"Name": "eta"
},
{
"MaxValue": "2",
"MinValue": "0",
"Name": "alpha"
},
{
"MaxValue": "10",
"MinValue": "1",
"Name": "min_child_weight"
}
],
"IntegerParameterRanges": [
{
"MaxValue": "10",
"MinValue": "1",
"Name": "max_depth"
}
]
},
"ResourceLimits": {
"MaxNumberOfTrainingJobs": 20,
"MaxParallelTrainingJobs": 3
},
"Strategy": "Bayesian",
"HyperParameterTuningJobObjective": {
"MetricName": "validation:auc",
"Type": "Maximize"
}
}
"AlgorithmSpecification": {
"TrainingImage": training_image,
"TrainingInputMode": "File"
},
"InputDataConfig": [
{
"ChannelName": "train",
"CompressionType": "None",
"ContentType": "csv",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": s3_input_train
}
}
},
{
"ChannelName": "validation",
"CompressionType": "None",
"ContentType": "csv",
"DataSource": {
"S3DataSource": {
"S3DataDistributionType": "FullyReplicated",
"S3DataType": "S3Prefix",
"S3Uri": s3_input_validation
}
}
}
],
"OutputDataConfig": {
"S3OutputPath": "s3://{}/{}/output".format(bucket,prefix)
},
"ResourceConfig": {
"InstanceCount": 2,
"InstanceType": "ml.c4.2xlarge",
"VolumeSizeInGB": 10
},
"RoleArn": role,
"StaticHyperParameters": {
"eval_metric": "auc",
"num_round": "100",
"objective": "binary:logistic",
"rate_drop": "0.3",
"tweedie_variance_power": "1.4"
},
"StoppingCondition": {
"MaxRuntimeInSeconds": 43200
}
}

tuning_job_name = "MyTuningJob"
smclient.create_hyper_parameter_tuning_job(HyperParameterTuningJobName = tuning_job_name,
HyperParameterTuningJobConfig = tuning_job_config,
TrainingJobDefinition = training_job_definition)
You can monitor the progress of the tuning job directly in the AWS console.
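If you prefer to poll the job from the notebook instead, a minimal sketch using the same boto3 SageMaker client could look like this (the sort options below are just one way to order the results):

# Check the overall status of the tuning job
tuning_job = smclient.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name)
print(tuning_job['HyperParameterTuningJobStatus'])
print(tuning_job['TrainingJobStatusCounters'])

# List the individual training jobs launched so far, best objective first
training_jobs = smclient.list_training_jobs_for_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name,
    SortBy='FinalObjectiveMetricValue',
    SortOrder='Descending')
for job in training_jobs['TrainingJobSummaries']:
    print(job['TrainingJobName'], job['TrainingJobStatus'])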
Evaluating is straightforward: you use a Jupyter notebook in your Amazon SageMaker notebook instance to train and evaluate your model.
- You use either the AWS SDK for Python (Boto3) or the high-level Python library that Amazon SageMaker provides to send requests to the model for inference.
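For example, once the tuning job has completed you can look up the best training job it found, its objective metric, and the S3 location of that job's model artifacts (a short sketch with the boto3 client used above):

# Retrieve the best training job found by the tuning job
best = smclient.describe_hyper_parameter_tuning_job(
    HyperParameterTuningJobName=tuning_job_name)['BestTrainingJob']
print(best['TrainingJobName'])
print(best['FinalHyperParameterTuningJobObjectiveMetric'])  # validation:auc
print(best['TunedHyperParameters'])

# S3 path of the trained model artifacts, used later when creating a model
model_artifacts = smclient.describe_training_job(
    TrainingJobName=best['TrainingJobName'])['ModelArtifacts']['S3ModelArtifacts']
print(model_artifacts)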
Say hello to the Amazon SageMaker Debugger!
It provides full visibility into model training by monitoring, recording, analyzing, and visualizing the tensors produced during training. Using the Amazon SageMaker Debugger Python SDK, we can interact with objects that help us debug the jobs. If you are more interested in the API, you can check it out here.
You can check the list of rules here.
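As a rough sketch of what that interaction looks like with the smdebug library (assuming the training job ran with Debugger enabled and its tensors were saved to an S3 path; the path and tensor name below are placeholders, and the actual tensor names depend on the collections your job saves):

from smdebug.trials import create_trial

# Placeholder: wherever Debugger wrote the tensors for your training job
s3_debug_output = 's3://{}/{}/debug-output'.format(bucket, prefix)

trial = create_trial(s3_debug_output)
print(trial.tensor_names())  # all tensors recorded during training

# Inspect one tensor across the steps at which it was saved
# ('train-auc' is only an example name)
for step in trial.tensor('train-auc').steps():
    print(step, trial.tensor('train-auc').value(step))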
- First, create a model using the CreateModel API. Specify the S3 path where the model artifacts are stored and the Docker registry path for the image that contains the inference code.
- Create an HTTPS endpoint configuration, i.e. configure the endpoint to elastically scale the deployed ML compute instances for each production variant. For further details about the API, check the CreateEndpointConfig API.
- Next, launch the endpoint using the CreateEndpoint API (a rough sketch of all three calls follows this list).
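Putting those three calls together, a sketch with the boto3 client might look like the following. The model, endpoint-config, and endpoint names and the instance type are illustrative choices; training_image comes from the earlier get_image_uri call and model_artifacts from the evaluation sketch above:

# Illustrative names; adjust to your own naming convention
model_name = 'xgboost-tuned-model'
endpoint_config_name = 'xgboost-tuned-endpoint-config'
endpoint_name = 'xgboost-tuned-endpoint'

# 1. Create the model: S3 model artifacts + the inference image
smclient.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    PrimaryContainer={'Image': training_image, 'ModelDataUrl': model_artifacts})

# 2. Create the HTTPS endpoint configuration with one production variant
smclient.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[{
        'VariantName': 'AllTraffic',
        'ModelName': model_name,
        'InstanceType': 'ml.m4.xlarge',
        'InitialInstanceCount': 1}])

# 3. Launch the endpoint
smclient.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name)

# Once the endpoint is InService, send a CSV record for inference
runtime = boto3.Session().client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType='text/csv',
    Body='0.5,1.2,3.4')  # placeholder feature values
print(response['Body'].read())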
I will discuss the details of deployment in the next part.
Credit: BecomingHuman By: Arijit Mukherjee