With recommendation tools like Amazon Personalize, human bias and error are taken out of the equation. Amazon Personalize doesn’t have a favorite B-movie. Amazon Personalize doesn’t just love this one actor that you can’t stand. Amazon Personalize’s only goal is to ensure it knows you well enough to find hidden treasures for you to enjoy.
By tailoring results carefully and paying close attention to your trends, without adding unconscious (or conscious) bias, machine learning delivers seamless recommendations that continuously incorporate new activity and return results in real time.
- Store inventory and user demographics on Amazon S3.
- Let Amazon Personalize automatically process and examine the data, identify what is meaningful, select the right algorithms, and train and optimize a personalization model customized for your data.
- (Output) Feed Amazon Personalize an activity stream from your application to generate real-time recommendations, or request recommendations in bulk, through a customized personalization API.
You know, in case you’ve been living under a rock.
Amazon S3, or Amazon Simple Storage Service, is a service offered by AWS providing object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its global e-commerce network.
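The Amazon Personalize API has a group of supported actions for organizing the data that comes from Amazon S3, such as creating schemas, datasets, and import jobs, and then training and deploying models. Two companion APIs round it out: Amazon Personalize Events records user activity as it happens, and Amazon Personalize Runtime returns recommendations. This separation helps keep your data clean and well-structured as it moves through the Amazon Personalize service.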
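In the AWS SDK for Python (Boto3), each of these three API groups is a separate client. The sketch below only builds the clients and labels what each one is for; `make_clients`, `main`, and `PERSONALIZE_SERVICES` are hypothetical helper names, and it assumes Boto3 is installed with a default AWS region configured.

```python
# Each Amazon Personalize API group is exposed as its own boto3 service name.
PERSONALIZE_SERVICES = {
    "personalize": "schemas, datasets, import jobs, solutions and campaigns",
    "personalize-events": "real-time event ingestion (PutEvents)",
    "personalize-runtime": "recommendation requests (GetRecommendations)",
}


def make_clients(session):
    """Create one client per Personalize API group from a boto3-style session."""
    return {service: session.client(service) for service in PERSONALIZE_SERVICES}


def main():
    import boto3  # needs a default AWS region configured to build the clients

    clients = make_clients(boto3.Session())
    for service, purpose in PERSONALIZE_SERVICES.items():
        print(f"{service} ({clients[service].meta.region_name}): {purpose}")
```

Passing the session in, rather than creating it inside `make_clients`, keeps the helper easy to test with a stand-in object.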
By this point in the article, you should have a high-level, conceptual understanding of Amazon Personalize itself. This is the part where your data is processed and the model is trained to find recommendations based on trends and patterns.
Recommendations are the end result of the Personalize process. New user activity is then sent back into Amazon Personalize as an activity stream, which keeps the recommendations as accurate and real-time as Amazon advertises.
We’re going to make a simple recommendation engine to show you just how easy Amazon Personalize makes it. Before starting this walkthrough, make sure you have an activated AWS account; if you don’t have one yet, create one first.
Create the Training Data
To create the training data for this example, download the sample movie ratings data that Amazon uses, modify it, and save it to an Amazon S3 bucket. Then give Amazon Personalize permission to read from the bucket.
- Download Amazon’s example zip file from MovieLens. Unzip the file. The user-interactions data is in the ratings.csv file. Open it!
- Inside that file, delete the rating column.
- Replace the header row with:
  USER_ID,ITEM_ID,TIMESTAMP
  ***These headers must be exactly as shown for Amazon Personalize to recognize the data.***
- Save the file and upload it to your Amazon S3 bucket. For more information, see Uploading Files and Folders by Using Drag and Drop in the Amazon S3 Console User Guide.
- Grant Amazon Personalize permission to read the data in the bucket by attaching a bucket policy that lets the Amazon Personalize service principal (personalize.amazonaws.com) read the bucket and its objects. For more information, see Uploading to an S3 Bucket.
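If you prefer scripting these steps to editing the file by hand, here is a minimal sketch. It assumes the MovieLens header is `userId,movieId,rating,timestamp` (so dropping `rating` leaves user, item, timestamp in order), uses the SDK's `upload_file` in place of the console's drag and drop, and builds the kind of bucket policy the last step describes. The bucket name, the output file name `interactions.csv`, and the helper names are hypothetical.

```python
import csv
import json


def prepare_interactions(src, dst):
    """Copy MovieLens ratings to dst without the rating column, under the
    USER_ID,ITEM_ID,TIMESTAMP header that Amazon Personalize expects."""
    reader = csv.reader(src)
    header = next(reader)  # assumed MovieLens header: userId,movieId,rating,timestamp
    rating_col = header.index("rating")
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow(["USER_ID", "ITEM_ID", "TIMESTAMP"])
    for row in reader:
        del row[rating_col]  # remaining order: user, movie, timestamp
        writer.writerow(row)


def personalize_read_policy(bucket):
    """Bucket policy that lets the Amazon Personalize service read the bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "PersonalizeS3BucketAccessPolicy",
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        }],
    }


def main(bucket="my-personalize-bucket"):  # hypothetical bucket name
    import boto3  # AWS SDK for Python; needs configured credentials

    with open("ratings.csv", newline="") as src, \
            open("interactions.csv", "w", newline="") as dst:
        prepare_interactions(src, dst)
    s3 = boto3.client("s3")
    s3.upload_file("interactions.csv", bucket, "interactions.csv")
    # Note: put_bucket_policy replaces any policy already attached to the bucket.
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(personalize_read_policy(bucket)))
```

Looking the `rating` column up by name, rather than hard-coding its position, means the script fails loudly if the file's header differs from what it expects.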
Setting up the AWS SDKs
To confirm that your Python environment is properly configured for Amazon Personalize, list the available recipes with the AWS SDK for Python (Boto3). If everything is set up, the call displays a list of recipes.
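A minimal check could look like the following sketch, assuming Boto3 is installed and your credentials and region are configured. It calls `list_recipes` and follows `nextToken` in case the results span more than one page; `recipe_arns` and `main` are hypothetical helper names.

```python
def recipe_arns(personalize):
    """Return every recipe ARN visible to the account, following pagination."""
    arns, kwargs = [], {}
    while True:
        page = personalize.list_recipes(**kwargs)
        arns.extend(recipe["recipeArn"] for recipe in page["recipes"])
        token = page.get("nextToken")
        if not token:
            return arns
        kwargs["nextToken"] = token


def main():
    import boto3  # fails here if Boto3 is not installed

    # An authentication or region error on this call points to a setup problem.
    for arn in recipe_arns(boto3.client("personalize")):
        print(arn)
```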
Import the training data
After you verify that your Python environment is configured correctly, import your data. To use a dataset for training, you need to do the following:
- Add a schema. The schema allows Amazon Personalize to parse the training dataset. For more on this step, check out the AWS documentation for Datasets and Schemas.
- Import the data. Create a dataset group, which contains one or more datasets that Amazon Personalize can use for training, then create a dataset in it and load your data from Amazon S3 with a dataset import job.
- (Optional) Add an event tracker. To record events for training a model, you need a tracking ID that associates the events with your dataset group; creating an event tracker generates one. For more information, check out Getting a Tracking ID.
- (Also optional) Add an event record. To add more training data and build a better model, you can record events. Events are recorded user activities such as a search, a view, or a purchase.
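The steps above can be sketched with Boto3 as follows. This is one possible ordering under stated assumptions, not the guide's own code: the schema mirrors the USER_ID, ITEM_ID, TIMESTAMP columns prepared earlier, each asynchronous resource is polled until it reports `ACTIVE`, and the import job needs an IAM role (`role_arn`) that Amazon Personalize can assume to read the bucket. The resource names, the S3 URI, the role ARN, the `"watch"` event type, and all helper names are hypothetical.

```python
import json
import time
from datetime import datetime, timezone

# Interactions schema for the USER_ID,ITEM_ID,TIMESTAMP columns prepared earlier.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def wait_until_active(get_status, poll_seconds=30):
    """Poll get_status() until the resource is ACTIVE; fail fast on CREATE FAILED."""
    while True:
        status = get_status()
        if status == "ACTIVE":
            return
        if status == "CREATE FAILED":
            raise RuntimeError("Amazon Personalize resource creation failed")
        time.sleep(poll_seconds)


def import_interactions(personalize, name, s3_uri, role_arn):
    """Create schema, dataset group, dataset and import job; return the group ARN."""
    schema_arn = personalize.create_schema(
        name=f"{name}-schema", schema=json.dumps(INTERACTIONS_SCHEMA))["schemaArn"]

    group_arn = personalize.create_dataset_group(name=name)["datasetGroupArn"]
    wait_until_active(lambda: personalize.describe_dataset_group(
        datasetGroupArn=group_arn)["datasetGroup"]["status"])

    dataset_arn = personalize.create_dataset(
        name=f"{name}-interactions", schemaArn=schema_arn,
        datasetGroupArn=group_arn, datasetType="Interactions")["datasetArn"]
    wait_until_active(lambda: personalize.describe_dataset(
        datasetArn=dataset_arn)["dataset"]["status"])

    # role_arn must name an IAM role that lets Amazon Personalize read the CSV.
    job_arn = personalize.create_dataset_import_job(
        jobName=f"{name}-import", datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_uri}, roleArn=role_arn)["datasetImportJobArn"]
    wait_until_active(lambda: personalize.describe_dataset_import_job(
        datasetImportJobArn=job_arn)["datasetImportJob"]["status"])
    return group_arn


def add_event_tracker(personalize, name, group_arn):
    """Optional: create an event tracker and return its tracking ID."""
    resp = personalize.create_event_tracker(name=name, datasetGroupArn=group_arn)
    wait_until_active(lambda: personalize.describe_event_tracker(
        eventTrackerArn=resp["eventTrackerArn"])["eventTracker"]["status"])
    return resp["trackingId"]


def record_event(events, tracking_id, user_id, session_id, item_id, event_type="watch"):
    """Optional: send one user interaction to the dataset group via PutEvents."""
    events.put_events(
        trackingId=tracking_id, userId=user_id, sessionId=session_id,
        eventList=[{"eventType": event_type, "itemId": item_id,
                    "sentAt": datetime.now(timezone.utc)}])


def main():
    import boto3

    personalize = boto3.client("personalize")
    group_arn = import_interactions(
        personalize, "movielens-demo",
        "s3://my-personalize-bucket/interactions.csv",       # hypothetical
        "arn:aws:iam::123456789012:role/PersonalizeS3Role")  # hypothetical
    tracking_id = add_event_tracker(personalize, "movielens-tracker", group_arn)
    record_event(boto3.client("personalize-events"), tracking_id, "1", "session-1", "31")
```

Waiting between steps matters because each create call returns immediately while the resource is still pending, and the next step generally needs its parent to be `ACTIVE`.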
Create a solution
After importing your data, create a solution and solution version. The solution contains the configurations to train a model. A solution version is a trained model. For a code sample, see AWS’s documentation on Creating a Solution.
After you create a solution version, evaluate its performance before proceeding. For a code sample, see Evaluating a Solution Version.
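As a rough sketch of the solution, training, and evaluation steps together, assuming a dataset group ARN from the import step: it creates the solution, trains one solution version, and reads its offline metrics with `get_solution_metrics`. The choice of the User-Personalization recipe ARN is an assumption (any ARN from the recipe listing works), and `train_and_evaluate`, `main`, and the example ARNs are hypothetical.

```python
import time

# Assumption: the User-Personalization recipe; any ARN from ListRecipes works here.
RECIPE_ARN = "arn:aws:personalize:::recipe/aws-user-personalization"


def _wait_for(get_status, poll_seconds):
    """Poll until ACTIVE; raise if creation failed."""
    while True:
        status = get_status()
        if status == "ACTIVE":
            return
        if status == "CREATE FAILED":
            raise RuntimeError("Amazon Personalize resource creation failed")
        time.sleep(poll_seconds)


def train_and_evaluate(personalize, name, group_arn, recipe_arn=RECIPE_ARN,
                       poll_seconds=60):
    """Create a solution, train one solution version, return (version ARN, metrics)."""
    solution_arn = personalize.create_solution(
        name=name, datasetGroupArn=group_arn, recipeArn=recipe_arn)["solutionArn"]
    _wait_for(lambda: personalize.describe_solution(
        solutionArn=solution_arn)["solution"]["status"], poll_seconds)

    # Training runs asynchronously and can take a while on the full dataset.
    version_arn = personalize.create_solution_version(
        solutionArn=solution_arn)["solutionVersionArn"]
    _wait_for(lambda: personalize.describe_solution_version(
        solutionVersionArn=version_arn)["solutionVersion"]["status"], poll_seconds)

    metrics = personalize.get_solution_metrics(solutionVersionArn=version_arn)["metrics"]
    return version_arn, metrics


def main():
    import boto3

    # Hypothetical ARN; use the dataset group created in the import step.
    group_arn = "arn:aws:personalize:us-east-1:123456789012:dataset-group/movielens-demo"
    version_arn, metrics = train_and_evaluate(
        boto3.client("personalize"), "movielens-solution", group_arn)
    print(version_arn)
    for metric, value in sorted(metrics.items()):
        print(f"{metric}: {value}")
```

Comparing these metrics across solution versions, rather than reading one in isolation, is what makes the evaluation step useful.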
Create a campaign
After you train and evaluate your solution version, you can deploy it using a campaign. A campaign is an endpoint used to host a solution version and make recommendations to users.
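A minimal deployment sketch, assuming you already have a trained solution version ARN: it calls `create_campaign` with the lowest provisioned throughput (`minProvisionedTPS=1`) and polls `describe_campaign` until the endpoint is `ACTIVE`. The campaign name, the ARN in `main`, and the helper names are hypothetical.

```python
import time


def deploy_campaign(personalize, name, solution_version_arn, min_tps=1, poll_seconds=60):
    """Host a trained solution version behind a campaign and wait until it is ACTIVE."""
    campaign_arn = personalize.create_campaign(
        name=name, solutionVersionArn=solution_version_arn,
        minProvisionedTPS=min_tps)["campaignArn"]
    while True:
        status = personalize.describe_campaign(campaignArn=campaign_arn)["campaign"]["status"]
        if status == "ACTIVE":
            return campaign_arn
        if status == "CREATE FAILED":
            raise RuntimeError(f"campaign {campaign_arn} failed to deploy")
        time.sleep(poll_seconds)


def main():
    import boto3

    # Hypothetical ARN; use the solution version you trained and evaluated.
    version_arn = ("arn:aws:personalize:us-east-1:123456789012:"
                   "solution/movielens-solution/abcd1234")
    print(deploy_campaign(boto3.client("personalize"), "movielens-campaign", version_arn))
```

A campaign is billed for its provisioned capacity for as long as it exists, so it is worth calling `delete_campaign` when you finish experimenting.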
And finally, getting recommendations
After you create a campaign, you can use it to get recommendations, optionally passing contextual metadata (for example, the type of device the user is on) to refine the results.
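Here is a sketch of a recommendation request through the Runtime client's `get_recommendations`, with contextual metadata passed in the optional `context` map. The `DEVICE` key is an assumption: context only has an effect when your interactions schema defines a matching field and the recipe supports context, which the schema sketched earlier does not. `recommend`, `main`, and the campaign ARN are hypothetical.

```python
def recommend(runtime, campaign_arn, user_id, num_results=10, context=None):
    """Return recommended item IDs for user_id, optionally with contextual metadata."""
    request = {"campaignArn": campaign_arn, "userId": user_id, "numResults": num_results}
    if context:
        # Keys should match contextual fields defined in the interactions schema.
        request["context"] = context
    response = runtime.get_recommendations(**request)
    return [item["itemId"] for item in response["itemList"]]


def main():
    import boto3

    runtime = boto3.client("personalize-runtime")
    # Hypothetical campaign ARN and contextual field.
    campaign_arn = "arn:aws:personalize:us-east-1:123456789012:campaign/movielens-campaign"
    print(recommend(runtime, campaign_arn, "1", context={"DEVICE": "mobile"}))
```

Leaving `context` out of the request entirely when it is empty keeps plain requests identical to the non-contextual case.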
AWS provides a Jupyter (IPython) notebook to help you explore the Amazon Personalize APIs. The notebook has the same prerequisites as the Python examples in this guide, with one exception: it uses different source data, so you don’t need to create the training data.
To get the Jupyter notebook, clone or download the notebook from the Amazon Personalize Samples repository.