In the recommendation systems assignment for the Data Mining II class, we have 8 users who have rated 8 different albums on a scale of 1 to 5. The ratings are stored in songData (note that not all users have rated all albums). By the end of this assignment, we will have found the recommendations for userX = "Veronica" based on Pearson Correlation similarity.
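For readers who have not seen the data, the sketch below shows the shape that songData is assumed to have in the rest of this article: a dictionary that maps each user name to a dictionary of album ratings. The users, albums, and ratings here are made-up placeholders, not the assignment's data.

```python
# Hypothetical stand-in for songData; all names and ratings are invented
# for illustration and are not the assignment's actual data.
sample_song_data = {
    "UserA": {"Album 1": 4.0, "Album 2": 2.5, "Album 3": 5.0},
    "UserB": {"Album 1": 3.0, "Album 3": 4.5},   # has not rated "Album 2"
    "UserC": {"Album 2": 1.0, "Album 3": 2.0},   # has not rated "Album 1"
}

# Ratings for one user are looked up by name, as songData[userX] is later on.
user_b_ratings = sample_song_data["UserB"]
print(user_b_ratings)               # {'Album 1': 3.0, 'Album 3': 4.5}
print("Album 2" in user_b_ratings)  # False
```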
Pearson Correlation Similarity Function
For the analysis of song recommendations, we want to build our own Pearson Correlation function.
class similarity:

    # Class instantiation
    def __init__(self, ratingP, ratingQ):
        self.ratings1 = ratingP
        self.ratings2 = ratingQ

    # Pearson Correlation between two vectors
    def pearson(self):
        # Partial sums, filled in by the loop over the common keys
        sum_p = 0
        sum_q = 0
        sum_pq = 0
        square_sum_p = 0
        square_sum_q = 0
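First, we set n to the number of keys that the two rating dictionaries have in common. If n is 0, there is nothing to compare, so we return -2 as an error value. Because a valid Pearson Correlation always lies between -1 and 1, -2 cannot be mistaken for a real similarity.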
        n = len(set(self.ratings1.keys()) & set(self.ratings2.keys()))
        if n == 0:
            return -2
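As a small illustration of the common-key count, the snippet below uses made-up rating dictionaries (ratings_a, ratings_b, and ratings_c are hypothetical names). It also shows that in Python 3 the key views can be intersected directly, so wrapping them in set() is optional.

```python
# Hypothetical ratings, invented for illustration only.
ratings_a = {"Album 1": 4.0, "Album 2": 2.5, "Album 3": 5.0}
ratings_b = {"Album 2": 1.0, "Album 3": 2.0, "Album 4": 3.5}
ratings_c = {"Album 5": 3.0}

common = set(ratings_a.keys()) & set(ratings_b.keys())
n = len(common)
print(sorted(common), n)   # ['Album 2', 'Album 3'] 2

# dict key views support set operations, so this gives the same set.
print(ratings_a.keys() & ratings_b.keys() == common)   # True

# No overlap at all: this is the n == 0 case that returns -2.
n_empty = len(ratings_a.keys() & ratings_c.keys())
print(n_empty)             # 0
```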
Next, we use a single for loop over the common keys to accumulate the partial sums needed by the computationally efficient form of the Pearson Correlation.
        for k in (set(self.ratings1.keys()) & set(self.ratings2.keys())):
            p = self.ratings1[k]
            q = self.ratings2[k]
            sum_p += p
            sum_q += q
            sum_pq += p * q
            square_sum_p += pow(p, 2)
            square_sum_q += pow(q, 2)
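For reference, the partial sums accumulated in the loop correspond to the single-pass form of the Pearson Correlation. Here p_i and q_i are the two users' ratings of the i-th common album and n is the number of common albums, so the five sums map to sum_pq, sum_p, sum_q, square_sum_p, and square_sum_q:

```latex
r = \frac{\displaystyle \sum_{i=1}^{n} p_i q_i - \frac{\sum_{i=1}^{n} p_i \, \sum_{i=1}^{n} q_i}{n}}
         {\sqrt{\left( \displaystyle \sum_{i=1}^{n} p_i^2 - \frac{\left( \sum_{i=1}^{n} p_i \right)^2}{n} \right)
                \left( \displaystyle \sum_{i=1}^{n} q_i^2 - \frac{\left( \sum_{i=1}^{n} q_i \right)^2}{n} \right)}}
```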
Then we calculate the numerator and denominator of the Pearson Correlation from the relevant partial sums. Again we check for an error condition: if the denominator is 0, which happens when one user's ratings on the common albums are all identical, we return -2.
        numerator = sum_pq - ((sum_p * sum_q)/n)
        denominator = ((square_sum_p - pow(sum_p, 2)/n) * (square_sum_q - pow(sum_q, 2)/n)) ** 0.5
        if denominator == 0:
            return -2
Finally, we divide the numerator by the denominator and return the result as the Pearson Correlation.
        result = numerator/denominator
        return result
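To check the steps above in isolation, here is a minimal standalone sketch of the same calculation as a plain function. The function name pearson_similarity and the sample ratings are assumptions made for this example; the sentinel -2 follows the error checks described above.

```python
def pearson_similarity(ratings_p, ratings_q):
    """Pearson Correlation over the keys two rating dicts share.

    Returns -2 when there are no common keys or the denominator is 0.
    """
    common = ratings_p.keys() & ratings_q.keys()
    n = len(common)
    if n == 0:
        return -2

    sum_p = sum_q = sum_pq = square_sum_p = square_sum_q = 0
    for k in common:
        p, q = ratings_p[k], ratings_q[k]
        sum_p += p
        sum_q += q
        sum_pq += p * q
        square_sum_p += p ** 2
        square_sum_q += q ** 2

    numerator = sum_pq - (sum_p * sum_q) / n
    denominator = ((square_sum_p - sum_p ** 2 / n)
                   * (square_sum_q - sum_q ** 2 / n)) ** 0.5
    if denominator == 0:
        return -2
    return numerator / denominator


# Hypothetical ratings, invented for illustration only.
a = {"Album 1": 1.0, "Album 2": 2.0, "Album 3": 3.0}
b = {"Album 1": 2.0, "Album 2": 4.0, "Album 3": 6.0}   # every rating is twice a's
c = {"Album 1": 3.0, "Album 2": 3.0, "Album 3": 3.0}   # constant ratings
d = {"Album 9": 5.0}                                    # no overlap with a

print(round(pearson_similarity(a, b), 6))  # 1.0
print(pearson_similarity(a, c))            # -2
print(pearson_similarity(a, d))            # -2
```

Ratings that rise and fall together give a correlation of 1.0, while the constant ratings in c and the non-overlapping ratings in d both fall into the -2 error cases.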
Find the Song Recommendations
To find the recommendations for userX = "Veronica" based on Pearson Correlation similarity, we start from the problem definition: we set userX and look up that user's ratings in songData.
userX = "Veronica"
userXRatings = songData[userX]
from operator import itemgetter
First, we compute the Pearson Correlation similarity between userX’s ratings and each of the other users’ ratings, and collect the results as a list of (user, similarity) tuples.
userSimilarities = []
for userY, userYRatings in songData.items():
    if userY != userX:
        s = similarity(songData[userX], songData[userY])
        pearsonXY = s.pearson()
        lst = (userY, pearsonXY)
        userSimilarities.append(lst)
Then we sort the list of tuples from highest to lowest similarity and assign the sorted list to a variable called sortedUserSimilarities. We also set the variable userXNN to the user at the 0th position of the sorted list, that is, the nearest neighbor of userX.
sortedUserSimilarities = sorted(userSimilarities, key=itemgetter(1), reverse=True)
userXNN = sortedUserSimilarities[0][0]
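The sort-and-pick step can be tried on its own with a made-up list of (user, similarity) tuples. The names and scores below are hypothetical. The point is that itemgetter(1) sorts by the second element of each tuple, and indexing with [0][0] gives the name in the top entry.

```python
from operator import itemgetter

# Hypothetical (user, similarity) pairs, invented for illustration only.
sample_similarities = [("UserA", 0.25), ("UserB", 0.9), ("UserC", -2), ("UserD", -0.4)]

ranked = sorted(sample_similarities, key=itemgetter(1), reverse=True)
nearest_neighbor = ranked[0][0]

print(ranked)            # [('UserB', 0.9), ('UserA', 0.25), ('UserD', -0.4), ('UserC', -2)]
print(nearest_neighbor)  # UserB
```

One consequence of using -2 as the error value is that users with no usable overlap sort to the end of the list, so they are never picked as the nearest neighbor unless every comparison fails.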
Next, we collect the albums that userXNN has rated but userX has not, and assign the list of (album, rating) tuples to a variable called userXRecos.
userXRecos = []
for x in songData[userXNN]:
    if x not in songData[userX]:
        userXRecos.append((x, songData[userXNN][x]))
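The same filtering step can also be sketched as a list comprehension. The two rating dictionaries below are hypothetical stand-ins for songData[userX] and songData[userXNN].

```python
# Hypothetical ratings, invented for illustration only.
target_ratings = {"Album 1": 4.0, "Album 2": 2.5}
neighbor_ratings = {"Album 1": 5.0, "Album 3": 3.0, "Album 4": 4.5}

# Keep only albums the neighbor rated that the target user has not.
recos = [(album, rating)
         for album, rating in neighbor_ratings.items()
         if album not in target_ratings]

print(recos)   # [('Album 3', 3.0), ('Album 4', 4.5)]
```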
Finally, we sort the list of tuples from highest to lowest rating and assign the sorted list to a variable called userXSortedRecos.
userXSortedRecos = sorted(userXRecos, key=itemgetter(1), reverse=True)
For the assignment data, the song recommendations for userX = "Veronica" are ('Broken Bells', 2.0) and ('Vampire Weekend', 2.0).
print("Recommendations for", userX)
print(userXSortedRecos)
The printed list is:
[('Broken Bells', 2.0), ('Vampire Weekend', 2.0)]
The source code for this article can be found on my GitHub.