## Problem Definition

In the recommendation systems assignment for the Data Mining II class, we have 8 users who have rated 8 different albums on a scale of 1 to 5. The ratings are stored in `songData` (note that not all users have rated all albums). In this assignment, we find the recommendations for `userX = "Veronica"` based on Pearson Correlation similarity.
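The code in this article indexes `songData` first by user name and then by album name, which suggests a dictionary of dictionaries. A minimal sketch of that shape, assuming placeholder users, albums, and ratings (none of these values come from the assignment data):

```python
# Hypothetical stand-in for songData: user -> {album -> rating on a 1-5 scale}.
# Names and ratings are placeholders, not the assignment's data.
song_data_example = {
    "User A": {"Album 1": 4.0, "Album 2": 2.5, "Album 3": 5.0},
    "User B": {"Album 1": 3.0, "Album 3": 4.5},  # has not rated "Album 2"
    "User C": {"Album 2": 1.0},                  # has rated only one album
}

# Albums a user has not rated are simply absent from that user's dictionary.
unrated_by_b = [album for album in song_data_example["User A"]
                if album not in song_data_example["User B"]]
print(unrated_by_b)
```

Representing missing ratings as absent keys is what lets the later steps find common albums with a set intersection.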

## Pearson Correlation Similarity Function

For the analysis of song recommendations, we build our own Pearson Correlation function as a method of a small `similarity` class that holds the two users' ratings.

    class similarity:

        # Class instantiation
        def __init__(self, ratingP, ratingQ):
            self.ratings1 = ratingP
            self.ratings2 = ratingQ

        # Pearson Correlation between two vectors
        def pearson(self):
            sum_p = 0
            sum_q = 0
            sum_pq = 0
            square_sum_p = 0
            square_sum_q = 0

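First, we set `n` to the number of keys (albums) common to both users' ratings. If `n` is 0, we return -2 as an error value; it lies outside the valid correlation range of -1 to 1, so it cannot be mistaken for a real similarity.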

            n = len(set(self.ratings1.keys()) & set(self.ratings2.keys()))
            if n == 0:
                return -2

Next, we use a single `for` loop over the common keys to accumulate the partial sums required by the computationally efficient form of the Pearson Correlation.

            for k in (set(self.ratings1.keys()) & set(self.ratings2.keys())):
                p = self.ratings1[k]
                q = self.ratings2[k]
                sum_p += p
                sum_q += q
                sum_pq += p * q
                square_sum_p += pow(p, 2)
                square_sum_q += pow(q, 2)
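For reference, the partial sums accumulated above correspond to the single-pass form of the Pearson Correlation. Here $p_k$ and $q_k$ denote the two users' ratings of the $k$-th common album, and every sum runs over the $n$ common albums (this notation is introduced for illustration and does not come from the assignment):

$$
r = \frac{\sum p_k q_k - \dfrac{\left(\sum p_k\right)\left(\sum q_k\right)}{n}}
{\sqrt{\left(\sum p_k^2 - \dfrac{\left(\sum p_k\right)^2}{n}\right)\left(\sum q_k^2 - \dfrac{\left(\sum q_k\right)^2}{n}\right)}}
$$

Each sum maps to one accumulator in the code: `sum_pq`, `sum_p`, `sum_q`, `square_sum_p`, and `square_sum_q`.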

Then, we compute the numerator and denominator of the Pearson Correlation from the relevant partial sums. Again, we check for a zero denominator and return -2 in that case.

            numerator = sum_pq - ((sum_p * sum_q) / n)
            denominator = ((square_sum_p - pow(sum_p, 2) / n) * (square_sum_q - pow(sum_q, 2) / n)) ** 0.5
            if denominator == 0:
                return -2

Finally, we divide the numerator by the denominator and return the resulting Pearson Correlation.

            result = numerator / denominator
            return result
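To sanity-check the steps above, here is a minimal standalone sketch of the same single-pass calculation written as a plain function. The function name `pearson_single_pass` and the toy ratings are assumptions made for illustration; they are not part of the assignment code or data.

```python
from math import isclose


def pearson_single_pass(ratings1, ratings2):
    """Pearson Correlation over the keys shared by two rating dictionaries."""
    common = set(ratings1) & set(ratings2)
    n = len(common)
    if n == 0:
        return -2  # sentinel from the article: no albums in common

    sum_p = sum_q = sum_pq = square_sum_p = square_sum_q = 0
    for k in common:
        p, q = ratings1[k], ratings2[k]
        sum_p += p
        sum_q += q
        sum_pq += p * q
        square_sum_p += p ** 2
        square_sum_q += q ** 2

    numerator = sum_pq - (sum_p * sum_q) / n
    denominator = ((square_sum_p - sum_p ** 2 / n)
                   * (square_sum_q - sum_q ** 2 / n)) ** 0.5
    if denominator == 0:
        return -2  # sentinel from the article: one user has constant ratings
    return numerator / denominator


# Hypothetical toy ratings, chosen so the results are easy to verify by hand.
a = {"Album 1": 1.0, "Album 2": 2.0, "Album 3": 3.0}
b = {"Album 1": 2.0, "Album 2": 4.0, "Album 3": 6.0, "Album 4": 5.0}
c = {"Album 1": 3.0, "Album 2": 2.0, "Album 3": 1.0}
d = {"Album 9": 4.0}
e = {"Album 1": 3.0, "Album 2": 3.0, "Album 3": 3.0}

print(pearson_single_pass(a, b))  # ratings rise together
print(pearson_single_pass(a, c))  # ratings move in opposite directions
print(pearson_single_pass(a, d))  # no common albums
print(pearson_single_pass(a, e))  # constant ratings give a zero denominator
```

The four toy cases cover a perfect positive correlation, a perfect negative correlation, and both conditions that return the -2 sentinel.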

## Find the Song Recommendations

To find the recommendations for `userX = "Veronica"` based on Pearson Correlation similarity, we first set `userX`, look up her ratings in `songData`, and import `itemgetter` for the sorting steps that follow.

    from operator import itemgetter

    userX = "Veronica"
    userXRatings = songData[userX]

First, we compute the Pearson Correlation similarity between userX's ratings and each of the other users' ratings.

    userSimilarities = []
    for userY, userYRatings in songData.items():
        if userY != userX:
            s = similarity(songData[userX], songData[userY])
            pearsonXY = s.pearson()
            lst = (userY, pearsonXY)
            userSimilarities.append(lst)

Then, we sort the list of tuples from highest to lowest similarity and assign the sorted list to a variable called `sortedUserSimilarities`. We also set the variable `userXNN` (userX's nearest neighbor) to the user at the 0th position of the sorted list.

    sortedUserSimilarities = sorted(userSimilarities, key=itemgetter(1), reverse=True)
    userXNN = sortedUserSimilarities[0][0]
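The sorting step can be tried in isolation. The sketch below uses made-up (user, similarity) pairs, which are illustrative only, to show how `itemgetter(1)` with `reverse=True` puts the most similar user first and pushes a -2 error value to the end of the list.

```python
from operator import itemgetter

# Hypothetical (user, similarity) pairs; names and values are placeholders.
user_similarities = [
    ("User A", 0.35),
    ("User B", -2),    # sentinel: similarity could not be computed
    ("User C", 0.9),
    ("User D", -0.4),
]

# itemgetter(1) sorts by the similarity stored at index 1 of each tuple.
sorted_similarities = sorted(user_similarities, key=itemgetter(1), reverse=True)
nearest_neighbor = sorted_similarities[0][0]

print(sorted_similarities)
print(nearest_neighbor)
```

Because -2 is lower than any valid correlation, a user with no comparable ratings can only become the nearest neighbor when every other user also returns -2.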

Next, we collect the albums that userXNN has rated but userX has not, and assign the list of (album, rating) tuples to a variable called `userXRecos`.

    userXRecos = []
    for x in songData[userXNN]:
        if x not in songData[userX]:
            userXRecos.append((x, songData[userXNN][x]))
    print(userXRecos)

Finally, we sort the list of tuples from highest to lowest rating and assign the sorted list to a variable called `userXSortedRecos`.

    # reverse=True sorts from highest to lowest rating
    userXSortedRecos = sorted(userXRecos, key=itemgetter(1), reverse=True)
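The filtering and sorting steps above can be combined into one small helper. This is a sketch under stated assumptions: the function name `unseen_albums` and the toy ratings are hypothetical and do not come from the assignment.

```python
from operator import itemgetter


def unseen_albums(target_ratings, neighbor_ratings):
    """Albums the neighbor rated that the target has not, best rated first."""
    recos = [(album, rating)
             for album, rating in neighbor_ratings.items()
             if album not in target_ratings]
    # sorted() is stable, so albums with equal ratings keep the neighbor's order.
    return sorted(recos, key=itemgetter(1), reverse=True)


# Hypothetical toy ratings for a target user and their nearest neighbor.
target = {"Album 1": 4.0, "Album 2": 2.5}
neighbor = {"Album 1": 5.0, "Album 3": 3.0, "Album 4": 4.5, "Album 5": 3.0}

recommendations = unseen_albums(target, neighbor)
print(recommendations)
```

Note that this approach recommends every unseen album regardless of how low the neighbor rated it, so a rating threshold may be worth adding in practice.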

## Conclusion

Based on the result, the song recommendations for `userX = "Veronica"` are `('Broken Bells', 2.0)` and `('Vampire Weekend', 2.0)`.

    print("Recommendations for", userX)
    print(userXSortedRecos)
    # [('Broken Bells', 2.0), ('Vampire Weekend', 2.0)]

The source code for this article can be found on my GitHub.

Credit: BecomingHuman By: Shirley Chen