Calculating similarity scores in computers that talk to similar computers

For this part of the exercise, I look at pairs of IP addresses and calculate similarity using Euclidean distance and Pearson correlation. I created a small dataset stored as a nested dictionary: each person maps to a set of IP addresses, each with a numeric value. I did the calculations by hand, but Python's pandas library can work the numbers easily. I calculate the distance of Lisa from Kirk by isolating 1.1.1.1 and 2.2.2.2 and plotting their values for those two addresses on a graph. I do the same for each combination of people and each combination of IP addresses. Along the way I find people who are very similar and one who is not as similar. This model can help reveal clusters and identify baseline conversations between people and the IP addresses they visit. Somehow it all makes sense to me.
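
As a rough sketch of that two-address case, the snippet below treats Lisa's and Kirk's values for 1.1.1.1 and 2.2.2.2 as points on a plane and measures the straight-line distance between them. The numbers are copied from the dataset in this post; the variable names are illustrative, and the 1 / (1 + distance) conversion to a similarity score is an assumed convention rather than something stated here.

```python
import math

# Values for (1.1.1.1, 2.2.2.2), copied from the talkers dataset.
lisa = (2.5, 3.5)
kirk = (3.0, 3.5)

# Straight-line (Euclidean) distance between the two points.
distance = math.dist(lisa, kirk)

# Assumed convention: map distance to a score in (0, 1],
# where 1.0 means the two points coincide.
similarity = 1 / (1 + distance)

print(distance)  # 0.5
```

The two people agree on 2.2.2.2 and differ by 0.5 on 1.1.1.1, so the distance is just that 0.5 gap.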

talkers = {'Lisa': {'1.1.1.1': 2.5, '2.2.2.2': 3.5,
                    '3.3.3.3': 3.0, '4.4.4.4': 3.5, '5.5.5.5': 2.5,
                    '6.6.6.6': 3.0},
           'Kirk': {'1.1.1.1': 3.0, '2.2.2.2': 3.5,
                    '3.3.3.3': 1.5, '4.4.4.4': 5.0, '6.6.6.6': 3.0,
                    '5.5.5.5': 3.5},
           'Phillip': {'1.1.1.1': 2.5, '2.2.2.2': 3.0,
                       '4.4.4.4': 3.5, '6.6.6.6': 4.0},
           'Dan': {'2.2.2.2': 3.5, '3.3.3.3': 3.0,
                   '6.6.6.6': 4.5, '4.4.4.4': 4.0,
                   '5.5.5.5': 2.5},
           'James': {'1.1.1.1': 3.0, '2.2.2.2': 4.0,
                     '3.3.3.3': 2.0, '4.4.4.4': 3.0, '6.6.6.6': 3.0,
                     '5.5.5.5': 2.0},
           'Britney': {'1.1.1.1': 3.0, '2.2.2.2': 4.0,
                       '6.6.6.6': 3.0, '4.4.4.4': 5.0, '5.5.5.5': 3.5},
           'Toby': {'2.2.2.2': 4.5, '5.5.5.5': 1.0, '4.4.4.4': 4.0}}
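
The two measures can be sketched in plain Python over this dictionary. This is a minimal sketch under stated assumptions, not necessarily how the numbers in this post were worked out: the helper names (`shared_ips`, `euclidean_similarity`, `pearson_similarity`, `top_matches`) are hypothetical, only IP addresses that both people share are compared, and distance is turned into a similarity with 1 / (1 + distance) as an assumed convention.

```python
from math import sqrt

# The talkers dataset from the post, with plain ASCII quotes.
# Britney's first key is taken to be '1.1.1.1'.
talkers = {
    'Lisa': {'1.1.1.1': 2.5, '2.2.2.2': 3.5, '3.3.3.3': 3.0,
             '4.4.4.4': 3.5, '5.5.5.5': 2.5, '6.6.6.6': 3.0},
    'Kirk': {'1.1.1.1': 3.0, '2.2.2.2': 3.5, '3.3.3.3': 1.5,
             '4.4.4.4': 5.0, '6.6.6.6': 3.0, '5.5.5.5': 3.5},
    'Phillip': {'1.1.1.1': 2.5, '2.2.2.2': 3.0,
                '4.4.4.4': 3.5, '6.6.6.6': 4.0},
    'Dan': {'2.2.2.2': 3.5, '3.3.3.3': 3.0, '6.6.6.6': 4.5,
            '4.4.4.4': 4.0, '5.5.5.5': 2.5},
    'James': {'1.1.1.1': 3.0, '2.2.2.2': 4.0, '3.3.3.3': 2.0,
              '4.4.4.4': 3.0, '6.6.6.6': 3.0, '5.5.5.5': 2.0},
    'Britney': {'1.1.1.1': 3.0, '2.2.2.2': 4.0, '6.6.6.6': 3.0,
                '4.4.4.4': 5.0, '5.5.5.5': 3.5},
    'Toby': {'2.2.2.2': 4.5, '5.5.5.5': 1.0, '4.4.4.4': 4.0},
}


def shared_ips(data, a, b):
    """IP addresses that appear for both person a and person b."""
    return [ip for ip in data[a] if ip in data[b]]


def euclidean_similarity(data, a, b):
    """1 / (1 + Euclidean distance) over shared IPs; 0.0 if none are shared."""
    shared = shared_ips(data, a, b)
    if not shared:
        return 0.0
    dist = sqrt(sum((data[a][ip] - data[b][ip]) ** 2 for ip in shared))
    return 1 / (1 + dist)


def pearson_similarity(data, a, b):
    """Pearson correlation over shared IPs; 0.0 if it is undefined."""
    shared = shared_ips(data, a, b)
    n = len(shared)
    if n == 0:
        return 0.0
    xs = [data[a][ip] for ip in shared]
    ys = [data[b][ip] for ip in shared]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    denom = sqrt(var_x * var_y)
    if denom == 0:
        return 0.0  # one person has identical values on every shared IP
    return cov / denom


def top_matches(data, person, similarity=pearson_similarity, n=3):
    """The n people most similar to `person`, best match first."""
    scores = [(similarity(data, person, other), other)
              for other in data if other != person]
    scores.sort(reverse=True)
    return scores[:n]


print(euclidean_similarity(talkers, 'Lisa', 'Kirk'))
print(pearson_similarity(talkers, 'Lisa', 'Kirk'))
print(top_matches(talkers, 'Toby'))
```

Euclidean distance is sensitive to people who score everything consistently higher or lower than others, while Pearson correlation looks only at whether two people's values rise and fall together, so the two measures can rank the same pair quite differently.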
