A Brief History of Social Bookmarking

Hi! It's me again! I'm back with more social media goodness to share. This time round, I'm touching on the brief history of social bookmarking and the advent of the image bookmarking phenomenon, PLUS a list of 10 image bookmarking sites (and 2 more!) and the SEO benefits of image bookmarking. Bargain!

UPDATE 17th May: Rand Fishkin at SMX London has just confirmed that image ALT tags weigh more than H1 tags.

As SEOs we are very much aware of the benefits of using social bookmarking as part of link building. Sites like Digg, Reddit and StumbleUpon are considered mandatory: bookmarking your blog posts and websites not only helps increase traffic to your webpage, it helps create a good mix of backlinks in your collection.

From Social To Viral

(The term viral here does not exclusively refer to videos that have generated a considerable number of hits in a short period of time; rather, it is an umbrella marketing term for the use of existing social networks to produce an increased number of mentions of, and awareness about, a particular topic, brand or trend.)

Sites like Digg, especially, have the potential to make your bookmarked link go viral. Essentially, you're not just bookmarking a link, you are creating conversations around the topic in the link: Digg allows its users to comment on the link and share it with friends on Twitter and Facebook. It's no surprise that its popularity has spawned a great many Digg-clone sites, most of them using the Pligg tool to create their own social bookmarking sites. Not all of them are great but some of them are getting there: you can check out this massive list of Digg-clone social bookmarking sites sorted according to PageRank, Alexa rank, dofollow and popularity: Social Bookmarking Sites Listed in Order of Pagerank, Alexa Rank, Popularity and DoFollow.

Now here's the thing: like directories, social bookmarking can be useful but also tedious and boring. Going through that list of social bookmarking sites you realize that not all of them have that sense of community; they try hard to emulate Digg and may succeed at its basic function, but the end result is just a mind-numbing collection of spammy-looking links. The other problem is this: how many real humans go through these sites to search for information and inspiration?

The Start of Image Bookmarking

Enter image bookmarking. I love image bookmarking. Everybody loves looking at images. They are colorful, beautiful and they speak louder than a 500-word keyword-rich article on an article website nobody reads. Image bookmarking came about after the popularity of design blogs: people don't just want to rely on the sometimes infrequent updates of design blogs to get their daily dose of inspiration, they want to submit and share their own finds too.

A List of 10 Image Bookmarking Sites + 2 More

At the moment, I can only find 10 image bookmarking sites on the net. I am quite surprised this technique hasn't caught on yet.

WeHeartIt

A simple image bookmarking site, open to everyone. Simply create an account and start submitting. They have a special bookmarklet which you can drag and drop into your browser, so the next time you trawl the web and spot an amazing image, just click on it to submit it to the site. Members can "heart" their favorite images from the pool; the more hearts an image gets, the more popular it is. Images here fall mostly into the photography category, the kind that is heavily filtered, warm-lensed and vintage-looking.
Vi.sualize.us

Supposedly the first ever image bookmarking website. The owner wanted to create a bookmarking site that is not elitist and is open to all, while maintaining its credibility as a truly inspirational visual website. Simply create an account and start posting. You can also download a plugin for your browser. Members can like an image and even post comments about it.

Typeish

A closed bookmarking community, and for a good reason! This is an image bookmarking community that carefully selects the images it displays on the site. And you can tell: the images all fall into a sort of artistic / design theme. To join, you need to email them and ask / beg for an invite.

FFFFound!

Probably the premier image bookmarking site on the internet right now. It emerged after Vi.sualize.us and started off as a pretty simple and straight-to-the-point image bookmarking site that allowed you to register an account and post images. Its popularity forced it to close registrations, and now you can only join FFFFound if you have an invite. Images here fall strictly into the design, artistic and inspiration theme.

IMGFave

A simple WeHeartIt clone made on Tumblr.

Condense

A French image bookmarking site. Currently a closed community, but it intends to open registrations soon. Images fall strictly into the graphic design spectrum: typography, architecture, packaging and ads.

Picocool

Another closed-community image bookmarking site, but I wouldn't call it inspiring really. The website looks bland in comparison to the rest I have mentioned here. You need an invite before you can even register, which is a downer.

Yayeveryday

One of THE BEST image bookmarking sites out there, except that the emphasis is on the artists themselves: original works / images made and submitted by the users. It is a community of artists, designers, photographers and the people who appreciate them. Users get dedicated profile pages that credit their work, websites, fans, etc. Members can comment on each other's submissions.

Enjoysthin.gs

Simply, a place to share and save things you enjoy. People submit their favorite images, and users can rate an image by "enjoying" it. The more enjoys an image gets, the more popular it is.

And a few more similar ones:

Lookbook.nu

A fashion community site that allows users to submit images of themselves wearing fashionable or stylish items of clothing. Members can "hype" a particular image and share it on Twitter and Facebook. This is a large, growing community already, with a Japanese version. The site cross-promotes each and every submission on its own various microsites and social profiles on Tumblr, Facebook, Twitter, etc.

Polyvore

Similar to Lookbook, except that you can also buy the looks. Users can create looks from items available for sale on the site, plus images of their own, and create style inspiration boards called sets.

deviantART

A community site that emerged during the LiveJournal craze. Oh man, I still remember when LiveJournal was awesome. Nostalgia. Anyway, deviantART is where users can create profile pages, and post, discuss, share and rate each other's submissions. It is one of the largest social networking sites for emerging, amateur and established artists and art enthusiasts, with more than 13 million registered users.

The SEO Benefits of Image Bookmarking

Image bookmarking has the added benefit of going viral quicker than a simple text link. This is because sites like those mentioned above don't just display your images, they save the link in them as well.
We Heart It does not use the nofollow attribute on its links, and neither do Typeish and Enjoysthin.gs. All these sites are a minimum of PR 5, and FFFFound doesn't just keep your link, it saves the ALT tags and the title of the post it was submitted from as well.

The plus side is that you don't need to be an artist, designer or photographer to participate. As long as the image / content is interesting enough, you'll make the cut. This also inspires and motivates you to come up with interesting and unique ideas and ways to market your site / brand. Also, if you are clever enough to replicate these websites, you will see how easy it is to get free content: semi-automatic, community-driven and daily at that. A great, simple and legit link-baiting technique!

When a member submits an image that has received many hypes, likes or enjoys, they are sure to link back to the post from their own blog to show it off. People like to be popular, and people love it when they get good ratings. The backlinks for you will just keep pouring in. If you add a link (like your client's) with the image and it gets reblogged and goes viral, all you gotta do is harvest the links that get generated. There is also the added bonus that these backlinks are all dofollow.

I have also noticed that sites like these gain a high PageRank quicker than normal blogs. (Some of the sites mentioned above, according to their WHOIS records, were only created recently, between late 2007 and 2008.) Of course, the age-old argument that an image's ALT tag does not weigh as much as the anchor text of a text link will surface, but at the end of the day a link is still a link, and spiders can only read images as text if you fill in the ALT tags.

How do I know this works? Because I've tried it. Why create directories and bookmarking sites when you can create image bookmarking sites? 🙂


An Overview of Linear Regression and Gradient Descent in Python using Google Ads Data

Over the past few years there has been a real buzz around machine learning and its possible applications.

This post does assume prior knowledge of Python and linear algebra.

What is Machine Learning?

Machine learning is defined as the field of study that gives computers the ability to learn without being explicitly programmed. A way of thinking about this is that a program is said to learn if its performance on a specific task improves with experience.

Linear Regression

For simple linear regression I believe it will be best to demonstrate with our Google Ads example. For this post we will just be going over single-variable linear regression. To generate the data I simply downloaded a day-by-day report from one of our clients' accounts for the past 3 years, which contained, among other variables, the cost per conversion and the conversion rate for each day. When exploring the data I thought it would seem logical for higher conversion rates to lead to a lower cost per conversion, so let's explore this further.

The first step is to visualise our data to see if there is indeed a relationship.

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
data = pd.read_excel("C:\\...\\googleads.xlsx")
#save data as numpy arrays of size (n,1), where n is the number of examples
X = np.array(data['Conv. rate'])
X = np.reshape(X,(len(X),1))
Y = np.array(data['Cost / conv.'])
Y = np.reshape(Y,(len(Y),1))
#plot data to investigate relationship
plt.figure(1)
plt.scatter(X,Y,s=8,marker = 'x')
plt.ylabel("Cost Per Conversion")
plt.xlabel("Conversion Rate")

Here I am saving our X and Y variables as matrices (or column vectors in this case); the reason for doing this will become apparent later on.

Our plot seems to show a relationship between the two, but from the shape of the data it appears it may follow a power law.

A power law means our variables follow the relationship Y = aX^k, where a and k are constants. The easiest way to find our coefficients a and k is to plot our values on a log-log graph instead. When data that follows a power law is plotted on a log-log graph, the result is a straight line. This is because taking logs of our power law equation yields log(Y) = k*log(X) + log(a), which resembles the well-known linear equation Y = mX + c, with k = m and c = log(a). The base we choose for our logarithm doesn't matter; you just need to take note of it to pull a out of c = log(a) after we calculate our coefficient c.
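As a quick sanity check, here is a minimal sketch using synthetic data to show this linearisation in action; the values a = 2 and k = -0.5 are arbitrary illustrative choices, not taken from our data.

import numpy as np

#synthetic power law data: Y = a*X^k with arbitrary a = 2, k = -0.5
a_true, k_true = 2, -0.5
X_demo = np.linspace(0.01, 1, 200)
Y_demo = a_true*X_demo**k_true
#fitting a straight line in log space recovers k (the slope) and a (from the intercept)
k_fit, c_fit = np.polyfit(np.log10(X_demo), np.log10(Y_demo), 1)
print(k_fit)      #~ -0.5, our exponent k
print(10**c_fit)  #~ 2, since a = 10^c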

So let's plot our data on a logarithmic scale. For reference, I will be using logs of base 10, but natural logarithms or any other base are fine.

#suspected power law, so let's see what a log-log graph looks like
X_log = np.log10(X)
Y_log = np.log10(Y)
plt.figure(2)
plt.scatter(X_log, Y_log, s = 8, marker = 'x')
plt.ylabel("Log of Cost Per Conversion")
plt.xlabel("Log of Conversion Rate")

This seems like a promising correlation. Now to fit our line of best fit.

Hypothesis

From the data we can see that a fitted line of the form hθ(X) = θ0 + θ1x1 would work well. This is called our hypothesis and has the expected form Y = c + mX.

To make this simpler to write, we define 2 column vectors Θ = [θ0 ; θ1] and X = [x0 ; x1], where x0 = 1. Here our Θ vector is a vector of our parameters that we need to fit. Our X vector is a vector of our features, and in this example we only have a single feature, the logarithm of our conversion rate. Defining these vectors allows us to write our hypothesis as hθ(X) = Θ^T X. Here Θ^T is the transpose of the Θ vector and the usual matrix multiplication rules apply.

As an aside, we could actually fit higher-order polynomial terms, such as x1^2, and it would still be classed as linear regression as our parameters still scale linearly. For example, if hθ(X) = θ0 + θ1x1 + θ2x2, our x2 feature could equal x1^2, as in the sketch below. To decide if these polynomial features should be included in your model it's useful to explore things like residual plots, bias/variance curves and statistical fitting parameters such as the R^2 value; however, to keep the blog short I won't go into that exploration for this data.
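To make that aside concrete, here is a hypothetical sketch: an x1^2 term is just another column in the feature matrix, so the hypothesis is still a single matrix product and the model remains linear in its parameters.

import numpy as np

#hypothetical example: the x2 feature is just x1^2 added as an extra column
x1 = np.array([[1.0],[2.0],[3.0]])
X_mat = np.concatenate((np.ones((len(x1),1)), x1, x1**2), axis=1)  #columns: 1, x1, x1^2
theta_vec = np.array([[0.1],[0.5],[-0.2]])  #[theta_0; theta_1; theta_2]
hyp = X_mat@theta_vec  #still computed as a single matrix product per example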

Cost Function – Theory

To decide if a fitted line is actually a good fit to our data, we need to define a value that indicates this. This is our cost function. A cost function represents a cost associated with fitting that specific line to our data. A common cost function used for linear regression is the mean squared error. Let's go through this step by step.

Say we have a hypothesis hθ(X) that has some pre-determined parameter values. For a single example's feature vector, X(i), our hypothesis has an output of hθ(X(i)). We first look at the difference between this output and the actual expected output, Y(i), to obtain (hθ(X(i)) – Y(i)). We then square this to avoid negative values, which would make this cost function useless for our purpose: (hθ(X(i)) – Y(i))^2. Now we want to sum over all of our examples and find a mean, (1/(2m))*∑i((hθ(X(i)) – Y(i))^2). Here m is the number of examples, and we've also multiplied by a half for simplicity in later steps.

So our final cost function is:

J(Θ) = (1/(2m))*∑i((hθ(X(i)) – Y(i))^2)

The idea here is that the values of our parameters that give the lowest possible value of our cost function result in the best possible fit to our data. So in order to find the best values of our parameters we have to somehow minimise our cost function.

Cost Function – Plot

Now that we know we want to minimise the cost function to obtain our optimised parameters, it may be useful to plot out how our cost function varies with θ0 and θ1. One thing we know about our cost function is that it will always be convex, with only a single minimum, which makes our life easier.

Our first step in plotting our graph is creating a function to return the value of our cost function for a specific value of θ0 and θ1. I will add comments to our code to explain each step.

def cost_function(theta_0, theta_1, X, Y):
    #build a column vector out of our input parameter values of size (2,1)
    theta_vec = np.array([[theta_0],[theta_1]])
    #create a matrix of a column of ones (for our x_0 parameter) and a column of our example values
    X_mat = np.concatenate((np.ones((len(X),1)),X),axis=1) 
    #vectorized approach to find our output value for each example
    hyp = X_mat@theta_vec
    #vectorized approach to calculate the total cost function
    cost_func = (1/(2*len(X)))*(((hyp - Y).T)@(hyp - Y))
    return cost_func[0,0]

In order to make our calculations as efficient as possible we implement a vectorized approach, as above. This removes the need for for loops and makes the function much less computationally expensive. This approach isn't really necessary for our application, as using for loops instead wouldn't make a massive difference to execution speed; however, when moving on to bigger data sets and to something like multivariate linear regression, this could make a huge impact on execution speed, so it's good practice to use it now. For clarity, .T and @ in the above code represent the transpose of a matrix and matrix multiplication, respectively.
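For comparison, here is a hypothetical loop-based equivalent of cost_function, shown only to illustrate what the vectorized version replaces; it computes the same value one example at a time.

def cost_function_loop(theta_0, theta_1, X, Y):
    #loop-based equivalent of cost_function above, for illustration only
    m = len(X)
    total = 0
    for i in range(m):
        #hypothesis output for a single example
        hyp_i = theta_0 + theta_1*X[i,0]
        #accumulate the squared error for this example
        total += (hyp_i - Y[i,0])**2
    return total/(2*m)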

It's possible to choose a sensible range of parameters just by looking at our log-log graph above. We expect θ0 to be approximately in the range 0 to 1. We also expect θ1 to be negative, approximately between 0 and -1. Having said this, let's choose a bigger range for a nicer cost function graph.

#generate our parameters vs cost graph data
theta_0_arr = []
theta_1_arr = []
cost_func_arr = []
for theta_0 in np.linspace(-3,3, num = 100):
    for theta_1 in np.linspace(-3,3, num = 100):
        theta_0_arr.append(theta_0)
        theta_1_arr.append(theta_1)
        cost_func_arr.append(cost_function(theta_0, theta_1, X_log, Y_log))

Now for the plot. Our cost surface has a fairly shallow gradient in one direction, so a traditional 3D scatter plot wouldn't help us much as the minimum isn't obvious. Instead let's plot a contour plot and see approximately where the minimum occurs.

import scipy.interpolate
#plot our data onto a contour plot to get an idea of where gradient descent should take us      
N = 1000
xi = np.linspace(min(theta_0_arr), max(theta_0_arr), N)
yi = np.linspace(min(theta_1_arr), max(theta_1_arr), N)
zi = scipy.interpolate.griddata((theta_0_arr, theta_1_arr), cost_func_arr, (xi[None,:], yi[:,None]), method = 'cubic')
#levels sets out array of levels to set lines at (chosen based around min(cost_func_arr))
levels = np.arange(0,1,0.05)
plt.contour(xi, yi, zi, levels = levels)
plt.xlabel("Theta 0")
plt.ylabel("Theta 1")

Here we are ignoring cost function levels higher than 1, as they are of no interest to us.

Not the prettiest plot, but placing my mouse over where I believe the centre to be reveals approximately θ0 = 0.24 and θ1 = -0.85.

However, we can do much better than a just-by-eye fit. This is where the gradient descent algorithm comes in.

Gradient Descent

For linear regression the values of our parameters can actually be found analytically (the normal equation gives a closed-form solution), and there are other, more complex methods with certain advantages over gradient descent that can also be used. Having said this, the gradient descent algorithm is a simple algorithm that gives a nice intuition into exactly what we are trying to do. So let's take a look.

The algorithm is to repeat until convergence:

θj = θj – α * ∂J(Θ)/∂θj

This holds for all parameters j, updating all of them simultaneously. Here α is a constant known as the learning rate. This makes sense logically: we take our value of θj and subtract a step proportional to the gradient of our cost function at θj. This means that on our contour plot we are travelling downhill across the contour lines and, if we repeat the process enough, we will eventually arrive at our minimum.

Using θ1 as an example it is quite straightforward to show that our partial differential is:

(1/m)∑i(hθ(X(i)) – Y(i))x1(i)

Therefore our gradient descent algorithm for θ1 becomes:

θ1 = θ1 – α*(1/m)∑i(hθ(X(i)) – Y(i))x1(i)

This can actually be vectorized to be used for our Θ vector. This updates all of our parameters simultaneously. So in our code we will repeat

Θ = Θ – α*(1/m)*Xmat^T(XmatΘ – Y)

until convergence, where Xmat is our matrix of ones and example values as defined in our code snippet for the cost function.

So let’s build our gradient descent algorithm in python.

def gradient_descent(initial_theta_0, initial_theta_1, X, Y, alpha, iterations):
    #build our initial parameter vector of size (2,1)
    theta_vec = np.array([[initial_theta_0],[initial_theta_1]])
    #matrix of a column of ones (for x_0) and a column of our example values
    X_mat = np.concatenate((np.ones((len(X),1)),X),axis=1)
    for i in range(0,iterations):
        #vectorized hypothesis output for every example
        hyp = X_mat@theta_vec
        #simultaneous vectorized update of all parameters
        theta_vec = theta_vec - alpha*(1/len(X))*(X_mat.T)@(hyp - Y)
    return theta_vec

It is important not to choose an alpha value that is too large, as this can actually lead to our function diverging: the algorithm repeatedly overshoots the minimum and the modulus of our differential grows with each iteration.

Now let's run this with a value of 0.1 for alpha and 10,000 iterations. These values were chosen via experimentation and could be improved upon. This is done by creating plots of how different values of alpha influence the cost function value over our iteration range, as in the sketch below.
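Here is a minimal sketch of that experiment, reusing cost_function and the gradient descent update from above; the candidate alpha values and iteration count are arbitrary choices, and any curve that shoots upwards indicates an alpha that is too large.

#sketch: track the cost at each iteration for a few candidate learning rates
def gradient_descent_with_history(theta_vec, X, Y, alpha, iterations):
    X_mat = np.concatenate((np.ones((len(X),1)),X),axis=1)
    history = []
    for i in range(0,iterations):
        #record the cost before each update
        history.append(cost_function(theta_vec[0,0], theta_vec[1,0], X, Y))
        hyp = X_mat@theta_vec
        theta_vec = theta_vec - alpha*(1/len(X))*(X_mat.T)@(hyp - Y)
    return history

plt.figure()
for alpha in [0.01, 0.1, 0.3]:
    history = gradient_descent_with_history(np.array([[10.0],[10.0]]), X_log, Y_log, alpha, 200)
    plt.plot(history, label = "alpha = " + str(alpha))
plt.xlabel("Iteration")
plt.ylabel("Cost Function Value")
plt.legend()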

init_theta_0 = 10
init_theta_1 = 10
cost_function_init = cost_function(init_theta_0, init_theta_1, X_log, Y_log)
optimised_theta = gradient_descent(init_theta_0, init_theta_1, X_log, Y_log, 0.1, 10000)
cost_function_at_optimised = cost_function(optimised_theta[0,0],optimised_theta[1,0],X_log, Y_log)

Our initial values for our parameters were chosen at random. For our initial values our cost function value is 35.97. After our gradient descent the cost function dropped to 0.018. Our parameter values at this cost function value are θ0 = 0.2309 and θ1 = -0.8643. These values are around the values we expected from our cost function plot and are not far off from our just-by-eye fit values.

Final Results

Our hypothesis for our log-log graph is hθ(X) = 0.2309 – 0.8643x1. This means our values for our power law coefficients are k = -0.8643 and log(a) = 0.2309, therefore a = 10^0.2309 = 1.702.

So our final relationship from our findings appears to be

Cost/Conv = 1.702 * (Conv rate)^(-0.8643)

Here is how our model fits our data.
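As a sketch, the fitted curve can be overlaid on the original (non-log) data like so, assuming all conversion rates in the data are positive.

#overlay the fitted power law on the original data
plt.figure()
plt.scatter(X, Y, s = 8, marker = 'x', label = "Data")
X_fit = np.linspace(X.min(), X.max(), 500)
plt.plot(X_fit, 1.702*X_fit**(-0.8643), color = 'red', label = "Fitted model")
plt.xlabel("Conversion Rate")
plt.ylabel("Cost Per Conversion")
plt.legend()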

For a simple linear regression our model didn't come out too bad. It seems to struggle at extremely low conversion rates with high costs per conversion, but it captures the general trend in the data. Our model shows that, at lower conversion rates, just a slight increase in the conversion rate could result in a large decrease in cost per conversion. Easier said than done, I know, but CRO is a service offered right here at Blueclaw.

Our model may be improved by capturing further data at these extreme points. Having said that, there are probably other variables that impact our cost per conversion that our model doesn’t account for. We could improve this by moving to something like a multivariate regression approach that may include things like polynomial features and cross-interaction terms.

For reference, the above curve has an R^2 value of 0.79.
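One way this value could be computed is sketched below; note this is an assumption about the calculation, done here in the original (non-log) space.

#compute R^2 for the fitted curve from its residuals
hyp = 1.702*X**(-0.8643)  #model predictions for each example
ss_res = np.sum((Y - hyp)**2)  #residual sum of squares
ss_tot = np.sum((Y - np.mean(Y))**2)  #total sum of squares
r_squared = 1 - ss_res/ss_tot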

Final Thoughts

The main purpose of this blog post was to introduce the type of maths involved in machine learning. In practice, something like this could be done in a few minutes in Excel; however, when moving on to more complex analysis and models it's necessary to move to something like Python.

The maths above also provides a nice basis for more complex methods like multivariate linear regression, logistic regression and even neural networks. Even then, there are plenty of Python libraries like Keras, PyTorch and scikit-learn that can build these models rapidly. Still, I believe it's useful to know the kind of maths going on behind the scenes, to be able to potentially modify these models and, at the very least, understand the inputs, outputs and arguments used in these pre-built functions.

To discuss anything mentioned in this blog post, or to have a chat about how machine learning can be implemented into your digital strategy, get in touch.

Written by

Simran Gill
