Tracking, resolving, storing and presenting AKA.MS short links from social media using Serverless capabilities in Azure

I think I’ve mentioned once or twice that I’m a great fan of building proof of concepts and prototypes. They allow me to learn rapidly without the fear or risk of failing in front of an important customer. And all customers who pay for your time are important, of course.

I’ve long admired the fantastic AKA.MS short link service Microsoft has created internally for its employees. Or at least that’s my understanding, as I only see Microsoft FTEs posting shortened URLs such as
http://aka.ms/surfacethewomen (a great campaign, by the way!) on social media and sometimes also on docs.microsoft.com.

Note: I have no access to, or direct or indirect information about, AKA.MS other than what I see being used publicly on social media. This post is based on my own findings.

I figured it would be interesting to store these shortened links: maybe to share them later (and thus hopefully feed the continuous statistics I hope the folks at Microsoft are gathering), and also to use myself for looking up interesting content I might not easily find through search engines.

Tracking AKA.MS short links

I chose to track these short links on Twitter, as I felt it was the least obtrusive source and has a great amount of data available. If you search for “aka.ms” on Twitter, you get dozens of pages of results.

I wanted to build my solution with the least amount of friction, overhead and cost, as I generally pay for prototypes such as these out of my own pocket. It also teaches me a lot about different services in Azure that I can then use when I design and build solutions for my customers.

I opted to use Logic Apps as the structural engine for tracking, resolving and storing AKA.MS links. Logic Apps allows me to rapidly prototype, re-design and modify a workflow-like solution that bends to my needs.

The idea is very simple, although I spent about an hour shifting between and iterating through different options. In the end, I settled on logic like this:

  1. Pick up tweets that match a search for aka.ms
  2. Verify there’s a URL in there
  3. Look up the real address behind the shortened URL
  4. Look up the title-property of the destination site
  5. Store these in a database
  6. Publish data for viewing

First, I use a simple condition to check the tweet for a matching URL. I was scratching my head for a while, as my condition was initially set to match “aka.ms” but never found any tweets:

"contains": [
"@triggerBody()?['OriginalTweet'
['TweetText']",
"https://aka.ms" ]

It turns out Twitter processes all URLs through its own link shortener called t.co. It kind of makes sense, as they need to gather their own statistics as well. I changed the condition in the Logic App from “aka.ms” to “t.co” and it started triggering correctly.

I hope there’s some logic in all this

When this condition evaluates to true, the Logic App passes the tweet content through additional logic that performs steps 3-6 from the list above.

Resolving and storing AKA.MS short links

I wanted to use a serverless approach, i.e. something that runs in the cloud without a separately provisioned virtual machine or a similar construct that would require periodic attention. Azure Functions, which I used extensively when building the Tenant ID Lookup Tool, is optimal for something like this.

One of the learnings I recall from spending time at numerous Microsoft events and conferences is to keep things simple. This is especially relevant with Azure Functions, so I generally aim for simple methods that fit on one screen.

Thankfully I’ve got a large screen at home where I’m writing this.

The first Function is responsible for finding the exact aka.ms URL by resolving the shortened URL the tweet is advertising. If you’re keeping track, we’re essentially doing a lookup of t.co -> aka.ms -> target site. The logic is quite simple, and I’m sure someone with more time, persistence and effort could do this much more efficiently. But as I said before, the idea is to keep this simple and easy to understand.

This first Function is called akaMSResolverFunction, and it resolves the t.co-mangled addresses to the actual aka.ms addresses. This is done by performing an HTTP GET and picking up the HTTP 301 permanent redirect; the Location response header then contains the target URL we’re interested in.

A sample network capture from Chrome shows a page load for https://aka.ms/surfacethewomen and the HTTP 301 selected from the left navigation:

The Location field reveals the target we need. I’m picking that up and using it in another Function later on.

First, I use a traditional regular expression in C# to verify something useful is about to happen – this complex-looking regex is used to match a valid URL:

// Match https:// URLs in the tweet text
Regex regex = new Regex("https://([\\w+?\\.\\w+])+([a-zA-Z0-9\\~\\!\\@\\#\\$\\%\\^\\&\\*\\(\\)_\\-\\=\\+\\\\\\/\\?\\.\\:\\;\\'\\,]*)?", RegexOptions.IgnoreCase);
Match match = regex.Match(tweetContent);

string akaMSURL = "";

// Disable automatic redirects so the 301 Location header can be read directly
HttpClientHandler httpClientHandler = new HttpClientHandler();
httpClientHandler.AllowAutoRedirect = false;

HttpClient httpClient = new HttpClient(httpClientHandler);

And while I’m at it, it’s easy to build the HTTP client handler as well. I needed the regex because I found my Logic App occasionally let through content that did not match my needs. It does iterate through multiple links, but I haven’t built that logic into the actual Function yet.

Note: You might be wondering at this point how come I’m using Azure Functions for aka.ms addresses – where did the t.co go? Well, the connector that picks up the tweets passes the aka.ms link from the raw tweet data, so we save an extra step.

Next, within akaMSResolverFunction, I iterate through the collection of regex matches and capture the aka.ms addresses:

if (match.Success)
{
	while (match.Success)
	{
		// Request the matched URL without following redirects
		var response = await httpClient.GetAsync(match.Value);

		// The 301 response carries the real target in the Location header
		if (response.Headers.Contains("Location"))
		{
			string untested_akaMSURL = response.Headers.GetValues("Location").FirstOrDefault();

			// verify we have an aka.ms URL
			if (untested_akaMSURL.Contains("aka.ms"))
			{
				akaMSURL = untested_akaMSURL;
			}
		}

		match = match.NextMatch();
	}
}

That completes the first Function, which returns a valid aka.ms URL back to my Logic App.
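
For context, here’s a minimal sketch of what the surrounding HTTP-triggered boilerplate could look like, assuming the classic C# script (run.csx) style and that the Logic App posts the tweet text in the request body. The parameter names and return shape are my assumptions, not the original code:

using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Minimal HTTP-triggered wrapper in the classic run.csx style (a sketch, not the original code)
public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, TraceWriter log)
{
    // The Logic App posts the tweet text in the request body (assumed)
    string tweetContent = await req.Content.ReadAsStringAsync();

    string akaMSURL = "";
    // ... regex matching and t.co -> aka.ms resolution from the snippets above ...

    // Hand the resolved aka.ms URL back to the Logic App
    return req.CreateResponse(HttpStatusCode.OK, akaMSURL);
}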

The next Function is named akaMSLookupFunction, and it looks up what’s behind the aka.ms link it receives from akaMSResolverFunction. It’s super simple:

HttpClient httpClient = new HttpClient();

// Fetch the page behind the aka.ms link and decode the raw bytes as UTF-8
var response = await httpClient.GetByteArrayAsync(akaMSURL);
string html = Encoding.UTF8.GetString(response);

// Pick the contents of the <title> element out of the HTML
string title = Regex.Match(html, @"\<title\b[^>]*\>\s*(?<Title>[\s\S]*?)\</title>", RegexOptions.IgnoreCase).Groups["Title"].Value;

I start by constructing an HTTP client, getting the response for the HTTP GET (against the aka.ms URL, not the t.co URL) and then fetching the title element from the returned page using a regex. This lets me store the results in the database with a nice, searchable title.
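
One small refinement that isn’t in the code above: page titles sometimes contain HTML entities such as &amp;, so decoding them before storing would keep the titles readable. A quick sketch:

using System.Net;

// Decode entities like &amp; and &quot; and trim whitespace before storing the title (optional, not in the original code)
title = WebUtility.HtmlDecode(title).Trim();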

The third step in the Logic App is to store these findings in an Azure SQL database. Logic Apps has a native connector for this, and it dynamically exposes the table structure for population.

Being an accidental DBA has its perks — you can be dangerous in one simple step!

I chose to store all findings in one table, and to keep things simple the datatypes are far from optimized.

“If in doubt, use nvarchar and null”
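
For reference, here’s a rough sketch of what the table could look like. The column names follow the SELECT statement used later in akaMSDump; the id column and the exact datatypes are my assumptions, not the actual schema:

using System.Data.SqlClient;

// One-time table creation; 'str' is the same sqldb_connection string used in akaMSDump below.
// Datatypes are deliberately loose: "if in doubt, use nvarchar and null".
string createTable = @"
    CREATE TABLE akaMS (
        id         INT IDENTITY(1,1) PRIMARY KEY,
        akaMSURL   NVARCHAR(MAX) NULL,
        akaMSTitle NVARCHAR(MAX) NULL,
        timestamp  DATETIME2     NULL
    )";

using (SqlConnection conn = new SqlConnection(str))
using (SqlCommand cmd = new SqlCommand(createTable, conn))
{
    conn.Open();
    cmd.ExecuteNonQuery();
}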

The last Function is akaMSDump, which retrieves a list of recent aka.ms links and constructs a presentable HTML snippet. It connects directly to the same Azure SQL database and selects the top 15 entries:

// Collect the generated HTML list items here
StringBuilder sb = new StringBuilder();

var str = ConfigurationManager.ConnectionStrings["sqldb_connection"].ConnectionString;
using (SqlConnection conn = new SqlConnection(str))
{
	conn.Open();
	var text = "SELECT TOP 15 akaMSURL, akaMSTitle, timestamp FROM akaMS WHERE akaMSURL IS NOT NULL AND akaMSTitle IS NOT NULL ORDER BY timestamp DESC";

	using (SqlCommand cmd = new SqlCommand(text, conn))
	{
		// Execute the query and iterate through the results
		SqlDataReader reader = await cmd.ExecuteReaderAsync();

		if (reader.HasRows)
		{
			while (reader.Read())
			{
				string akaMSURL = reader.GetString(0);
				string akaMSTitle = reader.GetString(1);

				// Each entry becomes a list item linking to the aka.ms URL
				sb.Append("<li>" + akaMSTitle + " (<a href='" + akaMSURL + "'>" + akaMSURL + "</a>)</li>");
			}
		}
	}
}

It’s very old school, and I intend to optimize it further when things inevitably start slowing down as the database grows.

Finally, akaMSDump completes by writing the HTML content to a Blob in Azure Storage:

// Build a storage connection string from the access key and account name in App Settings
string accessKey = ConfigurationManager.AppSettings["storagekey"];
string accountName = ConfigurationManager.AppSettings["storageaccount"];
string connectionString = "DefaultEndpointsProtocol=https;AccountName=" + accountName + ";AccountKey=" + accessKey + ";EndpointSuffix=core.windows.net";

CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);

// Get a reference to the urls.html blob in the html container
CloudBlobClient client = storageAccount.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("html");
CloudBlockBlob blob = container.GetBlockBlobReference("urls.html");

// Upload the generated HTML as the blob's new content
using (Stream stream = new MemoryStream(Encoding.UTF8.GetBytes(sb.ToString())))
{
	await blob.UploadFromStreamAsync(stream);
}
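
One thing to note: for the Web App to read urls.html straight from Blob Storage later on, the html container needs to allow anonymous read access (or you would need to hand out a SAS). A sketch of setting that once with the same SDK; this isn’t part of the Function itself:

// Allow anonymous read access to blobs in the html container (one-time setup, a sketch)
await container.SetPermissionsAsync(new BlobContainerPermissions
{
    PublicAccess = BlobContainerPublicAccessType.Blob
});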

Combining these three Functions and a connector to Azure SQL makes the Logic App logic look like this:

There’s some overlap in running akaMSDump almost continuously, but so far that hasn’t stopped the solution from performing well. I currently have about 100,000 aka.ms entries stored in the table.

The Azure SQL database is a Basic tier instance, costing me around 4.21 €/month, so it’s a worthwhile operational expense. It’s also obvious that with akaMSDump dumping data so actively, it hits the database a bit too much at times.

Entries within the database look clean:

Presenting the data

I wanted something I could easily demo, so obviously a website was needed. I’m running a simple Web App in Azure, using the Free plan. It’s called the AKA.MS Tracker.

If you have a look at the source, there’s only a simple jQuery call to embed the data from Azure Storage:

$(function () {
    $("#urls").load("https://akamsstg.blob.core.windows.net/html/urls.html");
});

I needed to configure CORS to allow this Web App to access the HTML file that is generated by the akaMSDump Function:
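
If you would rather script the CORS rule than click it together in the portal, the same can be done with the storage SDK. A minimal sketch; the Web App origin below is a placeholder, not my actual hostname:

using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;
using Microsoft.WindowsAzure.Storage.Shared.Protocol;

// 'connectionString' is the same storage connection string built in akaMSDump above
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(connectionString);
CloudBlobClient client = storageAccount.CreateCloudBlobClient();

// Add a CORS rule that lets the Web App origin GET blobs from this storage account
ServiceProperties serviceProperties = client.GetServiceProperties();
serviceProperties.Cors.CorsRules.Add(new CorsRule
{
    AllowedOrigins = new List<string> { "https://<your-web-app>.azurewebsites.net" }, // placeholder origin
    AllowedMethods = CorsHttpMethods.Get,
    AllowedHeaders = new List<string> { "*" },
    ExposedHeaders = new List<string> { "*" },
    MaxAgeInSeconds = 3600
});
client.SetServiceProperties(serviceProperties);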

In summary

Yet another fun and interesting project to work on. I’m amazed at the capabilities Azure Functions continues to provide. For this project I built all the code – admittedly not much – by hand within the Azure Portal and did not resort to using Visual Studio or VS Code this time.

I could now host the static website on Azure Storage, but at the time I built the initial skeleton implementation this option wasn’t available. And I like having the expanded options to scale and configure a real Web App, as opposed to just having a container in Azure Storage that is accessible through HTTPS.

It’s obvious that without further optimization and a bit of refactoring, the Azure SQL database is the bottleneck here. This is also evident from akaMSDump hitting the database like there’s no tomorrow:

It’s been interesting to see how scaling, refactoring and optimization affects different parts of the solution. Driving all this through Logic Apps gives a very smooth and agile way for changing the overall architecture, phasing out things I don’t need and introducing new capabilities I might want — all without interrupting the service.

Total cost for running this solution over a month is 4.78 €. Less than a cup of cappuccino in Helsinki!

I hope you enjoyed reading this as much as I enjoyed building this. Thanks for reading!