Rediscovering old reports is scary

Posted on February 22, 2014

Not long ago @ftrain and I had this conversation on Twitter. This is the post I said I would write, ta-da… or something. I’m not that great at blogging and I tend to ramble, so bear with me a little.

The projects I was working on were primarily related to updating some bioinformatics applications from their existing C# and Windows Foundation implementation to HTML5 and JavaScript.

Now, the reports and code I had originally written aren’t as directly related as I remembered. But the conclusions I came to have stuck with me and I’ve been mulling them over ever since.

The first report relates to using the GPU via WebGL for computation and visualisation of Sammon map data, for investigating the relationships between genomes and genes for various organisms.

It can be found here. Don’t laugh too hard! This was my first real attempt at GPU programming for visualisation and computation, but it was immediately appealing.

The second report is here. This report was quite well received by my supervisor and led to this presentation.

The second report related to a project targeting a larger web application designed to collect various visualisations in a modular structure. One of the problems we faced was that, with larger datasets, the front end would freeze whilst the computations were performed and the visualisation was created and finally rendered.

My solution was to take the data and chunk-a-fi it, perform the computation within a Web Worker, and return each package to the front end, where the main thread would handle the visualisation. This meant we could handle much larger datasets and still have a responsive front end, with a visualisation handling the constant stream of input data from the Worker.
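To give a rough flavour of the chunking approach, here is a fresh sketch rather than the original project code; the message shapes, chunk size and the `updateVisualisation` function are all made up for illustration:

```javascript
// worker.js — the expensive computation runs here, off the main thread
self.onmessage = function (event) {
  var chunk = event.data.points; // one chunk of the larger dataset
  var results = chunk.map(function (p) {
    // stand-in for the real computation
    return { id: p.id, value: Math.sqrt(p.x * p.x + p.y * p.y) };
  });
  // stream the processed chunk straight back to the main thread
  self.postMessage({ chunkIndex: event.data.chunkIndex, results: results });
};
```

```javascript
// main.js — chunk the data, hand it to the worker, render as results trickle in
var worker = new Worker('worker.js');
var CHUNK_SIZE = 1000;

worker.onmessage = function (event) {
  updateVisualisation(event.data.results); // hypothetical rendering function
};

function processDataset(dataset) {
  for (var i = 0; i < dataset.length; i += CHUNK_SIZE) {
    worker.postMessage({
      chunkIndex: i / CHUNK_SIZE,
      points: dataset.slice(i, i + CHUNK_SIZE)
    });
  }
}
```

The main thread never blocks on the computation itself; it only wakes up to draw each chunk as it arrives.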

I am getting to a point, just bear with me.

The part that I ended up being most interested in, but not in a position (at the time) to follow through with, was the Bonus Level from the presentation…

Any device capable of viewing a web page and running JavaScript would be able to lend its cores to a large compute problem.

Obviously the people running those Bitcoin ads have already worked this out, and to a lot of people this wouldn’t be new information…

But to me this seemed really really awesome.. :D

Given a truly massive dataset and an embarrassingly parallel computation that needs to be performed, I think it should be possible to create an application that collects a data packet from a central source, performs the computation, and returns the result, asking for another packet if appropriate.

More importantly, this should be possible to achieve using a web-based application that can co-exist on a webpage, or be the sole purpose of the page itself, thus representing a ‘distributed by design’ application for mass computation. You don’t even have to design the delivery mechanism or create operating-system-specific binaries that then require constant maintenance. …Okay, mostly. Cross-browser functionality and close-to-universal web apps aren’t simple beasts.
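A minimal sketch of what that client-side work loop might look like; the `/work` and `/result` endpoints, the packet format and the `compute` function are all assumptions for illustration, not an existing API:

```javascript
// client.js — fetch a packet, compute, return the result, ask for the next one
function requestPacket() {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', '/work');
  xhr.onload = function () {
    if (xhr.status === 204) { return; } // no work left, stop politely
    var packet = JSON.parse(xhr.responseText);
    var result = compute(packet.payload); // the embarrassingly parallel bit
    submitResult(packet.id, result);
  };
  xhr.send();
}

function submitResult(id, result) {
  var xhr = new XMLHttpRequest();
  xhr.open('POST', '/result');
  xhr.setRequestHeader('Content-Type', 'application/json');
  xhr.onload = requestPacket; // and around we go again
  xhr.send(JSON.stringify({ id: id, result: result }));
}

function compute(payload) {
  // stand-in for the real computation (ideally delegated to a Web Worker, as above)
  return payload.reduce(function (sum, x) { return sum + x; }, 0);
}

requestPacket();
```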

This sort of thing has been done before; the finest example I know of is SETI@home, which operates in an almost identical fashion.

Workplace Example

In a given office you may have ~100 employees; of those, maybe 80 have a smartphone. Each smartphone may have anywhere from 1-4 CPU cores, with 1-4 or even more cores on the GPU. Additionally, there may be many tablet owners amongst them, so maybe another 40 tablets, with 1-4 CPU cores and even MORE GPU cores available.

That’s not counting the desktop machines around the place, but for this example let’s assume everybody needs those machines dedicated to their current task.

Using napkin pseudo-nothing-like-maths, that’s…

(80 * avg(1-4 CPU cores) & avg(1-n GPU cores)) + (40 * avg(1-4 CPU cores) & avg(1-n GPU cores))
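To put hypothetical numbers on that: assuming an average of just two CPU cores per device, that’s (80 × 2) + (40 × 2) = 240 CPU cores, before a single GPU core is counted.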

That makes for many, many cores that, for a large proportion of the day, are sitting relatively idle.

A site established on the internal network could host a dedicated page containing the processing application, happily and quietly exchanging packages and results with a supervisor (there’s a rough sketch of that supervisor after the list below) and turning all the devices into an internal distributed computing cluster that is:

  • Quiet
  • Low power
  • Requires minimal, if any, additional infrastructure
  • Admittedly only really works during office hours. ahem
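For completeness, here is a rough sketch of what the supervisor side might look like, assuming Node.js with Express; the toy queue, the `/work` and `/result` endpoints and the packet shape are all illustrative:

```javascript
// supervisor.js — hands out work packets and collects results
var express = require('express');
var bodyParser = require('body-parser');

var app = express();
app.use(bodyParser.json());

// a toy queue; in practice this would be generated from the real dataset
var queue = [{ id: 1, payload: [1, 2, 3] }, { id: 2, payload: [4, 5, 6] }];
var results = {};

// hand out the next packet, or 204 when the queue is drained
app.get('/work', function (req, res) {
  var packet = queue.shift();
  if (!packet) { return res.status(204).end(); }
  res.json(packet);
});

// collect a finished result from a device
app.post('/result', function (req, res) {
  results[req.body.id] = req.body.result;
  res.status(200).end();
});

app.listen(3000);
```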

University / Large Business Example

Basically the same as the example above, except with maybe ~10k students and staff, although the sysadmins may hate you when they see what you’re doing to their network throughput.

Other Bits…

For a large number of organisations this would provide an in-house computing cluster with minimal expenditure on high-end servers. Schools and other why-the-fudge-aren’t-they-better-funded organisations would be able to use such clusters for all sorts of weird and wonderful things!

Another aspect is that the devices in question are often considered to be ‘low power’ devices. However, if we’re suddenly utilising their full computing potential in this way, we will quickly drain their batteries and annoy a lot of people.

There is a simple solution, though: “Just plug all the devices into the nearest USB port!” This does, however, add the cost of powering all of these devices to the cost of running the ‘cluster’.

An alternative to this, and one that has been proven out by the truly brilliant Raspberry Pi community, is to use solar power stations to charge all of the devices. This not only provides a free and effective method for keeping the devices charged under normal use, but in essence gives you a solar-powered, air-cooled, silent computing cluster!!

It should be obvious by now as well that there is MASSIVE potential for all of the devices that would normally be trashed. Simply plug them into your ever-growing cluster for more computing capability! You might need to upgrade your solar setup a little over time, but the savings on power and infrastructure investment alone should nicely cover that.

Enough from me

These ideas haven’t actually resulted in me creating anything yet, which feels silly as they seem, at least on the face of it, relatively straightforward.

“How hard can it be?”

Given my current obsession with learning programming languages, I am sure I have sufficient tools in the box to achieve at least a prototype. I should probably get on that…