Monday, May 5, 2014

Dynamic Distributed Processing with HTML5

One of the core technologies we are working on at Game Theory Labs is the ability to create dynamic processing clusters using browsers on any computing device.  The general idea is to apply crowd sourcing to CPU intensive problem solving.  Every browser that connects to the cluster becomes a processing node that shares a portion of the computational load.  Creating this platform on top of HTML5/JavaScript, any device with a browser (phone, tablet, laptop, etc.) can be added to the cluster just by visiting a webpage.

At the end of last year we had the privilege of being invited to speak at the INFORMS Annual Meeting to discuss the use of our HTML5 Dynamically Distributed Parallel Processing infrastructure in solving a Fixed Charge Network Flow (FCNF) problem using Binary Particle Swarm Optimization (BPSO).  You can check out the slide deck here.  In this post we want to discuss the processing infrastructure in more detail and show how HTML5 opens the doors for solving hugely complex problems with everyday devices.

Architecture

Overview
Figure 1.  Layout for sample cluster network


The architecture is comprised of three parts: Control Node (CN), Processing Node (PN) and a Particle.

The CN is the server and is used for signaling and synchronizing the cluster.  Each device that connects to the network is added as a PN where it registers its device type and Operating System.  The PN then spins off worker threads called Particles.  The Particles are where all data processing is preformed.  They are individual threads that are synchronized and controlled directly by the PN.   The CN will notify each PN how many Particles to create based upon user defined settings for each device type.  For example a phone may have one Particle running while a laptop could have 20.  The more computing power a device has, the more Particles it can run in parallel.  Figure 1 gives a layout for a small example network.



Figure 2.  Control Panel for Control Node for a sample cluster network
Control Node (CN)

The CN is built using NodeJS.  Since the CN acts as a signaling and synching device where no processing occurs, we need a server architecture that had a low overhead for a large number of IO connected devices.  NodeJS is built around a non-blocking single threaded event loop.  This works well to keep a very low overhead for connected devices and its non blocking event driven architecture allows for a massive amounts of connection with a minimal memory footprint.  eBay reported being able to host 120,000 simultaneous connections on a single server with each connection taking approximately 2KB of memory.  Not only is the overhead of NodeJS device connections small, it can be easily scaled horizontally giving a truly elastic backend.

Upon connecting to the CN via a WebSocket each devices registers its type (Desktop, Laptop, Tablet or Phone) and OS (iOS, Android, Windows, OSX, Linux).  The CN will instruct each connected device to create the appropriate number of Particles based upon the user defined parameters.  For example, Figure 2 shows the following network characteristics:


Device TypeConnectedParticles/Device
Desktop18
Laptop24
Tablet12
Phone11

Control Node Example Execution

Configure
User connects to the CN control panel (Figure 2) and configures the cluster network.  At a minimum the user must define the number of Particles per device type.  Caution should be given on creating too many Particles for devices with lower computing capability.  For example, given a phone with 8 Particles on a large problem will mostly likely cause the application to crash on that device and thereby remove it from the cluster network.  There is no rule on how many Particles per device that should be created, it fully depends on the problem size (memory required per Particle) and device processing capabilities.

Register
Once the CN is configured it is ready to accept connections from PNs.  As each PN connects the CN control panel will update to show the total number of active devices currently connected to the cluster.

Set Problem
After the desired number of PN have connected to the cluster network the user can send the desired problem set to each device.  In the current implementation, the problem is stored online and is accessed via a URL.  The current cluster is set to solve Linear Programming (LP) problems using Henry Gourvest JavaScript port of the Open Source Linear Programming Kit (GLPK).  Currently the problems must be written in MathProg which is a subset of AMPL.

Start
Now that the problem has been loaded on all PNs the user can start the processing.  When the user selects the 'Start' button on the control panel, the CN will issue a start command to each connected PN and begin listening for updates.

Fitness Updates
Since each PN operates independently, the only way for it to receive a better solution from another PN is by synchronizing with the CN.  Once a PN has found a solution that is better than any existing solution from the current Particles it controls it notifies the CN. The CN saves the best fitness score per PN and if the new solution is a global best it then passes that solution to all connected PN.  This allows the CN to know the best solution that currently exist on the cluster network and which PN found the solution as well as allowing each PN to take advantage of this new information and help guide its heuristics.  The control panel will display the current best solution fitness found as well as the time at which that solution was obtained.

Stop
Once the user is ready to stop processing they will press the 'Stop' button on the control panel.  This will issue a stop command to all connected PN.  At this point the CN can be queried to obtain a JSON object for all of the statistics about the current problem session.  The data set includes: Total elapsed time, each solution located, time solution was located, and PN which found solution.

At this  point the process can start over and solve a new problem.

Processing Node (PN)
Figure 3.  Processing Node control panel.
The PN exist on a client device (phone, tablet, laptop, etc) and acts as the communication bridge between the Particles doing the processing and the CN.  The PN therefore has two event loops: one for the CN and one for the Particles.

PN to CN Execution:
Upon loading the PN connects to the CN and registers its device type and OS.  Based upon this registration information the CN will respond with the number of Particles this PN should create for the current problem setup.  Once the user sets the problem on the CN, this information is passed to the PN where it grabs the associated problem file via an AJAX call.  The PN then parses the problem file and loads the appropriate data into each of the Particles created.  Once the problem is loaded the PN has three events it listens for from the CN:

  • Start:  PN receives a 'Start' event, it tells all loaded Particles to update and sets the local status to 'PROCESSING'
  • Fitness Update:  When the CN receives a new global best fitness, it sends that solution to all connected PNs.  The PN then distributed that data to each Particle so it can be utilized in the next processing cycle.
  • Stop: Sets local status to 'IDLE' which prevents further Update calls to Particles once they complete current processing cycle.

PN to Particle Execution:
Once the PN registers it creates all of the Particles requested for the given problem and then passes the problem data to each Particle.  After initialization of the Particle, the PN starts processing on a given Particle by issuing the "Update" message.  The PN then listens for the Particles 'Complete' event.  Once this event is fired, the PN checks the solution returned by the Particle.  If the solution is better than any other solution found by all Particles located on the PN (local) then it updates all local Particles with a new solution.  If the solution is better than any solution found on the entire network (global) then it sends solution to CN to update all PNs on the cluster.  The PN will then issue another 'Update' command to the Particle to force another calculation cycle, unless CN has issued a 'Stop' command which has put the local status to 'IDLE'


Particle
Each Particle on a PN is an independent thread.  They are each created as a WebWorker (you can read more about WebWorkers in this presentation Bringing The Sexy Back to WebWorkers) and execute in a completely independent JavaScript environment.  Each Particle can only communicate with the PN that spawned it via JSON messages.  The Particle is setup with the following default events, but more can be added depending on application.


  • Initialize:  The PN calls initialize once the problem has been defined and passes this information to the Particle so it can prepare for execution
  • Update:  When this event is triggered the Particle will perform a single execution on the problem and pass back the result to the PN for evaluation.
  • Fitness:  When the PN is updated with a new fitness (locally or globally) this event is triggered which allows the Particle to update heuristic information for the next execution cycle.
  • Solution:  This event will return the best solution located on the Particle.  This is only used after computation has stopped and data is being collected for analysis by the user.

Application
Figure 4. The Dynamically Distributed Cluster Network

Setup:
We took the implementation described above and created a small cluster network (Figure 4) that included the following devices: PC Desktop, PC Laptop, SurfacePro, MacBook Pro, iPad, iPhone, Nexus 7.  All Windows and Android devices used Chrome while iOS/OSX devices used Safari.   

Problem:
We chose to solve a Time Spaced Fixed Charge Network Flow (FCNF) Problem using Binary Particle Swarm Optimization (BPSO).  Each node in the network has a given excess or deficit of supplies.  The problem requires supplies to be transferred across the network to meet all demands at every node.  The solution is the path that solves all demand on network while minimizing cost (fixed and variable) occurred when crossing a given arc.


ProblemsNodesArcs
5n6p30245
10n21p2103,890
20n30p60023,000

The possible solutions for the smallest problem (5n6p) is equivalent to 2^245 (approximately 5.6 x 10^73).  As you can see the solution spaces are quite large and require a huge amount of processing to solve.  We ran 3 different test on each problem for 10 minutes and recorded the results.  
  • GLPK (Single Machine)
  • Serial - BPSO (Single Machine)
  • Distributed - BPSO (Distributed Network)
Results

The distributed network outperformed both the standard GLPK and Serial BPSO in all test.  The results for the 5n6p problem are summarized in the tables below.  Overall the distributed network found a better solutions (907k) and was also able to perform 4.3 times more calculations than the serial implementation (300K to 1.3M).

GLPKSerial BPSODistributed BPSO
Fitness (Lower is better)1.3M937K907K
Calculations (Higher is better)285K300K1.3M

Serial vs Distributed
Fitness37K Improvement
Calculations4.3x more solutions searched

Another interesting result is displayed in Figure 5 below.  This diagram shows the solutions found over time for the serial and distributed implementations of the BPSO algorithm.  The part we found interesting was that the Nexus 7 found the last 4 solutions (including the best solution overall) even though it was one of the weakest devices, hardware wise, in cluster.  This was another conformation on how using all computing devices together in a cluster (no matter how inferior of a device) is beneficial.

Figure 5.  Displaying 5n6p solutions over time and device that located solution

ViRIS: Virtual Reconfigurable Intelligent Systems

Prologue, Rise Of The Resistance:

In the 21st century the rapid evolution and adoption of science and technology created a new tech based economy which allowed for soaring profits in the private sector.  Technology corporations began to thrive and became the only financially viable entities.  Countries began to fail and were forced to sell land and resources to the corporations to stay solvent.  In 2087 countries as we know it were dissolved which made these corporations the new world powers. 

This style of government was a mixture of communism and capitalism.  A free world economy but controlled and directed by a few.  These large corporations were the tops in every major industry.  They formed “The Alliance”.  A corporate conglomeration that brought peace to the world, through the tight control over the worlds resources.  They controlled every job available in the civilized world.  Those who chose to live outside of their control were forced to live in a world with out technology, medicine, agriculture and manufacturing.  They lived an archaic life, in a futuristic world.

With the tight grip on the world’s resources, The Alliance was able to make huge strides in technology.  Advanced Artificial Intelligence, Quantum Computing and Communications, Nanoscale manufacturing and sub atomic particle manipulation all became possible in just a few short years after the Alliance was formed.  These technologies lead to the exploration of the known universe as well as the expansion of The Alliance beyond Earth.

The Alliance began to push ubiquitous/pervasive computing and ambient intelligence connecting everything and everyone to their technology, forming massive computer networks that allowed for collecting, sharing and dissemination of information.  This was meant to solidify The Alliance’s control of the world by increasing its dependence on technology, but this was the singular event that would lead to rise of the only group to battle the Alliance and its grip on humanity.

This group was the last hope, the last opportunity to break the tight grip the Alliance had on the world.  They became know as the Omega Resistance (OR).

No one knows who makes up the OR, not even its own members.  It is a network of computer savvy hackers that have infiltrated The Alliance core networks and used its push for pervasive computing against them. OR did not have the access to large server farms or the hardware backend that The Alliance maintained, and had no way of creating or purchasing such technology due to The Alliances control on the market.  But with the connection of all electronic devices to the Internet, dubbed the Internet of Things (IoT),  and The Alliances main network, they realized they didn’t need standalone systems, instead OR embedded small pieces of code over hundreds of thousands of computing devices.  At any given moment there were plenty of devices online that held small pieces of OR code, that allowed for them to develop a virtual cloud based distributed cluster network.  ORs network lived inside of The Alliances very own network, it was a virus that effected well over 50% of all connected machines. 
The premise of ORs network was built around Swarm technology developed years earlier.  Their idea was instead of a single complex machine/algorithm/application,  they combined 100’s or 1,000’s of small simple machines/algorithms/applications that could be combined to perform the task of a single complex system.  This allowed for built in redundancy, as if one node in the Swarm was to fail, there were plenty of others that could easily pick up the required task.  OR took every need of a complex network and broke it down to simple individual nodes.  These nodes were then distributed across The Alliances network undetected, as each node individually was harmless to the network.   Once distributed OR could then dynamically connect to infected machines and form a powerful computing network on the fly, and subsequently shut down the network if The Alliance got to close.

ViRIS, A New Beginning:

With this new network capability OR began to go on the offensive.  Building on the same techniques they had seen in previous cyber attacks, they devised an algorithm similar to a "fork bomb" that could infiltrate a machine and begin replicating itself.  This process would quickly overload the device and cause the system to shut down.  Upon rebooting ORs code would then control the device by disrupting its communication with the Alliance network and re-routing all traffic through its own system.  This then allowed OR to repurpose the device and use it to support its own needs.  This new algorithm became know at ViRIS: Virtual Reconfigurable Intelligent Systems.

This new guerrilla warfare tactic gave OR a chance, but the Alliance was quick to react.  The Alliance was unable to solve this "Zero Day Attack" OR used in accessing their devices, so the implemented a new protocol that erased all operational data, instructions and algorithms on a machine once it  lost connection to Alliance's network.  This gave OR access to raw hardware devices they had infected with ViRIS, but they were not capable of performing any complex or routine activities without further programming and instruction.

Now with a way to procure equipment and facilities, ORs new mission was to train these autonomous machines to do their bidding.  OR operatives began using there massive distributed computing infrastructure to create a simulation environment where they could experiment with various Machine Learning algorithms and techniques, such as Genetic Algorithms, Artificial Neural Networks, Particle Swarm Optimization, Computer Vision, etc.  Once they were able to train a machine to perform a task it was added to ViRIS and upon reboot this new algorithm was loaded into the machine.

The Resistance has to work quickly, as the Alliance is furiously trying to shut down the Zero Day vulnerability being exploited.  Once they lose this opportunity they may never again regain the ability to remotely take over Alliance equipment.


Gameplay:

ViRIS takes a new twist on Real Time Strategy (RTS) gaming.  Instead of micro controlling your units they are autonomous controlled units you have trained via various machine learning algorithms.

You have joined Omega Resistance and were given various goals to complete and must train your army of ViRIS infected devices to complete your mission.  You will create mock scenarios in the virtual sandbox area and train your robots for the various task required to successfully complete the mission assigned.  You can utilize one of the many algorithms currently loaded in ViRIS, create your own via the ViRIS interface,  or purchase them through the black market.  You can connect multiple computing devices (phones, tablets, laptops, etc) to increase the size of ORs distributed network and thereby train  your devices faster.

Once you feel training is complete and are ready for the task, you carry out your mission by infecting Alliances devices and watching how well your training preforms.  To keep your location safe, after launching ViRIS you sever all communication with the devices you infect.  You can now only watch and see if you have trained them well enough to accomplish the orders you were given.