Monday, May 5, 2014

Dynamic Distributed Processing with HTML5

One of the core technologies we are working on at Game Theory Labs is the ability to create dynamic processing clusters using browsers on any computing device.  The general idea is to apply crowd sourcing to CPU intensive problem solving.  Every browser that connects to the cluster becomes a processing node that shares a portion of the computational load.  Creating this platform on top of HTML5/JavaScript, any device with a browser (phone, tablet, laptop, etc.) can be added to the cluster just by visiting a webpage.

At the end of last year we had the privilege of being invited to speak at the INFORMS Annual Meeting to discuss the use of our HTML5 Dynamically Distributed Parallel Processing infrastructure in solving a Fixed Charge Network Flow (FCNF) problem using Binary Particle Swarm Optimization (BPSO).  You can check out the slide deck here.  In this post we want to discuss the processing infrastructure in more detail and show how HTML5 opens the doors for solving hugely complex problems with everyday devices.


Figure 1.  Layout for sample cluster network

The architecture is comprised of three parts: Control Node (CN), Processing Node (PN) and a Particle.

The CN is the server and is used for signaling and synchronizing the cluster.  Each device that connects to the network is added as a PN where it registers its device type and Operating System.  The PN then spins off worker threads called Particles.  The Particles are where all data processing is preformed.  They are individual threads that are synchronized and controlled directly by the PN.   The CN will notify each PN how many Particles to create based upon user defined settings for each device type.  For example a phone may have one Particle running while a laptop could have 20.  The more computing power a device has, the more Particles it can run in parallel.  Figure 1 gives a layout for a small example network.

Figure 2.  Control Panel for Control Node for a sample cluster network
Control Node (CN)

The CN is built using NodeJS.  Since the CN acts as a signaling and synching device where no processing occurs, we need a server architecture that had a low overhead for a large number of IO connected devices.  NodeJS is built around a non-blocking single threaded event loop.  This works well to keep a very low overhead for connected devices and its non blocking event driven architecture allows for a massive amounts of connection with a minimal memory footprint.  eBay reported being able to host 120,000 simultaneous connections on a single server with each connection taking approximately 2KB of memory.  Not only is the overhead of NodeJS device connections small, it can be easily scaled horizontally giving a truly elastic backend.

Upon connecting to the CN via a WebSocket each devices registers its type (Desktop, Laptop, Tablet or Phone) and OS (iOS, Android, Windows, OSX, Linux).  The CN will instruct each connected device to create the appropriate number of Particles based upon the user defined parameters.  For example, Figure 2 shows the following network characteristics:

Device TypeConnectedParticles/Device

Control Node Example Execution

User connects to the CN control panel (Figure 2) and configures the cluster network.  At a minimum the user must define the number of Particles per device type.  Caution should be given on creating too many Particles for devices with lower computing capability.  For example, given a phone with 8 Particles on a large problem will mostly likely cause the application to crash on that device and thereby remove it from the cluster network.  There is no rule on how many Particles per device that should be created, it fully depends on the problem size (memory required per Particle) and device processing capabilities.

Once the CN is configured it is ready to accept connections from PNs.  As each PN connects the CN control panel will update to show the total number of active devices currently connected to the cluster.

Set Problem
After the desired number of PN have connected to the cluster network the user can send the desired problem set to each device.  In the current implementation, the problem is stored online and is accessed via a URL.  The current cluster is set to solve Linear Programming (LP) problems using Henry Gourvest JavaScript port of the Open Source Linear Programming Kit (GLPK).  Currently the problems must be written in MathProg which is a subset of AMPL.

Now that the problem has been loaded on all PNs the user can start the processing.  When the user selects the 'Start' button on the control panel, the CN will issue a start command to each connected PN and begin listening for updates.

Fitness Updates
Since each PN operates independently, the only way for it to receive a better solution from another PN is by synchronizing with the CN.  Once a PN has found a solution that is better than any existing solution from the current Particles it controls it notifies the CN. The CN saves the best fitness score per PN and if the new solution is a global best it then passes that solution to all connected PN.  This allows the CN to know the best solution that currently exist on the cluster network and which PN found the solution as well as allowing each PN to take advantage of this new information and help guide its heuristics.  The control panel will display the current best solution fitness found as well as the time at which that solution was obtained.

Once the user is ready to stop processing they will press the 'Stop' button on the control panel.  This will issue a stop command to all connected PN.  At this point the CN can be queried to obtain a JSON object for all of the statistics about the current problem session.  The data set includes: Total elapsed time, each solution located, time solution was located, and PN which found solution.

At this  point the process can start over and solve a new problem.

Processing Node (PN)
Figure 3.  Processing Node control panel.
The PN exist on a client device (phone, tablet, laptop, etc) and acts as the communication bridge between the Particles doing the processing and the CN.  The PN therefore has two event loops: one for the CN and one for the Particles.

PN to CN Execution:
Upon loading the PN connects to the CN and registers its device type and OS.  Based upon this registration information the CN will respond with the number of Particles this PN should create for the current problem setup.  Once the user sets the problem on the CN, this information is passed to the PN where it grabs the associated problem file via an AJAX call.  The PN then parses the problem file and loads the appropriate data into each of the Particles created.  Once the problem is loaded the PN has three events it listens for from the CN:

  • Start:  PN receives a 'Start' event, it tells all loaded Particles to update and sets the local status to 'PROCESSING'
  • Fitness Update:  When the CN receives a new global best fitness, it sends that solution to all connected PNs.  The PN then distributed that data to each Particle so it can be utilized in the next processing cycle.
  • Stop: Sets local status to 'IDLE' which prevents further Update calls to Particles once they complete current processing cycle.

PN to Particle Execution:
Once the PN registers it creates all of the Particles requested for the given problem and then passes the problem data to each Particle.  After initialization of the Particle, the PN starts processing on a given Particle by issuing the "Update" message.  The PN then listens for the Particles 'Complete' event.  Once this event is fired, the PN checks the solution returned by the Particle.  If the solution is better than any other solution found by all Particles located on the PN (local) then it updates all local Particles with a new solution.  If the solution is better than any solution found on the entire network (global) then it sends solution to CN to update all PNs on the cluster.  The PN will then issue another 'Update' command to the Particle to force another calculation cycle, unless CN has issued a 'Stop' command which has put the local status to 'IDLE'

Each Particle on a PN is an independent thread.  They are each created as a WebWorker (you can read more about WebWorkers in this presentation Bringing The Sexy Back to WebWorkers) and execute in a completely independent JavaScript environment.  Each Particle can only communicate with the PN that spawned it via JSON messages.  The Particle is setup with the following default events, but more can be added depending on application.

  • Initialize:  The PN calls initialize once the problem has been defined and passes this information to the Particle so it can prepare for execution
  • Update:  When this event is triggered the Particle will perform a single execution on the problem and pass back the result to the PN for evaluation.
  • Fitness:  When the PN is updated with a new fitness (locally or globally) this event is triggered which allows the Particle to update heuristic information for the next execution cycle.
  • Solution:  This event will return the best solution located on the Particle.  This is only used after computation has stopped and data is being collected for analysis by the user.

Figure 4. The Dynamically Distributed Cluster Network

We took the implementation described above and created a small cluster network (Figure 4) that included the following devices: PC Desktop, PC Laptop, SurfacePro, MacBook Pro, iPad, iPhone, Nexus 7.  All Windows and Android devices used Chrome while iOS/OSX devices used Safari.   

We chose to solve a Time Spaced Fixed Charge Network Flow (FCNF) Problem using Binary Particle Swarm Optimization (BPSO).  Each node in the network has a given excess or deficit of supplies.  The problem requires supplies to be transferred across the network to meet all demands at every node.  The solution is the path that solves all demand on network while minimizing cost (fixed and variable) occurred when crossing a given arc.


The possible solutions for the smallest problem (5n6p) is equivalent to 2^245 (approximately 5.6 x 10^73).  As you can see the solution spaces are quite large and require a huge amount of processing to solve.  We ran 3 different test on each problem for 10 minutes and recorded the results.  
  • GLPK (Single Machine)
  • Serial - BPSO (Single Machine)
  • Distributed - BPSO (Distributed Network)

The distributed network outperformed both the standard GLPK and Serial BPSO in all test.  The results for the 5n6p problem are summarized in the tables below.  Overall the distributed network found a better solutions (907k) and was also able to perform 4.3 times more calculations than the serial implementation (300K to 1.3M).

GLPKSerial BPSODistributed BPSO
Fitness (Lower is better)1.3M937K907K
Calculations (Higher is better)285K300K1.3M

Serial vs Distributed
Fitness37K Improvement
Calculations4.3x more solutions searched

Another interesting result is displayed in Figure 5 below.  This diagram shows the solutions found over time for the serial and distributed implementations of the BPSO algorithm.  The part we found interesting was that the Nexus 7 found the last 4 solutions (including the best solution overall) even though it was one of the weakest devices, hardware wise, in cluster.  This was another conformation on how using all computing devices together in a cluster (no matter how inferior of a device) is beneficial.

Figure 5.  Displaying 5n6p solutions over time and device that located solution

No comments:

Post a Comment