Thursday, June 13, 2013

Taking our #WebGL #HTML5 App Native

When the Web Fails..

Sometimes Web Apps (or for the marketing person.... software in the cloud) just don't meet the requirements for the job. We have been looking for an open source solution that packages the Chromium browser into native binaries for OSX and Windows, so we can package and distribute our WebApps as native desktop applications. This allows us to ship the same software to all of our clients and provide both offline and online versions of the software.

Our first try was AppJS...
We were very hopeful that this platform would allow us to basically drop our web app into the native application and ship to any of our customers who wanted an offline version of our software. The issue came when trying to deal with large complex systems. We have been working on an online operating system called JaHOVA OS. Unfortunately, getting AppJS to load JaHOVA OS became quite cumbersome.

Several of our applications are graphics intensive and utilize WebGL for rendering. The application we were trying to port at the time was a 3D cancer visualization tool called CaPTIVE. After tweaking CaPTIVE for about an hour to try to get any response from AppJS, we began looking to see if others had a similar experience. We saw that Thibaut Despoulain from Artillery Games had made a few posts on the forums, and with his successful launch of HexGL (an HTML5/WebGL futuristic themed racing game), we thought it might be worth asking if he was successful.

From his response we figured it was time to move on and try to find another solution.

Enter Node-WebKit FTW!

"node-webkit is an app runtime based on Chromium and node.js. You can write native apps in HTML and Javascript with node-webkit. It also lets you to call Node.js modules directly from DOM and enables a new way of writing native applications with all Web technologies.  It's created and developed in Intel Open Source Technology Center"

Zhao Cheng has a slide deck that goes through some of the basics of node-webkit, and the wiki on the GitHub site is also very useful.

To test out node-webkit we decided to port over our alpha build of Omega Resistance (OR), an HTML5/WebGL Couch Co-Op Space Arcade Shooter with Gamepads. There were a few hitches along the way, but they were easily overcome.






Quirks..

XHR
In OR all of the shaders and models are downloaded via AJAX calls. Since the files are being fetched from the local file store rather than a server, the response status codes are different from those of a standard server. We test for a response status code equal to 200, but with node-webkit the response status code is 0.
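One way to handle both cases is a small success check like the sketch below. This is not the code used in OR; the helper name `isXhrSuccess` is made up for illustration, and it assumes the packaged app loads its pages over the `file:` protocol.

```javascript
// Hypothetical helper (not from OR): decide whether an XHR response succeeded.
// `status` is xhr.status, `protocol` is e.g. window.location.protocol.
function isXhrSuccess(status, protocol) {
  if (status === 200) {
    return true; // normal response from a web server
  }
  // Assumption: local file loads report status 0, so accept 0 only
  // when the page itself was loaded from the local file store.
  return status === 0 && protocol === "file:";
}
```

In a loader this would replace a bare `xhr.status === 200` test, for example `if (xhr.readyState === 4 && isXhrSuccess(xhr.status, window.location.protocol)) { ... }`.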

Shared WebWorkers
Although Shared WebWorkers seem to be available, in our porting process the threaded loading system did not function properly. The reason we added this system was to prevent the page from going non-responsive during asset loading. Since all of the assets are now loaded from a local file store, this system was no longer needed. So we decided to ditch it to see if we could get something running.

This problem does cause concern though, as we are currently working on building GEn3CIS: An HTML5 Based 3D Engine for Gaming and Simulations.  GEn3CIS is highly dependent upon the ability to multithread our subsystems, so for GTL to fully adopt node-webkit, this will be an issue we will have to solve.

Once we worked through these few quirks, we successfully got Omega Resistance up and running in both OSX and Windows based native executables with the same frame-rates as running in a browser.

Final Thoughts

File Size
One thing to be aware of with node-webkit is that the final file size of the executable will be around 50-60 MB. Since the executable has a complete instance of WebKit + NodeJS + WebApp, the overall file size may end up being larger than you expect. This was not an issue for us, but it is something to be aware of if small file size is important.

Enigma Virtual Box
Enigma Virtual Box allows you to package the node-webkit executable with all other needed DLLs and dependent files into a single binary for Windows. This is definitely the approach we wanted to take with our project, but on a clean install of Windows XP the Virtual Box version of the app could not initialize WebGL, while running the application outside of Virtual Box had no issues. We tested the app on our Windows 7 VM and had no issues. Since the app was able to run on XP outside of Virtual Box, there is probably a way around the issue of the app not initializing the WebGL canvas when run inside of Virtual Box.

Chrome Developer Tools
node-webkit allows access to the developer tools console, which is a great help when trying to debug your code. node-webkit also allows the developer to limit the access scope of the developer tools. You can pull up the console inside of the demos below by doing the following:

  • Press CTRL and  ~ to open console
  • Enter "tools" at the prompt


Let's See the Demo...

Check them out yourself, tell us on Twitter how they worked...@GameTheoryLabs or @CoreyClarkPhD

OSX: Download
Windows (Enigma Virtual Box): Download
Windows: Download
Web: Play It On The Web (Chrome)

Presentation From Dallas HTML5 Meetup

Wednesday, June 5, 2013

GEn3CIS: An HTML5 Based 3D Engine for Gaming and Simulations: Part 2

GEn3CIS
Game Engine for 3D Complex Interactive Simulations: Part 2

For a background on the motivations that brought GEn3CIS, check out Part 1 of this series...

We considered various architectures for GEn3CIS, each with its own strengths and weaknesses. Our goal in this part of the series is to take you through our design decisions and logic and lay out the base architecture for GEn3CIS. In the context of this article a sub-system will represent any engine component such as Graphics, Physics, Artificial Intelligence, Networking, Input, Sound, etc. Also, since the focus of the article is building a multi-threaded system in JavaScript, the terms thread and WebWorker are synonymous.


Option 1: 
Run each Sub System in a thread

This is the current implementation used in JaHOVA OS, so if you read Part 1 of the series then you know this will not produce the results we are looking for, but it is still worth discussing.

On the surface this seems like the easiest architecture to implement. While getting everything up and running is fairly trivial, getting the sub-systems to play nice with each other is a very different story.

Sub-System Setup

The major advantage of this architecture is that the sub-systems do not require any re-work. You can take the same serially executed sub-systems and drop them into a multi-threaded environment. This is the biggest draw to using this layout. But as you probably already know, if you simplify one thing, then you are probably complicating another.

Memory Syncing

One of the main drawbacks to this design is keeping the memory shared between each sub-system synced. While implementing this system in a language such as C++ causes a dramatic increase in complexity, JavaScript limits the access to memory between individual threads (WebWorkers). Each WebWorker has its own memory space and is completely isolated from any other process. This prevents having to implement any sort of Critical Sections or Mutex control. The overall accessibility and communication between WebWorkers actually mimics that of the Erlang programming language. Henning Diedrich of Eonblast gave a great talk at GDC Online in Austin last year called "Why ... Erlang". There are over 100 slides in his presentation, and can you believe he went through all of them in an hour, crazy!

Due to the way WebWorkers communicate, copies of world/entity data had to be sent to each sub-system. The sub-system could then make any changes/updates to the dataset and return it to the main application to be synced to the "master copy". The issue comes when two sub-systems want to modify the same dataset: which has priority? In small systems this is not a huge issue, but as the complexity of the scene increases, keeping everything synced can become a problem.

One way around this is to allow only a specific sub-system to update each dataset. This prevents the syncing conflict, as no two sub-systems can update the same piece of data. This was the approach taken with JaHOVA OS.
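A minimal sketch of that ownership rule is shown below. It is not the JaHOVA OS implementation; the `owners` table, the field names, and `syncToMaster` are all hypothetical and only illustrate the idea that a returned copy is merged field by field, and only for the fields its sub-system owns.

```javascript
// Hypothetical ownership table: which sub-system may write which field.
const owners = { position: "physics", target: "ai" };

// Merge the copy returned by a worker into the master copy, keeping only
// the fields owned by the sub-system that sent the update.
function syncToMaster(master, update, subsystem) {
  for (const field of Object.keys(update)) {
    if (owners[field] === subsystem) {
      master[field] = update[field];
    }
    // Fields owned by another sub-system are silently ignored.
  }
  return master;
}
```

With this rule the order in which worker results arrive no longer matters, since two sub-systems can never write the same field.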

Expandability

The second issue this implementation brings is expandability. As of this moment 4 and 8 core machines are becoming common, but 12, 16 and 32 core systems are on the horizon.

Wouldn't it be great if software actually utilized all of the power afforded it by the hardware?  

With this implementation you have no expandability: if you have 4 sub-systems, you use 4 threads. It doesn't matter if the hardware only has 2 cores, you will run 4 threads. Conversely, if the system has 8 cores, you still only use 4... I think we can do better.


Option 2: 
Run each sub-system serially, but thread the subsystem

Sub-System Setup

As you can probably guess, this is one of the biggest downfalls of this implementation. It requires you to rewrite any serially driven sub-system into an equivalent parallel processed system, which is not a trivial task.

Memory Syncing

The greatest advantage to this implementation is the removal of memory syncing issues.  Since each sub-system is executed serially, no two sub-systems will try to update a shared resource at the same time.  This completely removes any need for Critical Sections or Mutex controls in  your code.  A major Win!

Expandability

Since each sub-system can generate as many threads as required for execution, this design should allow for expandability to utilize higher core systems efficiently.
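As a rough illustration of how a sub-system might spread its work over a variable number of threads, the sketch below splits a dataset into one chunk per worker. The function name `splitIntoChunks` and the idea of passing in a `workerCount` are assumptions for the example, not part of GEn3CIS.

```javascript
// Hypothetical helper: split `items` into at most `workerCount` chunks
// of roughly equal size, one chunk per thread.
function splitIntoChunks(items, workerCount) {
  const chunks = [];
  const size = Math.ceil(items.length / workerCount);
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}
```

Each chunk could then be posted to its own WebWorker, so the same sub-system code uses 2 threads on a 2 core machine and 8 on an 8 core machine simply by changing `workerCount`.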

Can We Do Better?

While Option 2 shows great promise and is actually a common architecture used in production for other languages, it still left a bit to be desired for what I was looking for.

Sub-System Has Control

One of the main issues I have with Option 2 is that all control of optimization exists in the sub-system. I suppose this is not an issue if you only plan on using your own sub-systems, but if you plan to allow others to add 3rd party sub-systems, you are giving quite a bit of control over to that 3rd party system. I would rather the control rest with the main engine (GEn3CIS) and have the sub-systems request execution from the engine.

Idle Time

The serial execution allowed for all resource syncing issues to be removed, but is it overkill? If one sub-system is completely done executing on a portion of a dataset, why should a second sub-system have to wait for an unrelated, non-blocking process to complete before it begins execution? I would prefer to see an architecture that could be simplified to Option 2, but could also be expanded to let non-blocked processes begin execution.

Option 3:
Parallel executed sub-systems via Thread Controller with Thread Pool

This layout requires that each sub-system be designed for parallel execution, but also requires a more Functional Programming approach than what is normally seen with regular Object Oriented (OO) design. The idea of moving away from OO code design to Functional code design has its controversies ... here is a good read from John Carmack of id Software on Functional Programming in C++. Each sub-system must be able to create functional code blocks that can be executed over a dataset passed into the code block. The functional code blocks are passed to the Thread Controller. The Thread Controller can then organize and execute code blocks in any available thread inside of the Thread Pool. This moves the overall engine design to Data Oriented vs Object Oriented. Niklas Frykholm from BitSquid has an interesting presentation, "Practical Examples in Data Oriented Design", showing some of the advantages of Data Oriented engine design, although I don't think they will translate over to a JavaScript based system... but maybe asm.js will give some performance boost.

In its simplest form, the Thread Controller can collect the requests from each sub-system and execute them one system at a time, which gives the same functional execution as Option 2. But since other systems have already queued their requests with the Thread Controller, it can also begin executing requests from waiting sub-systems as soon as they are no longer blocked on memory access. This gives the control back to the engine (GEn3CIS) while also limiting idle time.
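The queueing behaviour described here can be sketched roughly as follows. This is only an illustration under assumptions, not the GEn3CIS design itself: the class shape, the request format `{ system, datasets }`, and the rule that two requests block each other when they name the same dataset are all made up for the example, and no real WebWorkers are created.

```javascript
// Hypothetical request shape: { system: "physics", datasets: ["bodies"] }.
class ThreadController {
  constructor(poolSize) {
    this.poolSize = poolSize; // number of threads in the pool
    this.queue = [];          // requests waiting to run, in arrival order
    this.running = [];        // requests currently assigned to a thread
  }

  // A sub-system asks the engine to execute a code block.
  submit(request) {
    this.queue.push(request);
  }

  // True if a running request is using one of the same datasets.
  isBlocked(request) {
    return this.running.some((active) =>
      active.datasets.some((name) => request.datasets.includes(name)));
  }

  // Start every queued request that is not blocked, up to the pool size.
  dispatch() {
    const started = [];
    const waiting = [];
    for (const request of this.queue) {
      if (this.running.length < this.poolSize && !this.isBlocked(request)) {
        this.running.push(request);
        started.push(request);
      } else {
        waiting.push(request);
      }
    }
    this.queue = waiting;
    return started;
  }

  // Called when a thread reports that a request has finished.
  complete(request) {
    this.running = this.running.filter((active) => active !== request);
  }
}
```

With a pool size of 1 this degenerates to one request at a time, much like Option 2. With a larger pool, a request on an unrelated dataset can start while an earlier request is still waiting. A real controller would also need a policy for ordering and starvation, which this sketch ignores.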

This design also allows for complete control over thread creation and therefore can make sure that every sub-system takes complete advantage of all cores available on the given hardware platform. Intel created a paper on "Programming a Parallel Game Engine" that has some similarity to the design shown here as well.

So without further ado, I present the proposed GEn3CIS Architecture...



A Few Concerns...

I have just gone off the Multithreading Deep End in JavaScript... can Web Workers Really handle this?
How many WebWorkers can actually be run at once?
Is this just going to crash browsers?

At this point I feel these are all very valid questions. I have done some background performance testing which, as mentioned in Part 1, was presented at various conferences. Check the presentations out if you want to dive deeper, but I have a recap below that tries to answer some of these concerns.


Have I just gone off the Multi-threading Deep End in JavaScript?

Ya, probably...

Can WebWorkers Really Handle This?

While that has yet to be seen, I can say that after doing quite a bit of testing the overhead caused by using WebWorkers is quite low. Data/Message transmission latency was under 1 ms (<1ms).


How many WebWorkers can actually be run at once?


While I don't have a specific answer to this question, I did try creating 10 threads on the fly and using each one to control a ball on the screen. In this demo all of the physics associated with each ball's movement is calculated in a thread, which sends the updated position back to the main application. The main application then updates the ball position (and shadow) on the screen. You can actually run the demo live here.


Is this just going to crash browsers?
That is a very real possibility...


So we have shown the initial architectural design of how GEn3CIS is going to look, but how is it going to work?
That is Part 3 of the series...







GEn3CIS: An HTML5 Based 3D Engine for Gaming and Simulations: Part 1

A Little Background

Multi-threading in JavaScript is a topic that is near and dear to our heart, but one that does not seem to get much attention or love on the web. I have tried "Bringing The Sexy Back To WebWorkers" by presenting at conferences ranging from Game Developers Conferences (GDC) in Austin, San Francisco and Shanghai, China to HTML5 specific conferences such as HTML5 Developers Conference, Dallas Meetups and Ft Worth Code Camp. My goal was to show the power of Multi-threading in JavaScript and the need to transition our serial code execution to parallel processed execution to start taking advantage of the multicore hardware being used today. This led me to start developing a platform that would allow me to demonstrate just how powerful this technology could be: enter JaHOVA OS.

Most of the slides from past presentations can be accessed from here.

JaHOVA OS
JavaScript HTML5 Online Virtual Application Operating System

JaHOVA OS is an open-source online platform that was designed from the ground up to be flexible and modular. The goal was to have on demand multithreading capability. The OS allowed for any method to be serialized and sent over to a WebWorker so it could be executed in parallel with the main application. You could also load external scripts and connect up to the main OS via a "Thread Controller" (more on this later). To demonstrate this functionality I built a 2 player Networked 3D Game (really just a tech demo) utilizing WebGL, WebSockets, WebWorkers and Node.js called Omega Resistance v0.1.
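To give a feel for what serializing a method can look like in plain JavaScript, here is a minimal sketch. It is not the JaHOVA OS API; `serializeMethod` and `deserializeMethod` are hypothetical names, and the approach assumes the function is self-contained, since any variables it closes over are lost when it is turned into text.

```javascript
// Hypothetical helper: turn a function into a string that could be
// posted to a WebWorker as a normal message.
function serializeMethod(fn) {
  return fn.toString();
}

// Hypothetical helper for the receiving side: rebuild the function from
// its source text. Only works for self-contained functions (no closures).
function deserializeMethod(source) {
  return new Function("return (" + source + ");")();
}
```

On the main thread the string would be sent with `worker.postMessage(serializeMethod(fn))`, and the worker would call `deserializeMethod` on the received text before running the rebuilt function over its data.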


The goal was to maximize my frame rate (Frames Per Second: FPS) while handling Graphics, Physics, Artificial Intelligence and Networking. Keeping FPS above 30 is crucial for real-time graphics.

The results were great!

As you can see from the chart, when using the multithreaded environment my FPS never fell below 35 FPS, but without it the browser was not able to keep up, with over 55% of all frames rendering below the required 30 FPS. The full writeup of the results can be accessed here.

With results that good, I figured let's push my design and just see how far we can go. Sadly, not much further.


JaHOVA OS was built to allow single or multi-threaded operation for anything from basic web applications to complex distributed applications served from the cloud. So once I started adding more complex code to the mix (i.e. Higher Graphics Demands, Full 3D Rigid Body Physics and Collision Detection/Resolution, Higher Level Artificial Intelligence, Gamepads, etc), the FPS dropped. Version 0.2 of Omega Resistance fell below 30 FPS on lower end machines, which hindered gameplay. The video to the right shows off the upgrades made to Omega Resistance, which is online and playable at or.gametheorylabs.com.

After some profiling and debugging the problem became apparent. JaHOVA OS could not meet the needs of a high performance real-time application such as Omega Resistance, due to its design to be both single and multithreaded. I needed to build a custom platform that was designed from the ground up to handle real-time 3D graphics as well as run complex optimization and simulation applications.


From this problem, GEn3CIS was born.

Part 2 of this series will go into various multi-threaded architectures and specifically show the proposed architecture for GEn3CIS.


All code demos can be found in code repos located at git.gametheorylabs.com


Tuesday, April 30, 2013

Upcoming Talks

Game Theory Labs is currently working on two separate talks for the Institute for Operations Research and the Management Sciences (INFORMS) Annual Meeting, showing off some of its latest HTML5 based technology under development.

Emergency Department Simulation to Predict System Impact during Hospital Construction and Remodeling


Baylor Medical Center is undergoing significant and costly facility remodeling and the construction will impact many areas over several months. These changes will likely affect the capacity and efficiency of the Emergency Department (ED). The purpose of this presentation is to discuss how simulation modeling is being used to evaluate these potential impacts and provide the Hospital administration with credible, data-based decision support throughout the process.

The WebGL based 3D environment for the ED Express Care Unit can be accessed via the following link.


Distributed Parallel Process Particle Swarm Optimization on Fixed Charge Network Flow Problems

We are developing a parallel process particle swarm optimization (PSO) on an HTML5 based dynamically distributed system and assess its performance as applied to the multicommodity fixed charge (MCFC) network flow problem. The MCFC problem is motivated by a real-world cash management problem faced by large national banks and is NP-hard. We compare the performance of a serial and distributed parallel process PSO implementation and empirically evaluate the optimality gap for multiple instances.

We are currently in the process of converting JaHOVA OS into a high performance multithreaded game and simulation engine (GEn3CIS). One feature of GEn3CIS is its ability to distribute processing across any internet enabled device with a modern browser. Essentially this allows a user to take their phone, tablet, PC/Mac, etc and utilize their combined computing power to solve any complex simulation, learning, and/or optimization problem.