Preamble
I really am staggered by the pace of change in technology. I first wrote this article in October 2013, shortly after Facebook's React framework was first released, so its impact had not yet reached me. I first paid attention to it with the introduction of David Nolen's Om wrapper. It has quickly disrupted the Clojure UI landscape, replacing serious contenders like Pedestal. Even before that, Evan Czaplicki wrote a Functional Reactive Programming language for the web - Elm. While I can't say that it will significantly alter the RIA landscape, I feel it introduces a significant paradigm shift in the approach to, and construction of, RIAs. So publishing this article now feels more documentary than anything else. The principles of Agile Development, Model-View-Controller, et al. still apply. But now, in July 2014, the tooling has already shifted significantly forward.
Overview
I'm often asked what the best tools and technology stacks are for building a Web Application. For the purposes of this article, I'll focus on more advanced front-ends - what are known as Rich Internet Applications (RIAs), which I treat as synonymous with Single-Page Applications (SPAs). I think it's useful to step back and consider the purpose and conception of RIAs. We can start by thinking back to when most applications were on the desktop. As the internet grew in popularity, Javascript (and Flash) was introduced into browsers. Web pages grew in sophistication until they began to resemble full desktop apps. Now we have things like Google Docs, which are basically our old desktop applications extruded onto the web. I mention all of this to get us thinking about how we should be treating these new web apps: we should treat them like full applications. With that, my opinion is that, to the highest degree possible, we should let our web apps do their own rendering, state changes, business logic, etc. It's a much cleaner design for the server to pass only i) raw HTML template chunks and ii) JSON data from RESTful services. The web app will have enough intelligence to take these and generate a web view, UI functions, state transitions within the UI, etc. I advocate these principles to enforce a clean separation of concerns. They future-proof your app, and allow for easily scaling machine resources or adding new functionality.
With this in mind, as an example, let's consider three MVC Web Frameworks - Ember, Angular, and Backbone. We'll make a semantic comparison between these libraries and, beyond that, look at why a certain library would benefit us from a production, cost, time, or future-planning standpoint. So you can properly judge my position, I'll state from the beginning my opinion that Backbone is usually the best tool for a front-end MVC solution. My experience is that it optimizes i) developer time (i.e. speed to market), ii) production efficiency (it's very lightweight), iii) scalability, and iv) future flexibility. To begin, I present a useful Client-side JS MV* Framework Roundup. It gives a nod to the TodoMVC project. TodoMVC implements the same simple todo app in each of the web MVC frameworks it covers. It's meant to help you select the best one for your needs.
Like Rails, Ember is meant to be an opinionated framework, using common idioms. Views are handled via 2-way binding against rendered moustache templates. Angular is meant to be a way of declaring dynamic views in web-applications. It does this by letting you extend HTML vocabulary for your application. Angular also defines its own set of attributes and markup, which are processed by its JS library to provide browser-specific behaviour. Backbone is intended to be a lightweight and focused way of building single-page applications (or RIAs). It gives structure to web applications by providing models with key-value binding and custom events, collections, views with declarative event handling, etc. It connects it all to an existing API over a RESTful JSON interface.
With the above in mind, I'll begin with my preference to eschew the moustache approach to templates used by Ember. It tangles together the raw HTML template chunks with transformation logic. And it unnecessarily forces web designers to know Javascript or some other logical transformation language, reducing developer efficiency. There are better, more declarative, path-based solutions, like PureJS. Ember also implements rendering logic on the server. This tangles together application functions, reducing future flexibility and scalability. The tangling I described earlier is also why I eschew Angular.
Now, broadly listing a single technology stack will not address enough cases. Below, I'll outline three scenarios, or types of web applications, and an appropriate technology stack baseline for each. With each set of choices, I'll explain the tool and the rationale behind that choice. But I also want to step back again and take a more holistic approach to my solutions. Before the Scenario Breakdown, I'll describe i) my approach to Project Management, ii) my thoughts on Pair Programming, and iii) a good approach for Testing and Test Automation.
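To make the path-based idea concrete, here is a minimal sketch in Python rather than Javascript. It is not PureJS's API; the `resolve` and `render` helpers, the template, and the data are all hypothetical. The point it illustrates is that the HTML stays free of logic, while a separate directive maps element classes to paths in the JSON data.

```python
import xml.etree.ElementTree as ET
from functools import reduce


def resolve(data, path):
    """Follow a dotted path such as 'post.author.name' through nested dicts."""
    return reduce(lambda node, key: node[key], path.split("."), data)


def render(template, directive, data):
    """Fill elements by class name; the template itself carries no logic."""
    root = ET.fromstring(template)
    for class_name, path in directive.items():
        for node in root.iter():
            if node.get("class") == class_name:
                node.text = str(resolve(data, path))
    return ET.tostring(root, encoding="unicode")


# Plain markup a designer could write without knowing any template language.
template = '<div><h1 class="title"></h1><p class="author"></p></div>'
# The directive (hypothetical) maps class names to data paths.
directive = {"title": "post.title", "author": "post.author.name"}
data = {"post": {"title": "Hello", "author": {"name": "Ada"}}}

html = render(template, directive, data)
```

The designer owns the template, the developer owns the directive, and neither has to edit the other's file when the data shape changes.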
Project Management
I think most software projects are good candidates for an Agile software development approach. Consider eXtreme Programming (XP) and Scrum, both Agile methodologies. They are closely aligned, yet with subtle differences. XP uses strict priority order, and prescribes engineering practices (see here). I think it's appropriate to start with Scrum, then introduce elements of XP where needed (e.g. Continuous Integration, TDD, etc).
- With regard to roles within a project, at the very least, most will need the i) Product Owner ii) Team iii) Scrum Master and iv) the Project Manager.
- Sprints of 2 weeks are a good starting point. This would include a Planning Meeting, where the i) tasks / Stories for the sprint are identified ii) Estimated and iii) Prioritized. Teams should also conclude each sprint with a review or Retrospective Meeting. This is where the progress is reviewed and lessons for the next sprint identified. And of course the software will be Delivered to and reviewed by the customer.
- I find Daily Scrums to be overkill for most projects, unless teams can strictly keep them to 5 minutes. However, it's good practice to do constant Backlog Refinement. This is the process of creating stories, decomposing stories into smaller ones, and refining, prioritizing and sizing existing stories using effort / points.
- That leads to the next feature, adding a Points System to tasks. An abstract point system is used to discuss the difficulty of the story, without assigning actual hours.
- Product Backlog is an ordered list of "requirements" that is maintained for a product. It consists of features, bug fixes, non-functional requirements, etc. - whatever needs to be done in order to successfully deliver a working software system.
- Sprint Backlog is a subcomponent of the Product Backlog. It is the list of work the Development Team must address during the next sprint. The velocity of previous sprints will guide the team when selecting stories/features for the new sprint.
- Increment is the sum of all the Product Backlog Items completed during a sprint and all previous sprints.
- Burn Down Chart is a publicly displayed chart showing remaining work in the sprint backlog. Updated routinely, it gives a simple view of the sprint progress.
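As a small illustration of what a Burn Down Chart plots, the sketch below computes the remaining points at the end of each day of a sprint. The function name and the sample numbers are made up for the example; a real team would pull these figures from its tracking tool.

```python
from itertools import accumulate


def burn_down(total_points, completed_per_day):
    """Return the points remaining in the sprint backlog after each day."""
    return [total_points - done for done in accumulate(completed_per_day)]


# A hypothetical 10-day sprint that starts with 40 points of work.
remaining = burn_down(40, [0, 5, 8, 3, 0, 10, 6, 4, 2, 2])
# remaining == [40, 35, 27, 24, 24, 14, 8, 4, 2, 0]
```

Flat stretches in the series (days 4 and 5 here) are the kind of thing the chart is meant to make visible.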
A few other key project artifacts are itemized below. These are needed to maintain efficient management of developer hours:
- Spike - A time boxed period used to research a concept and/or create a simple prototype.
- Velocity - The total effort a team is capable of in a sprint. The number is derived by evaluating the story points completed across the last few sprints' stories/features.
- Tracking - Both these tools have excellent project management features: Pivotal Tracker and FogBugz.
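Velocity and sprint selection can be sketched in a few lines. This is only one reasonable reading of the definitions above: velocity is taken as the average of the last three sprints (the window size is my assumption), and stories are pulled from the ordered backlog in strict priority order until the next one no longer fits. The function names and the backlog are hypothetical.

```python
def velocity(completed_points, window=3):
    """Average story points completed over the most recent sprints."""
    recent = completed_points[-window:]
    return sum(recent) / len(recent)


def plan_sprint(backlog, capacity):
    """Take stories in priority order until the next one would exceed capacity."""
    selected, used = [], 0
    for name, points in backlog:
        if used + points > capacity:
            break  # strict priority order: do not skip ahead to smaller stories
        selected.append(name)
        used += points
    return selected


history = [18, 22, 20, 24]  # points completed in past sprints
backlog = [("login", 8), ("search", 5), ("export", 8), ("billing", 13)]

capacity = velocity(history)            # 22.0
sprint = plan_sprint(backlog, capacity)  # ['login', 'search', 'export']
```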
Thoughts On Pair Programming
I believe software team cohesion is closely tied to how productive and empowered all team members feel. So we discussed starting with a solid development methodology, which usually means an Agile software development approach. A good next step is pair programming. I like the rock-solid code that pair programming usually produces. I find that some of the effects of pairing are: i) each programmer is more thoughtful about how they are designing the system(s), because the person coding is usually required to verbally explain and justify their technical decisions; ii) both programmers usually have a wider breadth of technical knowledge and experience between them; and iii) fewer tangents are taken, due to the constant support of an ever-present partner. Also, pairs can and should switch between coding and supporting. This allows rest for each team member, and usually means the active coder is more fully alert.
Full-time pair programming is a good idea, if your team can afford it. However, it's sometimes necessary for a programmer to either i) quickly try out a solution or technology to better understand the problem domain, or ii) simply take time to think clearly about a problem (which could involve reading books, blogs, etc). So in a full pairing engagement, time apart from coding could reasonably be managed by the pairs themselves.
Testing and Test Automation Solutions
Of course the testing framework would depend on the language in which we choose to implement the system. There are several levels and approaches to testing that are appropriate in each scenario.
- Unit Tests (vs BDD) - Unit Tests address individual units of code. Alternatively, BDD, an outgrowth of TDD, focuses on the behavioural specification of software units.
- Acceptance tests (vs Generative testing) - Acceptance tests address the end-to-end functioning of the system. This is in contrast to Integration tests, which exercise several layers of the system (but not everything). Generative testing is a newer idea, in which code itself generates the test cases. We typically write code to generate test cases according to one or more assumptions we would like to test. This is a good approach for more complex systems, when we want to test unanticipated inputs over a wide range.
- Simulation testing - Simulation testing, derived from disciplines such as engineering, disaster recovery, etc., is meant to be a rigorous, scalable, and reproducible approach to testing. Artifacts from each step (modelling, defining activity streams, execution, result capture, and validation) are captured in a time-aware database, so steps can be run (and re-run, and enhanced) independently of each other.
- Continuous Integration (or Automated build) - Continuous integration (CI) merges all developer working copies with a shared mainline several times a day. Its main aim is to prevent integration problems, upon delivery of the software.
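To show what Generative testing looks like in practice, here is a minimal hand-rolled sketch using only the Python standard library (it does not use Test.generative or any other testing library). The assumption being tested, chosen for illustration, is that any JSON-compatible value survives an encode/decode round trip. The generator and function names are hypothetical.

```python
import json
import random


def random_value(rng, depth=0):
    """Generate a random JSON-compatible value: ints, strings, lists, dicts."""
    kinds = ["int", "str"] if depth >= 3 else ["int", "str", "list", "dict"]
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randint(-1000, 1000)
    if kind == "str":
        return "".join(rng.choice("abc xyz") for _ in range(rng.randint(0, 8)))
    if kind == "list":
        return [random_value(rng, depth + 1) for _ in range(rng.randint(0, 4))]
    return {f"k{i}": random_value(rng, depth + 1) for i in range(rng.randint(0, 4))}


def check_round_trip(cases=1000, seed=0):
    """Assert the round-trip property on many generated inputs."""
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(cases):
        value = random_value(rng)
        assert json.loads(json.dumps(value)) == value, value
    return cases


checked = check_round_trip()
```

We state one general assumption and let the generator produce a thousand inputs, including shapes we would not have thought to write by hand.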
So for example, consider a Ruby on Rails versus a Clojure Compojure application. Generally, the pattern would be:
- Ruby - RSpec (BDD) > Cucumber (Acceptance tests) > CruiseControl (Continuous Integration). This is a well-understood and battle-tested collection of test tools. It gives great test coverage for the simple version of our webapp. Generative or Simulation testing is not warranted in a simpler web application scenario.
- Clojure - Speclj (BDD) > Test.generative (Generative tests) > Pallet (Continuous Integration). Speclj is a clean and straightforward approach to testing that focuses on the behaviour of software units. Test.generative allows us to test the more general assumptions we have about the system. We then let the test tool generate potentially thousands of tests that validate our assumptions. This would be more appropriate than Acceptance tests for dynamic, streaming types of applications. And Pallet is a dev ops automation platform, with excellent integration with Hudson/Jenkins and Clojure build tools. Simulation testing is probably not warranted if the application is more speculative in nature, i.e. users will often create and deploy new algorithms, quickly negating prescribed simulations.
Scenario Breakdown
Before selecting a toolset, it's very important to know a few things about the system:
- What are the core function(s)?
- What is the expected time-to-delivery?
- Where will the delivered application live (incl. network reliability), and with which DBs and services must it communicate?
- Who are its users, and how much load is the application expected to see?
- Who will be maintaining the application upon delivery, and what are their skill-sets?
Scenario A)
This is a Rich Internet Application (akin to Pixelthrone), solely as a web tool, communicating with 3rd party cloud services. It will be a responsive front-end that is capable on smart phones, tablets, and varied screen sizes.
- HAML / SCSS / Coffeescript / PureJS - Haml, Scss and Coffeescript compile down to html, css and javascript, respectively. They're higher level syntaxes that let developers write equivalent output code, in a much shorter amount of time. The added benefits greatly outweigh the added abstraction. PureJS is a lightweight templating tool, that eschews the moustache templating approach. My opinion is that the moustache approach, incorrectly tangles together document structure and logic in the same place. PureJS, instead uses path-like expressions for data locations.
- Backbone - Backbone has a focused and elegant approach to rendering choices. It also has a clean and lightweight approach to managing the internal state of the application (model and controller). And the RESTful server communication is also very consistent and well thought out. In short, these design advantages are what help optimize development and production costs, time, and future planning.
- Bootstrap - Bootstrap is an excellent front end framework with which many developers already have a strong knowledge level. However, there are advantages and disadvantages of this option, and some alternatives.
Advantages
- Every HTML element that could potentially be used is accounted for. Meaning even rare tags, like <dl> , will be elegantly styled and positioned.
- It lays a foundation for consistency that would take a good amount of time to achieve manually. Further, when a developer passes off the deliverable to the client, others will be able to 'extend' the original work without disturbing the general aesthetic.
- Its facility for rapid prototyping and, again, most teams' familiarity with it mean it would be quick to use and efficient.
Disadvantages
- Suboptimal for creating a performance driven web app
- The framework can become too heavy, because so many things (html elements, etc) are included. It can be tough to quickly find what you're looking for. Additionally, troubleshooting unexpected margins and borders and whatnot can be difficult.
- It's not bespoke, or tending toward a higher quality brand. It is a generic solution that a lot of startups use.
- Customizing such a pervasive framework can be very tricky. Changing one thing might mean unintended effects on other elements.
Alternatives
- Foundation is a responsive front-end framework. It lets developers quickly prototype and build sites or apps that work on any kind of device.
- HTML5 Boilerplate is a professional front-end template for building adaptable web apps or sites. It does not impose a specific development framework, freeing the developer to manipulate the code to their needs.
Scenario B)
A basic, SQL-backed webapp; simple set of functions, moderate usage, and Junior Sys Admins maintaining.
- HAML / SCSS / Coffeescript / PureJS - Haml, Scss and Coffeescript compile down to html, css and javascript, respectively. They're higher level syntaxes that let developers write equivalent output code, in a much shorter amount of time. The added benefits greatly outweigh the added abstraction. PureJS is a lightweight templating tool, that eschews the moustache templating approach (see here). I'll reiterate my opinion that the moustache approach, incorrectly tangles together document structure and logic in the same place. PureJS, instead uses path-like expressions for data locations.
- Ruby / Rails / JSON data exchange - Ruby is an excellent dynamic, object-oriented language. It has language features (first-class functions, simple syntax design, etc) that let programmers quickly build out capable, general-purpose solutions. Sinatra is good for simple webapps. However, Rails gives i) better support for REST endpoints, ii) more compatible libraries, and iii) easier setup and migration of SQL database schemas and data. There's a good set of Rails / Sinatra tradeoffs here. And JSON is a well-known and supported data exchange format, especially for RESTful, AJAX calls.
- not Sinatra - see Rails / Sinatra tradeoffs here
- PostgreSQL - The app data is rectangular and related. That makes SQL technology a good fit. The schema and queries will be well known beforehand, meaning they won't require a lot of mutation after delivery. Postgres is a reliable, stable, and well-known RDBMS. It is open source, and has a license that's suitable for commercial purposes (see here).
Scenario C)
A complex, stateful UI, backed by several "big data" stores. Client wants to capture and analyse a constant stream of financial data. Researchers will take this data and need to create and deploy new algorithms and analytics on top of the data. This means real-time analytics, on a constant stream of data; high usage by very senior quantitative analysts and data scientists; maintained by Senior IT Personnel.
- HAML / SCSS - Haml and Scss compile down to html and css, respectively. They're higher level syntaxes that let developers write equivalent output code, in a much shorter amount of time. The added benefits greatly outweigh the added abstraction.
- Coffeescript / PureJS / RequireJS / BackboneJS - Coffeescript compiles down to Javascript. It provides greater expressive power over javascript, using less code. The added benefits greatly outweigh the added abstraction. PureJS is a lightweight templating tool, that eschews the moustache templating approach (see here). I'll reiterate my opinion that the moustache approach, incorrectly tangles together document structure and logic in the same place. PureJS, instead uses path-like expressions for data locations. RequireJS is a very good tool for building component systems necessary in a large, complex thick-client. BackboneJS is a lightweight, well-thought out MVC tool for managing in-browser app state.
- ** almost Clojurescript, Enfocus, Functional Reactive Programming - These technologies would be a much better fit than the above for the kind of real-time sensitive interactions in the app. Clojurescript especially is ideal for computationally intensive, interactive applications (see here). However, the maintenance skill required and the cost are high. I would recommend this over RequireJS and BackboneJS if you have very good IT specialists as maintainers. Enfocus is a templating tool for Clojurescript. Like PureJS, it uses path-like expressions for data locations. Functional Reactive Programming (FRP) is an approach that uses Functional programming techniques to operate on data structures over time. Ideally, we'll want an FRP library that lets us more cleanly transform, compose, and query streams of data (mouse moves, stock streams, etc).
- not Websockets - There's more standard HTTP Server Sent Events (EventSource API)
- Clojure / Pedestal (for SSE support) / Storm / JSON data exchange - Clojure provides a number of language features (first-class functions, homoiconic, immutable data, etc) that make it ideal for building complex, data intensive apps. Pedestal is a tool set for building web applications in Clojure. For this app, it has a number of useful features, such as built in SSE support. Storm is a distributed realtime computation system. It provides a set of utilities for doing realtime computation. I chose it over Hadoop, as Storm is used for real time processing while Hadoop is used for batch processing. JSON is a well-known and supported data exchange format, especially for RESTful, AJAX calls.
- ** almost EDN - would be a better data exchange format than JSON. This data format is extensible, has rich objects, and is serializable. But it's new, not in wide enough use, and not enough people understand it. I would only recommend this format if the client has very Senior maintainers.
- Datomic - I think Datomic is ideal as it i) decouples DB functions such as read & write (see here). It also ii) has a flexible schema model, allowing for changes to data structures, as users learn more about the domain. It also iii) has a sound data model based on time and immutability (more faithfully representing data over time) and iv) a logic-based query language (focus on facts). The downside is the specialized knowledge needed to maintain and query the database. But the advantages, and simplicity of the query language, mitigate those tradeoffs. All these other databases are close considerations. But they don't fit the bill due to their specialized nature. Whereas Datomic covers more ground, in terms of leveraging the data. You can see some DB tradeoffs here.
◦ not Cassandra - Our app will write to DB, more than it reads. And this is Cassandra's main advantage. Most reads will come from big data stream services (via Storm).
◦ not Redis - Good for rapidly changing data sets (but not that much will be needed); but it works best when those data sets all must fit into memory
◦ not Neo4j - This is good for graph-style, rich or complex, interconnected data.
◦ not Couchbase - Good for low-latency and high availability
◦ not VoltDB - Good for reacting fast on large amounts of data
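Part of the appeal of Server Sent Events in Scenario C is how simple the wire format is: plain text over a normal HTTP response, with each event ending in a blank line. The sketch below formats one event in Python. The `format_sse` helper and the tick payload are hypothetical and stand in for whatever the server framework (Pedestal, in my proposal) would produce; only the `id:`, `event:` and `data:` field names come from the EventSource format itself.

```python
import json


def format_sse(data, event=None, event_id=None):
    """Format one Server-Sent Event as text for a text/event-stream response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as one "data:" line per line of text.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


# A hypothetical stock tick, carried as JSON in the data field.
payload = json.dumps({"symbol": "ABC", "price": 101.5})
message = format_sse(payload, event="tick", event_id=42)
```

On the browser side, an EventSource listener registered for "tick" events would receive the JSON string as the event's data.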
Further
These technologies are simply a good baseline when considering building out a Rich Internet Application. There are other options, as with the rising popularity of Javascript on all devices. There's NodeJS on the server, PhoneGap native apps on the mobile device. Tools like Node-Webkit also allow you to create desktop applications with Javascript. And with Tessel, we can even use it on our micro-controllers (ie Arduino).
Beyond tools alone, teams should consider the kinds of language features and architectures appropriate for their needs. Features such as immutable data structures or first-class functions (e.g. closures) offer a lot of benefits and can usually be added in as a library or 3rd-party solution. Beyond that, even, techniques like Combinators and Functional Reactive Programming offer better control, albeit with increased abstraction. You can think creatively. I, personally, prefer tools that offer the greatest amount of expressive power, and that at the same time optimize my i) developer time (i.e. speed to market), ii) production efficiency (keeping things lightweight), iii) scalability, and iv) future flexibility.
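As a rough hint of what composing streams feels like, here is a small sketch using Python generators. It is not an FRP library and leaves out time and event sources entirely; it only shows small stream transformations being chained. The `moving_average` helper and the price list are hypothetical.

```python
from collections import deque


def moving_average(stream, size):
    """Yield the average of the last `size` values seen on the stream."""
    window = deque(maxlen=size)  # older values fall out automatically
    for value in stream:
        window.append(value)
        yield sum(window) / len(window)


prices = [10, 12, 11, 15]                       # stand-in for a live stock stream
above_ten = (p for p in prices if p > 10)       # filter step
smoothed = list(moving_average(above_ten, 2))   # transform step, composed on top
# smoothed == [12.0, 11.5, 13.0]
```

Each step only knows about the stream it consumes, which is what makes the pieces easy to recombine.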
UPDATE (DEC 13, 2020)
I've had a few pieces of feedback on this post. One of them from Nick Schaferhoff, a web developer that has put together a useful tool - the HTML5 Periodic Table.
Being that "humans don't know how to compute" (1, 2), my position is that nature is the best thing to emulate. So right off the bat, I liked the analogy with chemistry. And the organizing principle (analogous to atomic weight) seems to follow the elements along the W3C's spec.
Hope this helps.