Programming for PaaS

Chapter 3. Types of PaaS

In the previous chapters we briefly discussed the concept of portability, which lets you move applications and deploy them on different systems. While portability is in many cases an attractive feature, there are trade-offs to both portable and non-portable PaaS.

Non-Portable: Following a Template

With a non-portable PaaS you build an application by writing code around the unique specifications and APIs of that platform.

This means that the structure of your code needs to adhere very strictly to a certain template or API. The APIs might be centered on the service’s databases, storage mechanisms, or search mechanisms. Other times, the APIs are lower level and code related. Sometimes you must even use specialized languages that are built only for that platform.

As you can see, there can be various types of hooks into a platform that make it non-portable. The earliest forms of Platform-as-a-Service were built around these highly structured ideas. They were the underpinnings of the early experiments that turned into what we now know as Platform-as-a-Service.

But questions quickly arose. Why should you write your code around a proprietary API? Are the benefits and access to the data worth the lack of flexibility? Before we examine how a new generation of companies answered those questions, let’s take a look at some of the major players in the non-portable PaaS category.

Force.com

Launched in 2008, Force.com, Salesforce’s developer platform, allows you to build applications that extend the functionality of Salesforce, a very popular SaaS customer relationship management (CRM) sales tool. This was one of the first incarnations of PaaS as we know it today. Force.com inspired a generation of applications, and thousands of developers built new ways to access and analyze Salesforce’s rich dataset.

The Force.com Platform-as-a-Service provides Web Service APIs and toolkits, a user interface for building apps, a database for persistent data, and basic website hosting functionality. More recently, it has added mobile capabilities that make it easy to create mobile applications using the Force.com PaaS as well.

The downside to using Force.com is that you cannot build general-purpose applications using any programming language you want. Application logic is built and run using Apex, a strongly typed, object-oriented, Java-like language that runs on Force.com. The upside is that getting started creating new applications is simple and fast using the web-based interface. You choose the data types you want to collect, creating data structures that can autogenerate web and mobile interfaces for collecting that data.

On top of data input options, you can also quickly add formulas, approval processes, email sending, and tests. The Force.com PaaS even has its own Force.com IDE for developing, testing, and deploying applications.

When you work within the constraints of the Force.com platform, you do not have to worry about scaling or managing your application; Salesforce does that for you. This idea is the foundation from which PaaS has gained popularity.

As the Force.com platform has matured, more and more services have been added: one example is Database.com, a database service with custom APIs and data feeds for building applications used by around 100,000 businesses worldwide.

Google App Engine

Google App Engine (GAE), also launched in 2008, was one of the very earliest forms of PaaS. Its promise was that you could tap into the vast power of Google, draw on Google’s computing resources, and use Google’s infrastructure and expertise in running and operating machines. The caveat was that your application would have to adhere to Google’s standards. Google has built not only an operations team, but also a set of tools and systems that work in a specific way—and they work that way in order to scale to “Google scale.”

What kind of scale are we talking about? A very large one in which you are dealing with many, many thousands of machines all working together to solve a single problem. When you are dealing with a scale as large as Google’s, the tools are very prescriptive; they need to be in order to process volumes of data that are so immense. Google literally processes the entire Internet. One of the central ideas of GAE is that if you adhere to its standards, you too could have that power inside your application, and only pay for the processing power that you use.

Here it becomes evident why GAE is non-portable. It has an existing infrastructure that you are allowed to tap into only if you play by Google’s set of rules. On the one hand, you have to write your code around Google’s expectations. On the other hand, if you do so, you gain the benefits of being able to run at Google scale.

Many PaaS platforms have limitations of various types. When you are dealing with a non-portable PaaS such as GAE, those limitations can be strict. You have to adhere very closely to them in order to take advantage of GAE’s features.

With GAE, there are limitations on access to the filesystem, on access to memory, on the amount of memory you can tap into, and on how long your processes can run.

These limitations can force a developer to think in different ways. For example, let’s suppose you have a website that needs to compile a large list of information and compute data for each item on that list. In a traditional hosting environment, when a user makes a request to such a site, delivering results can take a significant amount of time. All of the processing must happen before the response goes back to the user. And, depending on the complexity of this processing, it could easily take seconds.

This is where we encounter another of the limitations within GAE: there is a set amount of time within which your application must respond. If it doesn’t respond fast enough, GAE will kill the process.

But this can actually be a positive factor in the development of the user experience if the application is designed to know that it cannot live that long. It forces the developer to ask himself, “Instead of simply doing it the way I always have done it, how does Google want me to do it?” So, instead of compiling the list every time a user hits the website, you could create a cache and serve up the list from the cache. The cache could serve very quickly and provide a better user experience. In order to compile that cache in the background, you would have to use a different set of Google tools to do those calculations in a more scalable fashion, and then put those into the caching database.
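The cache-and-serve pattern described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not App Engine code: the module-level dictionary stands in for a platform caching service, and the names `compute_item`, `rebuild_cache`, and `handle_request` are hypothetical.

```python
import time

# Stand-in for a platform cache service; a real deployment would use
# the provider's cache API instead of a module-level dictionary.
CACHE = {}


def compute_item(item):
    """Hypothetical expensive per-item computation."""
    return {"name": item, "score": len(item) * 10}


def rebuild_cache(items):
    """Background job: do the slow work once and store the result."""
    CACHE["report"] = {
        "built_at": time.time(),
        "rows": [compute_item(item) for item in items],
    }


def handle_request():
    """Request handler: never computes, only reads the cached result."""
    report = CACHE.get("report")
    if report is None:
        # Nothing cached yet; respond quickly instead of blocking.
        return {"status": "pending", "rows": []}
    return {"status": "ok", "rows": report["rows"]}


if __name__ == "__main__":
    print(handle_request()["status"])   # cache is empty at first
    rebuild_cache(["alpha", "beta"])
    print(handle_request()["rows"])
```

The request path stays fast because it only reads; all of the slow work lives in `rebuild_cache`, which would run on whatever background facility the platform offers.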

GAE has a large following and a large developer mindshare behind it. Its promise—taking advantage of the power of Google—has helped make it a leader in the non-portable PaaS category.

Windows Azure

Microsoft also thought very hard about how to build a Platform-as-a-Service. Its expertise with the .NET Framework led the company to consider the best way to accomplish this around .NET. In 2008, Microsoft launched Windows Azure.

The company set out to create a set of libraries. These were designed in such a way that if a developer were to incorporate them into her system, Azure would take advantage of those libraries in order to scale the system. Microsoft also provides standard services that can scale independently.

With Windows Azure, you have basic systems, like a message bus and queuing systems, and a variety of different options based on the specific needs of your application. These provide patterns for developers to build distributed applications that can interact with each other over networks. If you incorporate through the libraries the technologies and services that Microsoft has built, you can take advantage of the Azure system and be able to scale your application fairly quickly and easily.

Recently, Microsoft has taken steps to move away from a non-portable Azure system, decoupling some of the requirements that tie developers into required services. This has allowed expansion into different languages and technologies, taking Azure from a non-portable PaaS more into the portable realm, which requires no changes to the code in order for it to run. So, Azure is actually a system that started out as non-portable and has been moving slowly toward portability. In fact, Microsoft recently released a very portable version of its Platform-as-a-Service.

Non-Portable Conclusion

Although all of these PaaS options started out as non-portable, many of them are adding functionality that makes them more and more portable every day. GAE has released PHP support that requires fewer and fewer changes to work out of the box. Windows Azure has also released PHP support, and developers can do more without programming against the Microsoft APIs than ever before.

Portable: No Heavy Lifting Required

A portable PaaS is a platform built to run code without requiring significant changes to how that code is written. For developers who have created code to run in shared hosting or dedicated hosting environments, moving that code into a portable Platform-as-a-Service should not be difficult. There are no required service hubs that need to be adhered to in order to run your applications.

There are still limitations, and they can be somewhat challenging to get around, but those limitations are more functional than code related.

Portability broadens the amount and types of code that you can write for Platform-as-a-Service. It also broadens the language support and allows for more flexibility. If you want to move an application between different portable PaaS platforms, you will need to change some aspects of how your application works, but typically those changes will not involve a complete rewrite of your system.

In contrast, look at the early days of Google App Engine, which at the time only supported Python; you needed to write a particular version of Python with certain functions enabled. That limited you: for example, you couldn’t run one of the most popular Python frameworks, Django. It’s a problem you would never encounter today on a portable Platform-as-a-Service.

Heroku

Heroku, founded in 2007, was one of the earliest companies offering a portable Platform-as-a-Service. Heroku saw what Force.com and Google App Engine were doing and concluded that forcing developers to write their code against its APIs didn’t make as much sense as just letting any code be written.

Heroku started out as a Ruby-only independent startup company, allowing Ruby code to be deployed in any form. It has since been acquired by Salesforce.com and has expanded its language offerings. However, Heroku still doesn’t let you write to the filesystem. The rationale is that this makes it easier to create more instances of your app (Heroku calls them “dynos”). If you were to upload or change a piece of code, it would only end up running on a single dyno, and if your application runs on 100 dynos, the uploaded file would not be propagated, leaving an inconsistent dyno. In order to prevent that problem, Heroku simply says that you cannot write to the filesystem (except for an ephemeral temporary directory).
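The inconsistency problem is easy to see in a toy model. The sketch below is only an illustration: `Dyno` and `SharedStore` are hypothetical classes, with plain dictionaries standing in for a local disk and for an external store such as object storage or a database.

```python
class Dyno:
    """Hypothetical model of one app instance with its own local disk."""

    def __init__(self, name):
        self.name = name
        self.local_files = {}   # private to this instance

    def save_local(self, path, data):
        self.local_files[path] = data

    def has_local(self, path):
        return path in self.local_files


class SharedStore:
    """Stand-in for an external store (object storage, database, ...)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = data

    def get(self, key):
        return self._objects.get(key)


def dynos_with_file(dynos, path):
    """Return the names of instances that can see `path` on local disk."""
    return [d.name for d in dynos if d.has_local(path)]


if __name__ == "__main__":
    dynos = [Dyno(f"web.{i}") for i in range(1, 4)]

    # An upload handled by one instance is invisible to the others.
    dynos[0].save_local("avatar.png", b"...")
    print(dynos_with_file(dynos, "avatar.png"))

    # Writing to a shared store makes it reachable from every instance.
    store = SharedStore()
    store.put("avatar.png", b"...")
    print(store.get("avatar.png") is not None)
```

Only the instance that handled the upload sees the local file, which is why user-generated data on this kind of platform usually goes to a store that every instance can reach.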

As with Google App Engine, there is also a limit on how long an application can run before it is timed out (although Heroku’s limit is more generous). But there are also tasks that can run in the background and do work asynchronously. These ideas were pioneered by Heroku and set some of the early standards for what can be done with portable Platform-as-a-Service.
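A rough sketch of the background-task idea follows. It is not Heroku’s worker mechanism: an in-process thread and a standard-library queue stand in for a separate worker process and a hosted queue, and all function names are hypothetical.

```python
import queue
import threading

jobs = queue.Queue()
results = {}


def handle_request(job_id, payload):
    """Web side: enqueue the work and return right away."""
    jobs.put((job_id, payload))
    return {"job_id": job_id, "status": "queued"}


def worker():
    """Background side: process jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        job_id, payload = job
        results[job_id] = sum(payload)   # placeholder for slow work
        jobs.task_done()


def run_worker_until_drained():
    """Start one worker thread, wait for queued jobs, then stop it."""
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    jobs.join()        # block until every queued job is processed
    jobs.put(None)     # ask the worker to exit
    thread.join()
```

The web handler returns as soon as the job is queued, so it stays well inside any request time limit while the slow work happens elsewhere.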

A further comparison of Heroku to Google App Engine illustrates some of the key differences between portable and non-portable PaaS.

With Google App Engine, you have to be very strict about the code you are writing, making sure that it adheres specifically to Google’s APIs. With Heroku, the code you run is the same code you would write for a shared or dedicated host. The difference in portability has to do with writing your code against the provider’s system versus writing it in a generic way. The reason it becomes portable is that you can take that same code from Heroku and run it on your own systems without having to make major modifications to it.

One of Heroku’s other innovations revolves around deploying code. The early PaaS offerings, like Google App Engine, had programs through which you would deploy your code. Heroku took a more general approach and created a git-based deployment system (git is a source control management tool that lets you keep revisions of your software over time, like CVS and other source-control tools).

At Heroku, when you commit your code into git source control, pushing the code to Heroku triggers a deployment. It’s a very quick and easy way to allow developers to deploy their code, unlike shared hosting systems, which generally use FTP. With FTP, you have to look for files that have changed and make sure you upload them and sync them. But git will track the file changes over time and keep a history of those changes so you don’t have to go hunting for files that have changed. It identifies them and sends those files to the platform automatically.
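To make the contrast with FTP concrete, here is a simplified sketch of change detection by content hash. It is an illustration of the idea only, not git’s actual object model or transfer protocol; `snapshot` and `changed_paths` are hypothetical helpers.

```python
import hashlib


def snapshot(files):
    """Map each path to a SHA-1 digest of its contents."""
    return {path: hashlib.sha1(data).hexdigest() for path, data in files.items()}


def changed_paths(old, new):
    """Compare two snapshots and report what a deploy would need to send."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "removed": removed, "modified": modified}


if __name__ == "__main__":
    before = snapshot({"app.py": b"print('v1')", "README": b"hello"})
    after = snapshot({"app.py": b"print('v2')", "README": b"hello"})
    print(changed_paths(before, after))
```

Because each snapshot records a digest per file, finding what changed is a dictionary comparison rather than a manual hunt through directories.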

Cloud Foundry

Built by VMware, Cloud Foundry is a new generation of technology centered around PaaS. A more recent creation than either Google App Engine or Heroku, it comprises a set of programs and services that let you run your own Platform-as-a-Service. It is open source licensed (Apache 2) and can even be run on your laptop.

With Cloud Foundry, you have access to a set of packages and programs that allow you to interact and deploy code with the same feel as PaaS. It will manage your code and let you scale it up and down in the ways you are used to with a Platform-as-a-Service.

Because it is an open source program, it is significantly different from Heroku, Google App Engine, or Windows Azure. Each of those is a hosted managed service: you do not get to look at the source code, nor do you get to modify those services. Essentially, they are take-it-or-leave-it situations, black boxes into which you enter your code, which is then deployed.

Cloud Foundry has many of the features of PaaS, but it is not a managed service. Like Apache, it’s a technology that you have to run yourself. However, it is a complex distributed system and is quite a bit harder to run than a typical system such as Apache.

It is not a system that you can sign up for publicly. If you want to sign up, you have to look for a public provider of Cloud Foundry. Their numbers are growing: AppFog is one, and VMware offers it as a hosted service so you can try it.

In contrast to Heroku, instead of using a git-deploy mechanism, Cloud Foundry created a REST API, a new way to think about deploying code. It uses a Ruby-based command line tool to deploy code. Because it is a REST API, it gives you the flexibility to make your own decisions about whether you want git integration, CVS, Subversion, Darcs, Mercurial, Team Foundation, or anything else. If you want a different kind of version control for your code, it doesn’t prescribe one for you and lets you use whichever one you want.

Cloud Foundry also made some other innovative decisions around how to support third-party services. With Heroku and other Platform-as-a-Service technologies, one of the quick benefits is the ability to provision services and have them tie into your application. Traditionally, the way that has happened has been by setting environmental values that are passed along to your application through the runtime. They can be read into the application to get the credentials for your database. Cloud Foundry supports a very similar mechanism, but it has wrapped the idea of services—like MySQL, Postgres, and Redis—into an abstraction that lets you bind services to and unbind them from applications; it keeps an environmental variable array available so that you can iterate through them programmatically.
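Iterating over bound services might look like the sketch below. Cloud Foundry publishes bound services to the application in a JSON environment variable named `VCAP_SERVICES`; the exact keys and fields shown here are assumptions for illustration and differ between versions, and `bound_services` is a hypothetical helper.

```python
import json
import os


def bound_services(environ=None):
    """Yield (label, name, credentials) for each service bound to the app."""
    environ = os.environ if environ is None else environ
    raw = environ.get("VCAP_SERVICES", "{}")
    for label, instances in json.loads(raw).items():
        for instance in instances:
            yield label, instance.get("name"), instance.get("credentials", {})


if __name__ == "__main__":
    # Sample value for illustration only; real keys and fields vary.
    sample = {
        "VCAP_SERVICES": json.dumps({
            "mysql": [{
                "name": "orders-db",
                "credentials": {"hostname": "10.0.0.5", "port": 3306},
            }]
        })
    }
    for label, name, creds in bound_services(sample):
        print(label, name, creds["hostname"], creds["port"])
```

Passing the environment in as a parameter keeps the function easy to exercise locally without a platform behind it.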

Cloud Foundry also lets you bind a single data source to multiple applications, which is handy when you need to debug an application and determine its state—either in production data or as an audit system. You can bind your data source simultaneously to various different applications.

One of Heroku’s innovations is centered on a third-party add-on marketplace. Via this marketplace, Heroku can provision other cloud tools and, using the environmental variables, pass along the credentials for those tools to your application. Some PaaS platforms, like AppFog, have incorporated a similar idea, but Cloud Foundry does not currently have third-party integration built in.

Service binding is one of the few aspects within a portable Platform-as-a-Service that does actually require a difference in code. When you are trying to move an application from one Platform-as-a-Service to another, generally the part that will result in a difference is how it connects to the database or data sources. Often there are dissimilarities that do require some code—usually a small amount—to be rewritten.
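One way to keep that rewrite small is to isolate the lookup in a single function. The sketch below assumes two common conventions that are not spelled out in the text: a connection URL in a variable named `DATABASE_URL`, and a JSON document in `VCAP_SERVICES` whose credential field names (`hostname`, `user`, and so on) are illustrative. `database_settings` is a hypothetical helper.

```python
import json
import os
from urllib.parse import urlparse


def database_settings(environ=None):
    """Return connection settings regardless of how the platform supplies them."""
    environ = os.environ if environ is None else environ

    # Style 1: a single connection URL (assumed variable name).
    url = environ.get("DATABASE_URL")
    if url:
        parts = urlparse(url)
        return {
            "host": parts.hostname,
            "port": parts.port,
            "user": parts.username,
            "password": parts.password,
            "name": parts.path.lstrip("/"),
        }

    # Style 2: a JSON document describing bound services (assumed layout).
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            creds = instance.get("credentials", {})
            if "hostname" in creds:
                return {
                    "host": creds.get("hostname"),
                    "port": creds.get("port"),
                    "user": creds.get("user"),
                    "password": creds.get("password"),
                    "name": creds.get("name"),
                }

    raise RuntimeError("no database configuration found in environment")
```

With the platform-specific part confined to one place, moving between providers means touching this function rather than every piece of code that opens a connection.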

AppFog

CloudFoundry.org pushed the boundaries of Platform-as-a-Service by providing an open source tool set that any developer could use. It also pushed into new territory because it was not a managed Platform-as-a-Service. From the developer’s point of view this meant that before you could use Cloud Foundry, you would have to set it up yourself and then run it. If you were going to use it in production, you would have to set it up and run it in production, which, while offering a great amount of flexibility, cuts back on the ease of use typically associated with PaaS.

AppFog is a Platform-as-a-Service that is managed and maintained, and it incorporates Cloud Foundry technology. AppFog started as an independent startup and was acquired by CenturyLink in 2013.

One of the other innovations of Heroku is that it was entirely built on Amazon Web Services, and to this day it continues to run on AWS. This is very different from earlier PaaS providers such as Force.com, Google App Engine, and Azure. Each of these was built on top of its own platforms and infrastructures. Cloud Foundry has two components: the open source library called CloudFoundry.org and a proprietary hosted managed platform called CloudFoundry.com, which uses CloudFoundry.org code.

AppFog is a company that also uses CloudFoundry.org code and runs it on multiple infrastructures and public cloud systems. So, while it does run on Amazon Web Services, it is also compatible with OpenStack platforms like Rackspace, HP Cloud, and others. It can even run on private cloud instances of OpenStack and vSphere.

From a developer’s point of view, AppFog has many of the features of CloudFoundry.org that you’d find out of the box. But it runs on many infrastructures, letting you choose them and giving you the portability of those infrastructures, as well as the code. Additionally, AppFog has taken the ideas of other platforms, like integration with third-party cloud services, and incorporated those into its platform. The result: you can sign up and run your applications in any cloud infrastructure that you want, using a technology that you can run yourself, giving you the benefits you’ll find in other systems (like Heroku) that incorporate third-party add-ons.

dotCloud

There are other platforms that focus specifically on the infrastructure behind the PaaS and spend less time on the user experience. dotCloud is an example of a Platform-as-a-Service that innovated by being the first to support multiple languages and technologies, and it has popularized the idea of Linux containers with an open source project called Docker.

When dotCloud was released, Heroku had been focusing only on Ruby, Google App Engine only on Python, and Azure only on .NET. dotCloud came along and offered support for Python, Java, Ruby, and other technologies.

This popular Platform-as-a-Service has focused on creating a system that works through the command line, similar to Cloud Foundry. It has a Unix command line and an API to interact with that command line, enabling you to deploy your applications in multiple languages.

One of the differences between Cloud Foundry and dotCloud is how they approach language support. With Cloud Foundry, the entire system is an open source tool, which means that you can go in and change the way that applications are run and managed; management is tightly coupled to the Cloud Foundry tool set. dotCloud has abstracted the way that applications are run; you can manage the way they are run within the specifications while you deploy your application.

CloudBees

CloudBees is a Platform-as-a-Service that is focused specifically on Java. It has been built around Java tool sets and incorporates common tools used within Java platforms.

One aspect that separates CloudBees from other platforms is its integration with continuous integration tools such as Jenkins. In fact, CloudBees has hired some of the people that maintain Jenkins and has become a leader in the continuous integration space. This has given Platform-as-a-Service a new twist because it broadens the view of what a PaaS can cover. With other platforms, the idea is to take your code and deploy it into production; CloudBees incorporates more development tools to extend the purview of what it provides.

Instead of simply taking code and putting it into production, CloudBees provides a system that lets you test that code in a continuous manner, making sure that it works before it goes into production. It provides a longer pipeline before your code is deployed and extends some of the functionality for how a developer can work with her Platform-as-a-Service. To date, however, CloudBees still only supports Java and Java-enabled applications. So, while it has broadened what can be accomplished with a Platform-as-a-Service, CloudBees is still limited to just one technology.

Summary: Where Do You Want to Live?

With a portable Platform-as-a-Service, the major advantage is that you can take existing code and deploy it more easily, without major rewrites. Iteration can be faster. If you need to move your application from a particular system into another environment, it generally takes a lot less effort.

The advantages of a non-portable platform are highly dependent on what that platform provides. For example, in Google App Engine, the advantage is tying into the infrastructure and operations of Google. In the case of Windows Azure, the advantage is tying into the operations of Microsoft.

The trade-offs depend on what kind of application you want to run. For example, if you need to run a Node.js application, you won’t be able to do so on Google App Engine. But if you want to try Google’s table functions, you won’t be able to do that on Heroku or AppFog. Your selection of a portable or non-portable PaaS depends on your needs and what feature set you want to take advantage of. When all is said and done, however, you should keep your mind on what’s down the road for your application; ultimately, you must ask yourself how much you are concerned about future changes to your code and where you want it to live.

Dealing with Legacy and Greenfield Apps

Another important consideration arises when you are evaluating PaaS options: are you moving existing applications or creating new ones? When answering that question, it is very important to think about the portable and non-portable aspects of Platform-as-a-Service.

If you have already written your code, it’s going to be substantially more difficult to move your code into a non-portable Platform-as-a-Service. Rearchitecting a complex Python application in order for it to work on Google App Engine is going to be a much bigger challenge than trying to get it to run on AppFog or Cloud Foundry. In the case of Cloud Foundry, it could work right out of the box. In the case of Google App Engine, it may take significant engineering resources.

If you are creating new applications—if you want to start from scratch and if you have flexibility in terms of the language and technology choices you make—you’ll have more choices in your selection of a platform. If you are evaluating technologies for scalability and performance, taking a look at Google App Engine and Windows Azure is very much worthwhile. If taking advantage of Google’s operations and infrastructure would be advantageous to your greenfield application, it makes sense to try Google App Engine. The caveat is to think ahead and make sure that should something drastic happen to the platform you choose, you won’t have boxed yourself into a corner.

An additional factor is pricing. If you are on a non-portable platform and the pricing changes, suddenly becoming too expensive, moving your application to another provider can be much more difficult than with a portable service.

One more issue: downtime. Almost every Platform-as-a-Service has had issues with reliability at some point. If your application can only run on one of these platforms, you take the risk that it will go down with that platform at some point. If your app is built more generically and can run on many different portable platforms, you can take advantage of that should you encounter reliability issues.

Tapping Into Services

Earlier, we briefly discussed services in the context of Heroku and Cloud Foundry. One of the big benefits of using a Platform-as-a-Service is the ability to tie quickly and easily into services such as MySQL, memcached, and PostgreSQL. There are a number of related advantages: you can get started with them quickly, they are managed and maintained for you, and typically the platform will do backups and provide redundancy.

There are disadvantages as well. On some platforms, it can be difficult to access your data and have an insight into the running of the services. And some services, especially those in the SQL family, are notorious for being difficult to scale. When your application grows, one of the biggest concerns is making sure that you have insight into your database systems; you want to be able to inspect all the details about how your code is interacting with your database and gain insight into how you can optimize your database accordingly.

It’s a trade-off. On the one hand, you may experience lack of control. On the other hand, there is a huge advantage: automatic management.

The services model can play an important role when it comes to email. Functions such as sending email can be extremely difficult because email hosting providers have become very adept at sending mail to spam and junk mailboxes. They are accustomed to recognizing when cloud providers are sending email and blocking it. This problem arises because spammers are able to spin up many servers in the cloud very quickly in order to send massive amounts of email. As a result, many cloud providers are blacklisted, and mail sent from them is blocked even if your application is legitimate, which creates big problems when you are trying to create applications that send email. Luckily, integration with mailing systems within the services model lets you quickly tie into hosted mail senders who have solved these problems for you and have been approved explicitly by Gmail, Hotmail, and many email spam filters to let their messages go through. Doing that on your own is a very difficult task; having easy tie-ins to a hosted provider who is already whitelisted is a big advantage when you are creating applications.
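In practice, tying into a hosted mail sender often amounts to relaying through its SMTP endpoint with credentials supplied by the platform. The following is a minimal sketch under that assumption: the environment variable names (`SMTP_HOST`, `SMTP_PORT`, `SMTP_USER`, `SMTP_PASSWORD`) are hypothetical, and a given provider may offer an HTTP API instead.

```python
import os
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Create a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_provider(msg, environ=None):
    """Relay the message through a hosted SMTP service.

    SMTP_HOST, SMTP_PORT, SMTP_USER and SMTP_PASSWORD are hypothetical
    variable names; use whatever your mail add-on actually provides.
    """
    environ = os.environ if environ is None else environ
    host = environ["SMTP_HOST"]
    port = int(environ.get("SMTP_PORT", "587"))
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls()   # upgrade to an encrypted connection
        smtp.login(environ["SMTP_USER"], environ["SMTP_PASSWORD"])
        smtp.send_message(msg)
```

The application never sends mail from its own cloud IP address; the hosted provider’s servers, which already have a sending reputation, do the delivery.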

Moving Toward Open Standards

Open standards are an important concept for Platform-as-a-Service because they can give a developer confidence about how to deploy an application independently of which service provider is doing it.

Having to learn the ins and outs of every single different provider can be a complete nightmare. However, we have already seen the wide variety of PaaS options out there and the different types of services that they provide.

The non-portable Platform-as-a-Service that started out with the simple premise of “We will give you the power of Google for your web application” provides a different solution than a portable platform like Heroku that says, “We will run the application without changing your code.” And that is a much different type of solution than one that says, “We will give you your own Platform-as-a-Service that you can run on your laptop.” Each of these providers has tried to tackle different problems and has a much different way of thinking about the way you create and deploy applications, which is why each has a different standard, a different feel, and a different look. They might use git-based deployment mechanisms, REST APIs, or proprietary systems for deploying. One might support continuous integration and one might focus on deployment. With such a variety, it has been hard to standardize on a single one.

The Allure of Open Source

Although historically there have been many different types of Platform-as-a-Service technologies, with more being born every day, there is the possibility for a standard to emerge around Cloud Foundry or OpenShift, two of the open source options for PaaS.

There are several compelling reasons for this. The fact that you can run these technologies from your laptop is very compelling for developers trying to incorporate PaaS into their daily workflow. The other big plus for open source communities around PaaS is the fact that you can theoretically have choice between various providers with compatible PaaS options, or even run it yourself in production. In contrast, unless Heroku open sources its technology stack, you are tied to Heroku’s choice of infrastructure provider.

Taken together, these assets of open source PaaS offer an opportunity to create a standard around their APIs without the ability to lock anybody into a specific infrastructure or service provider. They also point to an emerging standard for how to deploy applications in a consistent way across infrastructures, both public and private.

We have already seen the start of using the Cloud Foundry API as a standard. AppFog uses it to deploy applications across Amazon Web Services, Rackspace, HP Cloud, and Windows Azure, which illustrates that one standard API can be used across different backends. The way that the developer interacts with these systems is through the Cloud Foundry API, decoupling the infrastructure implementation underneath from the API language that the developer uses to speak to it.

More and more PaaS technology is being open sourced every day. Cloud Foundry was one of the early leaders to do so, but Red Hat later open sourced OpenShift, GigaSpaces has an open source Cloudify offering, and even dotCloud has open sourced large chunks of its PaaS technology stack.

Evaluating Your Legacy

As you can see, there are a variety of PaaS providers out there, built for different needs and with different ideas in mind. When thinking about legacy applications and moving them to the cloud, it’s a good idea to understand the limitations of the platforms with respect to the needs of your applications. In the next chapter, we’ll take a deeper look at moving legacy apps to PaaS and provide solutions for some of the challenges you might encounter.