Chapter 7. A Look at Core Services

When selecting a PaaS provider, it is important to understand exactly what you are getting yourself into. This chapter focuses on core services, not just the running of an application. When we talk about core services, we’re talking about functions that provide data storage (SQL and NoSQL), queuing, and other support for applications. These core services can also include email, monitoring, caching, and data management, consumption, and analysis, each of which can be an entire application of its own.

Typically a PaaS provider will manage the core services itself, or through a third-party add-on system that integrates tightly with the PaaS. Either way, when using PaaS you do not end up managing many of the core services yourself. This can clearly be beneficial (what developer really wants to spend her time tuning my.cnf files?), but it has some significant trade-offs as well, which we will explore in this chapter.

The goal for this chapter is to help you know what to expect from PaaS-based core services and what questions to ask potential PaaS providers before you commit to running production code on their systems.

Non-PaaS Core Services

PaaS is, by its nature, a very managed environment. You do not have to worry about scaling. You do not have to worry about the operations behind your application. You just know that you need to have it scale up or scale out; the major decisions are whether the app needs more RAM capacity or more instances. That will come to be what you expect out of your core services like MySQL and MongoDB as well. In fact, much of the time, core services can end up being just as difficult to run, scale, and manage as your applications, if not more so. That’s because, if you are managing core services on your own, every service that you add can also add a huge amount of operational complexity.

When you consider hosting and managing a MySQL service, also consider how much time you’ll spend managing it, monitoring it, and dealing with outages. Once you’ve worked with enormous data stores and terabytes of data, it becomes very clear that it’s just as technically challenging to work with services like MySQL, PostgreSQL, MongoDB, and memcached in a distributed fashion as it is to work with your application code directly.

In a non-PaaS environment, you would have to set up MySQL. Then you’d have to tune it. You would have to set up your RAID disk backend for redundancy. Then you would have to devise external backup plans. You would have to set up master-slave replication, or sometimes even master-master replication with slaves. Then you would have to add heartbeat monitors to ensure that the system is continuously watching for outages and can deal with them. You would have to keep tuning, monitoring, and maintaining these core services all the time. You would have to watch all sorts of very low-level network settings and make sure that you are constantly dealing with security patches and upgrades.

Evaluating PaaS for Services

In a PaaS environment, a large share of these maintenance tasks is done for you automatically. Obviously, the quality of operations can vary tremendously depending on which PaaS vendor you choose. So, evaluating the core services provided by each PaaS provider can at times be just as important as evaluating any other feature.

The end of this chapter has a checklist of questions you should have good answers to before committing to any vendor. It is critical you take these into consideration early in the decision-making process.

Look for what kinds of limitations you will run into:

  • How much data storage is allowed?

  • How much disk I/O is permitted?

  • What happens when I hit my limits?

  • How much RAM is available per service?

  • How can I get more resources for my services if I need to scale?

When considering a PaaS provider, having this information ahead of time is important as it tells you whether or not you are going to be able to scale your application.

Keep in mind that with many applications you are never going to hit data barriers. Most applications do not need terabytes of disk. Most don’t even need gigabytes. Generally, most applications need just tens of megabytes, with room to grow into the hundreds of megabytes.

So, you should be looking for and predicting how much data you are going to need to store in MongoDB, PostgreSQL, CouchDB, Redis, or memcached. You’re trying to understand, “Does my PaaS allow me to grow? Or will it limit me and make it hard for me to move off?” You’ll need to put in a little work up front talking and thinking about what your application is going to require at the higher end.
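A rough projection of this kind can be sketched in a few lines of Python. Everything numeric here is an assumption for illustration: the row counts, the average row size, the growth rate, the 30% allowance for indexes, and the 20% headroom are placeholders that you would replace with figures measured from your own application and your provider's plan limits.

```python
def projected_storage_mb(rows_now, avg_row_bytes, monthly_growth_rate,
                         months, index_overhead=0.3):
    """Estimate storage in megabytes after `months` of compound growth.

    index_overhead is an assumed allowance for indexes and bookkeeping,
    not a figure taken from any particular database.
    """
    rows_later = rows_now * (1 + monthly_growth_rate) ** months
    raw_bytes = rows_later * avg_row_bytes
    return raw_bytes * (1 + index_overhead) / (1024 * 1024)


def fits_plan(projected_mb, plan_limit_mb, headroom=0.2):
    """True if the projection leaves at least `headroom` of the plan unused."""
    return projected_mb <= plan_limit_mb * (1 - headroom)


# Hypothetical app: 200,000 rows of about 500 bytes, growing 10% a month.
estimate = projected_storage_mb(200_000, 500, 0.10, months=12)
print(f"Projected size in a year: {estimate:.0f} MB")
print("Fits a 512 MB plan:", fits_plan(estimate, 512))
```

Running the same projection against each plan a provider offers shows how soon you would need to move up a tier, which is the question to settle before you commit.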

Some PaaS providers are going to be a lot better at this than others. But much of the time a PaaS platform will have already tuned the database, the configuration files, and the network interfaces so that they are significantly better than out of the box. It will have done a lot of thinking about failover and redundancy, which can be difficult problems to solve, especially if you are just a developer. Even operations teams can easily spend weeks or months planning and building out the procedures to do things that come right out of the box in a PaaS-managed service.

The benefits of using a PaaS for core services can be significant. You get up to speed much faster. You get to prototype faster. You get to write more code sooner. You don’t have to worry about how to do a distributed memcached or master-master MySQL. You don’t have to figure out how to get through SMTP filters for mail servers and through the spam filters at Google and Hotmail for sending email. A PaaS provider can do all of these things for you, taking much of the burden off your shoulders and giving you greater flexibility and speed.

Saving Time with Managed Databases and PaaS

When you are trying to set up managed services on your own, the ones that can provide the biggest headaches are databases. Managing them and backing them up can take more time than writing the code for your application in the first place. The advantages of working with managed SQL core services in PaaS can be significant if you do your homework and make the right choices.

SQL

Working with SQL and PaaS can save an impressive amount of time. By relying on expertise that hopefully has been invested in creating a scalable SQL solution for you, you’ll be able to work faster. But once again, there are several caveats.

You may need to consider the limitations on how many queries you’re allowed. There are many different ways to charge for SQL core services:

  • Number of queries you make to your database in a month

  • Data storage capacity

  • Number of concurrent users accessing data

  • Some combination of these

These are very different ways to price the service and can lead to drastically different bills depending on your specific use cases.
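To see how far the bills can diverge, the following sketch prices two imaginary applications under three of these models. The rates are invented for illustration and do not correspond to any real provider; only the shape of the comparison matters.

```python
from dataclasses import dataclass


@dataclass
class Usage:
    queries_per_month: int
    storage_gb: float
    peak_connections: int


# All prices below are hypothetical placeholders, in dollars per month.
def bill_per_query(usage, price_per_million=1.00):
    return usage.queries_per_month / 1_000_000 * price_per_million


def bill_per_storage(usage, price_per_gb=0.50):
    return usage.storage_gb * price_per_gb


def bill_per_connection(usage, price_per_connection=2.00):
    return usage.peak_connections * price_per_connection


# A chatty app with little data, and an archive with lots of data but few reads.
chatty = Usage(queries_per_month=500_000_000, storage_gb=2, peak_connections=20)
archive = Usage(queries_per_month=1_000_000, storage_gb=400, peak_connections=5)

for name, usage in (("chatty", chatty), ("archive", archive)):
    print(name,
          bill_per_query(usage),
          bill_per_storage(usage),
          bill_per_connection(usage))
```

With these made-up rates, the chatty application is cheapest when billed by storage and most expensive when billed per query, while the archive is the reverse. Plugging your own expected usage into each provider's real price list gives the same kind of comparison.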

Your decision will depend heavily on the type of application you are going to build. A few factors can change the balance between choosing a managed PaaS core service and running the service on your own.

Within the next 12 months, which of these answers will most likely be true in your case?

  1. How much of the data can theoretically be cached most of the time?

    a. More than 50%

    b. Between 10% and 50%

    c. Less than 10%

  2. How much of your app is driven by dynamic data, as opposed to static content or storable objects?

    a. Less than 40% dynamic

    b. Between 40% and 60% dynamic

    c. More than 60% dynamic

  3. How much data is stored in your database?

    a. Less than 5 GB

    b. Between 5 GB and 50 GB

    c. More than 50 GB

  4. How many records need to be stored in any given table?

    a. Fewer than 10,000,000

    b. Between 10,000,000 and 100,000,000

    c. More than 100,000,000

  5. How many tables do you need?

    a. Fewer than 20

    b. Between 20 and 100

    c. More than 100

  6. How frequently will the data be accessed?

    a. Less than 10 times a second

    b. Between 10 and 100 times a second

    c. Over 100 times a second

  7. How much data needs to be accessed frequently and repeatedly?

    a. Less than 1 GB

    b. Between 1 GB and 10 GB

    c. More than 10 GB

If you answered (a) to 5–7 of these questions and (b) for the rest, you are a perfect candidate for most PaaS-managed core SQL services.

If you answered (b) to 3–7 of these questions, you will need to evaluate your PaaS options carefully and do load testing before committing to the PaaS vendor you select.

If you answered (c) to 2–3 of these questions, your best bet is to either run the core SQL service yourself if you are capable of doing so or select a provider that specializes in scaling the SQL service to the levels you need. You can still use PaaS for your application logic, but it is not a good idea to use it for your managed core data service if you are likely to hit these general guidelines.
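The scoring rules above can be written down as a small function. The rules as stated overlap in places and do not say what a single (c) answer means, so this sketch makes two assumptions of its own: two or more (c) answers take priority over everything else, and any combination not covered explicitly falls into the "evaluate carefully" group. The function name and return strings are illustrative.

```python
from collections import Counter


def assess_fit(answers, min_a_for_managed=5):
    """Classify questionnaire answers given as a list of 'a', 'b', or 'c'.

    min_a_for_managed defaults to 5, matching the seven-question SQL list.
    """
    counts = Counter(answers)
    unknown = set(counts) - {"a", "b", "c"}
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")

    # Assumption: heavy requirements outweigh everything else.
    if counts["c"] >= 2:
        return "self-manage or use a specialist provider"
    if counts["c"] == 0 and counts["a"] >= min_a_for_managed:
        return "good candidate for a PaaS-managed service"
    # Assumption: every remaining combination needs a closer look.
    return "evaluate carefully and load test first"


print(assess_fit(["a", "a", "a", "a", "a", "b", "b"]))
print(assess_fit(["a", "b", "b", "b", "a", "b", "a"]))
print(assess_fit(["a", "a", "c", "b", "a", "c", "b"]))
```

Treat the result as a conversation starter rather than a verdict; the thresholds are general guidelines, and load testing remains the real test.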

Some applications will fit the mold and some will not. For example, consider a WordPress database. This can be a highly cacheable application, so you end up not hitting the database with every query. It can be cached within the PaaS application environment easily. It can be highly optimized and relatively inexpensive to run, and doesn’t usually require a lot of data to be stored.

However, when you have a PaaS provider that limits your concurrency on the database, it can be problematic if your blog or website is going to be handling a heavy load. Typically, blogs do not need to handle more than 10 requests a second; even that would be a generous amount of traffic. But there are times when you are going to want to ensure that you can deal with 1,000 requests a second. If your database is designed so that it can only handle 10 concurrent requests, that is going to limit your ability to serve 1,000 people. Lots of those people are going to see error pages. The conclusion: knowing those limitations up front is important.
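A back-of-the-envelope calculation shows why a connection limit matters. The sketch below applies Little's Law (work in flight equals arrival rate multiplied by the time each item takes) to database connections. The figures of 2 uncached queries per page and 20 ms per query are assumptions chosen for illustration; measure your own before drawing conclusions.

```python
import math


def required_connections(requests_per_second, queries_per_request, avg_query_ms):
    """Connections busy at once, on average, at a given request rate."""
    in_flight = requests_per_second * queries_per_request * avg_query_ms / 1000
    return math.ceil(in_flight)


def max_requests_per_second(connection_limit, queries_per_request, avg_query_ms):
    """Highest sustained request rate a connection limit can support."""
    return connection_limit * 1000 / (queries_per_request * avg_query_ms)


# Hypothetical blog: 2 uncached queries per page, 20 ms per query.
print(required_connections(1000, 2, 20))    # connections needed at 1,000 req/s
print(max_requests_per_second(10, 2, 20))   # ceiling with a 10-connection limit
```

Under these assumptions, 1,000 requests a second keeps about 40 connections busy, while a limit of 10 connections tops out at roughly 250 requests a second. Caching reduces the number of queries per request, which is why a highly cacheable application can live comfortably under a low limit.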

On the other hand, if you are being charged on a per-query basis and there is no limit to concurrency, you need to make sure that you optimize your code to minimize queries, which can get very expensive.

So, while you must keep close track of how you are being charged for your database, the potential benefits—the ability to save time and move quickly—are significant.

NoSQL

NoSQL databases, which include key/value, document, and wide-column stores, are gaining popularity and come in many forms, including MongoDB, CouchDB, Redis, Cassandra, and Riak. These technologies assist in scaling out in different ways than traditional SQL storage mechanisms.

The advantages of working with NoSQL and PaaS are similar to the ones you experience when working with SQL and PaaS, such as the ability to get started very quickly. You do not need to think about tasks like running your MongoDB service, managing it, or tuning it.

One of the benefits of self-managing NoSQL is that scaling out can be much easier compared to managing a MySQL, PostgreSQL, or traditional SQL database: you just add more virtual machines and tie those into the configuration. Not every PaaS provider allows you to grow quite as easily. It’s a good idea to understand your contingency plan in the event that you need to scale bigger than the NoSQL capacity allowed within your PaaS.

Within the next 12 months, which of these answers is most likely to be true?

  1. How much data is stored in your database?

    a. Less than 5 GB

    b. Between 5 GB and 50 GB

    c. More than 50 GB

  2. How many records or documents need to be stored in any given table?

    a. Less than 100,000,000

    b. Between 100,000,000 and 1,000,000,000

    c. More than 1,000,000,000

  3. How frequently will the data be accessed?

    a. Less than 10 times a second

    b. Between 10 and 100 times a second

    c. Over 100 times a second

  4. How much data needs to be accessed frequently and repeatedly?

    a. Less than 1 GB

    b. Between 1 GB and 10 GB

    c. More than 10 GB

If you answered (a) to 3–4 of these questions and (b) for the rest, you are a perfect candidate for most PaaS-managed core NoSQL services.

If you answered (b) to 2–4 of these questions, you will need to evaluate your PaaS options carefully and do load testing before committing to the PaaS vendor you select.

If you answered (c) to 2–4 of these questions, your best bet is to either run the core NoSQL service yourself if you are capable of doing so or select a provider that specializes in scaling the NoSQL service to the levels you need. You can still use PaaS for your application logic, but it is not a good idea to use it for your managed core data service if you are likely to hit these general guidelines.

Caches and PaaS: Look for Redundancy

Typically, caches end up being stateless and volatile. There are a few popular caching technologies, some open source, like memcached and Ehcache, and some proprietary, like Terracotta for Java. They do not require as much overhead as a SQL or even a NoSQL database. It is easier to scale them and to run them. If they die it is not going to cause major problems, as long as there is redundancy in place.

The advantages of working with caches in PaaS actually outweigh some of the disadvantages of working with databases in PaaS, since often the hardest part of implementing caches is setting up redundancy, which is typically handled for you within a PaaS environment. Again, checking with your PaaS provider to ensure that there are redundancy and failover mechanisms in place is an important bit of research that needs to be done.

The only other consideration is the limit on how much data is cached. Sometimes a PaaS provider limits you to only being able to cache a certain amount of data. Often that will be sufficient, since most caching systems will purge the oldest (least recently used) data automatically, ending up with only the most important data in memory. However, if your dataset grows past the limit and you are constantly hitting your SQL or NoSQL backends, and your PaaS provider can’t give you a higher limit, it’s time to move to hosting the cache yourself.
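The least-recently-used purging described above can be illustrated with a toy cache. This is a teaching sketch, not how memcached or any hosted cache is implemented: real caches usually limit by bytes rather than by item count, and the class and method names here are made up. The hit-rate counter is included because a falling hit rate is the signal that your dataset has outgrown the limit.

```python
from collections import OrderedDict


class LRUCache:
    """A tiny cache that evicts the least recently used entry when full."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._data = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1  # the caller would now query the database
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.max_items:
            self._data.popitem(last=False)  # drop the least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = LRUCache(max_items=2)
cache.put("post:1", "first post")
cache.put("post:2", "second post")
cache.get("post:1")                 # post:1 is now the most recently used
cache.put("post:3", "third post")   # evicts post:2
print(cache.get("post:2"))          # None: it was purged
print(f"hit rate: {cache.hit_rate():.2f}")
```

If the hit rate stays low no matter how the application is tuned, the working set no longer fits, and that is the point at which a higher limit or a self-hosted cache becomes worth the effort.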

Moving from a PaaS-hosted cache to a self-managed caching system is not usually a very hard process, since caches are volatile by nature. For that reason, it is not so important to set up a self-managed cache before you hit your limits.

Solving the Challenges of Email

Sending email in a cloud environment can be a very difficult proposition. When you are using virtual machines in Amazon Web Services, Rackspace, or Azure, they often have IP addresses that have been blocked by email providers. Big email providers like AOL and Gmail will block large groups of IP addresses owned by public cloud providers because spammers have regularly taken advantage of them in the past.

When using the public cloud, it is very likely that your application is going to run on an IP address that has been blacklisted by email providers, which leads to a question: how do you send email to your users within a PaaS public cloud environment? Also important for some applications is this associated question: how do you accept and process inbound email in your application?

If you are running a blog, this may not matter. But if you are running an interactive service and part of the service has to do with processing email, you’ll be facing a unique set of challenges when using the public cloud.

The smartest approach is to leverage a managed email service. Often this service is going to be different than your database or caching service. It is likely not to be natively hosted with your PaaS provider. It is usually an independently run software service, and generally it will charge by how many emails you need to send. Sometimes your PaaS provider can facilitate the connection between your application and the email service by setting up an account for you. These kinds of services will send email for you, either one-off or in bulk. A key advantage is that the emails sent from these accounts have already been whitelisted by Gmail, AOL, and Hotmail, which means they are much more likely to get to your users’ inboxes instead of being blocked by spam filters than if you sent them directly yourself.
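Many managed email services accept mail over authenticated SMTP, in which case sending from your application can look like the sketch below, which uses only Python's standard library. The relay hostname, port, and environment variable names are placeholders; your email service or PaaS add-on will supply the real values, and some services offer only an HTTP API instead.

```python
import smtplib
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Assemble a plain-text email message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_via_relay(msg, host, port, username, password):
    """Hand the message to the email service's SMTP relay over TLS."""
    with smtplib.SMTP(host, port, timeout=10) as smtp:
        smtp.starttls()
        smtp.login(username, password)
        smtp.send_message(msg)


welcome = build_message(
    "hello@example.com", "user@example.org",
    "Welcome aboard", "Thanks for signing up.",
)

# Hypothetical usage; SMTP_HOST, SMTP_USER, and SMTP_PASSWORD are assumed
# names for settings that your provider would inject into the environment:
#
#   import os
#   send_via_relay(welcome, os.environ["SMTP_HOST"], 587,
#                  os.environ["SMTP_USER"], os.environ["SMTP_PASSWORD"])
```

Keeping the credentials in the environment rather than in the code means you can switch email services without redeploying anything but configuration.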

You may not realize it, but being whitelisted and remaining whitelisted is actually a full-time job of its own. It is very difficult to keep up to date with the requirements that Google, AOL, Microsoft, and Yahoo! impose. Making sure that most users keep receiving communications is easy to do when you are only sending a few dozen emails a day, but it becomes incredibly difficult when you are sending hundreds or thousands a day. In the end, it just makes sense to leverage these services. You can go through third parties, but they may also be managed by your PaaS provider or by your IaaS provider.

Here is a list of a few popular mail services:

The Importance of Monitoring

Whether you are running your own infrastructure or building on PaaS, there is one challenge that is the same: monitoring all the pieces of your application.

This means knowing how much load your application has (right now and historically), and how your database is doing (in terms of transactions, disk I/O, memory, etc.). It also means understanding how many people are getting data concurrently (or trying to) and whether your service is succeeding or failing to serve them.

Considering Your Options

Monitoring options in PaaS can vary greatly. Some PaaS providers give you deep insight and have monitoring built in, others require third-party integrations, and yet others provide little to no insight.

Like your email service, monitoring can be provided either by a third party or natively by your PaaS platform. Many times it is a combination of both. There will be a set amount of information that your PaaS platform will give you about your application; the monitoring will show you some statistics, give you some baseline information, and perhaps integrate with third-party monitoring services to give you deeper information. For example, New Relic and AppFirst, which are third-party SaaS services, will give you very detailed information about how well your application is performing and when there are errors, and can even notify you if the application is not working or hits certain limits. These services can sometimes tell you how long it takes for your application to respond to a user and give you averages and historical information.

Monitoring is important for another reason. Traditionally, PaaS has been extremely easy to scale horizontally and sometimes even vertically. The hardest part is knowing when to scale. The vast majority of PaaS services do not have an auto-scaling feature that decides to add capacity to your application for you. But if you have a proactive monitoring service that informs you when you are hitting some limits, or when your application is running slowly, you can take action.

With information from monitoring services, you can set thresholds and even make your own auto-scaling functionality within the PaaS environment. Alternatively, you can simply go into the console and move the slider up and say, “I need more resources right now.”
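A homemade auto-scaling rule can be as simple as the function below, which turns two monitored metrics into a target instance count. The thresholds, the metric names, and the idea of stepping by one instance at a time are all assumptions for illustration; feeding in the metrics and applying the result depend on whatever monitoring service and scaling API your PaaS actually offers.

```python
def desired_instances(current, avg_response_ms, cpu_percent, *,
                      min_instances=1, max_instances=10,
                      slow_ms=500, fast_ms=200,
                      busy_cpu=75, idle_cpu=25):
    """Return how many instances to run, given current load metrics."""
    if avg_response_ms > slow_ms or cpu_percent > busy_cpu:
        target = current + 1   # under pressure: add capacity
    elif avg_response_ms < fast_ms and cpu_percent < idle_cpu:
        target = current - 1   # clearly idle: shed capacity
    else:
        target = current       # in the comfortable middle: do nothing
    return max(min_instances, min(max_instances, target))


print(desired_instances(2, avg_response_ms=800, cpu_percent=40))
print(desired_instances(2, avg_response_ms=100, cpu_percent=10))
print(desired_instances(3, avg_response_ms=300, cpu_percent=50))
```

The gap between the scale-up and scale-down thresholds is deliberate: without it, an application hovering near a single threshold would add and remove instances on every check. The upper bound protects you from an unexpectedly large bill.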

In any case, detailed information about monitoring is critical and should never be overlooked when running applications in PaaS. Much of the attraction of PaaS is that you don’t have to worry about the operations behind your application or about tuning and configuring a lot of settings. But ultimately you need to be accountable and know how well your application is performing. Monitoring is essential for doing that. It should never be skipped, and it should be included in every application you run in a PaaS environment.

Here is a list of a few popular monitoring services:

Taking the Long View

Services are the spice of life in applications. They can turn simple applications into interesting ones. They can add value and depth to your application, but every time you add a new service it adds a layer of complication to the running and managing of your application, along with a set of limitations that you should always be aware of.

There are certainly large benefits to services. Every time you are deciding whether to build and run a service yourself, consider whether a managed service would be a better solution, because a managed service will likely save you a lot of time and money. But every time you pick a managed service, be aware that different providers may price the same type of service in very different ways.

Know how your application works. Use that knowledge to think months or years ahead and evaluate whether a managed service will still be a good fit down the road. This is an exercise that should always be performed.

The responsibility of using managed services comes with a price. The price is paid by planning ahead.

Load Testing

One of the most important parts of dealing with managed services is understanding the limitations of scale. To find those limits, you need to do performance tracking and load testing.

In many PaaS environments, there are add-ons and sometimes even native services that will do this for you. These services can test everything that we’ve discussed in this chapter (monitoring, database capacity, concurrency limitations, and latencies) so that you can see how your application will perform when 1,000 people come to your site.

These kinds of services should be utilized after you’ve set up every other kind of managed service and tied it into your code. You can use them at different levels. You can see what it is like when 10, 10,000, or 100,000+ concurrent people are hitting your site so you can test your capacity planning. You can run fire drills to scale up both the instances of your application and your database capacity to make sure that your team is ready to react quickly when your application is under siege.
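To make the idea concrete, here is a minimal load-test harness built on Python's standard library. It is a sketch of what such services do, not a substitute for them: it runs from a single machine, and `fake_endpoint` is a stand-in that merely sleeps. In a real test you would pass a function that requests a page from a staging copy of your site.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_load_test(call, concurrency, total_requests):
    """Invoke `call` total_requests times using `concurrency` worker threads."""

    def timed_call(_):
        start = time.perf_counter()
        try:
            call()
            ok = True
        except Exception:
            ok = False  # count failures instead of aborting the run
        return ok, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    latencies = sorted(elapsed for _, elapsed in results)
    errors = sum(1 for ok, _ in results if not ok)
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"requests": total_requests, "errors": errors, "p95_seconds": p95}


def fake_endpoint():
    """Stand-in for a real HTTP request; simulates 10 ms of work."""
    time.sleep(0.01)


for level in (1, 5, 20):
    report = run_load_test(fake_endpoint, concurrency=level, total_requests=40)
    print(level, report)
```

Stepping the concurrency level up between runs, as the loop does, is what reveals the point at which error counts and latencies start to climb.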

Summing it up: performance and load-testing services are critical for working through many of the scaling use cases that we have been alluding to.

Here are a couple of popular load-testing services:

Planning an Upgrade Path

A final but very important consideration was touched on earlier in this chapter: you must plan ahead.

You need to think about how you can upgrade when you hit the limits. There are two clear choices:

  • You can move within the PaaS to a higher limit.

  • You can manage your own services.

How you proceed will depend on what PaaS you select, how it manages its systems, and how much access you get to those systems. There are PaaS providers that will let you upgrade seamlessly and allow you to up your resources, consumption, and concurrency with no change at all.

However, you should be aware that there are some providers that will require you to pick a new plan with a new data size, while they create a separate data storage container for you with higher limits. That new data storage will be bigger and have more capacity, but you are going to have to move your data and change your code. Sometimes the PaaS will do that for you, but that takes time. During the changeover, you are going to need downtime to migrate into the larger-capacity plan.

This process will be similar if you are going to move from a managed service to a non-managed one in which you do the management. If you need to create your own database to handle more capacity than the PaaS permits, you will have to set that up as a separate container and figure out a path similar to the one we just described in which you take your site down while you move the data over to the new database with the larger capacity.
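The copy-and-verify step at the heart of such a migration can be sketched as follows. SQLite in-memory databases stand in for the old and new data stores so that the example runs anywhere; a real migration would normally use the database's own dump and restore tools, and the table name is assumed to be trusted because it is interpolated directly into the SQL.

```python
import sqlite3


def migrate_table(source, target, table, batch_size=1000):
    """Copy every row of `table` from source to target; return rows copied."""
    cursor = source.execute(f"SELECT * FROM {table}")
    placeholders = ", ".join("?" * len(cursor.description))
    copied = 0
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        target.executemany(
            f"INSERT INTO {table} VALUES ({placeholders})", rows)
        copied += len(rows)
    target.commit()
    return copied


def verify_counts(source, target, table):
    """Cheap sanity check before the application is pointed at the target."""
    query = f"SELECT COUNT(*) FROM {table}"
    return (source.execute(query).fetchone()[0]
            == target.execute(query).fetchone()[0])


# Stand-ins for the small PaaS database and its larger replacement.
old_db = sqlite3.connect(":memory:")
new_db = sqlite3.connect(":memory:")
for db in (old_db, new_db):
    db.execute("CREATE TABLE users (id INTEGER, name TEXT)")
old_db.executemany("INSERT INTO users VALUES (?, ?)",
                   [(1, "ada"), (2, "grace"), (3, "linus")])
old_db.commit()

print(migrate_table(old_db, new_db, "users", batch_size=2))
print(verify_counts(old_db, new_db, "users"))
```

The site stays in maintenance mode from just before the copy starts until the verification passes and the application's connection settings point at the new database, which is why the length of the copy determines the length of the downtime.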

Storage Options

In planning ahead for a PaaS upgrade, you should consider where your data is stored and how you are going to run the database yourself. For example, if your Platform-as-a-Service is hosted on Amazon Web Services in the US East data center, you might want to use virtual machines within that data center and manage MySQL yourself. Spinning up the virtual machines within AWS gives you low latency between the application and its data storage; because they are colocated in the same network facility, you get fast, high-performance access to your data. It’s difficult, but certainly achievable.

The same thing can be done with other hosting providers. You can find comparable solutions at Rackspace, HP Cloud, and Azure: if your PaaS runs in one of these infrastructures, you can spin up virtual machines within the same infrastructure and enjoy low-latency connections between your application and the data. You will not have to manage the application and the scaling of the application, but you will have to manage the database.

Alternatively, certain IaaS providers can manage your services. Sometimes they will offer even greater levels of control or different feature sets compared to PaaS providers. One example of this is Amazon’s Relational Database Service (RDS). It has a great deal of functionality and can be used as its own managed service outside of any PaaS. So, if you choose one PaaS to run your application, you can choose Amazon RDS on the same infrastructure provider to achieve low latency. Your application can talk to that RDS database directly. If you ever choose to move your application to a different PaaS provider that is also on AWS, you can remain with that same RDS, which could be talking to a completely different PaaS provider. This gives you a little bit of flexibility in where you run your application.
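One way to keep that flexibility is to have the application read its database location from configuration instead of hardcoding it. The sketch below assumes the connection details arrive in an environment variable named `DATABASE_URL`; that name is a common convention rather than something every PaaS guarantees, and the hostname shown is a placeholder.

```python
import os
from urllib.parse import urlparse


def database_settings(env=os.environ):
    """Split a database URL from the environment into connection settings."""
    url = urlparse(env["DATABASE_URL"])
    return {
        "engine": url.scheme,
        "host": url.hostname,
        "port": url.port,
        "user": url.username,
        "password": url.password,
        "name": url.path.lstrip("/"),
    }


# Placeholder value; in production the platform or your own deployment
# configuration would set this to point at the externally managed database.
example_env = {"DATABASE_URL": "mysql://app:secret@db.example.com:3306/shop"}
print(database_settings(example_env))
```

Because the application only ever sees the URL, moving it to a different PaaS on the same infrastructure means setting one variable on the new platform, with no change to the database or to the code.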

In summary, it’s important to think ahead and have plans on how to approach and get past the scalability limits of managing your data. As long as you have thought through these scenarios, it will be easy to get started with a managed service within the PaaS.

When considering a PaaS for managed services, it’s essential to evaluate the provider’s features and limitations. Here are some questions for you to ask when selecting a PaaS provider:

General questions
  • How much data storage do apps have access to?

  • Does the app expect persistent disk storage?

  • Can applications write files to the disk?

  • Are the files being written to the disk ephemeral?

  • How much RAM is available to an application instance?

  • What happens if my app hits that RAM limit?

Databases
  • How will the provider charge?

    • Queries per month?

    • Data storage capacity?

    • Number of concurrent connections?

  • Is redundancy provided?

  • Is there a failover mechanism in place?

Caches
  • Is redundancy provided?

  • Is there a failover mechanism in place?

Email
  • Can the provider facilitate a connection between your app and an email service?

  • Can the provider set up an email account for you?

  • What third-party email service(s) does it work with?

Monitoring
  • What kind of monitoring insight does the provider offer?

  • What kind of control do you have?

  • Does the provider have an auto-scaling feature?

  • Will it notify you when you are hitting the limits?

Performance
  • Does the provider offer performance statistics or have add-ons that have them?

  • Does it provide load-testing services?