In this chapter, I’ll cover some basic terminology, hopefully clarify any confusion, and then move into getting you up and running with everything you’ll need to start developing Amazon Alexa Skills on your own.
Background Information
The Amazon Echo family of devices is an immensely popular platform, and much more complex than the “voice-controlled speaker” you may have seen it called. As of September 2017, Echo devices hold a 70% market share (roughly 19 million devices!), with Google Home a distant second at 24%, and other devices making up the remainder.
If you are already familiar with the Echo, some of this may be old news to you, but if you’re coming in completely cold then you might have seen the terms Alexa and Echo frequently being used interchangeably online.
The following table will allow us to get on the same page with some basic terms and how they’ll be used in this book.
| Term | Description |
|---|---|
| Alexa | The personification of the Amazon Echo line of devices, much like Microsoft’s Cortana and Apple’s Siri. |
| Alexa Skill | Built-in capabilities, such as playing music, setting a timer, or reporting the weather, as well as third-party functions (over 15,000 at last count) that Alexa knows how to perform. These may include retrieving data from a service, playing games, and more. |
| Alexa Voice Service | The brain behind the pretty voice. Handles voice-to-text translation, natural language processing, command interpretation, routing (to your skill), and translating text back to voice for the Echo. |
| Amazon Echo | The physical hardware product. Currently there are six Amazon products with varying degrees of Alexa integration. The Echo and Echo Dot are both voice driven, and are essentially the same, with the Echo having a more powerful speaker. The Amazon Tap and Fire TV devices require a touch to activate, but then respond to voice commands. The two newest devices, the Echo Show and Echo Look, offer the same feature set as the rest, plus a video display and a selfie cam, respectively. |
| Companion App | Amazon’s app (for Android, iPhone, and Fire devices) that is required to configure the Echo hardware, and can also be used to display additional information and responses from your Alexa skills. A lot of the “value” of this app (in terms of additional skill data) has been incorporated into the Echo Show. |
| Wake Word | The name that Alexa listens for, followed by a command. The default is “Alexa”, but users can also configure their device to respond to “Amazon”, “Echo”, or my personal favorite: “Computer”. |
Hopefully that helps some. Now let’s move on to getting you up to speed on how it all works.
Alexa User Interaction Flow
It begins with you asking Alexa to do something, via your device. The Echo transmits your command to the Alexa Voice Service via the internet. Alexa Voice Service converts the speech to text, parses it, identifies the skill being requested and routes the request to the skill service endpoint. (This could be in AWS Lambda, Azure, or a service you host.)
Once your skill service processes your request and returns a response back to the Alexa Voice Service, the text of your response is converted back to speech and streamed back to the Echo where it is read back to you, and the companion app is updated.
All in all, it looks like Figure 1-1.

Figure 1-1: The Alexa User Interaction Flow
Commands are structured as follows:
Wake Word + Command Verb + Skill Invocation Name + Intent (+ optional slot)
I’ve already covered Wake Words above, so I’ll move on to Command Verbs.
The Alexa Voice Service (we’ll just say Alexa from this point forward) understands a number of command verbs, including: Ask, Begin, Launch, Load, Open, Play, Resume, Run, Start, Talk To, Tell, Turn Off, Set, and Use. This isn’t an exhaustive list, because Alexa is constantly improving and getting smarter.
In addition to command verbs, Alexa understands numerous prepositions, including: About, For, From, If, and To.
The Skill Invocation Name is rather self-explanatory, but just to be thorough: it’s the name of the skill you wish to invoke, e.g. SurfReport, Uber, Timer, or any of the thousands of others available.
Intents are how you map the actions your Alexa Skill can perform to the functions that your backend service offers. Slots are optional parameters that get passed into your Intent, allowing you to narrow down the results before you get them. You can invoke skills with or without specific requests (intents).
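To make slots concrete, here’s a minimal sketch of what an intent with a slot might look like in an intent schema. The GetTideIntent name and City slot are hypothetical, invented just for illustration (we’ll build a real schema later in this chapter), though AMAZON.US_CITY is one of Amazon’s built-in slot types:

```json
{
  "intents": [
    {
      "intent": "GetTideIntent",
      "slots": [
        {
          "name": "City",
          "type": "AMAZON.US_CITY"
        }
      ]
    }
  ]
}
```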
If you take this basic command structure and put it all together, you get something like the following:
Alexa + Set + Timer + for + 5 Minutes
Alexa + List + 3 Gas Stations + Near Me
Alexa + Ask + Uber + to Call Me an Uber + SUV + At Work
Alexa + Get + Tomorrow’s + High Tide + From SurfReport
Prior to using any of the custom (third-party) skills, you must enable them. This can be done in the Alexa app, or by voice command. If you know the name of the skill you wish to use, simply say: “Alexa, enable skillname.”
Language Variance
As a developer, you can (and should) try to accommodate as many different ways of invoking skills as possible, but you don’t have to exhaust every possible permutation of the English language to do so. For example, my mother can be rather verbose, and polite, so she says please when asking Alexa to do something. My teenagers, on the other hand, tend to be a lot less formal.
When my mom says the following, Alexa only hears (i.e. cares about) the words in bold:
**Alexa**, please **play some of Norah Jones’ music** for me.
My teenage son says the following, and gets the exact same result:
Alexa! Norah Jones!
There are a number of phrases that Alexa understands and knows to ignore. These include, but aren’t limited to, things like: please, can you, I want, I would like, and for me.
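In practice, you absorb this variance through your sample utterances rather than through code. As a hypothetical sketch (the PlayMusicIntent name and {Artist} slot are invented for illustration), several phrasings can all map to one intent:

```
PlayMusicIntent play some {Artist} music
PlayMusicIntent play {Artist} music
PlayMusicIntent play {Artist}
PlayMusicIntent put on some {Artist}
```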
I’ll revisit this concept several times throughout the book as we begin developing different types of skills. For more in-depth information on how Alexa parses voice commands, be sure to also take a look at Appendix A: Designing for Voice Interaction.
Development Environment
The right development environment can make your life much easier.
If you don’t plan on hosting your skills on servers you control (i.e. you plan to use AWS Lambda), then you can literally get away with nothing more than a simple text editor and an internet browser.
If you’re most comfortable working in the Microsoft stack, and want to control how, when and where you publish your skill code, I recommend Visual Studio Community 2017. This is especially good if you’re a student or hobbyist, because it’s free, but you can always use one of the more advanced editions too.
If you don’t plan to use VB.NET or C#, but still want a great, lightweight development environment, I would encourage you to download Visual Studio Code instead. It runs everywhere, and is smart enough to handle most scripting languages with ease.
If you’re not running Windows, there’s a version of Visual Studio Code that runs equally well on Mac and Linux, as shown in Figure 1-2, so you don’t have to miss out on all the fun.

Figure 1-2: Visual Studio Downloads
No matter which editor you decide on, you can download both from here: https://www.visualstudio.com/downloads/
If you aren’t a Microsoft fan, or already have a favored environment, odds are you can use your preferred stack too, although you may need multiple tools to handle the editing, compiling, and publishing parts of the process. That’s beyond the scope of this book, but chances are if you have a favorite process and tools for web development, you’ve already got everything you need anyway.
Optionally, you may want to download and install Node.js. We won’t use it exclusively throughout the book, but we will be referencing the Node.js Alexa Skills Kit Samples in a later chapter. You can find the latest stable Node.js release here: https://nodejs.org/en/download/ as shown in Figure 1-3.
Node.js is one of the most supported languages in the Alexa Skills Kit documentation and samples, so even if you don’t plan on using it to write your own skills, you’ll likely be looking at a lot of it.

Figure 1-3: Node.js Download Screen
The Amazon Developer Portal
One of the great things about developing Alexa Skills is the variety of supported languages and platforms available to you, the developer.
Aside from your preferred development environment, and optionally Node.js (as mentioned above), nearly everything else you need to develop skills for Alexa can be found in the Alexa section of Amazon’s Developer Portal, at: https://developer.amazon.com/alexa.
The Alexa section of the developer portal is broken down into three main parts: Alexa Skills Kit (ASK), Alexa Voice Service (AVS), and the Alexa Fund, as shown in Figure 1-4. Don’t be fooled, there’s A LOT of information buried in there. Amazon adds new content constantly, and provides excellent support for the entire family of Echo devices.

Figure 1-4: The Alexa section of the Amazon Developer Portal.
I’ll cover the Alexa Skills Kit (ASK) in-depth in the next chapter, but we’re going to skim the surface just a little in this chapter so you can get started building your first skill. Before I do that, though, I’m going to shuffle things around for a moment (I know… bear with me) and talk about the other two parts of this section: the Alexa Voice Service (AVS) and the Alexa Fund.
The Alexa Voice Service refers to the cloud-based backend service that is the brain of Alexa. It’s what your Echo device talks to, and serves as the controller that routes commands to the various skills your users will invoke. It’s also how you would integrate support for Alexa into your own connected products, via either AVS API endpoints (over HTTP) or the AVS Device SDK for C++ developers.
In-depth coverage of custom hardware integration with Alexa and AVS is beyond the scope of this work, so I encourage you to dig into the AVS documentation available on the Alexa section of the Developer Portal.
The Alexa Fund is a $100 million pool of venture capital funding intended to promote innovation in voice technology. If you’ve got some ideas about how to improve or create new Alexa capabilities, or new devices that use Alexa, it may be worth your time to talk to them. You can find more information, and links to contact them, under the Alexa Fund heading on the Alexa page.
Ok, with that out of the way, let’s talk briefly about the Alexa Skills Kit (I’ll save the good stuff for chapter 2) and then spend the rest of this chapter building your very first Amazon Alexa skill.
Amazon describes the Alexa Skills Kit (ASK from here on out, for the rest of the book… I promise) as “a collection of self-service APIs, tools, documentation, and code samples that make it fast and easy for you to add skills to Alexa.” This covers voice driven skills, and also includes samples for making video skills for the Echo Show device.
Let’s make a skill!
Making Your First Alexa Skill
Someone once said to me: “Nobody wants to wait 2 or 3 chapters to build something.” So, in the spirit of that, it’s time to roll up your sleeves, heat up a Hot Pocket, grab your coffee, and get busy.
We’re going to build a “Hello World!” skill. I know, it’s not sexy, or even particularly exciting, but… it allows me to show you everything you need to get a simple skill up and running without any additional fluff or functionality getting in the way.
You’re going to need an account for the Amazon Developer Portal. You can connect it to your regular Amazon account, or not. Totally up to you. If you already have an account, go ahead and sign in. If you don’t have an account, you’ll be prompted to create one. Don’t worry, it’s free.
Once you’ve signed in, make sure you’re in the Alexa section, and click the Get Started button under the ASK logo, as shown in Figure 1-5. If you don’t see it, try clicking the Alexa tab in the menu bar near the top of the page. Amazon has a tendency to move things around, but you should be able to spot a “Get Started” button or link somewhere on the page.

Figure 1-5: The ASK Get Started Button.
Next, click the “Add A New Skill” button, which will take you to the Skill Information screen, as seen in Figure 1-6. This is the beginning of a seven-screen workflow that you will need to complete for each skill you create.
For this skill, I am sticking with the default choices of Custom Interaction Model and English (U.S.) language. For the Name and Invocation Name fields, just use “Hello World”. Take the defaults in the Global Fields section, and click the Save button.
At this point, the Save button will go away, and a Next button will appear. Before you click it, take a look at the page again and you’ll see the addition of an Application ID field. This is unique to your skill, and you can’t edit it.

Figure 1-6: The Skill Information Screen
Go ahead and click the Next button, and you’ll be presented with the Interaction Model screen, as shown in Figure 1-7. Don’t worry if it’s not an exact match to what’s in the book. Amazon is constantly adding new things to improve the skills development process, so you might see something new when you create your own skill.
I’ve also cropped out the Custom Slot Types section since we won’t be using it for this skill. We’ll make use of it later in the book though, and I’ll talk about it more then.
The two most important things on this screen are the Intent Schema and the Sample Utterances, which I’ll discuss now.
The Intent Schema is a block of JSON code (that’s JavaScript Object Notation, but everyone just pronounces it like the name Jason) that describes the Intents in your Alexa Skill. We covered Intents earlier in this chapter, but if you’ve forgotten, they are the functions your skill knows how to perform. Feel free to flip back a few pages for some examples.
Add this to the Intent Schema box:
```json
{
  "intents": [
    {
      "intent": "HelloWorldIntent"
    },
    {
      "intent": "AMAZON.HelpIntent"
    }
  ]
}
```
The HelloWorldIntent will be invoked whenever the user gives one of the Sample Utterances below, and will return the phrase “Hello World!” We’ll get to the actual code behind that Intent in the next section.
The AMAZON.HelpIntent is one of several built-in intents provided by Amazon. We don’t have to provide any sample utterances for it unless we intend to extend it, but it does need to be declared in the schema, as we’ve done here, before Alexa will route it to our skill.
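AMAZON.HelpIntent has plenty of siblings; AMAZON.StopIntent and AMAZON.CancelIntent are two you’ll want in almost any published skill, and declaring them works exactly the same way. A sketch (not needed for this skill, so don’t add it now):

```json
{
  "intents": [
    { "intent": "HelloWorldIntent" },
    { "intent": "AMAZON.HelpIntent" },
    { "intent": "AMAZON.StopIntent" },
    { "intent": "AMAZON.CancelIntent" }
  ]
}
```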

Figure 1-7: The Interaction Model Screen
The Sample Utterances are the phrases Alexa listens for in order to invoke the intents associated with your skill. You can (and should) assign multiple phrases to each Intent.
Add the following to the Sample Utterances box:
```
HelloWorldIntent say hello
HelloWorldIntent say hello world
HelloWorldIntent hello
```
You may have noticed that we’re not adding an utterance for the AMAZON.HelpIntent. For this example, we don’t really need to, because if Alexa doesn’t understand your utterance, the HelpIntent will be invoked by default.
Once you’ve done this, click the Next button and proceed to the Configuration screen.
For this skill, you will just take all the defaults, except for the Endpoint Type, as seen in Figure 1-8. Select the AWS Lambda option. We’ll discuss both options in further detail in a later chapter, but for now, we’re going to use AWS Lambda. (AWS stands for Amazon Web Services, in case you were wondering.)
We’re going to create a very simple AWS Lambda Function that returns the phrase “Hello World” whenever the function is called by the HelloWorldIntent.

Figure 1-8: Service Endpoint Type and Address
In your browser go to the AWS Management Console (http://aws.amazon.com) and log in. If you don’t already have an account, you’ll need to create one first.
Once you’re in the dashboard, look for the Compute section (Figure 1-9), and find the entry for Lambda. Click on the Lambda link, and then click on the red “Create function” button in the top right of the AWS Lambda Functions screen.

Figure 1-9: AWS Compute Section
You’ll be asked to select a blueprint. Blueprints are templates that help jumpstart the coding process for specific tasks, but you’re going to bypass them for now and click the red “Author from scratch” button.
Next, you’ll be asked to configure triggers for your Lambda function. Click in the empty box (see Figure 1-10) and select “Alexa Skills Kit” from the popup list, and then click the Next button.

Figure 1-10: Add Trigger
At this point, you should be looking at the Configure Function screen. Fill in the Basic Information section as shown in Figure 1-11. If the runtime is defaulted to a different version of Node.js, that’s ok. It won’t matter for this example.

Figure 1-11: Basic Information About Your AWS Lambda Function
Scroll down a little more to the Lambda function code section. You want it to look like the following Node.js code, which I’ll discuss below:
```javascript
'use strict';

// Entry point for the AWS Lambda function. Alexa sends an event describing
// the request; we route it by request type.
exports.handler = function (event, context, callback) {
    if (event.request.type === "LaunchRequest") {
        onLaunch(event.request,
            event.session,
            function callback(sessionAttributes, speechResponse) {
                context.succeed(buildResponse(sessionAttributes, speechResponse));
            });
    } else if (event.request.type === "IntentRequest") {
        onIntent(event.request,
            event.session,
            function callback(sessionAttributes, speechResponse) {
                context.succeed(buildResponse(sessionAttributes, speechResponse));
            });
    }
};

// Called when the user opens the skill without specifying an intent.
function onLaunch(launchRequest, session, callback) {
    var speechOutput = "You can ask me to say Hello World!";
    callback(null,
        buildSpeechResponse(speechOutput, "", false));
}

// Called when the user invokes one of the skill's intents.
function onIntent(intentRequest, session, callback) {
    var intent = intentRequest.intent,
        intentName = intentRequest.intent.name,
        repromptText = "You can ask me to say Hello World!";

    if (intentName == 'HelloWorldIntent') {
        callback(null,
            buildSpeechResponse("Hello World!", repromptText, true));
    }
    if (intentName == 'AMAZON.HelpIntent') {
        callback(null,
            buildSpeechResponse(repromptText, "", true));
    }
}

// Builds the response body: the spoken text, the companion app card,
// the reprompt, and whether the session should end.
function buildSpeechResponse(output, repromptText, endSession) {
    return {
        outputSpeech: {
            type: "PlainText",
            text: output
        },
        card: {
            type: "Simple",
            title: "Hello World!",
            content: output
        },
        reprompt: {
            outputSpeech: {
                type: "PlainText",
                text: repromptText
            }
        },
        shouldEndSession: endSession
    };
}

// Wraps the response body in the envelope Alexa expects.
function buildResponse(sessionAttributes, speechResponse) {
    return {
        version: "1.0",
        sessionAttributes: sessionAttributes,
        response: speechResponse
    };
}
```
It seems like there’s a lot going on here, but it’s actually pretty simple. We have two event handlers: LaunchRequest and IntentRequest.
A LaunchRequest fires whenever your skill first launches, and gives you an opportunity to provide your users with some guidance on how to use the skill. If I were to say “Alexa, open Hello World”, then this event would be called.
An IntentRequest fires whenever someone specifies an intent when calling your skill. So, if I were to say “Alexa, tell Hello World to say hello world”, then this is the event that would be called.
The onLaunch and onIntent functions are called when their respective event is received, and each is responsible for constructing the appropriate message by calling the buildSpeechResponse function.
Finally, the buildSpeechResponse function assembles the JSON response that your Alexa skill is expecting. Without properly constructed JSON, your skill won’t understand the response at all. We’ll talk about a proper JSON response shortly, but first… let’s finish our AWS Lambda Function.
Scroll down further, to the Lambda function handler and role section, and make sure it looks like Figure 1-12. Accept the defaults for everything else.

Figure 1-12: Lambda Function Handler and Role
Click the Next button, which will take you to the review screen. Make sure everything is right, and then click the Create function button.
Before returning to the Alexa Skills Dashboard, you should be sure to test your new AWS Lambda Function. You can do this by clicking the Test button on the top right of the screen. If all goes well, you should see a message that states “Execution result: succeeded.”
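If the console asks you to configure a test event first, note that the default “Hello World” template is not an Alexa request and won’t exercise your code. Pick one of the Alexa sample events if they’re offered, or paste in a minimal hand-rolled event like the sketch below, modeled on the request format we’ll look at shortly (the session, application, and request IDs here are placeholder values):

```json
{
  "session": {
    "new": false,
    "sessionId": "SessionId.test-session",
    "application": { "applicationId": "amzn1.ask.skill.test" },
    "attributes": {},
    "user": { "userId": "amzn1.ask.account.test" }
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.test-request",
    "intent": {
      "name": "HelloWorldIntent",
      "slots": {}
    },
    "locale": "en-US",
    "timestamp": "2017-09-02T02:58:23Z"
  },
  "version": "1.0"
}
```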
Expand the details section below it to see the result returned by the HelloWorld function, which in this case is the message “Hello World!”
If you’re curious, there is also some useful information about how long it took your function to run, how much memory it used, and below that is some instrumentation (graphs) about the performance of your function.
There’s a chance that you might also get an error message that looks like this: “Cannot read property ‘type’ of undefined.” That’s ok too. I know what you’re thinking, but just humor me for now, and I promise we’ll come back to it.
We’re almost done.
Scroll back up to the top of the page and copy the *AWS Lambda ARN (Amazon Resource Name) that sits right above the test button.
*Yes, that’s actually short for “Amazon Web Services Lambda Amazon Resource Name.” Crazy, huh?
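Lambda ARNs follow the pattern arn:aws:lambda:&lt;region&gt;:&lt;account-id&gt;:function:&lt;function-name&gt;, so yours should look something like the line below, with your own region and twelve-digit account number in place of the placeholders (and assuming you named the function HelloWorld):

```
arn:aws:lambda:us-east-1:123456789012:function:HelloWorld
```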
Go back to the Configuration tab of the Alexa dashboard, and paste it into the text box under where you selected the AWS Lambda option. Click the Next button to proceed. You should now be on the Test screen. If not, click the Test tab on the left.
I mentioned earlier that Alexa and your service exchange messages as JSON. Whether your endpoint is an AWS Lambda function or a C# Web API service in Azure, the messaging format is the same.
Below is an example of a properly formatted JSON request:
```json
{
  "session": {
    "new": false,
    "sessionId": "SessionId.1762552d-d18c-4a7b-b2b7-c5ba9e5005ed",
    "application": {
      "applicationId": "amzn1.ask.skill……"
    },
    "attributes": {},
    "user": {
      "userId": "amzn1.ask.account………"
    }
  },
  "request": {
    "type": "IntentRequest",
    "requestId": "EdwRequestId.14bd71ee-a39c-44fd-9de1-883d2d558fd8",
    "intent": {
      "name": "AMAZON.HelpIntent",
      "slots": {}
    },
    "locale": "en-US",
    "timestamp": "2017-09-02T02:58:23Z"
  },
  "context": {
    "System": {
      "application": {
        "applicationId": "amzn1.ask.skill……"
      },
      "user": {
        "userId": "amzn1.ask.account………"
      },
      "device": {
        "supportedInterfaces": {}
      }
    }
  },
  "version": "1.0"
}
```
In the request block, under intent, you will see the name of the intent being requested, which in this case is the built-in AMAZON.HelpIntent. Your AWS Lambda function reads that value and routes your skill response accordingly. Don’t get overwhelmed; a lot of this is actually generated for you by Alexa.
The response back to Alexa is more concise:
```json
{
  "version": "1.0",
  "response": {
    "outputSpeech": {
      "text": "You can ask me to say Hello World!",
      "type": "PlainText"
    },
    "card": {
      "content": "You can ask me to say Hello World!",
      "title": "Hello World!",
      "type": "Simple"
    },
    "reprompt": {
      "outputSpeech": {
        "text": "",
        "type": "PlainText"
      }
    },
    "shouldEndSession": true
  },
  "sessionAttributes": {}
}
```
Since this is coming from your AWS Lambda Function, you’ll be on the hook for this part, but don’t worry. By the time you’re done with this book, you’ll be an old pro at building these.
You’ll get to see your skill in action if you scroll down to the Service Simulator section of the page. In the Enter Utterance box, type hello. This will send a request to your AWS Lambda function for the HelloWorldIntent of your HelloWorld skill. You can see the full JSON request in the Service Request box.
Since the phrase “hello” matches one of your sample utterances from earlier, your function knows that you’re passing in an intent, and returns some properly formatted JSON, which you can see in the Service Response box.
You can also get Alexa to speak to you by clicking the Listen button below the response.
Congratulations! You did it. You have a working Alexa Skill now.
If you have an Alexa-capable device, such as an Echo or Echo Dot, you can say “Alexa, enable Hello World skill” and, after a moment, it will be ready for you to use, but for nobody else (because you haven’t published it yet).
Try saying “Alexa, open Hello World” and “Alexa, tell Hello World to say hello.” Pretty exciting stuff.
You’re probably thinking about publishing your new Hello World skill now, right? After all, there are two more screens in that workflow that we haven’t covered yet. Don’t worry, we’ll come back to those screens in a later chapter.
This is a great place to stop for chapter 1. You’ve learned what tools you need and where to find them, how Alexa Skills work, and you built a simple Alexa Skill and an AWS Lambda Function. Not bad for the first chapter!
We’ve barely scratched the surface of all the cool stuff you can do. In Chapter 2, we’ll dig deeper into the ASK and take a look at what it really has to offer.