Table of Contents for
Stream Processing with Apache Flink
by Vasiliki Kalavri
Published by O'Reilly Media, Inc., 2019
1. Introduction to Stateful Stream Processing
2. Stream Processing Fundamentals
3. The Architecture of Apache Flink
4. Setting up a development environment for Apache Flink
5. The DataStream API (v1.4.0)
6. Time-based and Window Operators
7. Stateful Operators and User Functions
8. Reading from and Writing to External Systems
9. Setting Flink Up for Streaming Applications
10. Operating Flink and Streaming Applications
1. Introduction to Stateful Stream Processing
Traditional Data Infrastructures
Stateful Stream Processing
Event-driven Applications
Data Pipelines and Real-time ETL
Streaming Analytics
The Evolution of Open Source Stream Processing
A Taste of Flink
What You Will Learn in This Book
2. Stream Processing Fundamentals
Introduction to dataflow programming
Dataflow graphs
Data parallelism and task parallelism
Data exchange strategies
Processing infinite streams in parallel
Latency and throughput
Operations on data streams
Time semantics
What is the meaning of one minute?
Processing time
Event time
Watermarks
Processing time vs. event time
State and consistency models
Task failures
Result guarantees
Summary
3. The Architecture of Apache Flink
System Architecture
Components of a Flink Setup
Application Deployment
Task Execution
Highly-Available Setup
Data Transfer in Flink
High Throughput and Low Latency
Flow Control with Back Pressure
Event Time Processing
Timestamps
Watermarks
Watermarks and Event Time
Timestamp Assignment and Watermark Generation
State Management
Operator State
Keyed State
State Backends
Scaling Stateful Operators
Checkpoints, Savepoints, and State Recovery
Consistent Checkpoints
Recovery from Consistent Checkpoints
Flink’s Lightweight Checkpointing Algorithm
Savepoints
Summary
4. Setting up a development environment for Apache Flink
Required Software
Run and debug Flink applications in an IDE
Import the book’s examples in your IDE
Run Flink applications in an IDE
Debug Flink applications in an IDE
Bootstrap a Flink Maven project
5. The DataStream API (v1.4.0)
Hello, Flink!
Set up the execution environment
Read an input stream
Apply transformations
Output the result
Execute
Types
Supported Data Types
Type Hints
TypeInformation
Transformations
Basic transformations
KeyedStream transformations
Multi-stream transformations
Partitioning transformations
Setting the parallelism
Referencing fields and defining keys
Defining UDFs
Including External and Flink Dependencies
Summary
6. Time-based and Window Operators
Configuring Time Characteristics
Timestamps and watermarks for event-time applications
Watermarks, Latency, and Completeness
Process Functions
The TimerService and Timers
Emitting to Side Outputs
The CoProcessFunction
Window Operators
Defining Window Operators
Built-in Window Assigners
Applying Functions on Windows
Customizing Window Operators
Joining Streams on Time
Handling Late Data
Dropping Late Events
Redirecting Late Events
Updating Results by Including Late Events
Summary
7. Stateful Operators and User Functions
Implementing Stateful Functions
Declaring Keyed State at the RuntimeContext
Implementing Operator List State with the ListCheckpointed Interface
Using Connected Broadcast State
Using the CheckpointedFunction Interface
Receiving Notifications about Completed Checkpoints
Robustness and Performance of Stateful Applications
Choosing a State Backend
Enabling Checkpointing
Updating Stateful Operators
Tuning the Performance of Stateful Applications
Preventing Leaking State
Queryable State
Architecture and Enabling Queryable State
Exposing Queryable State
Querying State from External Applications
Summary
8. Reading from and Writing to External Systems
Application Consistency Guarantees
Provided Connectors
Apache Kafka Source Connector
Apache Kafka Sink Connector
File System Source Connector
File System Sink Connector
Apache Cassandra Sink Connector
Implementing a Custom Source Function
Resettable Source Functions
Source Functions, Timestamps, and Watermarks
Implementing a Custom Sink Function
Idempotent Sink Connectors
Transactional Sink Connectors
Asynchronously Accessing External Systems
Summary
9. Setting Flink Up for Streaming Applications
Deployment Modes
Stand-Alone Cluster
Docker
Apache Hadoop YARN
Kubernetes
Highly-Available Setups
Highly-Available Stand-Alone Setup
Highly-Available YARN Setup
Highly-Available Kubernetes Setup
Integration with Hadoop Components
File System Configuration
System Configuration
Java and Classloading
CPU
Main Memory
Disk Storage
State Backends, Checkpointing, and Recovery
Security
Summary
10. Operating Flink and Streaming Applications
Running and Managing Streaming Applications
Savepoints
Managing Applications with the Command-Line Client
Managing Applications with the REST API
Monitoring Flink Clusters and Applications
Flink Web UI
Metric System
Monitoring Latency
Configuring the Logging Behavior
Summary