Table of Contents for
Stream Processing with Apache Flink

Stream Processing with Apache Flink by Vasiliki Kalavri. Published by O'Reilly Media, Inc., 2019
  1. 1. Introduction to Stateful Stream Processing
    1. Traditional Data Infrastructures
    2. Stateful Stream Processing
      1. Event-driven Applications
      2. Data Pipelines and Real-time ETL
      3. Streaming Analytics
    3. The Evolution of Open Source Stream Processing
    4. A Taste of Flink
    5. What You Will Learn in This Book
  2. 2. Stream Processing Fundamentals
    1. Introduction to dataflow programming
      1. Dataflow graphs
      2. Data parallelism and task parallelism
      3. Data exchange strategies
    2. Processing infinite streams in parallel
      1. Latency and throughput
      2. Operations on data streams
    3. Time semantics
      1. What is the meaning of one minute?
      2. Processing time
      3. Event time
      4. Watermarks
      5. Processing time vs. event time
    4. State and consistency models
      1. Task failures
      2. Result guarantees
    5. Summary
  3. 3. The Architecture of Apache Flink
    1. System Architecture
      1. Components of a Flink Setup
      2. Application Deployment
      3. Task Execution
      4. Highly-Available Setup
    2. Data Transfer in Flink
      1. High Throughput and Low Latency
      2. Flow Control with Back Pressure
    3. Event Time Processing
      1. Timestamps
      2. Watermarks
      3. Watermarks and Event Time
      4. Timestamp Assignment and Watermark Generation
    4. State Management
      1. Operator State
      2. Keyed State
      3. State Backends
      4. Scaling Stateful Operators
    5. Checkpoints, Savepoints, and State Recovery
      1. Consistent Checkpoints
      2. Recovery from Consistent Checkpoints
      3. Flink’s Lightweight Checkpointing Algorithm
      4. Savepoints
    6. Summary
  4. 4. Setting up a development environment for Apache Flink
    1. Required Software
    2. Run and debug Flink applications in an IDE
      1. Import the book’s examples in your IDE
      2. Run Flink applications in an IDE
      3. Debug Flink applications in an IDE
    3. Bootstrap a Flink Maven project
  5. 5. The DataStream API (v1.4.0)
    1. Hello, Flink!
      1. Set up the execution environment
      2. Read an input stream
      3. Apply transformations
      4. Output the result
      5. Execute
    2. Types
      1. Supported Data Types
      2. Type Hints
      3. TypeInformation
    3. Transformations
      1. Basic transformations
      2. KeyedStream transformations
      3. Multi-stream transformations
      4. Partitioning transformations
    4. Setting the parallelism
    5. Referencing fields and defining keys
    6. Defining UDFs
    7. Including External and Flink Dependencies
    8. Summary
  6. 6. Time-based and Window Operators
    1. Configuring Time Characteristics
      1. Timestamps and watermarks for event-time applications
      2. Watermarks, Latency, and Completeness
    2. Process Functions
      1. The TimerService and Timers
      2. Emitting to Side Outputs
      3. The CoProcessFunction
    3. Window Operators
      1. Defining Window Operators
      2. Built-in Window Assigners
      3. Applying Functions on Windows
      4. Customizing Window Operators
    4. Joining Streams on Time
    5. Handling Late Data
      1. Dropping Late Events
      2. Redirecting Late Events
      3. Updating Results by Including Late Events
    6. Summary
  7. 7. Stateful Operators and User Functions
    1. Implementing Stateful Functions
      1. Declaring Keyed State at the RuntimeContext
      2. Implementing Operator List State with the ListCheckpointed Interface
      3. Using Connected Broadcast State
      4. Using the CheckpointedFunction Interface
      5. Receiving Notifications about Completed Checkpoints
    2. Robustness and Performance of Stateful Applications
      1. Choosing a State Backend
      2. Enabling Checkpointing
      3. Updating Stateful Operators
      4. Tuning the Performance of Stateful Applications
      5. Preventing Leaking State
    3. Queryable State
      1. Architecture and Enabling Queryable State
      2. Exposing Queryable State
      3. Querying State from External Applications
    4. Summary
  8. 8. Reading from and Writing to External Systems
    1. Application Consistency Guarantees
    2. Provided Connectors
      1. Apache Kafka Source Connector
      2. Apache Kafka Sink Connector
      3. File System Source Connector
      4. File System Sink Connector
      5. Apache Cassandra Sink Connector
    3. Implementing a Custom Source Function
      1. Resettable Source Functions
      2. Source Functions, Timestamps, and Watermarks
    4. Implementing a Custom Sink Function
      1. Idempotent Sink Connectors
      2. Transactional Sink Connectors
    5. Asynchronously Accessing External Systems
    6. Summary
  9. 9. Setting Flink Up for Streaming Applications
    1. Deployment Modes
      1. Stand-Alone Cluster
      2. Docker
      3. Apache Hadoop YARN
      4. Kubernetes
    2. Highly-Available Setups
      1. Highly-Available Stand-Alone Setup
      2. Highly-Available YARN Setup
      3. Highly-Available Kubernetes Setup
    3. Integration with Hadoop Components
    4. File System Configuration
    5. System Configuration
      1. Java and Classloading
      2. CPU
      3. Main Memory
      4. Disk Storage
      5. State Backends, Checkpointing, and Recovery
      6. Security
    6. Summary
  10. 10. Operating Flink and Streaming Applications
    1. Running and Managing Streaming Applications
      1. Savepoints
      2. Managing Applications with the Command-Line Client
      3. Managing Applications with the REST API
    2. Monitoring Flink Clusters and Applications
      1. Flink Web UI
      2. Metric System
      3. Monitoring Latency
    3. Configuring the Logging Behavior
    4. Summary