Overview of Apache Flink
Apache Flink is an open-source framework and distributed processing engine for stateful computations over unbounded (streaming) and bounded (batch) data. Flink has been built to run in all common cluster environments and to perform computations at in-memory speed and at any scale.
The objectives of the training are:
- Introduction to Apache Flink
- Transformation Operations of DataSet API
- Interaction with Real-time Data
- Gelly API and Graph Processing
Duration
3 Days
Prerequisite for Apache Flink
Good knowledge of data integration and of how distributed data processing works.
Course Outline for Apache Flink
Section: 1
- Introducing Flink
- Batch-Processing Vs Stream-Processing
- Hadoop Vs Streaming-Engines (Spark & Flink)
- Spark Vs Flink
- Flink Architecture/Ecosystem
- Flink’s programming model – Flow of a Flink program
- Installation of Flink
Section: 2
- Transformation operations of DataSet API
- Default Code structure of a Flink Program
- WordCount using Map, FlatMap, Filter, GroupBy
- Joins – Inner join
- Joins – Left, Right & Full Outer Join
- Join Hints for Optimization (Exclusive feature)
Section: 3
- DataStream API Operations
- Data Sources & Sinks of DataStream API
- First program using DataStream API
- Reduce Operation
- Fold Operation
- Aggregation Operations: Flink
- Split Operation
- Iterate Operator
Section: 4
- Windows: Flink
- Introduction to Windowing
- Window Assigners
- Various Time-Notions of Windows in Flink
- Tumbling-Windows Implementation
- Sliding Windows Implementation
- Session Windows Implementation
- Global Windows Implementation
- Triggers in Windows
- Evictors for Windows
- Watermarks, Late Elements & Allowed Lateness
- How to generate Watermarks
- Recommendation
Section: 5
- State, Checkpointing, and Fault-tolerance
- Understanding State in Flink
- Checkpointing/Barrier Snapshotting
- Incremental Checkpointing (New Feature)
- Types of States
- Value State Implementation
- List State Implementation
- Reducing State Implementation
- Managed Operator State Implementation
- Implement Checkpointing in a Flink Program
- The Broadcast State Implementation
- Queryable State (Beta Version)
Section: 6
- Interacting with Real-Time Data
- Getting Twitter data using its APIs
- Adding Kafka to Flink as a Data source
- Install Kafka – RealTimeTuts Link
Section: 7
- Solve Real-Time Case studies in Flink
- Twitter data analysis in Flink
- Bank Real-Time Fraud detection
- Stock Real-Time Data-Processing
Section: 8
- Table & SQL API | Relational APIs in Flink
- Introducing Table & SQL API
- Register a Table in Relational APIs
- Writing Queries in Table & SQL API
Section: 9
- Gelly API for Graph Processing
- What is a Graph
- Calculate Friends of Friends of a Person using Gelly API