Cover
Copyright Information
Credits
About the Author
About the Reviewers
www.packtpub.com
Customer Feedback
Preface
Chapter 1. Architecture and Installation
Apache Spark architecture overview
Installing Apache Spark
Writing your first Spark program
Spark architecture
Apache Spark cluster manager types
Running Spark examples
Brain teasers
References
Summary
Chapter 2. Transformations and Actions with Spark RDDs
What is an RDD?
Operations on RDD
Passing functions to Spark (Scala)
Passing functions to Spark (Java)
Passing functions to Spark (Python)
Transformations
Set operations in Spark
Actions
PairRDDs
Shared variables
References
Summary
Chapter 3. ETL with Spark
What is ETL?
How is Spark being used?
Commonly supported file formats
Commonly supported file systems
Structured data sources and databases
References
Summary
Chapter 4. Spark SQL
What is Spark SQL?
What is the DataFrame API?
What is the Dataset API?
What's new in Spark 2.0?
The SparkSession
Creating a DataFrame
Parquet files
Working with Hive
Spark SQL CLI
References
Summary
Chapter 5. Spark Streaming
What is Spark Streaming?
Steps involved in a streaming app
Architecture of Spark Streaming
Caching and persistence
Checkpointing
DStream best practices
Fault tolerance
What is Structured Streaming?
References
Summary
Chapter 6. Machine Learning with Spark
What is machine learning?
Why machine learning?
Types of machine learning
Introduction to Spark MLlib
Why do we need the Pipeline API?
How does it work?
Feature engineering
Classification and regression
Clustering
Collaborative filtering
ML-tuning - model selection and hyperparameter tuning
References
Summary
Chapter 7. GraphX
Graphs in everyday life
What is a graph?
Why are graphs elegant?
What is GraphX?
Creating your first graph (RDD API)
Basic graph operators (RDD API)
Caching and uncaching of graphs
Graph algorithms in GraphX
GraphFrames
Comparison between GraphFrames and GraphX
References
Summary
Chapter 8. Operating in Clustered Mode
Cluster nodes and daemons
Running Spark in standalone mode
Using the cluster launch scripts to start a standalone cluster
Running Spark in YARN
Running Spark in Mesos
References
Summary
Chapter 9. Building a Recommendation System
What is a recommendation system?
User specific recommendations
Key issues with recommendation systems
Recommendation system in Spark
References
Summary
Chapter 10. Customer Churn Prediction
Overview of customer churn
Why is predicting customer churn important?
How do we predict customer churn with Spark?
Exploring customer service calls
References
Summary
Appendix. There's More with Spark
Performance tuning
I/O tuning
Sizing up your executors
The skew problem
Security configuration in Spark
Setting up Jupyter Notebook with Spark
Shared variables
References
Summary
Last updated: 2021-07-09 18:46:26