Apache Spark is an open-source software framework for distributed processing of unstructured and semi-structured data.
2nd Edition. — Damji Jules, Wenig Brooke, Das Tathagata, Lee Denny. — O’Reilly Media, 2020. — 398 p. — ISBN: 978-1-492-05004-9. Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark. Updated to emphasize new...
Sams Publishing, 2017. — 1105 p. — ISBN13: 978-0-672-33851-9. Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that...
Packt Publishing, 2016. — 332 p. Spark is one of the most widely-used large-scale data processing engines and runs extremely fast. It is a framework that has tools that are equally useful for application developers as well as data scientists. This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application...
Apress Media, LLC, 2023. — 416 p. — ISBN-13: 978-1-4842-9380-5. This book explains how to scale Apache Spark 3 to handle massive amounts of data, either via batch or streaming processing. It covers how to use Spark’s structured APIs to perform complex data transformations and analyses you can use to implement end-to-end analytics workflows. This book covers Spark 3’s new...
O’Reilly Media, Inc., 2022. — 435 p. — ISBN 978-1-492-08238-5. Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and...
Manning Publications, 2016. — 472 p. — ISBN: 978-1617292606. Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0. Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a...
2nd Edition. — Apress Media, LLC, 2021. — 445 p. — ISBN-13 (electronic): 978-1-4842-7383-8. Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise on the powerful and efficient distributed data processing engine inside of Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in...
Packt Publishing, 2019. — 334 p. — ISBN: 978-1-78934-656-5. Combine advanced analytics, including Machine Learning, Deep Learning Neural Networks, and Natural Language Processing, with modern scalable technologies, including Apache Spark, to derive actionable insights from Big Data in real time. Every person and every organization in the world manages data, whether they realize it or...
O’Reilly Media, Inc., 2020. — 336 p. — ISBN13: 978-1-492-04776-6. First release: 2020-06-24. If you want to build an enterprise-quality application that uses natural language text but aren’t sure where to begin or what tools to use, this practical guide will help get you started. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how...
BPB Publications, 2024. — 638 p. — ISBN 978-93-55518-026. A practical guide to using Spark SQL to perform complex queries on your Databricks data. Description: Databricks stands out as a widely embraced platform dedicated to the creation of data lakes. Within its framework, it extends support to a specialized version of Structured Query Language (SQL) known as Spark SQL. If you...
Apress, 2020. — 281 p. — ISBN13 (electronic): 978-1-4842-5781-4. Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what...
O’Reilly Media, 2019. — 452 p. — ISBN13: 978-1-491-94424-0. Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write streaming jobs...
2nd Edition. — Manning Publications, 2020. — 629 p. — ISBN: 978-1617295522. Spark in Action, Second Edition is an entirely new book that teaches you everything you need to create end-to-end analytics pipelines in Spark. Rewritten from the ground up with lots of helpful graphics, you’ll learn the roles of DAGs and dataframes, the advantages of “lazy evaluation”, and ingestion...