Apache Spark (from the English "spark", meaning a spark or flash) is an open-source software framework for distributed processing of unstructured and semi-structured data.
2nd Edition (Second Early Release) — O’Reilly Media, 2024. — 350 p. — ISBN: 9780137957002. Apache Spark is amazing when everything clicks. But this practical book is for you if you haven't seen the performance improvements you expected or still don't feel confident enough to use Spark in production. Authors Holden Karau, Rachel Warren, and Anya Bida walk you through the secrets...
BPB Publications, 2024. — 638 p. — ISBN 978-93-55518-026. A practical guide to using Spark SQL to perform complex queries on your Databricks data. Description: Databricks stands out as a widely embraced platform dedicated to the creation of data lakes. Within its framework, it extends support to a specialized version of Structured Query Language (SQL) known as Spark SQL. If you...
Apress Media, LLC, 2023. — 416 p. — ISBN-13 978-1-4842-9380-5. This book explains how to scale Apache Spark 3 to handle massive amounts of data, either via batch or streaming processing. It covers how to use Spark’s structured APIs to perform complex data transformations and analyses you can use to implement end-to-end analytics workflows. This book covers Spark 3’s new...
O’Reilly Media, Inc., 2022. — 435 p. Apache Spark's speed, ease of use, sophisticated analytics, and multilanguage support make practical knowledge of this cluster-computing framework a required skill for data engineers and data scientists. With this hands-on guide, anyone looking for an introduction to Spark will learn practical algorithms and examples using PySpark. In each...
Packt Publishing, 2021. — 480 p. — ISBN: 1801077746, 9781801077743. Understand the complexities of modern-day data engineering platforms and explore strategies to deal with them with the help of use case scenarios led by an industry expert in big data Key Features Become well-versed with the core concepts of Apache Spark and Delta Lake for building data platforms Learn how to...
2nd Edition. — Apress Media, LLC, 2021. — 445 p. — ISBN-13 (electronic): 978-1-4842-7383-8. Take a journey toward discovering, learning, and using Apache Spark 3.0. In this book, you will gain expertise on the powerful and efficient distributed data processing engine inside of Apache Spark; its user-friendly, comprehensive, and flexible programming model for processing data in...
Packt Publishing, 2021. — 414 p. — ISBN 978-1838647216. Quickly build and deploy massive data pipelines and improve productivity using Azure Databricks Key Features Get to grips with the distributed training and deployment of machine learning and deep learning models Learn how ETLs are integrated with Azure Data Factory and Delta Lake Explore deep learning and machine learning...
Independently published, 2021. — 301 p. — ASIN B0959QYBSW. Distributed Processing for Massive Datasets About the Author About the Technical Reviewer Part I: Getting Started Understanding Apache Spark An Example The Core Use Cases Transform Your Data Analyze Your Data Machine Learning .NET for Apache Spark Feature Parity Setting Up Spark Choosing Your Software Versions Choosing a...
Sussex: Apress, 2021. — 269 p. — ISBN: 978-1-4842-6991-6. Get started using Apache Spark via C# or F# and the .NET for Apache Spark bindings. This book is an introduction to both Apache Spark and the .NET bindings. Readers new to Apache Spark will get up to speed quickly using Spark for data processing tasks performed against large and very large datasets. You will learn how to...
2nd Edition. — O’Reilly Media, 2020. — 398 p. — ISBN: 1492050040. Data is getting bigger, arriving faster, and coming in varied formats—and it all needs to be processed at scale for analytics or machine learning. How can you process such varied data workloads efficiently? Enter Apache Spark. Updated to emphasize new features in Spark 2.x, this second edition shows data...
O’Reilly Media, Inc., 2020. — 336 p. — ISBN13: 978-1-492-04776-6. 2020-06-24: First Release If you want to build an enterprise-quality application that uses natural language text but aren’t sure where to begin or what tools to use, this practical guide will help get you started. Alex Thomas, principal data scientist at Wisecube, shows software engineers and data scientists how...
Apress, 2020. — 281 p. — ISBN: 9781484257814. Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what classical analytics...
Apress, 2020. — 281 p. — ISBN13: (electronic): 978-1-4842-5781-4. Analyze vast amounts of data in record time using Apache Spark with Databricks in the Cloud. Learn the fundamentals, and more, of running analytics on large clusters in Azure and AWS, using Apache Spark with Databricks on top. Discover how to squeeze the most value out of your data at a mere fraction of what...
2nd Edition. — Manning Publications, 2020. — 629 p. — ISBN: 978-1617295522. Spark in Action, Second Edition is an entirely new book that teaches you everything you need to create end-to-end analytics pipelines in Spark. Rewritten from the ground up with lots of helpful graphics, you’ll learn the roles of DAGs and dataframes, the advantages of “lazy evaluation”, and ingestion...
Packt Publishing, 2019. — 334 p. — ISBN: 978-1-78934-656-5. Combine advanced analytics including Machine Learning, Deep Learning Neural Networks and Natural Language Processing with modern scalable technologies including Apache Spark to derive actionable insights from Big Data in real-time Every person and every organization in the world manages data, whether they realize it or...
Manning Publications, 2016. — 472 p. — ISBN: 978-1617292606. Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. Fully updated for Spark 2.0. Big data systems distribute datasets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a...
Packt, 2018. — 618 p. — ISBN: 1789959209. Build efficient data flow and machine learning programs with this flexible, multi-functional open-source cluster-computing framework Key Features Master the art of real-time big data processing and machine learning Explore a wide range of use-cases to analyze large data Discover ways to optimize your work by using many features of Spark...
O’Reilly Media, 2019. — 400 p. — ISBN-10: 1491944242, ISBN-13: 978-1491944240. Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You’ll discover how Spark enables you to write...
Packt, 2019. — 322 p. — ISBN: 1788994613. Speed up the design and implementation of deep learning solutions using Apache Spark Deep learning is a subset of machine learning where datasets with several layers of complexity can be processed. Hands-On Deep Learning with Apache Spark addresses the sheer complexity of technical and analytical parts and the speed at which deep...
Packt Publishing, 2018. — 142 p. — ASIN B07HRTNFZ9. No need to spend hours ploughing through endless data – let Spark, one of the fastest big data processing engines available, do the hard work for you. Key Features Get up and running with Apache Spark and Python Integrate Spark with AWS for real-time analytics Apply processed data streams to machine learning APIs of Apache...
Apress, 2018. — 393 p. — ISBN: 978-1484235782. Develop applications for the big data landscape with Spark and Hadoop. This book also explains the role of Spark in developing scalable machine learning and analytics applications with Cloud technologies. Beginning Apache Spark 2 gives you an introduction to Apache Spark and shows you how to work with it. Along the way, you’ll...
Packt Publishing, 2018. — 474 p. — ISBN: 978-1788474221. A solution-based guide to put your deep learning models into production with the power of Apache Spark Key Features Discover practical recipes for distributed deep learning with Apache Spark Learn to use libraries such as Keras and TensorFlow Solve problems in order to train your deep learning models on Apache Spark Book...
Springer, 2018. — 274 p. — ISBN: 9811305498. The book describes the emergence of big data technologies and the role of Spark in the entire big data stack. It compares Spark and Hadoop and identifies the shortcomings of Hadoop that have been overcome by Spark. The book mainly focuses on the in-depth architecture of Spark and our understanding of Spark RDDs and how RDD...
O’Reilly Media, 2018. — 608 p. — ISBN: 978-1491912218. Learn how to use, deploy, and maintain Apache Spark with this comprehensive guide, written by the creators of this open-source cluster-computing framework. With an emphasis on improvements and new features in Spark 2.0, authors Bill Chambers and Matei Zaharia break down Spark topics into distinct sections, each with unique...
Apress, 2018. — 375 p. — ISBN: 978-1-4842-2148-8; e-ISBN: 978-1-4842-2149-5. See a Mesos-based big data stack created and the components used. You will use currently available Apache full and incubating systems. The components are introduced by example and you learn how they work together. In the , the author begins by creating a private cloud and then installs and examines...
St. Petersburg: Piter, 2017. — 272 p. In this practical book, four Cloudera data science specialists describe self-contained patterns for performing large-scale data analysis with Spark. The authors take a comprehensive look at Spark, statistical methods, and datasets collected under real-world conditions, and use these examples to demonstrate solutions to common...
Packt Publishing, 2017. — 666 p. — ASIN B01BKL1PD8. Simplify machine learning model implementations with Spark About This Book Solve the day-to-day problems of data science with Spark This unique cookbook consists of exciting and intuitive numerical recipes Optimize your work by acquiring, cleaning, analyzing, predicting, and visualizing your data Who This Book Is For This book...
Packt Publishing, 2017. — 323 p. — ISBN: 978-1-78528-345-1. Unlock the complexities of machine learning algorithms in Spark to generate useful data insights through this data analysis tutorial The purpose of machine learning is to build systems that learn from data. Being able to understand trends and patterns in complex data is critical to success; it is one of the key...
Packt Publishing, 2017. — 452 p. — ISBN: 978-1-78588-835-9. Design, implement, and deliver successful streaming applications, machine learning pipelines and graph applications using Spark SQL API In the past year, Apache Spark has been increasingly adopted for the development of distributed applications. Spark SQL APIs provide an optimized interface that helps developers build...
Packt Publishing, 2017. — 350 p. — ISBN: 978-1-78712-649-7. Unleash the data processing and analytics capability of Apache Spark with the language of choice: Java Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark...
2nd ed. — Packt Publishing, 2017. — 354 p. — ASIN B01MR4YF5G. Advanced analytics on your Big Data with latest Apache Spark 2.x About This Book An advanced guide with a combination of instructions and practical examples to extend the most up-to date Spark functionalities. Extend your data processing capabilities to process huge chunk of data in minimum time using advanced...
Packt Publishing, 2017. — 350 p. — ASIN B01LY3N7ZO. Key Features Perform big data processing with Spark—without having to learn Scala! Use the Spark Java API to implement efficient enterprise-grade applications for data processing and analytics Go beyond mainstream data processing by adding querying capability, Machine Learning, and graph processing using Spark Book...
Packt Publishing, 2017. — 797 p. — ISBN: 978-1785280849. Key Features Learn Scala's sophisticated type system that combines Functional Programming and object-oriented concepts Work on a wide array of applications from simple batch jobs to stream processing and machine learning Explore the most common as well as some complex use-cases to perform large-scale data analysis with...
Packt Publishing, 2017. — 296 p. — ASIN B071VVFDMP. +Sample files Key Features Understand how Spark can be distributed across computing clusters Develop and run Spark jobs efficiently using Python A hands-on tutorial by Frank Kane with over 15 real-world examples teaching you Big Data processing with Spark Book Description Frank Kane's Taming Big Data with Apache Spark and...
O’Reilly Media, 2017. — 352 p. — ISBN: 978-1491960110. Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to...
Packt Publishing, 2017. — 356 p. — ISBN: 978-1785885136. Spark juggernaut keeps on rolling and getting more and more momentum each day. Spark provides key capabilities in the form of Spark SQL, Spark Streaming, Spark ML and Graph X all accessible via Java, Scala, Python and R. Deploying the key capabilities is crucial whether it is on a Standalone framework or as a part of...
Packt Publishing, 2017. — 294 p. — ISBN13: 9781787127265. Over 70 recipes to help you use Apache Spark as your single big data computing platform and master its libraries. While Apache Spark 1.x gained a lot of traction and adoption in the early years, Spark 2.x delivers notable improvements in the areas of API, schema awareness, Performance, Structured Streaming, and...
O’Reilly Media, 2017. — 358 p. — ISBN: 978-1491943205. Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run...
O’Reilly Media, 2017. — 358 p. — ISBN: 978-1-491-94320-5. Apache Spark is amazing when everything clicks. But if you haven’t seen the performance improvements you expected, or still don’t feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau and Rachel Warren demonstrate performance optimizations to help your Spark queries run...
Sams Publishing, 2017. — 592 p. — ISBN13: 978-0-672-33851-9. Apache Spark is a fast, scalable, and flexible open source distributed processing engine for big data systems and is one of the most active open source big data projects to date. In just 24 lessons of one hour or less, Sams Teach Yourself Apache Spark in 24 Hours helps you build practical Big Data solutions that...
Packt Publishing, 2016. — 476 p. — ISBN: 978-1-78588-874-8. Discover everything you need to build robust machine learning applications with Spark 2.0 Data processing, implementing related algorithms, tuning, scaling up and finally deploying are some crucial steps in the process of optimising any application. Spark is capable of handling large-scale batch and streaming data to...
Packt Publishing, 2017. — 560 p. — ISBN: 978-1-78588-214-2. Master the techniques and sophisticated analytics used to construct Spark-based solutions that scale to deliver production-grade data science products Data science seeks to transform the world using data, and this is typically achieved through disrupting and changing real processes in real industries. In order to...
Apress, 2016. — 296 p. — ISBN: 9781484221747. This book is about how to integrate full-stack open source big data architecture and how to choose the correct technology—Scala/Spark, Mesos, Akka, Cassandra, and Kafka—in every layer. Big data architecture is becoming a requirement for many different enterprises. So far, however, the focus has largely been on collecting,...