Apache Hadoop is a freely distributed set of utilities and libraries, together with a framework for developing and running distributed programs on clusters of hundreds or thousands of nodes. It is used to implement search and contextual mechanisms for many high-load websites, including Yahoo! and Facebook. It is written in Java and built around the MapReduce computing paradigm. In this paradigm, an application is split into a large number of identical elementary tasks that run on the cluster nodes, and their outputs are naturally combined into a final result.
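The MapReduce idea described above can be illustrated with the classic word-count task. The code below is a minimal in-memory Python sketch of the paradigm and does not use Hadoop's Java API. The function names `map_words`, `shuffle`, `reduce_counts` and `word_count` are hypothetical. Each string in `splits` stands in for an input split, which a real cluster would hand to a separate node.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(split: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle phase: group all intermediate values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    # Each split is an identical, independent map task. On a real cluster
    # these would run on different nodes; here they simply run in sequence.
    mapped = (pair for split in splits for pair in map_words(split))
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    splits = ["Hadoop runs MapReduce", "MapReduce splits work", "Hadoop scales"]
    print(word_count(splits))
    # {'hadoop': 2, 'runs': 1, 'mapreduce': 2, 'splits': 1, 'work': 1, 'scales': 1}
```

Each map call depends only on its own split, so the map tasks can run in parallel without coordination. Only the shuffle step has to see all intermediate pairs, and it must finish before the reducers combine them.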
Packt, 2018. — 482 p. Explore big data concepts, platforms, analytics, and their applications using the power of Hadoop 3. Apache Hadoop is the most popular platform for big data processing, and can be combined with a host of other big data tools to build powerful analytics solutions. Big Data Analytics with Hadoop 3 shows you how to do just that, by providing insights into the...
Packt Publishing, 2017. — 206 p. — ISBN13: 9781787124769. This book will teach you how to deploy large-scale datasets in deep neural networks with Hadoop for optimal performance. Starting with understanding what deep learning is, and what the various models associated with deep neural networks are, this book will then show you how to set up the Hadoop environment for deep...
Addison-Wesley Professional, 2015. — 304 p. — ISBN: 978-0134049946. Get Started Fast with Apache Hadoop 2, YARN, and Today’s Hadoop Ecosystem. With Hadoop 2.x and YARN, Hadoop moves beyond MapReduce to become practical for virtually any type of data processing. Hadoop 2.x and the Data Lake concept represent a radical shift away from conventional approaches to data usage and...
Packt Publishing, 2013. — 368 p. — ISBN: 978-1-78216-516-3. In English. We are facing an avalanche of data. The unstructured data we gather can contain many insights that could hold the key to business success or failure. Harnessing the ability to analyze and process this data with Hadoop is one of the most highly sought after skills in today’s job market. Hadoop, by...
Manning Publications, 2021. — 482 p. — ISBN 978-1617296901. Data Pipelines with Apache Airflow teaches you the ins and outs of the Directed Acyclic Graphs (DAGs) that power Airflow, and how to write your own DAGs to meet the needs of your projects. With complete coverage of both foundational and lesser-known features, when you’re done you’ll be set to start using Airflow for...
O’Reilly, 2017. — 300 p. — ISBN: 978-1491959633. Up until recently, Hadoop deployments have existed on hardware owned and run by organizations, often alongside legacy “big-iron” hardware. Today, cloud service providers allow customers to effectively rent hardware and associated network connectivity, along with a variety of other features like databases and bulk storage. But...
O’Reilly, 2017. — 338 p. Until recently, Hadoop deployments existed on hardware owned and run by organizations. Now, of course, you can acquire the computing resources and network connectivity to run Hadoop clusters in the cloud. But there’s a lot more to deploying Hadoop to the public cloud than simply renting machines. This hands-on guide shows developers and systems...
Packt Publishing, 2018. — 220 p. — ASIN B07K46H6VV. A fast-paced guide that will help you learn about Apache Hadoop 3 and its ecosystem. Key features: set up, configure, and get started with Hadoop to get useful insights from large data sets; work with the different components of Hadoop, such as MapReduce, HDFS, and YARN; learn about the new features introduced in Hadoop 3. Book...
Apress, 2017. — 304 p. — ISBN: 978-1-4842-1909-6. Learn advanced analytical techniques and leverage existing toolkits to make your analytic applications more powerful, precise, and efficient. This book provides the right combination of architecture, design, and implementation information to create analytical systems which go beyond the basics of classification, clustering, and...
PE Press, 2021. — 120 p. — ISBN: 978-1-716-10839-6. This book provides an alternative approach to getting started with Big Data queries using Apache Impala. It describes how to work with Apache Impala and how to run queries inside it. Apache Impala is a modern, open source, distributed SQL query engine for Apache Hadoop. With Impala, we can query data, whether stored...
Addison-Wesley Professional, 2016. — 283 p. — (Addison-Wesley Data & Analytics). — ISBN10: 0134024141. — ISBN13: 978-0134024141. As adoption of Hadoop accelerates in the enterprise and beyond, there's soaring demand for those who can solve real world problems by applying advanced data science techniques in Hadoop environments. Now Practical Data Science with Hadoop(R) and Spark...
Packt Publishing, 2013. — 316 p. Helping developers become more comfortable and proficient with solving problems in the Hadoop space. People will become more familiar with a wide variety of Hadoop related tools and best practices for implementation. Hadoop Real-World Solutions Cookbook will teach readers how to build solutions using tools such as Apache Hive, Pig, MapReduce,...
Ravi Prasad, 2024. — 84 p. — ASIN B0DKDSB9NK. Unlock the power of big data with "Hadoop Essentials", your comprehensive guide to understanding and utilizing Hadoop for data processing and analysis. Designed specifically for beginners, this book breaks down complex concepts into manageable steps, making it easy for anyone to grasp the fundamentals of Hadoop. Preface. Frequently...
Packt Publishing, 2015. — 222 p. — ISBN: 978-1-78528-899-9. Integrate Elasticsearch into Hadoop to effectively visualize and analyze your data. The Hadoop ecosystem is a de facto standard for processing terabytes and petabytes of data. Lucene-enabled Elasticsearch is becoming an industry standard for its full-text search and aggregation capabilities. Elasticsearch-Hadoop...
Packt Publishing, 2017. — 348 p. — ISBN13: 9781787126732. Over 100 practical recipes to help you become an expert Hadoop administrator. Hadoop enables the distributed storage and processing of large datasets across clusters of computers. Learning how to administer Hadoop is crucial to exploit its unique features. With this book, you will be able to overcome common problems...
Packt Publishing, 2015. — 100 p. — ISBN: 978-1-78328-155-8. Get to grips with the intricacies of Hadoop monitoring using the power of Ganglia and Nagios. With the exponential growth of data and many enterprises crunching more and more data, Hadoop as a data platform has gained a lot of popularity. The Hadoop platform needs to be monitored with respect to how it works and...
O’Reilly, 2018. — 200 p. — ISBN: 1491980257. Fast data ingestion, serving, and analytics in the Hadoop ecosystem have forced developers and architects to choose solutions using the least common denominator: either fast analytics at the cost of slow data ingestion or fast data ingestion at the cost of slow analytics. There is an answer to this problem. With the Apache Kudu...
Knowledge Powerhouse, 2016. — 58 p. Do you want to make a career in Data Science and Data Warehousing with Big Data technology? Apache Hadoop is an essential part of Big Data systems. For a career in Data Science, Data Analytics and Data Warehousing, good knowledge of Hadoop is required. Your career in Data Science, Data Analytics and Data Warehousing can get a boost with the knowledge...
Packt Publishing, 2015. — 518 p. — ISBN10: 1783285516, ISBN13: 9781783285518. Example code for the book is available here. This book introduces you to the world of building data-processing applications with the wide variety of tools supported by Hadoop 2. Starting with the core components of the framework (HDFS and YARN), this book will guide you through how to build applications using a...
Packt Publishing, 2016. — 979 p. — ISBN: 978-1-78712-516-2. Unlock the power of your data with the Hadoop 2.X ecosystem and its data warehousing techniques across large data sets. As Marc Andreessen has said, “Data is eating the world.” This can be witnessed today, in the age of Big Data: businesses are producing data in huge volumes every day, and this rising tide of data needs to...
4th Edition. — O’Reilly, 2015. — 805 p. — ISBN: 1491901632. Get ready to unlock the power of your data. With the fourth edition of this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and...