Search Results: spark-cookbook

Spark Cookbook

Author: Rishi Yadav

Publisher: Packt Publishing Ltd

ISBN: 1783987073

Category: Computers

Page: 226

By introducing in-memory persistent storage, Apache Spark eliminates the need to store intermediate data in filesystems, thereby increasing processing speed by up to 100 times. This book will focus on how to analyze large and complex sets of data. Starting with installing and configuring Apache Spark with various cluster managers, you will cover setting up development environments. You will then cover various recipes to perform interactive queries using Spark SQL and real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will then focus on machine learning, including supervised learning, unsupervised learning, and recommendation engine algorithms. After mastering graph processing using GraphX, you will cover various recipes for cluster optimization and troubleshooting.

Apache Spark 2.x Cookbook

Author: Rishi Yadav

Publisher: Packt Publishing Ltd

ISBN: 1787127516

Category: Computers

Page: 294

Over 70 recipes to help you use Apache Spark as your single big data computing platform and master its libraries.

About This Book:
• This book contains recipes on how to use Apache Spark as a unified compute engine
• Covers how to connect various source systems to Apache Spark
• Covers various parts of machine learning, including supervised/unsupervised learning and recommendation engines

Who This Book Is For:
This book is for data engineers, data scientists, and those who want to implement Spark for real-time data processing. Anyone who is using Spark (or is planning to) will benefit from this book. The book assumes you have a basic knowledge of Scala as a programming language.

What You Will Learn:
• Install and configure Apache Spark with various cluster managers and on AWS
• Set up a development environment for Apache Spark, including the Databricks Cloud notebook
• Find out how to operate on data in Spark with schemas
• Get to grips with real-time streaming analytics using Spark Streaming and Structured Streaming
• Master supervised and unsupervised learning using MLlib
• Build a recommendation engine using MLlib
• Process graphs using the GraphX and GraphFrames libraries
• Develop a set of common applications or project types, and solutions that solve complex big data problems

In Detail:
While Apache Spark 1.x gained a lot of traction and adoption in its early years, Spark 2.x delivers notable improvements in the areas of API, schema awareness, performance, and Structured Streaming, and simplifies the building blocks for better, faster, smarter, and more accessible big data applications. This book uncovers all these features in the form of structured recipes to analyze large and complex sets of data. Starting with installing and configuring Apache Spark with various cluster managers, you will learn to set up development environments. Further on, you will be introduced to working with RDDs, DataFrames, and Datasets to operate on schema-aware data, and to real-time streaming with various sources such as Twitter Stream and Apache Kafka. You will also work through recipes on machine learning, including supervised learning, unsupervised learning, and recommendation engines in Spark. Last but not least, the final few chapters delve deeper into graph processing using GraphX, securing your implementations, cluster optimization, and troubleshooting.

Style and Approach:
This book is packed with intuitive recipes supported with line-by-line explanations to help you understand Spark 2.x's real-time processing capabilities and deploy scalable big data solutions. It is a valuable resource for data scientists and those working on large-scale data projects.

Scala Data Analysis Cookbook

Author: Arun Manivannan

Publisher: Packt Publishing Ltd

ISBN: 1784394998

Category: Computers

Page: 254

Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes.

About This Book:
• Implement Scala in your data analysis using features from Spark, Breeze, and Zeppelin
• Scale up your data analytics infrastructure with practical recipes for Scala machine learning
• Recipes for every stage of the data analysis process, from reading and collecting data to distributed analytics

Who This Book Is For:
This book shows data scientists and analysts how to leverage their existing knowledge of Scala for high-quality, scalable data analysis.

What You Will Learn:
• Get familiar with and set up the Breeze and Spark libraries, and use their data structures
• Import data from a host of possible sources and create DataFrames from CSV
• Clean, validate, and transform data using Scala to pre-process numerical and string data
• Integrate quintessential machine learning algorithms using the Scala stack
• Bundle and scale up Spark jobs by deploying them into a variety of cluster managers
• Run streaming and graph analytics in Spark to visualize data, enabling exploratory analysis

In Detail:
This book will introduce you to the most popular Scala tools, libraries, and frameworks through practical recipes around loading, manipulating, and preparing your data. It will also help you explore and make sense of your data using stunning and insightful visualizations and machine learning toolkits. Starting with introductory recipes on utilizing the Breeze and Spark libraries, you will get to grips with how to import data from a host of possible sources and how to pre-process numerical, string, and date data. Next, you'll get an understanding of concepts that will help you visualize data using the Apache Zeppelin and Bokeh bindings in Scala, enabling exploratory data analysis. Discover how to program quintessential machine learning algorithms using the Spark ML library. Work through steps to scale your machine learning models and deploy them into a standalone cluster, EC2, YARN, and Mesos. Finally, dip into the powerful options presented by Spark Streaming and machine learning for streaming data, as well as utilizing Spark GraphX.

Style and Approach:
This book contains a rich set of recipes that covers the full spectrum of interesting data analysis tasks and will help you revolutionize your data analysis skills using Scala and Spark.

Fast Data Processing with Spark - Second Edition

Author: Krishna Sankar,Holden Karau

Publisher: Packt Publishing Ltd

ISBN: 1784399078

Category: Computers

Page: 184

Fast Data Processing with Spark - Second Edition is for software developers who want to learn how to write distributed programs with Spark. It will help developers who have had problems that were too big to be dealt with on a single computer. No previous experience with distributed programming is necessary. This book assumes knowledge of either Java, Scala, or Python.

Fast Data Processing with Spark 2

Author: Krishna Sankar

Publisher: Packt Publishing Ltd

ISBN: 1785882961

Category: Computers

Page: 274

Learn how to use Spark to process big data at speed and scale for sharper analytics. Put the principles into practice for faster, slicker big data projects.

About This Book:
• A quick way to get started with Spark, and reap the rewards
• From analytics to engineering your big data architecture, we've got it covered
• Bring your Scala and Java knowledge, and put it to work on new and exciting problems

Who This Book Is For:
This book is for developers with little to no knowledge of Spark, but with a background in Scala/Java programming. It's recommended that you have experience working with big data and a strong interest in data science.

What You Will Learn:
• Install and set up Spark in your cluster
• Prototype distributed applications with Spark's interactive shell
• Perform data wrangling using the new DataFrame APIs
• Get to know the different ways to interact with Spark's distributed representation of data (RDDs)
• Query Spark with a SQL-like query syntax
• See how Spark works with big data
• Implement machine learning systems with highly scalable algorithms
• Use R, the popular statistical language, to work with Spark
• Apply interesting graph algorithms and graph processing with GraphX

In Detail:
When people want a way to process big data at speed, Spark is invariably the solution. With its ease of development (in comparison to the relative complexity of Hadoop), it's unsurprising that it's becoming popular with data analysts and engineers everywhere. Beginning with the fundamentals, we'll show you how to get set up with Spark with minimum fuss. You'll then get to grips with some simple APIs before investigating machine learning and graph processing; throughout, we'll make sure you know exactly how to apply your knowledge. You will also learn how to use the Spark shell and how to load data, before finding out how to build and run your own Spark applications. Discover how to manipulate your RDDs and get stuck into a range of DataFrame APIs. As if that's not enough, you'll also learn some useful machine learning algorithms with the help of Spark MLlib, and how to integrate Spark with R. We'll also make sure you're confident and prepared for graph processing as you learn more about the GraphX API.

Style and Approach:
This book is a basic, step-by-step tutorial that will help you take advantage of all that Spark has to offer.

Fast Data Processing With Spark

Author: Holden Karau

Publisher: Packt Publishing Ltd

ISBN: 1782167072

Category: Computers

Page: 120

This book will be a basic, step-by-step tutorial, which will help readers take advantage of all that Spark has to offer. Fast Data Processing with Spark is for software developers who want to learn how to write distributed programs with Spark. It will help developers who have had problems that were too big to be dealt with on a single computer. No previous experience with distributed programming is necessary. This book assumes knowledge of either Java, Scala, or Python.

Apache Spark Deep Learning Cookbook

Over 80 recipes that streamline deep learning in a distributed environment with Apache Spark

Author: Ahmed Sherif,Amrith Ravindra

Publisher: Packt Publishing Ltd

ISBN: 1788471555

Category: Computers

Page: 474

A solution-based guide to putting your deep learning models into production with the power of Apache Spark.

Key Features:
• Discover practical recipes for distributed deep learning with Apache Spark
• Learn to use libraries such as Keras and TensorFlow
• Solve problems in order to train your deep learning models on Apache Spark

Book Description:
With deep learning gaining rapid mainstream adoption in modern-day industries, organizations are looking for ways to unite popular big data tools with highly efficient deep learning libraries, so that deep learning models can train with higher efficiency and speed. With the help of the Apache Spark Deep Learning Cookbook, you'll work through specific recipes to generate outcomes for deep learning algorithms without getting bogged down in theory. From setting up Apache Spark for deep learning to implementing types of neural networks, this book tackles both common and not-so-common problems to perform deep learning in a distributed environment. In addition to this, you'll get access to deep learning code within Spark that can be reused to answer similar problems or tweaked to answer slightly different problems. You will also learn how to stream and cluster your data with Spark. Once you have got to grips with the basics, you'll explore how to implement and deploy deep learning models, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), in Spark, using popular libraries such as TensorFlow and Keras. By the end of the book, you'll have the expertise to train and deploy efficient deep learning models on Apache Spark.

What You Will Learn:
• Set up a fully functional Spark environment
• Understand practical machine learning and deep learning concepts
• Apply built-in machine learning libraries within Spark
• Explore libraries that are compatible with TensorFlow and Keras
• Explore NLP models such as Word2vec and TF-IDF on Spark
• Organize DataFrames for deep learning evaluation
• Apply testing and training modeling to ensure accuracy
• Access readily available code that may be reusable

Who This Book Is For:
If you're looking for a practical and highly useful resource for implementing efficient distributed deep learning models with Apache Spark, then the Apache Spark Deep Learning Cookbook is for you. Knowledge of core machine learning concepts and a basic understanding of the Apache Spark framework are required to get the best out of this book. Additionally, some programming knowledge in Python is a plus.

Spark SQL 2.x Fundamentals and Cookbook

More than 35 Exercises (Edition 1.0)

Author: HadoopExam Learning Resources

Publisher: HadoopExam Learning Resources



Page: 160

Apache Spark is one of the fastest-growing technologies in the big data computing world. It supports multiple programming languages, such as Java, Scala, Python, and R. Hence, many existing and new frameworks have started to integrate Spark into their own platforms, e.g. Hadoop, Cassandra, and EMR. While creating Spark certification material, the HadoopExam technical team found that no proper material or book was available for Spark SQL (version 2.x) that covered both the concepts and the use of its various features, and had difficulty creating the material. They therefore decided to create a full-length book on Spark SQL, and this book is the outcome. In it, the technical team tries to cover both the fundamental concepts of the Spark SQL engine and, through approximately 35 exercises across 15 chapters, most of the programming aspects of Spark SQL. All the exercises in this book are written in Scala; however, the concepts remain the same even if you are using a different programming language.

Apache Spark for Data Science Cookbook

Author: Padma Priya Chitturi

Publisher: Packt Publishing Ltd

ISBN: 1785288806

Category: Computers

Page: 392

Over 90 insightful recipes to get lightning-fast analytics with Apache Spark.

About This Book:
• Use Apache Spark for data processing with these hands-on recipes
• Implement end-to-end, large-scale data analysis better than ever before
• Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data

Who This Book Is For:
This book is for novice and intermediate-level data science professionals and data analysts who want to solve data science problems with a distributed computing framework. Basic experience with data science implementation tasks is expected. Data science professionals looking to skill up and gain an edge in the field will find this book helpful.

What You Will Learn:
• Explore the topics of data mining, text mining, natural language processing, information retrieval, and machine learning
• Solve real-world analytical problems with large data sets
• Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale
• Get hands-on experience with algorithms like classification, regression, and recommendation on real datasets using the Spark MLlib package
• Learn about numerical and scientific computing using NumPy and SciPy on Spark
• Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models

In Detail:
Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lie in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw, unstructured data sets with ease. This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.

Style and Approach:
This book contains a comprehensive range of recipes designed to help you learn the fundamentals and tackle the difficulties of data science. It outlines practical steps to produce powerful insights into big data through a recipe-based approach.

Apache Spark 2.x Machine Learning Cookbook

Author: Siamak Amirghodsi,Meenakshi Rajendran,Broderick Hall,Shuen Mei

Publisher: Packt Publishing Ltd

ISBN: 1782174605

Category: Computers

Page: 666

Simplify machine learning model implementations with Spark.

About This Book:
• Solve the day-to-day problems of data science with Spark
• This unique cookbook consists of exciting and intuitive numerical recipes
• Optimize your work by acquiring, cleaning, analyzing, predicting, and visualizing your data

Who This Book Is For:
This book is for Scala developers with fairly good exposure to and understanding of machine learning techniques, but who lack practical experience implementing them with Spark. A solid knowledge of machine learning algorithms is assumed, as well as hands-on experience implementing ML algorithms with Scala. However, you do not need to be acquainted with the Spark ML libraries and ecosystem.

What You Will Learn:
• Get to know how Scala and Spark go hand-in-hand for developers building ML systems with Spark
• Build a recommendation engine that scales with Spark
• Find out how to build unsupervised clustering systems to classify data in Spark
• Build machine learning systems with the Decision Tree and Ensemble models in Spark
• Deal with the curse of high dimensionality in big data using Spark
• Implement text analytics for search engines in Spark
• Implement streaming machine learning systems using Spark

In Detail:
Machine learning aims to extract knowledge from data, relying on fundamental concepts in computer science, statistics, probability, and optimization. Learning about algorithms enables a wide range of applications, from everyday tasks such as product recommendations and spam filtering to cutting-edge applications such as self-driving cars and personalized medicine. You will gain hands-on experience of applying these principles using Apache Spark, a resilient cluster computing system well suited for large-scale machine learning tasks. This book begins with a quick overview of setting up the necessary IDEs to facilitate the execution of code examples that will be covered in various chapters. It also highlights some key issues developers face while working with machine learning algorithms on the Spark platform. We progress by uncovering the various Spark APIs and the implementation of ML algorithms by developing classification systems, recommendation engines, text analytics, clustering, and learning systems. Toward the final chapters, we'll focus on building high-end applications and explain various unsupervised methodologies and challenges to tackle when implementing big data ML systems.

Style and Approach:
This book is packed with intuitive recipes supported with line-by-line explanations to help you understand how to optimize your workflow and resolve problems when working with complex data modeling tasks and predictive algorithms. It is a valuable resource for data scientists and those working on large-scale data projects.

Flex 4 Cookbook

Real-world recipes for developing Rich Internet Applications

Author: Joshua Noble,Todd Anderson,Garth Braithwaite,Marco Casario,Rich Tretola

Publisher: "O'Reilly Media, Inc."

ISBN: 9781449390594

Category: Computers

Page: 768

With this collection of proven recipes, you have the ideal problem-solving guide for developing interactive Rich Internet Applications on the Adobe Flash Platform. You'll find answers to hundreds of common problems you may encounter when using Adobe Flex, the Flex 4 Framework, or Flash Builder, Adobe's GUI-based development tool. Flex 4 Cookbook has hands-on recipes for everything from Flex basics to solutions for working with visual components and data access, as well as tips on application development, unit testing, and Adobe AIR. Each recipe explains how and why it works, and includes sample code that you can use immediately. You'll get results fast, whether you're a committed Flex developer or still evaluating the technology. It's a great way to jumpstart your next web application. Topics include:
• Using Spark Components
• Text Layout Framework
• Groups and Layout
• Spark List and ItemRenderer
• Images, bitmaps, videos, and sounds
• CSS, styling, and skinning
• States and Effects
• Working with Collections
• Using DataBinding
• Validation, formatting, and regular expressions
• Using Charts
• Services and Data Access
• Using RSLs and Modules
• Working with Adobe AIR 2.0

PySpark Cookbook

Over 60 recipes for implementing big data processing and analytics using Apache Spark and Python

Author: Tomasz Drabas,Denny Lee

Publisher: Packt Publishing Ltd

ISBN: 1788834259

Category: Computers

Page: 330

Combine the power of Apache Spark and Python to build effective big data applications.

Key Features:
• Perform effective data processing, machine learning, and analytics using PySpark
• Overcome challenges in developing and deploying Spark solutions using Python
• Explore recipes for efficiently combining Python and Apache Spark to process data

Book Description:
Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. The PySpark Cookbook presents effective and time-saving recipes for leveraging the power of Python and putting it to use in the Spark ecosystem. You'll start by learning the Apache Spark architecture and how to set up a Python environment for Spark. You'll then get familiar with the modules available in PySpark and start using them effortlessly. In addition to this, you'll discover how to abstract data with RDDs and DataFrames, and understand the streaming capabilities of PySpark. You'll then move on to using ML and MLlib to solve problems with the machine learning capabilities of PySpark, and use GraphFrames to solve graph-processing problems. Finally, you will explore how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will be able to use the Python API for Apache Spark to solve problems associated with building data-intensive applications.

What You Will Learn:
• Configure a local instance of PySpark in a virtual environment
• Install and configure Jupyter in local and multi-node environments
• Create DataFrames from JSON and a dictionary using pyspark.sql
• Explore regression and clustering models available in the ML module
• Use DataFrames to transform data used for modeling
• Connect to PubNub and perform aggregations on streams

Who This Book Is For:
The PySpark Cookbook is for you if you are a Python developer looking for hands-on recipes for using the Apache Spark 2.x ecosystem in the best possible way. A thorough understanding of Python (and some familiarity with Spark) will help you get the best out of the book.

Der Funke

Die Geschichte eines autistischen Jungen, der es allen gezeigt hat

Author: Kristine Barnett

Publisher: Kailash Verlag

ISBN: 3641093279

Category: Biography & Autobiography

Page: 320

In every child, a "spark" lies hidden. Kristine's son Jacob has a higher IQ than Einstein and a photographic memory. And he is autistic. "Der Funke" tells the story of a mother who, against the advice of all the experts, fights to give her son a normal, happy life by encouraging him to follow his "spark": to focus on what he loves rather than on what holds him back. Great possibilities can open up when we learn to awaken the true potential that rests in every child, and in every one of us.

The Sparkpeople Cookbook

Author: Meg Galvin

Publisher: Hay House, Inc

ISBN: 1401931340

Category: Health & Fitness

Page: 465

From the team that brought you, America's #1 weight-loss and fitness site, and the New York Times bestseller The Spark, comes The SparkPeople Cookbook. This practical yet inspirational guide, based on the same easy, real-world principles as the SparkPeople program, takes the guesswork out of making delicious, healthy meals and losing weight, once and for all. Award-winning chef Meg Galvin and SparkRecipes editor Stepfanie Romine have paired up to create this collection of more than 160 satisfying, sustaining, and stress-free recipes that streamline your healthy-eating efforts. With a focus on real food, generous portions, and great flavor, these recipes are not part of a fad diet. They aren't about spending money on obscure ingredients, eliminating key components of a balanced diet, or slaving away for hours at the stove. They are about making smart choices and eating food you love to eat. But this is more than just a collection of recipes; it's an education. The SparkPeople philosophy has always been about encouraging people to achieve personal goals with the help and support of others, and this cookbook works in just the same way. Along with the recipes, you'll find step-by-step how-tos about the healthiest, most taste-enhancing cooking techniques; lists of kitchen essentials; and simple ingredient swaps that maximize flavor while cutting fat and calories. Plus, you'll read motivational SparkPeople success stories from real members who have used these recipes as part of their life-changing transformations. In addition, you'll find:
• Results from the SparkPeople "Ditch the Diet" Taste Test, which proves that you don't have to eat tasteless food to lose weight.
• 150 meal ideas and recipes that take 30 minutes or less to prepare, plus dozens of other meals for days when you have more time.
• Two weeks of meal plans that include breakfast, lunch, dinner, and snacks.
So whether you're a novice taking the first steps to improve your health or a seasoned cook just looking for new, healthy recipes to add to your repertoire, this cookbook is for you. Learn to love your food, lose the weight, and ditch the diet forever!

Salz. Fett. Säure. Hitze

Die vier Elemente guten Kochens.

Author: Samin Nosrat

Publisher: Antje Kunstmann

ISBN: 3956142829

Category: Cooking

Page: 472

Samin Nosrat distills her rich experience as a cook and cooking teacher into an approach that is as simple as it is revolutionary. It centers on the four core elements of good cooking: salt, fat, acid, and heat. Salt, which deepens flavors. Fat, which carries them and makes appealing textures possible. Acid, which balances all flavors. And heat, which ultimately determines the texture of a dish. Anyone who handles these four elements confidently can cook excellently without having to cling to recipes. Full of profound knowledge, yet written with a light hand and in a winning tone, the book has Nosrat introduce all the theoretical and practical aspects of good cooking, convey the fundamentals and kitchen chemistry, and reveal plenty of inspiring tips and tricks. In over 100 uncomplicated recipes, this knowledge is deepened and put to the test: fresh salads, perfectly seasoned sauces, intensely flavored vegetable dishes, the best pastas, 13 chicken variations, tender meat, and delicious cakes and desserts. Samin Nosrat's recipes encourage experimenting and improvising. Enriched with appetizing illustrations and informative graphics, this book is an indispensable kitchen compass that delights beginners and experienced cooks alike.

Machine Learning with Spark

Author: Nick Pentreath

Publisher: Packt Publishing Ltd

ISBN: 1783288523

Category: Computers

Page: 338

If you are a Scala, Java, or Python developer with an interest in machine learning and data analysis and are eager to learn how to apply common machine learning techniques at scale using the Spark framework, this is the book for you. While it may be useful to have a basic understanding of Spark, no previous experience is required.

Arduino Kochbuch

Author: Michael Margolis

Publisher: O'Reilly Germany

ISBN: 3868993541

Category: Computers

Page: 624

With the Arduino-Kochbuch, which is based on Arduino 1.0, you get a wealth of ideas and practical examples of everything that can be conjured up with the microcontroller. You will learn all about the Arduino software environment, digital and analog inputs and outputs, peripherals, motor control, and advanced Arduino coding. Whether you want to build a toy, a detector, a robot, or an interactive piece of clothing, electronics enthusiasts will find over 200 recipes, projects, and techniques for getting started with the Arduino or for souping up existing Arduino projects with new features.

Kingsford Complete Grilling Cookbook

Author: Rick Rodgers

Publisher: John Wiley & Sons

ISBN: 0470079142

Category: Cooking

Page: 266

A definitive compilation of recipes and grilling tips from America's leading brand of charcoal features more than 120 delicious recipes for meat, poultry, seafood, vegetables, and desserts, along with helpful information, grilling techniques and cooking methods, and tips on selecting the right cuts of meat. Original.

Learning PySpark

Author: Tomasz Drabas,Denny Lee

Publisher: Packt Publishing Ltd

ISBN: 1786466252

Category: Computers

Page: 274

Build data-intensive applications locally and deploy at scale using the combined powers of Python and Spark 2.0.

About This Book:
• Learn why and how you can efficiently use Python to process data and build machine learning models in Apache Spark 2.0
• Develop and deploy efficient, scalable real-time Spark solutions
• Take your understanding of using Spark with Python to the next level with this jump-start guide

Who This Book Is For:
If you are a Python developer who wants to learn about the Apache Spark 2.0 ecosystem, this book is for you. A firm understanding of Python is expected to get the best out of the book. Familiarity with Spark would be useful, but is not mandatory.

What You Will Learn:
• Learn about Apache Spark and the Spark 2.0 architecture
• Build and interact with Spark DataFrames using Spark SQL
• Learn how to solve graph and deep learning problems using GraphFrames and TensorFrames, respectively
• Read, transform, and understand data, and use it to train machine learning models
• Build machine learning models with MLlib and ML
• Learn how to submit your applications programmatically using spark-submit
• Deploy locally built applications to a cluster

In Detail:
Apache Spark is an open source framework for efficient cluster computing with a strong interface for data parallelism and fault tolerance. This book will show you how to leverage the power of Python and put it to use in the Spark ecosystem. You will start by getting a firm understanding of the Spark 2.0 architecture and how to set up a Python environment for Spark. You will get familiar with the modules available in PySpark. You will learn how to abstract data with RDDs and DataFrames and understand the streaming capabilities of PySpark. You will also get a thorough overview of the machine learning capabilities of PySpark using ML and MLlib, graph processing using GraphFrames, and polyglot persistence using Blaze. Finally, you will learn how to deploy your applications to the cloud using the spark-submit command. By the end of this book, you will have established a firm understanding of the Spark Python API and how it can be used to build data-intensive applications.

Style and Approach:
This book takes a very comprehensive, step-by-step approach so you understand how the Spark ecosystem can be used with Python to develop efficient, scalable solutions. Every chapter is standalone and written in a very easy-to-understand manner, with a focus on both the hows and the whys of each concept.