Mining big data requires a deep investment in people and time. How can you be sure you’re building the right models? With this hands-on book, you’ll learn a flexible toolset and methodology for building effective analytics applications with Hadoop. Using lightweight tools such as Python, Apache Pig, and the D3.js library, your team will create an agile environment for exploring data, starting with an example application to mine your own email inboxes. You’ll learn an iterative approach that enables you to quickly change the kind of analysis you’re doing, depending on what the data is telling you. All example code in this book is available as working Heroku apps. Create analytics applications by using the agile big data development methodology Build value from your data in a series of agile sprints, using the data-value stack Gain insight by using several data structures to extract multiple features from a single dataset Visualize data with charts, and expose different aspects through interactive reports Use historical data to predict the future, and translate predictions into action Get feedback from users after each sprint to keep your project on track
Building Full-Stack Data Analytics Applications with Spark
Author: Russell Jurney
Publisher: "O'Reilly Media, Inc."
Data science teams looking to turn research into useful analytics applications require not only the right tools, but also the right approach if they’re to succeed. With the revised second edition of this hands-on guide, up-and-coming data scientists will learn how to use the Agile Data Science development methodology to build data applications with Python, Apache Spark, Kafka, and other tools. Author Russell Jurney demonstrates how to compose a data platform for building, deploying, and refining analytics applications with Apache Kafka, MongoDB, ElasticSearch, d3.js, scikit-learn, and Apache Airflow. You’ll learn an iterative approach that lets you quickly change the kind of analysis you’re doing, depending on what the data is telling you. Publish data science work as a web application, and affect meaningful change in your organization. Build value from your data in a series of agile sprints, using the data-value pyramid Extract features for statistical models from a single dataset Visualize data with charts, and expose different aspects through interactive reports Use historical data to predict the future via classification and regression Translate predictions into actions Get feedback from users after each sprint to keep your project on track
This book demonstrates the use of a wide range of strategic engineering concepts, theories and applied case studies to improve the safety, security and sustainability of complex and large-scale engineering and computer systems. It first details the concepts of system design, life cycle, impact assessment and security to show how these ideas can be brought to bear on the modeling, analysis and design of information systems with a focused view on cloud-computing systems and big data analytics. This informative book is a valuable resource for graduate students, researchers and industry-based practitioners working in engineering, information and business systems as well as strategy.
This book has two main goals: to define data science through the work of data scientists and their results, namely data products, while simultaneously providing the reader with relevant lessons learned from applied data science projects at the intersection of academia and industry. As such, it is not a replacement for a classical textbook (i.e., it does not elaborate on fundamentals of methods and principles described elsewhere), but systematically highlights the connection between theory, on the one hand, and its application in specific use cases, on the other. With these goals in mind, the book is divided into three parts: Part I pays tribute to the interdisciplinary nature of data science and provides a common understanding of data science terminology for readers with different backgrounds. These six chapters are geared towards drawing a consistent picture of data science and were predominantly written by the editors themselves. Part II then broadens the spectrum by presenting views and insights from diverse authors – some from academia and some from industry, ranging from financial to health and from manufacturing to e-commerce. Each of these chapters describes a fundamental principle, method or tool in data science by analyzing specific use cases and drawing concrete conclusions from them. The case studies presented, and the methods and tools applied, represent the nuts and bolts of data science. Finally, Part III was again written from the perspective of the editors and summarizes the lessons learned that have been distilled from the case studies in Part II. The section can be viewed as a meta-study on data science across a broad range of domains, viewpoints and fields. Moreover, it provides answers to the question of what the mission-critical factors for success in different data science undertakings are. The book targets professionals as well as students of data science: first, practicing data scientists in industry and academia who want to broaden their s cope and expand their knowledge by drawing on the authors combined experience. Second, decision makers in businesses who face the challenge of creating or implementing a data-driven strategy and who want to learn from success stories spanning a range of industries. Third, students of data science who want to understand both the theoretical and practical aspects of data science, vetted by real-world case studies at the intersection of academia and industry.
Selected papers of the 20th AGILE conference on Geographic Information Science
Author: Arnold Bregt
This book contains the full research papers presented at the 20th AGILE Conference on Geographic Information Science, held in 2017 at Wageningen University & Research in Wageningen, the Netherlands. The selected contributions show trends in the domain of geographic information science directed to spatio-temporal perception and spatio-temporal analysis. For that reason the book is also of interest to professionals and researchers in fields outside geographic information science, in which the application of geoinformation could be instrumental in sparking societal innovation.
Selected papers of the 19th AGILE Conference on Geographic Information Science
Author: Tapani Sarjakoski
This book collects innovative research presented at the 19th Conference of the Association of Geographic Information Laboratories in Europe (AGILE) on Geographic Information Science, held in Helsinki, Finland in 2016.
Geographic Information Science as an Enabler of Smarter Cities and Communities
Author: Fernando Bacao
This is a book is a collection of articles that will be submitted as full papers to the AGILE annual international conference. These papers go through a rigorous review process and report original and unpublished fundamental scientific research. Those published cover significant research in the domain of geographic information science systems. This year the focus is on geographic information science as an enabler of smarter cities and communities, thus we expect contributions that help visualize the role and contribution of GI science in their development.
Enabling Game-Changing Outcomes in the Age of Exponential Data
Author: Anu Jain
Category: Business & Economics
This book looks at strategies to help companies become more intelligent, connected, and agile. It discusses how companies can define and measure high-impact outcomes and use effectively analytics technology to achieve them. It also looks at the technology needed to implement the analytics necessary to achieve high-impact outcomes—from both analytics tool and technical infrastructure perspective. Also discussed are ancillary, but critical, topics such as data security and governance that may not traditionally be a part of analytics discussions but are essential in helping companies maintain a secure environment for their analytics and access the quality data they need to gain critical insights and drive better decision-making.
Over the past 5 years, the concept of big data has matured, data science has grown exponentially, and data architecture has become a standard part of organizational decision-making. Throughout all this change, the basic principles that shape the architecture of data have remained the same. There remains a need for people to take a look at the "bigger picture" and to understand where their data fit into the grand scheme of things. Data Architecture: A Primer for the Data Scientist, Second Edition addresses the larger architectural picture of how big data fits within the existing information infrastructure or data warehousing systems. This is an essential topic not only for data scientists, analysts, and managers but also for researchers and engineers who increasingly need to deal with large and complex sets of data. Until data are gathered and can be placed into an existing framework or architecture, they cannot be used to their full potential. Drawing upon years of practical experience and using numerous examples and case studies from across various industries, the authors seek to explain this larger picture into which big data fits, giving data scientists the necessary context for how pieces of the puzzle should fit together. New case studies include expanded coverage of textual management and analytics New chapters on visualization and big data Discussion of new visualizations of the end-state architecture