This textbook provides an introduction to the free software Python and its use for statistical data analysis. It covers common statistical tests for continuous, discrete and categorical data, as well as linear regression analysis and topics from survival analysis and Bayesian statistics. Working code and data for Python solutions for each test, together with easy-to-follow Python examples, can be reproduced by the reader and reinforce their immediate understanding of the topic. With recent advances in the Python ecosystem, Python has become a popular language for scientific computing, offering a powerful environment for statistical data analysis and an interesting alternative to R. The book is intended for master and PhD students, mainly from the life and medical sciences, with a basic knowledge of statistics. As it also provides some statistics background, the book can be used by anyone who wants to perform a statistical data analysis.
Introduction to statistical modeling and probabilistic programming using PyMC3 and ArviZ, 2nd Edition
Author: Osvaldo Martin
Publisher: Packt Publishing Ltd
Bayesian modeling with PyMC3 and exploratory analysis of Bayesian models with ArviZ Key Features A step-by-step guide to conduct Bayesian data analyses using PyMC3 and ArviZ A modern, practical and computational approach to Bayesian statistical modeling A tutorial for Bayesian analysis and best practices with the help of sample problems and practice exercises. Book Description The second edition of Bayesian Analysis with Python is an introduction to the main concepts of applied Bayesian inference and its practical implementation in Python using PyMC3, a state-of-the-art probabilistic programming library, and ArviZ, a new library for exploratory analysis of Bayesian models. The main concepts of Bayesian statistics are covered using a practical and computational approach. Synthetic and real data sets are used to introduce several types of models, such as generalized linear models for regression and classification, mixture models, hierarchical models, and Gaussian processes, among others. By the end of the book, you will have a working knowledge of probabilistic modeling and you will be able to design and implement Bayesian models for your own data science problems. After reading the book you will be better prepared to delve into more advanced material or specialized statistical modeling if you need to. What you will learn Build probabilistic models using the Python library PyMC3 Analyze probabilistic models with the help of ArviZ Acquire the skills required to sanity check models and modify them if necessary Understand the advantages and caveats of hierarchical models Find out how different models can be used to answer different data analysis questions Compare models and choose between alternative ones Discover how different models are unified from a probabilistic perspective Think probabilistically and benefit from the flexibility of the Bayesian framework Who this book is for If you are a student, data scientist, researcher, or a developer looking to get started with Bayesian data analysis and probabilistic programming, this book is for you. The book is introductory so no previous statistical knowledge is required, although some experience in using Python and NumPy is expected.
A Python Approach to Concepts, Techniques and Applications
Author: Laura Igual
This accessible and classroom-tested textbook/reference presents an introduction to the fundamentals of the emerging and interdisciplinary field of data science. The coverage spans key concepts adopted from statistics and machine learning, useful techniques for graph analysis and parallel programming, and the practical application of data science for such tasks as building recommender systems or performing sentiment analysis. Topics and features: provides numerous practical case studies using real-world data throughout the book; supports understanding through hands-on experience of solving data science problems using Python; describes techniques and tools for statistical analysis, machine learning, graph analysis, and parallel programming; reviews a range of applications of data science, including recommender systems and sentiment analysis of text data; provides supplementary code resources and data at an associated website.
This book introduces Python programming language and fundamental concepts in algorithms and computing. Its target audience includes students and engineers with little or no background in programming, who need to master a practical programming language and learn the basic thinking in computer science/programming. The main contents come from lecture notes for engineering students from all disciplines, and has received high ratings. Its materials and ordering have been adjusted repeatedly according to classroom reception. Compared to alternative textbooks in the market, this book introduces the underlying Python implementation of number, string, list, tuple, dict, function, class, instance and module objects in a consistent and easy-to-understand way, making assignment, function definition, function call, mutability and binding environments understandable inside-out. By giving the abstraction of implementation mechanisms, this book builds a solid understanding of the Python programming language.
This book is suitable for use in a university-level first course in computing (CS1), as well as the increasingly popular course known as CS0. It is difficult for many students to master basic concepts in computer science and programming. A large portion of the confusion can be blamed on the complexity of the tools and materials that are traditionally used to teach CS1 and CS2. This textbook was written with a single overarching goal: to present the core concepts of computer science as simply as possible without being simplistic.
Master predictive analytics, from start to finish Start with strategy and management Master methods and build models Transform your models into highly-effective code—in both Python and R This one-of-a-kind book will help you use predictive analytics, Python, and R to solve real business problems and drive real competitive advantage. You’ll master predictive analytics through realistic case studies, intuitive data visualizations, and up-to-date code for both Python and R—not complex math. Step by step, you’ll walk through defining problems, identifying data, crafting and optimizing models, writing effective Python and R code, interpreting results, and more. Each chapter focuses on one of today’s key applications for predictive analytics, delivering skills and knowledge to put models to work—and maximize their value. Thomas W. Miller, leader of Northwestern University’s pioneering program in predictive analytics, addresses everything you need to succeed: strategy and management, methods and models, and technology and code. If you’re new to predictive analytics, you’ll gain a strong foundation for achieving accurate, actionable results. If you’re already working in the field, you’ll master powerful new skills. If you’re familiar with either Python or R, you’ll discover how these languages complement each other, enabling you to do even more. All data sets, extensive Python and R code, and additional examples available for download at http://www.ftpress.com/miller/ Python and R offer immense power in predictive analytics, data science, and big data. This book will help you leverage that power to solve real business problems, and drive real competitive advantage. Thomas W. Miller’s unique balanced approach combines business context and quantitative tools, illuminating each technique with carefully explained code for the latest versions of Python and R. If you’re new to predictive analytics, Miller gives you a strong foundation for achieving accurate, actionable results. If you’re already a modeler, programmer, or manager, you’ll learn crucial skills you don’t already have. Using Python and R, Miller addresses multiple business challenges, including segmentation, brand positioning, product choice modeling, pricing research, finance, sports, text analytics, sentiment analysis, and social network analysis. He illuminates the use of cross-sectional data, time series, spatial, and spatio-temporal data. You’ll learn why each problem matters, what data are relevant, and how to explore the data you’ve identified. Miller guides you through conceptually modeling each data set with words and figures; and then modeling it again with realistic code that delivers actionable insights. You’ll walk through model construction, explanatory variable subset selection, and validation, mastering best practices for improving out-of-sample predictive performance. Miller employs data visualization and statistical graphics to help you explore data, present models, and evaluate performance. Appendices include five complete case studies, and a detailed primer on modern data science methods. Use Python and R to gain powerful, actionable, profitable insights about: Advertising and promotion Consumer preference and choice Market baskets and related purchases Economic forecasting Operations management Unstructured text and language Customer sentiment Brand and price Sports team performance And much more
This book introduces basic computing skills designed for industry professionals without a strong computer science background. Written in an easily accessible manner, and accompanied by a user-friendly website, it serves as a self-study guide to survey data science and data engineering for those who aspire to start a computing career, or expand on their current roles, in areas such as applied statistics, big data, machine learning, data mining, and informatics. The authors draw from their combined experience working at software and social network companies, on big data products at several major online retailers, as well as their experience building big data systems for an AI startup. Spanning from the basic inner workings of a computer to advanced data manipulation techniques, this book opens doors for readers to quickly explore and enhance their computing knowledge. Computing with Data comprises a wide range of computational topics essential for data scientists, analysts, and engineers, providing them with the necessary tools to be successful in any role that involves computing with data. The introduction is self-contained, and chapters progress from basic hardware concepts to operating systems, programming languages, graphing and processing data, testing and programming tools, big data frameworks, and cloud computing. The book is fashioned with several audiences in mind. Readers without a strong educational background in CS--or those who need a refresher--will find the chapters on hardware, operating systems, and programming languages particularly useful. Readers with a strong educational background in CS, but without significant industry background, will find the following chapters especially beneficial: learning R, testing, programming, visualizing and processing data in Python and R, system design for big data, data stores, and software craftsmanship.
Scientific Computing and Data Science Applications with Numpy, SciPy and Matplotlib
Author: Robert Johansson
Leverage the numerical and mathematical modules in Python and its standard library as well as popular open source numerical Python packages like NumPy, SciPy, FiPy, matplotlib and more. This fully revised edition, updated with the latest details of each package and changes to Jupyter projects, demonstrates how to numerically compute solutions and mathematically model applications in big data, cloud computing, financial engineering, business management and more. Numerical Python, Second Edition, presents many brand-new case study examples of applications in data science and statistics using Python, along with extensions to many previous examples. Each of these demonstrates the power of Python for rapid development and exploratory computing due to its simple and high-level syntax and multiple options for data analysis. After reading this book, readers will be familiar with many computing techniques including array-based and symbolic computing, visualization and numerical file I/O, equation solving, optimization, interpolation and integration, and domain-specific computational problems, such as differential equation solving, data analysis, statistical modeling and machine learning. What You'll Learn Work with vectors and matrices using NumPy Plot and visualize data with Matplotlib Perform data analysis tasks with Pandas and SciPy Review statistical modeling and machine learning with statsmodels and scikit-learn Optimize Python code using Numba and Cython Who This Book Is For Developers who want to understand how to use Python and its related ecosystem for numerical computing.