Pandas for Everyone

Python Data Analysis

By Daniel Chen in Python Pandas Education

December 1, 2017

Pandas for Everyone: Python Data Analysis is an introductory book that teaches Python from a data perspective by using the Pandas data processing library.

  • Pandas for Everyone: Python Data Analysis, First Edition
  • by Daniel Y. Chen
  • Released December 2017
  • Publisher(s): Addison-Wesley Professional
  • ISBN: 9780134547046

Book Description

The Hands-On, Example-Rich Introduction to Pandas Data Analysis in Python.

Today, analysts must manage data characterized by extraordinary variety, velocity, and volume. Using the open source Pandas library, you can use Python to rapidly automate and perform virtually any data analysis task, no matter how large or complex. Pandas can help you ensure the veracity of your data, visualize it for effective decision-making, and reliably reproduce analyses across multiple datasets.

Pandas for Everyone brings together practical knowledge and insight for solving real problems with Pandas, even if you’re new to Python data analysis. Daniel Y. Chen introduces key concepts through simple but practical examples, incrementally building on them to solve more difficult, real-world problems.

Chen gives you a jumpstart on using Pandas with a realistic dataset and covers combining datasets, handling missing data, and structuring datasets for easier analysis and visualization. He demonstrates powerful data cleaning techniques, from basic string manipulation to applying functions simultaneously across dataframes.

Once your data is ready, Chen guides you through fitting models for prediction, clustering, inference, and exploration. He provides tips on performance and scalability, and introduces you to the wider Python data analysis ecosystem.

  • Work with DataFrames and Series, and import or export data
  • Create plots with matplotlib, seaborn, and pandas
  • Combine datasets and handle missing data
  • Reshape, tidy, and clean datasets so they’re easier to work with
  • Convert data types and manipulate text strings
  • Apply functions to scale data manipulations
  • Aggregate, transform, and filter large datasets with groupby
  • Leverage Pandas’ advanced date and time capabilities
  • Fit linear models using statsmodels and scikit-learn libraries
  • Use generalized linear modeling to fit models with different response variables
  • Compare multiple models to select the “best”
  • Regularize to overcome overfitting and improve performance
  • Use clustering in unsupervised machine learning
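A few of the topics above (missing data, and aggregate/transform/filter with groupby) can be sketched in a handful of lines. This is only an illustrative sketch and is not taken from the book: the toy data and all variable names are hypothetical, and it assumes pandas is installed.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration (not from the book).
df = pd.DataFrame({
    "country": ["A", "A", "B", "B", "C"],
    "year": [2000, 2005, 2000, 2005, 2000],
    "life_exp": [70.0, None, 60.0, 62.0, 80.0],
})

# Missing data: count the missing values, then drop the incomplete rows.
n_missing = int(df["life_exp"].isna().sum())
clean = df.dropna(subset=["life_exp"])

grouped = clean.groupby("country")["life_exp"]

# Aggregate: reduce each group to a single value (one mean per country).
mean_by_country = grouped.mean()

# Transform: return a result with the same shape as the input,
# here each value minus its own group's mean.
centered = clean["life_exp"] - grouped.transform("mean")

# Filter: keep only the groups that satisfy a condition,
# here countries with at least two observations.
multi_obs = clean.groupby("country").filter(lambda g: len(g) >= 2)

print(mean_by_country)
print(multi_obs)
```

Aggregate shrinks the data to one row per group, transform preserves the original row count, and filter drops whole groups; which one fits depends on whether a summary, a per-row adjustment, or a subset is needed.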

Table of Contents

Cover Page
About This E-Book
Title Page
Copyright Page
Dedication Page
About the Author
I Introduction
    1 Pandas DataFrame Basics
        1.1 Introduction
        1.2 Loading Your First Data Set
        1.3 Looking at Columns, Rows, and Cells
        1.4 Grouped and Aggregated Calculations
        1.5 Basic Plot
        1.6 Conclusion
    2 Pandas Data Structures
        2.1 Introduction
        2.2 Creating Your Own Data
        2.3 The Series
        2.4 The DataFrame
        2.5 Making Changes to Series and DataFrames
        2.6 Exporting and Importing Data
        2.7 Conclusion
    3 Introduction to Plotting
        3.1 Introduction
        3.2 Matplotlib
        3.3 Statistical Graphics Using matplotlib
        3.4 Seaborn
        3.5 Pandas Objects
        3.6 Seaborn Themes and Styles
        3.7 Conclusion
II Data Manipulation
    4 Data Assembly
        4.1 Introduction
        4.2 Tidy Data
        4.3 Concatenation
        4.4 Merging Multiple Data Sets
        4.5 Conclusion
    5 Missing Data
        5.1 Introduction
        5.2 What Is a NaN Value?
        5.3 Where Do Missing Values Come From?
        5.4 Working With Missing Data
        5.5 Conclusion
    6 Tidy Data
        6.1 Introduction
        6.2 Columns Contain Values, Not Variables
        6.3 Columns Contain Multiple Variables
        6.4 Variables in Both Rows and Columns
        6.5 Multiple Observational Units in a Table (Normalization)
        6.6 Observational Units Across Multiple Tables
        6.7 Conclusion
III Data Munging
    7 Data Types
        7.1 Introduction
        7.2 Data Types
        7.3 Converting Types
        7.4 Categorical Data
        7.5 Conclusion
    8 Strings and Text Data
        8.1 Introduction
        8.2 Strings
        8.3 String Methods
        8.4 More String Methods
        8.5 String Formatting
        8.6 Regular Expressions (RegEx)
        8.7 The regex Library
        8.8 Conclusion
    9 Apply
        9.1 Introduction
        9.2 Functions
        9.3 Apply (Basics)
        9.4 Apply (More Advanced)
        9.5 Vectorized Functions
        9.6 Lambda Functions
        9.7 Conclusion
    10 Groupby Operations: Split–Apply–Combine
        10.1 Introduction
        10.2 Aggregate
        10.3 Transform
        10.4 Filter
        10.5 The pandas.core.groupby.DataFrameGroupBy Object
        10.6 Working With a MultiIndex
        10.7 Conclusion
    11 The datetime Data Type
        11.1 Introduction
        11.2 Python’s datetime Object
        11.3 Converting to datetime
        11.4 Loading Data That Include Dates
        11.5 Extracting Date Components
        11.6 Date Calculations and Timedeltas
        11.7 Datetime Methods
        11.8 Getting Stock Data
        11.9 Subsetting Data Based on Dates
        11.10 Date Ranges
        11.11 Shifting Values
        11.12 Resampling
        11.13 Time Zones
        11.14 Conclusion
IV Data Modeling
    12 Linear Models
        12.1 Introduction
        12.2 Simple Linear Regression
        12.3 Multiple Regression
        12.4 Keeping Index Labels From sklearn
        12.5 Conclusion
    13 Generalized Linear Models
        13.1 Introduction
        13.2 Logistic Regression
        13.3 Poisson Regression
        13.4 More Generalized Linear Models
        13.5 Survival Analysis
        13.6 Conclusion
    14 Model Diagnostics
        14.1 Introduction
        14.2 Residuals
        14.3 Comparing Multiple Models
        14.4 k-Fold Cross-Validation
        14.5 Conclusion
    15 Regularization
        15.1 Introduction
        15.2 Why Regularize?
        15.3 LASSO Regression
        15.4 Ridge Regression
        15.5 Elastic Net
        15.6 Cross-Validation
        15.7 Conclusion
    16 Clustering
        16.1 Introduction
        16.2 k-Means
        16.3 Hierarchical Clustering
        16.4 Conclusion
V Conclusion
    17 Life Outside of Pandas
        17.1 The (Scientific) Computing Stack
        17.2 Performance
        17.3 Going Bigger and Faster
    18 Toward a Self-Directed Learner
        18.1 It’s Dangerous to Go Alone!
        18.2 Local Meetups
        18.3 Conferences
        18.4 The Internet
        18.5 Podcasts
        18.6 Conclusion
VI Appendixes
    A Installation
        A.1 Installing Anaconda
        A.2 Uninstall Anaconda
    B Command Line
        B.1 Installation
        B.2 Basics
    C Project Templates
    D Using Python
        D.1 Command Line and Text Editor
        D.2 Python and IPython
        D.3 Jupyter
        D.4 Integrated Development Environments (IDEs)
    E Working Directories
    F Environments
    G Install Packages
        G.1 Updating Packages
    H Importing Libraries
    I Lists
    J Tuples
    K Dictionaries
    L Slicing Values
    M Loops
    N Comprehensions
    O Functions
        O.1 Default Parameters
        O.2 Arbitrary Parameters
    P Ranges and Generators
    Q Multiple Assignment
    R numpy ndarray
    S Classes
    T Odo: The Shapeshifter
Code Snippets