Author : MEENACHISUNDARAM.M
Publisher : MEENACHI SUNDARAM
ISBN 13 :
Total Pages : 242 pages
Book Synopsis: ADVANCED PYTHON WITH STATISTICAL CONCEPTS by MEENACHISUNDARAM.M
ADVANCED PYTHON WITH STATISTICAL CONCEPTS, written by MEENACHISUNDARAM.M and published by MEENACHI SUNDARAM, was released on 2024-09-04 and has 242 pages. It is available in PDF, EPUB and Kindle formats.

Book excerpt:

PYTHON WITH DATA SCIENCE
By M. Meenachi Sundaram

TABLE OF CONTENTS

PYTHON WITH DATA SCIENCE
CHAPTER 1: STATISTICS CONCEPTS
  1. Population and sample
  2. Normal distribution
  3. Measures of central tendency
  4. Variance and standard deviation
  5. Covariance and correlation
  6. Central limit theorem
  7. P-value
  8. Expected value of random variables
  9. Conditional probability
  10. Bayes' theorem
5 IMPORTANT STATISTICAL CONCEPTS FOR EVERY DATA SCIENTIST
  1. Descriptive statistics
  2. Probability distributions
  3. Dimensionality reduction
  4. Under-sampling and over-sampling
  5. Bayesian statistics
PYTHON STATISTICS MODULE
  Statistics Methods
CHAPTER 2: PROBABILITY
  Python, Random Numbers and Probability
  Random Numbers with Python
  Random Numbers Satisfying sum-to-one Condition
  Generating Random Strings or Passwords with Python
  Random Integer Numbers
  Random Choices with Python
  Random Samples with Python
  True Random Numbers
  Weighted Random Choices
CHAPTER 3: STANDARD DEVIATION
  Python statistics.stdev() Method
  Definition and Usage
  Syntax
  Parameter Values
    data: Required. The data values to be used (can be any sequence, list or iterator)
    xbar: Optional. The mean of the given data. If omitted (or set to None), the mean is automatically calculated
  Technical Details
  Return Value
CHAPTER 4: BIAS AND VARIANCE
  What are Bias and Variance?
  Bias and Variance using Python
CHAPTER 5: DISTANCE METRICS
  Understanding Distance Metrics Used in Machine Learning
  We will study:
  What Are Distance Metrics?
  Types of Distance Metrics in Machine Learning
  Euclidean Distance
    Formula for Euclidean Distance
  Manhattan Distance
    Formula for Manhattan Distance
  Minkowski Distance
    Formula for Minkowski Distance
  Hamming Distance
  Conclusion
  Points
CHAPTER 6: OUTLIER ANALYSIS
  Outlier detection is the process of identifying data points that have extreme values compared to the rest of the distribution. Learn three methods of outlier detection in Python.
  What Is Outlier Detection?
  Benefits of Outlier Detection
  Methods for Outlier Detection in Python
    Prerequisite to Outlier Detection: Reading in Data
    Using Box Plots for Outlier Detection
    Using Isolation Forests for Outlier Detection
    Using OneClassSVM for Outlier Detection
  Mastering Outlier Detection
  Outlier
  What are Outliers?
  When are outliers dangerous?
  Which statistics are affected by the outliers?
  When to drop or keep outliers?
  How to Treat Outliers?
    Trimming
    Capping
    Discretization
  How to Detect Outliers?
    For Normal Distributions
    For Skewed Distributions
    For Other Distributions
  How to Detect and Remove Outliers in Python
    Z-score Treatment
    IQR Based Filtering
    Percentile Method
  Conclusion
  Frequently Asked Questions
CHAPTER 7: MISSING VALUE TREATMENTS
  How to Handle Missing Data
  Why Fill in the Missing Data?
  How to Know If the Data Has Missing Values?
  Different Methods of Dealing with Missing Data
    1. Deleting the column with missing data
    2. Deleting the row with missing data
    3. Filling the Missing Values – Imputation
    4. Other imputation methods
    5. Imputation with an additional column
    6. Filling with a Regression Model
  Conclusion
  Frequently Asked Questions
  Pandas – Replace NaN Values with Zero in a Column
    1. Example of Replace NaN with Zero
    2. Replace NaN Values with Zero on pandas DataFrame
    3. Replace NaN Values with Zero on a Single or Multiple Columns
    4. Replace NaN Values with Zeroes Using replace()
    5. Using DataFrame.replace() on All Columns
    6. Complete Example For Replace NaN Values with Zeroes in a Column
CHAPTER 8: CORRELATION
  NumPy, SciPy, and pandas: Correlation With Python
  Correlation
  Example: NumPy Correlation Calculation
  Example: SciPy Correlation Calculation
  Example: pandas Correlation Calculation
  This page and the next page are just for reference
  Linear Correlation
    Pearson Correlation Coefficient
    Linear Regression: SciPy Implementation
    Pearson Correlation: NumPy and SciPy Implementation
    Pearson Correlation: pandas Implementation
  Rank Correlation
    Rank: SciPy Implementation
    Rank Correlation: NumPy and SciPy Implementation
    Rank Correlation: pandas Implementation
  Visualization of Correlation
    X-Y Plots with a Regression Line
    Heatmaps of Correlation Matrices
  Conclusion
CHAPTER 9: ERROR METRICS (ERROR MEASURES)
  Mean Squared Error
  Mean Absolute Error
  Mean Absolute Percent Error
  Measuring Regression Errors with Python
  Measuring Regression Errors
  Six Error Metrics for Measuring Regression Errors
    Mean Absolute Error (MAE)
    Mean Absolute Percentage Error (MAPE)
    Mean Squared Error (MSE)
    Median Absolute Error (MedAE)
    Root Mean Squared Error (RMSE)
    Median Absolute Percentage Error (MdAPE)
  Implementing Regression Error Metrics in Python: Time Series Prediction
    Step #1 Generate Synthetic Time Series Data
    Step #2 Preparing the Data
    Step #3 Training a Time Series Regression Model
    Step #4 Making Test Predictions
    Step #5 Calculating the Regression Error Metrics: Implementation and Evaluation
CHAPTER 10: REGRESSION
  Linear Regression
  Logistic Regression
  Polynomial Regression
  Ridge Regression
  Lasso Regression
  Regression Applications
  Difference between Regression and Classification in data mining
  Regression
CHAPTER 11: MACHINE LEARNING
  Machine Learning vs. Deep Learning vs. Neural Networks
  Machine learning methods
    Supervised machine learning
    Unsupervised machine learning
    Semi-supervised learning
  Common machine learning algorithms
  Real-world machine learning use cases
  Data Structure for Machine Learning
  What is Data Structure?
  Types of Data Structure
    1. Linear Data Structure
    2. Non-linear Data Structures
  Dynamic array data structure
  How is Data Structure used in Machine Learning?
  Conclusion
SUPERVISED LEARNING
  Supervised Machine Learning
  How Does Supervised Learning Work?
  Steps Involved in Supervised Learning
  Types of Supervised Machine Learning Algorithms
    1. Regression
    2. Classification
  Advantages of Supervised Learning
  Disadvantages of Supervised Learning
  Linear Regression
    How does it Work?
    R for Relationship
    Predict Future Values
    Bad Fit?
  Logistic Regression
    How does it work?
    Probability
    Function Explained
    Results Explained
  How to Save a Machine Learning Model
    Two Ways to Save a Model from scikit-learn
UNSUPERVISED LEARNING
  Unsupervised Machine Learning
  Why use Unsupervised Learning?
  Working of Unsupervised Learning
  Types of Unsupervised Learning Algorithm
  Unsupervised Learning algorithms
  Advantages of Unsupervised Learning
  Disadvantages of Unsupervised Learning
  Supervised vs. Unsupervised Learning
  Preparing Data for Unsupervised Learning
  Clustering
  Hierarchical Clustering
  Difference between K-Means and Hierarchical clustering
  t-SNE Clustering
  DBSCAN Clustering
OTHER MACHINE LEARNING (ML) ALGORITHMS
ABOUT THE AUTHOR

PYTHON WITH DATA SCIENCE

CHAPTER 1: STATISTICS CONCEPTS

Data science is an interdisciplinary field, and statistics is one of its building blocks. Without a solid grounding in statistics, it is very difficult to understand or interpret data. Statistics helps us describe and explain data, and it lets us infer results about a population from a sample drawn from that population. Machine learning and statistics also overlap considerably, so a working knowledge of statistical concepts is essential for becoming a data scientist.
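The idea of inferring something about a population from a sample can be illustrated with Python's built-in statistics and random modules. The sketch below is an illustration only, not an example taken from the book: the "population" is simulated data (normally distributed scores with an assumed mean of 70 and standard deviation of 10), the sample size of 200 is arbitrary, and the 95% interval uses a simple normal approximation with z = 1.96.

```python
import random
import statistics

# Hypothetical population: 10,000 simulated scores (illustrative data only).
random.seed(42)
population = [random.gauss(70, 10) for _ in range(10_000)]

# Draw a simple random sample of 200 values without replacement.
sample = random.sample(population, 200)

# Sample statistics serve as estimates of the population parameters.
sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)

# Standard error of the mean and an approximate 95% interval (normal approximation).
std_error = sample_sd / len(sample) ** 0.5
low = sample_mean - 1.96 * std_error
high = sample_mean + 1.96 * std_error

print(f"population mean: {statistics.mean(population):.2f}")
print(f"sample mean:     {sample_mean:.2f}")
print(f"approx. 95% CI:  ({low:.2f}, {high:.2f})")
```

In practice only the sample is observed; the population mean is printed here purely so the estimate can be compared against the value it is trying to recover.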