→ Your approach should be to check whether a feature in the dataset follows a normal (symmetric) distribution or is skewed.
If the feature is skewed, replace missing values with the median, because the median is not affected by outliers.
If the feature follows a normal (symmetric) distribution, you can replace missing values with any of the measures (i.e. mean, median, or mode).
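This rule can be sketched with pandas; the tiny series and the |skew| > 0.5 threshold below are assumptions for illustration, not values from the original notes:

```python
import numpy as np
import pandas as pd

# Hypothetical feature with one outlier (100.0) and one missing value
data = pd.Series([1.0, 2.0, 2.0, 3.0, 100.0, np.nan])

# A common rule of thumb treats |skewness| > 0.5 as a skewed distribution
if abs(data.skew()) > 0.5:
    fill_value = data.median()   # robust: unaffected by the outlier 100.0
else:
    fill_value = data.mean()     # fine when the distribution is symmetric

imputed = data.fillna(fill_value)
```

Here the outlier makes the series strongly right-skewed, so the median (2.0) is used rather than the mean, which the outlier would drag upward.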
→ A measure of central tendency is a single value that represents the whole dataset. The most common measures of central tendency are the mean, median, and mode.
→ The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed.
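The theorem can be seen in a short NumPy simulation; the exponential population, sample size of 500, and number of samples below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Population: a heavily skewed exponential distribution (mean = 1, sd = 1)
population = rng.exponential(scale=1.0, size=100_000)

# Draw many sufficiently large random samples with replacement
# and record each sample's mean
sample_means = [
    rng.choice(population, size=500, replace=True).mean()
    for _ in range(2_000)
]

# By the CLT, the sample means are approximately normal around mu,
# with spread close to sigma / sqrt(n)
print(np.mean(sample_means))   # close to 1
print(np.std(sample_means))    # close to 1 / sqrt(500)
```

Even though the population itself is far from normal, a histogram of `sample_means` would look like a bell curve.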
A Type I error is also known as a false positive and occurs when a researcher incorrectly rejects a true null hypothesis.
A Type II error is also known as a false negative and occurs when a researcher fails to reject a null hypothesis that is actually false.
→ Inferential statistics is the process of using data analysis to deduce the properties of an underlying probability distribution. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates.
→ 68% of the distribution lies between μ−σ and μ+σ.
→ 95% of the distribution lies between μ−2σ and μ+2σ.
→ 99.7% of the distribution lies between μ−3σ and μ+3σ.
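This empirical (68-95-99.7) rule can be checked by simulation; the sample size below is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

mu, sigma = x.mean(), x.std()

# Fraction of samples falling within k standard deviations of the mean
within = {k: np.mean(np.abs(x - mu) < k * sigma) for k in (1, 2, 3)}
print(within)   # roughly {1: 0.683, 2: 0.954, 3: 0.997}
```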
→ A measure of variability describes the spread, or dispersion, of the dataset.
→ Quartiles are the values that divide a group of data into 4 subgroups.
→ The difference between the first and the third quartile is known as the inter-quartile range: IQR = Q3 − Q1.
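Quartiles and the IQR can be computed with NumPy; the small array and the 1.5×IQR outlier fences below are illustrative conventions, not part of the original notes:

```python
import numpy as np

data = np.array([3, 5, 7, 8, 12, 13, 14, 18, 21])

# NumPy's default percentile method uses linear interpolation
q1, q2, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1   # inter-quartile range

# A common outlier rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
```

For this array Q1 = 7, the median Q2 = 12, Q3 = 14, and so IQR = 7.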
→ Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms.
→ Exploratory data analysis is an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task.
→ Principal component analysis (PCA) is a method to project data from a higher-dimensional space into a lower-dimensional space by choosing the directions that retain the maximum variance.
→ Removing Irrelevant Features.
→ Data Augmentation.
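The PCA projection mentioned above can be sketched with plain NumPy via the SVD of the centered data; the 3-D dataset with two correlated dimensions is a made-up example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-D data in which two dimensions are strongly correlated
X = rng.normal(size=(200, 3))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

# PCA via SVD: center the data, then project onto the
# top-k right singular vectors (the principal components)
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_reduced = Xc @ Vt[:k].T   # shape (200, 2)

# Fraction of the total variance retained by the first k components
explained = (S[:k] ** 2).sum() / (S ** 2).sum()
```

Because two of the three dimensions are nearly redundant, the first two components keep almost all of the variance, which is exactly what makes the lower-dimensional projection useful.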
→ The Euclidean distance, or Euclidean metric, is the “ordinary” straight-line distance between two points in Euclidean space.
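The definition translates directly into a few lines of Python; the helper function name is my own, and the 3-4-5 triangle is just a convenient check:

```python
import math

def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension:
    sqrt(sum_i (p_i - q_i)^2)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

d = euclidean_distance((0, 0), (3, 4))   # classic 3-4-5 triangle: d = 5.0
```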
To be continued…….. 🙂