Cluster analysis
Different results of cluster analysis on an artificial dataset (called "Mouse")
Cluster analysis, or clustering, is a way of comparing data by splitting it into groups of similar data points. These groups are called clusters.
There are many algorithms for putting data into clusters. Clustering algorithms can measure the similarity between data points in different ways.[1] As a result, different clustering algorithms can find different clusters in the same data.
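The following is a small sketch of how two algorithms can disagree on the same data. It is only an illustration: the article does not name particular algorithms, so k-means and single-linkage clustering are chosen here as examples, and the function names, the one-dimensional data, and the starting centroids are all made up for this example.

```python
def kmeans_1d(points, centroids, max_iter=100):
    """k-means (Lloyd's algorithm) on 1-D data, starting from given centroids.

    The starting centroids are an assumption of this sketch; other starting
    values can lead to other clusters.
    """
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(p - centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        new_centroids = [sum(g) / len(g) if g else c
                         for g, c in zip(groups, centroids)]
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return sorted(sorted(g) for g in groups if g)


def single_linkage_1d(points, k):
    """Single-linkage clustering: keep merging the two closest clusters.

    The distance between two clusters is the distance between their two
    closest members. Merging stops when k clusters are left.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters.pop(j))
    return sorted(sorted(c) for c in clusters)


# Made-up data: a long chain of evenly spaced points and a small far-away pair.
data = [0, 1, 2, 3, 4, 5, 6, 7, 10, 10.5]

kmeans_result = kmeans_1d(data, centroids=[0, 10.5])
linkage_result = single_linkage_1d(data, k=2)

print("k-means:       ", kmeans_result)
print("single linkage:", linkage_result)
```

With this data, k-means cuts the chain in the middle because it measures how far each point is from the center of its group. Single linkage keeps the whole chain together because it only looks at the gap between neighboring points.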
References
1. Estivill-Castro, Vladimir (June 2002). "Why so many clustering algorithms: a position paper". ACM SIGKDD Explorations Newsletter. 4 (1): 65–75. doi:10.1145/568574.568575.
Further reading
Ezugwu, Absalom E.; Ikotun, Abiodun M.; Oyelade, Olaide O.; Abualigah, Laith; Agushaka, Jeffery O.; Eke, Christopher I.; Akinyelu, Andronicus A. (1 April 2022). "A comprehensive survey of clustering algorithms: State-of-the-art machine learning applications, taxonomy, challenges, and future research prospects". Engineering Applications of Artificial Intelligence. 110: 104743. doi:10.1016/j.engappai.2022.104743.
The article is a derivative under the Creative Commons Attribution-ShareAlike License.