DBSCAN vs. K-means, and how does OPTICS differ from DBSCAN?
1) Both K-means and DBSCAN are partitional clustering algorithms that assign each object to a single cluster, but K-means generally clusters all objects, while DBSCAN discards objects that it classifies as noise.
2) K-means uses a prototype-based notion of a cluster, while DBSCAN uses a density-based notion.
3) K-means has difficulty with non-spherical clusters and with clusters of different sizes. DBSCAN can handle clusters of different sizes and shapes and is less affected by noise and outliers. Both algorithms perform poorly when clusters have very different densities.
4) K-means can only be used with data that has a well-defined centroid (such as the mean or median). DBSCAN requires that its density definition (based on the traditional Euclidean notion of density) be meaningful for the data.
5) K-means can be used for sparse, high-dimensional data, such as document data. DBSCAN typically performs poorly on such data because the traditional Euclidean definition of density does not work well in high dimensions.
6) The original versions of both K-means and DBSCAN were designed for Euclidean data, but both have been extended to handle other types of data.
7) The basic K-means algorithm is equivalent to a statistical clustering method (a mixture model) that assumes all clusters come from spherical Gaussian distributions with different means but the same covariance matrix. DBSCAN makes no assumptions about the distribution of the data.
8) K-means and DBSCAN both look for clusters using all attributes, i.e., neither looks for clusters that may involve only a subset of the attributes.
9) K-means can find clusters that are not clearly separated, even when the clusters overlap, whereas DBSCAN merges clusters that overlap.
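10) The time complexity of K-means is O(m), i.e., linear in the number of points m (for a fixed number of clusters, attributes, and iterations), while the time complexity of DBSCAN is O(m^2) in the worst case. The exception is special cases such as low-dimensional Euclidean data, where a spatial index can reduce the cost to about O(m log m).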
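11) DBSCAN produces the same result across multiple runs on the same data (only the assignment of border points reachable from more than one cluster can depend on the processing order), whereas K-means usually initializes its centroids randomly and therefore may produce different results from run to run.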
12) DBSCAN determines the number of clusters automatically, whereas K-means needs it as an input parameter. DBSCAN, however, requires two other parameters: Eps (the neighborhood radius) and MinPts (the minimum number of points in that neighborhood).
13) K-means clustering can be viewed as an optimization problem, i.e., minimizing the sum of squared errors (the squared distance from each point to its nearest centroid), and as a special case of statistical clustering (a mixture model). DBSCAN is not based on any formal model.
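Points 1, 11 and 13 can be illustrated with a minimal sketch of the basic K-means iteration in plain Python. The function names (`kmeans`, `nearest`, `sq_dist`), the toy data, and the seeds are assumptions chosen for illustration; this is not a tuned implementation.

```python
import random

def sq_dist(p, q):
    # Squared Euclidean distance between two points given as tuples.
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    # Index of the centroid closest to p.
    return min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))

def kmeans(points, k, seed, iters=100):
    # Random initialization: the result can depend on the seed.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centers) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                # New centroid is the coordinate-wise mean of the members.
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                # Keep the old centroid if a cluster became empty.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(p, centers) for p in points]
    # Objective: sum of squared errors to the nearest centroid.
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, centers, sse

# Two small blobs plus one far-away outlier (hypothetical toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

for seed in (0, 1, 2):
    labels, centers, sse = kmeans(points, k=2, seed=seed)
    print(seed, labels, round(sse, 2))
```

Every point, including the outlier at (50, 50), receives a cluster label, and the labels and SSE may change with the seed, since each run only converges to a local minimum of the objective.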
Difference between DBSCAN and OPTICS:
The DBSCAN algorithm has two input parameters that the user must set manually: Eps (the neighborhood radius) and MinPts (the minimum number of points in the Eps-neighborhood). The resulting clusters are very sensitive to these two values, and different values produce different clusterings. This is a drawback DBSCAN shares with most other clustering algorithms that require input parameters.
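This sensitivity can be seen in a small sketch. The following is a minimal, unoptimized DBSCAN in plain Python that uses a brute-force neighborhood search; the names `dbscan`, `region_query`, `n_clusters` and the toy data are assumptions for illustration only.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    # Indices of all points within eps of point i (including i itself).
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point: keep growing
    return labels

def n_clusters(labels):
    return len(set(labels) - {NOISE})

# Two small blobs plus one far-away outlier (hypothetical toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

for eps in (0.5, 1.5, 15):
    labels = dbscan(points, eps=eps, min_pts=3)
    print(eps, n_clusters(labels), labels)
```

On this toy data, with MinPts fixed at 3, a radius of 0.5 marks every point as noise, 1.5 finds the two blobs and leaves the outlier as noise, and 15 merges both blobs into a single cluster.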
To overcome this drawback of DBSCAN, the OPTICS algorithm (Ordering Points To Identify the Clustering Structure) was proposed. OPTICS does not produce the resulting clusters explicitly. Instead, it generates an augmented cluster ordering for cluster analysis (for example, a plot with the reachability distance on the vertical axis and the output order of the sample points on the horizontal axis), which represents the density-based clustering structure of the sample points. This ordering contains information equivalent to the density-based clusterings obtained from a wide range of parameter settings. In other words, for the same MinPts, the DBSCAN result for any neighborhood radius up to the Eps used to build the ordering can be obtained from it.
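The idea of building one ordering and then reading DBSCAN-style clusterings off it can be sketched as follows. This is a simplified, brute-force version of OPTICS in plain Python. The names `optics` and `extract_dbscan` and the toy data are hypothetical, and the core distance here counts the point itself among its neighbors, which is an assumption about the convention.

```python
import math

def optics(points, max_eps, min_pts):
    n = len(points)
    reach = [math.inf] * n      # reachability distance of each point
    core_dist = [None] * n      # None means "not a core point within max_eps"
    processed = [False] * n
    order = []
    for start in range(n):
        if processed[start]:
            continue
        seeds = {start}
        while seeds:
            # Next point: the unprocessed seed with the smallest reachability.
            p = min(seeds, key=lambda j: (reach[j], j))
            seeds.remove(p)
            processed[p] = True
            order.append(p)
            nbrs = sorted((math.dist(points[p], points[j]), j)
                          for j in range(n)
                          if math.dist(points[p], points[j]) <= max_eps)
            if len(nbrs) < min_pts:
                continue        # p is not a core point, nothing to expand
            core_dist[p] = nbrs[min_pts - 1][0]
            for d, q in nbrs:
                if processed[q]:
                    continue
                reach[q] = min(reach[q], max(core_dist[p], d))
                seeds.add(q)
    return order, reach, core_dist

def extract_dbscan(order, reach, core_dist, eps):
    # Read a DBSCAN-like labeling for radius eps (eps <= max_eps) off the ordering.
    labels = [-1] * len(order)
    cluster = -1
    for p in order:
        if reach[p] > eps:
            if core_dist[p] is not None and core_dist[p] <= eps:
                cluster += 1    # p starts a new cluster
                labels[p] = cluster
        else:
            labels[p] = cluster
    return labels

# Two small blobs plus one far-away outlier (hypothetical toy data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

order, reach, core_dist = optics(points, max_eps=math.inf, min_pts=3)
for eps in (0.5, 1.5, 15):
    print(eps, extract_dbscan(order, reach, core_dist, eps))
```

The ordering is computed once. Plotting `reach[p]` for `p` in `order` gives the reachability plot described above, in which valleys correspond to clusters, and each call to `extract_dbscan` cuts that plot at a different height without re-running the clustering.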