A LOF K-Means Clustering on Hotspot Data

K-Means is the most popular clustering method, but its drawback is its sensitivity to outliers. This paper discusses adding an outlier removal method to K-Means to improve clustering performance. The outlier removal method added is the Local Outlier Factor (LOF), a representative density-based outlier detection algorithm. In this research, the combined method is called LOF K-Means. First, the hotspot data are clustered using K-Means, and outliers are then detected using LOF. Objects detected as outliers are removed, and a new centroid for each group is obtained by applying K-Means again. The dataset was taken from FIRM, provided by the National Aeronautics and Space Administration (NASA). Clustering was done by varying the number of clusters (k = 10, 15, 20, 25, 30, 35, 40, 45 and 50), with the optimal number of clusters being k = 20. The results, based on the Sum of Squared Error (SSE), showed that LOF K-Means performed better than K-Means.

2. Calculate the distance between each data point xj and each centroid. In this paper, the distance is the Euclidean distance between xj and centroid cj, given by equation (1).
3. Assign each data point to the centroid nearest to it among all centroids.
4. Recalculate the position of each of the k centroids once all objects have been assigned.
5. Repeat steps 2 to 4 until the positions of the k centroids no longer change.
Output: A set of k clusters
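The steps above can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the paper: the function names, the `seed` and `max_iter` parameters, and the choice of random data points as initial centroids (the step that precedes step 2) are assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two points given as equal-length tuples."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-Means on a list of tuples; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Assumed initialisation: k distinct data points serve as starting centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Steps 2 and 3: assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                  for p in points]
        # Step 4: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # an empty cluster keeps its centroid
        # Step 5: stop once no centroid has moved.
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels
```

For hotspot data, each point would presumably be a coordinate pair, so `points` would be a list of two-element tuples.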

C. LOF (Local Outlier Factor)
Fig. 1 shows the flowchart for determining the LOF value. LOF compares the local density of an object with the local densities of its neighbors, based on equations (2) and (3). An object with LOF >> 1 is considered an outlier, whereas an object with LOF close to or below 1 is not. A high LOF value indicates that the object lies in a region of low density relative to its neighbors [11]. The LOF of p(xi, yi) is defined as [12] [15]:

$$\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)} \tag{2}$$

$$\mathrm{lrd}_k(p) = \left( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \text{reach-dist}_k(p, o) \right)^{-1} \tag{3}$$

where $\mathrm{lrd}_k(p)$ is the local reachability density of object p, $\text{reach-dist}_k(p, o)$ is the reachability distance of object p with respect to object o, and $N_k(p)$ is the set of neighbors of p whose distance from p is not greater than the k-distance of p, so that $|N_k(p)|$ is the number of such neighbors.

Fig. 2 illustrates the reachability distance for k = 4. If an object p is far from o, as with p2, the reachability distance between the two is simply their actual distance. If they are "close enough", as with p1, the actual distance is replaced by the k-distance of o. In other words, the reachability distance of p with respect to o is the larger of the k-distance of o and d(p, o). The reason is that this significantly reduces the statistical fluctuations of d(p, o) for all p close to o. The strength of this smoothing effect is controlled by the parameter k [15].

Fig. 2. reach-dist(p1, o) and reach-dist(p2, o), for k = 4 [15]
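As a rough illustration of how the quantities described above fit together, the following sketch computes the LOF of every point from the k-distance, the reachability distance and the local reachability density. The function name `lof_scores` and its parameters are assumptions for this example, and the code assumes all points are distinct so that no reachability sum is zero.

```python
import math


def lof_scores(points, k):
    """Return the LOF value of every point, using its k nearest neighbours."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # k-distance(i): distance from point i to its k-th nearest neighbour.
    k_dist = [sorted(dist[i][j] for j in range(n) if j != i)[k - 1]
              for i in range(n)]
    # Neighbourhood of i: every other point within the k-distance of i
    # (ties can make this larger than k).
    neigh = [[j for j in range(n) if j != i and dist[i][j] <= k_dist[i]]
             for i in range(n)]

    def reach_dist(i, o):
        # Reachability distance of i with respect to o: the actual distance,
        # unless i lies inside the k-distance of o.
        return max(k_dist[o], dist[i][o])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(neigh[i]) / sum(reach_dist(i, o) for o in neigh[i])
           for i in range(n)]
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]
```

On a tight group of points with one distant point, the distant point receives a LOF far above 1, while the grouped points stay near 1.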

D. LOF K-Means
LOF K-Means adds the LOF method to K-Means clustering in order to eliminate outliers. The procedure for determining the centroids of the hotspot data with LOF K-Means is shown in Fig. 3. The hotspot data are first grouped by K-Means. Next, LOF is used to detect outliers within each resulting group. Objects detected as outliers are then removed, and a new centroid for each group is obtained by applying K-Means again. The overall system for clustering hotspot data using LOF K-Means is shown in Fig. 4.
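The pipeline described in this section might be sketched as below. Several details are assumptions rather than the paper's stated choices: the cut-off `lof_threshold` above which an object counts as an outlier, the neighbourhood size `n_neighbors`, the reuse of the first-pass centroids to start the second K-Means pass, and the decision to leave very small clusters unscored. All function names are illustrative, and points within a cluster are assumed to be distinct.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]


def kmeans(points, centroids, max_iter=100):
    """K-Means iterations starting from the given centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            updated.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def lof_scores(points, k):
    """LOF of every point, computed among the given points only."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    k_dist = [sorted(d[i][j] for j in range(n) if j != i)[k - 1] for i in range(n)]
    neigh = [[j for j in range(n) if j != i and d[i][j] <= k_dist[i]]
             for i in range(n)]
    lrd = [len(neigh[i]) / sum(max(k_dist[o], d[i][o]) for o in neigh[i])
           for i in range(n)]
    return [sum(lrd[o] for o in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]


def lof_kmeans(points, k, n_neighbors=3, lof_threshold=1.5, seed=0):
    """K-Means, per-cluster LOF outlier removal, then K-Means again."""
    rng = random.Random(seed)
    # First pass: ordinary K-Means from randomly chosen data points.
    centroids, labels = kmeans(points, rng.sample(points, k))
    kept, removed = [], []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if len(members) <= n_neighbors:
            kept.extend(members)  # too few points to score; keep them all
            continue
        for p, score in zip(members, lof_scores(members, n_neighbors)):
            (removed if score > lof_threshold else kept).append(p)
    # Second pass: recompute centroids on the cleaned data.
    centroids, labels = kmeans(kept, centroids)
    return centroids, kept, labels, removed
```

Scoring each cluster separately follows the description of detecting outliers "within each resulting group"; scoring the whole dataset at once would be a different design and could flag different objects.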

III. RESULT AND DISCUSSION
Table I shows the centroids obtained by clustering the hotspot data with LOF K-Means for k = 10, together with the number of outliers in each cluster. The Sum of Squared Error (SSE) values of both clustering methods (K-Means and LOF K-Means) are shown in Fig. 5. Clustering was done by varying the number of clusters (k = 10, 15, 20, 25, 30, 35, 40, 45 and 50). As Fig. 5 shows, both methods follow the same pattern. The optimal number of clusters is the same for both methods, k = 20, as indicated by the largest decrease in SSE occurring at k = 20. In Fig. 5, the SSE of LOF K-Means is lower than that of K-Means for every number of clusters, which shows that LOF K-Means performs better than K-Means. The average SSE of LOF K-Means is 33285.56, while the average SSE of K-Means is 37469.22. The lowest SSE of LOF K-Means is 25147, at k = 40.
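The evaluation above rests on two small computations, sketched here under stated assumptions: SSE is taken as the sum of squared Euclidean distances from each point to its nearest centroid, and the optimal k is taken as the one with the largest SSE decrease relative to the previous k tried. The function names are hypothetical, and the numbers in the usage lines are made up for illustration, not taken from the paper.

```python
import math


def sse(points, centroids):
    """Sum of squared distances from every point to its nearest centroid."""
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)


def largest_drop(sse_by_k):
    """Return the k whose SSE fell the most compared with the previous k tried."""
    ks = sorted(sse_by_k)
    drops = {cur: sse_by_k[prev] - sse_by_k[cur] for prev, cur in zip(ks, ks[1:])}
    return max(drops, key=drops.get)


# Illustrative values only.
example_sse = sse([(0.0, 0.0), (2.0, 0.0)], [(1.0, 0.0)])
example_k = largest_drop({10: 100.0, 15: 90.0, 20: 50.0, 25: 45.0})
```

Because SSE generally keeps falling as k grows, the lowest SSE alone tends to favour the largest k, which is why the size of the decrease is used to pick the number of clusters.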

IV. CONCLUSION
This paper presented a method for clustering hotspot data. Clustering was done by varying the number of clusters (k = 10, 15, 20, 25, 30, 35, 40, 45 and 50). Two clustering methods, K-Means and LOF K-Means, were evaluated by their SSE values. The evaluation showed that LOF K-Means performed better than K-Means. Further studies are needed on combining outlier removal methods other than LOF with K-Means in order to obtain a better clustering method.