Clustering Data Streams Based on Shared Density Between Micro-Clusters

As more and more applications produce streaming data, clustering data streams has become an important technique for data and knowledge engineering. A typical approach is to summarize the data stream in real time with an online process into a large number of so-called micro-clusters. Micro-clusters represent local density estimates by aggregating the information of many data points in a defined area. On demand, a (modified) conventional clustering algorithm is used in a second offline step to recluster the micro-clusters into larger final clusters. For reclustering, the centers of the micro-clusters are used as pseudo points with the density estimates used as their weights. However, information about density in the area between micro-clusters is not preserved in the online process, and reclustering is based on possibly inaccurate assumptions about the distribution of data within and between micro-clusters (e.g., uniform or Gaussian). This paper describes DBSTREAM, the first micro-cluster-based online clustering component that explicitly captures the density between micro-clusters via a shared density graph. The density information in this graph is then exploited for reclustering based on actual density between adjacent micro-clusters. We discuss the space and time complexity of maintaining the shared density graph. Experiments on a wide range of synthetic and real data sets highlight that using shared density improves clustering quality over other popular data stream clustering methods, which require the creation of a larger number of smaller micro-clusters to achieve comparable results.
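The idea of recording density between micro-clusters can be illustrated with a small Java sketch. It is a simplified illustration under stated assumptions, not the DBSTREAM algorithm itself: centers stay fixed once created, weights never fade, and the class name SharedDensitySketch, its methods, and the fixed radius parameter are all hypothetical. A point that falls within the radius of two micro-clusters increments a counter for that pair, which is the information a shared density graph keeps for the offline reclustering step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified illustration only: fixed centers, no fading, no cleanup of weak entries.
class SharedDensitySketch {
    private final double radius; // assumed fixed micro-cluster radius
    private final List<double[]> centers = new ArrayList<double[]>();
    private final List<Integer> weights = new ArrayList<Integer>(); // points absorbed per micro-cluster
    // Edge "i-j" (i < j) -> number of points that fell inside both micro-clusters
    private final Map<String, Integer> shared = new HashMap<String, Integer>();

    SharedDensitySketch(double radius) {
        this.radius = radius;
    }

    // Online step: absorb one point and update the shared density counts.
    void insert(double[] p) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < centers.size(); i++) {
            if (distance(centers.get(i), p) <= radius) {
                hits.add(i);
            }
        }
        if (hits.isEmpty()) {
            // No micro-cluster covers the point: start a new one centered on it.
            centers.add(p.clone());
            weights.add(1);
            return;
        }
        for (int i : hits) {
            weights.set(i, weights.get(i) + 1);
        }
        // Every pair of micro-clusters covering the point shares it.
        for (int a = 0; a < hits.size(); a++) {
            for (int b = a + 1; b < hits.size(); b++) {
                String key = hits.get(a) + "-" + hits.get(b);
                Integer old = shared.get(key);
                shared.put(key, old == null ? 1 : old + 1);
            }
        }
    }

    int sharedDensity(int i, int j) {
        Integer s = shared.get(Math.min(i, j) + "-" + Math.max(i, j));
        return s == null ? 0 : s;
    }

    int weight(int i) {
        return weights.get(i);
    }

    int size() {
        return centers.size();
    }

    private static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        SharedDensitySketch s = new SharedDensitySketch(1.0);
        s.insert(new double[] {0.0, 0.0});  // starts micro-cluster 0
        s.insert(new double[] {1.5, 0.0});  // too far from 0, starts micro-cluster 1
        s.insert(new double[] {0.75, 0.0}); // lies inside both micro-clusters
        System.out.println("micro-clusters: " + s.size());
        System.out.println("shared density 0-1: " + s.sharedDensity(0, 1));
    }
}
```

In an offline step, pairs with a high shared count relative to their weights would be candidates for merging into the same final cluster; the exact threshold is left out here because it depends on the method's parameters.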

System Architecture

Project Overview
The system fetches medical (thyroid) data and clusters the records by disease. It then recommends the doctors associated with each disease and predicts the most suitable doctor for the selected disease. Processing runs over 5000 records.
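The flow described in the overview can be sketched in Java as follows. This is a minimal sketch under assumptions, not the project's actual implementation: grouping by an exact disease label stands in for the clustering step, and the Doctor class, its score field, the "highest score wins" rule, and all sample names and values are hypothetical, since the source does not say how the most suitable doctor is chosen.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DoctorRecommenderSketch {

    // Hypothetical doctor entry; "score" is an assumed ranking value.
    static class Doctor {
        final String name;
        final String disease;
        final double score;

        Doctor(String name, String disease, double score) {
            this.name = name;
            this.disease = disease;
            this.score = score;
        }
    }

    // Groups record indexes by disease label (a stand-in for the clustering step).
    static Map<String, List<Integer>> groupByDisease(List<String> diseases) {
        Map<String, List<Integer>> groups = new LinkedHashMap<String, List<Integer>>();
        for (int i = 0; i < diseases.size(); i++) {
            List<Integer> group = groups.get(diseases.get(i));
            if (group == null) {
                group = new ArrayList<Integer>();
                groups.put(diseases.get(i), group);
            }
            group.add(i);
        }
        return groups;
    }

    // Recommendation: every doctor associated with the selected disease.
    static List<Doctor> doctorsFor(String disease, List<Doctor> all) {
        List<Doctor> result = new ArrayList<Doctor>();
        for (Doctor d : all) {
            if (d.disease.equals(disease)) {
                result.add(d);
            }
        }
        return result;
    }

    // Assumed rule: the best doctor is the one with the highest score; null if none match.
    static Doctor bestDoctor(String disease, List<Doctor> all) {
        Doctor best = null;
        for (Doctor d : doctorsFor(disease, all)) {
            if (best == null || d.score > best.score) {
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<String> diseases = new ArrayList<String>();
        diseases.add("Hypothyroidism");
        diseases.add("Hyperthyroidism");
        diseases.add("Hypothyroidism");

        List<Doctor> doctors = new ArrayList<Doctor>();
        doctors.add(new Doctor("Dr. A", "Hypothyroidism", 4.2));
        doctors.add(new Doctor("Dr. B", "Hypothyroidism", 4.8));
        doctors.add(new Doctor("Dr. C", "Hyperthyroidism", 4.5));

        System.out.println(groupByDisease(diseases));
        System.out.println(bestDoctor("Hypothyroidism", doctors).name);
    }
}
```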

System Requirements

Hardware Requirements
Processor - Dual Core
Speed - 1.1 GHz
RAM - 512 MB (min)
Hard Disk - 20 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse

Software Requirements
Operating System : Windows XP, 7, 8
Front End : Java 7
Technology : Swing, Core Java
IDE : NetBeans

Sample Code

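String disName = Disease.getSelectedItem().toString(); // disease chosen in the combo box
ArrayList<Integer> indexes = new ArrayList<>();       // positions of the matching records
for (int i = 0; i < decease.size(); i++) {
    if (decease.get(i).toString().equals(disName)) {
        indexes.add(i); // assumed body: the original listing is cut off at this point
    }
}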

The above code shows how to read the selected disease name and collect the indexes of the records that match it.