Chemical plants maintain large databases that store past sensor measurements for advanced data analysis. With the help of these databases, plant operators and engineers interpret the meaning of live process trends.
To apply the best process monitoring techniques, the data must be organized and grouped before training. Creating organized, grouped data takes time, which is an obstacle to using advanced monitoring strategies in the process industries. With knowledge discovery and data mining techniques from the computer science literature, engineers can group the data according to their requirements and find fault states in historical databases. This research shows how feature extraction and data clustering together reveal useful trends in industrial chemical process data. Two processes are studied: the Tennessee Eastman process simulation and an industrial-scale separation tower. The author explains how these case studies demonstrate that feature extraction and data clustering effectively disclose significant trends in high-dimensional data. To compare the performance of different combinations of dimensionality reduction and data clustering approaches, the cluster results are compared against the true labels in the data using supervised clustering metrics and process knowledge [1].
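As an illustration of comparing cluster results against true labels, the sketch below computes cluster purity. The source does not name which supervised clustering metrics were used, so purity is an assumption chosen for simplicity, and the label values are hypothetical.

```python
from collections import Counter


def purity(true_labels, cluster_ids):
    """Fraction of samples that belong to the majority true class of their cluster."""
    if len(true_labels) != len(cluster_ids):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must be non-empty")
    # Count how often each true label occurs inside each cluster.
    per_cluster = {}
    for truth, cluster in zip(true_labels, cluster_ids):
        per_cluster.setdefault(cluster, Counter())[truth] += 1
    # Each cluster contributes the size of its majority class.
    majority_total = sum(max(counts.values()) for counts in per_cluster.values())
    return majority_total / len(true_labels)


# Hypothetical data: known process states vs. cluster assignments.
true_states = ["normal", "normal", "normal", "fault_1", "fault_1", "fault_2"]
clusters = [0, 0, 1, 1, 1, 2]
score = purity(true_states, clusters)
```

A purity of 1.0 means every cluster contains a single process state. Purity rewards over-segmentation (one cluster per sample also scores 1.0), so in practice it is usually read alongside other metrics and process knowledge.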
2.1 Approaches used in data mining
The paper explains the data mining approach in three steps. First, a dimensionality reduction (DR) technique is applied to the raw process data to remove redundant information and improve data quality. The DR technique either projects the data into two or three dimensions for visualization or only removes the redundant information. After projection, clustering algorithms partition the data; the clusters formed depend on the data and on the parameters used in the clustering process. Finally, each cluster is assigned a label by analysing the data in it and relating the cluster to a process event [1].
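The workflow described above (dimensionality reduction, then clustering, then labelling) can be sketched as follows. This is a minimal illustration and not the source's implementation: principal component analysis (PCA) and k-means are assumed stand-ins for the DR and clustering steps, the sensor data are synthetic, and the labelling rule is hypothetical.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components (via SVD)."""
    Xc = X - X.mean(axis=0)  # centre each variable
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(Z, k, n_iter=50):
    """Lloyd's k-means with farthest-point initialisation; returns one label per row."""
    # Start from the first row, then repeatedly add the row that is
    # farthest from all centres chosen so far.
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(Z - c, axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign every point to its nearest centre.
        dists = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centre to the mean of its points (keep it if the cluster is empty).
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for plant data: 10 sensors, two operating states.
rng = np.random.default_rng(42)
normal = rng.normal(0.0, 0.3, size=(100, 10))
fault = rng.normal(0.0, 0.3, size=(100, 10)) + 3.0  # mean shift in every sensor
X = np.vstack([normal, fault])

Z = pca_project(X, n_components=2)  # step 1: dimensionality reduction
labels = kmeans(Z, k=2)             # step 2: clustering

# Step 3: relate each cluster to a process event (hypothetical rule:
# a high average sensor reading is called a fault).
event_names = {
    j: "fault" if X[labels == j].mean() > 1.5 else "normal" for j in range(2)
}
```

On real plant data the labelling step relies on an engineer inspecting each cluster, so the threshold rule here only marks where that judgment would enter.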
2.2 Dimensionality reduction
Dimensionality reduction is an important step in data mining because it addresses the curse of dimensionality. High-dimensional spaces cause problems such as the empty space phenomenon (volume grows with dimensionality, so the data become sparse), correlation between variables, and the weaker discriminating power of distance metrics such as Euclidean distance. In this research, the dimensionality reduction method is chosen based on its computational cost and characteristics. Principal component analysis (PCA) is the most commonly used dimensionality reduction technique, and in statistical processing it has a number of successful applications.
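The loss of discriminating power of Euclidean distance in high dimensions can be seen in a small experiment. The sketch below is an illustration under assumptions not taken from the source: points are drawn uniformly from the unit hypercube, and the hypothetical helper `relative_contrast` measures how much the farthest point differs from the nearest one, relative to the nearest distance.

```python
import math
import random


def relative_contrast(n_points, n_dims, seed=0):
    """(max - min) / min of the Euclidean distances from a random query point
    to n_points random points, all drawn uniformly from the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


# The contrast shrinks as the number of dimensions grows: nearest and
# farthest neighbours become almost equally distant.
contrast_by_dim = {d: relative_contrast(500, d) for d in (2, 10, 100, 1000)}
```

When the contrast approaches zero, distance-based clustering has little to separate points with, which is one motivation for reducing dimensionality before clustering.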