Discuss major issues in data mining.

The major issues in data mining are as follows:

(i) Mining Methodology and User Interaction Issues—

These reflect the kinds of knowledge mined, the ability to mine knowledge at multiple granularities, the use of domain knowledge, ad hoc mining, and knowledge visualization.

(a) Mining Different Kinds of Knowledge in Databases —

Because different users are interested in different kinds of knowledge, data mining should cover a wide spectrum of data analysis and knowledge discovery tasks. These tasks may use the same database in different ways and require the development of numerous data mining techniques.

(b) Interactive Mining of Knowledge at Multiple Levels of Abstraction —

Because it is difficult to know exactly what can be discovered within a database, the data mining process should be interactive. Interactive mining allows users to focus the search for patterns, providing and refining data mining requests based on returned results. In this way, the user can interact with the data mining system to view data and discover patterns at multiple granularities and from different angles.
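As a rough illustration of viewing the same data at multiple granularities, the sketch below summarizes a small set of records by country and then refines the request to the city level for one country. The records, field layout, and function names are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical sales records: (country, city, amount). Illustrative data only.
SALES = [
    ("India", "Delhi", 120),
    ("India", "Mumbai", 80),
    ("Japan", "Tokyo", 200),
    ("Japan", "Osaka", 50),
]


def summarize(records, level):
    """Total the amounts grouped at the requested level: 'country' or 'city'."""
    if level not in ("country", "city"):
        raise ValueError(f"unknown level: {level}")
    totals = defaultdict(int)
    for country, city, amount in records:
        key = country if level == "country" else (country, city)
        totals[key] += amount
    return dict(totals)


def drill_down(records, country):
    """Refine an earlier request: city-level totals for one country only."""
    return summarize([r for r in records if r[0] == country], "city")


# First request: a coarse view. Second request: a finer view of one result.
by_country = summarize(SALES, "country")
india_cities = drill_down(SALES, "India")
print(by_country)
print(india_cities)
```

The second call stands in for a user refining a request after seeing the first result, which is the interaction loop described above.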

(c) Incorporation of Background Knowledge —

Background knowledge, or information regarding the domain under study, may be used to guide the discovery process and allow discovered patterns to be expressed in concise terms and at different levels of abstraction. Domain knowledge related to databases, such as integrity constraints and deduction rules, can help focus and speed up a data mining process, or judge the interestingness of discovered patterns.
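One common form of background knowledge is a concept hierarchy. The sketch below assumes a small, made-up hierarchy of grocery items and shows how item counts that are scattered at the lowest level become a single concise pattern once items are generalized. The hierarchy, transactions, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical concept hierarchy (child -> parent); assumed domain knowledge.
HIERARCHY = {
    "skim milk": "milk",
    "whole milk": "milk",
    "milk": "dairy",
    "cheddar": "cheese",
    "cheese": "dairy",
}

# Illustrative transactions only.
TRANSACTIONS = [
    {"skim milk", "bread"},
    {"whole milk", "bread"},
    {"cheddar", "bread"},
]


def generalize(item, levels=1):
    """Climb the hierarchy by up to `levels` steps, stopping at the top."""
    for _ in range(levels):
        if item not in HIERARCHY:
            break
        item = HIERARCHY[item]
    return item


def item_counts(transactions, levels=0):
    """Count the transactions containing each item after generalization."""
    counts = Counter()
    for transaction in transactions:
        # A set, so each generalized item counts once per transaction.
        counts.update({generalize(item, levels) for item in transaction})
    return counts


raw = item_counts(TRANSACTIONS, levels=0)
one_up = item_counts(TRANSACTIONS, levels=1)
two_up = item_counts(TRANSACTIONS, levels=2)
print(raw, one_up, two_up, sep="\n")
```

At the raw level no dairy item appears in more than one transaction, while two levels up "dairy" appears in all three, so the same data supports a pattern only at the higher level of abstraction.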

(d) Data Mining Query Languages and Ad Hoc Data Mining –

High-level data mining query languages need to be developed to allow users to describe ad hoc data mining tasks by facilitating the specification of the relevant sets of data for analysis, the domain knowledge, the kinds of knowledge to be mined, and the conditions and constraints to be enforced on the discovered patterns. Such a language should be integrated with a database or data warehouse query language and optimized for efficient and flexible data mining.

(e) Presentation and Visualization of Data Mining Results —

Discovered knowledge should be expressed in high-level languages, visual representations, or other expressive forms so that the knowledge can be easily understood and directly usable by humans. This is especially crucial if the data mining system is to be interactive. This requires the system to adopt expressive knowledge representation techniques, such as trees, tables, rules, graphs, charts, or curves.

(f) Handling Noisy or Incomplete Data –

The data stored in a database may reflect noise, exceptional cases, or incomplete data objects. When mining data regularities, these objects may confuse the process, causing the knowledge model constructed to overfit the data. As a result, the accuracy of the discovered patterns can be poor.
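A minimal sketch of one possible cleaning step follows, assuming missing values are marked with None and that points far from the median (measured in median absolute deviations) can be treated as noise. The readings, the `max_dev` threshold, and the function name are assumptions made for illustration. Real systems may prefer imputation or domain-specific rules, and some exceptional cases are themselves worth mining rather than discarding.

```python
import statistics

# Hypothetical readings: None marks a missing value and 980.0 is an
# assumed noisy entry. Illustrative data only.
READINGS = [21.0, 22.5, None, 20.5, 980.0, 21.5]


def clean(values, max_dev=3.0):
    """Drop missing values, then drop points far from the median.

    A point is kept when |x - median| <= max_dev * MAD, where MAD is the
    median absolute deviation. `max_dev` is an assumed tuning parameter.
    """
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    mad = statistics.median([abs(v - med) for v in present])
    if mad == 0:
        # No spread to judge against, so keep every present value.
        return present
    return [v for v in present if abs(v - med) <= max_dev * mad]


present = [v for v in READINGS if v is not None]
cleaned = clean(READINGS)
print(statistics.mean(present))  # mean distorted by the noisy entry
print(statistics.mean(cleaned))  # mean after cleaning
```

Comparing the two means shows how a single noisy object can distort a summary that a model would otherwise try to fit.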

(g) Pattern Evaluation —

A data mining system can uncover thousands of patterns. Many of the patterns discovered may be uninteresting to the given user, either because they represent common knowledge or because they lack novelty. The use of interestingness measures or user-specified constraints to guide the discovery process and reduce the search space is another active area of research.
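Support, confidence, and lift are three widely used interestingness measures for association rules. The sketch below computes them on a tiny, made-up transaction set and keeps only the candidate rules that meet user-specified thresholds. The data, thresholds, and function names are hypothetical.

```python
# Illustrative transactions only.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's overall support."""
    expected = support(consequent, transactions)
    if expected == 0:
        return 0.0
    return confidence(antecedent, consequent, transactions) / expected


def interesting_rules(rules, transactions, min_support, min_confidence):
    """Keep the rules that meet both user-specified thresholds."""
    kept = []
    for antecedent, consequent in rules:
        s = support(set(antecedent) | set(consequent), transactions)
        c = confidence(antecedent, consequent, transactions)
        if s >= min_support and c >= min_confidence:
            kept.append((antecedent, consequent))
    return kept


CANDIDATES = [({"bread"}, {"milk"}), ({"butter"}, {"bread"})]
kept = interesting_rules(CANDIDATES, TRANSACTIONS, 0.5, 0.8)
print(kept)
```

Here the thresholds act as the user-specified constraints: both candidate rules have the same support, but only one is confident enough to be reported.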

(ii) Performance Issues –

These include efficiency, scalability, and parallelization of data mining algorithms.

(a) Efficiency and Scalability of Data Mining Algorithms —

To effectively extract information from a huge amount of data in databases, data mining algorithms must be efficient and scalable. From a database perspective on knowledge discovery, efficiency and scalability are key issues in the implementation of data mining systems.

(b) Parallel, Distributed, and Incremental Mining Algorithms –

Parallel and distributed algorithms divide the data into partitions, which are processed in parallel. The results from the partitions are then merged. Moreover, the high cost of some data mining processes promotes the need for incremental data mining algorithms that incorporate database updates without having to mine the entire data set again.
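The partition, process, and merge pattern, along with an incremental update, can be sketched with simple item counting. This is only an illustration under stated assumptions: the function names and data are hypothetical, and a thread pool is used just to show the structure. In CPython, threads do not give true CPU parallelism, so a real system would more likely use separate processes or machines.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count, for each item, the transactions in this partition containing it."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def split(transactions, n_parts):
    """Divide the transactions into `n_parts` roughly equal partitions."""
    return [transactions[i::n_parts] for i in range(n_parts)]


def parallel_counts(transactions, n_parts=2):
    """Count each partition concurrently, then merge the partial results."""
    parts = split(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partials = list(pool.map(count_items, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return total


def incremental_update(total, new_transactions):
    """Fold newly arrived transactions into existing counts, without a rescan."""
    total.update(count_items(new_transactions))
    return total


# Illustrative data only.
DATA = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]
merged = parallel_counts(DATA, n_parts=2)
updated = incremental_update(Counter(merged), [["c"]])
print(merged)
print(updated)
```

Counting is a convenient example because partial counts merge by simple addition. Many mining tasks need a more careful merge step, which is part of what makes parallel and incremental algorithms hard to design.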

(iii) Issues Relating to the Diversity of Database Types –

(a) Handling Relational and Complex Types of Data –

It is unrealistic to expect one system to mine all kinds of data, given the diversity of data types and the different goals of data mining. Specific data mining systems should be constructed for mining specific kinds of data. Therefore, one may expect to have different data mining systems for different kinds of data.

(b) Mining Information from Heterogeneous Databases and Global Information Systems –

The discovery of knowledge from different sources of structured, semistructured, or unstructured data with diverse data semantics poses great challenges to data mining. Data mining may help disclose high-level data regularities in multiple heterogeneous databases that are unlikely to be discovered by simple query systems and may improve information exchange and interoperability in heterogeneous databases. Web mining, which uncovers interesting knowledge about Web contents, Web structures, Web usage, and Web dynamics, has become a very challenging and fast-evolving field in data mining.
