The functions of a data warehouse, and the data flow within it, are as follows:
1. Extract and Load Data
Data extraction takes data from the source systems and makes it available to the data warehouse; data load takes the extracted data and loads it into the data warehouse. When we extract data from a physical database, in whatever form it is held, the original information content will have been modified and extended over the years to support the data and performance requirements of the operational system. Before the data is loaded into the data warehouse, this information content must be reconstructed.
In essence, information can be defined as data with context and meaning. The data warehouse extract and load process must take data and add context and meaning in order to convert it into value-adding business information.
Within a data warehouse, this is achieved by extracting the data from the source systems, loading it into the database, stripping out any detail that is there to support the operational system rather than the business requirement, adding more context, and then reconciling the data with other sources.
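The sequence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the row layout, the field names (`cust_id`, `amt`, `lock_flag`, `row_version`), the customer lookup, and the control total used for reconciliation are all hypothetical and do not come from any particular source system.

```python
# Hypothetical source rows; the field names are illustrative only.
SOURCE_ROWS = [
    {"cust_id": 1, "amt": 120.0, "lock_flag": "N", "row_version": 7},
    {"cust_id": 2, "amt": 80.0, "lock_flag": "Y", "row_version": 3},
]

# Fields assumed to exist only to support the operational system.
OPERATIONAL_ONLY = {"lock_flag", "row_version"}

# Context taken from a second (hypothetical) source.
CUSTOMER_NAMES = {1: "Acme Ltd", 2: "Birch & Co"}


def extract(rows):
    """Copy rows out of the source so the source itself is left untouched."""
    return [dict(r) for r in rows]


def strip_operational_detail(rows):
    """Drop fields that serve the operational system, not the business."""
    return [{k: v for k, v in r.items() if k not in OPERATIONAL_ONLY} for r in rows]


def add_context(rows, load_date):
    """Rename cryptic fields and attach business context to each row."""
    return [
        {
            "customer_id": r["cust_id"],
            "customer_name": CUSTOMER_NAMES.get(r["cust_id"]),
            "sale_amount": r["amt"],
            "load_date": load_date,
        }
        for r in rows
    ]


def reconcile(rows, control_total):
    """Check the loaded total against a control total from another source."""
    return abs(sum(r["sale_amount"] for r in rows) - control_total) < 1e-9
```

A caller would chain the steps, for example `add_context(strip_operational_detail(extract(SOURCE_ROWS)), "2024-01-31")`, and then reject the load if `reconcile` returns `False`.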
(a) Controlling the Process
This is a mechanism that determines when to extract the data, run the transformations and consistency checks, and so on.
(b) When to Initiate the Extract
Source data should be extracted only at a point where it represents the same instant in time as the extracts from the other data sources.
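One way to enforce this rule is to refuse to start the extract unless every source reports the same "as of" point. The sketch below assumes each source can report the business date its data currently represents; the function name and the source names in the usage note are hypothetical.

```python
from datetime import date


def consistent_snapshot(as_of_by_source):
    """Return the shared as-of date if every source reports the same one.

    Returns None when the sources disagree (or when no source is given),
    which signals that the extract should not start yet.
    """
    dates = set(as_of_by_source.values())
    return dates.pop() if len(dates) == 1 else None
```

For example, `consistent_snapshot({"orders": date(2024, 1, 31), "billing": date(2024, 1, 30)})` returns `None`, so the extract would wait until billing has closed the same day.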
(c) Loading the Data
Once data is extracted from the source systems, it is then typically loaded into a temporary data store in order for it to be cleaned up and made consistent.
(d) Copy Management Tools and Data Cleanup
If the source systems do not overlap much, and the consistency checks are simplistic, a copy management tool will cut down the coding effort required. If this is not the case, a copy management tool may not add sufficient value to justify the purchase.
2. Clean and Transform Data
This is the system process that takes the loaded data and structures it for query performance and for minimizing operational costs. In essence, the process consists of a small number of steps:
(a) Clean and transform the loaded data into a structure that speeds up queries.
(b) Partition the data in order to speed up queries, optimize hardware performance and simplify the management of the data warehouse.
(c) Create aggregations to speed up the common queries.
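The three steps above can be illustrated with plain Python data structures. This is a minimal sketch, assuming rows are dictionaries with hypothetical `sale_date` (as a `YYYY-MM-DD` string), `region`, and `amount` fields, and that partitioning is done by month; a real warehouse would do this inside the database.

```python
from collections import defaultdict


def clean(rows):
    """Drop rows with no amount and normalize the region spelling."""
    return [
        {**r, "region": r["region"].strip().upper()}
        for r in rows
        if r["amount"] is not None
    ]


def partition_by_month(rows):
    """Group rows by the 'YYYY-MM' prefix of their sale date."""
    partitions = defaultdict(list)
    for r in rows:
        partitions[r["sale_date"][:7]].append(r)
    return dict(partitions)


def aggregate_by_region(rows):
    """Precompute total amount per region for common queries."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)
```

A query for regional totals can then read the small aggregate instead of scanning every detail row, and a query for a single month needs to touch only one partition.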
3. Backup and Archive Process
As in operational systems, the data within the data warehouse is backed up regularly to ensure that the data warehouse can always be recovered from data loss, software failure, or hardware failure.
4. Query Management Process
The query management process is the system process that manages the queries and speeds them up by directing each query to the most effective data source. This process must also ensure that all the system resources are used in the most effective way, usually by scheduling the execution of queries. The query management process may also be required to monitor the actual query profile. This information would then be used by the warehouse management process to determine which aggregations to generate.
Unlike the other system processes, query management does not generally operate during the regular load of information into the data warehouse. The facilities that are constantly in operation are as follows:
(a) Directing Queries
A data warehouse that contains summary data potentially provides a number of distinct data sources to respond to a specific query. These are the detailed information itself and any number of aggregations that satisfy the query’s information need.
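As a rough illustration of directing a query, the sketch below picks the smallest source whose columns cover what the query needs. The table names, column sets, and row counts are invented for the example, and "fewest rows" is only an assumed stand-in for "most effective".

```python
# Hypothetical catalogue of sources: one detail table and two aggregations.
SOURCES = [
    {"name": "sales_detail",
     "columns": {"day", "month", "store", "product", "region"},
     "rows": 10_000_000},
    {"name": "sales_by_region_month", "columns": {"month", "region"}, "rows": 600},
    {"name": "sales_by_product_month", "columns": {"month", "product"}, "rows": 5_000},
]


def choose_source(needed_columns, sources=SOURCES):
    """Return the name of the smallest source that covers the needed columns.

    Returns None if no source, not even the detail table, can answer.
    """
    candidates = [s for s in sources if set(needed_columns) <= s["columns"]]
    best = min(candidates, key=lambda s: s["rows"], default=None)
    return best["name"] if best else None
```

Here a query on region and month is sent to `sales_by_region_month`, while a query that needs `store` falls back to the detail table.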
(b) Maximizing System Resources
Regardless of the processing power available to run the data warehouse, it is all too possible that a single large query can soak up all system resources, affecting the performance of the entire system.
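Scheduling is the usual defence against this. The following sketch groups queries, in arrival order, into batches whose combined estimated cost stays within a fixed capacity. The cost figures and the capacity are hypothetical units, and the policy of letting an oversized query run alone in its own batch is an assumption made for the example.

```python
def schedule(queries, capacity):
    """Split (name, estimated_cost) pairs into batches to run one after another.

    A batch is closed as soon as adding the next query would exceed
    the capacity. A query that exceeds the capacity on its own still
    runs, but in a batch by itself.
    """
    batches, current, used = [], [], 0
    for name, cost in queries:
        if current and used + cost > capacity:
            batches.append(current)
            current, used = [], 0
        current.append(name)
        used += cost
    if current:
        batches.append(current)
    return batches
```

With a capacity of 10, the arrivals `[("q1", 3), ("big", 9), ("q2", 2), ("q3", 4)]` are split so that the large query runs apart from the small ones instead of starving them.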
(c) Query Capture
As users become used to the facilities provided by a data warehouse, they will change the kinds of queries they ask. This is inevitable and should be encouraged because it indicates that users are exploiting the information content of the data warehouse.
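Capturing the query profile can be as simple as counting which column combinations users group by. The sketch below is one possible shape for such a monitor; the class name, its methods, and the idea of using a fixed count threshold to suggest aggregations are assumptions for illustration.

```python
from collections import Counter


class QueryProfile:
    """Count how often each combination of group-by columns is queried."""

    def __init__(self):
        self._counts = Counter()

    def capture(self, group_by_columns):
        # Column order does not matter, so store the columns as a frozenset.
        self._counts[frozenset(group_by_columns)] += 1

    def aggregation_candidates(self, min_count):
        """Return column combinations seen at least min_count times."""
        return [
            sorted(cols)
            for cols, n in self._counts.most_common()
            if n >= min_count
        ]
```

The warehouse management process could read `aggregation_candidates` periodically and build, or drop, aggregations as the users' query habits change.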