Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. Paulraj Ponniah. A Wiley-Interscience. Library of Congress Cataloging-in-Publication Data: Ponniah, Paulraj. Data warehousing fundamentals for IT professionals / Paulraj Ponniah. Data Warehousing Fundamentals for IT Professionals, Second Edition. Author(s): Paulraj Ponniah. First published May Print ISBN

Data Warehousing Fundamentals By Paulraj Ponniah Pdf


Data warehousing fundamentals for IT professionals / Paulraj Ponniah. 2nd ed. p. cm. Previous ed. published under title: Data warehousing fundamentals. The Compelling Need for Data Warehousing: Chapter Objectives; Escalating Need for Strategic Information.

The end result is the creation of a new computing environment for the purpose of providing the strategic information every enterprise needs desperately.

First the large companies that were able to quickly afford the outlay of resources began to launch data warehousing projects. Medium-sized companies also entered the data warehousing arena. Much research began to be focused on this new phenomenon.

Many vendors began to offer hardware and software products to support the different functions within the data warehouse.

Before the data warehousing concept supplied an architectural model for moving data from operational systems to decision-support environments, companies attempted multiple decision-support environments within their organizations. This was done at enormous cost and was fraught with large amounts of data redundancy and inconsistency. The adoption of data warehousing changed all of this. Similar to industrial warehouses, data warehouses are intended for large-scale collection and storage of corporate data to provide strategic information for the overall needs.

Data Warehousing Milestones

As data warehousing gained acceptance during the s and s, we may trace some of the highlights of the movement. This institution has since emerged as the leading voice in the data warehousing and business intelligence arena, providing education, research, and support.

Expansion of globalization opened the arena for competitors, more in number and greater in power. New privacy regulations created the need to revise methods of collection and use of information. Improper architecture of some initial data warehousing systems produced fragmented views of corporate data and tended to produce disparate information silos. Query, reporting, and analysis tools provided to the users in the early data warehousing environments for self-service proved to be too complex and overwhelming for use by the users themselves.

Companies began to perceive that the goal of decision-support systems is twofold: transformation of data to information; derivation of knowledge from information. Each of these two aspects needs to be emphasized and strengthened appropriately to provide the necessary results. Business intelligence for an organization requires two environments, one to concentrate on transformation of data into information and the other to deal with transformation of information into knowledge.

Business intelligence (BI), therefore, is a broad group of applications and technologies. First, the term refers to the systems and technologies for gathering, cleansing, consolidating, and storing corporate data.

Next, business intelligence relates to the tools, techniques, and applications for analyzing the stored data. The Gartner Group popularized BI as an umbrella term to include concepts and methods to improve business decision making by fact-based support systems.

Data to Information. In this environment, data from multiple operational systems are extracted, integrated, cleansed, transformed, and stored as information in specially designed repositories.

Information to Knowledge. In this environment analytical tools are made available to users to access and analyze the information content in the specially designed repositories and turn information into knowledge. Again, using this information with sophisticated tools for proper decision making is equally challenging.

Therefore, the trend is to consider these as two distinct environments for corporate BI.


Vendors also tend to specialize in tools appropriate for these two distinct environments. However, the two environments are complementary and need to work together. Figure shows the two complementary environments: the data warehousing environment, which transforms data into information, and the analytical environment, which produces knowledge from information.

These are the systems that are used to run the day-to-day core business of the company. They are the so-called bread-and-butter systems.

Operational systems make the wheels of business turn (see Figure). They support the basic business processes of the company. These systems typically get the data into the database. Each transaction processes information about a single entity such as a single order, a single invoice, or a single customer.

Watching the Wheels of Business Turn

On the other hand, specially designed and built decision-support systems are not meant to run the core business processes. They are used to watch how the business runs, and then make strategic decisions to improve the business (see Figure). Decision-support systems are developed to get strategic information out of the database, as opposed to OLTP systems that are designed to put the data into the database. Decision-support systems are developed to provide strategic information.

Different Scope, Different Purposes

Therefore, we find that in order to provide strategic information we need to build informational systems that are different from the operational systems we have been building to run the basic business.

It will be worthless to continue to dip into the operational systems for strategic information as we have been doing in the past. As companies face fiercer competition and businesses become more complex, continuing the past practices will only lead to disaster.

[Figure: Operational and informational systems. Informational systems get the information out (watching the wheels of business turn): show me the top-selling products; show me the problem regions; tell me why (drill down); let me see other data (drill across); show the highest margins; alert me when a district sells below target.]

How are they different?


The type of information needed for strategic decision making is different from that available from operational systems. We need a new type of system environment for the purpose of providing strategic information for analysis, discerning trends, and monitoring performance.

Let us examine the desirable features and processing requirements of this new type of system environment. Let us also consider the advantages of this type of system environment designed for strategic information. The levels of analytical processing requirements include running of simple queries and reports against current and historical data; the ability to query, step back, analyze, and then continue the process to any desired length; and the ability to spot historical trends and apply them for future results.

Business Intelligence at the Data Warehouse

This new system environment that users desperately need to obtain strategic information happens to be the new paradigm of data warehousing.


Enterprises that are building data warehouses are actually building this new system environment. This new environment is kept separate from the system environment supporting the day-to-day operations. The data warehouse essentially holds the business intelligence for the enterprise to enable strategic decision making. The data warehouse is the only viable solution.

We have clearly seen that solutions based on the data extracted from operational systems are all totally unsatisfactory. Figure shows the nature of business intelligence at the data warehouse. At a high level of interpretation, the data warehouse contains critical measurements of the business processes stored along business dimensions.

For example, a data warehouse might contain units of sales, by product, day, customer group, sales district, sales region, and promotion. Here the business dimensions are product, day, customer group, sales district, sales region, and promotion.
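The idea of a measurement stored along business dimensions can be sketched in a few lines of Python. This is only an illustration: the rows, the field names, and the helper `units_by` are all hypothetical and are not taken from the book. Each row holds one measurement (units of sales) together with the six dimensions listed above, and the helper totals the measurement along whichever dimensions the user picks.

```python
from collections import defaultdict

# Hypothetical fact rows: one measurement (units sold) recorded along the
# business dimensions named in the text. All values are invented.
SALES_FACTS = [
    {"product": "Widget", "day": "2024-01-01", "customer_group": "Retail",
     "sales_district": "D1", "sales_region": "East", "promotion": "None",
     "units": 10},
    {"product": "Widget", "day": "2024-01-01", "customer_group": "Wholesale",
     "sales_district": "D2", "sales_region": "East", "promotion": "Spring Sale",
     "units": 25},
    {"product": "Gadget", "day": "2024-01-02", "customer_group": "Retail",
     "sales_district": "D1", "sales_region": "East", "promotion": "None",
     "units": 7},
    {"product": "Widget", "day": "2024-01-02", "customer_group": "Retail",
     "sales_district": "D3", "sales_region": "West", "promotion": "Spring Sale",
     "units": 4},
]


def units_by(facts, *dimensions):
    """Total the units measure for each combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in facts:
        # The key is the tuple of dimension values for this row.
        key = tuple(row[name] for name in dimensions)
        totals[key] += row["units"]
    return dict(totals)
```

Calling `units_by(SALES_FACTS, "product")` totals units per product, while `units_by(SALES_FACTS, "sales_region", "promotion")` views the same measurement from a different perspective without changing the stored rows.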

From where does the data warehouse get its data? The data is derived from the operational systems that support the basic business processes of the organization. In between the operational systems and the data warehouse, there is a data staging area. In this staging area, the operational data is cleansed and transformed into a form suitable for placement in the data warehouse for easy retrieval.

We arrived at this conclusion based on the functions of the new system environment called the data warehouse. So, let us try to come up with a functional definition of the data warehouse.

A Simple Concept for Information Delivery

In the final analysis, data warehousing is a simple concept. It is born out of the need for strategic information and is the result of the search for a new way to provide such information.

The methods of the last two decades, using the operational computing environment, were unsatisfactory. The new concept is not to generate fresh data, but to make use of the large volumes of existing data and to transform it into forms suitable for providing strategic information. The data warehouse exists to answer questions users have about the business, the performance of the various operations, the business trends, and about what can be done to improve the business. The data warehouse exists to provide business users with direct access to data, to provide a single unified version of the performance indicators, to record the past accurately, and to provide the ability to view the data from many different perspectives.

In short, the data warehouse is there to support decisional processes. Data warehousing is really a simple concept: Take all the data you already have in the organization, clean and transform it, and then provide useful strategic information.

What could be simpler than that?

An Environment, Not a Product

A data warehouse is not a single software or hardware product you purchase to provide strategic information. It is, rather, a computing environment where users can find strategic information, an environment where users are put directly in touch with the data they need to make better decisions. It is a user-centric environment. Let us summarize the characteristics of this new computing environment called the data warehouse: The basic concept of data warehousing is: Different technologies are, therefore, needed to support these functions.

Figure shows how the data warehouse is a blend of many technologies needed for the various functions. Although many technologies are in use, they all work together in a data warehouse. The end result is the creation of a new computing environment for the purpose of providing the strategic information every enterprise needs desperately.

There are several vendor tools available in each of these technologies. You do not have to build your data warehouse from scratch. Information needed for strategic decision making is not readily available.

This was mainly because IT has been trying to provide strategic information from operational systems. Operational systems are not designed for strategic information. The data warehouse promises to be this new computing environment.

There is a compelling need for data warehousing for every enterprise. What do we mean by strategic information? For a commercial bank, name five types of strategic objectives. Do you agree that a typical retail store collects huge volumes of data through its operational systems? Name three types of transaction data likely to be collected by a retail store in large volumes during its daily operations.

Examine the opportunities that can be provided by strategic information for a medical center.

Can you list five such opportunities? Why were all the past attempts by IT to provide strategic information failures? List three concrete reasons and explain. Describe five differences between operational systems and informational systems. Why are operational systems not suitable for providing strategic information? Give three specific reasons and explain. Name six characteristics of the computing environment needed to provide strategic information. What types of processing take place in a data warehouse?

A data warehouse is an environment, not a product. Data warehousing is the only viable means to resolve the information crisis and to provide strategic information. List four reasons to support this assertion and explain them. Match the columns: OLTP application 2.

Explain via some examples how exactly technology trends do help. You are the IT Director of a nationwide insurance company. Write a memo to the Executive Vice President explaining the types of opportunities that can be realized with readily available strategic information. For an airline company, how can strategic information increase the number of frequent flyers?

Discuss giving specific details. You are a Senior Analyst in the IT department of a company manufacturing automobile parts. The marketing VP is complaining about the poor response by IT in providing strategic information. Draft a proposal to him explaining the reasons for the problems and why a data warehouse would be the only viable solution.

In this system, you integrate and transform enterprise data into information suitable for strategic decision making.

You take all the historic data from the various operational systems, combine this internal data with any relevant data from outside sources, and pull them together.

You resolve any conflicts in the way data resides in different systems and transform the integrated data content into a format suitable for providing information to the various classes of users. Finally, you implement the information delivery methods. In order to set up this information delivery system, you need different components or building blocks.

These building blocks are arranged together in the most optimal way to serve the intended purpose. They are arranged in a suitable architecture.

Before we get into the individual components and their arrangement in the overall architecture, let us first look at some fundamental features of the data warehouse. Bill Inmon, considered to be the father of data warehousing, provides the following definition: The data in the data warehouse is: Separate, Available.

What about the nature of the data in the data warehouse?

How is this data different from the data in any operational system? Why does it have to be different? How is the data content in the data warehouse used?

Subject-Oriented Data

In operational systems, we store data by individual applications. In the data sets for an order processing application, we keep the data for that particular application. But these data sets contain only the data that is needed for those functions relating to this particular application.

We will have some data sets containing data about individual orders, customers, stock status, and detailed transactions, but all of these are structured around the processing of orders. Similarly, for a banking institution, data sets for a consumer loans application contain data for that particular application. Data sets for other distinct applications of checking accounts and savings accounts relate to those specific applications. In every industry, data sets are organized around individual applications to support those particular operational systems.

These individual data sets have to provide data for the specific applications to perform the specific functions efficiently. Therefore, the data sets for each application need to be organized around that specific application.

In striking contrast, in the data warehouse, data is stored by subjects, not by applications. If data is stored by business subjects, what are business subjects? Business subjects differ from enterprise to enterprise. These are the subjects critical for the enterprise. For a manufacturing company, sales, shipments, and inventory are critical business subjects. For a retail store, sales at the check-out counter is a critical subject.

Figure distinguishes between how data is stored in operational systems and in the data warehouse. In the operational systems shown, data for each application is organized separately by application. For example, Claims is a critical business subject for an insurance company.


Claims under automobile insurance policies are processed in the Auto Insurance application. Claims data for automobile insurance is organized in that application. In a data warehouse, there is no application flavor. The data in a data warehouse cut across applications.

Integrated Data

For proper decision making, you need to pull together all the relevant data from the various applications. The data in the data warehouse comes from several operational systems. Source data are in different databases, files, and data segments.

These are disparate applications, so the operational platforms and operating systems could be different. The file layouts, character code representations, and field naming conventions all could be different. In addition to data from internal operational systems, for many enterprises, data from outside sources is likely to be very important.

Companies such as Metro Mail, A. C. Nielsen, and IRI specialize in providing vital data on a regular basis. Your data warehouse may need data from such sources. This is one more variation in the mix of source data for a data warehouse. Figure illustrates a simple process of data integration for a banking institution.

Here the data fed into the subject area of account in the data warehouse comes from three different operational applications. Even within just three applications, there could be several variations.

Naming conventions could be different; attributes for data items could be different. The account number in the Savings Account application could be eight bytes long, but only six bytes in the Checking Account application.


Before the data from various disparate sources can be usefully stored in a data warehouse, you have to remove the inconsistencies. You have to standardize the various data elements and make sure of the meanings of data names in each source application. Before moving the data into the data warehouse, you have to go through a process of transformation, consolidation, and integration of the source data.
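A small Python sketch may make the standardization step more concrete. Everything named here is an assumption made for illustration: the source field names in `FIELD_MAP`, the target width of eight characters, and the choice to left-pad shorter account numbers with zeros are not prescribed by the text. The sketch maps differently named source fields to one warehouse name and brings account numbers of different lengths to a common width.

```python
ACCOUNT_WIDTH = 8  # assumed common width for account numbers in the warehouse

# Hypothetical source field names mapped to one standard warehouse name.
FIELD_MAP = {
    "savings": {"acct_no": "account_number", "bal": "balance"},
    "checking": {"AccountNum": "account_number", "CurrentBalance": "balance"},
}


def standardize_account_number(raw, width=ACCOUNT_WIDTH):
    """Trim whitespace and left-pad a numeric account number to a fixed width."""
    digits = str(raw).strip()
    if not digits.isdigit():
        raise ValueError(f"non-numeric account number: {raw!r}")
    if len(digits) > width:
        raise ValueError(f"account number longer than {width}: {raw!r}")
    return digits.zfill(width)


def to_warehouse_record(source, record):
    """Rename source fields and standardize the account number."""
    mapping = FIELD_MAP[source]
    # Keep only the fields that have a standard warehouse name.
    out = {mapping[name]: value for name, value in record.items() if name in mapping}
    out["account_number"] = standardize_account_number(out["account_number"])
    # Record where the row came from, since numbers may collide across systems.
    out["source_system"] = source
    return out
```

For instance, `to_warehouse_record("checking", {"AccountNum": "123456", "CurrentBalance": 50.0})` yields a record whose account number is eight characters wide and whose fields carry the same names as records coming from the savings application.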

Here are some of the items that would need standardization:

In an order entry system, the status of an order is the current status of the order. In a consumer loans application, the balance amount owed by the customer is the current amount. Of course, we store some past transactions in operational systems, but, essentially, operational systems reflect current information because these systems support day-to-day current operations.

On the other hand, the data in the data warehouse is meant for analysis and decision making. If a user is looking at the buying pattern of a specific customer, the user needs data not only about the current purchase, but on the past purchases as well.

When a user wants to find out the reason for the drop in sales in the North East division, the user needs all the sales data for that division over a period extending back in time. When an analyst in a grocery chain wants to promote two or more products together, that analyst wants sales of the selected products over a number of past quarters.

A data warehouse, because of the very nature of its purpose, has to contain historical data, not just current values.

Data is stored as snapshots over past and current periods. Every data structure in the data warehouse contains the time element. This aspect of the data warehouse is quite significant for both the design and the implementation phases. For example, in a data warehouse containing units of sale, the quantity stored in each file record or table row relates to a specific time element.

Depending on the level of the details in the data warehouse, the sales quantity in a record may relate to a specific date, week, month, or quarter. The data in the data warehouse is not intended to run the day-to-day business. When you want to process the next order received from a customer, you do not look into the data warehouse to find the current stock status.

The operational order entry application is meant for that purpose. In the data warehouse, you keep the extracted stock status data as snapshots over time. You do not update the data warehouse every time you process a single order. Data from the operational systems are moved into the data warehouse at specific intervals. Depending on the requirements of the business, these data movements take place twice a day, once a day, once a week, or once in two weeks.

In fact, in a typical data warehouse, data movements to different data sets may take place at different frequencies. The changes to the attributes of the products may be moved once a week. Any revisions to geographical setup may be moved once a month. The units of sales may be moved once a day. You plan and schedule the data movements or data loads based on the requirements of your users. As illustrated in Figure, every business transaction does not update the data in the data warehouse.
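The different load frequencies described above can be sketched as a small scheduling check in Python. The data set names, the `LOAD_INTERVAL_DAYS` table, and the approximation of "once a month" as 30 days are assumptions made for this illustration only; a real warehouse would normally rely on a dedicated scheduling tool.

```python
from datetime import date

# Hypothetical load intervals in days, following the examples in the text.
LOAD_INTERVAL_DAYS = {
    "units_of_sales": 1,        # moved once a day
    "product_attributes": 7,    # moved once a week
    "geographical_setup": 30,   # moved once a month (approximated as 30 days)
}


def loads_due(last_loaded, today):
    """Return the sorted names of data sets whose load interval has elapsed."""
    due = []
    for name, interval in LOAD_INTERVAL_DAYS.items():
        elapsed = (today - last_loaded[name]).days
        if elapsed >= interval:
            due.append(name)
    return sorted(due)
```

Given the date on which each data set was last loaded, `loads_due` reports which loads should run today, so that each data set moves at its own frequency rather than with every business transaction.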

The business transactions update the operational system databases in real time. We add, change, or delete data from an operational system as each transaction happens but do not usually update the data in the data warehouse. You do not delete the data in the data warehouse in real time. Once the data is captured in the data warehouse, you do not run individual transactions to change the data there. Data updates are commonplace in an operational database; not so in a data warehouse.

The data in a data warehouse is not as volatile as the data in an operational database is. The data in a data warehouse is primarily for query and analysis.
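To illustrate the contrast with an operational database, here is a minimal Python sketch of an append-only snapshot store. The class `SnapshotStore` and its methods are hypothetical names invented for this example. Each periodic load adds dated snapshots, a snapshot that already exists is never overwritten, and queries read the history.

```python
from datetime import date


class SnapshotStore:
    """Append-only store of dated snapshots, keyed by (snapshot_date, item)."""

    def __init__(self):
        self._rows = {}

    def load(self, snapshot_date, values):
        """Add one periodic snapshot; existing snapshots are never changed."""
        # Check the whole batch first so a rejected load leaves no partial data.
        for item in values:
            if (snapshot_date, item) in self._rows:
                raise ValueError(f"snapshot already loaded: {snapshot_date} {item}")
        for item, value in values.items():
            self._rows[(snapshot_date, item)] = value

    def history(self, item):
        """Return (date, value) pairs for one item, oldest first."""
        return sorted((d, v) for (d, i), v in self._rows.items() if i == item)
```

Loading the stock status for two consecutive days and then calling `history("widget")` returns both values in date order, which is the kind of question an analyst asks of a warehouse and an operational system, holding only the current value, cannot answer.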

Data Granularity

In an operational system, data is usually kept at the lowest level of detail. In a point-of-sale system for a grocery store, the units of sale are captured and stored at the level of units of a product per transaction at the check-out counter. In an order entry system, the quantity ordered is captured and stored at the level of units of a product per order received from the customer. If you are looking for units of a product ordered this month, you read all the orders entered for the entire month for that product and add up.

You do not usually keep summary data in an operational system. When a user queries the data warehouse for analysis, he or she usually starts by looking at summary data. The user may start with total sale units of a product in an entire region. Then the user may want to look at the breakdown by states in the region. The next step may be the examination of sale units by the next level of individual stores.

Frequently, the analysis begins at a high level and moves down to lower levels of detail. In a data warehouse, therefore, you find it efficient to keep data summarized at different levels. Depending on the query, you can then go to the particular level of detail and satisfy the query.

Data granularity in a data warehouse refers to the level of detail. The lower the level of detail, the finer the data granularity. Of course, if you want to keep data in the lowest level of detail, you have to store a lot of data in the data warehouse. You will have to decide on the granularity levels based on the data types and the expected system performance for queries. Figure shows examples of data granularity in a typical data warehouse.
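The region, state, and store levels mentioned above can be sketched in Python as summaries computed from the same detail rows. The rows in `DETAIL`, the `LEVELS` tuple, and the function `summarize` are hypothetical and serve only to illustrate the idea of granularity levels.

```python
from collections import defaultdict

# Hypothetical detail rows: (region, state, store, units sold).
DETAIL = [
    ("East", "NJ", "Store 1", 5),
    ("East", "NJ", "Store 2", 3),
    ("East", "NY", "Store 3", 8),
    ("West", "CA", "Store 4", 6),
]

# Granularity levels from coarsest to finest.
LEVELS = ("region", "state", "store")


def summarize(detail, level):
    """Total units at the requested level of detail."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for row in detail:
        # The first `depth` fields identify the group; the last field is units.
        totals[row[:depth]] += row[-1]
    return dict(totals)
```

A user could start with `summarize(DETAIL, "region")` and then move to `"state"` and `"store"` for finer detail. Storing such summaries in advance costs extra space but avoids re-reading every detail row for each high-level query, which is the trade-off the text describes.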

Some authors and vendors use the two terms, data warehouse and data mart, synonymously. Some make distinctions that are not clear enough. At this point, it would be worthwhile for us to examine these two terms and take our position. Let us examine this statement and take a stand. Before deciding to build a data warehouse for your organization, you need to ask the following basic and fundamental questions and address the relevant issues. These are critical issues requiring careful examination and planning.

Should you look at the big picture of your organization, take a top-down approach, and build a mammoth data warehouse? Or, should you adopt a bottom-up approach, look at the individual local and departmental requirements, and build bite-size departmental data marts? Should you build a large data warehouse and then let that repository feed data into local, departmental data marts? On the other hand, should you build individual local data marts, and combine them to form your overall data warehouse?

Should these local data marts be independent of one another? Or, should they be dependent on the overall data warehouse for data feed?

Should you build a pilot data mart? These are crucial questions. How are They Different? Let us take a close look at Figure

For example, they must be able to review sales quantities by product, salesperson, district, region, and customer groups. You get snapshots of transactions that happen at specific times. How can this book be exactly suitable for IT professionals?

The fundamental reason for the inability to provide strategic information is that we have been trying all along to provide strategic information from the operational systems.

Each independent data mart will be blind to the overall requirements of the entire organization. This limitation caused frustration, and executive information systems did not last long in many companies.

Depending on the industries you have worked in, you must have been involved in applications such as order processing, general ledger, inventory, in-patient billing, checking accounts, insurance claims, and so on. Why then do we talk about an information crisis?