On a transaction server, data is liable to change at any point in time, so it is not advisable to do both analysis and transactions on the same server.
Data Warehouse and Transactional Server Comparison

Data Warehouse                   | Operational/Transactional
Subject oriented                 | Application oriented
Summarized, refined & detailed   | Detailed
Represents value over time       | Accurate as of the moment
Supports managerial needs        | Supports day-to-day needs
Read-only data                   | Can be updated
Batch processing                 | Real-time transactions
Completely different life cycle  | Software Development Life Cycle
Analysis driven                  | Transaction driven
Dimensional model                | Entity Relationship Diagram
Large amount of data             | Small amount of data
Relaxed availability             | High availability
Flexible structure               | Static structure
|
ETL (Extraction, Transformation and Loading) is the process through which we extract data from one or more data sources, transform it into a single format, and load it into a common place, i.e. the Data Warehouse.
An ETL tool ships with APIs for all the databases it supports.
At the DATA STAGING AREA, data is kept in unchanged format.
ETL TOOLS:
- Informatica
- Data Stage (IBM Product)
- AB Initio
- OWB (Oracle Warehouse Builder)
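The extract-transform-load process described above can be sketched in a few lines of Python; the source names, field names and values here are all hypothetical, and real ETL tools do this at far larger scale:

```python
# Minimal ETL sketch: extract from two sources with different formats,
# transform into one common schema, load into a single target.
# All data here is invented for illustration.

def extract():
    # Two hypothetical source feeds with differing field names and types.
    crm = [{"cust": "A101", "rev": "1200.50"}]
    erp = [{"customer_id": "A102", "revenue": 800.0}]
    return crm, erp

def transform(crm, erp):
    # Normalise both feeds into one schema: {customer_id, revenue}.
    unified = []
    for r in crm:
        unified.append({"customer_id": r["cust"], "revenue": float(r["rev"])})
    for r in erp:
        unified.append({"customer_id": r["customer_id"], "revenue": float(r["revenue"])})
    return unified

def load(rows, warehouse):
    # "Load" here is just appending to a list standing in for the warehouse.
    warehouse.extend(rows)

warehouse = []
crm, erp = extract()
load(transform(crm, erp), warehouse)
print(warehouse)
```

Note how the transform step produces one format before loading: that single common schema is what makes the warehouse queryable across sources.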
Benefits of Multi-Tier Architecture:
A few points under which multi-tier architecture shows obvious benefits over 2-tier architecture:
- The database server and the processing are kept separate, making it easier to serve more requests.
- Authorization and authentication add another security layer.
- Performance, stability and transparency.
THE DATA WAREHOUSE TEAM IS DIVIDED INTO THE FOLLOWING TEAMS
PRE-REQUISITE TO EXTRACTION
· DDL (Data Definition Language), DML (Data Manipulation Language) and DCL (Data Control Language) should be checked on the OLTP database. Any errors found should be rectified.
Before SAP BW, ETL in SAP was done through separate tools for each stage: Extraction (DataStage), Transformation (Erwin), Loading (Cognos).
Drawbacks of Separate Extraction, Transformation, Loading:
- Not cost effective
- Often does not give the desired result because of technical constraints between the different products.
SAP BW is a complete package (the DW tool under SAP) that covers Extraction, Transformation and Loading.
ETL is a set of processes through which we extract data from different, separate data sources, process it, and store the processed data in a place called the Data Warehouse.
OLTP is based on the E-R diagram and relies on joins, which are time-consuming during processing.
In BW, we have FACTs and DIMENSIONS, which have a parent-child relationship.
Landscape of SAP BW
The amount of data doubles every 18 months, which affects response time and the sheer ability to comprehend its content.
There are many technology related reasons for the existence of DW.
- DW is designed to address the incompatibility of informational and operational transaction systems (since their requirements are incompatible).
MODELLING
Business Intelligence:
Business (in layman's terms) is any transaction between two different entities. If the transaction happens within the same entity, it is not considered business.
Intelligence: the extra set of features needed to aid business decision making.
FACTS AND DIMENSIONS
FACT represents the numerical data in a business transaction, which is cumulative, e.g. employee salary, product price.
FACT TABLE: where this numerical data is stored; the values are also known as Measures or Metrics.
Role of the FACT TABLE in BI: it stores all business-related metrics, measures and numbers for its business area. If data is not in the FACT table, it is not possible to analyse that data.
DIMENSION: descriptive information with a definite value; it describes the FACT. Every FACT must have a Dimension, and it must also have a UNIT. A unit can be any measure; frequently used units are kg (weight) or INR (currency).
There will be only one FACT table per InfoCube. In BW we can create a maximum of 16 dimensions, of which 3 are system defined and 13 can be user defined.
System Defined Dimensions:
Time: the client requirement decides the granularity of the time dimension: year, week, day or hour.
Unit: frequently used units are kg (weight) or INR (currency).
Data Packet: the size of the data to be stored.
example:
Employee Salary 50,000 per month.
In above statement: Employee is Dimension
Salary is Fact
50,000 is Fact Value
Month is Time
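The worked example above can be written out as a data structure; the field names and values are illustrative only:

```python
# "Employee salary 50,000 per month" decomposed into the modelling terms
# used above. All names and values are invented for illustration.
record = {
    "dimension": {"employee": "E1001"},  # who the fact describes (Dimension)
    "fact":      {"salary": 50_000},     # the numeric measure (Fact + Fact Value)
    "unit":      "INR",                  # every fact must carry a unit
    "time":      "2024-01",              # the time dimension, at month granularity
}
print(record)
```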
BI Content (BC) or Business Content:
Sample analysis reports for all business modules come pre-built and stored in BC.
TYPES OF DATA IN SAP BI:
- Master Data (does not change often; it serves as a reference from which we can look up the Facts), e.g. EMP ID, bank account, surname.
- Transaction Data: DWH/BI analysis mainly happens on this type of data, which changes frequently, e.g. salary, tax, amount, price.
SCHEMA
It is an old concept. Every database consists of a number of schemas; a schema is a collection of tables, and a table is a collection of rows and columns. The data lives in the columns.
With respect to the Data Warehouse, the schema is the presentation of the data behind the BI server.
In SAP BI, two types of Schemas are used.
- Star Schema (http://www.learndatamodeling.com/star.htm)
- Extended Star Schema or Multi Dimension Modeling.
Star Schema
Dimension IDs are used to select unique records.
Fact Table contains: respective Dimension IDs and FACT Data.
In Star Schema, Master Data is embedded with the Dimensions.
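As a sketch (hypothetical tables and values), a classic star schema keeps the master data inside the dimension table and resolves each fact row through its dimension ID:

```python
# Star schema sketch: the fact table holds dimension IDs plus measures;
# master data (name, category) sits directly inside the dimension table.
# All table contents are invented for illustration.
dim_product = {
    1: {"name": "Soap", "category": "Care"},  # master data embedded in dimension
    2: {"name": "Rice", "category": "Food"},
}
fact_sales = [
    {"product_dim_id": 1, "revenue": 500},
    {"product_dim_id": 2, "revenue": 900},
]

# Resolve each fact row against its dimension to produce a report row.
report = [
    {**dim_product[f["product_dim_id"]], "revenue": f["revenue"]}
    for f in fact_sales
]
print(report)
```

Because the master data lives inside `dim_product`, another cube cannot reuse it; that is the limitation the extended star schema removes.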
Drawback of Star Schema:
SAP AG made it a rule that master data is loaded before transaction data.
Extended Star Schema
In the star schema, the central fact table is surrounded by master data tables (= dimension tables). Since the master data lies within the cube, we cannot reuse the master data information for other cubes. In the extended star schema, the central fact table is surrounded by dimension tables (which are not master data tables); the master data tables lie outside the InfoCube, so we can share the master data information with other cubes.
In the star schema, the fact table holds alphanumeric information, which degrades database performance and query performance. In the extended star schema, thanks to the SID tables, all the information in the fact table is numeric, which improves database performance and query performance.
In the star schema, the fact table is directly connected to the master data tables and can be connected to a maximum of 16 of them, so we cannot analyze the data from more than 16 angles. In the extended star schema, the fact table is connected to a maximum of 16 dimension tables, and each dimension table can be connected to 248 SID tables, so we can analyze the data from 16 × 248 angles.
Star schema
- It can be analyzed only in 16 dimensions.
- Master data is not reusable.
- Performance is degraded because of the alpha-numeric values in the fact table.
Extended Star Schema
- It can be analyzed in 16*248 dimensions.
- Master data can be reusable as it is not inside the cube.
- Better performance, as there are no alphanumeric values in the fact table.
Earlier, a request went first to the master data and then to the transaction data.
Now, with the extended star schema, the request goes to the dimension and connects to the respective master data via the SID table.
Besides the primary key (Dim ID), the SID is a unique, system-generated ID.
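The SID indirection can be sketched as follows; all table names, keys and values are hypothetical. The fact row points at a Dim ID, the dimension row holds only SIDs, and the SID table maps each SID to master data kept outside the cube:

```python
# Extended star schema sketch: master data lives outside the cube and is
# reachable only through system-generated SIDs, so everything inside the
# cube stays numeric. All contents are invented for illustration.
master_material = {"M100": {"desc": "Soap"}, "M200": {"desc": "Rice"}}  # shared across cubes
sid_material = {1: "M100", 2: "M200"}            # SID table: SID -> master data key
dim_table = {10: {"material_sid": 1},            # dimension table holds SIDs only
             11: {"material_sid": 2}}
fact_table = [{"dim_id": 10, "qty": 5},          # fact table: Dim IDs + measures
              {"dim_id": 11, "qty": 3}]

# Resolution path: Fact -> Dim ID -> SID -> master data.
resolved = []
for row in fact_table:
    sid = dim_table[row["dim_id"]]["material_sid"]
    key = sid_material[sid]
    resolved.append((key, master_material[key]["desc"], row["qty"]))
print(resolved)
```

Because `master_material` sits outside the cube structures, a second cube could build its own dimension and SID tables against the same master data.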
Operational Data: Stored in DW or cube, data which is part of operations and not legacy data.
If there is data that will only be used after 5 years, we can keep it in a separate DW.
MODELING
Basically involves designing the Schema, Framework and Methodology.
Before designing, we should know the definition of all data fields (only the fields, not their values); this is called the Extract Structure. The full database with data values is needed only during extraction.
Modeling Terms
The need and definition of Modeling: We create the schema.
If not designed properly, there will be problems like:
- Integration Problem.
- Difficulty in fetching data.
If designed properly, it helps in Business Analysis:
Info objects: the smallest entities in BI; InfoCubes are built from them.
They are of two types:
- Characteristics (e.g. EMP ID, Name, Bank Account Number): these are descriptions, or descriptive items
- Key Figure (e.g. Salary, Price, Amount)
Note that FACTs and key figures are not the same: the fact is part of the schema, whereas the key figure is not a direct property of the schema.
Characteristics are stored in dimensions.
Since the server understands dimensions rather than characteristics, and all data has to go through the schema (the basic entity), key figures are converted into FACTs and characteristics into dimensions. It is not a direct copy; a conversion is involved.
InfoObjects are kept in folders called Info Areas; they cannot be kept independently. An Info Area is the common place where we keep InfoObjects and InfoCubes.
Whenever we search for an InfoCube, we look for its Info Area.
InfoObject Catalogue: used to keep each type of data in its own container/path/segment.
There will always be two InfoObject catalogues (because we have only two types of InfoObjects: characteristics and key figures).
When doing modeling, the first step is to create a source system: the system from which we fetch and process data. It can be R/3, a flat file, Oracle, Sybase, XML, another BW system, etc.
The source system should have the IP settings and drivers in place.
Most often, the data is extracted and handed to us, since extraction is a lengthy and costly process.
File System / Oracle / SAP --> Extract Structure --> (Replication) Transfer Structure (data in BW system format) --> PSA --> (Transfer Rule) Communication Structure --> (Update Rule) InfoCube (Info Provider)
The transfer and communication structures together are called the Info Source. They are created by the system, not by us; we only place our structure in them.
A transfer rule is a collection of instructions that moves a structure from one stage to the next.
The PSA (Persistent Staging Area) is a temporary backup storage area.
First the InfoCube is created, then the Update Rule.
From the transfer structure through InfoCube creation, the work is done in the Administrator Workbench (transaction RSA1 or RSA1OLD).
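The staging flow above, from source record through transfer and update rules into the InfoCube, can be sketched like this; the field names and rule logic are invented for illustration:

```python
# Sketch of the staging flow: source record -> transfer structure -> PSA copy
# -> communication structure (via transfer rule) -> InfoCube (via update rule).
# All names and values here are hypothetical.
source_row = {"EMPID": "E1", "SAL": "50000"}   # record as delivered by the source

transfer_structure = dict(source_row)           # 1:1 copy in BW-side format
psa = [dict(transfer_structure)]                # PSA keeps an unchanged backup

def transfer_rule(row):
    # Transfer rule: map source fields into the communication structure.
    return {"employee_id": row["EMPID"], "salary": int(row["SAL"])}

def update_rule(row):
    # Update rule: final enrichment before the data target (illustrative).
    return {**row, "currency": "INR"}

infocube = [update_rule(transfer_rule(r)) for r in psa]
print(infocube)
```

Note that the PSA copy is untouched by either rule, matching its role as a backup of the data in its original format.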
Infocube is also known as Data Target/ Info Provider.
Info Providers are of two types: physical and virtual. The Info Provider is where all the data is finally stored for analysis purposes.
Info Provider