How are you ?
In this post I will demonstrate how to create your first multidimensional cube in the star schema. This post ended up being done without the idealization that I usually do, because I was writing another blog post and I needed to create a simple Analysis Services cube for the demonstration and I understood it as an opportunity to talk a bit about BI too, subject that I work in my day by day and really, I hardly write about. So I decided to create this post demonstrating the step-by-step for creating this cube.
Some BI Concepts
Multidimensional modeling, or dimensional modeling as it is sometimes called, is the database modeling technique for assisting Data Warehouse queries from a variety of perspectives. Multidimensional view enables the most intuitive use for analytical processing by the Online Analytical Processing (OLAP) tools.
Every dimensional modeling has two essential elements: the Facts tables and the Dimensions tables. Both are mandatory and have complementary characteristics within a Data Warehouse.
Online Analytical Processing (OLAP)
OLAP has a set of techniques for handling the data contained in the multidimensional view of the Data Warehouse. OLAP tools can be of different types: MOLAP, ROLAP, or HOLAP.
Multidimensional OLAP (MOLAP) is the type of tool that uses multidimensional database structure. Relational OLAP (ROLAP) uses the relational data architecture, where the database has the traditional structure. Already the hybrid OLAP (HOLAP) is the junction of the previous two, using the best aspects and features of each of the two types.
The Dimensions are the descriptors of the data from Fato. It has the qualitative character of the information and the “one to many” relationship with the Fact table. It is the Dimension that allows the visualization of information from different aspects and perspectives. Ex: Customer registration, payment methods registration, etc.
Measures are numerical values that represent Fact and the performance of a business indicator associated with dimensions. Ex: Sales Amount, Sales Amount, Average Ticket, etc.
The Facts contains the metrics. It has the quantitative character of the descriptive information stored in the Dimensions. It is where business events are stored and has a “many to one” relationship with the peripheral tables (Dimension). It is a table that has only the measurements and the ID's that are linked to the dimensions to display the complete information regarding the data that will be viewed.
Multidimensional modeling models
Dimensional modeling has two models: the star schema model and the snow flake model. Each with different applicability depending on the specificity of the problem.
The dimensions of the star model are denormalized, unlike snow flake, which partially has normalization. The relational structure differs from the multidimensional structure mainly due to normalization, little redundancy and the frequency of supported updates. The multidimensional structure typically has table denormalization, high redundancy, and supports much shorter data refresh times than a conventional relational structure.
Generating Dimensions and Facts (ETL)
In this part of cube creation, I will generate the dimensions and facts with random data. Of course this data is for demonstration purposes only. In a real scenario, random data should not be used in BI analyzes to create multidimensional cubes, as the analysis would not have a real value.
For the random generation of dates I used the function fncRand (), available in post SQL Server - Msg 443 Invalid use of a side-effecting operator 'rand' within a function.
-- CRIAÇÃO DAS DIMENSÕES
IF (OBJECT_ID('dbo.Dim_Cliente') IS NOT NULL) DROP TABLE dbo.Dim_Cliente
CREATE TABLE dbo.Dim_Cliente (
Codigo INT IDENTITY(1, 1),
INSERT INTO dbo.Dim_Cliente
('Dirceu Resende', '1987-05-28', 'Masculino', 'RJ'),
('Cliente 1', '1971-01-15', 'Feminino', 'ES'),
('Cliente 2', '1955-02-05', 'Masculino', 'ES'),
('Cliente 3', '1997-08-07', 'Masculino', 'RJ'),
('Cliente 4', '2001-11-14', 'Masculino', 'MG'),
('Cliente 5', '1985-12-13', 'Masculino', 'SP'),
('Cliente 6', '1982-09-22', 'Masculino', 'ES')
IF (OBJECT_ID('dbo.Dim_Forma_Pagamento') IS NOT NULL) DROP TABLE dbo.Dim_Forma_Pagamento
CREATE TABLE dbo.Dim_Forma_Pagamento (
Codigo INT IDENTITY(1, 1),
INSERT INTO dbo.Dim_Forma_Pagamento
('Cartão de Crédito')
IF (OBJECT_ID('dbo.Dim_Produto') IS NOT NULL) DROP TABLE dbo.Dim_Produto
CREATE TABLE dbo.Dim_Produto (
Codigo INT IDENTITY(1, 1),
INSERT INTO dbo.Dim_Produto
('Toalha', 25, 'Cama, Mesa e Banho', 19.99),
('TV 55', 3200, 'Eletro', 3500),
('TV 42', 2500, 'Eletro', 2359.70),
('Celular Top Android Novo', 120, 'Celulares', 1890),
('Celular Top iOS Usado', 114, 'Celulares', 4999.99),
('Cama Box', 7510, 'Cama, Mesa e Banho', 1249.99),
('Toalha de Rosto', 15, 'Cama, Mesa e Banho', 12.99),
('Prato', 250, 'Cozinha', 34.80),
('Talher', 25, 'Cozinha', 22.50),
('Panela', 250, 'Cozinha', 69.80),
('Microondas', 1450, 'Eletro', 369.99),
('Encosto de Mesa', 35, 'Cama, Mesa e Banho', 15.50)
-- CRIAÇÃO DA FATO
IF (OBJECT_ID('dbo.Fato_Venda') IS NOT NULL) DROP TABLE dbo.Fato_Venda
CREATE TABLE dbo.Fato_Venda (
@Contador INT = 1, @Total INT = 1000
WHILE(@Contador <= @Total)
INSERT INTO dbo.Fato_Venda
SELECT TOP 1
(SELECT TOP 1 Codigo FROM dbo.Dim_Cliente ORDER BY NEWID()) AS Cod_Cliente,
Codigo AS Cod_Produto,
(SELECT TOP 1 Codigo FROM dbo.Dim_Forma_Pagamento ORDER BY NEWID()) AS Cod_Forma_Pagamento,
DATEADD(DAY, dbo.fncRand(1885), '2012-01-01') AS Dt_Venda,
Preco AS Vl_Venda
SET @Contador += 1
Viewing the random mass of tests:
SELECT * FROM dbo.Dim_Cliente
SELECT * FROM dbo.Dim_Forma_Pagamento
SELECT * FROM dbo.Dim_Produto
SELECT * FROM dbo.Fato_Venda
Creating the cube in Visual Studio (Data Tools)
Formerly known as Business Intelligence Development Studio (BIDS) or Visual Studio Shell, Microsoft Data Tools has been incorporated into Visual Studio as of the 2015 release and is now part of a single product and is now just a Visual Studio plugin that can be downloaded accessing this link.
To start, open SQL Server Data Tools 2015 and from the File> New> Project menu and select the project type “Analysis Services Multidimensional and Datamining Project”.
Creating a new data sourceView content
Creating a new Data Source ViewView content
After Data Source View is created, you can create logical relationships by clicking the Foreign Key column in the Fact and dragging to the Primary Key column in the dimension. After completing the action, if there is no Foreign Key created in the database, you will see this alert message:
You can create the “Yes” button to complete the creation of the logical relationship.
Creating a new cubeView content
Managing PermissionsView content
Remember that this permission is only in the database. To manage Analysis Services Administrator permissions, you must open Analysis Services from SQL Server Management Studio, select the “Analysis Services” option
In the Analysis Services properties screen, click on the “Security” tab and you're done. In this screen you can configure who are the administrators of Analysis Services, being able to create / delete / change all the cubes of the instance.
Creating the Time DimensionView content
Note: If the date of your Fact table is DATETIME with time, you will need to remove the time from the date column of your Fact table or create a new column without the time because the Time dimension has no time, as shown below, and this will cause JOIN to fail correctly (Will only return DATETIME column records with time zero)
Processing the cube and publishing to Analysis ServicesView content
Now that we have finished creating the Cube objects, we can process it, which is to publish the Cube to the Analysis Services server and update the data. To do this, right click on the Cube and select the option “Process…”
- Process Default: Performs the least effort possible (with the least amount of tasks) to process the structure and data. The server converts this option to the one that best suits your environment right now. Applicable: All Objects
- Process full: Processes all structure and data, deleting and recreating objects. This means that processing discards everything that exists and re-creates the analytical structure and then processes the data for this structure. If any new attributes are added to the dimension, a process full should roll. Applicable: All Objects
- Process Update: When any attribute changes occur in the dimension, whether adding, deleting or updating, this process must be performed. It has enough intelligence to process only the differential of the structure and data that are new, but it is slower (to apply “intelligence”). Applicable: Dimension
- Process Data: Discards all stored data and processes all data again, ignoring any changes in structure and indexes. The focus is only on the data. Applicable: Dimension, Cube, Measure and Partition
- Process Add (Incremental): Processes only new data, ignoring existing data as well as any new changes to structure or indexes. Applicable: Dimension and Partition
Remember: Any changes you make to the Cube will need to be reprocessed. If you do a Process Full on one dimension, you will need to perform a Process Full on the Cube as well.
That's it folks!