Hello everybody!
How are you ?
In this post I will demonstrate how to create your first multidimensional cube in the star schema. This post ended up being done without the idealization that I usually do, because I was writing another blog post and I needed to create a simple Analysis Services cube for the demonstration and I understood it as an opportunity to talk a bit about BI too, subject that I work in my day by day and really, I hardly write about. So I decided to create this post demonstrating the step-by-step for creating this cube.
Some BI Concepts
Multidimensional modeling
Multidimensional modeling, or dimensional modeling as it is sometimes called, is the database modeling technique for assisting Data Warehouse queries from a variety of perspectives. Multidimensional view enables the most intuitive use for analytical processing by the Online Analytical Processing (OLAP) tools.
Every dimensional modeling has two essential elements: the Facts tables and the Dimensions tables. Both are mandatory and have complementary characteristics within a Data Warehouse.
Online Analytical Processing (OLAP)
OLAP has a set of techniques for handling the data contained in the multidimensional view of the Data Warehouse. OLAP tools can be of different types: MOLAP, ROLAP, or HOLAP.
Multidimensional OLAP (MOLAP) is the type of tool that uses multidimensional database structure. Relational OLAP (ROLAP) uses the relational data architecture, where the database has the traditional structure. Already the hybrid OLAP (HOLAP) is the junction of the previous two, using the best aspects and features of each of the two types.
Dimensions
The Dimensions are the descriptors of the data from Fato. It has the qualitative character of the information and the “one to many” relationship with the Fact table. It is the Dimension that allows the visualization of information from different aspects and perspectives. Ex: Customer registration, payment methods registration, etc.
Measures:
Measures are numerical values that represent Fact and the performance of a business indicator associated with dimensions. Ex: Sales Amount, Sales Amount, Average Ticket, etc.
Suits
The Facts contains the metrics. It has the quantitative character of the descriptive information stored in the Dimensions. It is where business events are stored and has a “many to one” relationship with the peripheral tables (Dimension). It is a table that has only the measurements and the ID's that are linked to the dimensions to display the complete information regarding the data that will be viewed.
Multidimensional modeling models
Dimensional modeling has two models: the star schema model and the snow flake model. Each with different applicability depending on the specificity of the problem.
The dimensions of the star model are denormalized, unlike snow flake, which partially has normalization. The relational structure differs from the multidimensional structure mainly due to normalization, little redundancy and the frequency of supported updates. The multidimensional structure typically has table denormalization, high redundancy, and supports much shorter data refresh times than a conventional relational structure.
Generating Dimensions and Facts (ETL)
In this part of cube creation, I will generate the dimensions and facts with random data. Of course this data is for demonstration purposes only. In a real scenario, random data should not be used in BI analyzes to create multidimensional cubes, as the analysis would not have a real value.
For the random generation of dates I used the function fncRand (), available in post SQL Server - Msg 443 Invalid use of a side-effecting operator 'rand' within a function.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 |
------------------------------------------------------------------------ -- CRIAÇÃO DAS DIMENSÕES ------------------------------------------------------------------------ IF (OBJECT_ID('dbo.Dim_Cliente') IS NOT NULL) DROP TABLE dbo.Dim_Cliente CREATE TABLE dbo.Dim_Cliente ( Codigo INT IDENTITY(1, 1), Ds_Nome VARCHAR(100), Dt_Nascimento DATETIME, Sg_Sexo VARCHAR(20), Sg_UF VARCHAR(2), ) INSERT INTO dbo.Dim_Cliente VALUES ('Dirceu Resende', '1987-05-28', 'Masculino', 'RJ'), ('Cliente 1', '1971-01-15', 'Feminino', 'ES'), ('Cliente 2', '1955-02-05', 'Masculino', 'ES'), ('Cliente 3', '1997-08-07', 'Masculino', 'RJ'), ('Cliente 4', '2001-11-14', 'Masculino', 'MG'), ('Cliente 5', '1985-12-13', 'Masculino', 'SP'), ('Cliente 6', '1982-09-22', 'Masculino', 'ES') IF (OBJECT_ID('dbo.Dim_Forma_Pagamento') IS NOT NULL) DROP TABLE dbo.Dim_Forma_Pagamento CREATE TABLE dbo.Dim_Forma_Pagamento ( Codigo INT IDENTITY(1, 1), Ds_Nome VARCHAR(100) ) INSERT INTO dbo.Dim_Forma_Pagamento VALUES ('Dinheiro'), ('Cheque'), ('Boleto'), ('Cartão de Crédito') IF (OBJECT_ID('dbo.Dim_Produto') IS NOT NULL) DROP TABLE dbo.Dim_Produto CREATE TABLE dbo.Dim_Produto ( Codigo INT IDENTITY(1, 1), Ds_Nome VARCHAR(100), Peso INT, Categoria VARCHAR(50), Preco FLOAT ) INSERT INTO dbo.Dim_Produto VALUES ('Toalha', 25, 'Cama, Mesa e Banho', 19.99), ('TV 55', 3200, 'Eletro', 3500), ('TV 42', 2500, 'Eletro', 2359.70), ('Celular Top Android Novo', 120, 'Celulares', 1890), ('Celular Top iOS Usado', 114, 'Celulares', 4999.99), ('Cama Box', 7510, 'Cama, Mesa e Banho', 1249.99), ('Toalha de Rosto', 15, 'Cama, Mesa e Banho', 12.99), ('Prato', 250, 'Cozinha', 34.80), ('Talher', 25, 'Cozinha', 22.50), ('Panela', 250, 'Cozinha', 69.80), ('Microondas', 1450, 'Eletro', 369.99), ('Encosto de Mesa', 35, 'Cama, Mesa e Banho', 15.50) ------------------------------------------------------------------------ -- CRIAÇÃO DA FATO ------------------------------------------------------------------------ IF (OBJECT_ID('dbo.Fato_Venda') IS NOT NULL) DROP TABLE dbo.Fato_Venda CREATE TABLE dbo.Fato_Venda ( Cod_Cliente INT, Cod_Produto INT, Cod_Forma_Pagamento INT, Dt_Venda DATETIME, Vl_Venda FLOAT ) DECLARE @Contador INT = 1, @Total INT = 1000 WHILE(@Contador <= @Total) BEGIN INSERT INTO dbo.Fato_Venda SELECT TOP 1 (SELECT TOP 1 Codigo FROM dbo.Dim_Cliente ORDER BY NEWID()) AS Cod_Cliente, Codigo AS Cod_Produto, (SELECT TOP 1 Codigo FROM dbo.Dim_Forma_Pagamento ORDER BY NEWID()) AS Cod_Forma_Pagamento, DATEADD(DAY, dbo.fncRand(1885), '2012-01-01') AS Dt_Venda, Preco AS Vl_Venda FROM dbo.Dim_Produto ORDER BY NEWID() SET @Contador += 1 END |
Viewing the random mass of tests:
1 2 3 4 |
SELECT * FROM dbo.Dim_Cliente SELECT * FROM dbo.Dim_Forma_Pagamento SELECT * FROM dbo.Dim_Produto SELECT * FROM dbo.Fato_Venda |
Creating the cube in Visual Studio (Data Tools)
Formerly known as Business Intelligence Development Studio (BIDS) or Visual Studio Shell, Microsoft Data Tools has been incorporated into Visual Studio as of the 2015 release and is now part of a single product and is now just a Visual Studio plugin that can be downloaded accessing this link.
To start, open SQL Server Data Tools 2015 and from the File> New> Project menu and select the project type “Analysis Services Multidimensional and Datamining Project”.
Creating a new data source
View contentCreating a new Data Source View
View contentCreating a new cube
View contentManaging Permissions
View contentCreating the Time Dimension
View contentProcessing the cube and publishing to Analysis Services
View contentThat's it folks!
Hug!
Good Morning. Congratulations on the tutorial. But I was left with a doubt: When I am going to process Dimension it is giving me an incorrect login and password error. The access data is already correct, the user is an OS administrator and the SQL Server services are all active. Do you have any idea what it might be?
Dirceu, great cube material, Congratulations!
I have a doubt, as I would do after cubes were ready and wanted to insert a new field, would you need to give UPDATE in fact with the new field?
Hugs!
Hey Rafa, how are you?
You have to run the alter table in fact to insert the new field, using the update later to populate the new field data.
After you have updated dsv in data tools, make the necessary adjustments to the cube and reprocess the cube.
Congratulations, very good.
Glad to hear you liked it 🙂
Dirceu Resende ...
Wow, you're the man! May God continue to bless you.
I wish you all the best. What a complete and amazing article. It was what I needed.
Hugs from an Angolan,
Edilásio Paulo
Good day.
I've been looking for such material for a long time, I after a lot of research, no doubt this was the best and most complete I found, I just have to thank this sharing of information.
May God continue to bless you so much, a great hug.
Marcelo,
Good afternoon.
Thank you for your feedback!! This is what motivates me to keep posting.