Star Schema vs Snowflake Schema vs
Fact Constellation Schema
Star Schema
The Star Schema is the simplest data warehouse scheme. When we data warehousing with the star schema, database diagram resembles a star design. That is why this schema called as star schema. One fact table surrounded by dimension tables and if all the dimension tables are parent table then it's star schema data warehouse model. In this schema, a dimension cannot be a child table to a parent table. That means, dimension tables are in denormalized form and all the attributes which are defining the dimension will be in that particular dimension table. There is only one join statement between fact table and the dimension table. The data redundancy is high in this schema. So we need more storage capacity but simple queries are enough to retrieve data and we can get good query performance through this data warehouse model.
Snowflake schema is a more complex data warehouse model than a star schema. In this schema, there should be at least one dimension table will have hierarchy pattern. That means dimension tables have child tables. Dimension tables are in normalized form in this schema. It reduces data redundancy and need of storage capacity. But query performance is less than star schema and need complex queries to retrieve data.
Ex: Snowflake Schema |
Fact Constellation Schema
The Fact Constellation schema may required for sophisticated applications like multiple fact tables share dimension tables. This schema is similar to a collection of stars. Fact constellation schema also called as galaxy schema.
Ex: Fact Constellation Schema |
Summary:
Star schema is the simplest schema and fact constellation schema is the complicated schema. Star schema is most efficient in query performance but needs more storage capacity. Snowflake schema is poor in query performance but less in data redundancy. Every schema has good and bad but the choice will be made on business domain and the requirements.
Published: 09.07.2017
Different Ways Of Generating Data For Date Dimension Table
Time is the most precious thing in the world. In business
domain, time is money and everything. When we are doing data warehouse, we need
date dimension or using it to track business trends, analyze data and figure out
changes in the business pattern over the time. There are many ways and methods for generating
data to populate date dimension. Generate date with SQL Statement and generate
date in Excel and migrate it to database table are two methods from that many.
GENERATE DATE WITH SQL STATEMENT
We can get the data what we need for populating date
dimension through a SQL Statement as follow:
01. Create a table with CREATE statement.
01. Create a table with CREATE statement.
CREATE TABLE Table_Name (
[DateID] int NOT NULL IDENTITY(1, 1),
[Date] datetime NOT NULL,
[Year] int NOT NULL,
[Month] int NOT NULL,
[Day] int NOT NULL,
[Quarter] int NOT NULL,
CONSTRAINT Constraint_Name PRIMARY KEY CLUSTERED (DateID)
)
02. Generate data and insert it into the table
DECLARE @StartPoint datetime
DECLARE @EndPoint datetime
/*
*assign values of the first and the last day of the range
*into the StartPoint and the EndPoint respectively
*/
*assign values of the first and the last day of the range
*into the StartPoint and the EndPoint respectively
*/
SET @StartPoint = '01/01/2017'
SET @EndPoint = '12/31/2017'
/*
*using a while loop increment from the first date
*to the last date
*/
DECLARE @LoopRunner datetime
SET @LoopRunner = @StartPoint
WHILE @LoopRunner <= @EndPoint
BEGIN
-- add each record one by one into the table for this date
INSERT INTO Table_Name VALUES (
@LoopRunner,
Year(@LoopRunner),
Month(@LoopRunner),
Day(@LoopRunner),
CASE
/*
*using a while loop increment from the first date
*to the last date
*/
DECLARE @LoopRunner datetime
SET @LoopRunner = @StartPoint
WHILE @LoopRunner <= @EndPoint
BEGIN
-- add each record one by one into the table for this date
INSERT INTO Table_Name VALUES (
@LoopRunner,
Year(@LoopRunner),
Month(@LoopRunner),
Day(@LoopRunner),
CASE
WHEN Month(@LoopRunner) IN (1, 2, 3) THEN 1
WHEN Month(@LoopRunner) IN (4, 5, 6) THEN 2
WHEN Month(@LoopRunner) IN (7, 8, 9) THEN 3
WHEN Month(@LoopRunner) IN (10, 11, 12) THEN 4
WHEN Month(@LoopRunner) IN (4, 5, 6) THEN 2
WHEN Month(@LoopRunner) IN (7, 8, 9) THEN 3
WHEN Month(@LoopRunner) IN (10, 11, 12) THEN 4
END
)
/*
*increment the LoopRunner by 1 to get the next date
*then start the loop again
*/
SET @LoopRunner = DateAdd(d, 1, @LoopRunner)
END
)
/*
*increment the LoopRunner by 1 to get the next date
*then start the loop again
*/
SET @LoopRunner = DateAdd(d, 1, @LoopRunner)
END
GENERATE DATE THROUGH EXCEL AND TRANSFER IT INTO DATABASE TABLE
We create a table in excel with the records of the date
dimension table and then create a table in database using a create statement.
After that import the excel records into the table through import wizard.
Summary:
There are many ways out there to generate data for populate
the date dimension. Above we have discussed two ways, one is using SQL
Statement and another one is using excel to generate data and migrate it into
the database table through import wizard. SQL Statement is hard for the people
who are not in technology domain and Excel method will be easy for them. But in
SQL Statement method we can do everything straight away in the database. So
there will not be any compatible error or other exceptions.
Published: 17.07.2017
No comments:
Post a Comment