Common SQL interview questions.
Q1. What is normalization? Explain different levels of normalization?
Normalization is the process of efficiently organizing data by eliminating redundant data and maintaining dependencies.
Summary: First Normal Form
- A relation is in 1NF if it contains no repeating groups
- To convert an unnormalised relation to 1NF either:
- Flatten the table and change the primary key, or
- Decompose the relation into smaller relations, one for the repeating groups and one for the non-repeating groups.
- Remember to put the primary key from the original relation into both new relations.
- This option is liable to give the best results.
Summary: Second Normal Form
- A relation is in 2NF if it contains no repeating groups and no partial key functional dependencies
- Rule: A relation in 1NF with a single key field must be in 2NF
- To convert a relation with partial functional dependencies to 2NF. create a set of new relations:
- One relation for the attributes that are fully dependent upon the key.
- One relation for each part of the key that has partially dependent attributes
Summary: Third Normal Form
- A relation is in 3NF if it contains no repeating groups, no partial functional dependencies, and no transitive functional dependencies
- To convert a relation with transitive functional dependencies to 3NF, remove the attributes involved in the transitive dependency and put them in a new relation
- Rule: A relation in 2NF with only one non-key attribute must be in 3NF
- In a normalised relation a non-key field must provide a fact about the key, the whole key and nothing but the key.
- Relations in 3NF are sufficient for most practical database design problems. However, 3NF does not guarantee that all anomalies have been removed. See example
Q2. What is denormalization and when would you go for it?
As the name indicates, denormalization is the reverse process of normalization. It’s the controlled introduction of redundancy in to the database design. It helps improve the query performance as the number of joins could be reduced. Generally done in the data warehouse systems.
Q3. What is a Primary Key?
A primary key is a column or combination of columns which uniquely identifies a specific a row in a table. Primary key does not allow NULL values.
Q4. What is Unique Key?
A unique key is a column or combination of columns which uniquely identifies a specific a row in a table. Unique key allows one NULL value.
Q5. What is a Surrogate Key?
A surrogate key is used as a substitution for the natural primary key in situations where the natural primary key values change over time. Generally surrogate key is a number(sequence) or a unique identifier. Commonly used in Data warehouse dimension tables.
Q6. What’s the difference between a primary key and a unique key?
Both primary key and unique enforce uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn’t allow NULLs, but unique key allows one NULL only.
Q7. How do you implement one-to-one, one-to-many and many-to-many relationships while designing tables?
- One-to-One relationship can be implemented as a single table and rarely as two tables with primary and foreign key relationships.
- One-to-Many relationships are implemented by splitting the data into two tables with primary key and foreign key relationships.
- Many-to-Many relationships are implemented using a junction table with the keys from both the tables forming the composite primary key of the junction table.
Q8. Define candidate key, alternate key and composite key.
- A candidate key is one that can identify each row of a table uniquely.
- Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.
- A key formed by combining at least two or more columns is called composite key.
Q9. What is a database index?
- A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.
- Indexes can be created using one or more columns of a database table, providing the basis for both rapid random look ups and efficient access of ordered records.
- The disk space required to store the index is typically less than that required by the table (since indexes usually contain only the key-fields according to which the table is to be arranged, and exclude all the other details in the table), yielding the possibility to store indexes in memory for a table whose data is too large to store in memory.
- In a relational database, an index is a copy of one part of a table. Some databases extend the power of indexing by allowing indexes to be created on functions or expressions.
- Indexes may be defined as unique or non-unique. A unique index acts as a constraint on the table by preventing duplicate entries in the index and thus the backing table.
Q10. Explain different types of index architectures?
Index architecture generally classified into Clustered indexes and Non-Clustered indexes.
Clustered Index: In a clustered index, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Basically the data is re-ordered and stored everytime when new records are added.
Non-clustered Index: The non-clustered index tree contains the index keys in sorted order, with the leaf level of the index containing the pointer to the page and the row number in the data page. Typically created on column used in JOIN, WHERE, and ORDER BY clauses.