Many software developers are intimidated by working with backend code, most explicitly working with databases. Understanding how information flows is a crucial part of software development. For this week’s topic, the discussion will focus on the concept of data modeling. They are specifically understanding first, second, and third normal forms. The question answered is why the third normal form is an excellent basis for modeling the real world and application development. Vahid states, “relational database stores information in tables composed of rows and columns.” The concept of an entity relationship diagram (ERD) captures how this information is stored. From Jajodia, we learn, "An entity is an object of interest which can be distinctly identified; a specific person or place or event is an example of an entity.” Entities will be things like a person, a car, a product order, or items sold on the order. Relationships are used to establish the number and manner of connections between entities. For example, an order will have one customer and one or more order items. The purpose of this modeling is to reduce redundancy. For example, when a customer’s phone number changes, it is preferable not to have to change it in multiple spots. Link states, “Data redundancy can speed up the evaluation of queries but is typically the source of inconsistency and processing difficulties with update operations.” Understanding this tradeoff greatly helps in communicating software systems and architecture.

A description of normal forms follows. Each is additive, as the rules of the second normal form (2NF) assume all the criteria of 1NF are met. 
    

  • 0NF: Not normalized. Data contains repeating attributes.  Home addresses, Street addresses, and Shipping addresses are in the same column.
  • 1NF: No repeating Groups. A customer and a contact are separate tables. Customers may have multiple contact IDs relating to the Contact table.
  • 2NF: Eliminate redundant data: Each table has its own key columns. The idea of relationship tables is introduced. Customers no longer have the contact IDs as part of their records. Instead, we added a customer-contact table to establish this relationship.
  • 3NF: Eliminate Transitive Dependency: No properties that are not related to the entity. We would not keep an account manager's address id on the customer record. Instead, we would add the account manager ID to the customer, followed by a new AccountManager table that could have additional dependencies to an address table. 

    Understanding data modeling is a key concept in understanding modern software design. Once a data model is established and entity relationships are defined, the shape and functionality of the software become apparent. The proper structure of data decreases the amount of storage needed as data is not repeated while increasing the speed of systems as it becomes possible to request just the data needed for a process. This reduces hardware costs and allows the tuning of database and web operations in the web to use a subscription level that is appropriate for service utilization. Efficient database queries go a long way in reducing the overall costs and times of this processing. 


References

Jajodia, S., Ng, P. A., & Springsteel, F. N. (1983). The problem of equivalence for Entity-Relationship Diagrams. IEEE Transactions on Software Engineering, Software Engineering, IEEE Transactions on, IIEEE Trans. Software Eng, SE-9(5), 617–630. https://doi.org/10.1109/TSE.1983.235262

Link, S. (2021). Embedded Functional Dependencies and Data-completeness Tailored Database Design. ACM Transactions on Database Systems, 46(2), 87-132. https://10.1145/3450518

Vahid, F., Lysecky, S., Wheatland, N., & Siu, R. (2019). TEC 101: Fundamentals of Information Technology & Literacy | zyBooks

Comments