Data modeling has become mission-critical for both traditional data-management contexts and emerging information-worker contexts. It is in the midst of a transition from a historically specialized and often arcane practice to a mainstream technique with significant value for database specialists, information workers, and anyone charged with ensuring regulatory compliance. Although database specialists will remain voracious consumers of data models, others can and must begin to leverage data models and data modeling. These new beneficiaries include technologists other than database specialists, as well as every member of an enterprise who interacts with enterprise data.

Definitions
As the reach of data modeling has grown, so has its vocabulary. Even the term “data model” has multiple definitions. To a certain extent, this is appropriate: the different definitions distinguish among the various types of data modeling and data models.

The Basics
In general, a data model is a set of assertions about the kinds of information that an enterprise considers worth remembering. At a minimum, a data model must include:
Named categories of data: For example, a data model for a hospital could include a category called patient and a category called disease.
Named traits that describe the categories of data: For example, the patient category might include a trait accommodating the patient’s date of birth.
Relationships that indicate how members of one category can be associated with members of other categories: For example, the patient category can be related to the disease category (because individual patients can have diseases, and individual diseases can afflict patients).
Ways to distinguish any member of a category from all other members of that category: For example, the patient category can include an arbitrary identifier known as patient ID.

In Figure 1, the data model shows two categories of data: Customer and Dialogue. In most modeling notations, such categories are known as entities. (Note: Different tool vendors and notation designers use terminology that differs, sometimes slightly and sometimes significantly. “The Details” section of this overview highlights those conflicting vocabularies.)

The data model also shows eight traits that describe the categories. These traits are typically called attributes. The attributes are: customer ID, customer name, industry, address, subscription renewal date, customer dialogue number, dialogue date, and dialogue topic. The first five describe customers; the last three describe dialogues.

In addition, the data model shows how customers can be associated with dialogues. Such associations are typically known as relationships. In the diagram, the relationship appears as a line connecting the two boxes. The line touches both boxes because the relationship contains two assertions, one about the dialogue entity and one about the customer entity. Each of these assertions is called a link. One link indicates that each dialogue must have exactly one customer. The other link indicates that each customer can have many dialogues.

For each entity, the data model shows how instances of that entity are distinguished from each other; that is, it shows an identifier for each entity. The identifier of customer consists of the attribute customer ID, indicated with a horizontal bar under the attribute. The identifier of dialogue is the combination of customer and customer dialogue number. The diagram asserts this identifier with two bars: a horizontal one under the attribute customer dialogue number and a vertical one crossing the relationship line (near the dialogue entity).

Reading a Data Model
A data model is a set of assertions, and these assertions can be expressed rigorously in English.
For example, the sample model shown in Figure 1 contains exactly these assertions (and no others):
We can remember customers. About each customer, we can remember its customer ID, customer name, industry, address, subscription renewal date, and dialogues. Every customer must have a customer ID, which distinguishes that customer from any other customer we might choose to remember.
We can remember dialogues. About each dialogue, we can remember its customer dialogue number, its dialogue date, its dialogue topic, and its customer. Each dialogue must have a customer and a customer dialogue number; we distinguish any dialogue from all other dialogues with the combination of its customer and its customer dialogue number. Two dialogues could have the same customer or the same customer dialogue number, but no two dialogues can have the same customer and the same customer dialogue number.
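The same assertions can also be read as rules that software enforces. The following is a minimal Python sketch, assuming an in-memory store; the class and method names (Customer, Dialogue, ModelStore, add_customer, add_dialogue, dialogues_of) and the attribute types are hypothetical choices for illustration, not part of the Figure 1 model. Each identifier becomes a dictionary key, so a second customer with the same customer ID, or a second dialogue with the same customer and customer dialogue number, is rejected on insertion.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Customer:
    # Identifier: customer_id alone distinguishes a customer from all others.
    # Attribute types are illustrative assumptions.
    customer_id: str
    customer_name: str
    industry: str
    address: str
    subscription_renewal_date: date


@dataclass(frozen=True)
class Dialogue:
    # Identifier: the combination (customer_id, customer_dialogue_number).
    # customer_id also carries the link "each dialogue must have one customer".
    customer_id: str
    customer_dialogue_number: int
    dialogue_date: date
    dialogue_topic: str


class ModelStore:
    """Hypothetical in-memory store that enforces the Figure 1 assertions."""

    def __init__(self) -> None:
        self.customers: dict[str, Customer] = {}
        self.dialogues: dict[tuple[str, int], Dialogue] = {}

    def add_customer(self, c: Customer) -> None:
        # No two customers may share a customer ID.
        if c.customer_id in self.customers:
            raise ValueError(f"duplicate customer ID {c.customer_id!r}")
        self.customers[c.customer_id] = c

    def add_dialogue(self, d: Dialogue) -> None:
        # A dialogue must belong to a customer we remember.
        if d.customer_id not in self.customers:
            raise ValueError(f"unknown customer {d.customer_id!r}")
        # No two dialogues may share both customer and dialogue number.
        key = (d.customer_id, d.customer_dialogue_number)
        if key in self.dialogues:
            raise ValueError(f"duplicate dialogue identifier {key!r}")
        self.dialogues[key] = d

    def dialogues_of(self, customer_id: str) -> list[Dialogue]:
        # The "many" side of the link: one customer can have many dialogues.
        return [d for (cid, _), d in self.dialogues.items() if cid == customer_id]


# Usage: the composite identifier rule in action.
store = ModelStore()
store.add_customer(Customer("C1", "Acme", "Retail", "1 Main St", date(2025, 1, 1)))
store.add_customer(Customer("C2", "Globex", "Energy", "2 Elm St", date(2025, 6, 1)))
store.add_dialogue(Dialogue("C1", 1, date(2024, 3, 5), "Renewal"))
store.add_dialogue(Dialogue("C1", 2, date(2024, 4, 9), "Billing"))     # same customer, new number
store.add_dialogue(Dialogue("C2", 1, date(2024, 4, 9), "Onboarding"))  # same number, new customer
try:
    # Same customer AND same dialogue number: violates the identifier.
    store.add_dialogue(Dialogue("C1", 1, date(2024, 5, 1), "Support"))
except ValueError as err:
    print("rejected:", err)
```

Keying dialogues on the pair (customer ID, customer dialogue number) is one way to express the composite identifier; a relational schema would typically express the same rule with a composite primary key on the dialogue table plus a foreign key to the customer table.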