Algorithmic Design And Data Structures
In computing, we want to solve problems quickly. Put information in a database, and you replace the old routine of walking to a file cabinet, digging out the folder, and hunting for the relevant detail on some piece of paper inside it. A computer can find that information far faster than a person rifling through the files, but only if the information was captured in the database in the first place.
When we discuss algorithmic design, we look for a sufficiently efficient means of accomplishing the goal. If I need information from 10 files, sending Aunt Edna to the records storage center is a reasonable solution, and we can expect her to find an answer in an hour or two. But this assumes the records are stored in a way that allows her to quickly find them. If the files are stored alphabetically, she can narrow the search to just the cabinets with records for that name. If the records are stored by social security number, she can reference the master index and go directly to each record she needs.
It seems obvious that the more records we ask Edna to fetch, the longer we should expect to wait. Algorithmic design weighs exactly these concerns: time complexity (how long can we afford to wait?) and space complexity (how much data must we hold and handle to arrive at the answer?).
When we ask Edna to look up a record by social security number, she checks the master index and goes directly to that file. In algorithmic analysis, we use Big-O notation, written O(), to describe the relative cost of possible solutions. Because Edna needs only a single index lookup no matter how many records exist, we call this O(1): the work stays constant as the collection grows. By storing the records under a unique key ahead of time, we have made finding them far more efficient.
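A minimal sketch of this idea in Python, assuming the records have already been captured in a dictionary keyed by social security number (the keys and fields here are hypothetical):

```python
# A dictionary acts like Edna's master index: the key takes us straight
# to the record, so a lookup costs about the same no matter how many
# records we store -- O(1) on average.
records_by_ssn = {
    "123-45-6789": {"name": "Edna", "folder": "Cabinet 3, Drawer 2"},
    "987-65-4321": {"name": "Alvin", "folder": "Cabinet 1, Drawer 4"},
}

# One direct step, whether there are ten records or ten million.
print(records_by_ssn["123-45-6789"]["folder"])
```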
Contrast this with asking Edna to find a record in an unsorted pile of boxes dumped in the basement. She has to look through box after box, record after record, until she finds the one she wants; in the worst case she examines every record. We call this O(n), where n is the number of records.
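For contrast, a sketch of the unsorted-boxes situation, again with hypothetical records:

```python
# The boxes in the basement: no order, no index. The only option is to
# check every record until we stumble on the right one -- O(n) in the
# worst case.
boxes = [
    {"ssn": "555-12-0001", "name": "Mabel"},
    {"ssn": "555-12-0002", "name": "Harold"},
    {"ssn": "123-45-6789", "name": "Edna"},
]

def find_record(target_ssn):
    for record in boxes:  # potentially every record is examined
        if record["ssn"] == target_ssn:
            return record
    return None

print(find_record("123-45-6789"))
```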
It is more common for files to be stored alphabetically by last name, then first name. Now Edna can ignore every cabinet that cannot contain the last name she is after. When each step of a search lets us discard a large share of the remaining records, ideally half of them, as in a binary search over sorted data, the work grows only as O(log n).
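One way to see that O(log n) behavior in code is a binary search over records sorted by last name; this sketch uses Python's bisect module and made-up names:

```python
import bisect

# Records pre-sorted by last name, like alphabetized file cabinets.
last_names = ["Adams", "Baker", "Chen", "Diaz", "Evans", "Flores", "Garcia"]

def contains(name):
    # Each comparison discards half of the remaining names, so even a
    # large list needs only about log2(n) comparisons -- O(log n).
    i = bisect.bisect_left(last_names, name)
    return i < len(last_names) and last_names[i] == name

print(contains("Diaz"))   # True
print(contains("Smith"))  # False
```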
If Edna works in an office with 10 records total, the choice of method above does not really matter. She can handle 10 records in 10 boxes in a reasonable amount of time. But what happens when we add more records? Edna will get frustrated if we just dump boxes into the basement. At 100 records, she will ask for a raise. At 1000 records, she will be rethinking her life choices, and at 10000 records, she will quit. An O(n) search through 10000 records takes Edna a very long time, and that is before we account for her boss's impatience or the customers left waiting.
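To put rough numbers on that frustration, here is a small sketch comparing how many records each approach examines in the worst case as the collection grows (the counts are idealized, not measured):

```python
import math

# Worst-case records examined by each strategy at several collection sizes.
for n in (10, 100, 1_000, 10_000):
    direct = 1                          # O(1): master index lookup
    halving = math.ceil(math.log2(n))   # O(log n): alphabetized cabinets
    everything = n                      # O(n): boxes in the basement
    print(f"n={n:>6}: O(1)={direct:>2}  O(log n)={halving:>3}  O(n)={everything:>6}")
```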
Choosing how to organize information is part of simplifying the problem. If we sort the records as we store them, retrieving them becomes simpler. We evaluate candidate algorithms by the number of steps they take to reach the desired result. If Edna has to open each file, look for the appropriate document in the file, and then find the appropriate row, the total number of steps grows with both the number of files and the number of documents inside each one (see the sketch below).
*A complex mess!*
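A sketch of that nested search, with hypothetical files and rows, shows why the total work grows with both dimensions:

```python
# Every file must be opened and every row checked: the steps multiply,
# so the total work grows roughly as (files x rows per file).
files = {
    "smith_family.txt": ["row 1: ...", "row 2: invoice 1042", "row 3: ..."],
    "jones_family.txt": ["row 1: ...", "row 2: ..."],
}

def find_row(needle):
    steps = 0
    for name, rows in files.items():  # open each file
        for row in rows:              # scan each row in the file
            steps += 1
            if needle in row:
                return name, row, steps
    return None, None, steps

print(find_row("invoice 1042"))
```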
Knowing how many records we are working with and how many operations each approach needs lets us make informed tradeoffs when selecting data structures and algorithms. Simple, inefficient algorithms are good enough when the information set is small; those same steps become painful as the number of records grows.

The choice of data structure can significantly affect the speed of our operations. If our information has a unique natural key (a key that identifies exactly one object), we can place the records in a hash set, which gives direct access to a record through its key. That provides an O(1) lookup, our best scenario. More complex lookups require an index to narrow the set of records to compare against, giving an O(log n) solution. The worst case is unordered, complex information that must be parsed over and over, an O(n^2) situation. Evaluating the work this way lets us make informed decisions before throwing the computer at the problem, get meaningful answers, and avoid the computer (or Aunt Edna) throwing in the towel.
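As a final sketch, a Python set gives the hash-set behavior described above: membership by a unique natural key in O(1) on average (the keys are hypothetical):

```python
# A hash set of natural keys (here, SSNs) answers "do we already have
# this record?" in O(1) on average, versus scanning a list in O(n).
known_ssns = {"123-45-6789", "987-65-4321", "555-12-0001"}

print("123-45-6789" in known_ssns)  # True, one hashed lookup
print("000-00-0000" in known_ssns)  # False, still one hashed lookup
```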