From Data to Objects
Article Index
From Data to Objects
Records

Banner

You could say that first you learn programming and then you graduate to “data-ing”. Personally I think that Niklaus Wirth said it better in the title of his classic text on computer science (now scandalously out of print!)

“Algorithms+ Data structures =Programs”

Here I am going to deal with two fundamental  data structures- the array and the record. The array leads on to other more interesting data structures - mainly based on the associative array or dictionary. The record on the other hand leads on to one of the nicest ideas in programming - the object.

Array

The most basic of data abstractions is the variable. It's an area of memory that you can store one item of data in. Of course exactly what you mean by "one item" of data varies according to what you are doing. An integer is clearly one item of data but a string of characters? The key idea is that for the current purpose a variable stores a chunk of data that has no structure of its own i.e. it is what is stored in the variable.

Variables are all very well but you quickly discover that they aren’t enough. For example if you want to store a table of numbers then you really do need something more than a single variable. The most common way of storing tables of data is to use an “array”.

This is an “indexed” collection of variables and the meaning of “indexed” will become apparent in a moment. For example, in Basic you might write:

 DIM A(10)

This creates an array with ten “elements” - A(1), A(2), A(3) and so on to A(10). Each element is a variable in its own right and you can store a single item of data in it. For example:

 A(3)=15

stores 15 in A(3).

 

fig1

An array of variables

Why bother with an array? Why not just use 10 distinct variables? The answer is this index business. The power of an array comes from being able to pick out which element of the array you are interested in using by way of a variable. For example:

I=3
A(I)=15

again stores 15 in element 3 of the array.

In this case I is called the “index” and using this sort of idea you can write programs that lookup information in arrays such as “give me the 56th element” or process the entire array by changing the index as part of the working of the program. Notice that the key idea is the ability to select which variable you are referring to at run time. That is you can write A1 or even A(1) and this will always refer to the same variable but A(I) is anything you like depending on the value that I has at the time. Arrays bring dynamic to data.

<ASIN: 0130224189>

<ASIN:0672324539>

<ASIN:0201000237>

Getting inside

At this point many accounts of data start to explain exactly how data is stored in a machine. This is important because how it works affects how quickly it works and how much memory you need to implement it. For example an AGE array designed to store 50 million ages would need something like 100Mbytes of storage to store a 2-digit age for everyone. (Roughly speaking you need 1 byte per digit.)

Now while this sort of information is essential to building practical working systems it could be argued that efficiency is something for the hardware and compiler people to sort out.

This is a philosophical turn of mind – are we being pure or practical?

For example, once you have invented the simple array you might well go on to invent something that you deem to be a small extension but most programmers have been taught to consider as something completely different.

If you can use an index like 3 or 7 or 8 why not use an index like 3.5 or 6.7 or “john” or anything really. Why not use an array that will store peoples ages so that they can be looked up by name e.g. AGE(“john doe”)?

The answer is that you can but the difficulty of implementing such an array causes programmers to think of it as something different. Such an array is usually called a “collection”, a “dictionary” or an “associative array”.

The basic idea is the same as an array but there are some important differences that might just be important enough to merit the change of name. The first obvious difference is that you can’t “step through” an associative array.

For example, to see all of A(I) then you simply set I to 1, 2, 3 and so on to 10. To see all of AGE(name) you haven’t a clue as to how to vary name in an obvious or natural sequence to go though everyone. There also isn’t an obvious first or last element in AGE and there isn’t even an obvious “next” or “previous” element in general. These are the real reasons for considering an associative array to be something different and new – not the difficulty of actually implementing it!

So how would you implement an associative array? Well the simplest way would be to use two standard arrays – store the name in one and the age in the other at the same index value you stored the name.

Now when you want to look up an age you have to scan through the NAME array until you find the name, stored at index I say and then you lookup the age stored in the age array as AGE(I).

You can see that this isn’t very efficient as soon as you consider searching through 60 million names in random order. You can improve the search time by keeping the array of names in alphabetical order but now you are really getting complicated because you have to sort the arrays and keep them in order if a new name has to be added or deleted! It is easy to see that this gets very complicated very quickly and hence most programming languages don’t offer associative arrays as a fundamental language facility. Python and Perl are exceptions.

fig2

An associative array – look up in one array, return the answer from the second

<ASIN:0521880378>
<ASIN:0072253592>

<ASIN:0131997467>



 
 

   
Banner
RSS feed of all content
I Programmer - full contents
Copyright © 2012 i-programmer.info. All Rights Reserved.
Joomla! is Free Software released under the GNU/GPL License.