I gotta have my orange juice.

Jesu, Juva

Archive for October 2004

Introduction to Pointers in C++

leave a comment »

Originally written in February 1998. At least one person seems to have found it useful. It’s a little incomplete, because it doesn’t deal with malloc and free, nor with C++ references. If you find this useful, feel free to copy and pass along, with attribution. Thanks!

Basics

All data and code are stored in memory. The location in memory where they are stored is known as the address of that data or code. Usually they are accessed through variable names that represent them, such as counter, printf, etc. We can, however, also access data using its address, rather than a formal name. This is done using pointers, special variables which store the address of data. Following are several annotated examples of simple pointers at work.

int*   x;                             // Declare x, a pointer to an integer.
int    y;                             // Declare y, an integer.
float* r;                             // Declare r, a pointer to a float.
float  s;                             // Declare s, a float.

x = &y;                               // x gets y's address -- it points to y.
r = &s;                               // r gets s's address -- it points to s.

The next few are a tad trickier. We use the “*” to dereference the pointer. Basically, this means to access whatever it is the pointer is pointing to. You can think of it as counteracting the “*” used in the declaration of the pointer; they neutralize each other, making the result a regular variable.

*x = 15;                              // Set value pointed to by x -- y -- to 15.
cout << *r;                           // Print value pointed to by r: s.

Complications

If it were that simple, of course, then nobody would have trouble with pointers. The fact of the matter is, however, that there are a number of complications — extensions to the idea of pointer — that can be hard to keep track of.

Pointers as Lists

The first is the idea of pointers being equivalent to lists. This is a crucial idea in C and C++. Essentially, instead of thinking of a pointer as pointing to a single variable, you can think of it as pointing to the first variable in a list of variables. Likewise, a list can be accessed without any subscripts to find the pointer to the first element in the list. It works like this:

int* x;
int  y[15];
 
x = new int[8];                       // Allocate array of 8 integers.

*y   = 8;                             // Set first element of y-list to 8.
x[3] = 7;                             // Set fourth element of x-list to 7.

Note how pointer notation can be used for the list y, and how list notation can be used for the pointer x.

This brings up a similar topic: pointer arithmetic. Since a pointer is a memory address, you might think that adding 1 to a pointer would simply make it point to the next byte of memory. The C compiler, however, is smarter than that; it realizes that you’re adding something to a pointer, you probably want to make it point to the next element of whatever you’re pointing at. So instead of adding whatever you specified to the pointer, it adds that times the size of the object the pointer points to. For example:

int* x = new int[8];

x++;                                  // Add four to x pointer, to
                                      //   point at next integer.
x[0] = 5;                             // This was originally the second
                                      //   element in the array.
x--;                                  // Subtract 4 again from pointer.
*(x + 2) = 6;                         // Set third element (second after the
                                      //   first) to 6.

cout << x[1];                         // Will now print "5".
cout << x[2];                         // Prints "6".

Pointers to Pointers (aka “The Middleman”)

Another pointer curiosity that C throws our way is pointers that point to other pointers. This may seem like a needless feature, but it comes in very handy when you have multidimensional data whose size you don’t know before-hand. You can then use these pointers to pointers to set up an arbitrary-sized multidimensional array.

It works like this: you can think of a pointer to a pointer as being essentially a list of lists. It’s kind of like the words in the dictionary: the first pointer tells you where to find each of the lists for the letters of the alphabet. Each letter is then itself a pointer that forms a list (by pointing to the first element) of all of the words beginning with that letter. If you add another dimension (and make a pointer to a pointer to a pointer), you can have each word also be a list, pointing to the first out of several different meanings for the word.

Here’s how it works in C++. The following program reads in a table of numbers and finds the average for each row and column. The first two numbers it reads in tell how many rows and columns there are. The rest of the numbers are the ones in the table.

#include <iostream.h>

void main(void)
{
  int** table;                        // *Two*-dimensional pointer.
  int   rows, cols, i, j, sum;        // Dimensions of table.

  cin >> rows >> cols;                // Find out the number of rows & cols.

// What we'd really like to do here is say:
//   table = new int[rows][cols];
// Unfortunately, this doesn't work in C++, since it instead tries to set up
// a one-dimensional array like this:
//   table = new int[rows * cols];
// And a one-dimensional array (a pointer to integers) is absolutely
// incompatible with a two-dimensional array (a pointer to a pointer to
// integers), so our program will crash.  Note that the syntax above will
// work, however, in Java.

  table = new (int*)[rows];           // Allocate the rows.
  for(i = 0; i < rows; i++)
    table[i] = new int[cols];         // Allocate each row's columns.

  for(i = 0; i < rows; i++)           // Find row sums.
  {
    sum = 0;
    for(j = 0; j < cols; j++)
      sum += table[i][j];
    cout << "Row sum for row " << i << ": " << sum << endl;
  }

  for(j = 0; j < cols; j++)           // Find column sums.
  {
    sum = 0;
    for(i = 0; i < rows; i++)
      sum += table[i][j];
    cout << "Col sum for col " << j << ": " << sum << endl;
  }
}

Notice how we had to explicitly allocate both dimensions of the array, starting with the first dimension, the rows; and then for each row we allocated its columns. You may be wondering why the outer dimension is allocated using “new (int*)[n]”, while the inner dimension is allocated using “new int[n]”. This is because each dimension but the last is a pointer to the next dimension. The final array dimension is obviously simply a list of integers, so in the inner loop we merely allocate a list of integers. The next dimension up, however, is not a list of integers; it’s a list of lists of integers. As such, each entry in this list will itself be a pointer to a list of integers — the final dimension. Therefore, the code allocating the outside dimension must allocate a list of pointers to integers.

If we had additional dimensions, for each column in each row we would then have to allocate the list of cells, and so forth. Here’s how a three-dimensional array allocation might look like. Assume that x, y, and z are the size of each array dimension. Notice how the outer dimension is now a list of (int**)’s — a list of lists of lists — and the second dimension is now the one that is (int*) — a list of lists — while the final dimension is still a list of integers.

int    i, j;
int*** array3d;
 
array3d = new (int**)[x];
for(i = 0; i < x; i++)
{
  array3d[i] = new(int*)[y];
  for(j < 0; j < y; j++)
    array3d[i][j] = new int[z];
}

Written by Scott Moonen

October 2, 2004 at 12:44 pm

Posted in C/C++

Tagged with , , , ,

Python persistence

leave a comment »

I’ve implemented a rudimentary persistent object store in Python. It is implemented as a Python extension module that implements a set of persistent types: strings, integers, floats, lists, and dictionaries. Each of these are backed by the persistent object store, which is implemented using a memory-mapped file.

In addition, using a specially crafted Python base class for Python objects, Python objects may be stored in the object store (as dictionaries) and instantiated out of the object store.

The result is an persistent object graph (the root of which is a persistent dictionary) whose objects and attributes may be manipulated in-place using native Python syntax. Rudimentary locking is provided so that multiple Python threads / processes may concurrently manipulate the object store.

Details

Some aspects of this system are:

  • It is a Python extension module written in C and C++.
  • It is tested on Linux. It will likely work on *BSD systems, though it is possible that the location of the mapped storage may need to be moved.
  • It is implemented in a hierarchical manner:
    • A page manager handles the allocation of 4kB pages within a memory-mapped file. It is multi-process safe. It is, in a sense, a glorified sbrk() for a memory-mapped file.
    • A heap manager abstracts the page manager’s services to manage the allocation and deallocation of arbitrary-sized storage segments within the memory-mapped file. It is essentially a malloc() and free() for a memory-mapped file. This is also multi-process safe.
    • An object manager manages five new base types (persistent int, float, string, list, and dictionary) backed by persistent storage, using the heap manager’s services. It also provides rudimentary locking facilities for concurrency-safeness.
    • The persist Python extension uses the object manager’s services to implement persistent types that mimic the equivalent Python types. Additionally, it has the ability to reinstantiate a Python object that was stored as a dictionary (using the appropriate Python base class). The object manager’s locking facilities are made available for application usage.
  • Only one file may be mapped at a time (because it is mapped to a fixed logical address).
  • It is available for use under the MIT license. Contact me if you are interested in using it.

Examples

Some examples of its use are a simple counter:

import persist

root = persist.root()
root.lockExcl()
try :    root['x'] += 1                # Increment counter
except : root['x'] = 1                 # First pass; initialize
print "Content-type: text/html\n"
print "<p>You are visitor " + str(root['x']) + " to visit this site!</p>"
root.unlock()

and rudimentary objects:

import persist
from pbase import pbase

class person (pbase) :
  def __init__(self, name = "", age = 0) :
    pbase.__init__(self)
    self.name = name
    self.age = age

  def printAge(self) :
    print "<p>" + self.name + " is " + str(self.age) + " years old</p>"

root = persist.root()
root.lockExcl()

if not root.has_key('Joe') :            # First time through
  root['Joe'] = person('Joe', 27)
if not root.has_key('John') :           # First time through
  root['John'] = person("John", 29)

# On subsequent passes we will retrieve the objects stored on the first pass.

print "Content-type: text/html\n"
root['Joe'].printAge()
root['John'].printAge()

root.unlock()

Written by Scott Moonen

October 1, 2004 at 11:07 am