Saturday, January 16, 2010

Array Data Structure

An array is a systematic arrangement of objects, usually in rows and columns. Specifically, it may refer to several things.



In Computer Science

Generally, a collection of data items that can be selected by indices computed at run-time, including:
  • Array data structure, an arrangement of items at equally spaced addresses in computer memory
  • Array data type, used in a programming language to specify a variable that can be indexed
  • Associative array, an abstract data structure model that generalizes arrays to arbitrary indices
or various kinds of the above, such as
or various related concepts:
or also:
  • Video Graphics Array (VGA), a display adapter and video format, and many variants thereof (EVGA, FWVGA, QVGA, QXGA, SVGA, SXGA, SXGA+, TXGA, UVGA, XGA, XGA+, ...)

Array Data Structure

In computer science, an array data structure or simply array is a data structure consisting of a collection of elements (values or variables), each identified by one or more integer indices, stored so that the address of each element can be computed from its index tuple by a simple mathematical formula. For example, an array of 10 integer variables, with indices 0 through 9, may be stored as 10 words at memory addresses 2000, 2004, 2008, … 2036; so that the element with index i has address 2000 + 4 × i.


Array structures are the computer analog of the mathematical concepts of vector, matrix, and tensor. Indeed, an array with one or two indices is often called a vector or matrix structure, respectively. Arrays are often used to implement tables, especially lookup tables; so the word table is sometimes used as synonym of array.

Arrays are among the oldest and most important data structures, and are used by almost every program and are used to implement many other data structures, such as lists and strings. They effectively exploit the addressing machinery of computers; indeed, in most modern computers (and many external storage devices), the memory is a one-dimensional array of words, whose indices are their addresses. Processors, especially vector processors, are often optimized for array operations.

Arrays are useful mostly because the element indices can be computed at run time. Among other things, this feature allows a single iterative statement to process arbitrarily many elements of an array. For that reason, the elements of an array data structure are required to have the same size and should use the same data representation. The set of valid index tuples and the addresses of the elements (and hence the element addressing formula) are usually fixed while the array is in use.

The terms array and array structure are often used to mean array data type, a kind of data type provided by most high-level programming languages that consists of a collection of values or variables that can be selected by one or more indices computed at run-time. Array types are often implemented by array structures; however, in some languages they may be implemented by hash tables, linked lists, search trees, or other data structures.

The terms are also used, especially in the description of algorithms, to mean associative array or "abstract array", a theoretical computer science model (an abstract data type or ADT) intended to capture the essential properties of arrays.

Array types in C are traditionally of a fixed, static size specified at compile time. (The more recent C99 standard also allows a form of variable-length arrays.) However, it is also possible to allocate a block of memory (of arbitrary size) at run-time, using the standard library's malloc function, and treat it as an array. C's unification of arrays and pointers (see below) means that true arrays and these dynamically-allocated, simulated arrays are virtually interchangeable. Since arrays are always accessed (in effect) via pointers, array accesses are typically not checked against the underlying array size, although the compiler may provide bounds checking as an option. Array bounds violations are therefore possible and rather common in carelessly written code, and can lead to various repercussions, including illegal memory accesses, corruption of data, buffer overruns, and run-time exceptions.

C does not have a special provision for declaring multidimensional arrays, but rather relies on recursion within the type system to declare arrays of arrays, which effectively accomplishes the same thing. The index values of the resulting "multidimensional array" can be thought of as increasing in row-major order.

Although C supports static arrays, it is not required that array indices be validated (bounds checking). For example, one can try to write to the sixth element of an array with five elements, generally yielding undesirable results.

This type of bug, called a buffer overflow or buffer overrun, is notorious for causing a number of security problems. Since bounds checking elimination technology was largely nonexistent when C was defined, bounds checking came with a severe performance penalty, particularly in numerical computation. A few years earlier, some Fortran compilers had a switch to toggle bounds checking on or off; however, this would have been much less useful for C, where array arguments are passed as simple pointers.

Multidimensional arrays are commonly used in numerical algorithms (mainly from applied linear algebra) to store matrices. The structure of the C array is well suited to this particular task. However, since arrays are passed merely as pointers, the bounds of the array must be known fixed values or else explicitly passed to any subroutine that requires them, and dynamically sized arrays of arrays cannot be accessed using double indexing. (A workaround for this is to allocate the array with an additional "row vector" of pointers to the columns.)

C99 introduced "variable-length arrays" which address some, but not all, of the issues with ordinary C arrays.

No comments:

Post a Comment