For many operations Manifold will automatically use multiple processors or multiple processor cores if they are installed in the computer system. In addition to this basic multiprocessing capability, Manifold can utilize massively parallel multiprocessing in the form of NVIDIA CUDA-enabled products.
Although NVIDIA CUDA in the present form has some limitations, it is not an exaggeration to say that this technology could well be the most revolutionary thing to happen in computing since the invention of the microprocessor. It's that fast, that inexpensive and has that much potential. The current implementation just scratches the surface of what is possible and what will be implemented in future updates and Manifold releases.
Introduction
NVIDIA is best known for a line of outstanding graphics processors that have become popular as the basis for graphics cards. In the quest for maximum speed, NVIDIA's GPUs (Graphics Processing Units) have evolved far beyond single processors: a modern NVIDIA GPU is a parallel supercomputer on a chip that consists of very many, very fast processors. Contemporary NVIDIA GPUs provide from 16 to 128 stream processors per GPU, delivering enormous computing power.
Although the market impetus behind the creation of such supercomputers-on-a-plugin-board has been the computational demands of the PC gaming market, such "graphics" boards have become so powerful that the scientific computing community has begun using them for general purpose computing. It turns out that the mathematical computations required for complex visual and physics simulations in games, such as matrix multiplication and transposition, are exactly the same computations that must be performed in a wide variety of scientific computing applications, including GIS.
NVIDIA has supported this trend by releasing the CUDA (Compute Unified Device Architecture) interface library to allow intrepid applications developers to write code that can be uploaded for execution by NVIDIA's massively parallel GPUs. This allows applications developers to plug in a 500 gigaflop, 128-processor, NVIDIA-based card for a mere few hundred dollars and upload applications to run within the NVIDIA GPU at far greater speed than possible on even the fastest general purpose CPU on the motherboard.
If we have a CUDA-capable NVIDIA graphics card installed in our system and if we have also downloaded and installed NVIDIA's CUDA software onto our system, Manifold can take advantage of the phenomenal power of massively parallel NVIDIA stream processors to execute many tasks at much greater speed.
In addition to providing GPUs with supercomputer speed, NVIDIA also has introduced a very elegant hardware technology called SLI which enables more than one NVIDIA card to be installed in a computer system. When more than one NVIDIA card is installed in an SLI-capable system the cards will automatically work together for even better performance as if all of the GPU processors were on a single card. If we have an SLI-capable system we can install one, two, three or even four CUDA-enabled cards to apply as many as 512 stream processors for over a teraflop of computing power. Although not all motherboards can run SLI cards at full bandwidth if more than one card is installed, there are many motherboards which can run two CUDA-enabled cards at full SLI bandwidth.
Because NVIDIA technology benefits from the enormous economies of scale in the gaming market, CUDA-enabled cards have become absurdly inexpensive for the performance they provide. At the present writing CUDA-enabled cards can be purchased for less than $150 for a basic 8600 GTS card and easily under $550 for a very high performance 8800 GTX card. It is easy and inexpensive to choose a card with the performance you want (more stream processors running at a faster clock rate with more memory give better performance).
Dozens of vendors provide graphics cards based upon CUDA-capable NVIDIA GPUs, and it is almost impossible for a high-performance PC or motherboard vendor to introduce a product that does not do a good job of hosting such GPU cards. The insatiable demand of gamers for more performance has also spawned an industry of vendors offering ever-faster memory, more powerful power supplies and other system components that are perfect for creating outstanding GIS desktop and server machines.
CUDA Limitations and Requirements
There are several important constraints on CUDA use within Manifold:
· We must have a CUDA-enabled NVIDIA card installed in our system. 8800 series NVIDIA cards at the present writing are the best-known CUDA-enabled cards, but other NVIDIA GPUs are also CUDA-capable (check with the NVIDIA web site and with your graphics card vendor's web site to see if a particular card is CUDA-capable). For example, even an NVIDIA 8600 GTS with only 32 stream processors per GPU supports CUDA. Manifold's CUDA capabilities were developed using ASUS 8800 GTX cards, which are a fairly representative example of a high-quality implementation of NVIDIA's recommended "reference design" for graphics card vendors using the 8800 GTX. Even faster cards are available using strategies such as overclocking, water cooling and faster GPUs such as the NVIDIA 8800 Ultra and successors. For maximum support it is wise to spend a bit more and get an 8800 GTX based card like those used by manifold.net for development.
· The rest of our PC system must have sufficient speed and power to support the NVIDIA card. For example, memory must be fast enough to handle CUDA bandwidth and power supplies must provide enough power to run the NVIDIA card (or cards) with extra PCI-E power cables. Consult any technology-obsessed 14 year old gamer for advice on configuring a suitably "hot" system.
· We must have installed NVIDIA's CUDA SDK library for Windows, which may be downloaded from the nvidia.com web site.
· At the present writing, the CUDA library has been released only for 32-bit Windows systems. Through some incomprehensible failure of the Fundamental Structure of the Universe, NVIDIA (normally the world leader in rapid response to modern Windows needs) has as of this writing implemented 64-bit Linux but not 64-bit Windows. A 64-bit Windows CUDA version has been announced by NVIDIA for November, 2007.
· Writing massively parallel algorithms to implement spatial functions is extremely difficult, even for manifold.net. Therefore, at the present time only a few dozen functions have been implemented within Manifold that can leverage CUDA. Many more are on the way, especially once 64-bit Windows CUDA becomes available.
· Existing CUDA-enabled functions are Surface - Transform dialog operators for surfaces. The Surface - Transform dialog is part of the optional Surface Tools extension for Manifold (and also a built-in part of some Manifold System editions such as Universal Edition and Ultimate Edition). If we do not have the Surface Tools extension we will not have the ability to use this dialog and hence no ability to leverage CUDA. New updates and future Manifold releases will likely add many more usages of CUDA in addition to the Surface - Transform dialog operators.
· Functions executed within CUDA cards are virtually instantaneous compared to the speed of execution within the main processor. However, the NVIDIA stream processors execute tasks so rapidly that it is difficult to provide data fast enough from disk and memory to keep the processors occupied. The resulting performance in most "real life" applications therefore tends to be limited not by processor speed but rather by the speed with which data can be fetched from hard disk or other memory. In addition, a good portion of many tasks is not bound by computation but instead consists of overhead such as writing out results to disk, re-computing levels and other necessary but mundane work that is not accelerated by CUDA processors. The net result is that as a practical matter for many tasks CUDA-enabled processors will visibly increase speeds, almost always by a factor of two to ten and at times by a factor of ten to fifty, but not usually by factors of hundreds for the overall task even if the actual computation of parts of the task goes hundreds of times faster.
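The reasoning in the last point above is usually called Amdahl's law: the part of a task that is not accelerated sets a ceiling on the overall gain. The following Python sketch only illustrates the arithmetic. The 200x compute speedup and the fractions of run time are made-up values, not Manifold measurements, and `overall_speedup` is a hypothetical helper name.

```python
def overall_speedup(accelerated_fraction, kernel_speedup):
    """Amdahl's law: speedup of a whole task when only part of it runs faster.

    accelerated_fraction: share of the original run time spent in the
        computation that the GPU accelerates (0.0 to 1.0).
    kernel_speedup: how many times faster that computation runs on the GPU.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    # Disk access, writing results and other overhead run at the old speed.
    unaccelerated = 1.0 - accelerated_fraction
    return 1.0 / (unaccelerated + accelerated_fraction / kernel_speedup)


# Hypothetical task profiles: illustrative fractions, not measurements.
for fraction in (0.50, 0.90, 0.98):
    print(f"{fraction:.0%} of run time accelerated 200x -> "
          f"{overall_speedup(fraction, 200.0):.1f}x overall")
```

With these assumed figures, a task that spends half of its time in accelerated computation gains only about 2x overall, one that spends 90% gains about 9.6x, and one that spends 98% gains about 40x, which is in line with the ranges described above.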
The greatest limitation at the present writing is the unavailability of CUDA software for 64-bit Windows. This means that any GIS usage of CUDA can occur only within 32-bit Windows, where one may have tremendous computational speed with CUDA but only limited memory for large jobs, a very odd and inconvenient combination. NVIDIA has announced an intent to remove this constraint in November, 2007. However, it surely cannot hurt for users to express their interest in 64-bit Windows support to ensure that NVIDIA has full information on the desire of many users to place maximum priority on Windows. Visit the nvidia.com home page and drill down into the CUDA home page for information on how to contact NVIDIA and how to make comments in the NVIDIA CUDA developer forum.
Despite the above limitations it is clear that CUDA is a revolutionary technology. NVIDIA GPUs are so fast that a routine comment from developers is that NVIDIA renders the main processor almost superfluous, as if even the fastest multi-core Intel chip were relegated to being nothing but an accessory processor to handle the keyboard and mouse. That is obviously not a competitive challenge that traditional processor vendors can afford to ignore. CUDA is the first of what is likely to be a new wave of massively parallel architectures, with competing designs expected from Intel and AMD. Manifold's parallel code has been expressly written to allow easy implementation on future "many-core" processor solutions from Intel and AMD that will compete with NVIDIA CUDA.
Installation and Configuration
Once we have installed the NVIDIA hardware and CUDA SDK there is no need for any other configuration. Note that the CUDA SDK as of this writing can be installed only in 32-bit Windows XP. When a 64-bit CUDA SDK is available, it will be possible to install it in 64-bit Windows. Until then, all of the following comments apply only to operations within 32-bit Windows XP.
When launched, Manifold will automatically detect and utilize the CUDA-enabled hardware. The Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options - Miscellaneous dialog is turned on by default.
When a CUDA-enabled card is present Manifold will report finding the card in the Help - About dialog in the GPU value.

The above illustration shows a Help - About report in a 32-bit Windows XP system in which two CUDA-enabled cards have been installed.

If a CUDA-enabled card has not been installed, if no CUDA software is installed, or if 64-bit Windows is in use, the Help - About dialog will report Graphics only for the GPU, as seen above.
Functions Utilizing CUDA
At the present writing the following Manifold Surface - Transform dialog functions utilize CUDA if available: Aspect, AvgValue, Blur, CurvGauss, CurvMean, CurvPlan, CurvProfile, DifferenceE, DifferenceN, DifferenceNE, DifferenceNW, DifferenceS, DifferenceSE, DifferenceSW, DifferenceW, Diversity, DiversityIndex, HighPass1, HighPass2, HighPass3, Laplace1, Laplace2, LowPass1, LowPass2, LowPass3, MajValue, MaxValue, MedianCross, MedianSquare, MedianSquare5, MedValue, MinValue, Sharpen, SharpenMore, Slope, SumValue, Tile and TileMedian.
Additional functions and use of CUDA are expected to be added with each new Manifold release.
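When a Surface - Transform formula mixes several functions, it can be handy to check which of them appear in the list above. The Python sketch below is not part of Manifold. `split_functions` is a hypothetical helper, `NotInList` is a made-up function name used only for the example, and the sketch assumes that function names are written exactly as in the list and are followed by an opening parenthesis.

```python
import re

# Function names copied from the list of CUDA-enabled functions above.
CUDA_FUNCTIONS = {
    "Aspect", "AvgValue", "Blur", "CurvGauss", "CurvMean", "CurvPlan",
    "CurvProfile", "DifferenceE", "DifferenceN", "DifferenceNE",
    "DifferenceNW", "DifferenceS", "DifferenceSE", "DifferenceSW",
    "DifferenceW", "Diversity", "DiversityIndex", "HighPass1", "HighPass2",
    "HighPass3", "Laplace1", "Laplace2", "LowPass1", "LowPass2", "LowPass3",
    "MajValue", "MaxValue", "MedianCross", "MedianSquare", "MedianSquare5",
    "MedValue", "MinValue", "Sharpen", "SharpenMore", "Slope", "SumValue",
    "Tile", "TileMedian",
}


def split_functions(expression):
    """Return (listed, unlisted) function names found in a formula.

    Bracketed component names such as [MySurface] are removed first so
    that text inside the brackets is never mistaken for a function call.
    """
    without_names = re.sub(r"\[[^\]]*\]", "", expression)
    called = set(re.findall(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(", without_names))
    listed = sorted(called & CUDA_FUNCTIONS)
    unlisted = sorted(called - CUDA_FUNCTIONS)
    return listed, unlisted


# "NotInList" is a placeholder, not a real Manifold function.
formula = "Slope([MySurface]) + Blur([MySurface] * 2) / 2 + NotInList([MySurface])"
listed, unlisted = split_functions(formula)
print("In the CUDA list:", listed)
print("Not in the CUDA list:", unlisted)
```

The set must be updated by hand when a new release adds functions, so treat the result as a quick check and not as an authoritative answer.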
Tech Tip
The above screen shots were made on what has become a typical system within manifold.net for working with CUDA: it is a dual boot Windows configuration loaded with both 32-bit and 64-bit Windows versions. This is easy to accomplish.
When first loading Windows on a new computer begin with 32-bit Windows. In the initial installation, create a disk partition that uses some portion of the overall disk available and then install Windows on that partition. Next, begin a Windows x64 installation, create a second disk partition in the remaining disk space and then install x64 Windows on that partition. When your computer boots up Windows will give you a boot up screen that will enable you to choose either Windows or x64 Windows so you can choose to launch in either 32-bit or 64-bit mode. You will have a computer that appears to have a C: drive with 32-bit Windows installed on it and a D: drive with 64-bit Windows installed on it. No matter which version of Windows you launch, both drives will be fully available so you can keep your files and data on either drive.
Launch 32-bit Windows and install Manifold and the CUDA SDK library. This will enable you to work with CUDA when running 32-bit Windows. Launch 64-bit Windows and install x64 Manifold. This will enable you to work with big jobs using the full power of 64-bit Windows and x64 Manifold. You can use an x64 Manifold serial number for both installations (x64 Manifold will install and run in 32-bit mode when installed in 32-bit Windows), but you will have to use two Activation keys since a dual-boot installation is logically the same as installing twice on two different systems.
It is a major inconvenience not having an x64 Windows CUDA library since just about everyone interested in CUDA performance is already using 64-bit Windows. Despite that inconvenience the cost of CUDA-capable cards is so low that it makes sense to get one anyway for two reasons: first, for occasional usage even in 32-bit mode and second, with the expectation that NVIDIA will issue an x64 Windows version of the CUDA library as promised in November, 2007. A final point is that even if one doesn't use CUDA it is really nice to have a graphics card that can do any form of graphics with superb speed.
Demos
The greatest performance difference visible with CUDA appears when the Surface - Transform dialog is used for a complex calculation on a surface that is not too large. This shows off the intense computational performance in CUDA without disk accesses (which take a proportionately larger amount of time for large files) slowing down the works.
For example, suppose we have a surface called MySurface. Launch the Surface - Transform dialog and execute a formula such as:
Slope([MySurface]) + Slope([MySurface] * 2) / 2 +
Slope([MySurface] * 3) / 3 + Slope([MySurface] * 4) / 4
This is a nonsensical but complex formula that will execute much faster, often ten times faster, when the Use GPGPU technologies (NVIDIA CUDA) option is turned on (thus using CUDA) than without CUDA.
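To see why a formula like this involves so much computation, it helps to write out what it asks for on a tiny grid. The Python sketch below is an illustration under stated assumptions and not Manifold's implementation: the `slope` function is a stand-in that uses central differences and returns degrees, which may differ from how Manifold's Slope is defined, and `scaled` and `demo_formula` are hypothetical helper names.

```python
import math


def slope(grid, cell_size=1.0):
    """Stand-in slope operator: steepness in degrees from central differences.

    Assumption: this is NOT Manifold's Slope definition, only a common
    textbook form. Border cells reuse the edge value as the missing
    neighbor, so estimates on the border are cruder than in the interior.
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Each output cell reads only its four nearest neighbors.
            north = grid[max(r - 1, 0)][c]
            south = grid[min(r + 1, rows - 1)][c]
            west = grid[r][max(c - 1, 0)]
            east = grid[r][min(c + 1, cols - 1)]
            dz_dx = (east - west) / (2.0 * cell_size)
            dz_dy = (south - north) / (2.0 * cell_size)
            out[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return out


def scaled(grid, factor):
    """Multiply every height in the grid by factor."""
    return [[value * factor for value in row] for row in grid]


def demo_formula(grid):
    """Slope(S) + Slope(S * 2) / 2 + Slope(S * 3) / 3 + Slope(S * 4) / 4."""
    factors = (1, 2, 3, 4)
    terms = [slope(scaled(grid, k)) for k in factors]
    rows, cols = len(grid), len(grid[0])
    return [[sum(term[r][c] / k for term, k in zip(terms, factors))
             for c in range(cols)]
            for r in range(rows)]


# A tiny tilted plane: height rises by one unit per column.
surface = [[float(c) for c in range(4)] for _ in range(4)]
result = demo_formula(surface)
print(f"Value at an interior cell: {result[1][1]:.2f}")
```

In this sketch every output cell is computed from a small neighborhood of input cells and does not depend on any other output cell. Work of that shape can be divided among many processors at once, which is the kind of computation that stream processors handle well.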
Notes
CUDA capability is enabled by default. To turn it off, turn off the Use GPGPU technologies (NVIDIA CUDA) option in the Tools - Options - Miscellaneous dialog.