By Paul Muehr
May 7, 1999
NTU CA714-CA Graduate Computer Architecture
University of California Berkeley
Introduction
In the Fall of 1997, Intel introduced the PC world to the Accelerated
Graphics Port, or AGP, for the PC video subsystem. Marketed as a new interface
that dramatically improves the processing of 3D graphics and video, the
interface quickly became standard equipment on all new PCs. AGP offers
a 2x-4x improvement in throughput over PCI, with an 8x improvement in the
future. This paper will explore the technical details of AGP as well as
the benefits it brings to the PC platform.
History of PC Graphics
Video on the PC platform started with the ISA bus on the original PC in the early eighties. Running at only 8MHz, ISA quickly became congested as the PC processor and I/O devices on the bus increased in performance. Saturation on the ISA bus eventually led to the creation of the VESA VL local bus for video. The VL bus offered increased bandwidth, but ultimately had a relatively brief existence of about one year. Shortly after the VL bus was introduced, Intel released the first Pentium microprocessor and a new I/O bus, PCI.
PCI offered 32-bit transfers at 33MHz and the most I/O bandwidth the PC platform had seen to date. Almost overnight, PCI became the bus of choice for video cards and other high performance I/O devices such as disk drive interfaces and network cards. ISA managed to stay around in a minority role for slower devices, such as modems and sound cards, but for video, PCI was the only way to go.
As microprocessors continued to increase in performance, PC graphics started to take on more aggressive challenges. PCI works great with standard 2D (x and y buffer information) and business applications, but the emergence of 3D graphics processing started putting a strain on the PCI bus.
In 3D graphics, the computer renders images in a virtual 3D space and maps textures onto the objects in that 3D space. Floating-point-intensive algorithms handle perspective and other visual details, and the resulting image is then displayed on the screen. The textures used might be folds in a cloud, the surface of a wall, or other visual effects which create lifelike surfaces within the 3D image. The texture mapping process is calculation and data intensive. The move toward 3D graphics forced video card manufacturers to increase the amount of memory on graphics cards, which now had to store the frame buffer plus additional texture information.
As 3D graphics hardware and programs continued to push the envelope,
it became obvious that the trends in 3D graphics would soon lead to PCI
becoming a video bottleneck. Furthermore, the PCI bus, like the ISA bus
before it, had become increasingly congested as the performance of hard
disks and network cards increased and competed with the video subsystem
for PCI bandwidth. To solve this problem, Intel introduced AGP, which is
now available on Intel's Slot 1 form factor motherboards and Socket 7 motherboards
from other manufacturers.
The AGP Interface
The purpose of AGP is to create a faster channel of communication between the graphics controller and the CPU and memory. The net result of the faster communication is better graphics performance for the user and less PCI congestion.
Although the PCI specification outlined both 33MHz and 66MHz operation, PC makers only implemented the 33MHz 32-bit version of PCI. AGP is closely related to PCI. In fact, it is based on the 66MHz version of the PCI specification with additional modifications. The biggest difference between AGP and PCI is that AGP is a port, not a bus. AGP is intended to complement PCI, not replace it, so a system can have many PCI devices, but only one AGP video card.
Giving the graphics controller a dedicated port has several advantages: the port is easier to implement (no sharing), its speed can be changed without affecting the PCI bus, and video-specific enhancements can be added without affecting PCI compatibility. AGP is also significantly faster than PCI. Its major performance benefits, each covered in the sections that follow, are higher peak bandwidth, direct access to textures in system memory, pipelined requests, and sideband addressing.
So how much faster is AGP than PCI? AGP offers three clocking modes
to speed past PCI's throughput. The table below shows how the two interfaces
compare in terms of bandwidth. Please note the slight difference between
the commonly quoted bandwidths and the technically correct bandwidths.
For consistency, the diagrams and references in this paper use the commonly
quoted numbers. The discrepancies arise from different interpretations
of the "M" in megabytes (10^6 vs. 2^20). A more thorough
explanation with formulas is provided in Appendix A.
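| Interface | Frequency | Quoted Bandwidth | Bandwidth | vs. PCI |
| --- | --- | --- | --- | --- |
| PCI | 33 MHz | 132 MB/sec | 127.2 MB/sec | 1x |
| AGP, 1x | 66 MHz | 264 MB/sec | 254.3 MB/sec | 2x |
| AGP, 2x | 66 MHz, 2 transfers per clock | 528 MB/sec | 508.6 MB/sec | 4x |
| AGP, 4x | 66 MHz, 4 transfers per clock | 1056 MB/sec | 1017.3 MB/sec | 8x |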
Table 1: Commonly quoted bandwidths and actual bandwidths for PCI and AGP
As you can see from Table 1, AGP can achieve substantially higher bandwidth
than PCI. In the 1x mode, one 32-bit data transfer can take place in each
cycle of the 66MHz AGP clock. For 2x mode, two 32-bit data transfers can
occur in each cycle, one on the rising clock edge and one on the falling
clock edge. AGP uses 3.3V signaling for 1x and 2x modes. For 4x mode, AGP
uses 1.5V signaling and an additional strobe signal to complete four 32-bit
transfers in a clock cycle.
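The commonly quoted figures follow directly from the clock rate, the 32-bit bus width, and the number of transfers per clock. The short Python sketch below reproduces them; the function name `peak_bandwidth_mb` is only illustrative, and it uses the marketing convention of M = 10^6 with nominal 33MHz and 66MHz clocks.

```python
def peak_bandwidth_mb(clock_mhz, bus_bits=32, transfers_per_clock=1):
    """Peak bandwidth in MB/sec using the commonly quoted M = 10**6."""
    bytes_per_transfer = bus_bits // 8
    return clock_mhz * bytes_per_transfer * transfers_per_clock

# Nominal clocks: 33 MHz for PCI, 66 MHz for every AGP mode.
modes = {
    "PCI": peak_bandwidth_mb(33),
    "AGP 1x": peak_bandwidth_mb(66, transfers_per_clock=1),
    "AGP 2x": peak_bandwidth_mb(66, transfers_per_clock=2),
    "AGP 4x": peak_bandwidth_mb(66, transfers_per_clock=4),
}

for name, mb in modes.items():
    print(f"{name}: {mb} MB/sec ({mb // modes['PCI']}x PCI)")
```

Only the transfers-per-clock factor changes between the AGP modes, which is why each mode doubles the previous one.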
AGP System Integration
Figure 1 shows a system block diagram of an AGP system, with peak bus
bandwidths labeled accordingly. The graphics chip has a peak bandwidth
to local memory of 800MB/sec assuming a 64-bit bus at 100MHz. Textures
stored in the main system memory can be accessed by the graphics chip through
AGP at a peak of 528MB/sec (assuming 2x mode). AGP bypasses the slower
132MB/sec PCI interface. The interface between the chipset and the system
memory is usually a 64-bit bus running at 66MHz or 100MHz.
Figure 1: AGP system level block diagram. Copyright © Intel Corporation.
Sharing System Memory
As mentioned previously, the graphics card has local memory on-board for use as a frame buffer and to store textures. However, the amount of memory on the video card is much less than the amount available in system memory. At higher resolutions, more of the on-board memory is used for the frame buffer and less is left to store textures. Since the local video memory tends to be a considerable portion of the cost of a graphics card, board vendors have to balance the amount of memory with the costs involved.
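To make the resolution trade-off concrete, the following Python sketch estimates how much local memory is left for textures. The buffer layout is an assumption made for illustration (16-bit color, two color buffers, and a 16-bit Z-buffer), and the function names are hypothetical; real cards differ.

```python
def frame_buffer_bytes(width, height, bits_per_pixel=16,
                       color_buffers=2, z_bits=16):
    """Bytes used by the color buffers plus a Z-buffer (assumed layout)."""
    pixels = width * height
    color = pixels * (bits_per_pixel // 8) * color_buffers
    depth = pixels * (z_bits // 8)
    return color + depth

def texture_room_mb(local_mb, width, height):
    """Local memory left for textures in MB; negative means it does not fit."""
    used = frame_buffer_bytes(width, height)
    return (local_mb * 2**20 - used) / 2**20

for width, height in [(640, 480), (800, 600), (1024, 768)]:
    room = texture_room_mb(4, width, height)
    print(f"{width}x{height} on a 4MB card: {room:.2f} MB left for textures")
```

Under these assumptions a 4MB card has no room at all for textures at 1024x768, which is the situation where access to system memory matters most.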
By increasing the amount of memory available to the graphics card, AGP allows a more efficient process for accessing large textures that could not fit into local memory. Before AGP, the video card would have to copy the texture to local memory over the PCI bus, forcing some other texture out of local memory to make room. AGP eliminates the extra work by allowing the texture to remain in system memory where it can be directly executed on by the graphics chip. The capability to execute on textures in main memory directly is called Direct Memory Execute, or DIME. The combination of DIME with dynamically-allocated memory sharing allows the most frequently used textures to remain in the local memory on the graphics card, improving overall performance.
Note that AGP's use of system memory is not the same as unified memory architecture (UMA). With UMA, all of the graphics card's memory, including the frame buffer, is in main system memory. With AGP, the graphics card still has its own local memory.
Borrowing part of the main system memory is not as straightforward
as it might seem. Graphics controllers need access to a contiguous memory
space, and main system memory can be quite fragmented. To solve this problem,
AGP plays a trick on the graphics controller. Through hardware address
translation in the chipset, the AGP memory may be non-contiguous in physical
main memory, but the graphics controller sees its entire block of AGP
memory as one contiguous region. This built-in address translation, depicted
in Figure 2, is called the Graphics Address Remapping Table, or GART. The
OS marks AGP memory as non-cacheable, so no cache coherency issues arise.
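The remapping idea can be sketched as a simple page table. The Python model below is a toy illustration, not the chipset's actual table format: the 4KB page size, the aperture base address, the page frame numbers, and the `Gart` class itself are all assumptions chosen for the example.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

class Gart:
    """Toy remapping table: aperture page index -> physical page frame."""

    def __init__(self, aperture_base, physical_pages):
        self.aperture_base = aperture_base
        self.table = list(physical_pages)

    def translate(self, address):
        """Map a contiguous aperture address to a scattered physical address."""
        offset = address - self.aperture_base
        if not 0 <= offset < len(self.table) * PAGE_SIZE:
            raise ValueError("address outside the AGP aperture")
        page, within = divmod(offset, PAGE_SIZE)
        return self.table[page] * PAGE_SIZE + within

# Three scattered physical page frames appear as one contiguous 12KB region.
gart = Gart(aperture_base=0xE0000000, physical_pages=[0x1A3, 0x042, 0x7F0])
print(hex(gart.translate(0xE0000010)))
print(hex(gart.translate(0xE0001008)))
```

The graphics controller only ever issues aperture addresses, so neighboring addresses can land in unrelated physical pages without the controller noticing.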
Figure 2: Flow diagram for AGP address mapping. Copyright © Intel Corporation.
In addition to having high bandwidth access to main memory, AGP also
has two creative ways to maximize system performance when accessing system
memory: pipelining and sideband addressing. The efficiency of these two
methods allows AGP to sustain a much larger fraction of its theoretical
maximum throughput than PCI can sustain of its own.
Pipelining and Sideband Addressing
In pipelining, AGP issues multiple, successive requests for data from memory without waiting for the completion of one request before making another request. In this way, AGP can "hide" the memory latencies. By contrast, when PCI makes a request, it has to wait for the current transfer to complete before it can issue another request (see Figure 3).
Note that AGP pipelining is different from PCI burst mode, which AGP
also supports. In PCI burst mode, a single request returns multiple data
transfers. However, the next PCI request still cannot be made until all
of the data transfers are complete.
Figure 3: Latency for non-pipelined PCI vs. pipelined AGP. An is the address of request n, and Dn is the result. Copyright © Intel Corporation.
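A rough cycle count shows why pipelining helps. The Python sketch below is a simplified model, assuming every request has the same memory latency and transfer time and that pipelined requests overlap perfectly; the function names and the cycle counts are illustrative only.

```python
def non_pipelined_cycles(requests, latency, transfer):
    """Each request waits out the full latency before the next is issued."""
    return requests * (latency + transfer)

def pipelined_cycles(requests, latency, transfer):
    """Latency is paid once; later requests overlap with earlier transfers."""
    return latency + requests * transfer

requests, latency, transfer = 4, 10, 4  # assumed values, in bus clocks
print("non-pipelined:", non_pipelined_cycles(requests, latency, transfer))
print("pipelined:    ", pipelined_cycles(requests, latency, transfer))
```

In this model the pipelined cost per request approaches the transfer time alone as the number of queued requests grows, which is the sense in which the latency is hidden.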
AGP implements sideband addressing through eight extra address lines.
These additional address lines give AGP the capability to issue new addresses
and requests simultaneously while data on the main 32 data/address wires
continues to flow from previous requests.
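The benefit of moving requests off the main wires can be illustrated with a toy utilization model. In the Python sketch below, the clock counts per request are made-up values and the function name is hypothetical; the only point is that address cycles stop competing with data cycles once they travel on the sideband lines.

```python
def data_bus_utilization(data_clocks, address_clocks, sideband):
    """Fraction of main-bus clocks that carry data for one request.

    Without sideband addressing the request shares the data/address wires,
    so its address clocks count as overhead. With sideband addressing the
    request is issued on the extra lines and costs no main-bus clocks.
    """
    overhead = 0 if sideband else address_clocks
    return data_clocks / (data_clocks + overhead)

# Assumed: 4 clocks of data and 1 clock of address per request.
print("multiplexed:", data_bus_utilization(4, 1, sideband=False))
print("sideband:   ", data_bus_utilization(4, 1, sideband=True))
```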
The Many Faces of AGP
Although AGP is a standard, not all AGP devices are created equal. The AGP specification allows vendors to compromise between speedy performance and economics. Indeed, some of the technology proposed by AGP, such as 4x mode, was not practical at the time the specification was created.
Vendors can choose to implement several combinations of AGP features, and all of those devices will still be considered to be AGP compliant. For instance, one vendor may not support sideband addressing or 2x mode. Such a video card would still have the advantage of increased bandwidth over PCI from the 66MHz clock, but it would not take full advantage of the capabilities of AGP. To handle the various implementations, the graphics controller driver reports its capabilities to the OS, which allocates memory accordingly. Buyers should beware when shopping for AGP video cards and AGP motherboards.
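One way to picture how mixed implementations coexist is as a negotiation over the features both sides support. The Python sketch below is purely illustrative: `AgpCaps` and `negotiate` are hypothetical names, and the model does not reflect the actual AGP register layout or driver interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgpCaps:
    """Hypothetical capability record for a video card or a chipset."""
    rates: frozenset  # supported transfer modes, e.g. frozenset({1, 2})
    sideband: bool    # True if sideband addressing is supported

def negotiate(card, chipset):
    """Pick the fastest mode both sides support, and whether to use sideband."""
    common = card.rates & chipset.rates
    if not common:
        raise ValueError("no common AGP transfer mode")
    return max(common), card.sideband and chipset.sideband

budget_card = AgpCaps(rates=frozenset({1}), sideband=False)
full_chipset = AgpCaps(rates=frozenset({1, 2}), sideband=True)
print(negotiate(budget_card, full_chipset))
```

A card limited to 1x mode without sideband addressing still works in a fully featured motherboard, but the pair runs at the card's level.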
As mentioned above, the AGP specification formed a roadmap to sustain PC graphics for several years. The faster clock speeds and lower voltage levels of AGP 4x mode have taken over two years to become practical enough to make their way to market. The first motherboards with chipsets supporting 4x mode will be arriving mid-1999. Several video card manufacturers are already sampling 4x capable video cards even before motherboard support for 4x shows up in the stores. However, don't expect AGP's video bandwidth to memory to suddenly jump to 1GB/sec with the 4x mode chipset and video cards in place. Main memory bandwidth is not yet at the point where it can meet the demands of 4x mode. Furthermore, the bandwidth to main memory is shared with and dominated by the CPU.
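The memory limit is easy to check with arithmetic. The Python sketch below compares the quoted AGP 4x peak against a 64-bit, 100MHz memory bus; the `cpu_share` parameter is a made-up fraction standing in for the CPU's use of memory, not a measured value, and both function names are illustrative.

```python
def bus_peak_mb(bus_bits, clock_mhz):
    """Peak bandwidth of a bus in MB/sec, with M = 10**6."""
    return bus_bits // 8 * clock_mhz

def agp_ceiling_mb(agp_peak_mb, memory_peak_mb, cpu_share=0.5):
    """AGP cannot exceed what main memory has left after the CPU's share."""
    return min(agp_peak_mb, memory_peak_mb * (1 - cpu_share))

memory_peak = bus_peak_mb(64, 100)  # 64-bit memory bus at 100MHz
print("memory peak:", memory_peak, "MB/sec")
print("AGP 4x ceiling:", agp_ceiling_mb(1056, memory_peak), "MB/sec")
```

Even with no CPU traffic at all, the 800MB/sec memory bus in this example falls short of the 1056MB/sec that 4x mode can request.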
So far, this paper has focused on AGP features and not on real world
performance. The next few sections will address the performance advantage
of AGP as well as its future.
AGP and Software
To make use of AGP hardware, the OS must support it. Microsoft
Windows and the DirectDraw API have been changed to accommodate AGP (Microsoft
Windows and DirectDraw are trademarks of Microsoft). Application developers
must also use the DirectDraw API to see the full benefits of AGP.
Outside of the Windows environment, the story is quite different. DOS has
no support for AGP, and any game developers still using DOS would have
to write their own AGP driver to take advantage of AGP. This is unlikely
to happen as most new games now run in Windows only.
AGP vs. PCI vs. Local Memory
With all of the great performance and features offered by AGP, one might expect PCI to have completely vanished from the video card scene. However, PCI-based video is far from dead, and graphics card vendors continue to bring out PCI and AGP versions of their latest and greatest video cards. For now, software and the large installed base of non-AGP systems will keep PCI video around. But this transition period will eventually come to a close, and PCI will vanish from the video card market.
As Figure 4 shows, AGP really shines on large textures which don't fit in local memory on the graphics card. In the first case, the higher resolution left less room for textures in local memory. AGP completely obliterates PCI under such conditions because the PCI card lacks the DIME advantage. When the resolution is reduced (800x600), more local memory is available for textures, and the AGP advantage begins to slip. In the final example in Figure 4, PCI and AGP are about equal. Unfortunately, very little software currently makes use of large textures, although this will eventually change. The point is that the real-world benefits of AGP may take a while to appear.
Further blurring the performance lines between AGP and PCI are the effects of falling memory prices. Video card manufacturers have used the falling memory prices to put more local memory on the video board in the name of performance. Boards with 16MB of local memory are becoming more common, and some vendors are introducing boards with 32MB of memory. This is a far cry from the 2-4MB typical on graphics cards when AGP was introduced.
The local memory has an even higher bandwidth than AGP's bandwidth to
main memory. Some graphics cards have bandwidths of 1.6GB/sec between the
graphics chip and the on board memory. That kind of bandwidth far outdistances
even AGP's 4x mode and allows for multiple large textures to reside in
local memory without needing to swap out. A PCI device and an AGP device
with such a large local memory will perform similarly until software demands
on textures catch up and surpass the size of the local texture memory.
Figure 4: AGP vs. PCI for large texture sizes. Source: Tom's Hardware Page.
Benchmarks compared: Business Winstone 97, HighEnd Winstone 97, 3D Winmark 97, PC Player 3D Bench, Business Winmark 97, HighEnd Winmark 97.
Table 2: Business application and small texture performance comparison of PCI and AGP. Source: Tom's Hardware Page
The race is on to see whether the amount of local memory will stay ahead
of software's use of large textures. If prices continue to decline and
memory bandwidth continues to increase, AGP's fast access to system memory
may not get much utilization. Either way, AGP is here to stay.
Industry Support for AGP
Although Intel developed AGP, the standard has wide industry support. Of course, when it came to pushing a new PC standard, it helped that Intel was also the largest motherboard and chipset manufacturer in the world. All major motherboard and video card manufacturers now have AGP products on the market. As mentioned earlier, not all AGP implementations are created equal. For a list of companies that are members of the AGP Implementors Forum, visit the following web site: http://www.agpforum.org/agpmem.htm
Software support for AGP can be divided into two types, the OS level and the application level. Microsoft's operating systems now support AGP, and video card manufacturers with AGP products also tune their video drivers to take advantage of AGP. Support of AGP at the application level is not as widespread, but it is growing.
The important thing to remember is that even without specific application support of AGP, 3D applications will still benefit from the faster interface to the rest of the system, especially when handling large textures. With optimization for AGP, applications can see even more speedup.
Part of the reason more software products do not tout their use of AGP
is that business applications and other 2D software will not see
much benefit from it. Game developers stand to benefit
the most from AGP, and more games are starting to use more and
larger textures as game complexity and CPU processing power grow.
The Future of AGP
Currently AGP does not have any competitors and should remain the video standard for several years to come. The dedicated port of AGP means bus congestion will not be an issue for AGP, which is the reason why video was moved off of the PCI and ISA buses. AGP is limited more by the other buses in the system, such as the memory to chipset bus, so it will be many years before video bandwidth requirements exceed the capabilities of AGP. As the bandwidths of the other system buses grow, AGP is already positioned to take advantage of them.
Other I/O buses on the PC drawing board include PCI-X and Next Generation I/O (NGIO). Each of these proposed buses is aimed at addressing the shortcomings of the PCI bus, but neither of them challenges AGP for video.
PCI-X is a 64-bit, 133MHz version of PCI backed by IBM, Hewlett Packard, and Compaq and is expected to enter the marketplace in the second half of 1999. The theoretical maximum throughput of PCI-X equals that of AGP 4x mode (1GB/sec), but PCI-X lacks the benefits of DIME, sideband addressing, and pipelining.
NGIO is a different beast altogether. Backed by Intel, Dell, Hitachi,
NEC, Siemens, and Sun, this proposed bus is designed for scalability and
reliability on servers, not desktop PCs. Some of its key differences
from PCI include moving from memory-mapped to channel I/O and changing
from parallel to serial signaling. NGIO is slower than PCI-X and poses
no threat to AGP.
Conclusion
This paper provided an overview of the AGP video standard. AGP brings
a formidable set of technologies to the PC platform and has quickly become
standard in all new PCs. As 3D software begins to take more advantage
of what AGP offers, users can expect AGP to maximize their video experience.
AGP's speed and efficiency of data transfers along with its headroom to
grow ensure AGP will remain the video standard for years to come.
References
Computer Shopper.com, December 1997, available online at:
http://www.zdnet.com/computershopper/edit/cshopper/content/9712/cshp0120.html
http://www.zdnet.com/computershopper/edit/cshopper/content/9712/cshp0127.html

Computer Shopper.com, February 1998, available online at:
http://www.zdnet.com/products/content/cshp/1802/268192.html

Hewlett Packard Press Release, September 9, 1998, available online at:
http://www.hp.nl/pers/hpc80916.htm

Intel Developer Site, Intel Corp.
Online AGP Tutorial, available online at: http://www.intel.com/technology/agp/tutorial/
AGP Application Notes, available online at: http://www.intel.com/drg/mmx/AppNotes/agp.htm
Intel AGP Site, available online at: http://www.intel.com/technology/agp/
NGIO Site, available online at: http://www.intel.com/tech/work/server/i-o.htm
Accelerated Graphics Port Interface Specification, Rev 2.0, May 4, 1998, available online at: http://www.intel.com/pc-supp/platform/agfxport/

Accelerated Graphics Port Implementors Forum
A.G.P. Design Guide: Covering 1X, 2X, and 4X Modes and 1.5 Volt and 3.3 Volt Signaling, Rev 1.0, August 1998, available online at: http://www.agpforum.org/specs_design_guide.htm

NGIO Forum, available online at:

PC Webopia, available online at:
http://webopedia.internet.com/TERM/A/AGP.html

PC World Online, December 1997, available online at:
http://www.pcworld.com/hardware/video_cards/articles/dec97/1512p243.html
PC World Online, January 7, 1999, available online at:
http://www.pcworld.com/pcwtoday/article/0,1510,9267,00.html

SysOpt.Com: System Optimization Information, available online at:
http://www.sysopt.com/agp.html

TC Computers Tech-Web Site, available online at:
http://tcweb.tccomputers.com/tctechweb/techinfo/General%20Info/Video/agp.htm

The PC Guide, Charles M. Kozierok
AGP Guide, available online at: http://www.pcguide.com/ref/mbsys/buses/types/agp-c.html
The PC Guide to Video System Interfaces, available online at: http://www.pcguide.com/ref/video/if-c.html
The PC Guide to System Bus Functions and Features, available online at: http://www.pcguide.com/ref/mbsys/buses/func-c.html

Tom's Hardware Page
Overview of AGP, available online at: http://www.tomshardware.com/agp.html
AGP Software and Performance, available online at: http://www.tomshardware.com/practicalagp.html
AGP Benchmarks Comparison, available online at:
Appendix A: Megahertz and Megabytes
PCI bandwidth is commonly quoted at 132 MB/sec, but the actual bandwidth is 127.2 MB/sec. The difference comes from rounding and different interpretations of the "M" in MB/sec.
For megahertz, MHz, the proper multiplier is 1,000,000. For a 33MHz clock, this could mean 33,000,000 Hertz with rounding. However, in the PC world, 33MHz refers to a 30ns period, and 1/30ns = 33.33MHz (repeating), or 33,333,333 Hertz.
The "M" in megabyte is the main culprit in the numbers discrepancies.
Technically, a megabyte is 2^20 bytes, or 1,048,576 bytes. However,
megabytes are commonly referred to in marketing literature as an even 1,000,000
bytes. Disk drive makers do this to make their drive capacities appear
to be larger, but it's really playing with numbers.
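32-bit AGP vs. 32-bit PCI Throughput
30ns PCI clock (15ns AGP clock) and 2^20 = 1024 * 1024 = 1048576
Actual PCI throughput:
[ (1/30ns) * 4 bytes ] / 2^20 = 127.2 MB/sec
Actual AGP throughput:
1x: [ (1/15ns) * 4 bytes * 1/cycle ] / 2^20 = 254.3 MB/sec
2x: [ (1/15ns) * 4 bytes * 2/cycle ] / 2^20 = 508.6 MB/sec
4x: [ (1/15ns) * 4 bytes * 4/cycle ] / 2^20 = 1017.3 MB/sec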
Supporting material: http://www.pcguide.com/res/tables-c.html#BinDec
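The "actual" figures can be checked with a few lines of Python. The sketch below starts from the clock period in nanoseconds and divides by 2^20; the function name `actual_mb_per_sec` is illustrative, and every AGP mode is assumed to use the same 15ns clock with 1, 2, or 4 transfers per cycle.

```python
def actual_mb_per_sec(period_ns, bytes_per_transfer=4, transfers_per_clock=1):
    """Throughput in binary megabytes (2**20 bytes) per second."""
    clock_hz = 1e9 / period_ns
    return clock_hz * bytes_per_transfer * transfers_per_clock / 2**20

rows = [
    ("PCI", 30, 1),     # 30ns period, one transfer per clock
    ("AGP 1x", 15, 1),  # 15ns period for all AGP modes
    ("AGP 2x", 15, 2),
    ("AGP 4x", 15, 4),
]
for name, period_ns, transfers in rows:
    mb = actual_mb_per_sec(period_ns, transfers_per_clock=transfers)
    print(f"{name}: {mb:.1f} MB/sec")
```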
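| Interface | Frequency | Quoted Bandwidth | Bandwidth | Vs. PCI |
| --- | --- | --- | --- | --- |
| PCI | 33 MHz | 132 MB/sec | 127.2 MB/sec | 1x |
| AGP, 1x | 66 MHz | 264 MB/sec | 254.3 MB/sec | 2x |
| AGP, 2x | 66 MHz, 2 transfers per clock | 528 MB/sec | 508.6 MB/sec | 4x |
| AGP, 4x | 66 MHz, 4 transfers per clock | 1056 MB/sec | 1017.3 MB/sec | 8x |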