Transcript Module 2
Designing for 100+ MHz
1
Designing for 100+MHz
1999 Designs Demand...
Higher system speed
Higher integration
— smaller size, less power, better reliability
Lower cost
Shorter development time
Better product differentiation
2
Traditional Multi-Chip Boards
Discrete design components
— CPU, memory
— bus transceivers, PCI controller, FIFOs
— Ethernet controller, Graphics accelerator,
MPEG, DSP, etc.
— programmable logic as glue and custom function
Advantages:
— well-documented sophisticated functions
— readily available as IP in silicon
3
Multi-Chip Board Problems
Physical size
Power consumption and reliability
PC board signal integrity
Limited flexibility
— prevents design modifications and upgrades
— prevents product diversification
— prevents product customization
Poor product differentiation
— standard parts = standard architecture
4
FPGA Advantages
Smaller size
Lower power consumption
Better signal integrity
— fewer PC-board issues
Enhanced flexibility
— easy modifications, upgrades, etc.
Enhanced product differentiation
— proprietary architectures
5
FPGA Users Want...
System clock rate of 100+ MHz
>100,000 gates
Efficient design methodologies
Availability of well-documented Cores
Reasonable cost
6
The FPGA Solution
4th Generation FPGA
Logic+Memory+Routing
Multi-Standard Select I/O
Temperature Sensing
Delay-Locked Loop for
Fast Clock and I/O
3.3 ns Synchronous
Dual-Port SRAM
500 Mbps SelectMAP
Configuration
7
Now the Challenge...
Design a 100+ MHz system
Together, we can do it...
— we’ll supply the ingredients...
— you use them intelligently
But don’t forget...
— the clock period is less than 10 ns !
8
Designing for 100+ MHz.
Volts, Amps, and Watts
— PCB signal distribution
— chip inputs and outputs
— power and thermal considerations
Ones and zeros
— logic emulation
Bits and bytes
— memory hierarchy
9
Moore Meets Einstein
[Figure: log-scale plot against year, 1965 to 2010, of clock frequency (MHz) and trace length (inches per 1/4 clock period); vertical axis from 1 to 2048]
Speed Doubles Every 5 Years…
...But the speed of light never changes
10
Volts, Amps, and Watts
PCB design issues
— capacitive loading
— transmission lines and termination
Chip inputs and outputs
— clock distribution and DLLs
— I/O standards
Power and thermal considerations
— temperature sensing diode
— power supply decoupling
Configuration
— new SelectMAP mode
11
Capacitive Loading
Capacitance slows outputs and increases power
— output delay increase:
– ~ 25 ps per pF of additional loading
— output power dissipation increase:
– 11 µW per MHz per pF with 3.3-V swing
Sources of capacitance
— 10 pF max for each device pin
— 2 pF per inch for narrow traces (0.8 pF/cm)
— 130 pF per inch² for copper areas (20 pF/cm²)
IBIS files provide output impedance details
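The rules of thumb above can be turned into a quick load estimate. The following is a minimal sketch, assuming the slide's figures (25 ps per pF, 2 pF per inch, 10 pF per pin) and the usual dynamic-power relation P = C·V²·f; the function and parameter names are illustrative, not part of any Xilinx tool.

```python
# Rules of thumb quoted on the slide; all names here are illustrative.
DELAY_PS_PER_PF = 25.0  # added output delay per pF of extra load


def added_delay_ps(extra_load_pf):
    """Extra output delay in picoseconds for additional load capacitance."""
    return DELAY_PS_PER_PF * extra_load_pf


def output_power_uw(load_pf, toggle_mhz, swing_v=3.3):
    """Dynamic output power in microwatts, using P = C * V^2 * f."""
    # pF * V^2 * MHz = 1e-12 * 1e6 W = 1e-6 W, so the result is in microwatts.
    return load_pf * swing_v ** 2 * toggle_mhz


def load_capacitance_pf(trace_inches, pins=1, pf_per_inch=2.0, pf_per_pin=10.0):
    """Worst-case load: a narrow trace plus the device pins hanging on it."""
    return trace_inches * pf_per_inch + pins * pf_per_pin


load = load_capacitance_pf(trace_inches=4, pins=2)  # 8 pF of trace + 20 pF of pins
print(load, added_delay_ps(load), output_power_uw(load, toggle_mhz=50))
```

With a 3.3-V swing the power function gives about 10.9 µW per MHz per pF, which rounds to the 11 µW quoted on the slide.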
12
Transmission Lines
Some traces must be treated as transmission lines
to minimize ringing
— transmission line if round trip > transition time
— lumped-capacitance if round trip < transition time
Signal delay on a PCB:
— 140 to 180 ps per inch ( 50 to 70 ps/cm)
Lumped-capacitance trace length:
— 3 inches max for a 1-ns transition time (7.5 cm)
— 6 inches max for a 2-ns transition time (15 cm)
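The round-trip rule can be checked numerically. This is a sketch only: it assumes a propagation delay of 160 ps per inch (the midpoint of the 140 to 180 ps range above), and the function names are made up for illustration.

```python
def round_trip_ps(length_in, delay_ps_per_in=160.0):
    """Time for a signal edge to travel down the trace and back."""
    return 2.0 * length_in * delay_ps_per_in


def is_transmission_line(length_in, transition_ns, delay_ps_per_in=160.0):
    """True when the round trip exceeds the transition time."""
    return round_trip_ps(length_in, delay_ps_per_in) > transition_ns * 1000.0


def max_lumped_length_in(transition_ns, delay_ps_per_in=160.0):
    """Longest trace that can still be treated as a lumped capacitance."""
    return transition_ns * 1000.0 / (2.0 * delay_ps_per_in)


for t_ns in (1.0, 2.0):
    print(t_ns, "ns transition ->", max_lumped_length_in(t_ns), "inches")
```

At 160 ps per inch the limits come out slightly above 3 and 6 inches, so the slide's round numbers leave a small margin.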
13
Terminated Transmission Lines
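Reflections and ringing
Traditional Thevenin termination at the end
[Figure: 100 Ω to VCC and 100 Ω to ground terminating a 50 Ω line]
Dynamic termination at the end is better and saves power
[Figure: 50 Ω in series with 100 pF terminating a 50 Ω line]
Series termination at the source is best, for a single source and destination only!
[Figure: 22 Ω and 27 Ω in series (50 Ω total) driving a 50 Ω line]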
14
On-Chip Clock Distribution
[Figure: clock and data paths reaching the CLBs and IOBs]
Clock distribution introduces delay
— larger chips suffer more clock delay
15
Clock Delay Problems
Clock delay increases clock-to-output times
Clock delay leads to unacceptable input hold time
— set-up time is negative
Additional data delay can eliminate the hold time
— set-up time becomes positive
— but tolerance build-up widens the data-valid window
[Figure: IOB flip-flop with a data delay on D and the clock distribution delay on Clock; timing diagram of the required data-valid window without delay and with delay]
16
DLLs Maximize I/O Speed
Clock-to-output time plus set-up time determines
the I/O speed and data bandwidth
— min clock period = max clock-to-out + max set-up
Traditional solution:
— use highly buffered, balanced clock trees
– needed to reduce internal clock skew
– cannot totally eliminate the delay
The Virtex solution:
— use a Delay-Locked-Loop ( DLL )
– aligns the internal and external clocks
– effectively eliminates the clock-distribution delay
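The minimum-period relation above can be sketched as a small calculation. The optional board-delay term and the assignment of the example values are assumptions: the LVTTL slide shows 3.8 ns, 1.9 ns, and 0.5 ns in its figure without labeling which is which, and the function name is illustrative.

```python
def max_io_rate_mhz(clock_to_out_ns, setup_ns, board_delay_ns=0.0):
    """Highest clock rate allowed by min period = clock-to-out + set-up.

    board_delay_ns is an added, optional term for trace delay between chips;
    it is not part of the slide's formula.
    """
    period_ns = clock_to_out_ns + setup_ns + board_delay_ns
    return 1000.0 / period_ns


# Assumed reading of the LVTTL figure: 3.8 ns out, 1.9 ns set-up, 0.5 ns trace.
print(round(max_io_rate_mhz(3.8, 1.9, 0.5), 1))
```

Under that reading the three delays sum to 6.2 ns, which is consistent with the 160 MHz inter-chip data rate quoted for LVTTL.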
17
Virtex Has 4 Independent DLLs
[Figure: DLL block diagram with a comparator, error signal, and delay element in the clock path to the CLBs and IOBs]
DLLs adjust clock delay to align internal and external clocks
— digital closed-loop control
— 25 to 200-MHz range, 35-picosecond resolution
18
Fast Clock-to-Out With DLL
160 MHz inter-chip data rate
— 16-mA LVTTL
— IOB register to IOB register
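[Figure: two Virtex FPGAs, each with a DLL on a shared clock, connected from a D/Q output register to an input register; delays shown are 3.8 ns, 0.5 ns, and 1.9 ns]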
19
LVTTL Data Rate with DLL
1.4 ns measured clock-to-output delay
Output standard = LVTTL Fast 16mA
(OBUF_F_16)
Temp=100C, Vdd=2.375V,
Vcco=3.3V
Waveforms:
1: CLKIN
2: DATA OUT (no DLL)
3: DATA OUT (DLL deskewed)
Timing      r->r    r->f
w/o DLL     3.9n    3.9n
w/ DLL      1.4n    1.4n
20
Other DLL Functions
Double the incoming clock frequency
— fast internal operation – slow external clock
Clock mirroring to the PCB
Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
Adjust clock duty cycle to 50-50
Create four quadrature clock phases
— input four sequential bits per clock period
21
Duty Cycle Correction
~25% duty cycle in – 50% duty cycle out
[Figure: Virtex FPGA DLL (1X) taking a 25 MHz, 25% duty cycle input and producing a 25 MHz, 50% duty cycle output]
22
Clock Doubling and Mirroring
Clock mirror with less than 100 ps skew
— simplifies PCB clock distribution
[Figure: actual HDTV customer example. A 37 MHz system clock presents 1 input load to the Virtex; DLL 1 and DLL 2 provide a zero-delay internal clock buffer and mirrored 74 MHz clocks #1 and #2 for the SDRAMs. Waveforms for the 74 MHz internal, 37 MHz internal, system, and SDRAM clocks are exactly aligned.]
23
Precise Clock Mirroring
2x system clock for board use
[Figure: Virtex FPGA DLL (2X) turning a 66 MHz clock into a 132 MHz clock]
24
Clock Division
Divide clock by 1.5, 2, 2.5, 3, 4, 5, 8, or 16
— maintain synchronous edges
CLKIn 200 MHz
CLKout 200 MHz
CLKDV 12.5 MHz
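The divided-clock frequency for each listed divisor is simple arithmetic. A minimal sketch, with illustrative names that do not correspond to any Xilinx primitive attribute:

```python
# Divisors listed on the slide.
CLKDV_DIVISORS = (1.5, 2, 2.5, 3, 4, 5, 8, 16)


def clkdv_mhz(clkin_mhz, divisor):
    """Divided clock frequency for one of the supported divisors."""
    if divisor not in CLKDV_DIVISORS:
        raise ValueError(f"unsupported divisor: {divisor}")
    return clkin_mhz / divisor


for d in CLKDV_DIVISORS:
    print(d, clkdv_mhz(200.0, d))
```

Dividing the 200 MHz input by 16 gives the 12.5 MHz CLKDV shown in the waveform.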
25
Multi-Standard SelectI/O
GTL+
MicroProcessor
2.5V SSTL
SRAM
1.8V
SDRAM
5V Tolerant
FLASH
Mixed Signal
5V
3.3V LVTTL
Busses/Backplanes
(3/5V PCI, ISA, GTL…)
DSP
26
Mix & Match Output Standards
User-supplied voltages determine output swing
— 3.3 V, 2.5 V, 1.5 V
— one voltage per bank
— a bank is half of a chip edge
Output characteristics are programmable on a
per-pin basis
— push-pull or open-drain
— LVTTL drive strength
– 2-mA to 24-mA sink and source current
— LVTTL Slew rate
27
Mix & Match Input Standards
Internal or user-supplied threshold voltage
— selectable on a per-pin basis
— one user-supplied threshold voltage per bank
Programmable over-voltage protection
— 5-V tolerant or diode clamp to VCCO
— selectable on a per-pin basis
[Figure: inputs in a bank using an internal reference or sharing the VREF pins]
28
SSTL Clock-to-Out With DLL
200 MHz inter-chip data rate
— SSTL 3, Class II
— IOB register to IOB register
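[Figure: two Virtex FPGAs, each with a DLL on a shared clock, connected from a D/Q output register to an input register; delays shown are 2.8 ns, 0.3 ns, and 1.9 ns]
(SSTL = Stub Series Terminated Logic)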
29
SSTL Data Rate with DLL
1.3 ns measured clock-to-output delay
— much lower noise than LVTTL
Output standard = SSTL 3 Class 2
(OBUF_SSTL3_II)
Temp=100C, Vdd=2.375V, Vcco=3.3V,
Vtt=1.5V
Waveforms:
1: CLKIN
2: DATA OUT (no DLL)
3: DATA OUT (DLL deskewed)
Timing      r->r    r->f
w/o DLL     3.5n    3.8n
w/ DLL      1.1n    1.3n
30
From FPGA to System Component
‘Redefining the FPGA’
x1 CLK
Chip 1
Cache SRAM (Mbytes)
x2 CLK
LVCMOS
SSTL3
LVTTL
Low Voltage
CPU
GTL+
SDRAM (133MHz)
Chip 1
High Speed System Backplane
"Virtex moves FPGAs from glue to system component” - Ron Neale, EE
31
Power and Thermal Issues
Power and heat are serious concerns
All CMOS power consumption is dynamic
— proportional to VCC²
— proportional to capacitance
— proportional to frequency
Virtex conserves power
— 2.5-V supply voltage
— small geometries and short interconnects
reduce capacitance
32
Virtex Power Consumption
Virtex is designed to conserve power
— 100 MHz 16-bit counters
– 12.5 MHz average transition rate
– 6.5 mW per counter including clock distribution
— 100 MHz 8-bit counters
– 25 MHz average transition rate
– 5 mW per counter including clock distribution
Device     16-bit counters         8-bit counters
XCV300     384 (2.5 W total)       768 (3.7 W total)
XCV1000    1536 (9.8 W total)      3072 (14.7 W total)
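The average transition rates quoted for the counters follow from how a binary counter toggles. The sketch below assumes a plain binary up-counter in which bit i changes state once every 2^i clocks; the function name is illustrative.

```python
def average_transition_rate_mhz(clock_mhz, bits):
    """Mean toggle rate across all bits of a binary counter.

    Bit i of a binary counter changes state once every 2**i clocks,
    so its transition rate is clock / 2**i.
    """
    rates = [clock_mhz / (2 ** i) for i in range(bits)]
    return sum(rates) / bits


print(average_transition_rate_mhz(100.0, 16))
print(average_transition_rate_mhz(100.0, 8))
```

At 100 MHz this gives just under 12.5 MHz for 16 bits and just under 25 MHz for 8 bits, matching the rounded figures above.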
33
Thermal Management
Temperature-sensing diode
— matched to Maxim MAX1617 A/D
— programmable alarms
— similar to the Pentium II solution
[Figure: Virtex FPGA diode pins DXP and DXN connected to a Maxim MAX1617, which provides SMBCLK, SMBDATA, and ALERT]
34
Power Supply Decoupling
CMOS power-supply current is dynamic
— current pulse every active clock edge
Peak current can be 5x the average current
— instantaneous current peaks can only be
supplied by decoupling capacitors
Use one 0.1 µF ceramic chip capacitor for each
power-supply pin
— low L and R are more important than high C
— double up for lower L and R if necessary
— use direct vias to the supply planes, close to the
power-supply pins
35
Virtex Configuration
New byte-wide SelectMAP mode
— up to 528 Mbps at 66 MHz
– simple handshake protocol
— up to 400 Mbps at 50 MHz
– no handshake required
Configuration bit-stream length
— 0.5 Mbits to 6.1 Mbits
[Figure: control logic (EPLD) supplying Address and CS to a configuration EPROM, with Data, WE, CS, and Busy connected to the Virtex FPGA]
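The SelectMAP rates and bit-stream lengths above imply a best-case configuration time. This is a rough sketch that assumes one byte per clock with no Busy stalls and treats 1 Mbit as 10^6 bits; the function names are illustrative.

```python
def selectmap_rate_mbps(clock_mhz, bus_bits=8):
    """Peak SelectMAP throughput: one byte-wide transfer per clock."""
    return clock_mhz * bus_bits


def config_time_ms(bitstream_mbits, clock_mhz):
    """Best-case time to load a bit-stream, ignoring handshake stalls."""
    return bitstream_mbits / selectmap_rate_mbps(clock_mhz) * 1000.0


print(selectmap_rate_mbps(66), selectmap_rate_mbps(50))
print(round(config_time_ms(6.1, 66), 2), "ms for the largest bit-stream")
```

Under these assumptions even the 6.1-Mbit bit-stream loads in roughly 12 ms at 66 MHz.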
36
Volts, Amps, and Watts: Recap
PCB design issues
— minimize capacitance for higher speed
— terminate transmission lines to reduce ringing
Chip inputs and outputs
— use DLLs to maximize I/O bandwidth
— use SelectI/O to interface with different standards
Power and thermal considerations
— use the sensing diode to manage chip temperature
— decouple the power supply well
Configuration
— configure faster with the SelectMAP mode
37
Designing for 100+ MHz.
Volts, Amps, and Watts
— PCB signal distribution
— chip inputs and outputs
— power and thermal considerations
Ones and zeros
— logic emulation
Bits and bytes
— memory hierarchy
38
Spending the 10 ns Budget
Fast logic requires fast function generators
— signals often pass through several
function generators
Routing delays must also be kept short
— there are routing delays between every
function generator
Arithmetic delays are important
— carry chains often create critical paths
39
You Don’t Have To Be An Expert
You don’t have to be an FPGA architecture expert to
implement high-performance designs
— the benefits of a good architecture are automatic
– all the logic goes faster
– software provides easy access to the features
You can achieve high-performance only with a good
FPGA architecture
— a good FPGA empowers its users
You’ll design better if you know the architecture
— matching your design style to the available features
increases performance and/or lowers cost
40
Virtex CLB
Logic and arithmetic delay reduction demands
improvements in the CLB
Virtex CLB is divided into two slices, each with:
– 2 function generators
– 2 flip-flops
– 2 bits of carry logic
[Figure: Virtex CLB with two slices, each containing two function generators and two carry blocks]
41
Fast Function Generators
Each function generator emulates
2 to 3 levels of logic
— a 10-level logic path typically requires
3 to 5 Function Generators in series
— at 100 MHz, they must be less than
2 ns each including the routing
Virtex has 0.6-ns function generators
— leaves 1.4 ns for each route
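The per-stage arithmetic above can be written out as a budget calculation. Like the slide, this sketch ignores flip-flop clock-to-out and set-up times, and the function names are illustrative.

```python
def per_stage_budget_ns(clock_mhz, stages):
    """Time available per function generator, including its routing."""
    return 1000.0 / clock_mhz / stages


def routing_budget_ns(clock_mhz, stages, lut_delay_ns=0.6):
    """What is left for each route after the function-generator delay."""
    return per_stage_budget_ns(clock_mhz, stages) - lut_delay_ns


for stages in (3, 4, 5):
    print(stages, round(per_stage_budget_ns(100.0, stages), 2),
          round(routing_budget_ns(100.0, stages), 2))
```

Five function generators in series at 100 MHz is the worst case named on the slide: 2 ns per stage, of which 1.4 ns remains for routing.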
42
Connecting Function Generators
Some functions need several function generators
— F5 MUXs connect pairs of function generators
– functions with 5 to 9 inputs
— F6 MUXs connect all 4 function generators
– functions with 6 to 17 inputs
[Figure: four function generators; each pair is combined by an F5 MUX, and the two F5 outputs are combined by an F6 MUX]
43
Fast Local Routing
Local routing provides fast interconnects
— in a CLB, Function Generators connect with minimal
routing delays
— fast paths between adjacent CLBs increase flexibility
[Figure: two adjacent CLBs, each with four function generators and four carry blocks, joined by local routing]
44
Use Pipelining for Speed
Shorter clock periods mean doing less each period
— create a pipeline structure
— pipeline stages operate concurrently
— more functions are done at the same time
— throughput increases
All function generators have output flip-flops
— most pipeline support is “free”
45
16-Bit Pipeline in One LUT
In directly cascaded pipelines the flip-flops are not free
One SRLUT can implement up to 16 bits of delay
— shift data in and select the appropriate tap
[Figure: 16-bit shift register with Input, a Delay select that picks the tap, and Output]
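The shift-register LUT behaviour described above can be modeled in a few lines. This is a behavioral sketch, not vendor code: the class name, the convention that tap 0 holds the newest bit, and the choice to read the tap after the shift are all assumptions.

```python
class ShiftRegisterLut:
    """Behavioral model of a 16-bit shift register with a selectable tap."""

    DEPTH = 16

    def __init__(self):
        self.bits = [0] * self.DEPTH  # bits[0] is the most recently shifted bit

    def shift(self, bit):
        """Clock one new bit in; the oldest bit falls off the end."""
        self.bits = [bit & 1] + self.bits[:-1]

    def read(self, tap):
        """Read the selected tap (0 to 15) without disturbing the contents."""
        return self.bits[tap]


def delay_line(samples, tap):
    """Delay a bit stream by `tap` samples using one shift-register LUT."""
    srl = ShiftRegisterLut()
    out = []
    for s in samples:
        srl.shift(s)
        out.append(srl.read(tap))
    return out


print(delay_line([1, 0, 1, 1, 0, 0, 0, 0], tap=2))
```

Changing the tap changes the pipeline depth without using any extra flip-flops, which is the point of the slide.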
46
Fast Logic Needs Fast Routing
Our typical design with 3 to
5 CLBs needed an average
routing delay of 1.4 ns or less
Vector-based Interconnect
— the Virtex routing
architecture delivers
this performance
Delay is independent
of direction
— dependably
short delays
The circles show 1.4-ns routing regions
47
Go Farther, Faster
Virtex achieves its speed through a hierarchy of
highly buffered routing resources
— wires span 1, 2, or 6 CLBs
The Virtex routing architecture is designed for
large arrays
— today’s FPGAs are big…
but tomorrow’s will be even bigger
Virtex is designed to maintain its performance
even in very large arrays
48
No Routing Congestion
For high-speed applications, routing must be
dependably fast
— not just capable of being fast
In the past, high device utilization has caused routing
congestion
— critical nets might be forced to meander
Virtex minimizes these problems
— abundant resources prevent congestion
If it needs to be fast, it will be fast – automatically!
49
Built-in Tri-State Busses
Bi-directional busses are supported directly by
tri-state buffers built into each CLB
— two drivers per CLB
— segmentable every four CLB columns
[Figure: tri-state bus running along a row of CLBs]
50
Arithmetic – A Special Case
Adders, accumulators, counters, and comparators
all depend on carry chains
Carry-chain logic is usually much deeper than the
rest of the design
— 32 levels for a 16-bit ripple adder
— too deep to use function generators at 100 MHz
— arithmetic delays would limit performance
Dedicated carry logic provides the desired speed
— 16-bit adders can operate at up to
200 MHz register-to-register
51
Wide Arithmetic
64-bit adders would require 128 levels of logic
— expensive complex carry schemes would be needed
to preserve performance
Virtex minimizes the carry propagation delay
— 100 ps per bit pair
— zero routing delay between CLBs
Minimal performance loss for each extra bit
16-bit adders operate at up to 200 MHz
64-bit adders operate at up to 135 MHz
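The adder figures above are consistent with a simple carry-delay model. The sketch assumes that widening an adder beyond 16 bits only adds 100 ps per extra bit pair to the 16-bit register-to-register period, with everything else unchanged; that model and the function name are assumptions, not a Xilinx timing specification.

```python
def adder_fmax_mhz(width_bits, base_width=16, base_fmax_mhz=200.0,
                   carry_ps_per_bit_pair=100.0):
    """Estimated register-to-register clock rate for a wider adder."""
    base_period_ns = 1000.0 / base_fmax_mhz
    extra_pairs = (width_bits - base_width) / 2
    period_ns = base_period_ns + extra_pairs * carry_ps_per_bit_pair / 1000.0
    return 1000.0 / period_ns


for width in (16, 32, 64):
    print(width, round(adder_fmax_mhz(width), 1))
```

For 64 bits the model adds 24 bit pairs, or 2.4 ns, to the 5 ns base period, giving about 135 MHz as stated on the slide.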
52
Efficient Virtex Multipliers
Cascade vs. tree structure
— cascade simpler and smaller
— tree is faster
Virtex gives the best of both worlds
— as fast as a tree
— smaller than a cascade
160 MHz clock rate for pipelined 16 x 16 multiplier
[Figure: two charts, Delay and Number of CLBs, comparing Cascade, Tree, and Virtex Tree multipliers at 4x4, 8x8, and 16 x 16]
53
Fast Address Decoders
Wide address decoders could slow operation
— wide AND gates with invertible inputs
Virtex carry-chain MUXs can act as AND gates
— combine function generator ANDs
64-bit decoders operate at up to 155 MHz
[Figure: chain of carry MUXs, each selected by a function-generator output, forming a wide AND]
54
Speed Is Never Wasted
You can never have too much performance
— excess performance can always be traded for
size and cost reduction
Replace single-cycle functions with smaller
multi-cycle versions
— a 2-cycle multiplier is half the cost of a
single-cycle multiplier
Reduce costs by designing down
to the performance you need
55
Creating a High-Speed Clock
Logic sometimes needs to operate faster than
the available clock
— multiple RAM accesses in a single cycle
— low-speed PCB clock distribution for power or
noise reduction
Virtex DLLs can double and redouble
incoming clocks
[Figure: DLL1 (2X) doubles 45 MHz to 90 MHz; DLL2 (2X) doubles 90 MHz to 180 MHz]
56
Optimized for the Future
Deep sub-micron technology permits larger
and larger array sizes
— poses new circuit-design challenges
— changes the rules of FPGA architecture
Across-chip routing is the most vulnerable
— could easily limit design performance
Virtex is designed for long-term growth
— even long, across-chip routes will remain fast
Virtex is tomorrow’s FPGA
… today!
57
10 ns is Long Enough
Virtex CLBs can implement relatively complex
functions in 10 ns
— 0.6 ns per 4-input function generator
Virtex offers fast interconnections
— even across-chip when fully utilized
— fast tri-state buses
Support for very fast arithmetic operations
— 16-bit adders at 200MHz
58
Implement Designs Automatically
You don’t have to be an FPGA wizard to use Virtex
Virtex is optimized for automated implementation
— uniform structure
– efficient mapping/synthesis
— ample routing
– simple placement and no congestion
— predictable performance
– effective synthesis
IP cores speed design even more
— validated functionality with guaranteed performance
59
Designing for 100+ MHz
Volts, Amps, and Watts
— PCB signal distribution
— chip inputs and outputs
— power and thermal considerations
Ones and zeros
— logic emulation
Bits and bytes
— memory hierarchy
60
100+ MHz Memory
Virtex memory operates up to 200 MHz
High-speed memory has two benefits
— data storage
– “work-in-progress”
– input/output buffers, FIFOs
— accelerating complex functions
– store pre-computed values in look-up tables
61
Data Storage Hierarchy
Virtex supports 3 levels of memory hierarchy
On-chip SelectRAM+
— small-to-medium memories
— 0.6-ns read access time
On-chip Block SelectRAM+
— larger memories
— true dual-ported operation
— 3.3-ns read access time
Fast SelectI/O interfaces to external RAM
— DLL boosts memory bandwidth
62
SelectRAM+
SelectRAM+ uses CLB LUTs as user memory
— 16-deep RAMs
— 32-deep RAMs
— 16-deep dual-ported RAMs
— 16-deep shift registers
Cascadable for larger memories
— 128 or more words deep
— uses logic resources for expansion
63
Block SelectRAM+
Up to 32 dual-ported 4096-bit RAM Blocks
— synchronous read and write
True dual-port memory
— each port has full read and write capability
— different clocks for each port
Configurable aspect ratio
— trade width for depth
– 4096 x 1 bit to 256 x 16 bits
— separate configurations for each port
Dedicated routing for memory expansion
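The width-for-depth trade can be enumerated directly. The slide gives only the two end points (4096 x 1 and 256 x 16); this sketch assumes the intermediate port widths are the powers of two in between, and the names are illustrative.

```python
BLOCK_BITS = 4096  # capacity of one Block SelectRAM+ block


def aspect_ratios(min_width=1, max_width=16):
    """List (depth, width) pairs for power-of-two port widths."""
    ratios = []
    width = min_width
    while width <= max_width:
        ratios.append((BLOCK_BITS // width, width))
        width *= 2
    return ratios


def reads_per_write(write_width, read_width):
    """Narrow reads needed to drain one wide write (width conversion)."""
    return write_width // read_width


print(aspect_ratios())
print(reads_per_write(16, 4))
```

Configuring one port as 256 x 16 and the other as 1024 x 4, for example, turns each 16-bit write into four 4-bit reads, which is the per-port configuration the slide refers to.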
64
High-Speed Memory Interfaces
SelectI/O and DLLs together provide fast access to
many types of external memory
Xilinx currently offers two reference designs
— fully synthesized
— automatic placement and routing
SDRAM … up to 125 MHz
ZBTRAM (Zero Bus-Turn-around) … up to 143 MHz
65
Input/Output Data Buffers
High-performance systems need data buffers to
decouple internal operation from I/O activity
— I/O may be sporadic (burst-mode busses)
— I/O may be faster or slower
— I/O may be wider or narrower
I/O buffers can take several forms
— dual-ported RAMs
— ping-pong buffers
— FIFOs
66
Dual-ported I/O Buffers
Block SelectRAM+ is ideal for I/O buffers
— dual-ported operation
– independent clocks and controls
– bridges between clock domains
– simultaneous read and write
— port-specific aspect-ratio control
– built-in rate/width conversions
SelectRAM+ provides similar benefits
on a smaller scale
67
Ping Pong Buffers
Ping-pong buffers are pairs of blocks that alternate between input and processing
SRLUT for small buffers
— self-addressing input
— 0.6-ns read access
Larger buffers can use the dual-ported Block RAM
— one address bit alternates read/write areas
— 3.3-ns read access
[Figure: two 16-bit shift registers sharing Input, with Select, Read Address, and Output]
68
Small FIFOs in SRLUTs
Small FIFOs can be implemented in SRLUTs
— word count addresses the output data
— increment and enable SRLUT to Push
— decrement to Pop
— enable only for both
16-Byte FIFO in 4 CLBs
— 16 x 16 in 6 CLBs
— 200+ MHz
Expandable for deeper FIFOs
[Figure: up/down word counter (Push counts up, Pop counts down) addressing a 16-bit shift register between Input and Output]
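The push and pop rules above can be modeled behaviorally. This is a sketch under assumptions: the class and method names are made up, position 0 is taken to hold the newest word, and the output address is taken to be the word count minus one.

```python
class SrlFifo:
    """Behavioral model of a 16-deep FIFO built on a shift register."""

    DEPTH = 16

    def __init__(self):
        self.regs = [None] * self.DEPTH  # regs[0] holds the newest word
        self.count = 0                   # number of words currently stored

    def clock(self, push=False, pop=False, data=None):
        """Advance one clock; return the popped word, or None if not popping."""
        if pop and self.count == 0:
            raise IndexError("pop from empty FIFO")
        if push and not pop and self.count == self.DEPTH:
            raise OverflowError("push to full FIFO")
        # The word counter addresses the oldest stored word.
        out = self.regs[self.count - 1] if pop else None
        if push:
            # Enable the shift register: every word moves one place deeper.
            self.regs = [data] + self.regs[:-1]
        # Push increments, pop decrements, both together leave the count alone.
        self.count += int(push) - int(pop)
        return out


fifo = SrlFifo()
fifo.clock(push=True, data="a")
fifo.clock(push=True, data="b")
print(fifo.clock(pop=True), fifo.clock(push=True, pop=True, data="c"), fifo.count)
```

Note that the stored data never moves on a pop; only the counter changes, which is why the structure stays small and fast.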
69
Large FIFOs in Block RAM
Large FIFOs can use the dual-ported block RAM
— add read and write address counters
Asynchronous push and pop
Different port sizes give rate-for-width conversion
Block RAM FIFOs can operate at up to 170 MHz including flag logic
[Figure: Block SelectRAM+ with an input side (Data, Addrs from a counter enabled by Push, WE, Full) and an output side (Data, Addrs from a counter enabled by Pop, Empty), coordinated by control logic]
70
Pre-computing for Speed
Some functions are too complex for 10-ns
logic implementation
— pipelining is not always possible
An alternative is to pre-compute all the possible
results and store them in memory
— select a result according to the inputs
Function time is independent of complexity
— 0.6 ns SelectRAM+ access time
— 3.3 ns Block SelectRAM+ access time
The function table can be smaller than the logic
71
Multiplication By A Constant
Sometimes, data has to be “scaled”
— multiplied by a constant value
A full multiplier is too expensive
— it can multiply by a variable
— unnecessarily general and too
complex
Storing all multiples of the constant is a better alternative
— smaller and much faster
[Figure: a multiplier array taking Input and Constant to produce Scaled Data, compared with a product table taking Input to produce Scaled Data]
72
16-bit Scaler
A 2^16-word product table is impractical
— partition the input into nibbles
– use 16-word LUTs for nibble products
– combine the partial products in adders
Roughly half the CLBs of a full multiplier
— for a 16-bit coefficient: 36 CLBs vs. 62 CLBs
Pipeline the adders for extra speed
[Figure: the four input nibbles address four LUTs; the partial products, weighted x4096, x256, x16, and x1, are added to form Scaled Data]
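The nibble-partitioned scaler can be sketched in software to show why the result equals a full multiplication. The function names are illustrative; in hardware the four lookups would be separate 16-word LUT banks and the shifts would be wiring.

```python
def build_product_table(constant):
    """The 16 multiples of the constant, one per possible nibble value."""
    return [constant * n for n in range(16)]


def scale16(x, table):
    """Multiply a 16-bit input by the constant using four nibble lookups."""
    if not 0 <= x < (1 << 16):
        raise ValueError("input must fit in 16 bits")
    total = 0
    for nibble_index in range(4):
        nibble = (x >> (4 * nibble_index)) & 0xF
        # Weight each partial product by 1, 16, 256, or 4096.
        total += table[nibble] << (4 * nibble_index)
    return total


table = build_product_table(12345)
print(scale16(0x1234, table), 12345 * 0x1234)
```

All four lookups use the same 16-entry table contents, so changing the constant only requires recomputing 16 products.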
73
Changing the Constant
The SRLUT mode can be used to update the table
— “push-only” stack
— last 16 bits loaded define the table
A simple accumulator computes all products of a new constant
[Figure: a constant register and an accumulator with Clear, Load, and Change Constant controls loading the 16-bit shift register that sits between Input and the output register]
74
Large Function Tables
Larger functions can be implemented in the
Block SelectRAM+
— 12-input functions
— micro-coded state machines
Data tables can also be implemented
— sine/cosine tables for DSP, for example
— dual-ported access gives the sine and cosine
simultaneously
— a simple address offset gives 90º phase shift for
accessing sine and cosine from a single table
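The single-table sine/cosine trick relies on cos(θ) = sin(θ + 90º). The sketch below assumes a 256-entry table of 16-bit signed values, chosen here because 256 x 16 fits one 4096-bit block; the table size, scaling, and names are assumptions.

```python
import math

TABLE_SIZE = 256   # one full cycle; 256 x 16 bits fits a single 4096-bit block
AMPLITUDE = 32767  # largest positive 16-bit signed value

# Contents that would be loaded into the block RAM at configuration time.
SINE_TABLE = [round(AMPLITUDE * math.sin(2 * math.pi * k / TABLE_SIZE))
              for k in range(TABLE_SIZE)]


def sin_cos(phase_index):
    """Read sine and cosine from the same table through two addresses."""
    sin_addr = phase_index % TABLE_SIZE
    # A quarter-cycle address offset is a 90-degree phase shift.
    cos_addr = (phase_index + TABLE_SIZE // 4) % TABLE_SIZE
    return SINE_TABLE[sin_addr], SINE_TABLE[cos_addr]


print(sin_cos(0), sin_cos(TABLE_SIZE // 4))
```

In a dual-ported block the two addresses would go to the two ports, so both values arrive in the same access.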
75
Block RAM/ROM Creation
CORE Generator software creates RAMs and ROMs
— simple GUI interface
Initialization file is loaded into RAMs and ROMs at configuration time
76
Memory Summary
Virtex has two kinds of internal memory
— distributed SelectRAM+ for small RAMs
— Block SelectRAM+ for larger RAMs
SelectRAM+
— 0.6 ns read access time
— 16- and 32-word RAMs
— 16-word dual-ported RAMs
— 16-word shift registers
– sequential write/random-access read
– FIFOs, pipelining, LUT functions, etc...
77
Memory Summary
Dual-ported 4096-bit Block SelectRAM+
— 3.3 ns read access time
— true dual-ported operation
– both ports are read/write
– ports can be clocked asynchronously
— configurable aspect ratio
– 4096 x 1 bit to 256 x 16 bits
– configure ports differently for width/rate conversion
High-speed SelectI/O access to external RAM
78
Designing for 100+ MHz
Volts, Amps, and Watts
— DLLs and flexible I/O standards
— fast inter-chip communication
— simple rules for good signal integrity
Ones and zeros
— fast logic and fast interconnect
— dependable high performance
Bits and bytes
— distributed SelectRAM+
— dual-ported Block SelectRAM+
79
The Virtex Family
Device     System Gates   Logic Cells   Block RAM
XCV50      57,906         1,758         32 Kb
XCV100     108,904        2,700         40 Kb
XCV150     164,674        3,888         48 Kb
XCV200     236,666        5,292         56 Kb
XCV300     322,970        6,912         64 Kb
XCV400     468,252        10,800        80 Kb
XCV600     661,111        15,552        96 Kb
XCV800     888,439        21,168        112 Kb
XCV1000    1,124,022      27,648        128 Kb
User I/O
CS144
94
94
TQ144
PQ/HQ240
94
164
94
164
164
164
164
164
164
164
BG256
BG352
BG432
BG560
180
180
260
260
260
316
316
404
316
404
316
404
404
FG256
FG456
FG600
FG680
176
404
404
500
404
514
514
176
176
260
176
284
312
The complete Virtex Data Sheet is on your AppLinx CD-ROM
and at www.xilinx.com/partinfo/virtex.pdf
80
Designing for 100+ MHz
81