Ethernet: Reduced Gigabit Media Independent Interface (RGMII)
Even worse than the original MII, GMII used too many pins. For full tri-mode (10/100/1000) operation, a full 25 were required. This was becoming problematic not only for switches, but also for ordinary processors and FPGAs.
As when the RMII Consortium formed to produce RMII, a group of silicon makers got together to produce a Reduced Gigabit Media Independent Interface (RGMII). Since this work was performed outside the Ethernet working group, this interface will not be found in IEEE 802.3. The standard needs to be sourced separately. With distribution largely unrestricted, copies of the standard are mirrored locally: RGMII Version 1.3, Version 2.0.
Note: There is no relationship between RMII and RGMII. Implementations and concepts from one will not translate to the other.
Signaling (Clause 3)
Broadly speaking, RGMII is a DDR version of GMII.
Each direction has six signals: a source-synchronous clock (RXC/TXC), a control signal (RX_CTL/TX_CTL), and a four-bit data bus (RD/TD), for a total of twelve signals.
The half-duplex signals, CRS and COL, are jettisoned entirely and need to be reconstructed from the other signals if required.
These signals map directly to the GMII signals.
The rising edge of the clock captures the GMII enable/valid signal (RX_DV/TX_EN) and the lower four data bits.
The falling edge of the clock captures the GMII error signal (RX_ER/TX_ER) and the upper four data bits with one caveat.
As the error signal is usually low while the valid signal is high, sending it unmodified would toggle the control line on every clock edge during a packet, increasing EMI and power consumption.
To reduce this, the falling edge actually holds the exclusive-or (XOR) of the valid and error signals (called RXERR/TXERR in the specification).
RGMII | Rising Edge | Falling Edge |
---|---|---|
RX_CTL | RX_DV | RX_DV ^ RX_ER |
RD[3] | RXD[3] | RXD[7] |
RD[2] | RXD[2] | RXD[6] |
RD[1] | RXD[1] | RXD[5] |
RD[0] | RXD[0] | RXD[4] |
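The mapping in the table above can be sketched in a few lines of Python. This is only an illustrative model, not an excerpt from any real MAC or PHY; the function names `rgmii_to_gmii` and `gmii_to_rgmii` are hypothetical, and each call represents the values sampled on the two edges of a single clock cycle.

```python
def gmii_to_rgmii(valid, error, data):
    """Split one GMII cycle into (ctl_rise, d_rise, ctl_fall, d_fall).

    valid/error are the GMII RX_DV/RX_ER (or TX_EN/TX_ER) bits and
    data is the eight-bit GMII data bus.
    """
    ctl_rise = valid & 1              # rising edge carries valid/enable
    ctl_fall = (valid ^ error) & 1    # falling edge carries valid XOR error
    d_rise = data & 0xF               # lower nibble on the rising edge
    d_fall = (data >> 4) & 0xF        # upper nibble on the falling edge
    return ctl_rise, d_rise, ctl_fall, d_fall


def rgmii_to_gmii(ctl_rise, d_rise, ctl_fall, d_fall):
    """Rebuild (valid, error, data) from the two edges of one RGMII cycle."""
    valid = ctl_rise & 1
    error = (ctl_fall & 1) ^ valid    # undo the XOR applied by the sender
    data = ((d_fall & 0xF) << 4) | (d_rise & 0xF)
    return valid, error, data
```

During an error-free packet both edges of the control line stay high, which is the EMI saving described above: `gmii_to_rgmii(1, 0, 0xD5)` yields a control value of 1 on both edges.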
[Figure: RGMII signaling illustrated as a waveform with a center-aligned clock]
Clocking (Clause 3.2, 3.3)
Important: Timing is the single most difficult aspect of RGMII. There are a lot of variables and configuration options. Even when using off-the-shelf implementations, one needs to pay close attention to the system design, component selection, and configuration to ensure a reliable link.
All buses in RGMII are source-synchronous. However, DDR signaling makes it more difficult to properly meet setup and hold timings. There was a significant revision in RGMII version 2.0 in order to address this.
The clocks are specified as nominal 125 MHz (8 ns), 25 MHz (40 ns), and 2.5 MHz (400 ns) for Gigabit, 100 Megabit, and 10 Megabit respectively. Tolerance is specified as the Ethernet standard of 50 ppm; however, the specification also permits the period to vary by up to 10% from the nominal value (e.g. ±0.8 ns at Gigabit). The duty cycle for Gigabit is 45% ~ 55%, widening to 40% ~ 60% for 10/100, stricter than both MII and GMII. The latter is convenient if one is doing a divide-by-five from 125 MHz in logic instead of switching clock frequencies internally. A jitter requirement of 100 ps was present in the original version but deleted in Version 1.2.
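As a quick check on the numbers above, the following Python sketch computes the period bounds for each speed and the duty cycle of a divided clock. The helper names are hypothetical, and the divider model is an assumption: a simple counter clocked on one edge of the 125 MHz clock that holds its output high for a whole number of input cycles.

```python
# Nominal RGMII clock frequency in MHz for each link speed in Mbps.
NOMINAL_CLOCK_MHZ = {1000: 125.0, 100: 25.0, 10: 2.5}


def period_bounds_ns(speed_mbps, variation=0.10):
    """Return (minimum, nominal, maximum) clock period in nanoseconds."""
    nominal = 1000.0 / NOMINAL_CLOCK_MHZ[speed_mbps]
    return nominal * (1 - variation), nominal, nominal * (1 + variation)


def divider_duty_cycle(divide_by, high_cycles):
    """Duty cycle of a counter-based divider (assumed single-edge)."""
    return high_cycles / divide_by


def duty_cycle_ok(duty, speed_mbps):
    """Check a duty cycle against the limits quoted in the text."""
    low, high = (0.45, 0.55) if speed_mbps == 1000 else (0.40, 0.60)
    return low <= duty <= high
```

Under that assumption, a divide-by-five that is high for two of five cycles gives a 40% duty cycle, which sits exactly on the 10/100 limit but would not be acceptable at Gigabit. The 7.2 ns minimum period for Gigabit is the value used for the clock constraints later on.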
In the latest version of the standard, the driver on each bus (receive and transmit) provides an Internal Delay (RGMII-ID) to produce a center-aligned clock. At the driver, the device should generate a minimum setup and hold of 1.2 ns (TsetupT/TholdT). At the receiver, the device should plan for a minimum setup and hold of 1.0 ns (TsetupR/TholdR). Together, this requires a length match on the PCB to within 200 ps. The standard also gives nominal values in addition to the minimums but, as these are equal to the entire half-period, they are not especially useful.
There are multiple viable options to create the delay.
Delay lines, such as Xilinx’s ODELAY, are an option and used within Xilinx’s soft cores.
The nuclear option is to simply run the internal logic faster and drive the clock and data on different cycles.
The most common technique is to emit clocks of different phase from the on-board PLL: the clock generating TXC is delayed 90° from that driving the data.
This guarantees the widest setup and hold timings across all operating frequencies.
Prior to version 2.0, it was specified that the bus driver would output the data edge-aligned with the clock (within ±500 ps, TskewT). However, it also specified that the receiver should expect the clock to be skewed 1.0 ns ~ 2.6 ns (TskewR), consistent with the setup and hold timings of RGMII-ID. This requires the board designer to add additional length to the clock trace relative to the control and data lines to produce the skew. Combining the worst cases leads to a target 1.5 ns ~ 2.1 ns of external delay on the clock (1.8 ns nominal), roughly a foot (30 cm) of extra line length on a standard FR-4.
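The worst-case combination can be worked through explicitly. The Python sketch below uses hypothetical helper names; the 60 ps/cm propagation figure is an assumption back-calculated from the "roughly a foot" estimate above and will vary with stack-up and routing layer.

```python
def external_delay_window(tskew_t=(-0.5, 0.5), tskew_r=(1.0, 2.6)):
    """Return (low, high, nominal) external clock delay in nanoseconds.

    The added delay d must keep the total skew inside TskewR for every
    transmitter skew in TskewT:
        d + tskew_t_min >= tskew_r_min  and  d + tskew_t_max <= tskew_r_max
    """
    low = tskew_r[0] - tskew_t[0]
    high = tskew_r[1] - tskew_t[1]
    return low, high, (low + high) / 2


def trace_length_cm(delay_ns, ps_per_cm=60.0):
    """Extra trace length for a delay; ps_per_cm is an assumed value."""
    return delay_ns * 1000.0 / ps_per_cm
```

With the specification values this gives a window of 1.5 ns to 2.1 ns centered on 1.8 ns, and about 30 cm of trace at the assumed propagation delay.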
Note: Effectively all PHYs in the wild conform to RGMII Version 2.0 with an internal delay on the output clock enabled by default (RGMII-ID). Given all the difficulty and confusion regarding clock phasing, most will also provide a programmable internal delay for both the receive and transmit clocks, accessible through the management interface. This can be convenient for FPGA designs, where the generation and reception of edge-aligned buses are frequently more straightforward.
Synopsys Design Constraints (SDC)
Compared to previous incarnations of xMII, the constraints for RGMII are much more complicated. Not only do we need to deal with DDR signaling, we need to also deal with the two versions of RGMII and the configuration options present in most PHYs.
First, we’ll start by defining our clocks, which are the same in all configurations. It’s important to use the minimum period so that we don’t miss timing at the extreme.
# From RGMII Specification, Version 2.0
set rgmii_period_min 7.2
set rgmii_period_nom 8.0
# Output (Transmit) Clock
# TODO: Update source and division to match logic
create_generated_clock -name TXC [get_ports TXC] \
-source [get_pins */TXC_gen/C] -divide_by 1
# Input (Receive) Clock
create_clock -name RXC -period $rgmii_period_min [get_ports RXC]
create_clock -name RXC_virt -period $rgmii_period_min
For output delays, it’s best to formulate the constraints using the RGMII-ID (Version 2.0) model.
We define the setup and hold in terms of the receiver and then adjust TXC latency to reflect any routing mismatch or input delay configured at the PHY.
In the case of a fully compliant RGMII-ID device, this means TXC uncertainty is bounded at ±0.2 ns (increasing the required output setup and hold to 1.2 ns).
If we’re adding external delay in keeping with RGMII Version 1.3 design guidelines, we’d set the TXC delay to 1.5 ns ~ 2.1 ns.
Additionally, if the PHY we’re using has more tolerant specifications, we can update the setup and hold to reflect the device’s datasheet (e.g. 0.8 ns in the case of the Microchip LAN8830).
# From RGMII Specification, Version 2.0
# TODO: Update to datasheet values from selected PHY.
set rgmii_setuprx 1
set rgmii_holdrx 1
# TODO: Update to reflect actual design. These values reflect the standard.
set rgmii_txc_min -0.2
set rgmii_txc_max +0.2
# Output (Transmit) Constraints (RGMII-ID v2)
set_output_delay -clock TXC \
-min [expr {-$rgmii_holdrx - $rgmii_txc_max}] \
[get_ports {TX_CTL TD[*]}]
set_output_delay -clock TXC -clock_fall -add_delay \
-min [expr {-$rgmii_holdrx - $rgmii_txc_max}] \
[get_ports {TX_CTL TD[*]}]
set_output_delay -clock TXC \
-max [expr {$rgmii_setuprx - $rgmii_txc_min}] \
[get_ports {TX_CTL TD[*]}]
set_output_delay -clock TXC -clock_fall -add_delay \
-max [expr {$rgmii_setuprx - $rgmii_txc_min}] \
[get_ports {TX_CTL TD[*]}]
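To sanity-check what the output constraints above evaluate to, the same arithmetic can be reproduced in Python. The function name `output_delays` is hypothetical; the expressions mirror the `expr` bodies in the constraints.

```python
def output_delays(setup_rx=1.0, hold_rx=1.0, txc_min=-0.2, txc_max=0.2):
    """Return (min, max) values for set_output_delay, in nanoseconds."""
    delay_min = -hold_rx - txc_max   # mirrors the -min expression
    delay_max = setup_rx - txc_min   # mirrors the -max expression
    return delay_min, delay_max
```

With the defaults this returns -1.2 ns and +1.2 ns, matching the 1.2 ns output setup and hold mentioned earlier. Passing `txc_min=1.5, txc_max=2.1` models the external-delay case and returns -3.1 ns and -0.5 ns.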
For input delays, it’s best to stick with a formulation consistent with how the device is configured, which is almost invariably RGMII-ID (Version 2.0).
Here, setup and hold are reversed from the previous block: we define setup and hold in terms of the transmitter and then adjust RXC latency to reflect the routing mismatch.
In the case of a fully compliant RGMII-ID device, this means RXC uncertainty is bounded at ±0.2 ns as before (reducing the required input setup and hold to 1.0 ns).
The integrated delay (~Tcyc/2) is already considered by the nature of the formulation.
If the PHY we’re using has more tolerant specifications, we can update the setup and hold to reflect the device’s datasheet (e.g. 1.4 ns in the case of the Microchip LAN8830).
# From RGMII Specification, Version 2.0
# TODO: Update to datasheet values from selected PHY.
set rgmii_setuptx 1.2
set rgmii_holdtx 1.2
# TODO: Update to reflect actual design. These values reflect the standard.
set rgmii_rxc_min -0.2
set rgmii_rxc_max +0.2
# Input (Receive) Constraints (RGMII-ID v2)
set_input_delay -clock RXC_virt \
-min [expr {$rgmii_holdtx - $rgmii_rxc_max}] \
[get_ports {RX_CTL RD[*]}]
set_input_delay -clock RXC_virt -clock_fall -add_delay \
-min [expr {$rgmii_holdtx - $rgmii_rxc_max}] \
[get_ports {RX_CTL RD[*]}]
set_input_delay -clock RXC_virt \
-max [expr {$rgmii_period_min / 2 - $rgmii_setuptx - $rgmii_rxc_min}] \
[get_ports {RX_CTL RD[*]}]
set_input_delay -clock RXC_virt -clock_fall -add_delay \
-max [expr {$rgmii_period_min / 2 - $rgmii_setuptx - $rgmii_rxc_min}] \
[get_ports {RX_CTL RD[*]}]
In the case of a Version 1.3 peer, we need to define everything in terms of skew.
As we did previously, we define the skew in terms of the transmitter and then adjust RXC latency to reflect the routing mismatch.
For a fully compliant implementation, this would be the specified delay of 1.5 ns ~ 2.1 ns; however, in an actual design this may be absent if an internal delay is added to the input clock instead (e.g. a Xilinx IDELAY) or the intrinsic latency added by the clock network is sufficient.
In the latter case, one might want to set the RXC latency to ±0.2 ns, in keeping with the RGMII-ID routing uncertainty.
# From RGMII Specification, Version 1.3
set rgmii_skewtx_min -0.5
set rgmii_skewtx_max +0.5
# TODO: Update to reflect actual design. These values reflect the standard.
set rgmii_rxc_min 1.5
set rgmii_rxc_max 2.1
# Input (Receive) Constraints (RGMII v1.3)
set_input_delay -clock RXC_virt \
-min [expr {$rgmii_period_min / 2 + $rgmii_skewtx_min - $rgmii_rxc_max}] \
[get_ports {RX_CTL RD[*]}]
set_input_delay -clock RXC_virt -clock_fall -add_delay \
-min [expr {$rgmii_period_min / 2 + $rgmii_skewtx_min - $rgmii_rxc_max}] \
[get_ports {RX_CTL RD[*]}]
set_input_delay -clock RXC_virt \
-max [expr {$rgmii_period_min / 2 + $rgmii_skewtx_max - $rgmii_rxc_min}] \
[get_ports {RX_CTL RD[*]}]
set_input_delay -clock RXC_virt -clock_fall -add_delay \
-max [expr {$rgmii_period_min / 2 + $rgmii_skewtx_max - $rgmii_rxc_min}] \
[get_ports {RX_CTL RD[*]}]
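It can be reassuring to evaluate both input formulations numerically. The Python sketch below mirrors the `expr` bodies of the RGMII-ID and Version 1.3 input constraints; the function names are hypothetical and the defaults are the specification values used in the constraints.

```python
HALF_PERIOD_MIN = 7.2 / 2  # half of rgmii_period_min, in nanoseconds


def input_delays_id(setup_tx=1.2, hold_tx=1.2, rxc_min=-0.2, rxc_max=0.2):
    """(min, max) input delay for the RGMII-ID (Version 2.0) formulation."""
    delay_min = hold_tx - rxc_max
    delay_max = HALF_PERIOD_MIN - setup_tx - rxc_min
    return delay_min, delay_max


def input_delays_v1(skew_min=-0.5, skew_max=0.5, rxc_min=1.5, rxc_max=2.1):
    """(min, max) input delay for the Version 1.3 skew formulation."""
    delay_min = HALF_PERIOD_MIN + skew_min - rxc_max
    delay_max = HALF_PERIOD_MIN + skew_max - rxc_min
    return delay_min, delay_max
```

With the specification defaults, both formulations evaluate to the same 1.0 ns to 2.6 ns window, which is the receiver skew range (TskewR) quoted in the clocking section.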
Control Sequences (Clause 3.4)
The control sequences (RX_DV/TX_EN low, RX_ER/TX_ER high) are broadly the same as GMII.
The standard defines four sequences, three of which are identical to GMII:
Value | Transmit | Receive |
---|---|---|
0E | Reserved | False Carrier |
0F | Carrier Extend | Carrier Extend |
1F | Carrier Extend Error | Carrier Extend Error |
FF | Reserved | Carrier Sense |
Notable differences:
- The receive path will not generate 00 for inter-frame. This value is reserved, unlike 802.3 Clause 35, which permits this behavior.
- Assert LPI (01) has not been assigned. The RGMII specification predates EEE, so this code was not yet standard. It is expected that PHYs supporting LPI will simply use the standard GMII code.
- The receive path will generate FF when it needs to assert CRS without asserting RX_DV. This is only relevant for half-duplex operation.
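The table and notes above translate naturally into a lookup. The Python sketch below is a hypothetical classifier for Gigabit operation that takes the reconstructed GMII-style signals; the "Data" and "Data Error" labels for the valid-high cases follow ordinary GMII usage rather than anything specific to this clause.

```python
# Control codes defined by the RGMII specification (valid low, error high).
RX_CONTROL = {
    0x0E: "False Carrier",
    0x0F: "Carrier Extend",
    0x1F: "Carrier Extend Error",
    0xFF: "Carrier Sense",
}
TX_CONTROL = {
    0x0F: "Carrier Extend",
    0x1F: "Carrier Extend Error",
}


def classify(valid, error, data, table):
    """Name the bus state for one cycle given valid, error, and data."""
    if valid:
        return "Data Error" if error else "Data"
    if not error:
        return "Idle"
    # Anything not listed, including 0x00 and 0x01, is reserved here.
    return table.get(data, "Reserved")
```

For example, `classify(0, 1, 0xFF, RX_CONTROL)` reports Carrier Sense, while the same value looked up in `TX_CONTROL` is reserved. A PHY supporting LPI would need 0x01 added to these tables per its datasheet.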
In-Band Link Status (Clause 3.4)
Unlike previous specifications for xMII, RGMII includes an optional in-band link status.
During idle, when RX_DV and RX_ER are both low, the data lines encode the autonegotiation results.
This means that many applications can autoconfigure without needing to access the management interface, convenient for FPGA applications.
RGMII | Function | Values |
---|---|---|
RD[3] | Duplex | 1 - Full Duplex, 0 - Half Duplex |
RD[2:1] | Speed | 11 - Reserved, 10 - 1000 Mbps, 01 - 100 Mbps, 00 - 10 Mbps |
RD[0] | Link | 1 - Link Up, 0 - Link Down |
For example, the RD[3:0] lines when a Gigabit connection is active will be 0xD (1101), indicating Full-Duplex, 1000 Mbps, and Link Up.
A normal 100Base-TX connection will report 0xB (1011) for Full-Duplex, 100 Mbps, and Link Up.
It should be emphasized that this feature is optional. The PHY is not required to provide it so be sure to consult your PHY’s datasheet. That said, I’ve yet to see a PHY that does not implement it.
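Decoding the status nibble is a matter of masking three fields. The following Python sketch uses a hypothetical `decode_inband_status` helper and assumes the caller has already confirmed that RX_DV and RX_ER are both low.

```python
# Speed field RD[2:1]; 0b11 is reserved and decodes to None.
SPEED_MBPS = {0b00: 10, 0b01: 100, 0b10: 1000}


def decode_inband_status(rd):
    """Decode RD[3:0] sampled during idle into link status fields."""
    return {
        "link_up": bool(rd & 0b0001),
        "speed_mbps": SPEED_MBPS.get((rd >> 1) & 0b11),
        "full_duplex": bool(rd & 0b1000),
    }
```

Calling `decode_inband_status(0xD)` reports a full-duplex 1000 Mbps link that is up, and `0xB` reports the same at 100 Mbps, matching the two examples above.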
Tri-Mode Operation (Clause 5)
At reduced bit rates (10/100), the clock rate will reduce to the appropriate MII clock (25 MHz or 2.5 MHz) and the data bus will only transmit four bits per clock cycle instead of eight.
However, unlike MII, the transmit clock remains source-synchronous and the control signals (RX_CTL/TX_CTL) remain unchanged.
The values for TD[3:0] and RD[3:0] on the falling clock edge are technically left undefined (the standard uses the word may) but it is near-universal to duplicate the same data on both edges, if for no other reason than to reduce EMI and switching energy.
When shifting speeds (e.g. in response to in-band status), it is important that the MAC hold TX_CTL low until TXC has been established at the correct clock speed.
The PHY will do the same with RX_CTL and RXC.
While stretching the high or low periods is permissible, the introduction of clock glitches is not.
It is expected that the control sequences are also truncated to four bits, consistent with MII, but this is not explicitly stated in the standard nor is a list of updated control sequences provided. This is the position taken by PHYs such as the Microchip LAN8830.
Crossover
Crossover in this context refers to connecting two devices of the same class (PHY or MAC) directly. For example, connecting the TX of one MAC directly to the RX of a second MAC without an intervening PHY, or using a pair of PHYs as a media converter.
As both sides of the link are source-synchronous and largely identical in their operation, direct crossover is broadly compatible. The only complication would be possible truncation of the preamble in a PHY-to-PHY crossover. There are no expected complications in a MAC-to-MAC crossover.
Energy Efficient Ethernet
The RGMII specification predates Energy Efficient Ethernet (EEE). As such, guidance is not covered in the official specification. It is expected that it will largely follow the rules of GMII (including clock stoppage); however, one should consult their PHY datasheet for more specific guidance.
The Microchip LAN8830, for example, uses 01 for Gigabit Assert LPI (consistent with GMII) and 11 for 100 Megabit Assert LPI (consistent with MII’s four-bit encoding).
It additionally supports the suspension of TXC after nine clock pulses, as with GMII and unlike MII.
However, as a consideration for DDR signaling, it does not continue to drive (or expect) Assert LPI while the clocks are halted.
Instead, the data lines are typically zeroed until resuming ordinary idle.
It will, however, drive Assert LPI for one cycle on the receive bus when the peer leaves LPI, consistent with GMII.
Half-Duplex (Clause 3.4.2)
Half-duplex is much the same as MII and GMII.
The primary difference is that the control signals, CRS and COL, need to be derived from the in-band information.
Carrier Sense, CRS, is asserted when either of these two conditions is true:
1. RX_DV is asserted
2. RX_ER is asserted and RXD[7:0] contains one of the following values:
   - False Carrier, 0E (Gigabit) or E (10/100 Mbps)
   - Carrier Extend, 0F (Gigabit Only)
   - Carrier Extend Error, 1F (Gigabit Only)
   - Carrier Sense, FF (Gigabit) or F (10/100 Mbps)
Collision Detected, COL, is asserted when both of the following conditions are true:
1. Carrier Sense, CRS, is asserted
2. Transmit Enable, TX_EN, is asserted
While not covered by Clause 3.4.2, it is assumed that (2) should also include the transmission of Carrier Extend and Carrier Extend Error. However, this is rather academic as it only applies to the broadly nonexistent half-duplex Gigabit Ethernet.
Like the original version of RMII and unlike RMII 1.2, there is no mechanism to drop Carrier Sense prior to clocking the last byte of a packet.
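The derivation of CRS and COL described in this section can be sketched as follows. This Python model uses hypothetical function names and takes the reconstructed RX_DV, RX_ER, and RXD values as inputs. It implements only the two conditions as listed and does not include the assumed extension for transmitted Carrier Extend.

```python
GIGABIT_CRS_CODES = {0x0E, 0x0F, 0x1F, 0xFF}
SLOW_CRS_CODES = {0xE, 0xF}  # four-bit codes for 10/100 Mbps


def carrier_sense(rx_dv, rx_er, rxd, gigabit=True):
    """Derive CRS from the in-band receive signals."""
    if rx_dv:
        return True
    if not rx_er:
        return False
    if gigabit:
        return rxd in GIGABIT_CRS_CODES
    return (rxd & 0xF) in SLOW_CRS_CODES


def collision(crs, tx_en):
    """Derive COL: carrier sensed while we are transmitting."""
    return bool(crs and tx_en)
```

For example, `carrier_sense(0, 1, 0xFF)` is true even though no frame data is valid, and combining it with an asserted TX_EN reports a collision.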
Reduced Ten Bit Interface (Clause 3)
As with GMII, RGMII addresses the Ten Bit Interface (TBI) for transfer of the 1000Base-X PCS.
However, this encoding is distinct from a direct mapping of GMII.
Instead, the control pin (RX_CTL/TX_CTL) becomes the fifth bit of the data bus and is clocked directly.
RTBI | Rising Edge | Falling Edge |
---|---|---|
RX_CTL | RXD[4] | RXD[9] |
RD[3] | RXD[3] | RXD[8] |
RD[2] | RXD[2] | RXD[7] |
RD[1] | RXD[1] | RXD[6] |
RD[0] | RXD[0] | RXD[5] |
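The RTBI mapping in the table above amounts to splitting a ten-bit code group into two five-bit halves. The Python sketch below uses hypothetical function names and returns, for each clock edge, the value driven on the control pin and on the four data pins.

```python
def rtbi_split(code_group):
    """Split a 10-bit code group into ((ctl, data), (ctl, data)).

    The first pair is driven on the rising edge (bits 4:0) and the
    second on the falling edge (bits 9:5).
    """
    rising = code_group & 0x1F
    falling = (code_group >> 5) & 0x1F
    return (rising >> 4, rising & 0xF), (falling >> 4, falling & 0xF)


def rtbi_join(rising, falling):
    """Rebuild the 10-bit code group from the two (ctl, data) pairs."""
    low = ((rising[0] & 1) << 4) | (rising[1] & 0xF)
    high = ((falling[0] & 1) << 4) | (falling[1] & 0xF)
    return (high << 5) | low
```

Unlike the RGMII mode, no XOR is applied to the control pin here; it is simply bit 4 on the rising edge and bit 9 on the falling edge.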