Device Clock Generation

After building a CPU, utilities for
handling bus interconnects, several DMAs
and memory controllers, I often find my time focused on building interfaces
between designs and external peripherals. This seems to be where most of the
business has landed for me. Often, these peripherals require a clock output,
coming from the design, and so I’d like to spend some time describing how to
generate such a “device” clock.

Fig 1. A Basic SOC with Peripherals

There are actually two topics that need to be discussed when working with modern
high speed peripheral design. One of them is generating the clock to be sent
to the peripheral, such as Fig. 1 above illustrates. The second one involves
processing a clock returned from the peripheral, as shown in Fig. 2 below.
This is a key component of high speed designs such as DDR memories, eMMC,
HyperRAM, or even NAND flash protocols. This second topic is one we shall
need to come back to at a later date.

Fig 2. Data returned with a clock

Today, I’d like to discuss how to go about generating a clock to control
device interaction.

I first came across this problem when building a NOR flash controller, based
first on a SPI interface and later on a Quad SPI interface. My controller was
designed for FPGAs, and so the clock could be built with a single frequency.
This design had the added complication that the clock needed to be paused from
time to time. Specifically, the clock needed to be turned off when nothing
was going on. Likewise, the clock needed to be turned off for one cycle after
dropping (i.e. activating) the chip select pin, and for a couple cycles after
the transaction was complete but before raising (deactivating) the chip select.

I had to deal with a similar problem when controlling a HyperRAM, but …
that design failed when I wasn’t (yet)
prepared to handle the return clock properly. I did say this deserved an
article in its own right, did I not? Processing data on a return clock properly
can be a challenge.

I then built a similar design for ASIC
platforms. Unlike the
FPGA, the final clock speed wouldn’t be known until run time. It might be
that the design started at a slower clock speed, only to later speed up to
the full rate at run time. Unlike an FPGA which can be fixed later, there’s
really no room for failure in ASIC
work. At least
with an FPGA, if my board didn’t support a particular frequency, I could just
rebuild the design for the clock frequency it did support. This doesn’t work,
though, for an ASIC–since it tends to be cost prohibitive to rebuild the
design at a later time when you decide to connect it to a slower part than
the one you designed it for.

The next design I worked with was a NAND flash
design. NAND flash
can be a challenge, since the protocol requires you to start at a slow
frequency and only after you bring up the connection are you allowed to change
to a faster frequency. This particular
design was built for
ASIC environments, and so it depended upon an analog component generating all
the clocks I needed. This worked great, up until someone wanted to purchase
the design to work on an FPGA, then another wanted it to work on an FPGA, and
another and so on.

Fig 3. Single Data Rate (SDR) vs Dual Data Rate (DDR) (panels: SDR, DDR)

Just to add another twist to the problem, many protocols require data
transitions on both edges of the clock, a protocol often known as
“Dual Data Rate” (DDR). Unlike the other designs above, these often require a
clock that is 90 degrees offset from the data–so that each clock transition
takes place in the middle of each data valid window, rather than on the edges
of the window. This sort of “offset” clock is necessary to guarantee setup and
hold times within the slave peripheral. An example of the clock and data
relationship required by DDR as opposed to a traditional “single data rate”
(SDR) clock is shown in Fig. 3.

By the time I got to my SDIO/eMMC controller, I think I finally had the clock
division problem handled. An SDIO controller needs to bring up the SD card
at 400kHz, and then depending upon the card, the PCB, and the controller, the
speed may then be raised to 25MHz, 50MHz, 100MHz, or even 200MHz. The clock
may also be stopped whenever either there’s nothing to send or receive, or
when the SOC can’t load or unload the data to the controller. For example, you
might ask an SD card to read and thus produce many blocks of data, then read
the first two of these blocks into your internal buffers only to find that the
CPU is slow in draining those buffers. In that case, you would need to stop
the interface clock before the external card tries to send you a third block
of data that would have nowhere to go.

Other devices require user programmable device clock controllers, such as:

10M/100M/1Gb Ethernet controllers

While each of these speeds might use a single clock, building a truly
trimode controller requires some extra work.

(DDR) SDRAM controllers

SDRAM controllers from an FPGA standpoint tend to be simple: just produce a
clock. However, you can turn the clock off for better power performance.
Yes, there are rules … but we won’t get into those here today.

I2S

We discussed generating an I2S clock at a totally arbitrary
frequency some time ago.

I2C

In general, I2C is too slow to be the focus of this article. There is
an I3C protocol that is built on top of I2C. The techniques we discuss today
might work well for I3C masters, but I’m not nearly as familiar with those.

SPI – not just NOR flash

While SPI slaves have a device clock as well, handling these clocks is
fundamentally different from what I’m describing today. My focus today
will be on generating clock signals for the purpose of controlling
external devices–such as an SPI master might need to do.

Specifically, today I want to look at and discuss generating a clock with one
or more of the following characteristics:

Output Signal: We’re talking about interface clocks–those generated by
the “master” of the interface. These are digital signals, output from
an FPGA (or ASIC) device.

The output may be accomplished via a component like an ODDR or an OSERDES,
with or without an additional analog delay following.

Discontinuous: The clock may be discontinuous. Many protocols (flash,
SDIO/eMMC, etc.) allow, or even require, the clock to be stopped, or
otherwise only toggled when there’s something to send or receive. As
mentioned above, stopping the clock may also be useful for pausing a
transmission in progress before a source buffer runs dry, or an incoming
buffer overflows.

Dynamic Frequency: Often, the outgoing clock needs to change frequency
during operation as part of the protocol. For example, the SDIO protocol
needs to start at 400kHz, and then increase to 25MHz (or more). Therefore,
a good clock generator will need to be able to naturally generate multiple
clock frequencies as the protocol requires.

Minimum pulse width: Switching between frequencies must be done by rule:
clock glitches must be fully disallowed and guaranteed against. Too-short
clock pulses cannot be allowed. Clock high and low durations must always be
at least a half period of the fastest allowable clock.

90 Degree Offset for DDR Signaling: As shown in Fig 3, many modern
protocols require both positive and negative edge signaling (DDR). This
drops the required clock frequency by 2x, reducing the bandwidth that must
be carried over the PCB for the same data rate. However, the clock signal
required to support such DDR signaling often needs to be delayed 90 degrees
from the data, so that it transitions in the middle of the data valid period.

Faster than the controller’s clock: Just to make matters worse, in my
eMMC design, I needed to generate a 200MHz DDR device clock from a 100MHz
system clock.

All this is to say that our goal today will be to create a divided clock using
digital, rather than analog, logic. (Yes, I can hear my analog engineering
friends jump in here with the comment that “Everything is analog!” God bless
you, my friends.)

The Problem

The first approach I often see to this problem is the straightforward
integer clock division approach. Generally, it looks something like the
following:

always @(posedge src_clk)
if (reset)
	counter <= 0;
else if (!active_clock)
	counter <= 0;
else // if (active_clock)
	counter <= counter + 1;

assign	dev_clk = (high_speed) ? (src_clk && active_clock)
			: counter[user_selected_bit];

In this case, active_clock controls whether or not the clock is stepping,
and user_selected_bit controls to what level of clock division we are
interested in. As for the src_clk, that can be either the system clock or
alternatively whatever is required to generate the fastest clock frequency
required by the protocol.

Note that we’ve done nothing to guarantee this clock won’t glitch between
speed selections, nor can we necessarily guarantee the minimum of two clock
rates. We’ll come back to these requirements later, albeit with a different
(better) implementation.

The user logic required to use this clock looks very simple at first:

always @(posedge dev_clk or posedge reset)
if (reset)
begin
	// Reset logic
end else begin
	pedge_data <= // Logic controlling any flops based on the dev_clk
end

When a protocol requires data on both edges of the clock, getting the data
right for the second edge of the clock is also important. But, how shall we
output data on the negative edge of a clock we’ve just created out of thin
air? We’ll need to transition on the negative edge to do this.

always @(negedge dev_clk or posedge reset)
if (reset)
begin
	// Reset logic
end else begin
	nedge_data <= // Logic controlling the negative clock's data
end

assign	output_data = (dev_clk || !ddr_mode) ? pedge_data : nedge_data;

This approach leaves us with two problems. The first is that we’re using our
clock as a logic signal when we assign dev_clk to possibly be the same as
our source clock. The second problem is that we are transitioning user logic
on this clock. Worse, though, we’re now transitioning our user logic on both
edges of the clock. This violates the rules of good digital logic design.

These aren’t necessarily issues when building ASIC designs. However, in FPGA
design, this clock will need to get onto the clocking network’s backbone
somehow, and that’s not automatic. Worse, this new clock is not the same
as the original src_clk–even when they are at the same frequency. There
will always be a delay between the two clocks–a delay that may not be
captured by pre-synthesis simulation, and so it can be a dangerous delay the
engineer isn’t expecting when building this logic.

This leads to two commercial ASIC design challenges. First, when designing an
ASIC IP, you want to be able to test as much of the IP on an FPGA as possible.
Non FPGA compatible logic needs to be moved to the periphery of the design and
carefully controlled. Second, from a business point of view, it helps to be
able to sell the ASIC design to FPGA customers in addition to ASIC customers.
So, even though you can do something like this on an ASIC, that doesn’t mean
you should.

There are other problems.

Clock domain crossings (CDCs)

Since the src_clk and dev_clk are now two separate and distinct clock
domains, you’ll need to properly manage every clock domain crossing between
these two clock domains. This can create additional delays through what
otherwise might be high speed logic.

Likewise, the positive and negative edges of the same clock are also
(technically) separate clock domains. Moving between them is “possible, but
not recommended.”

Gating

You may have noticed we haven’t properly gated our clock above. Sure, we
used an active_clock signal to provide gating, but this signal does not
guarantee the maximum frequency of the output clock. This, however, is a
minor problem that most engineers reading this blog would be able to easily
fix with a little bit of additional logic.

Two problems in particular, though, become deal breakers when it comes to this
type of design. The first is that DDR interfaces often require a clock delayed
by 90 degrees from the data, as shown in Fig. 3 above. The simple approach
will not generate such a 90 degree delay. While one might use an analog delay
element, such as a Xilinx ODELAY element, to delay the clock signal by an
appropriate amount, this will only work for high speed clocks and not for
clocks less than 50MHz or so. The second problem is, what do you do when you
need a device clock that’s faster than your src_clk, like I did in my
SDIO/eMMC controller design?

As a result, we really need another approach.

The Solution

The basic solution is to return to the rules, and so avoid transitions on the
device clock edge altogether. Instead, we’ll continue to transition on our
source clock and then use either an ODDR or an OSERDES to generate the final
outgoing clock. In the meantime, we’ll treat the newly generated device clock
as a traditional logic signal–rather than a “clock” within our design. That
is, we’ll let it be and remain logic.

Let’s start by looking at Fig. 3 above, and dividing the clock period into
sections, as shown in Fig. 4 below.

Fig 4. Dividing the clock period

Nominally, we’d want at least two sections per clock–one for each piece of
data in a DDR transmission. Sadly, this isn’t enough, since the clock might
need to be offset by 90 degrees. Hence, we’ll need to break each clock
period into four logically distinct time periods. We can label these time
periods 3:0, from left-most or most-significant being 3 down to the right most
and least significant being 0.

From here, we can generate what I’m going to call a wide clock, four bits at
a time. This wide clock will then be output via a 4:1 OSERDES–if it is to keep
pace with the source clock within our design. At its
fastest speed, this clock will be either 0011 (where the MSB ‘0’ is
transmitted “first”), or 0110 if a 90 degree offset clock is required for
DDR transmissions (as shown in Fig. 4). At its next slowest speed, the clock
would be 0000 followed by 1111, or 0011 followed by 1100. Further
clock divisions will use wide clocks of 0000 or 1111.
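
The patterns just listed can be collected into a small lookup. What follows is
only a sketch under stated assumptions: the module name wideclk_pattern and
the i_half and i_phase inputs are hypothetical names chosen for this
illustration, and only the two fastest speeds are covered.

```verilog
// Hypothetical illustration: select the 4-bit wide clock word for the two
// fastest speeds described in the text.  The MSB is transmitted first.
module wideclk_pattern (
		input	wire		i_clk90, // 1: clock offset by 90 degrees (DDR)
		input	wire		i_half,	 // 0: full rate, 1: half rate
		input	wire		i_phase, // Half rate only: 0 = first word,
						 //	1 = second word of the period
		output	reg	[3:0]	o_wide
	);

	always @(*)
	case({ i_half, i_clk90, i_phase })
	// Full rate: one whole clock period per word, phase is ignored
	3'b000, 3'b001: o_wide = 4'b0011;
	3'b010, 3'b011: o_wide = 4'b0110;
	// Half rate, no offset: 0000 followed by 1111
	3'b100: o_wide = 4'b0000;
	3'b101: o_wide = 4'b1111;
	// Half rate, 90 degree offset: 0011 followed by 1100
	3'b110: o_wide = 4'b0011;
	3'b111: o_wide = 4'b1100;
	endcase
endmodule
```

Slower clocks would simply hold the word at 0000 or 1111 for several source
clock cycles in a row, which is why they are left out of this table.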

If you wish to use an ODDR instead of a 4:1 OSERDES, you can still use this
approach, save that you would be generating 2 wide clock bits at a time
instead of four. The fastest clock would be a repeating 01, but this fastest
clock would be unable to handle the 90 degree offsets of a DDR signal. The
next fastest would be either 00 followed by 11, or the 90 degree offset
version of the same at 01 followed by 10.

If you want a clock running at twice your system frequency, you could use
an eight-bit wide clock signal, designed to feed an 8:1 SERDES. Your fastest
clock would become 00110011 (non–DDR) or 01100110 when working with DDR
signals.

That’s the first step–the wide clock.

The second step is to generate, together with the wide clock signal, two
other signals. The first signal, let’s call this new_edge, will indicate
that a new clock cycle is beginning. The second, which I shall call the
half_edge, will indicate that the second half of a clock cycle is beginning.
Both of these signals are also shown in Fig. 4 above, each indicating the
portion of the clock cycle they represent.

All three of these logic signals can be now generated by a “clock generator”
module.
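
To make these three signals concrete, here is a minimal sketch of such a
generator for one fixed case only: a device clock at half the source clock
rate, using a 4-bit wide clock. The module name halfrate_ckgen and its port
names are assumptions made for this illustration. It has no speed selection
and no shutdown, so it is far simpler than a usable generator would be.

```verilog
// Hypothetical illustration: a fixed divide-by-two wide clock generator.
// Each device clock period spans two source clock cycles.
module halfrate_ckgen (
		input	wire		i_clk, i_reset,
		input	wire		i_clk90,	// 90 degree offset
		output	reg		o_new_edge,	// First half begins
		output	reg		o_half_edge,	// Second half begins
		output	reg	[3:0]	o_wide		// MSB is sent first
	);

	reg	phase;	// 0: first half is next, 1: second half is next

	always @(posedge i_clk)
	if (i_reset)
	begin
		phase       <= 1'b0;
		o_new_edge  <= 1'b0;
		o_half_edge <= 1'b0;
		o_wide      <= 4'h0;
	end else begin
		phase       <= !phase;
		o_new_edge  <= !phase;
		o_half_edge <=  phase;
		if (!phase)
			// First half: low, or low then rising if offset
			o_wide <= (i_clk90) ? 4'b0011 : 4'b0000;
		else
			// Second half: high, or high then falling if offset
			o_wide <= (i_clk90) ? 4'b1100 : 4'b1111;
	end
endmodule
```

Note that all three outputs are registered together on the source clock, so
they stay aligned with each other as they move down a pipeline.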

If necessary, this clock can be stopped either at the clock generator, or
gated further down the signal pipeline by simply zeroing out the wide clock.

Let’s pause for a moment to illustrate what a “clock” like this might look
like.

We’ll start with the highest speed clock, running at the source clock rate.
This clock will have a wide clock of 0011, and new data on every clock edge.

Fig 5. Highest speed SDR

Fig. 5 shows all of these key signals. First, you can see the system clock,
which we called src_clk above, that everything is generated off of. Next, you
can see the IO clock we create, followed by the wide_clock used to create
it. This is followed by the new_edge control signal. This clock might be
the clock we would use for a data signal transitioning once per clock (SDR).
Therefore, I’ve also illustrated what a couple of periods of this
data signal might look like.

Were this interface to run in DDR mode, sending one word of data on each edge
of the clock, then the wide_clock would need to be (repeatedly) set to
0110, as shown in Fig. 6 below.

Fig 6. Highest speed DDR

There are a couple key differences between Fig. 6 and Fig. 5 above. The first,
and perhaps most obvious, is that the data in Fig. 6 are output at two words
per system clock cycle. This is often desirable, in that twice the data rate
may now be achieved. The second difference is that the IO clock is now offset
90 degrees from the data, instead of 180 degrees. This is often necessary to
guarantee that there is a clock transition in the middle of the data valid
period. To make this happen, the wide_clock is now set to 0110 in each
clock period.

Using these clock signals, we can also pause the clock–as shown in Fig. 7
below.

Fig 7. Pausing the clock

Note that the key signals, such as new_edge and half_edge, must also stop
when the clock pauses (stops). Because there is no clock signal, the data
output signals become don’t care. (For power reasons, I could see holding the
output at its previous value for short periods of time, D2 in this case,
but that’s another discussion.)

This same signaling approach also works when dividing the clock speed by two.
Fig. 8 shows an example SDR signal with a clock speed set to half the system
clock speed.

Fig 8. SDR at half the system clock rate

Fig. 9 shows the same thing, but this time for a DDR signal with the clock
at half the system clock speed.

Fig 9. DDR at half the system clock rate

Before leaving this example, note how easy it was to change frequencies in
this representation: we just adjusted the wide_clock, and then the new and
half clock positions changed to match.

We can drop the clock frequency again to a quarter of the system clock speed,
as shown in Fig. 10.

Fig 10. SDR at a quarter of the system clock rate

We can also offset this clock by 90 degrees, as shown in Fig. 11.

Fig 11. DDR at a quarter of the system clock rate

When using this type of “wide” clock, user logic becomes simplified as well.
This “simplified” user logic is easily illustrated with an example. For this
example, let’s suppose we wished to control 8 data wires using this type of
divided clock signaling. Let’s also assume, for the purposes of this
illustration, that the source arrives via an AXI stream interface with signals
S_VALID and S_DATA[15:0], and a ready signal given by S_READY.

We’ll start with the wide_clock, new_edge, and half_edge signals from
the clock generator. Note that, as we propagate these signals through our
pipeline (below), we won’t send the wide_clock straight to the output pad,
but instead we’ll use it alongside our data processing pipeline. This way,
if the pipeline must stall (and it might need to), the pipeline can also stall
the outgoing clock at the same time.

Hence, we’ll create a one clock delayed version of this wide_clock that
we can call outgoing_clock. Further, a second signal, active_clock,
can be used to keep track of whether or not we’ve committed to the current
clock cycle.

always @(posedge src_clk)
if (i_reset)
begin
	outgoing_clock <= 4'h0;
	active_clock <= 1'b0;
end else if ((S_VALID && S_READY) || (new_edge && second_edge))
begin
	// We commit to this clock if either
	// 1. We have new data and we are ready to consume this new data, *OR*
	// 2. We're in SDR (not DDR) mode, and we've already committed
	//	to a byte of data that we haven't (yet) sent.
	// In both cases, we need to start a clock period.
	//
	// Note that S_READY implies new_edge
	//
	outgoing_clock <= wide_clock;

	// The "active_clock" signal is used to let us know that we've committed
	// to this clock cycle.  From now until the next new_edge, we must
	// forward the wide_clock signal to the output.
	active_clock <= 1;
end else if (new_edge)
begin
	// The clock generator is creating an edge that ... we're not prepared
	// for or ready to handle.  There's just no data available, so ...
	// let's stop the clock.
	outgoing_clock <= 4'h0;

	// In this case, we're not forwarding the clock, nor will we until
	// the next clock period.
	active_clock <= 1'b0;
end else if (active_clock)
	// If we've already committed to this clock cycle, then we'll need to
	// continue it to its completion.
	outgoing_clock <= wide_clock;

Before we can get to the data, we need another key signal as well. This is
the second_edge signal that we used above. Here’s why: our data is going to
arrive, 16b at a time via AXI stream. If we are in DDR mode, then we’ll
consume 8b on each edge of this clock–and possibly all 16b at once. However,
if we are only in SDR mode, then we’ll need to consume the second 8b on the
next clock edge. Hence, we’re going to need a signal that I’m calling,
second_edge, to tell us that we have 8b remaining of the 16b committed to us
that didn’t get sent on the last clock tick.

always @(posedge src_clk)
if (reset && i_care_about_resets)
	second_edge   <= 0;
else if (S_VALID && S_READY)
	// In SDR, we just accepted 16b and output 8b.
	// We need another new_edge to send the remaining 8b.
	// Note that S_READY implies new_edge
	//
	// Also note that we only use this signal in SDR modes
	second_edge <= !ddrmode;
else if (new_edge)
	// On any (other) new_edge, we can clear this signal
	second_edge <= 0;

That leads us to the outgoing_data. This is a 16 bit data signal, consisting
of 8b, outgoing_data[15:8], which will be output on the first half of the
clock, and another 8b, outgoing_data[7:0], which will be output on the second
half of the clock. A third signal, next_byte, will be used for keeping track
of the second byte of data in the case where we don’t output both bytes in the
same clock period.

always @(posedge src_clk)
if (reset && i_care_about_resets)
begin
	outgoing_data <= 0;
	next_byte   <= 0;
end else if (S_VALID && S_READY)
begin
	// new_edge is implied by S_READY
	if (ddrmode && half_edge)
	begin
		// Set data for both halves of the clock
		//    The first half in the MSBs
		outgoing_data[15:8] <= S_DATA[15: 8];
		//    The second half in the LSBs
		outgoing_data[ 7:0] <= S_DATA[ 7: 0];

	end else begin
		// Set only the first half of the data, but set it to be
		// output twice.  We'll need to come back later for the second
		// outgoing byte.
		outgoing_data <= { S_DATA[15:8], S_DATA[15:8] };
	end

	// Keep track of that second byte, so we can come back to it later.
	next_byte <= S_DATA[7:0];
end else if (new_edge ||(ddrmode && half_edge))
begin
	// Output the remaining byte on both halves of the clock
	outgoing_data <= { next_byte, next_byte };
end

The final signal we need to define is the S_READY signal. In this example,
we can accept new data on any new clock edge, unless we have 8b remaining
from the last clock edge that have yet to be output.

assign	S_READY = new_edge && !second_edge;

This approach provides us with a couple big advantages to our user logic over
what we had before.

First and foremost, all of our user logic now takes place on the same
src_clk. We didn’t need any CDCs. AXI slave data, generated externally on
this src_clk, can now be used within our design on the same clock it was
generated on.

Second, did you notice how we were able to simply gate the clock when there
was no data available? If not, go back up and look again at the active_clock
signal.

Third, unlike the previous approach, we’ve now guaranteed that this clock
signal won’t glitch. That is, assuming the outgoing OSERDES won’t generate
glitches from our glitchless data signals. The previous clock generator,
on the other hand, could well have had glitches between the clock and the
data enabling it.

Also look at how easy it was to do pipelined processing. The clock was
generated prior to our pipeline, and simply propagated through the pipeline.
Although this pipeline only contains a single clock cycle, we could’ve easily
extended the pipeline for multiple clock cycles if necessary by simply passing
the wide_clock, new_edge, and half_edge signals through the
pipeline–adjusting them if and where necessary along the way.

As a result of this example, all IO pins can now be driven using a 4:1
OSERDES. (You could also use
ODDRs for the data, if you
trusted them to have the same timing relationship as the OSERDES.)
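
As a rough sketch of that last step, each of the eight data pins can be given
its own 4-bit word, sent MSB first just like the wide clock. The module name
data_to_serdes_words and the packing of the output bus are assumptions of
mine. The vendor-specific OSERDES primitive itself is deliberately left out,
since its ports differ between device families.

```verilog
// Hypothetical illustration: build one 4-bit serializer word per data pin.
// The first two bits carry the "first half" byte, outgoing_data[15:8], and
// the last two bits carry the "second half" byte, outgoing_data[7:0].
module data_to_serdes_words (
		input	wire	[15:0]	i_outgoing_data,
		// Eight pins, four bits each.  Pin k occupies bits [4k+3:4k].
		output	wire	[31:0]	o_pin_words
	);

	genvar	k;

	generate for(k=0; k<8; k=k+1)
	begin : GEN_PIN
		assign	o_pin_words[4*k +: 4] = {
				{(2){i_outgoing_data[8+k]}},
				{(2){i_outgoing_data[k]}} };
	end endgenerate
endmodule
```

Because the same byte is placed in both halves of outgoing_data whenever the
device clock is slower than the source clock, the same mapping serves both
the full rate and the divided cases.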

What about frequency changes, or adjusting between the unshifted clock and
the clock shifted by 90 degrees? What about when the clock is off, and needs
to be turned on? All of these challenges and more now reside within the clock
generator.

The Clock Generator

For discussion purposes, let’s take a look at the clock generator I used for
my SDIO/eMMC controller. As mentioned above, this clock generator has the
particular requirement of being able to generate two outgoing clock periods
per system clock cycle, but otherwise it’s a fairly straightforward example
of the discussion above.

From a configuration standpoint, there are a couple of configuration options.
For example, I wasn’t certain that I’d always have an 8:1 SERDES available
to me, nor do all digital environments necessarily offer 2:1 ODDR components.
Therefore, we allow those to be adjusted. Second, I want to know the maximum
number of bits required in my clock divider.

Still, these configuration parameters are fairly straightforward.

module	sdckgen #(
		// OPT_SERDES is required for generating an 8:1 output.
		parameter [0:0]	OPT_SERDES = 0,

		// If no 8:1 SERDES are available, we can still create a clock
		// using a 2:1 ODDR via OPT_DDR
		parameter [0:0]	OPT_DDR = 0,

		// To hit 100kHz from a 100MHz system clock, we'll need to
		// divide our 100MHz clock by 4, and then by another 250.
		// Hence, we'll need Lg(256)-2 bits.  (The first three speed
		// options are special)
		localparam	LGMAXDIV = 8
	) (

The clock generator is primarily controlled via three signals. The first
tells us whether we want our clock offset by 90 degrees for DDR outputs or
not. The second controls the speed of the outgoing clock. The final signal
tells us we can shut the clock down.

		input	wire			i_cfg_clk90,
		input	wire	[LGMAXDIV-1:0]	i_cfg_ckspd,
		input	wire			i_cfg_shutdown,

When shut down, the wide clock output will be fixed at zero, as will both the
new_edge and half_edge control signals.

The shutdown signal is actually really useful at slow clock speeds. Sure, you
could shut the clock down, as we did above, by just not forwarding it through
the pipeline. On the other hand, once the clock has been shut down, you’d like
to be able to restart it on a dime. The shutdown control signal to our clock
generator allows us to do that. Once set, the clock generator takes the
remainder of a clock cycle to shut down, and then stays ready to restart the
clock at a moment’s notice.

The outputs from this module are just about what you would expect. You
have the three signals we’ve already discussed. In this case, o_ckstb
is the new_edge signal we’ve mentioned, o_hlfck is the
half_edge signal, and o_ckwide is the wide_clock signal.

		//
		output	reg			o_ckstb,	// new_edge
		output	reg			o_hlfck,	// half_edge
		output	reg	[7:0]		o_ckwide,	// wide_clock
		output	wire			o_clk90,
		output	reg	[LGMAXDIV-1:0]	o_ckspd
	);

The two new signals are o_clk90 and o_ckspd. These are feedback signals
returned to the control module,
used to tell us when any frequency shift or phase shift operations are complete.

These feedback signals solve an issue I was having in my eMMC
controller, where the clock would
be at some crazy low frequency (100kHz or so), and I’d want to speed it up.
Just setting the new clock speed wasn’t enough, since it might take a thousand
clocks to finish a single cycle at the 100kHz clock speed. However, by
checking these return signals via the register set, the software driver
could then tell if any clock frequency change had fully taken
effect
before going on to any next operation.
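
One way such a check might be offered to software is as a single status bit
comparing what was requested against what the generator reports. This is only
a sketch: the module name ckspd_settled and its port names are hypothetical,
and the 8-bit width simply assumes the LGMAXDIV value of 8 shown above.

```verilog
// Hypothetical illustration: flag when a requested speed or phase change
// has fully taken effect, by comparing the request against the feedback.
module ckspd_settled #(
		parameter	LGMAXDIV = 8	// Assumed to match the generator
	) (
		// What the driver asked for
		input	wire	[LGMAXDIV-1:0]	i_cfg_ckspd,
		input	wire			i_cfg_clk90,
		// What the clock generator says it is currently doing
		input	wire	[LGMAXDIV-1:0]	i_ckspd,
		input	wire			i_clk90,
		// True once the two agree
		output	wire			o_settled
	);

	assign	o_settled = (i_cfg_ckspd == i_ckspd)
				&& (i_cfg_clk90 == i_clk90);
endmodule
```

A driver could then poll this bit after writing a new speed, and hold off on
the next command until it reads back as set.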

The next logic block is part of a two process finite state machine. The first
process, shown below, is the combinatorial process. The second will be
the clocked logic.

Personally, I’m not a big fan of two process state machines. I’m just not.
They often seem to me to be adding extra work and complexity. However,
two process state machines allow me to reference logic results even before
the full logic path is complete. They also allow me an ability to describe
more complicated logic than the simple single process state machine, so
a two process state machine it is.

In this case, we are going to generate the next signal for the strobe,
nxt_stb, the clock, nxt_clk, and the counter, nxt_counter.

Of these signals, nxt_clk is the simplest to explain. This signal indicates
that we’re about to start a new clock cycle. In many ways, this is the
combinatorial version of what is to become the new_edge once latched.

Clock cycles themselves come in four phases, just like the four bits of the
wide clock we discussed before. You can think of these phases as the 0110
of the fastest clock before. The first bit, 0, is the first phase of the
clock. Our new_edge bit, o_ckstb, will only ever be true on this phase.
The second bit, 1, is where the clock rises. The third bit, 1 again, is
the only phase where the half_edge, o_hlfck, will be set. Finally, the
clock will return to zero in the last phase. If the clock is ever idle,
it will idle in this first phase prior to delivering a new_edge signal.

This background will help explain how I’ve divided up the counter. There are
NCTR bits to the counter. Of those bits, the top two control the phase
bits we just described, whereas the others are the clock divider. The
nxt_stb signal, mentioned above and below, is simply a signal that these top
two phase-control bits are about to change.
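
The split of the counter can be illustrated in isolation. The sketch below
assumes NCTR = 10 (that is, LGMAXDIV plus two phase bits), which is my own
assumption rather than something stated above, and the module name
counter_split is hypothetical. It separates the phase bits from the divider
bits, and captures the borrow from decrementing the divider as the strobe.

```verilog
// Hypothetical illustration: split a counter into two phase bits (top)
// and a down-counting divider (bottom), with a strobe on divider borrow.
module counter_split #(
		parameter	NCTR = 10	// Assumed: LGMAXDIV + 2
	) (
		input	wire	[NCTR-1:0]	i_counter,
		output	wire	[1:0]		o_phase,
		output	wire	[NCTR-3:0]	o_next_low,
		output	wire			o_stb
	);

	// The top two bits select one of the four phases of the clock
	assign	o_phase = i_counter[NCTR-1:NCTR-2];

	// Decrement the divider bits.  If they were zero, the subtraction
	// borrows, setting o_stb and wrapping the divider bits to all ones.
	assign	{ o_stb, o_next_low } = { 1'b0, i_counter[NCTR-3:0] } - 1'b1;
endmodule
```

A real generator would reload the divider bits on the strobe rather than let
them wrap, but the borrow detection itself works the same way.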

With that as background, let’s take a look at how this works.

In general, the first step of any combinatorial block is to set all the
values that will be determined within the block. This is a good practice
to get into to avoid accidentally generating any latches.

	always @(*)
	begin
		nxt_stb = 1'b0;
		nxt_clk = 1'b0;
		nxt_counter = counter;

From here, we subtract one from the bottom (non-phase) bits of our counter
on every cycle. When these bits are zero, subtracting one will cause the
counter to underflow, and the resulting borrow sets our nxt_stb signal, so we
can know when to adjust the phase bits.

		{ nxt_stb, nxt_counter[NCTR-3:0] } = counter[NCTR-3:0] - 1;

		if (nxt_stb)
		begin
			// Advance the top two bits
			{ nxt_clk, nxt_counter[NCTR-1:NCTR-2] }
						= nxt_counter[NCTR-1:NCTR-2] +1;

If our clock speed is set to 0 (wide clock of either 01100110 or 00110011)
or 1 (wide clock of 00111100 or 00001111), then we are always generating
a new clock cycle. In this case, we’ll hold the counter at zero and (roughly)
ignore the phase.

			if ((OPT_DDR || OPT_SERDES) && ckspd <= 1)
			begin
				nxt_clk = 1;
				nxt_counter[NCTR-3:0] = 0;

Likewise, if the clock speed is equal to two, the wide clock will either
alternate between 0000_0000 and 1111_1111, or 0000_1111 and 1111_0000,
and so our phase will alternate, but otherwise everything else can be kept
to zero.

			end else if (ckspd <= 2)
			begin
				nxt_clk = counter[NCTR-1];
				nxt_counter[NCTR-3:0] = 0;

Finally, in the more general case, we’ll just set the bottom bits to count
down from ckspd-3 to zero. Yes, this is “just” a counter, but the maximum
value is offset by three for the three special speeds we just discussed above.

			end else
				nxt_counter[NCTR-3:0] = ckspd-3;
		end

You may have noticed that we’ve only adjusted the bottom bits of this
counter–the bits that count down. We’ve done nothing to update the phase
bits at the top of this “counter”, so let’s handle those next. (Spoiler alert:
these MSBs don’t act like counter bits in this implementation.)

Of course, for the highest frequencies, the counter will never change. It
sits at zero, with a permanent next phase of 3.

		if (nxt_clk)
		begin
			if ((OPT_DDR || OPT_SERDES) && new_ckspd <= 1)
				nxt_counter = {2'b11, {(NCTR-2){1'b0}} };

When the speed setting is 2, we allow the top two bits to toggle back and
forth. If nxt_clk is set, we need to reset these bits only.

			else if (new_ckspd <= 2)
				nxt_counter = { 2'b01, {(NCTR-2){1'b0}} };

Finally, for the general case, we return the phase to zero and reset the
clock.

			else begin
				nxt_counter[NCTR-1:NCTR-2] = 0;
			end
		end
	end
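The borrow trick at the top of this block can be tried in isolation. What follows is only a minimal sketch, not part of the design itself: the module name borrow_demo and the width W are placeholders I've picked for illustration. The idea is that the left hand side is one bit wider than the counter, so the extra bit catches the borrow.

```verilog
module borrow_demo;
	localparam	W = 4;	// Hypothetical width of the down-counting bits
	reg	[W-1:0]	low;
	wire	[W-1:0]	nxt_low;
	wire		nxt_stb;

	// The extra MSB on the left hand side captures the borrow
	assign	{ nxt_stb, nxt_low } = low - 1;

	initial begin
		low = 4'd2;
		#1 $display("low=%0d nxt_stb=%b nxt_low=%0d", low, nxt_stb, nxt_low);
		// Prints: low=2 nxt_stb=0 nxt_low=1
		low = 4'd0;
		#1 $display("low=%0d nxt_stb=%b nxt_low=%0d", low, nxt_stb, nxt_low);
		// Prints: low=0 nxt_stb=1 nxt_low=15
		$finish;
	end
endmodule
```

In the design above, the wrapped value (15 in this sketch) never survives: whenever nxt_stb is set, the bottom bits are overwritten with either zero or ckspd-3.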

This is only the first half of this “two process” FSM. The second half,
with respect to the counter, is just about as simple. Perhaps it is even more
so, given that we’ve done all of the hard work above.

	always @(posedge i_clk)
	if (i_reset)
	begin
		if (OPT_SERDES)
			counter <= 0;
		else if (OPT_DDR)
			counter <= { 2'b11, {(NCTR-2){1'b0}} };
		else
			counter <= { 2'b01, {(NCTR-2){1'b0}} };
	end else if (nxt_clk && i_cfg_shutdown)
		counter <= { 2'b11, {(NCTR-2){1'b0}} };
	else
		counter <= nxt_counter;

The big thing to notice here is the nxt_clk && i_cfg_shutdown. Remember, if
the user ever asserts i_cfg_shutdown, we need to wait for the current clock cycle to
complete before shutting it down. Hence, we wait for the nxt_clk signal
before acting. Then, once set, we leave the counter in a state where it
will perpetually set nxt_clk. This way, the moment i_cfg_shutdown is
released, we’ll be back to generating a clock again.

To explain this a bit better, imagine the clock generator is producing
an output clock from ten periods of the source/system clock: five system clocks
of 0000_0000, followed by five more clocks of 1111_1111. Imagine
again that we’ve had several periods of these 10 clock cycles before the
user asserts the clock shutdown signal. We then wait another 10 cycles for the
clock to fully shut down. Now, if the user drops the shutdown signal after a
further 3 cycles, we could either wait another 7 cycles (to complete the 10),
or start immediately. Here, we try to arrange to start a stopped clock
immediately without violating any of our clocking rules.

The next signal, clk90, controls whether or not we’re generating a
clock offset from new_edge, o_ckstb, by 90 degrees.

	always @(posedge i_clk)
	if (i_reset)
		clk90 <= 0;
	else
		clk90 <= w_clk90;

	assign	o_clk90 = clk90;

This logic isn’t very interesting yet, since we’ve basically split a two
process FSM. It will become more so when we get to w_clk90, and the first
process of the FSM, below. The key is, this logic must determine what the
current 90 degree offset setting is. Hence, when you look at the outgoing
wide clock, this signal must match it.

How about the clock speed? In this case, we go through some error checking.

	initial	ckspd = (OPT_SERDES) ? 8'd0 : (OPT_DDR) ? 8'd1 : 8'd2;
	always @(posedge i_clk)
	if (i_reset)
		ckspd <= (OPT_SERDES) ? 8'd0 : (OPT_DDR) ? 8'd1 : 8'd2;
	else
		ckspd <= w_ckspd;

	always @(*)
	if (OPT_SERDES)
		new_ckspd = i_cfg_ckspd;
	else if (OPT_DDR && i_cfg_ckspd <= 1 && !i_cfg_clk90)
		new_ckspd = 1;
	else if (i_cfg_ckspd <= 2 && (OPT_DDR || !i_cfg_clk90))
		new_ckspd = 2;
	else if (i_cfg_ckspd <= 3)
		new_ckspd = 3;
	else
		new_ckspd = i_cfg_ckspd;

	assign	w_clk90 = (nxt_clk) ? i_cfg_clk90 : clk90;
	assign	w_ckspd = (nxt_clk) ? new_ckspd   : ckspd;

The error checking is here to guarantee that a clock speed of 0 is only used
when OPT_SERDES is set. Likewise, a clock speed of 1 may be used in
ODDR
mode (wide clock of 00001111), but not when the clk90 configuration
is set (calling for a wide clock of 0011_1100 which is too complex for an
ODDR output module to produce).
This continues for a clock speed of two which is fine for a non-offset clock
(wide clock of 0000_0000 followed by 1111_1111), but not for an offset
clock (wide clock of 0000_1111 followed by 1111_0000) unless the OPT_DDR
option is set.

Finally, the two values w_clk90 and w_ckspd are used to tell us what
values our registered logic should use when generating a clock. As such,
they are either the registered values, or (when we’re about to start a new
cycle) the new values.

With all this as background, we can now dig into the core of this
logic–generating the three key signals we will be outputting.

On reset, these signals will simply be set to indicate a clock of the
fastest rate, ready to go, but otherwise one that is idle (o_ckwide=0).

	initial	o_ckstb  = 0;
	initial	o_hlfck  = 0;
	initial	o_ckwide = 0;
	always @(posedge i_clk)
	if (i_reset)
	begin
		o_ckstb  <= 0;
		o_hlfck  <= 0;
		o_ckwide <= 0;

Next, if we want to shut down the clock, we can only do so on nxt_clk.
When shut down, the wide clock will be zero and the new edge signals will
all be suppressed.

	end else if (nxt_clk && i_cfg_shutdown)
	begin
		o_ckstb  <= 1'b0;
		o_hlfck  <= 1'b0;
		o_ckwide <= 8'h0;

As mentioned above, the key here is that the clock can suddenly start if
the i_cfg_shutdown signal is released. Using this logic, it does not need
to remain phase coherent with whatever phase the clock had prior to being
shutdown.

Moving on to our highest speed clock, we simply set that according to
the 90 degree clock configuration. In general, this speed will only
ever generate one of two values: 01100110 or 00110011.

	end else if (OPT_SERDES && w_ckspd == 0)
	begin
		o_ckstb  <= 1;
		o_hlfck  <= 1;
		o_ckwide <= (i_cfg_clk90) ? 8'h66 : 8'h33;

When running from a 100MHz system (src_clk) clock, this plus the OSERDES
will generate a 200MHz clock signal to the external device.

One might argue that the OPT_SERDES here is really redundant. There should
be enough logic elsewhere to keep w_ckspd at a non-zero value if OPT_SERDES
is not set. Why use it?

It’s here specifically to provide a strong hint to the synthesis tool
regarding logic that can be cleaned up if OPT_SERDES is not set. This block
is complicated enough as it is, so adding it in should simplify our logic.

The problem with putting this value here, and generating a clock module based
upon parameters such as OPT_SERDES and OPT_DDR, is that I now need to
formally verify the IP under several conditions before I can know if it works.
This applies to simulation as well. It is now no longer sufficient to run
the simulation tool once when you do something like this. It must now be run
many times under different conditions. As an engineer, I need to be aware
of costs like this whenever I invoke logic like this.

In this case, I wanted to support multiple types of FPGAs (and/or ASICs), and
so this was the logic I chose.

Our next speed, ckspd=1, has almost the same logic. As before, o_ckstb
and o_hlfck are both set continually in this mode. In this case, our wide
clock output will either be 0011_1100 or 0000_1111 depending on whether
or not we need a 90 degree offset clock for DDR.

	end else if ((OPT_SERDES || OPT_DDR) && w_ckspd <= 1)
	begin
		o_ckstb  <= 1'b1;
		o_hlfck  <= 1'b1;
		o_ckwide <= (OPT_SERDES && w_clk90) ? 8'h3c : 8'h0f;

When running from a 100MHz system (src_clk) clock, this generates a 100MHz
clock as well.

You may note that there’s no real two-cycle output signal. The signaling,
with o_ckstb and o_hlfck, allows us to describe a new clock together
with or separate from the second half of that clock period, but offers nothing
for describing two clock cycles in the same source clock period. This is
just a limitation in our chosen signaling.

The solution to this problem is specific to the eMMC
controller that we’ve drawn our
example
from. In this case, I look at both the DDR setting and the
clock speed before generating any transmit data. From this, I determine if
I should be sending one byte, two bytes, or four bytes of data per clock.
The actual logic is more complex, due to the fact that the eMMC interface
may run in 1b, 4b, or 8b modes, but that’s the story of another piece of logic,
found outside of the clock controller.

As with clock speeds of either 0 (200MHz) or 1 (100MHz), the clock speed of 2
(50MHz) is also handled specially. This is the speed that alternates between
two outputs, generating either 00001111 followed by 11110000 in the offset
mode (o_clk90=1), or simply 00000000 followed by 11111111 in the normal
mode.

	end else if (w_ckspd == 2)
	begin
		{ o_ckstb, o_hlfck } <= (!nxt_counter[NCTR-1]) ? 2'b10 : 2'b01;
		if (w_clk90 && (OPT_SERDES || OPT_DDR))
			o_ckwide <= (!nxt_counter[NCTR-1]) ? 8'h0f : 8'hf0;
		else
			o_ckwide <= (!nxt_counter[NCTR-1]) ? 8'h00 : 8'hff;

When running from a 100MHz system clock (src_clk above), this generates
a 50MHz output clock signal. This might be the “fastest” speed you would
normally think of for an integer clock “divider”. As you can see, though,
we’ve already generated outgoing 200MHz and 100MHz clocks above.

This brings us to the general case–a divided clock running at less than half
our source clock rate. Here, we’ve already done all of the hard work for
nxt_clk, so the outgoing next edge signal o_ckstb is done.

	end else begin
		o_ckstb <= nxt_clk;

The half edge signal is determined by the counter. The lower bits must be zero,
indicating a new phase, and the top two bits indicate the new phase will be
the third of four–so just entering halfway.

		o_hlfck <= (counter == {2'b01, {(NCTR-2){1'b0}} });

The wide clock is determined by the top two phase bits of the next counter.
It’s either equal to the most significant bit, when there’s no clock offset,
or the exclusive OR of the top two bits when there is.

		if (w_clk90)
			o_ckwide <= {(8){nxt_counter[NCTR-1]
						^ nxt_counter[NCTR-2]}};
		else
			o_ckwide <= {(8){nxt_counter[NCTR-1]}};
	end
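To see why these two expressions produce the 0011 and 0110 patterns, it can help to step through the four phases by hand. The following stand-alone sketch does just that. The module name phase_demo and the phase register are placeholders for illustration only, standing in for the top two bits of nxt_counter.

```verilog
module phase_demo;
	reg	[1:0]	phase;	// Stand-in for nxt_counter[NCTR-1:NCTR-2]
	integer		k;

	initial begin
		for(k=0; k<4; k=k+1)
		begin
			phase = k[1:0];
			// normal: the MSB alone.  clk90: XOR of the two bits
			$display("phase=%b  normal=%b  clk90=%b",
				phase, phase[1], phase[1] ^ phase[0]);
		end
		$finish;
	end
endmodule
```

Reading down the columns, the normal clock steps through 0, 0, 1, 1 across the four phases, while the offset clock steps through 0, 1, 1, 0: the same waveform delayed by one phase, or 90 degrees.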

This leaves us with only one final signal: the current clock speed. In this
case, all the work has been done above, and nothing more need be done with it.

	always @(posedge i_clk)
		o_ckspd <= w_ckspd;

That’s the basic idea. In summary:

There are four phases to the outgoing clock, either 0011 or 0110.
A counter generally helps us know when to transition from one phase to the
next.
High speeds get special attention.
Data changes on the outgoing next edge signal, o_ckstb.

In DDR modes, data can also change on the outgoing o_hlfck signal.

Key features of this approach include:

There’s no need for any clock domain
crossings in the outgoing data
path. All outgoing signals are handled in the source clock domain.
The clock may be gated at will, and (re)started quickly if necessary.
Frequency changes are controlled, and will take place between clock periods.
Although the clock is generated in logic, it doesn’t trigger any logic.
That is, nowhere in the design will anything in the outgoing logic path
depend upon either @(posedge dev_clk) or @(negedge dev_clk). Instead,
all of the logic is triggered off of the o_ckstb or o_hlfck signals
while still running on the same src_clk we started from.

But … does it work?

Simulation testing

Just to get this clock generator off the ground, I built a quick simulation
test bench. You can
find it here, and we’ll walk through it quickly.

The first step was pretty boiler plate. I simply started a VCD trace, placed
the design into reset, and generated a 100MHz clock.

	initial begin
		$dumpfile("tb_sdckgen.vcd");
		$dumpvars(0,tb_sdckgen);
		reset = 1'b1;
		clk = 0;
		forever
			#5 clk = !clk;
	end

For the second step, I wanted to place the design in a variety of
configurations to see how it would work in each. I chose to leave it in each
configuration for five clock cycles before moving to the next. I then defined
a simple task, capture_beats, that I could call to wait out five cycles of
a given clock setting before moving on.

	task	capture_beats;
	begin
		repeat(5)
		begin
			wait(w_ckstb);
			@(posedge clk);
		end
	end endtask

The last step, then, was to walk through one clock setting after another
to see what would happen.

I started by taking the design out of reset, and configuring the inputs for
a (rough) 100kHz clock.

	initial begin
		{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h0fc;
		repeat (5)
			@(posedge clk);
		@(posedge clk)
			reset <= 0;

		// 100kHz (10us)
		capture_beats;

You can pretty well read the comments below to see the configurations I checked.

		// 200 kHz (5us)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h07f;
		capture_beats;

		// 400 kHz (2.52us)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h041;
		capture_beats;

		//   1MHz (1us)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h01b;
		capture_beats;

		//   5MHz (200ns)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h007;
		capture_beats;

		//  12.5MHz (80ns)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h004;
		capture_beats;

		//  25MHz (40ns)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h003;
		capture_beats;

		//  50MHz (20ns)
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h002;
		capture_beats;

		// 100MHz
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h001;
		capture_beats;

		// 200MHz
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h000;
		capture_beats;


		//  25MHz, CLK90
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h103;
		capture_beats;

		//  50MHz, CLK90
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h102;
		capture_beats;

		// 100MHz, CLK90
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h101;
		capture_beats;

		// 200MHz, CLK90
		@(posedge clk)
			{ cfg_shutdown, cfg_clk90, cfg_ckspd } = 10'h100;
		capture_beats;

		$finish;
	end

These are basically all of the configurations I wanted to use the design with.
Using the generated trace, I can visually see all of the signals within this
design working as intended. Further, unlike the formal verification we’ll
discuss next, I can actually see many clocks of this design. This allows
me to verify, for example, that the 100kHz, 200kHz, and 400kHz clock divisions
work as designed.
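As a cross-check on the comments in that script, the periods they list are consistent with four phases of (ckspd-2) system clocks each for any ckspd of three or more, which matches the counter reloading with ckspd-3. Here's a small sketch of that arithmetic. Treat it as my reading of the numbers above rather than as part of the design: the module and function names are hypothetical, and a 100MHz (10ns) system clock is assumed.

```verilog
module period_demo;
	// Output clock period in nanoseconds for a given speed setting,
	// assuming a 10ns system clock.  (Hypothetical helper function.)
	function integer period_ns(input [7:0] ckspd);
	begin
		case(ckspd)
		8'd0: period_ns = 5;	// Two output clocks per system clock
		8'd1: period_ns = 10;	// One output clock per system clock
		8'd2: period_ns = 20;	// Alternating all-zero and all-one words
		// Four phases, each lasting (ckspd-2) system clocks
		default: period_ns = 40 * (ckspd - 2);
		endcase
	end endfunction

	initial begin
		$display("%0d", period_ns(8'hfc));	// 10000 (10us, 100kHz)
		$display("%0d", period_ns(8'h41));	// 2520  (2.52us)
		$display("%0d", period_ns(8'h03));	// 40    (25MHz)
		$finish;
	end
endmodule
```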

Sadly, this test is woefully inadequate for any real or professional purpose.

The biggest problem with this simple test bench
script
is that it’s not self checking. I can run it, but the only way to know if the
design did the right thing or not is to pull up a viewer and check the
VCD file.
Sure, this might get me off the ground, but it is horrible for maintenance.
How should I know, for example, if a small and otherwise minor change breaks
things?
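For what it's worth, a first step towards a self checking bench need not be complicated. The sketch below counts the system clocks between strobes and complains if the distance is ever wrong. It is not the test bench from the repository: the module name, the EXPECTED_GAP value, and the stand-in strobe source are all placeholders, with the stand-in taking the place of the design's o_ckstb output.

```verilog
module tb_selfcheck;
	localparam	EXPECTED_GAP = 4; // Hypothetical clocks per strobe
	reg	clk, stb, seen;
	integer	count, gap, errors;

	// 100MHz clock
	initial	clk = 0;
	always	#5 clk = !clk;

	// Stand-in strobe source.  A real bench would use o_ckstb here.
	initial	{ stb, seen } = 0;
	initial	count = 0;
	always @(posedge clk)
	begin
		stb   <= (count == EXPECTED_GAP-1);
		count <= (count == EXPECTED_GAP-1) ? 0 : count + 1;
	end

	// Checker: measure the distance between successive strobes
	initial	gap    = 0;
	initial	errors = 0;
	always @(posedge clk)
	if (stb)
	begin
		if (seen && (gap + 1 != EXPECTED_GAP))
		begin
			$display("ERROR: strobe gap of %0d clocks", gap + 1);
			errors = errors + 1;
		end
		seen <= 1;
		gap  <= 0;
	end else
		gap <= gap + 1;

	initial begin
		repeat(40) @(posedge clk);
		if (errors == 0) $display("PASS"); else $display("FAIL");
		$finish;
	end
endmodule
```

A check like this can be left running in a regression script, so a change that breaks the divider shows up as a FAIL rather than as something to be spotted by eye in a trace.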

The second problem with this test
bench
is that it does nothing to try out unreasonable input signals. How shall I
know, for example, that this design will never go faster than the fastest
allowed frequency? That is, it should only ever be able to go as fast as the
current speed, or the newly commanded speed.

Perhaps some of you may remember my comments on twitter about getting excited
to try this new design as a whole (not just the clock generator) on an FPGA,
only to be mildly (not) surprised that it didn’t work before all the formal
proofs were finished? (I couldn’t find them when I looked today …) Yeah,
there’s always a surprise you aren’t expecting that takes place when you work
with real hardware.

So, while this
looks nice, and while the resulting traces look really pretty,
this test bench
is highly insufficient.

Let’s move on to something more substantial.

Formal Properties

I like to think of this clock
module
as a basic clock divider. It’s not much more than a glorified counter,
together with a 4-state phase machine. Yeah, sure, you can run through all 4
states in one clock cycle, but it’s still not really all that much more.
Formally verifying this clock
generator
should therefore be pretty simple.

One of the big keys to this proof is the interface property
set.

I’ve discussed interface properties
before. The idea is born
from the fact that one component, such as this clock generator,
is going to generate signals that another component, in this case the transmit
data generator,
will use. Further, these two proofs will be independent of each other. Hence,
anything the
transmitter’s
proof needs to assume should then be asserted in the
clock generator
and vice versa. That’s the purpose of the
property set.
The property set
also greatly simplifies the assertions found within the design itself.

Still, let’s look over the design assertions for now. We’ll come back to
the property set in the next section.

We’ll start with the f_en signal.

	initial	f_en = 1'b1;
	always @(posedge i_clk)
	if (i_reset)
		f_en <= 1'b1;
	else if (nxt_clk)
		f_en <= !i_cfg_shutdown;

This just captures whether the clock should be shut down during the current
cycle or not. It’s that simple.

Many engineers just starting out with formal verification struggle to see
past the assertions and the assumptions within the language to realize they
can still use regular verilog when generating formal properties. In this
case, f_en is nothing more than a register which we are going to use in our
formal proof. Nothing prevents you from doing this. Indeed, you are more
than able to write more complicated state
machines
when generating formal properties as well.

Just make sure that your new logic doesn’t use the same expressions as the
logic you are verifying, or you might convince yourself something works when
it doesn’t. When teaching, I like to explain this way: the best way to
verify that A divided by B is C is to multiply C and B together.
If the result of the multiply is A, then you’ve verified your result. Why
does this work? Because you use different logic paths in your brain for
division than you do for multiplication. Hence, if you make a mistake in
dividing, you aren’t likely to make the same mistake when multiplying.

The same is true of formal methods. You can use logic in formal methods, just
like you do in your design, you just don’t want to use the same logic lest
your mind falsely convinces you it’s right when it isn’t. This is sort of
like having one witness to a murder called onto the stand twice under the
same name.
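The divide and multiply analogy can even be written down as a formal property. The following is purely an illustration of the principle, unrelated to the clock generator: the module name fdivcheck and its ports are made up for the example. The "design" divides, while the property checks the result by multiplying.

```verilog
module fdivcheck (
		input	wire	[7:0]	i_a, i_b
	);
	wire	[7:0]	quotient, remainder;

	// "Design" logic: divide, guarding against a divide by zero
	assign	quotient  = (i_b == 0) ? 8'h00 : (i_a / i_b);
	assign	remainder = (i_b == 0) ? i_a   : (i_a % i_b);

`ifdef	FORMAL
	// "Verification" logic: rebuild the dividend with a multiply,
	// rather than repeating the divide expression from above
	always @(*)
	if (i_b != 0)
	begin
		assert(quotient * i_b + remainder == i_a);
		assert(remainder < i_b);
	end
`endif
endmodule
```

Had the assertion been written as quotient == i_a / i_b, it would pass no matter what, since it would simply restate the logic under test.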

Anyway, let’s move on.

The next step is to instantiate a copy of the clock interface
properties.

	fclk #(
		.OPT_SERDES(OPT_SERDES),
		.OPT_DDR(OPT_DDR)
	) u_ckprop (
		.i_clk(i_clk), .i_reset(i_reset),
		//
		.i_en(f_en),
		.i_ckspd(o_ckspd),
		.i_clk90(clk90),
		//
		.i_ckstb(o_ckstb),
		.i_hlfck(o_hlfck),
		.i_ckwide(o_ckwide),
		//
		.f_pending_reset(f_pending_reset),
		.f_pending_half(f_pending_half)
	);

See how simple that was?

In addition to the assertions within this property
set,
the property set
provides two output signals that we can use to connect the state of our
design to the internal state of the property
set.
These signals are:

f_pending_reset

This otherwise annoying signal is required for us to be able to handle
the clock anomalies between reset and the first clock strobe. This signal is
set on a reset, and released once the clock gets started.

f_pending_half

This signal is simpler. It simply means that we’ve seen the new_edge
(o_ckstb) and not the half_edge herein called o_hlfck. If
f_pending_half is true, then the clock must generate o_hlfck before it
can generate o_ckstb.

With these signals, we can express things like this:

	always @(*)
	if (!i_reset && !o_hlfck && !o_ckstb && !f_pending_reset)
		assert(f_pending_half == (counter[NCTR-1:NCTR-2] < 2'b10));

This helps us through long periods of time with neither o_hlfck nor o_ckstb.
During this time, f_pending_half should be equivalent to the top two bits
of our counter being either 2'b00 or 2'b01.

Let’s look at some other assertions.

For example, if we shut the clock down, then we shouldn’t get any more new
edges, o_ckstb:

	always @(posedge i_clk)
	if (f_past_valid)
	begin
		if ($past(!i_reset && i_cfg_shutdown))
		begin
			assert(!o_ckstb);
		end

Now we can look at some of the specific options. For example, the clock
speed should only be zero (200MHz) if OPT_SERDES is set. While set to zero,
either o_ckstb should be set on every clock cycle or we should’ve received
a clock shutdown request.

		if (ckspd == 0)
		begin
			assert(OPT_SERDES);
			assert(o_ckstb || $past(i_cfg_shutdown));
			assert(counter == 0
				||counter == {2'b11,{(NCTR-2){1'b0}} });
		end

Likewise, we should only ever be in a clock speed of 1 (100MHz) if either
OPT_SERDES or OPT_DDR is set. Further, if OPT_SERDES is not set, we
shouldn’t ever be implementing a 90 degree clock offset.

		if (ckspd == 1)
		begin
			assert(OPT_SERDES || OPT_DDR);
			if (!OPT_SERDES)
			begin
				assert(!clk90);
			end
			assert(counter == {2'b11,{(NCTR-2){1'b0}} });
		end

A clock speed of two (50MHz) is available to all configurations. In this case,
the bottom bits–the non-phase description bits–must always be zero.

		if (ckspd == 2)
			assert(counter == 0
				|| counter == {2'b01,{(NCTR-2){1'b0}} }
				|| counter == {2'b10,{(NCTR-2){1'b0}} }
				|| counter == {2'b11,{(NCTR-2){1'b0}} });

Finally, in all other clock speeds, all we insist is that the lower bits of
the counter be no greater than the clock speed minus three.

		if (ckspd >= 3)
			assert(counter[NCTR-3:0] <= (ckspd-3));
	end

There are only two ways both o_ckstb and o_hlfck can be true at once.
The first is if the speed indicates either 200MHz or 100MHz. The second is
if the clock is stopped, and so the wide clock output is zero and a new
clock is expected on the next clock cycle.

	always @(*)
	if (!i_reset && o_ckstb && o_hlfck)
		assert(ckspd <= 1 || (o_ckwide == 0 && nxt_clk));

The difficult part of these assertions is that these aren’t enough to
limit the output of the clock
generator.
Just to make certain the outputs are properly limited, I enumerate each
together with the conditions under which they may be produced.

We’ll start with a zero output. This can come from either a stopped clock,
or one of two slow clock situations.

	always @(*)
	if (!i_reset)
	case(o_ckwide)
	8'h00: if (nxt_clk)
		begin // A stopped clock
			assert(counter == {2'b11,{(NCTR-2){1'b0}} }
					|| ckspd == 0);
		end else if(!clk90)
		begin // In slow situations with no offset
			assert(counter[NCTR-1] == 1'b0);
		end else if(clk90)
		begin // In slow (DDR) situations with a 90 degree clock offset
			assert(counter[NCTR-1:NCTR-2] == 2'b00
				||counter[NCTR-1:NCTR-2] == 2'b11);
		end

An output of 8'h0f means we’re either in speed one with no clock offset
and both clock edges active, or we’re in the first half of speed two.

	8'h0f: assert((!clk90 && ckspd == 1 && o_ckstb && o_hlfck)
			||(clk90 && ckspd == 2 && o_ckstb));

An output of 8'hf0 can only mean we’re in the second half of speed two.

	8'hf0: assert(clk90 && ckspd == 2 && !o_ckstb && o_hlfck);

An output of 8'hff is common at slow speeds, but also completely determined
by the two top phase bits of the counter.

	8'hff: if(!clk90) assert(counter[NCTR-1] == 1'b1);
		else
			assert(counter[NCTR-1:NCTR-2] == 2'b01
				|| counter[NCTR-1:NCTR-2] == 2'b10);

The last several outputs are very specific to their settings. 8'h3c is
only possible in a speed of 1 with a 90 degree clock offset.

	8'h3c: assert( clk90 && ckspd == 1 && o_ckstb && o_hlfck);

That leaves the two possible double-clock outputs. First, the double clock
with no 90 degree offset.

	8'h33: assert(!clk90 && ckspd == 0 && o_ckstb && o_hlfck);

The last possibility is the double clock with the 90 degree offset.

	8'h66: assert( clk90 && ckspd == 0 && o_ckstb && o_hlfck);

Everything else is specifically disallowed.

	default: assert(0);
	endcase

Interface File

While I might like to leave things there, a full proof of this
clock generator
requires we go over the formal interface
file.

Remember, the purpose of the formal interface file is to separate two proofs.
In this case, we want to both formally verify the
clock generator,
as well as the
transmitter data generator
that will use the results of the
clock generator.
Further, unlike the
clock generator,
the transmitter data generator
doesn’t really care if the signals to and from the
clock generator are realistic. It only cares that
they follow whatever rules it requires–things like either
1) both new_edge && half_edge at the same time, or 2) an alternating
new_edge with the half_edge, and so forth.

You can find this formal interface
file
among the other files associated with the formal proofs for this design.
Although it is written in Verilog, it’s not really something that could or
would be synthesized. For this reason I keep it in the bench/formal
subdirectory of the project, rather than the rtl/ subdirectory.

Starting at the top, our
property set
must operate in at least three configurations: 1) in an environment where the
wide_clock commands an 8:1 OSERDES, 2) an environment where it commands an
ODDR instead, or 3) a simpler
environment where neither option is available to us.

module	fclk #(
		parameter	[0:0]	OPT_SERDES = 1'b0,
					OPT_DDR    = 1'b0
	) (

Yes, we’ll need to run at least
3 formal proofs,
one for each option, to make sure we’ve truly captured each option. This,
however, is just the price of doing business with configurable logic.

Our formal properties
will need the same inputs as the
clock generator.
The outputs of the
clock generator
also need to be listed as inputs to this property set.
While the formal property set
will primarily consist of assertions and assumptions, it will also produce
two outputs–as discussed above. These are necessary for making sure the
formal property set’s
state is consistent with the internal state of the design.

		input	wire		i_clk, i_reset,
		//
		input	wire		i_en,
		input	wire	[7:0]	i_ckspd,
		input	wire		i_clk90,
		//
		input	wire		i_ckstb, i_hlfck,
		input	wire	[7:0]	i_ckwide,
		//
		output	reg		f_pending_reset,
		output	reg		f_pending_half
	);

Some of you may recall the challenges I’ve struggled through when trying to
verify two co-dependent components.
My original approach was to swap assumptions and
assertions between the
two components. This didn’t
work,
primarily because it was possible for the resulting assumptions to render
one or more assertions to be irrelevant or vacuous. In that example, the logic
of a design acted as an assumption as well.

In our case, we’re going to disconnect the two designs that will use this
property set entirely. The
clock generator (the master)
will make assertions that the
transmitter data generator will later assume, and vice versa.
To make this work, we’ll have the SymbiYosys
script
for the clock generator
define a CKGEN macro. This will then tell us whether this property set is
being used as part of the
clock generator’s proof, or the
transmitter data generator’s.
If a part of the clock generator’s
proof, we’ll make assertions about our outputs. If a part of the
transmitter data generator’s
proof, those “outputs” will now be inputs of the transmitter data
generator,
and so we should be making assumptions about them instead. To do this, we’ll
create a macro, SLAVE_ASSUME, that can be used to describe properties of
these outputs with either assert or assume statements.

`ifdef	CKGEN
`define	SLAVE_ASSUME	assert	// Clock generator proof
`else
`define	SLAVE_ASSUME	assume	// Transmit data generator proof
`endif

The next step is boiler plate: create an f_past_valid register to let us
know if we can use the $past() function or not. (Remember, $past()’s value
is invalid on the first clock of any proof.)

	reg		f_past_tick, f_past_valid;
	reg		last_reset, last_en, last_pending;
	reg	[7:0]	last_ckspd;

	initial	f_past_valid = 0;
	always @(posedge i_clk)
		f_past_valid <= 1;
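If you haven't seen this idiom before, here's how f_past_valid typically gets used. This sketch is separate from the property set we are building: the module name fpast_demo and its little counter are placeholders for illustration only.

```verilog
module fpast_demo (
		input	wire	i_clk, i_reset
	);
	reg	[3:0]	ctr;

	initial	ctr = 0;
	always @(posedge i_clk)
	if (i_reset)
		ctr <= 0;
	else
		ctr <= ctr + 1;

`ifdef	FORMAL
	reg	f_past_valid;

	initial	f_past_valid = 0;
	always @(posedge i_clk)
		f_past_valid <= 1;

	// Every use of $past() is gated by f_past_valid, so nothing is
	// checked on the first clock, when there is no past to refer to
	always @(posedge i_clk)
	if (f_past_valid && $past(i_reset))
		assert(ctr == 0);

	always @(posedge i_clk)
	if (f_past_valid && !$past(i_reset))
		assert(ctr == $past(ctr) + 4'h1);
`endif
endmodule
```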

Likewise, f_pending_reset will be true between the i_reset signal and the
first clock edge.

	initial	f_pending_reset = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
		f_pending_reset <= 1'b1;
	else if (i_ckstb || i_hlfck)
		f_pending_reset <= 1'b0;

Our second output, f_pending_half, is true from the top of the clock to
the second half of the clock, but only if the top of the clock didn’t
include the half_edge signal (called i_hlfck herein).

	initial	f_pending_half = 1'b0;
	always @(posedge i_clk)
	if (i_reset)
		f_pending_half <= 1'b0;
	else if (i_ckstb)
		f_pending_half <= !i_hlfck;
	else if (i_hlfck)
		f_pending_half <= 1'b0;

A third signal, f_past_tick, will allow us to reason about whether or not
we just passed an edge. We’ll get to this one in a bit.

	initial	f_past_tick = 0;
	always @(posedge i_clk)
		f_past_tick <= i_ckstb || i_hlfck;

Now that we have these signals, we can state with certainty that
we can’t start a new clock cycle while waiting for the second half of a clock
cycle. Likewise, if we are in the second half of a clock cycle, we shouldn’t see
the half edge again unless we’re starting a new (and high speed) clock.

	always @(posedge i_clk)
	if (!i_reset && !f_pending_reset)
	begin
		if (f_pending_half)
			`SLAVE_ASSUME(!i_ckstb);
		else if (i_hlfck)
			`SLAVE_ASSUME(i_ckstb);
	end

With this as background, we can now make assertions about our various
clock speeds, and the outputs that should be produced in each. Note that in
this formal property set,
the i_ckspd input reflects our current clock speed, and not just the
requested clock speed that we worked with in the clock
generator.
Hence, it is an output of the clock
generator,
and no longer the requested clock speed.

Let’s start with the highest speed (200MHz) clock output.

	always @(posedge i_clk)
	if (!i_reset)
	case(i_ckspd)
	0: begin
		// We can only run in this speed if OPT_SERDES is set.
		`SLAVE_ASSUME(OPT_SERDES);

		// This speed has no pending half cycles.  All clock cycles
		// are complete in one cycle.
		`SLAVE_ASSUME(f_pending_reset || !f_pending_half);
		if (i_ckwide == 0)
		begin
			// Clock is either *off*/inactive, or we're still coming
			// out of a reset.
			`SLAVE_ASSUME(f_pending_reset || (!i_ckstb && !i_hlfck));
		end else begin
			// Clock is active, both edges are active in a clock
			// tick
			`SLAVE_ASSUME(i_ckstb && i_hlfck);
		end

The wide_clock output, herein called i_ckwide, can only have one of two
values when active at this speed.

		if (i_clk90)
		begin
			// In the case of a 90 degree offset clock, if the
			// clock is active, it must be 0110_0110
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h66);
		end else begin
			// Otherwise, if the clock is active, it must be
			// 0011_0011
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h33);
		end end

Those are just the rules for 200MHz (assuming a 100MHz system clock).

Now let’s drop down a speed, and look at the 100MHz clock. In this mode,
the new edge and half edge signals must also be present on the same clock.
Likewise, there’s no allowable means to have a pending second half–the
first and second half must always show up on the same clock cycle.

	1: begin
		if (i_ckwide == 0)
		begin
			`SLAVE_ASSUME(f_pending_reset || (!i_ckstb && !i_hlfck));
		end else begin
			`SLAVE_ASSUME(i_ckstb && i_hlfck);
		end

		if (!f_pending_reset)
			`SLAVE_ASSUME(!f_pending_half);

At 100MHz, the outgoing wide clock can only be 0011_1100 (90 degree offset),
or 0000_1111. The former requires OPT_SERDES, the latter may also be
possible in OPT_DDR mode–since the first four bits are identical to one
another, as are the last four.

		if (i_clk90)
		begin
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h3c);
			`SLAVE_ASSUME(OPT_SERDES);
		end else begin
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h0f);
			`SLAVE_ASSUME(OPT_SERDES || OPT_DDR);
		end end

Our last special clock speed is 50MHz. For this case, we break our properties
into two parts: the 90 degree offset, and the normal (SDR) case.

For the 90 degree offset clock, the clock must either be 0000_1111 if
we’re not waiting on the next half clock cycle, or 1111_0000 if we are.
Likewise, either the new or half edge signal must be true on every cycle.
The only exception is for if/when the clock is stopped. Further, this
output will require either OPT_SERDES or OPT_DDR.

	2: begin
		if (i_clk90)
		begin
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'h0f || i_ckwide == 8'hf0);
			if (i_en)
			begin
				`SLAVE_ASSUME(i_ckwide != 0);
			end
			`SLAVE_ASSUME(OPT_SERDES || OPT_DDR);
			if (!f_pending_reset && f_pending_half)
			begin
				`SLAVE_ASSUME(i_ckwide == 8'hf0);
			end
			if (i_ckwide == 8'h00)
			begin
				`SLAVE_ASSUME(!i_ckstb && !i_hlfck);
			end else if (i_ckwide == 8'h0f)
			begin
				`SLAVE_ASSUME(i_ckstb);
			end else begin
				`SLAVE_ASSUME(i_hlfck);
			end

The normal offset is simpler. This doesn’t require OPT_SERDES or OPT_DDR.
The wide clock can either be 0000_0000 or 1111_1111. Further, if ever
the clock output is 1111_1111, then we must be on the second half edge.

		end else begin
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'hff);
			if (i_ckwide == 8'hff)
				`SLAVE_ASSUME(i_hlfck);
		end end
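
Looking back over the three high speed cases, each 90 degree offset pattern is just the normal pattern rotated by a quarter of its period. The sketch below checks that arithmetic. The module and signal names are hypothetical. The 50MHz case joins two successive words into one 16-bit period, with the new-edge word first. If bit 7 is the first bit sent, which is an assumption here, then a left rotation moves each edge a quarter period earlier.

```verilog
// Hypothetical self-checking sketch: each offset pattern equals the normal
// pattern rotated left by one quarter of its period.
module quarter_shift_demo;
	// 200MHz: the period spans 4 bits, so a quarter period is 1 bit
	wire	[7:0]	n200 = 8'h33;
	wire	[7:0]	s200 = { n200[6:0], n200[7] };

	// 100MHz: the period spans 8 bits, so a quarter period is 2 bits
	wire	[7:0]	n100 = 8'h0f;
	wire	[7:0]	s100 = { n100[5:0], n100[7:6] };

	// 50MHz: the period spans two words (16 bits), so a quarter period
	// is 4 bits.  The normal sequence is 8'h00 followed by 8'hff.
	wire	[15:0]	n50 = { 8'h00, 8'hff };
	wire	[15:0]	s50 = { n50[11:0], n50[15:12] };

	initial begin
		#1;	// Let the continuous assignments settle
		if (s200 !== 8'h66)
			$display("FAIL: 200MHz offset is %h", s200);
		else if (s100 !== 8'h3c)
			$display("FAIL: 100MHz offset is %h", s100);
		else if (s50 !== { 8'h0f, 8'hf0 })
			$display("FAIL: 50MHz offset is %h", s50);
		else
			$display("PASS");
		$finish;
	end
endmodule
```

The three expected values, 8'h66, 8'h3c, and the pair 8'h0f then 8'hf0, are the same ones the properties assume for the offset clock at each speed.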

This brings us to the default clock–the very slow clock generated by
integer division (i.e. the counter). As before, the wide clock can either
be 0000_0000 or 1111_1111 and hence needs no special hardware such as
either OPT_SERDES or OPT_DDR.

	default: begin
			`SLAVE_ASSUME(i_ckwide == 0 || i_ckwide == 8'hff);
			if (!f_pending_reset && !i_clk90 && last_en && i_en)
			begin
				if (i_ckstb)
				begin
					`SLAVE_ASSUME(i_ckwide == 8'h00);
				end else if (i_hlfck)
				begin
					`SLAVE_ASSUME(i_ckwide == 8'hff);
				end else if (f_pending_half)
				begin
					`SLAVE_ASSUME(i_ckwide == 8'h00);
				end else // if (!f_pending_half)
					`SLAVE_ASSUME(i_ckwide == 8'hff);
			end
		end
	endcase

Just as a quick sanity check, if we have no special hardware, then both
new and half edges can never be true on the same cycle.

	always @(posedge i_clk)
	if (!OPT_SERDES && !OPT_DDR)
		assert(!i_ckstb || !i_hlfck);

Let’s come back and double check the high speed cases. These are the only
cases where both new and half edge may be allowed at the same time. In all
other cases, one or both signals should be zero.

	always @(posedge i_clk)
	if (f_past_valid && !last_reset && (last_en || i_ckstb || i_hlfck))
	begin
		case(i_ckspd)
		0: `SLAVE_ASSUME(!i_en || (i_ckstb && i_hlfck));
		1: `SLAVE_ASSUME(!i_en || (i_ckstb && i_hlfck));
		default:
			`SLAVE_ASSUME(!i_ckstb || !i_hlfck);
		endcase
	end

Feel free to check the
property set
out yourself. While there are a couple more properties to it, these
are the most significant.

Coverage Checking

Any good verification set should include not just a simulation, not just
formal induction based proofs, but also a set of coverage checks.
These are critical to making sure you haven’t (accidentally) assumed away
some key component of the device’s operation. Were that to happen, then
the formal proof would be irrelevant, even if it did pass.

Hence, we add some cover properties here to the
clock generator.

The first step is just to check if the clock is active, and if so, what mode
it is active in.

	reg		cvr_active, cvr_clk90;
	reg	[7:0]	cvr_spd, cvr_count;

	always @(posedge i_clk)
	if (!cvr_active)
	begin
		cvr_spd <= i_cfg_ckspd;
		cvr_clk90 <= i_cfg_clk90;
	end

	initial	cvr_active = 0;
	always @(posedge i_clk)
	if (i_reset)
		cvr_active <= 1'b0;
	else if (cvr_spd != o_ckspd || cvr_spd != i_cfg_ckspd || !f_en
			|| cvr_clk90 != i_cfg_clk90 || cvr_clk90 != clk90)
		// We want to prove what our clock output can do over
		// time, not so much what happens when/if it changes.
		cvr_active <= 0;
	else if (o_ckstb)
		cvr_active <= 1;

If the clock is active, we can then start counting every new edge that takes
place while active.

	always @(posedge i_clk)
	if (i_reset || !cvr_active)
		cvr_count <= 8'b0;
	else if (o_ckstb && !(&cvr_count))
		// Don't allow the counter to overflow, but otherwise
		// count the beginnings of each clock cycle.
		cvr_count <= cvr_count + 1;
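
The `!(&cvr_count)` guard makes this a saturating counter: once every bit is set, it holds its value rather than wrapping back to zero. A small simulation sketch of the same idiom follows. The testbench and its signal names (`clk`, `reset`, `stb`, `count`) are hypothetical stand-ins, not part of the original design.

```verilog
// Hypothetical testbench: an 8-bit counter that saturates at 8'hff
module satcount_demo;
	reg		clk, reset, stb;
	reg	[7:0]	count;

	initial	clk = 0;
	always	#5 clk = !clk;

	// Same structure as the cover counter: clear on reset, otherwise
	// count strobes until all of the bits are set, then hold
	always @(posedge clk)
	if (reset)
		count <= 8'h00;
	else if (stb && !(&count))
		count <= count + 1;

	initial begin
		reset = 1;
		stb   = 0;
		@(posedge clk);
		@(posedge clk);
		reset <= 0;
		stb   <= 1;

		// Offer more strobes than the counter has values
		repeat(300) @(posedge clk);
		#1;
		if (count === 8'hff)
			$display("PASS: counter held at %h", count);
		else
			$display("FAIL: counter is %h", count);
		$finish;
	end
endmodule
```

Without the guard, 300 strobes would wrap an 8-bit counter around to 44, and a comparison such as `cvr_count > 2` could go false again on a long trace.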

With that as background, we can start looking at traces! Let’s get
cover traces for a variety of potential frequencies.

	always @(posedge i_clk)
	if (!i_reset)
	begin
		cover(cvr_spd == 2 && !clk90 && cvr_count > 2);	// 50MHz
		cover(cvr_spd == 3 &&  clk90 && cvr_count > 2);	// 25MHz
		cover(cvr_spd == 3 && !clk90 && cvr_count > 2);
		cover(cvr_spd == 4 &&  clk90 && cvr_count > 2);	// 12MHz
		cover(cvr_spd == 4 && !clk90 && cvr_count > 2);
		cover(cvr_spd == 5 &&  clk90 && cvr_count > 2);	//  8MHz
		cover(cvr_spd == 5 && !clk90 && cvr_count > 2);
		cover(cvr_spd == 6 &&  clk90 && cvr_count > 2); //  6MHz
		cover(cvr_spd == 6 && !clk90 && cvr_count > 2);
	end

We’ll have to handle covering the high speed options a bit differently. In
this case, we only want to check speeds requiring OPT_SERDES if
OPT_SERDES is actually set. We can’t use an if for this, lest the
formal tool decide we failed the cover check. Hence, we’ll use a generate
statement, so that the cover statements requiring OPT_SERDES are only
generated if OPT_SERDES is true. Now we can check for 200MHz, 100MHz, and
50MHz.

	generate if (OPT_SERDES)
	begin : CVR_SERDES

		always @(posedge i_clk)
		if (!i_reset)
		begin
			cover(cvr_spd == 0 &&  clk90 && cvr_count > 5);
			cover(cvr_spd == 1 &&  clk90 && cvr_count > 5);
			cover(cvr_spd == 1 && !clk90 && cvr_count > 5);
			cover(cvr_spd == 2 &&  clk90 && cvr_count > 5);
			cover(cvr_spd == 2 && !clk90 && cvr_count > 5);
		end

We can apply the same logic to OPT_DDR, but we’ll have fewer clock options
to check. In this case, it’s only the 100MHz and 50MHz options.

	end else if (OPT_DDR)
	begin : CVR_DDR

		always @(posedge i_clk)
		if (!i_reset)
		begin
			cover(cvr_spd == 1 && !clk90 && cvr_count > 5);
			cover(cvr_spd == 2 &&  clk90 && cvr_count > 5);
			cover(cvr_spd == 2 && !clk90 && cvr_count > 5);
		end

	end endgenerate
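
To see why a generate block behaves differently from a run-time `if`, note that a generate condition is resolved at elaboration, so statements in the untaken branch never exist at all. The sketch below shows this in simulation. The module `gen_demo` is hypothetical, and `$display` calls stand in for the cover statements, since cover checks themselves need a formal tool.

```verilog
// Hypothetical sketch: only one generate branch is elaborated
module gen_demo;
	parameter [0:0]	OPT_SERDES = 1'b0;
	parameter [0:0]	OPT_DDR    = 1'b1;

	generate if (OPT_SERDES)
	begin : CVR_SERDES

		// Exists only when OPT_SERDES is set
		initial	$display("SERDES checks elaborated");

	end else if (OPT_DDR)
	begin : CVR_DDR

		// Exists only when OPT_DDR is set and OPT_SERDES is not
		initial	$display("DDR checks elaborated");

	end endgenerate

	initial	#1 $finish;
endmodule
```

With the defaults chosen here, only the CVR_DDR message should appear. Overriding `OPT_SERDES` to 1 at elaboration would swap which block exists, and with both options clear neither would.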

By the time you get to this point, you should have a strong confidence that
this device clock generator
actually does what it needs to. I certainly do, and it hasn’t failed me (that
I recall) since going through this exercise. Yes, other parts of this design
have had problems, particularly the
front end, but the
clock generator
has been quite reliable.

Conclusions

This is now my go-to approach whenever I need to generate a device clock:

1. Generate the “clock” in logic.
2. Generate the “clock” wide, so it can be output via either OSERDES or
   ODDR.
3. Maintain all logic transitions on the original source clock.
4. Use logical signals like you would enables to handle data transitions.

What did this gain us? We received several advantages from this approach:

This is everything we would want of an outgoing clock, with none of the
challenges associated with breaking the
rules. Indeed,
this approach works nicely in both FPGA and ASIC contexts, as I’ve now used it
quite successfully in both for multiple projects. No, I don’t use the same
clock generator for all my projects, but that’s for both
requirements (the 200MHz clock is unique) and legal
reasons.

This leaves us with the topic of the “return clock”, which we’ll need to come
back to and discuss on another day.


🕒 **Posted on**: 1781252478
