Phase Accumulator / CORDIC part 1

Disclaimer: I am not claiming any competency in signal processing / Verilog. This is just a documentation of what I’ve done (primarily for myself when I’ve forgotten what I did in 6 months!) If it helps you, great but I’m not brave enough to claim this is a tutorial.

Demodulating an AM signal can be done by multiplication by a cosine wave of the same frequency as the AM signal’s carrier frequency. (If this statement is not obvious, I suggest reading my intro to AM page link TODO before proceeding)

However, performing this the naive way by calculating the cosine of ωt and performing a floating-point multiplication for every value of t (bearing in mind a new sample comes in every clock cycle) is almost impossible or is, at best, going to be stupidly resource intensive. As such, one way used in SDRs is to use a phase accumulator to generate an angle θ which is equal to ωt and using an algorithm called CORDIC to calculate RF*cos(θ).

Phase Accumulator

ωt is just an angle that changes every timestep by an amount such that it performs f (=ω/2π) complete revolutions per second. In a discrete time system, t can only take values of n/fs where n is some integer and fs is the sampling frequency and t will increment by 1/fs in each timestep. This means that the angle (which I can going to call θ from now on) will increment by ω/fs = 2π*f/fs. This means that we could calculate θ at time t by knowing the value of θ at the previous timestep (t-1) and adding 2π*f/fs. The only reason for the 2π to be present is that the standard cosine function in most programming languages expects the angle to be specified in radians which have 2π radians per revolution. We can actually use any system we want to divide a revolution up, lets say we divide a circle into x steps, θ will increment by x * f / fs in each timestep. The convention in most mathmatics is to use radians and divide a circle into 2π. Radians are preferably mathematically as they have the simplest behaviour when differentiated / integrated and a couple of other nice properties. But, given that we are not differentiating / using any functions that expect angles in radians, we can define an arbitrary number of steps in a revolution as long as we are consistent. This is particularly important in the case of Verilog which, as far as I know, can only handle integers. Dividing a circle into 6 (closest integer to 2π) doesn’t seem like a recipe for accuracy to me! I initially chose to use 32 bit angles (so divided a complete revolution into just over 4 billion sections) at which point our angle increments by 2^32*f/fs on every clock cycle. The Verilog module for this is fairly simple, it has a 32 bit register (o_phaseAngle) which increments by some constant amount (i_phaseDelta) every clock edge. The reason for taking the increment as an input rather than calculating it from the desired frequency and the sampling frequency is that I wasn’t sure if asking for a multiplication and division by such large numbers in a single clock cycle was a great idea. Given I have an ARM core attached to the FPGA fabric, I think I’ll let that handle the maths. If someone understands this better than me, I’d be interested in your thoughts.


module phase_accumulator(
    input wire i_adcClock,
    input wire [31:0] i_phaseDelta,
    input wire i_resetn,
    output reg [31:0] o_phaseAngle
);

always @(posedge i_adcClock or negedge i_resetn) begin
    if(~i_resetn) begin
        o_phaseAngle <= 0;
    end else begin
        o_phaseAngle <= o_phaseAngle + i_phaseDelta;
    end
end

endmodule

Setting f/fs = 1/80 gave a phase increment of 53,687,091 (0x3333333). Simulating this with a 2ns clock period (not at all realistic, just the defaults) shows the angle increase and then overflow (as one full cycle was completed) after 160ns or 80 clock cyles. Result!

CORDIC Basics

For a excellent video on CORDIC that explains it much better than I can, check this video on Youtube https://youtu.be/TJe4RUYiOIg but in essence:

If we feed 0 in as y0, our RF signal as x0 and have θ=ωt (as done by the phase accumulator) , our output is [RF * cos(ωt), RF * sin(ωt)], the first element of which is our demodulated signal!

Unfortunately, calculating tan(θ) is still tricky to do for all values of θ. So what if we restrict tan(θ) to be a negative power of 2 (1, half, quarter, 1/8, 1/16 etc)? Computers can easily divide by 2 for an integer number – it’s equivalent to a bit-shift to the right. This means if we constrain tan(θ) to be 2-i where i is 0 or a positive integer, our rotation has become:

Ignoring K as it is just a constant gain for each value of i, we now have an addition/subtraction and some bit-shifts. Much better! This calculation also works for a clockwise rotation, just with some sign changes. The only problem is “how do we calculate this for an angle which does not satisfy the magic requirement of tan(θ) = 2-i?” Well, there’s nothing stopping us combining multiple rotations that are easy to calculate until we get close enough to the answer we want. As i increases, θ decreases so we can increase the number of iterations of this algorithm, tweaking our approximation by an increasingly fine amount, until our series of rotations adequately represent the angle we want. Here are the first 5 angles that satisfy our criteria:

iθ (degrees)
045.0
126.56505117707799
214.036243467926477
37.125016348901798
43.5763343749973515
First angles that satisy the CORDIC criteria

Say we wanted an angle of 13° (angles are defined following the standard convention of x-axis = 0°, positive rotations are counter-clockwise). Our first stage has a starting error (target angle – current angle) of 13° so we need to perform a postive rotation. We rotate positively by the first CORDIC angle (45°), our second stage has a starting error of -32° so we perform a negative rotation of the second CORDIC angle (26.56°), our third stage has a starting error of -5.44° etc. and so the algorithm homes in on the correct answer – shown below. The downside is that it takes has a processing delay of the number of iterations but I think I can live with that. Extending this example to many iterations gives the following:

Error in CORDIC approximation vs iteration

How good is “good enough”?

I now have two parameters that I can play with to adjust my accuracy, how many bits I specify my angle to and how many iterations of the CORDIC I run before deciding it is good enough. If I can’t specify my angle accurately enough, I’ll lose a lot of accuracy just trying to express the angle. e.g. if I only used 2 bits, I could only divide a cicle into 4 – this means expressing angles to the nearest 90° so I have an average of 22.5° degrees error (max error is 45°) that I can’t do anything about before I’ve even started. If I can specify my angle more accurately, I need more iterations of the CORDIC to achieve 0 error. The CORDIC is capable of reaching zero error, given enough iterations, as the angle is expressed using a fixed number of bits. As long as we have enough iterations such that tan(θ) becomes a value of 1 LSB, then we can adjust the rotation to be perfect as far as the CORDIC algorithm sees it. At this point, the remaining error is whatever error was introduced expressing the original angle as a finite number of bits (known as quantisation error). This is shown below, after 14 iterations, the CORDIC algorithm has an error of 0 and the Total error in expressing the angle comes purely from quantising the angle.

I hacked together some Python to plot the average angle error vs number of iterations / number of bits I’d specified the angle to.

The point where the error doesn’t improve for increasing iterations is where the error is entirely due to quantisation noise. I also realised this should be roughly the point where the number of iterations is equal to the number of bits the angle is expressed to. The result of the arctan function is scaled such that a full circle is 232 units.

When i is much greater than angleWidth, θ = 0 as all of the data has been right shifted out of the number! It also holds that the number of iterations can’t exceed the data width. Recalling our expression for the rotation:

If y0 or x0 are shifted more bits then they are wide, the second term on each line will be 0 and the rotation will just apply a scaling factor. I only have a 16 bit wide data bus so plotted error against number of how many bits I used to specify the angle for 16 iterations of the CORDIC algorithm.

RMS error vs angle precision for 16 iterations (due to 16 bit data bus)

This confirmed using the 32 bit angle will work just fine. I don’t know how an RMS error of 0.001 degree affects the signal quality but I’ll work with it. Also not sure if you can pad the data bus to get greater angular resolution (in theory, I should be able to perform more iterations before hitting the quantisation noise limit) but that’s a thought for another day.

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: