RX Algorithm in C

Over the past few weeks I tried my hand at the RX algorithm again, this time in C. I’ve posted my code on github if you’d like to comment or suggest improvements. I last wrote about this algorithm about 9 months ago, but Python was far too slow for it to be useful on a drone.

While I’m still tracking down a few memory leaks with Valgrind (an awesome tool), it can process a 1MP PNG image in about 1/5 of a second, including reading and writing. I’m still hoping to speed that up to be able to process a live video feed of around 15-30 fps, however.

The following is an example processed excerpt of Terry Sim’s footage.

You’ll notice that the blue tarp on the ground appears very bright because its pixels lie further from the mean RGB vector than others in the frame.

I’m confident that pairing these data with altitude, orientation, basic knowledge of the landscape and camera specifications (optics), it’d be possible to identify human-sized features of this type on the ground that stand out. However, with this algorithm they would have to be wearing bright clothing (the further from those wavelengths primarily reflected by foliage the better).

Compiling and Uploading C to Arduino over Serial

A couple days ago I took a break from other problems I was working on to flex my bash skills. I was given an Arduino UNO earlier in the year while working on some basic embedded programming projects, and I haven’t used it for much since then.

Because I like C programming (the Arduino “language” isn’t for me), I wanted the ability to write programs and upload them in the terminal. A quick Google search pointed to a great post by Belau, and the script that follows echoes his in some ways.

#! /bin/bash
# Compile avr-gcc and upload via serial

# ensure we're root user
if [ $UID != 0 ]
    echo "Must be root" 1>&2
    exit 1

# ensure we have the correct number of arguments - only the first is used
if [ $# -lt 1 ]
    echo "** Must submit file for compilation" 1>&2
    exit 1
elif [ $# -gt 1 ]
    echo "** Warning: only using first argument as file name"

    # check if there are input arguments use them as error message
    if [ $# -eq 1 ]
        if ! [ -e $FNAME ]
            echo "** File $FNAME does not exist, exiting" 1>&2
            exit 1
    elif [ $# -gt 1 ]
        # just use one
        echo "** Warning: next time just use 1 input for function \
        check_compilation $1
        echo "** please provide valid number of arguments" 1>&2
        exit 1

    printf "\n"
    WIDTH=$(tput cols)
    for ((i=0; i<$(( $WIDTH-2 )); i=i+1)) 
        printf "*" 
    printf "\n\n" 

BASE_FILE_NAME="${1%.*}" # kind of a redundant step for clarity 
DF_CPU="16000000UL" # processor speed (MHz) 
MCU="atmega16u2" # look up option for the processor 
SER_PORT="/dev/ttyACM0" # serial port 

echo "** compiling $BASE_FILE_NAME_c" 

if ! [ -e $BASE_FILE_NAME_c ] 
    echo "** File $BASE_FILE_NAME_c does not exist, exiting" 1>&2
    exit 1

avr-gcc $BASE_FILE_NAME_c -Os -DF_CPU=$DF_CPU -o $BASE_FILE_NAME_o -c -mmcu=$MCU

check_compilation $BASE_FILE_NAME_o
avr-gcc $BASE_FILE_NAME_o -o $BASE_FILE_NAME -mmcu=$MCU

# generate hexcode
check_compilation $BASE_FILE_NAME
avr-objcopy -O ihex -R .eeprom $BASE_FILE_NAME $BASE_FILE_NAME_h

# upload over serial
check_compilation $BASE_FILE_NAME_h
avrdude -F -V -c arduino -p ${MCU^^} -P $SER_PORT -b $BAUD_RATE -U \

# cleanup

echo "** finished compilation and upload"
exit 0

And, because it’d be silly to write a tool and not use it, here’s a somewhat unorthodox “Hello world” program.

/* Hello world in morse code */

#include <avr/io.h>
#include <util/delay.h>

#define DOT_DELAY_MS 200
#define DASH_DELAY_MS 500
#define CHAR_SPACE_MS 50
#define SPACE_MS 100

    PORTB |= _BV(PORTB5);   // turn on
    PORTB &= ~_BV(PORTB5);  // turn off



    PORTB |= _BV(PORTB5);
    PORTB &= ~_BV(PORTB5);

    dot(); charspace();
    dot(); charspace();
    dot(); charspace();


    dot();  charspace();
    dash(); charspace();
    dot();  charspace();

    dash(); charspace();
    dash(); charspace();

    dot();  charspace();
    dash(); charspace();
    dash(); charspace();

    dot();  charspace();
    dash(); charspace();

    dash(); charspace();
    dot();  charspace();

    DDRB |= _BV(DDB5);
    do {
    } while (1);

    return 1;

I was thinking of trying to use characters whose bits represented the dots and dashes, but I think the interpretation of these bits may be far more verbose.

Find the correct phrase: Trustpilot

I was inspired by a post on randomgooby about a question posted by Trustpilot, which may be seen here. The author provides a great solution in Go, but I wanted to try it with Python. I’ve honestly not played much with encryption or problem solving of this type, but it was a lot of fun to see my program produce the correct output!

From the outset I applied the author’s grep command to the wordlist because I couldn’t come up with a better regex. As the author mentions, this step is really important, because if you start with the total 99,175 words, the search space will be upwards of 99,1753=975,453,625,984,375 word combinations; however, filtering this list to only 1659 words yields 4,566,034,179 combinations, which is more than within reach of an hour or two of searching if we need to check every last one.

Here’s a straightforward solution in Python in just 30 lines of code.

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

""" Determine the secret phrase for

from hashlib import md5
import sys

def obtainMatchingPhrase(fileName: str ='wordlist_clean') -> None:
    """ Find phrase """

    _ANAGRAM = 'poultry outwits ants'
    _MD5     = '4624d200580677270a54ccff86b9610e'

    with open(fileName, 'r') as words:
        # we need all the words in memory in this example
        rwords = words.read().splitlines()

    total_words = len(rwords)
    ttl = 0

    print('Anagram : ' + _ANAGRAM)
    print('md5     : ' + _MD5)

    # iterate over all possible combinations
    for w1 in rwords:
        # write progress percentage to output
        sys.stdout.write("Progress: %0.4f%%\r" % (100 * ttl / total_words**3) )
        for w2 in rwords:
            for w3 in rwords:
                comb = w1 + ' ' + w2 + ' ' + w3
                comb2 = comb.encode('utf-8')
                hashed = md5(comb2).hexdigest()

                if (hashed == _MD5):
                    print('md5(\'' + comb + '\')' + ' == md5')
                    print('Tested %.4f of total possible combinations' % \
                            (100 * float(ttl) / total_words ** 3))
                    ttl += 1
        print('No phrase found for md5 ■')

if __name__ == '__main__':

And the output upon locating the correct phrase:

Anagram : poultry outwits ants
md5     : 4624d200580677270a54ccff86b9610e
md5('pastils turnout towy') == md5
Tested 27.5991% of total possible combinations

I tried a bunch of other methods, such as enumerate(itertools.product(*([rwords]*3)) to form a single loop, etc., but the iteration speed dropped low enough to add an additional 2 hours to the search time.

In short, it would seem that simpler is better for problems that require a lot of speed. I would also like to point out that I used a return statement to end all three nested loops, as suggested on SO, as opposed to a bunch of continue and break‘s. Because the function type is essentially void, it works quite well in this situation.

I’ll be on the lookout for other problems of this type, it was a lot of fun!

Quick recipe for a bibliography with BibTeX

I like typing math homework in \LaTeX if I have the time, and lately I’ve wanted to find an easier way to cite sources and create a bibliography at the end of a paper, should I use any.

One way I’ve probably used in the past is to just list out all the information as it’s bound to be displayed in the document.

G.A.F. Seber and C. J. Wild.
\textit{Nonlinear Regression}. 
Wiley Series in Probability and Mathematical 
Statistics: Probability and Mathematical Statistics. 
John Wiley & Sons, Inc., New York, 1989.

But this is tedious.

A far quicker and easier way is to use BibTeX, which was hardly straightforward at first (why I’m writing this quick little tutorial for my future self and others).

First, go look look up your text on MathSciNet. We’ll look up the same text I’ve listed above. At the top we have the option to get BibTeX information from the drop down menu:

@book {MR986070,
    AUTHOR = {Seber, G. A. F. and Wild, C. J.},
     TITLE = {Nonlinear regression},
    SERIES = {Wiley Series in Probability and Mathematical Statistics:
              Probability and Mathematical Statistics},
 PUBLISHER = {John Wiley \& Sons, Inc., New York},
      YEAR = {1989},
     PAGES = {xxii+768},
      ISBN = {0-471-61760-1},
   MRCLASS = {62-02 (62J02)},
  MRNUMBER = {986070},
MRREVIEWER = {Roy T. Saint Laurent},
       DOI = {10.1002/0471725315},
       URL = {http://dx.doi.org/10.1002/0471725315},

Now copy/paste this information into a file *.bib (you may name it whatever you’d like). At the bottom of your \TeX document before \end{document} put the following lines:

\bibliographystyle{plain} % adjust styling here
\bibliography{<filename of your *.bib file>} % Don't include .bib ext

There are many different styles you can choose from, I’ve just kept it basic.

Now we recompile using pdflatex and bibtex – I’ve written the following simple bash script to automate the process if you’re in the same directory as the file.

#! /bin/bash
# Compile TeX documents with *.bib sources
# Append : `complete -g '*.tex' compile_latex_bibtex.sh` to your .bashrc
#     for automatic selection of the file ending upon ing

if (( $# != 1 )); then
    echo "Please provide filename." 1>&2
    exit 1

# the file we wish to work with

if ! [ -e $FILE ]; then
    echo "** File $1 requested does not exist." 1>&2
    exit 1

# Compile using latex and bibtex
pdflatex $BASE
bibtex $BASE
pdflatex $BASE
pdflatex $BASE

echo "** Compiled $FILE successfully, removing clutter"

# Remove all the 'clutter' files that get created here

# View the file with evince
if [ -e "$BASE.pdf" ]; then
    evince "$BAE.pdf" &
    echo "File $BASE.pdf does not exist"
    exit 1

exit 0

I think I also saw it in one of the Stack Exchange forums where someone suggested Python, so I wrote the following little script too.

#! /usr/bin/env python3.4
# -*- coding: utf-8 -*-

""" Compile a *.tex file """

import os
import subprocess
import sys
import argparse

class InvalidFileTypeError(Exception): pass

def main(arguments: argparse.Namespace) -> None:

    def process(prefix: str ='pdf') -> None:
            prefix + 'latex', 
            '-aux_directory=' + DIRECTORY, 
            '-output-directory=' + DIRECTORY,
            DIRECTORY + FILE

    # Loop over all files in arguments.inputfile
    for name in arguments.inputfile:
        if not os.path.isfile(name):
            raise FileNotFoundError('** File {0} does not exist'.format(name))
        elif not name.split('.')[-1] == 'tex':
            raise InvalidFileTypeError('** File {0} is not a TeX file'\
        # If there's a path to the file involved...
        DIRECTORY = '/'.join(name.split('/')[:-1]) + '/' \
            if len('/'.join(name.split('/')[:-1])) == len(os.getcwd()) \
            else ''

        # The filename may have multiple .'s
        FILE = ''.join(name[len(DIRECTORY):].split('.')[:-1])

        # Compile
        if arguments.bibtex:
            if len(DIRECTORY) == 0:
                # Don't call -aux_directory or -output-directory
                subprocess.call(['pdflatex', FILE])

                if arguments.bibtex:
                    subprocess.call(['bibtex', FILE])
                    subprocess.call(['pdflatex', FILE])
                    subprocess.call(['pdflatex', FILE])
                    print('File {0}.bib does not exist, normal compile')
                # Ensure all the extra files end up in same directory

                if os.path.isfile(DIRECTORY + FILE + '.bib'):
                    subprocess.call(['bibtex', DIRECTORY + FILE]) 
            if len(DIRECTORY) == 0:
                subprocess.call(['latex', FILE])
                subprocess.call(['pdflatex', FILE])
                subprocess.call(['latex', FILE])

        # Display
        if arguments.display and not arguments.convertdvi:
            if os.path.isfile(DIRECTORY + FILE + '.pdf'):
                    DIRECTORY + FILE + '.pdf',
                raise IOError(
                        'File {0} does not exist'.format(
                            DIRECTORY + FILE + '.pdf'
        elif arguments.display and arguments.convertdvi:
            # Convert .dvi to .ps 
            print('** Converting to dvi and displaying ps')
            if not len(DIRECTORY):
                    'latex', DIRECTORY + FILE
            if os.path.isfile(DIRECTORY + FILE + '.dvi'):
                print('** Displaying using dvips')
                subprocess.call(['dvips', DIRECTORY + FILE + '.dvi', 
                    '-o', DIRECTORY + FILE + '.ps'])
                    DIRECTORY + FILE + '.ps',
                print('** Could not locate file ' + DIRECTORY + FILE + '.dvi')
            # Don't display anything

        if arguments.delete:
            # Delete files that don't have *.tex or *.pdf extension
            files = filter(lambda f: FILE in f, os.listdir(os.getcwd()))
            check = lambda ext: ext == fl.split('.')[-1]

            for fl in files:
                if arguments.convertdvi:
                    if not any(map(check, ['tex', 'pdf', 'ps', 'dvi'])):
                        subprocess.call(['rm', fl])
                        print('** Removed {0}'.format(fl))
                    if not any(map(check, ['tex', 'pdf'])):
                        subprocess.call(['rm', fl])
                        print('** Removed {0}'.format(fl))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='** ' + __doc__)

    parser.add_argument('inputfile', nargs='+', type=str,
      help='input file name')

    parser.add_argument('-bib', '--bibtex', action='store_true',
      help='compile with bibtex as well')

    parser.add_argument('-cdvi', '--convertdvi', action='store_true',
      help='compile document again with latex and convert *.dvi to *.ps')

    parser.add_argument('-d', '--display', action='store_true',
      help='display *.pdf')

    parser.add_argument('-dlt', '--delete', action='store_true',
      help='delete additional files that are created')
    args = parser.parse_args()
    del parser


Because I’m more comfortable with Python than Bash, I tried to make the script pretty flexible. Running either of these produces the results we were expecting on a sample document:


An example compiled source.


Now all you have to do is include `\cite{MR986070}` in the spots you wish to reference [1] (and I wish it were this simple in WordPress too :P). I’m sure there are other nifty tricks and details I’ve overlooked or haven’t come across yet, but I think this is the gist of it.

In the code I uploaded to GitHub for future reference I’ve also included a bunch of if-statements to remove files that clutter the directory, but these are of course optional.

Spacial and OA Convolution in Python

In UAV club last semester I was researching CNNs to perform computer vision tasks on our drone. My thought at the time was that a CNN trained on generated data representing all possible combinations of targets (along with a bunch of other processing to mimic the camera’s resolution limits, lighting stabilization for different times of day and cloud cover, etc.), would be an excellent classifier. Unfortunately, we didn’t have time to put one together before the deadline. However, I’m definitely looking to give it a try in the future.

The goal of this post is to try a bunch of algorithms and techniques to convolve images in a (hopefully) small amount of time.


We’ll start with a convolution class to hold the algorithms. The end goal is to create a plot of array size vs. time for each method.

We’ll also utilize a few libraries, namely NumPy’s FFT, trange from tqdm to watch the algorithms’ progress, the @jit decorator from Numba, and several threading functions, per Chris Keihl’s post on threading (I’ll talk more about this in a bit).

import numpy as np
from numba import jit
from tqdm import trange, tqdm
from numpy.fft import fft2 as FFT, ifft2 as iFFT
from numpy.fft import rfft2 as rFFT, irfft2 as irFFT
from numpy.fft import fftn as FFTN, ifftn as iFFTN
#from multiprocessing import Pool as ThreadPool
from pathos.multiprocessing import ProcessPool
from psutil import cpu_count
from types import FunctionType
from typing import List

class Convolve(object):
    """ contains methods to convolve two images """
    def __init__(self, image_array, kernel):
        self.array = image_array
        self.kernel = kernel

        self.__rangeX_ , self.__rangeY_  = image_array.shape
        self.__rangeKX_, self.__rangeKY_ = kernel.shape

        # to be returned instead of the original
        self.__arr_ = np.zeros(image_array.shape)
        # pad array for convolution
        self.__offsetX_ = self.__rangeKX_ // 2
        self.__offsetY_ = self.__rangeKY_ // 2
        self.array = np.lib.pad(self.array, 
            [(self.__offsetY_, self.__offsetY_),
             (self.__offsetX_, self.__offsetX_)], 

The array is also padded on each side with half the width of the kernel to allow convolution right to the edges. This preserves the dimensions of the final image.


I’ll start with a basic implementation of this algorithm; for those who haven’t written it before, I’ll be very explicit and demonstrate how straightforward the concept is.

Suppose we wish to convolve an array


with a kernel (some also refer to it as a filter)


As performed in the class __init__, we first pad this array with \lfloor\dfrac{\text{col},(\Phi)}{2}\rfloor=1 zeros on the left and right sides, and \lfloor\dfrac{\text{row},(\Phi)}{2}\rfloor=1 zeros on the top and bottom. This method is easiest if the dimensions of \Phi are odd; because we’ve ensured that the dimensions work out nicely, we obtain an array


As I mentioned at the outset, this new array allows the filter to go to the very edge of the original, and further to preserve the original dimensions of the image.

Now for the convolution step. We compute the following, yielding a 5\times 5 array, which is the same size as A (why we padded it with 0’s).



The entry (\tilde{A}\otimes\Phi)_{11}, for example, would be


Many people visualize this method as sliding \Phi along \tilde{A}, performing the matrix dot product, then updating the corresponding entry of A with that value. This product allows us to rewrite these two loops (sums, and borrowing a little of Python’s slicing syntax in the equations) as


but we’ll still need to loop over the kernel at some level to produce it.

The only trouble with this algorithm is the poor time complexity of \mathcal{O}(N^2n^2), where N^2 is similar to the size of A, and n^2 is similar to the size of \Phi. Programmatically, we will need four nested loops to perform convolution in this manner, which makes it a slow process.

I’ve implemented this algorithm in pure Python as the following method, which is, again, pretty slow.

def spaceConv2(self):
    """ normal convolution, O(N^2*n^2). This is usually too slow """

    # this is the O(N^2) part of this algorithm
    for i in range(self.__rangeX_):
        for j in range(self.__rangeY_):
            # Now the O(n^2) portion
            total = 0.0
            for k in range(self.__rangeKX_):
                for t in range(self.__rangeKY_):
                    total += self.kernel[k][t] * self.array[i+k, j+t]

            self.__arr_[i, j] = total

    return self.__arr_

The following additional example is the same as the method above, I’ve just written a nested function to emphasize the dot product I mentioned earlier. In the following graph it’s apparent that this method is slightly slower than that above (probably due to the extra N^2 function calls).

def spaceConvDot2(self):
    """ Exactly the same as the former method, just contains a 
    nested function so the dot product appears more obvious """

    def dot(ind: int, jnd: int) -> float:
        """ perform a simple 'dot product' between the 2 
        dimensional image subsets. """
        total = 0.0

        # This is the O(n^2) part of the algorithm
        for k in range(self.__rangeKX_):
            for t in range(self.__rangeKY_):
                total += self.kernel[k][t] * self.array[k+ind, t+jnd]
        return total
    # this is the O(N^2) part of the algorithm
    for i in range(self.__rangeX_):
        for j in range(self.__rangeY_):
            self.__arr_[i, j] = dot(i, j)
    return self.__arr_

As you can see, there are two outer loops that iterate over the entire original image size, then two inner loops that perform the matrix dot product. As it turns out, these methods are extremely slow.

One way we can speed up the dot function is by using NumPy.

def dotNumpy(subarray: np.ndarray, kernel: np.ndarray) -> float:
    """ Perform the matrix dot product with NumPy """
    return np.sum(subarray * kernel)

We could also compile this function with Numba as follows.

def dotJit(subarray: np.ndarray, kernel: np.ndarray) -> float:
    """ perform a simple 'dot product' between the 2 dimensional image 
    total = 0.0
    for i in xrange(subarray.shape[0]):
        for j in xrange(subarray.shape[1]):
            total += subarray[i][j] * kernel[i][j]
    return total

I found in testing that Numba provides a marked improvement over pure Python and NumPy after the first run, as the graph below demonstrates.


After the first run, Numba out-paced NumPy by about an order of magnitude; and both out-paced pure Python by several orders of magnitude.

The @checkarray decorator has similar functionality to the far more common @accepts decorator – it makes sure the input arrays have the same dimensions.

def checkarrays(f: FunctionType) -> FunctionType:
    """ Similar to the @accepts decorator """
    def new_f(*args, **kwd):
        assert reduce(lambda x, y: x == y, map(np.shape, args))
        return f(*args, **kwd)
    return new_f

Such a decorator is useful while testing the algorithms to ensure that the subset and kernel are the same size prior to attempting the product.

Another method we might use to reduce time complexity by a constant factor is threading, but in my own implementations it’ll cost us the brevity of the former methods.

By dividing the image into an even number of chunks we should be able to spread the computations over multiple cores. For a small number of cores (my laptop only has 2), the following will suffice; unsurprisingly, most of the lines are dedicated to solving for the optimal division. I’ve written divider (below) to solve for the optimal axis over which to partition.

(update Aug 10) It turns out that multiprocessing.dummy is restricted by the GIL, which explains why I wasn’t seeing any performance increase over the regular Numba implementation. I’ve updated the code below (and will continue updating it as I make simplifications and improvements) to use Pathos, which is a great library for parallel computation.

def spaceConvNumbaThreadedOuter2(self):
    """ `Block` threading example """

    def divider(arr_dims: List[List[List[int]]], coreNum: int =1):
        """ Get a bunch of iterable ranges; 
        Example input: [[[0, 24], [15, 25]]]"""
        if (coreNum == 1):
            return arr_dims

        elif (coreNum < 1):
            raise ValueError(\
          'partitioner expected a positive number of cores, got %d'\
                        % coreNum

        elif (coreNum % 2):
            raise ValueError(\
          'partitioner expected an even number of cores, got %d'\
                        % coreNum
        total = []

        # Split each coordinate in arr_dims in _half_
        for arr_dim in arr_dims:
            dY = arr_dim[0][1] - arr_dim[0][0]
            dX = arr_dim[1][1] - arr_dim[1][0]
            if ((coreNum,)*2 > (dY, dX)):
                coreNum = max(dY, dX)
                coreNum -= 1 if (coreNum % 2 and coreNum > 1) else 0

            new_c1, new_c2, = [], []

            if (dY >= dX):
                # Subimage height is greater than its width
                half = dY // 2
                new_c1.append([arr_dim[0][0], arr_dim[0][0] + half])
                new_c2.append([arr_dim[0][0] + half, arr_dim[0][1]])

                # Subimage width is greater than its height
                half = dX // 2
                new_c1.append([arr_dim[1][0], half])

                new_c2.append([arr_dim[1][0] + half, arr_dim[1][1]])

            total.append(new_c1), total.append(new_c2)

        # If the number of cores is 1, we get back the total; Else,
        # we split each in total, etc.; it's turtles all the way down
        return divider(total, coreNum // 2)

    def dotJit(subarray: np.ndarray, kernel: np.ndarray) -> float:
        total = 0.0
        for i in xrange(subarray.shape[0]):
            for j in xrange(subarray.shape[1]):
                total += subarray[i][j] * kernel[i][j]
        return total

    def outer(subset: tuple) -> Tuple[np.ndarray, ]:
        a, b, = subset
        ai, bi, = map(sub, *reversed(zip(*subset)))
        temp = np.zeros((ai, bi))

        for ind, i in enumerate(range(*a)):
            for jnd, j in enumerate(range(*b)):
                temp[ind, jnd] = dotJit(\
                    , self.kernel
        return temp, a, b

    # ProcessPool auto-detects processors, but my function above
    # only accepts an even number; I'm still working on it.
    cores = cpu_count()
    cores -= 1 if (cores % 2 == 1 and cores > 1) else 0

    # Get partitioning indices and the usable number of cores
    shape = [[[0, self.__rangeX_ - 1], [0, self.__rangeY_ - 1]]]
    partitions = divider(shape, cores)
    # Map partitions to threads and process
    pool = ProcessPool(nodes=cores)
    results = pool.map(outer, partitions)
    for ind, res in enumerate(results):
        X, Y, = results[ind][1:]
        self.__arr_[slice(*X), slice(*Y)] += results[ind][0]

    return self.__arr_

An example convolution with a Gaussian filter (which has a blurring effect) is below. The image array is 1000×596=596,000px, or 0.596Mp (so not too large), and it takes less than 8 seconds to process on my laptop with Numba. Without Numba or NumPy it takes almost 10 minutes!


An example image convolved with Numba and a 17×17 Gaussian kernel.

In the following graph we see the impact of increasing kernel scale on time for each method.


Numba allows an almost dependable amount of time per iteration, with NumPy being an almost constant half order of magnitude slower; threading Numba with Pathos on subsets of the original image makes the process even faster. The pure Python implementations are also almost unusable given their dependence upon small kernel size.

As suggested by the complexity, the amount of time to complete each iteration increases roughly quadratically as the kernel size increases, which is far easier to see in the pure Python implementations of spaceConv.


Because our goal is to convolve every pixel, I don’t think we can get below the N^2-part of this complexity. If you’re aware of a method that does, please feel free to let me know. This complexity is similar to that of the RX algorithm, because I think I was able to cut down on the overhead cost without too much sway in the resultant covariance matrix; however, since we apply the Mahalanobis metric to every single pixel, there’s a lower bound of complexity \Omega(N^2). (There are other very interesting posts online that have remarkable visuals and content, e.g. this one on blog.bogatron.net.)

In convolution the application of a filter to a subset of the array is an ‘overhead cost’. As you may have supposed from the title, we can reduce the time complexity of this inner-loop with the Convolution Theorem (CT). In a nutshell, the CT says that, for image functions I_{1}, I_{2},

I_{1}\otimes I_{2}=\mathscr{F}^{-1}[\mathscr{F}[I_1\otimes I_2]]


In words, the Fourier transform \mathscr{F} of I_{1} convolved with I_{2} is equal to the product of the Fourier transform of either image. There are a lot of extra details that come with discrete I_{1}, I_{2}, but the underlying concept is the same. From NumPy we imported the Fast Fourier Transform (FFT) and inverse Fast Fourier Transform (iFFT) to transform image arrays, and we may multiply arrays together as we would two numbers.

The overlap and add convolution algorithm, referred to as OaAconv in [1], uses the FFT to simplify the convolution process to a Hadamard product in the frequency domain, reducing the complexity to \mathcal{O}(N^2\log(n)). An additional overhead cost is incurred by transforming each image subset, then the inverse after the entry-wise product. While I’ve found resources that give one dimensional examples and figures in a number of places, very few have explained the two dimensional case in detail, including [1]. After a little research I came across this description, which clearly outlines the algorithm’s steps.

We first take the array A and pad it on each side with enough 0’s so that \text{row}\,(\Phi)\,|\,\text{row}\,(\tilde{A}) and \text{col}(\Phi)\,|\,\text{col}\,(\tilde{A}). Total padding along the x– and y-axes is given by

X_{\text{pad}}=\left(\text{row}\,(\Phi) - \text{row}\,(A) + \text{row}\,(\Phi)\lfloor\dfrac{\text{row}\,(A)}{\text{row}\,(\Phi)}\rfloor\right)\!\!\!\!\mod\text{row}\,(\Phi),

and Y_{\text{pad}}=\left(\text{col}\,(\Phi) - \text{col}\,(A) + \text{col}\,(\Phi)\lfloor\dfrac{\text{col}\,(A)}{\text{col}\,(\Phi)}\rfloor\right)\!\!\!\!\mod\text{col}\,(\Phi),

respectively. Using A above, we see visually that we would have to pad it with a single border of 0‘s all the way around. The equations above yield

X_{\text{pad}}=3-4+3\times\lfloor 4/3 \rfloor\!\!\!\mod 3


Y_{\text{pad}}=3-4+3\times\lfloor 4/3 \rfloor\!\!\!\mod 3


Then splitting both of these numbers in half, we obtain

X_{\text{pad left}}=X_{\text{pad}}//2=1,

X_{\text{pad right}}=X_{\text{pad}}-X_{\text{pad left}}=1,

X_{\text{pad top}}=X_{\text{pad}}//2=1,

X_{\text{pad bottom}}=X_{\text{pad}}-X_{\text{pad top}}=1,

which is the same as the single border of 0‘s we were expecting! Thus A becomes, again,


and I’ve colored the four sub-arrays that result from partitioning \tilde{A} into blocks the size of the kernel, \Phi. We’ll work with the upper two blocks from here on out.

The next step in this algorithm is to pad the arrays and kernel until both are twice the size of the kernel minus one in either direction. In this case that amounts to an additional border of 0‘s (since 2\,\text{col}\,(\Phi)-1=2\,\text{row}\,(\Phi)-1=5), yielding the arrays


and a kernel


Because I’d rather not analyze this portion symbolically (it’d be a rather confusing mess of work), I’ll simply state that we now compute the FFT of all three arrays; luckily, we do not have to transform the padded kernel more than once. Then multiply them together entry-wise and take the iFFT.


Because there is always some negligible residual imaginary component left over from the iFFT, I’ve taken the real part. Now to the algorithm in Python 😛

def InvertKernel2(kernel):
    """ Invert a kernel for an example """
    X, Y = kernel.shape
    # thanks to http://stackoverflow.com/a/38384551/3928184!
    new_kernel = np.full_like(kernel, 0)

    for i in xrange(X):
        for j in xrange(Y):
            n_i = (i + X // 2) % X
            n_j = (j + Y // 2) % Y
            new_kernel[n_i, n_j] = kernel[i, j]

    return new_kernel

def OAconv2(self):
    """ A threaded version of the former algorithm """
    self.__rangePX_, self.__rangePY_ = self.array.shape

    diffX = (self.__rangeKX_ - self.__rangePX_ +  \
             self.__rangeKX_ * (self.__rangePX_ //\
             self.__rangeKX_)) % self.__rangeKX_
    diffY = (self.__rangeKY_ - self.__rangePY_ +  \
             self.__rangeKY_ * (self.__rangePY_ //\
             self.__rangeKY_)) % self.__rangeKY_

    # padding on each side, i.e. left, right, top and bottom; 
    # centered as well as possible
    right = diffX // 2
    left = diffX - right
    bottom = diffY // 2
    top = diffY - bottom

    # pad the array
    self.array = np.lib.pad(self.array
        , ((left, right), (top, bottom))
        , mode='constant'
        , constant_values=0
    print self.__arr_.shape

    divX = int(self.array.shape[0] // self.__rangeKX_)
    divY = int(self.array.shape[1] // self.__rangeKY_)

    # a list of tuples to partition the array by
    subsets = [(i*self.__rangeKX_, (i + 1)*self.__rangeKX_,\
                j*self.__rangeKY_, (j + 1)*self.__rangeKY_)\
               for i in xrange(divX)\
               for j in xrange(divY)]

    # padding for individual blocks in the subsets list
    padX = self.__rangeKX_ // 2
    padY = self.__rangeKY_ // 2

    # Add. padding for __arr_ so it can store the results
    self.__arr_ = np.lib.pad(self.__arr_
        , ((left+padX+self.__offsetX_, right+padX+self.__offsetX_), 
           (top+padY+self.__offsetY_, bottom+padY+self.__offsetY_))
        , mode='constant', constant_values=0

    kernel = np.pad(self.kernel
        , [(padX, padX), (padY, padY)]
        , mode='constant'
        , constant_values=0

    # thanks to http://stackoverflow.com/a/38384551/3928184!
    # Invert the kernel
    new_kernel = self.InvertKernel2(kernel)
    transf_kernel = FFT(new_kernel)

    # transform each partition and OA on conv_image
    for tup in tqdm(subsets):
        # slice and pad the array subset
        subset = self.array[tup[0]:tup[1], tup[2]:tup[3]]
        subset = np.lib.pad(subset
            , [(padX, padX), (padY, padY)]
            , mode='constant'
            , constant_values=0

        transf_subset = FFT(subset)

        # multiply the two arrays entrywise
        space = iFFT(transf_subset * transf_kernel).real
        # overlap with indices and add them together
        self.__arr_[tup[0]:tup[1] + 2 * padX,\
                    tup[2]:tup[3] + 2 * padY] += space

    # crop image and get it back, convolved
    return self.__arr_[

I would like to split this process into multiple threads eventually, but due to the unexpected results with Numba above, I think I’ll wait until I’m able to resolve them. By superposition in the OA portion of this algorithm, it should be very straightforward.

This method’s time complexity is thrown off a little because NumPy isn’t written completely in Python. Also, fewer iterations are necessary as the kernels get larger, which makes it very fast for larger kernels.


The shape resembles 1/n^2; far more operations are performed inherently by NumPy than Python as the kernels get larger, and fewer iterations are necessary to cover the padded image array, as I’ve already mentioned.


An example OA convolved image with an 18×18 kernel.

I would also like to thank the user ead on SO for helping me troubleshoot a problem that stalled progress for several weeks at the beginning of the summer.

For a comparison with this method I’ll try convolving with scipy.ndimage.filters.convolve and scipy.signal.fftconvolve.


A comparison of various SciPy convolution algorithms against my Python implementation of OA convolution. Both fftconvolve and convolve are wicked fast!


[1] Very Efficient Training of Convolutional Neural Networks using Fast Fourier Transform and Overlap-and-Add, by T. Hylander and A. Rodriguez

Fitting PSF Models to Stellar Distributions in FITS Data

As an end-of-semester project I would like to investigate the mathematics of Gaussian and Moffat distributions, which are used to model stars in astronomical data. My inspiration for writing this post came from reading [1] and [5], where many interesting analytics are presented.

Specifically, in this post we’ll generate example PSFs and sources therefrom, extract stellar sources from images taken by the SDSS, and fit Gaussian and Moffat functions to each using a linear gradient descent (GD) algorithm.

The Gaussian Distribution

We will consider the two dimensional case of the bivariate normal distribution since image arrays are usually two dimensional arrays of pixels.


where the covariance matrix \Sigma will be defined as


so that


and thus \arg\exp\left(d_{M}^2\left(X,\mu\right)\right) becomes



Therefore, we arrive at a simplified distribution


which is basically a fixed-axis elliptical normal distribution. Furthermore, because the property is rather elegant, we see that



and by parameterizing R\begin{bmatrix}\cos\left(\theta\right)\\\sin\left(\theta\right)\end{bmatrix}+\mu\mapsto X, we obtain



Thence the former result is



and by yet another substitution one may simplify this integral drastically to


If \sigma_{1}=\sigma_{2}=1 (to obtain the “standard normal distribution”), then the volume of g_{2,c}=1. This is a widely understood identity that easily generalizes to higher dimensions (e.g. see [4] in my last post). On another mathematical note, in reference to [6], I might suspect the multivariate normal distribution to aid in determining the value of \Gamma_{m}\left(\frac{1}{2}\right), the multivariate gamma function [I will have to investigate this in a future post].

Because we will only be considering the circular Moffat function in the next section, here we will restrict this function even further by setting \sigma_{1}=\sigma_{2}=\sigma>0. This also means that


where I_{2} is the square identity matrix. If you’ll recall from statistics, when \Sigma=I we just obtain the Euclidean metric from d_{M}, so the entire equation simplifies to


where ||\cdot|| denotes the Euclidean norm.

Therefore, the only axes we must solve for in the gradient descent algorithm are translation, i.e. the centroid of a source, and the variance (which is related to the stability of the atmosphere). Feature scaling of the dataset will also be important, because the ADUs won’t be in the range \big[0,1\big].

The Moffat Distribution

The authors of [1] present the Gaussian as a limiting case of the more general Moffat distribution, introduced by none other than A. Moffat in 1969 [2]. The circular case is defined as


The authors, again, give a very interesting derivation in appendix A which fortifies their claim that


I would like to retrace their steps here. But first, as with the former distribution, we shall determine (or simply verify, as it turns out) the conditions for which the volume of m_{2} is unity. We see that



and by the same substitution we used last time, we may simplify this integral to






which works out quite nicely. Continuing on our plight to unravel their work in appendix A, we solve for the full width at half maximum (FWHM) of m_{2}, which may be visualized as the width of the fitted PSF halfway along its magnitude. Thus

\dfrac{1}{2}=\dfrac{\pi\alpha^2\,m_{2}(\,\vec{0},\mu;\alpha,\beta\,)}{\beta-1} (normalized amplitude)



\implies ||\mu||=\alpha\sqrt{2^{1/\beta}-1},


The extra factor of 2 came from the fact that we’re trying to find the width, whereas the penultimate equation yields the radius (so we multiply it by 2). Next, we eliminate \alpha in m_{2} by writing it in terms of \text{FWHM}_{m}, since


For this I obtain the expression


Now we can make the case that since


it would be perfectly fine to substitute \log\left(2\right)/\beta\mapsto 2^{1/\beta}-1 as


Hence, altogether we have


At this point it may appear ridiculously complex. But if you’ll recall the classic identity

\displaystyle\lim_{x\rightarrow\infty}\left(1+\dfrac{1}{x}\right)^{x}=e (c.f. [4])

we are able to obtain the exponential from the limit


Look familiar yet? Well, we’re off by a few constants; letting \text{FWHM}_{m}^{2}=8\sigma^2\log\left(2\right), as the authors suggest,



In the next section we will write scripts that ‘shift and shimmy’ the m_{2} and g_{2,c} distributions about until a reasonable convergence is obtained by the GD algorithm; moreover, we will compare the fits for both the Gaussian and Moffat distributions. We shall also test the authors’ claim that setting \beta\sim 100 in m_{2} should yield similar results to g_{2,c} (I don’t have any doubts).

Gradient Descent

The gradient descent algorithm is ubiquitous in machine learning. It was originally devised as a first order method to arrive at the local optima of a multivariate function by Isaac Newton through iteration. We will utilize this tool to optimize the parameters of m_{2} and g_{2,c} for various real-world examples of stellar distributions in astronomical data.

The first order algorithm may be written recursively as


where in this case, \text{cost} is defined as

\text{cost}(\vec{\theta}_{\kappa}; h,z)=\dfrac{1}{2P}\displaystyle\sum_{i=1}^{P}(h^{i}(\vec{\theta}_{\kappa})-z^{i})^{2}


(e.g. see 173 of [4]), and P=\prod_{j=1}^{\dim I_{s}}\text{NAXIS}_{j} is the total size of the image subset containing a star. We will be substituting g_{2,c} and m_{2} for h, and their parameter vectors \vec{\theta} will be \begin{bmatrix}A & \sigma & \mu_{1} & \mu_{2}\end{bmatrix}^{\text{T}} and \begin{bmatrix}A & \alpha & \beta & \mu_{1} & \mu_{2}\end{bmatrix}^{\text{T}}, respectively. I have added A to scale either function, which is necessary since scaling the data to [0,1] does not mitigate the possibility that the amplitude of the best fit is slightly greater than unity. Moreover,


so we may simplify the necessary computations slightly.

\eta_{\kappa} is the “learning rate” of the algorithm. In the past I’ve just set this to a constant value, but I’ve found that, in practice, a constant vector works a little better when one parameter may depend on others. This requires a little tuning to arrive at a decent (though not quite optimal) rate along each parameter. In the future I hope to implement a simple method to allow the algorithm to tune its own learning rates (I’m still reading about this in the literature).

To begin, let’s compute the 7 partial derivatives. I arrive at




for g_{2,c}, and




and \partial_{\mu_{i}}\,m_{2}=\dfrac{2\,m_{2}\,\beta\left(x_{i}-\mu_{i}\right)}{\alpha^{2}+||X-\mu||^{2}},

for m_{2}. I’ve written m_{2} and g_{2,c} in Python as follows.

def m2(X, theta):
    """ Moffat distribution """
    return theta[0]*(theta[2]-1.0)/(np.pi*theta[1]**2.0)*(1.0\

def g2(X, theta):
    """ Gaussian Distribution """
    return theta[0]/(2.0*np.pi*theta[1]**2.0)*np.exp(-((X[0]-\

As per the authors’ earlier suggestion, we might expect

\displaystyle\int_{\mathbb{R}^{n}}(m_{n}(X\,\mu;\alpha,100)-g_{n,c}(X;\mu,\Sigma))\,\mathrm{d}X\approx 0.

Using dimension-independent versions of the previous two functions (which only take an additional line of code each), I have computed the following results.

def mn(X, theta):
    """ n-dimensional Moffat distribution """
    return theta[0]*(theta[2] - 1.0)/(np.pi*theta[1]**2.0)\
        *(1.0 + sum((X[i] - theta[3:][i])**2.0 for i in \

def gn(X, theta):
    """ n-dimensional Gaussian distribution """
    return theta[0]/(((2.0*np.pi)**(len(X) / 2.0))*(theta[1]**\
        len(X)))*np.exp(-(1.0 / (2.0*(theta[1]**2.0)))\
        *sum((X[i]-theta[2:][i])**2.0 for i in xrange(len(X))))

I found that we would be adding O(n^d) time for dimension d
(viewing the integrals as nested loops), so I only attempted n≤4
as that seemed to be the max my laptop could handle in a reasonable
amount of time (about an hour).

>>> subset = [-3.0, 3.0]
>>> integrate.nquad(lambda x, y, z, t: gn([x, y, z, t], 
[1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) - mn([x, y, z, t], 
[1.0, 1.0, 100.0, 0.0, 0.0, 0.0, 0.0]), 
[subset, subset, subset, subset])[0]

>>> integrate.nquad(lambda x, y, z: gn([x, y, z], 
[1.0, 1.0, 0.0, 0.0, 0.0]) - mn([x, y, z], 
[1.0, 1.0, 100.0, 0.0, 0.0, 0.0]), 
[subset, subset, subset])[0]

>>> integrate.nquad(lambda x, y: gn([x, y], 
[1.0, 1.0, 0.0, 0.0]) - mn([x, y], 
[1.0, 1.0, 100.0, 0.0, 0.0]), 
[subset, subset])[0]

I think the reason these don’t quite match the prediction is due to roundoff error, or because \beta=100 isn’t quite far enough; if you think I have actually made a mistake here, feel free to let me know in the comments. Because it isn’t necessary to write this program to be dimensionally independent (as much as I would enjoy the challenge), we’ll stick to the previous two functions.

The seven partial derivatives I’ve listed above may be written as follows.

from math import log
import numpy as np
from numpy.linalg import norm

### Gaussian

def PartialAg2(X, theta):
    return g2(X, theta) / theta[0]

def PartialSigmaXg2(X, theta):
    return g2(X, theta)*((norm(X-theta[2:])**2.0)/(theta[1]**3.0)-2.0/

def PartialMuXg2(X, theta, NAXIS=0):
    return (g2(X, theta)*(X[NAXIS]-theta[NAXIS + 2])) / (theta[1]**2.0)

### Moffat

def PartialAm2(X, theta):
    return m2(X, theta) / theta[0]

def PartialAlpham2(X, theta):
    return (2.0*m2(X,theta)/theta[1])*
        (theta[1]**2.0 + norm(X - theta[3:])**2.0) - 1.0)

def PartialBetam2(X, theta):
    return m2(X, theta)*(log((theta[1]**2.0) / 
        (theta[1]**2.0+norm(X-theta[3:])**2.0))+1.0 / (theta[2]-1.0))

def PartialMuXm2(X, theta, NAXIS=0):
    return (2.0*m2(X, theta)*theta[2]*(X[NAXIS]-theta[3+NAXIS]))/
        (theta[1]**2.0 + norm(X - theta[3:])**2.0)

### Also include the Gaussian and Moffat functions above

We may now piece together a class with methods for performing linear GD.

from kernel import *
from numpy.linalg import norm
from numpy.random import randn
import partials
import numpy as np
import sys

class GD(object):
    def __init__(self, array, form, derivatives, max_iterations=1000, 

        if (type(array).__module__ != np.__name__):
            # handle lists as well
            self.maximum = np.array(array).max()
            self.minimum = np.array(array).min()
            self.maximum = array.max()
            self.minimum = array.min()

        if (self.maximum != 1.0 or self.minimum != 0.0):
            # Map range of image array → [0, 1]
            self.array = np.array(map(lambda arr: 
                (arr-self.minimum)/(self.maximum-self.minimum), array))
            # probably not common in practice...
            self.array = np.array(array)

        if (self.dim(self.array) > 2):
            raise ValueError('Array must be dimension 2.')

        self.rangeX = self.array.shape[0]
        self.rangeY = self.array.shape[1]
        self.form = form
        self.max_iterations = max_iterations
        self.limit = limit
        self.derivs = derivatives

    def dim(arr):
        if (type(arr).__module__ != np.__name__):
            return len(np.array(arr).shape)
            return len(arr.shape)

    def expected(arr):
        if (type(arr).__module__ != np.__name__):
            return np.average(arr)
            return np.average(np.array(arr))

    def linear(self):
        """ perform linear GD using all derivatives """
        self.param_list = []

        # randomly generate initial parameter vector
        theta = np.abs(randn(len(self.derivs) + 1)) + 
            (((self.rangeX + self.rangeY) // 2) - 1) // 2
        theta[0] = 1.0

        # perform GD using all partial derivatives
        for _ in xrange(self.max_iterations):
            cost = self.__cost_(theta)
            theta += self.__gamma_(1.0)*cost


            #print theta

            if (not (0 < theta[-2] < self.rangeX) or 
                not (0 < theta[-1] < self.rangeY)):
            if (norm(cost) < self.limit):
            if (norm(cost) > 10e3):
                print "Values diverged."

        return theta, np.array(self.param_list)

    def __gamma_(self, magnitude=1.0):
        """ estimate the value of γ """
        if (len(self.derivs) + 1 == 4):
            # tuned for a decent Gaussian GD
            return -magnitude*np.array([1e3, 25.0, 2e2, 2e2])
        elif (len(self.derivs) + 1 == 5):
            # tuned for a decent Moffat GD
            return -magnitude*np.array([1e2, 1e1, 1e1, 1e1, 1e1])

    def __cost_(self, theta):
        """ compute the cost, Ω(mN^2) time. """
        total_cost = np.zeros(len(self.derivs) + 1)

        for i in xrange(self.rangeX):
            for j in xrange(self.rangeY):
                # get difference between model and array
                cost = self.form([i, j], theta) - self.array[i][j]

                # set up nabla
                nabla = np.zeros(len(self.derivs) + 1)

                # compute partials at each pixel
                for k in range(len(self.derivs) - 1):
                    nabla[k] = self.derivs[k]([i, j], theta)
                    # compute the last derivative twice, once for each
                    # axis. This may be collapsed into one line
                    # with a slight modification to the partial
                    # functions.
                    nabla[-1] = self.derivs[-1]([i, j], theta, NAXIS=1)
                    nabla[-2] = self.derivs[-1]([i, j], theta, NAXIS=0)

                # scale nabla by the difference
                total_cost += cost*nabla
            return total_cost / (self.rangeX*self.rangeY)

Most of the work in this class is done in the cost method, which builds \vec{\nabla}\text{cost} by iterating over the entire image subset and computing the partials at each pixel. Also note that I’ve written the code to work with lists as input too, so the class is a bit more flexible.

In the event that you don’t want to print out theta on every iteration (which fills the terminal), the following quick lines (or some variation thereof) may be substituted for that which is highlighted above.

if (len(self.derivs) + 1 == 4):
    sys.stdout.write('%ft%ft%ft%fr' % (theta[0], theta[1], 
                                           theta[2], theta[3]))
elif (len(self.derivs) + 1 == 5):
    sys.stdout.write('%ft%ft%ft%ft%fr' % 
                                          (theta[0], theta[1], 
                                           theta[2], theta[3], 

Now before we begin testing this algorithm on real data, let’s test it on a controlled set of examples with and without additive Gaussian noise. To generate these examples I have re-purposed the following class I wrote for UAV club (to convolve images).

import scipy.integrate as integrate
import numpy as np
import partials

class Kernel(object):
    """ creates kernels """
    def __init__(self, random_noise=False, noise_level=0.001):
        self.random_noise = random_noise
        if (random_noise):
            self.noise_level = noise_level

    def Kg2(self, X, Y, sigma, muX=0.0, muY=0.0):
        """ return a 2 dimensional Gaussian kernel """
        kernel = np.zeros([X, Y])
        theta = [1.0, sigma, muX, muY]
        for i in xrange(X):
            for j in xrange(Y):
                kernel[i][j] = integrate.dblquad(lambda x, y: 
                     partials.g2([x + float(i) - (X-1.0)/2.0, 
                         y + float(j) - (Y-1.0)/2.0], theta), 
                   -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]

        if (self.random_noise):
            kernel = kernel + 
                np.abs(np.random.randn(X, Y))*self.noise_level
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            print maximum, minimum
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)

    def FWHMg2(theta):
        """ get the FWHM of the Gaussian distribution """
        return 2*theta[3]*sqrt(log(1.0 / 

    def Km2(self, X, Y, alpha, beta, muX=0.0, muY=0.0):
        """ return a 2 dimensional Moffat kernel """
        kernel = np.zeros([X, Y])
        theta = [1.0, alpha, beta, muX, muY]
        for i in xrange(X):
            for j in xrange(Y):
                kernel[i][j] = integrate.dblquad(lambda x, y: 
                      partials.m2([x+ float(i) - (X-1.0)/2.0, 
                         y + float(j) - (Y-1.0)/2.0], theta), 
                   -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]

        if (self.random_noise):
            kernel = kernel + 
                np.abs(np.random.randn(X, Y))*self.noise_level
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)

    def FWHMm2(theta):
        """ get the FWHM of the Moffat distribution """
        return 2*theta[0]*sqrt(pow(2, 1 / theta[1]) - 1.0)

Now that we know what the accepted values should be, we can truly test the results of the algorithm. Below is a sample Moffat distribution, generated by the above script.


An example Moffat distribution with additive noise N=0.005, and parameters \alpha=3.5, \beta=4.5.

And another without background noise for visual comparison to that above.


Moffat without additive noise, same parameters as the last.

We may create example PSFs as follows.

>>> from kernel import *
>>> PSF = Kernel(random_noise=False, noise_level=0.01)
>>> example = PSF.Kg2(15, 15, sigma=1.75, muX=0.0, muY = 0.0)

And perform a fit:

>>> import partials
>>> from GD import *
>>> fit = GD(image1
...     , form=partials.g2 
...     , derivatives=[partials.PartialAg2
...         , partials.PartialSigmaXg2
...         , partials.PartialMuXg2]
...     , max_iterations=1000
...     , limit=1e-5
... )
>>> theta, progress = fit.linear()

I could have written the code so the partials are inherent to the class, but I thought it better for the future (in case we wanted to try a different dirstribution) to just submit each partial, as well as the related function. Also note that I’ve passed partials.PartialMuXg2 last – for this class it’s important that the \partial_{\mu_{i}}‘s be in the last position.

I get a convergence for g_{2,c} in ~500 iterations using the above pre-defined rate vector, and ~1200 for m_{2}. Again, these rates aren’t optimal, so it doesn’t make sense to compare. Below are graphs showing the convergence of either function to a generated g_{2,c} with parameters \sigma=1.75, \mu_{1}=7, \mu_{2}=7 (and of course it’s rescaled in kernel.py to [0,1]).


Convergence of the Moffat function to a Gaussian profile. What’s interesting is that \beta (red) converges to a relatively high value, so the function is closer in shape to a Gaussian. In this plot Blue=A, Red=\beta, Purple, Cyan=x_{1}, x_{2} and Green=\alpha.


Convergence of the Gaussian function to a Gaussian profile. Because I sought to initialize \mu_{1} and \mu_{2} (two indistinguishable lines about 7.0) as close to center as possible, they converge extremely fast, whereas A (blue) and \sigma (green) take a few more iterations.

Now for a bunch of noise to see how well the algorithm copes. The centroid shouldn’t vary much, but I’m expecting A, \sigma, \alpha and \beta to change quite a bit.


I didn’t get a decent convergence from m_{2} inside of 5000 iterations. Setting the rates faster caused a bit of chaos (e.g. see the last section). I’m assuming that a larger array may have been necessary, as in the fits with SDSS data (below), which worked fine.


Convergence of g_{2,c} occurred in about the same number of iterations, probably indicative that it’s a more robust model.

Now that we have definitive results showing that the algorithm has the ability to converge to the approximate expected values, we’ll try it on real data.

Working with FITS Images and Testing on Real Data

We’ll try fitting these two models to single stars in the latest SDSS3 data release, DR9. You can try this on any data you like, however.

Specifically, I have downloaded frame 15 from the G-band. If you would like to use these same data, you may download and uncompress the image as follows.

wget http://data.sdss3.org/sas/dr9/boss/photoObj/frames/301/8158/6/frame-g-008158-6-0015.fits.bz2
bzip2 -dk frame-g-008158-6-0015.fits.bz2

We can open these images with PyFITS in the following function.

import pyfits

def get_fits_array(image_name, axis=0, info=False):
    """ Get the data from an image along a specified axis. """
        image = pyfits.open(image_name, mode='readonly')
        if (info):
            print image.info()
        return image[axis].data

This returns a NumPy array containing the image data. Extracting stars and objects from the frame is a huge statistical problem in itself, but here I’m just going to take the array and crop it nicely about a star that I’ve physically located. By the way, if you’re not already familiar, I’ve found that SExtractor is extremely simple to use for this kind of work too.

>>> image = get_fits_array('frame-g-008158-6-0015.fits')
>>> image = image[1046:1086,530:570]

We can also plot this image and view it as follows.

# be careful logging - there may be -'s or 0's
image[np.where(image < 0.0)] = 0.01
plt.imshow(np.log10(image), interpolation='none', cmap='gray')



A single star in the g-band image. I’ve purposely made it off-center as well.

The \text{cost} function required quite a bit of time to compute in either case because the subset above is now 1600 pixels instead of a mere 225.


The values the algorithm arrived at are A=27.06944038, \sigma=2.11533306, x_{1}=20.71500907, and x_{2}=20.69633224.

The Moffat function didn’t take nearly as long to converge as I was expecting, and its parameters do some interesting things too. The only bottleneck is the extra partial derivative it has to compute at each pixel, which becomes more noticeable as the array becomes larger (not to mention the fact that its derivatives are a bit more complex).


GD with m_{2} arrived at A=27.28057256, \alpha=13.05963548, \beta=20.17842481, 20.71206149, 20.69690351.

Parameters such as A, x_{1} and x_{2} compare favorably with those of the Gaussian. It also appears that the Moffat function becomes far closer to normal during the descent, before very slowly decreasing \beta again. Altogether I do not think this is a good method for solving for \beta as it’s a “weaker” variable, only having an impact after the distribution is centered and A is closer to its optimal value.


I mentioned above that I would like to implement a method that determines a decent “learning rate” for each parameter. It’s rather easy to drive the parameters too fast, causing unstable results like that below.


GD that’s driven too hard makes for chaotic results.

I’m also interested in pursuing one of the many second order GD methods, but again I am still reading the literature and working out the details (especially the computation of the Hessian).

If you can think of any ways to improve this code feel free to let me know in the comments!


[1] The effects of seeing on Seeing on Sérsic profiles – II. The Moffat PSF, by Trujillo et. al. (2001)

[2] A Theoretical Investigation of Focal Stellar Images in the Photographic Emulsion and Application to Photographic Photometry, by A. Moffat (1969)

[3] e as the limit of \left(1+\frac{1}{n}\right)^{n}, by D. Joyce (click here for the .pdf)

[4] Stochastic Optimization, by J. Spall, Handbook of Computational Statistics p. 173

[5] Common-Resolution Convolution Kernels for Space- and Ground-Based Telescopes, by G. Aniano et. al.

[6] Advanced Calculus, D. Widder (1898), p. 371

The RX Algorithm

Over the past few months I have been writing programs to perform basic computer vision tasks for a drone we built in UAV club. To be blunt, I learned a lot – both about what I’m capable of, and what it means to work on a small team.

Before I explain the algorithm and share the code I wrote, I’ll outline our goals. They were to

  1. obtain usable imagery despite any vibrations and attitude variations of the quadcopter,
  2. search a complete frame for anomalies (colored targets),
  3. and classify any anomalies.

My teammate worked a lot on algorithms and methods related to #3, while I was slightly more interested in #’s 1 & 2. I saw the goals as hierarchical; namely that a) we probably cannot classify anomalies in the data if we have no means to locate them in entire frames, and b) we cannot search the data for anomalies if the data are not clean and unblurred.

In this post I will talk briefly about #2.

After a lot of research I eventually found several papers on anomaly detection, including [1], [2] and [3]. I also went to the library and found a book on color science – a field for which I now have the utmost of respect, because the mathematics involved with color matching and other concepts is quite elegant. Moreover, in each of aforementioned papers the authors cover the RX algorithm, so I thought I’d give it a try.

Consider the multivariate Gaussian distribution

g_{n}\left(X; \mu,\Sigma\right)=\dfrac{1}{\left(2\pi\right)^{n/2}|\Sigma|^{1/2}}\exp\left[-\dfrac{1}{2}\left(X-\mu\right)^{\text{T}}\Sigma^{-1}\left(X-\mu\right)\right],

where |\cdot| denotes the determinant and \Sigma is the covariance matrix. There are a lot of interesting properties of this function that you may read about in [4]. For the RX algorithm we are, for the most part, only interested in the argument of the exponential function


known as the Mahalanobis distance. It is basically a measure of how many standard deviations X, a spectral vector in the RGB tristimulus space, lies from the mean spectral vector mu. For the RX algorithm we hope to compute this distance metric for all pixels in an image (which we may visualize as vectors in the space \text{RGB}=\left[0,255\right]^{3}).

This operation’s complexity is similar to that of convolution (it will be bound below by \Omega\left(N^2\right) at least), but there are a few steps we may take to reduce the complexity of the problem. For instance, to compute the covariance matrix \Sigma for each image it’s not necessary to use every single pixel. In [3] the authors state that \Sigma “is the estimated background covariance matrix.” By extracting a random sample of the image pixels (in practice I found that only ~10^4 worked well, but as few as 10^3 yielded reasonable results) this matrix can be estimated quickly.

See the below figure for an actual comparison (the Frobenius norm of an average of 100 runs for each extraction sample size, ranging from 10^{2}10^{5}).


This process assumes that the background (the ground) is the dominant feature in a given image, and from a high vantage point this is likely the case.

Yet another simplification we can make is from the properties of the matrix \Sigma. It is commonly defined as



since \Sigma is symmetric. We may use this definition so that we only have to compute p\left(p+1\right)/2 values instead of every entry, where p is the dimension of the original image.

Now to compute d_{M} I’ve written the following class, which utilizes many vector and matrix operations available in NumPy.

from math import sqrt
import numpy as np

class dM(object):
    def __init__(self, array):
        self.array = array
        self.Xpix = array.shape[0]
        self.Ypix = array.shape[1]

    def Mahalanobis(self):
        """ Compute the Mahalanobis distance to every point in RGB
        space within the image """
        mean_vector = np.mean(self.array, axis=(0, 1))
        variance_covariance = self.variance_covariance()

        # iterate over the original image and store dM in this new array
        distances = np.zeros([self.Xpix, self.Ypix])
        for i in range(self.Xpix):
            for j in range(self.Ypix):
                distances[i][j] = sqrt(np.dot(np.dot(np.transpose(
                self.array[i][j] - mean_vector), variance_covariance), 
                self.array[i][j] - mean_vector))

        return distances

    def variance_covariance(self, number=10000):
        """ create a variance-covariance matrix """
        reshaped_array = self.array.reshape((self.Xpix * self.Ypix, 3))

        # sample the above array to reduce computing time
        choices = np.random.randint(0, len(reshaped_array), number)
        reshaped_array = np.array([reshaped_array[i] for i in choices])
        average = np.mean(reshaped_array)

        # compute the variance-covariance matrix for these RGB data
        matrix = np.array(sum([np.outer(np.array([reshaped_array[i] - 
           average]), np.array(reshaped_array[i] - average)) for i in 
            range(len(reshaped_array))]) / len(reshaped_array))

        return np.linalg.inv(matrix)

This code yields the following results upon a sample image.


Artifacts are due to the image compression and not the algorithm itself. As we can see, the objects that are “different” show up far brighter (like the pink target at upper-right), whereas the background, viz. grass and runway, are much darker.


I would like to try writing this algorithm in pure C to get a sense of how expensive it actually is to compute – as of late I’ve felt quite distant from the “happenings” while using Python.


[1] Anomaly Detection based on a parallel kernel RX algorithm for multicore platforms, Molero, J. et. al.

[2] Is there a best hyperspectral detection algorithm?, Manolakis, D. et. al.

[3] Anomaly Detection for Hypaerspectral Imagery Using Analytical Fusion and RX, Zhang, G. and Yang, C.

[4] The Multivariate Gaussian Distribution, by Do, C.