Find the correct phrase

I was inspired by a post on randomgooby about a question posed by Trustpilot, which may be seen here. The author provides a great solution in Go, but I wanted to try it with Python (also Scala, but I have a lot more reading to do about this language). I’ve honestly not played much with encryption or problem solving of this type, but it was a lot of fun to see my program produce the correct output!

From the outset I applied the author’s grep command to the wordlist because I couldn’t seem to find a better regex. As the author mentions, this step is really important, because if you start with the total 99,175 words, you’ll have to check upwards of 99,1753=975,453,625,984,375 combinations. On the other hand, filtering this list to only 1659 words yields around 4,566,034,179 combinations, which is more than within reach of an hour or two of searching.

Here’s a straightforward solution in Python in just 30 lines of code.

#! /usr/bin/env python3
# -*- coding: utf-8 -*-

""" Determine the secret phrase for

from hashlib import md5
import sys

def obtainMatchingPhrase(fileName: str ='wordlist_clean') -> None:
    """ Print phrase matching """

    _ANAGRAM = 'poultry outwits ants'
    _MD5	 = '4624d200580677270a54ccff86b9610e'

    with open(fileName, 'r') as words:
        # we need all the words in memory in this example
        rwords =

    total_words = len(rwords)
    ttl = 0

    print('Anagram : ' + _ANAGRAM)
    print('md5     : ' + _MD5)

    # iterate over all possible combinations
    for w1 in rwords:
        # write progress percentage to output
        sys.stdout.write("Progress: %0.4f%%\r" % (100 * ttl / total_words**3) )
        for w2 in rwords:
            for w3 in rwords:
                comb = w1 + ' ' + w2 + ' ' + w3
                comb2 = comb.encode('utf-8')
                hashed = md5(comb2).hexdigest()

                if (hashed == _MD5):
                    print('md5(\'' + comb + '\')' + ' == md5')
                    print('Tested %.4f of total possible combinations' % \
                            (100 * float(ttl) / total_words ** 3))
                    ttl += 1
        print('No phrase found for md5 ■')

if __name__ == '__main__':

And the output upon locating the correct phrase:

Anagram : poultry outwits ants
md5     : 4624d200580677270a54ccff86b9610e
md5('pastils turnout towy') == md5
Tested 27.5991% of total possible combinations

I tried a bunch of other methods, such as enumerate(itertools.product(*([rwords]*3)) to form a single loop, etc., but the iteration speed dropped low enough to add an additional 2 hours to the search time. In short, it would seem that simpler is better when it comes to Python and problems that require a lot of speed. I would also like to point out that I used a return statement to end all three nested loops, as suggested on SO, as opposed to a bunch of continue and break‘s. Because the function is essentially void, it works quite well in this situation.

I’ll be on the lookout for other problems of this type, it was a lot of fun!

Posted in Mathematics, Programming, Python | Tagged , , | 1 Comment

Quick recipe for a bibliography with BibTeX

I like typing math homework in \LaTeX if I have the time, and lately I’ve wanted to find an easier way to cite sources and create a bibliography at the end of a paper, should I use any.

One way I’ve probably used in the past is to just list out all the information as it’s bound to be displayed in the document.

G.A.F. Seber and C. J. Wild.
\textit{Nonlinear Regression}. 
Wiley Series in Probability and Mathematical 
Statistics: Probability and Mathematical Statistics. 
John Wiley & Sons, Inc., New York, 1989.

But this is tedious.

A far quicker and easier way is to use BibTeX, which was hardly straightforward at first (why I’m writing this quick little tutorial for my future self and others).

First, go look look up your text on MathSciNet. We’ll look up the same text I’ve listed above. At the top we have the option to get BibTeX information from the drop down menu:

@book {MR986070,
    AUTHOR = {Seber, G. A. F. and Wild, C. J.},
     TITLE = {Nonlinear regression},
    SERIES = {Wiley Series in Probability and Mathematical Statistics:
              Probability and Mathematical Statistics},
 PUBLISHER = {John Wiley \& Sons, Inc., New York},
      YEAR = {1989},
     PAGES = {xxii+768},
      ISBN = {0-471-61760-1},
   MRCLASS = {62-02 (62J02)},
  MRNUMBER = {986070},
MRREVIEWER = {Roy T. Saint Laurent},
       DOI = {10.1002/0471725315},
       URL = {},

Now copy/paste this information into a file *.bib (you may name it whatever you’d like). At the bottom of your \TeX document before \end{document} put the following lines:

\bibliographystyle{plain} % adjust styling here
\bibliography{<filename of your *.bib file>} % Don't include .bib ext

There are many different styles you can choose from, I’ve just kept it basic.

Now we recompile using pdflatex and bibtex – I’ve written the following simple bash script to automate the process if you’re in the same directory as the file.

#! /bin/bash
# Compile TeX documents with *.bib sources
# Append : `complete -g '*.tex'` to your .bashrc
#     for automatic selection of the file ending upon <tab>ing

if (( $# != 1 )); then 
    echo "Please provide filename." 2>&1
    exit 1

# the file we wish to work with

if ! [ -e $FILE ]; then
    echo "** File $1 requested does not exist." 2>&1
    exit 1

# Compile using latex and bibtex
pdflatex ${FILE%.*}
bibtex ${FILE%.*}
pdflatex ${FILE%.*}
pdflatex ${FILE%.*}

echo "** Compiled $FILE successfully"

# View the file with evince
if [ -e "${FILE%.*}.pdf" ]; then
    evince "${FILE%.*}.pdf" &
    echo "File ${FILE%.*}.pdf does not exist"
    exit 1

exit 0

I think I also saw it in one of the Stack Exchange forums where someone suggested Python, so I wrote the following little script too.

#! /usr/bin/env python2
# -*- coding: utf-8 -*-

""" Compile a *.tex file """

from __future__ import print_function

import os
import subprocess
import sys
import argparse

class ArgumentError(Exception): pass

def main(arguments):
    # Loop over all files in arguments.inputfile
    for name in arguments.inputfile:
        if (not os.path.isfile(name)):
            raise ArgumentError('** File {0} does not exist'\
        elif (not name.split('.')[-1] == 'tex'):
            raise ArgumentError('** File {0} is not a TeX file'\
        # If there's a path to the file involved...
        DIRECTORY = '/'.join(name.split('/')[:-1]) + '/' \
            if len('/'.join(name.split('/')[:-1])) == len(os.getcwd()) \
            else ''

        # The filename may have multiple .'s
        FILE = ''.join(name[len(DIRECTORY):].split('.')[:-1])

        # Compile
        if (arguments.bibtex):
            if (len(DIRECTORY) == 0):
                # Don't call -aux_directory or -output-directory
      ['pdflatex', FILE])

                if (arguments.bibtex):
          ['bibtex', FILE])
          ['pdflatex', FILE])
          ['pdflatex', FILE])
                        'File {0}.bib does not exist, normal compile'
                # Ensure all the extra files end up in same directory
                    '-aux_directory=' + DIRECTORY, 
                    '-output-directory=' + DIRECTORY,
                    DIRECTORY + FILE

                if (os.path.isfile(DIRECTORY + FILE + '.bib')):
          ['bibtex', DIRECTORY + FILE]) 

                        '-aux_directory=' + DIRECTORY, 
                        '-output-directory=' + DIRECTORY,
                        DIRECTORY + FILE
                        '-aux_directory=' + DIRECTORY, 
                        '-output-directory=' + DIRECTORY,
                        DIRECTORY + FILE
            if (len(DIRECTORY) == 0):
      ['latex', FILE])
      ['pdflatex', FILE])
      ['latex', FILE])
                    '-aux_directory=' + DIRECTORY,
                    '-output-directory=' + DIRECTORY,
                    DIRECTORY + FILE

        # Display
        if (arguments.display and not arguments.convertdvi):
            if (os.path.isfile(DIRECTORY + FILE + '.pdf')):
                    DIRECTORY + FILE + '.pdf',
                raise \
                        'File {0} does not exist'.format(\
                            DIRECTORY + FILE + '.pdf'
        elif (arguments.display and arguments.convertdvi):
            # Convert .dvi to .ps 
            print('** Converting to dvi and displaying ps')
            if (len(DIRECTORY) == 0):
                    'latex', DIRECTORY + FILE
                    '-aux_directory=' + DIRECTORY,
                    '-output-directory=' + DIRECTORY,
                    DIRECTORY + FILE
            if (os.path.isfile(DIRECTORY + FILE + '.dvi')):
                print('** Displaying using dvips')
      ['dvips', DIRECTORY + FILE + '.dvi', 
                    '-o', DIRECTORY + FILE + '.ps'])
                    DIRECTORY + FILE + '.ps',
                print('** Could not locate file ' + \
                    DIRECTORY + FILE + '.dvi')
            # Don't display anything

        if (arguments.delete):
            # Delete files that don't have *.tex or *.pdf extension

            # Simplify perhaps ?
            files = filter(lambda f: FILE in f, os.listdir(os.getcwd()))
            for fl in files:
                if (arguments.convertdvi):
                    if (not any(map(\
                            lambda ext: ext == fl.split('.')[-1],
                            ['tex', 'pdf', 'ps', 'dvi']
              ['rm', fl])
                        print('** Removed {0}'.format(fl))
                    if (not any(map(\
                            lambda ext: ext == fl.split('.')[-1],
                            ['tex', 'pdf']
                        # delete file
              ['rm', fl])
                        print('** Removed {0}'.format(fl))

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Compile a *.tex file.')

    # input file name(s)
    parser.add_argument('inputfile', nargs='+', type=str, \
      help='input file name')

    # compile with bibtex as well
    parser.add_argument('-bib', '--bibtex', action='store_true', \
      help='compile with bibtex as well')

    # convert *.dvi to *.ps by compiling again with latex
    parser.add_argument('-cdvi', '--convertdvi', action='store_true', \
      help='compile document again with latex and convert *.dvi to *.ps')

    # display the *.pdf or, if *.ps, display that
    parser.add_argument('-d', '--display', action='store_true', \
      help='display *.pdf')

    # delete unnecessary files
    parser.add_argument('-dlt', '--delete', action='store_true', \
      help='delete additional files that are created')
    args = parser.parse_args()

Because I’m more comfortable with Python than Bash, I tried to make the script pretty flexible. Running either of these produces the results we were expecting on a sample document:


An example compiled source.


Now all you have to do is include `\cite{MR986070}` in the spots you wish to reference [1] (and I wish it were this simple in WordPress too :P). I’m sure there are other nifty tricks and details I’ve overlooked or haven’t come across yet, but I think this is the gist of it.

In the code I uploaded to GitHub for future reference I’ve also included a bunch of if-statements to remove files that clutter the directory, but these are of course optional.

Posted in Bash, LaTeX, Programming, Python | Tagged , | Leave a comment

Fitting PSF Models to Stellar Distributions in FITS Data

As an end-of-semester project I would like to investigate the mathematics of Gaussian and Moffat distributions, which are used to model stars in astronomical data. My inspiration for writing this post came from reading [1] and [5], where many interesting analytics are presented.

Specifically, in this post we’ll generate example PSFs and sources therefrom, extract stellar sources from images taken by the SDSS, and fit Gaussian and Moffat functions to each using a linear gradient descent (GD) algorithm.

The Gaussian Distribution

We will consider the two dimensional case of the bivariate normal distribution since image arrays are usually two dimensional arrays of pixels.


where the covariance matrix \Sigma will be defined as


so that


and thus \arg\exp\left(d_{M}^2\left(X,\mu\right)\right) becomes



Therefore, we arrive at a simplified distribution


which is basically a fixed-axis elliptical normal distribution. Furthermore, because the property is rather elegant, we see that



and by parameterizing R\begin{bmatrix}\cos\left(\theta\right)\\\sin\left(\theta\right)\end{bmatrix}+\mu\mapsto X, we obtain



Thence the former result is



and by yet another substitution one may simplify this integral drastically to


If \sigma_{1}=\sigma_{2}=1 (to obtain the “standard normal distribution”), then the volume of g_{2,c}=1. This is a widely understood identity that easily generalizes to higher dimensions (e.g. see [4] in my last post). On another mathematical note, in reference to [6], I might suspect the multivariate normal distribution to aid in determining the value of \Gamma_{m}\left(\frac{1}{2}\right), the multivariate gamma function [I will have to investigate this in a future post].

Because we will only be considering the circular Moffat function in the next section, here we will restrict this function even further by setting \sigma_{1}=\sigma_{2}=\sigma>0. This also means that


where I_{2} is the square identity matrix. If you’ll recall from statistics, when \Sigma=I we just obtain the Euclidean metric from d_{M}, so the entire equation simplifies to


where ||\cdot|| denotes the Euclidean norm.

Therefore, the only axes we must solve for in the gradient descent algorithm are translation, i.e. the centroid of a source, and the variance (which is related to the stability of the atmosphere). Feature scaling of the dataset will also be important, because the ADUs won’t be in the range \big[0,1\big].

The Moffat Distribution

The authors of [1] present the Gaussian as a limiting case of the more general Moffat distribution, introduced by none other than A. Moffat in 1969 [2]. The circular case is defined as


The authors, again, give a very interesting derivation in appendix A which fortifies their claim that


I would like to retrace their steps here. But first, as with the former distribution, we shall determine (or simply verify, as it turns out) the conditions for which the volume of m_{2} is unity. We see that



and by the same substitution we used last time, we may simplify this integral to






which works out quite nicely. Continuing on our plight to unravel their work in appendix A, we solve for the full width at half maximum (FWHM) of m_{2}, which may be visualized as the width of the fitted PSF halfway along its magnitude. Thus

\dfrac{1}{2}=\dfrac{\pi\alpha^2\,m_{2}(\,\vec{0},\mu;\alpha,\beta\,)}{\beta-1} (normalized amplitude)



\implies ||\mu||=\alpha\sqrt{2^{1/\beta}-1},


The extra factor of 2 came from the fact that we’re trying to find the width, whereas the penultimate equation yields the radius (so we multiply it by 2). Next, we eliminate \alpha in m_{2} by writing it in terms of \text{FWHM}_{m}, since


For this I obtain the expression


Now we can make the case that since


it would be perfectly fine to substitute \log\left(2\right)/\beta\mapsto 2^{1/\beta}-1 as


Hence, altogether we have


At this point it may appear ridiculously complex. But if you’ll recall the classic identity

\displaystyle\lim_{x\rightarrow\infty}\left(1+\dfrac{1}{x}\right)^{x}=e (c.f. [4])

we are able to obtain the exponential from the limit


Look familiar yet? Well, we’re off by a few constants; letting \text{FWHM}_{m}^{2}=8\sigma^2\log\left(2\right), as the authors suggest,



In the next section we will write scripts that ‘shift and shimmy’ the m_{2} and g_{2,c} distributions about until a reasonable convergence is obtained by the GD algorithm; moreover, we will compare the fits for both the Gaussian and Moffat distributions. We shall also test the authors’ claim that setting \beta\sim 100 in m_{2} should yield similar results to g_{2,c} (I don’t have any doubts).

Gradient Descent

The gradient descent algorithm is ubiquitous in machine learning. It was originally devised as a first order method to arrive at the local optima of a multivariate function by Isaac Newton through iteration. We will utilize this tool to optimize the parameters of m_{2} and g_{2,c} for various real-world examples of stellar distributions in astronomical data.

The first order algorithm may be written recursively as


where in this case, \text{cost} is defined as

\text{cost}(\vec{\theta}_{\kappa}; h,z)=\dfrac{1}{2P}\displaystyle\sum_{i=1}^{P}(h^{i}(\vec{\theta}_{\kappa})-z^{i})^{2}


(e.g. see 173 of [4]), and P=\prod_{j=1}^{\dim I_{s}}\text{NAXIS}_{j} is the total size of the image subset containing a star. We will be substituting g_{2,c} and m_{2} for h, and their parameter vectors \vec{\theta} will be \begin{bmatrix}A & \sigma & \mu_{1} & \mu_{2}\end{bmatrix}^{\text{T}} and \begin{bmatrix}A & \alpha & \beta & \mu_{1} & \mu_{2}\end{bmatrix}^{\text{T}}, respectively. I have added A to scale either function, which is necessary since scaling the data to [0,1] does not mitigate the possibility that the amplitude of the best fit is slightly greater than unity. Moreover,


so we may simplify the necessary computations slightly.

\eta_{\kappa} is the “learning rate” of the algorithm. In the past I’ve just set this to a constant value, but I’ve found that, in practice, a constant vector works a little better when one parameter may depend on others. This requires a little tuning to arrive at a decent (though not quite optimal) rate along each parameter. In the future I hope to implement a simple method to allow the algorithm to tune its own learning rates (I’m still reading about this in the literature).

To begin, let’s compute the 7 partial derivatives. I arrive at




for g_{2,c}, and




and \partial_{\mu_{i}}\,m_{2}=\dfrac{2\,m_{2}\,\beta\left(x_{i}-\mu_{i}\right)}{\alpha^{2}+||X-\mu||^{2}},

for m_{2}. I’ve written m_{2} and g_{2,c} in Python as follows.

def m2(X, theta):
    """ Moffat distribution """
    return theta[0]*(theta[2]-1.0)/(np.pi*theta[1]**2.0)*(1.0\

def g2(X, theta):
    """ Gaussian Distribution """
    return theta[0]/(2.0*np.pi*theta[1]**2.0)*np.exp(-((X[0]-\

As per the authors’ earlier suggestion, we might expect

\displaystyle\int_{\mathbb{R}^{n}}(m_{n}(X\,\mu;\alpha,100)-g_{n,c}(X;\mu,\Sigma))\,\mathrm{d}X\approx 0.

Using dimension-independent versions of the previous two functions (which only take an additional line of code each), I have computed the following results.

def mn(X, theta):
    """ n-dimensional Moffat distribution """
    return theta[0]*(theta[2] - 1.0)/(np.pi*theta[1]**2.0)\
        *(1.0 + sum((X[i] - theta[3:][i])**2.0 for i in \

def gn(X, theta):
    """ n-dimensional Gaussian distribution """
    return theta[0]/(((2.0*np.pi)**(len(X) / 2.0))*(theta[1]**\
        len(X)))*np.exp(-(1.0 / (2.0*(theta[1]**2.0)))\
        *sum((X[i]-theta[2:][i])**2.0 for i in xrange(len(X))))

I found that we would be adding O(n^d) time for dimension d
(viewing the integrals as nested loops), so I only attempted n≤4
as that seemed to be the max my laptop could handle in a reasonable
amount of time (about an hour).

>>> subset = [-3.0, 3.0]
>>> integrate.nquad(lambda x, y, z, t: gn([x, y, z, t], 
[1.0, 1.0, 0.0, 0.0, 0.0, 0.0]) - mn([x, y, z, t], 
[1.0, 1.0, 100.0, 0.0, 0.0, 0.0, 0.0]), 
[subset, subset, subset, subset])[0]

>>> integrate.nquad(lambda x, y, z: gn([x, y, z], 
[1.0, 1.0, 0.0, 0.0, 0.0]) - mn([x, y, z], 
[1.0, 1.0, 100.0, 0.0, 0.0, 0.0]), 
[subset, subset, subset])[0]

>>> integrate.nquad(lambda x, y: gn([x, y], 
[1.0, 1.0, 0.0, 0.0]) - mn([x, y], 
[1.0, 1.0, 100.0, 0.0, 0.0]), 
[subset, subset])[0]

I think the reason these don’t quite match the prediction is due to roundoff error, or because \beta=100 isn’t quite far enough; if you think I have actually made a mistake here, feel free to let me know in the comments. Because it isn’t necessary to write this program to be dimensionally independent (as much as I would enjoy the challenge), we’ll stick to the previous two functions.

The seven partial derivatives I’ve listed above may be written as follows.

from math import log
import numpy as np
from numpy.linalg import norm

### Gaussian

def PartialAg2(X, theta):
    return g2(X, theta) / theta[0]

def PartialSigmaXg2(X, theta):
    return g2(X, theta)*((norm(X-theta[2:])**2.0)/(theta[1]**3.0)-2.0/

def PartialMuXg2(X, theta, NAXIS=0):
    return (g2(X, theta)*(X[NAXIS]-theta[NAXIS + 2])) / (theta[1]**2.0)

### Moffat

def PartialAm2(X, theta):
    return m2(X, theta) / theta[0]

def PartialAlpham2(X, theta):
    return (2.0*m2(X,theta)/theta[1])*
        (theta[1]**2.0 + norm(X - theta[3:])**2.0) - 1.0)

def PartialBetam2(X, theta):
    return m2(X, theta)*(log((theta[1]**2.0) / 
        (theta[1]**2.0+norm(X-theta[3:])**2.0))+1.0 / (theta[2]-1.0))

def PartialMuXm2(X, theta, NAXIS=0):
    return (2.0*m2(X, theta)*theta[2]*(X[NAXIS]-theta[3+NAXIS]))/
        (theta[1]**2.0 + norm(X - theta[3:])**2.0)

### Also include the Gaussian and Moffat functions above

We may now piece together a class with methods for performing linear GD.

from kernel import *
from numpy.linalg import norm
from numpy.random import randn
import partials
import numpy as np
import sys

class GD(object):
    def __init__(self, array, form, derivatives, max_iterations=1000, 

        if (type(array).__module__ != np.__name__):
            # handle lists as well
            self.maximum = np.array(array).max()
            self.minimum = np.array(array).min()
            self.maximum = array.max()
            self.minimum = array.min()

        if (self.maximum != 1.0 or self.minimum != 0.0):
            # Map range of image array → [0, 1]
            self.array = np.array(map(lambda arr: 
                (arr-self.minimum)/(self.maximum-self.minimum), array))
            # probably not common in practice...
            self.array = np.array(array)

        if (self.dim(self.array) > 2):
            raise ValueError('Array must be dimension 2.')

        self.rangeX = self.array.shape[0]
        self.rangeY = self.array.shape[1]
        self.form = form
        self.max_iterations = max_iterations
        self.limit = limit
        self.derivs = derivatives

    def dim(arr):
        if (type(arr).__module__ != np.__name__):
            return len(np.array(arr).shape)
            return len(arr.shape)

    def expected(arr):
        if (type(arr).__module__ != np.__name__):
            return np.average(arr)
            return np.average(np.array(arr))

    def linear(self):
        """ perform linear GD using all derivatives """
        self.param_list = []

        # randomly generate initial parameter vector
        theta = np.abs(randn(len(self.derivs) + 1)) + 
            (((self.rangeX + self.rangeY) // 2) - 1) // 2
        theta[0] = 1.0

        # perform GD using all partial derivatives
        for _ in xrange(self.max_iterations):
            cost = self.__cost_(theta)
            theta += self.__gamma_(1.0)*cost


            #print theta

            if (not (0 < theta[-2] < self.rangeX) or 
                not (0 < theta[-1] < self.rangeY)):
            if (norm(cost) < self.limit):
            if (norm(cost) > 10e3):
                print "Values diverged."

        return theta, np.array(self.param_list)

    def __gamma_(self, magnitude=1.0):
        """ estimate the value of γ """
        if (len(self.derivs) + 1 == 4):
            # tuned for a decent Gaussian GD
            return -magnitude*np.array([1e3, 25.0, 2e2, 2e2])
        elif (len(self.derivs) + 1 == 5):
            # tuned for a decent Moffat GD
            return -magnitude*np.array([1e2, 1e1, 1e1, 1e1, 1e1])

    def __cost_(self, theta):
        """ compute the cost, Ω(mN^2) time. """
        total_cost = np.zeros(len(self.derivs) + 1)

        for i in xrange(self.rangeX):
            for j in xrange(self.rangeY):
                # get difference between model and array
                cost = self.form([i, j], theta) - self.array[i][j]

                # set up nabla
                nabla = np.zeros(len(self.derivs) + 1)

                # compute partials at each pixel
                for k in range(len(self.derivs) - 1):
                    nabla[k] = self.derivs[k]([i, j], theta)
                    # compute the last derivative twice, once for each
                    # axis. This may be collapsed into one line
                    # with a slight modification to the partial
                    # functions.
                    nabla[-1] = self.derivs[-1]([i, j], theta, NAXIS=1)
                    nabla[-2] = self.derivs[-1]([i, j], theta, NAXIS=0)

                # scale nabla by the difference
                total_cost += cost*nabla
            return total_cost / (self.rangeX*self.rangeY)

Most of the work in this class is done in the cost method, which builds \vec{\nabla}\text{cost} by iterating over the entire image subset and computing the partials at each pixel. Also note that I’ve written the code to work with lists as input too, so the class is a bit more flexible.

In the event that you don’t want to print out theta on every iteration (which fills the terminal), the following quick lines (or some variation thereof) may be substituted for that which is highlighted above.

if (len(self.derivs) + 1 == 4):
    sys.stdout.write('%ft%ft%ft%fr' % (theta[0], theta[1], 
                                           theta[2], theta[3]))
elif (len(self.derivs) + 1 == 5):
    sys.stdout.write('%ft%ft%ft%ft%fr' % 
                                          (theta[0], theta[1], 
                                           theta[2], theta[3], 

Now before we begin testing this algorithm on real data, let’s test it on a controlled set of examples with and without additive Gaussian noise. To generate these examples I have re-purposed the following class I wrote for UAV club (to convolve images).

import scipy.integrate as integrate
import numpy as np
import partials

class Kernel(object):
    """ creates kernels """
    def __init__(self, random_noise=False, noise_level=0.001):
        self.random_noise = random_noise
        if (random_noise):
            self.noise_level = noise_level

    def Kg2(self, X, Y, sigma, muX=0.0, muY=0.0):
        """ return a 2 dimensional Gaussian kernel """
        kernel = np.zeros([X, Y])
        theta = [1.0, sigma, muX, muY]
        for i in xrange(X):
            for j in xrange(Y):
                kernel[i][j] = integrate.dblquad(lambda x, y: 
                     partials.g2([x + float(i) - (X-1.0)/2.0, 
                         y + float(j) - (Y-1.0)/2.0], theta), 
                   -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]

        if (self.random_noise):
            kernel = kernel + 
                np.abs(np.random.randn(X, Y))*self.noise_level
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            print maximum, minimum
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)

    def FWHMg2(theta):
        """ get the FWHM of the Gaussian distribution """
        return 2*theta[3]*sqrt(log(1.0 / 

    def Km2(self, X, Y, alpha, beta, muX=0.0, muY=0.0):
        """ return a 2 dimensional Moffat kernel """
        kernel = np.zeros([X, Y])
        theta = [1.0, alpha, beta, muX, muY]
        for i in xrange(X):
            for j in xrange(Y):
                kernel[i][j] = integrate.dblquad(lambda x, y: 
                      partials.m2([x+ float(i) - (X-1.0)/2.0, 
                         y + float(j) - (Y-1.0)/2.0], theta), 
                   -0.5, 0.5, lambda y: -0.5, lambda y: 0.5)[0]

        if (self.random_noise):
            kernel = kernel + 
                np.abs(np.random.randn(X, Y))*self.noise_level
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)
            maximum = np.max(kernel)
            minimum = np.min(kernel)
            return map(lambda arr: 
                (arr-minimum)/(maximum-minimum), kernel)

    def FWHMm2(theta):
        """ get the FWHM of the Moffat distribution """
        return 2*theta[0]*sqrt(pow(2, 1 / theta[1]) - 1.0)

Now that we know what the accepted values should be, we can truly test the results of the algorithm. Below is a sample Moffat distribution, generated by the above script.


An example Moffat distribution with additive noise N=0.005, and parameters \alpha=3.5, \beta=4.5.

And another without background noise for visual comparison to that above.


Moffat without additive noise, same parameters as the last.

We may create example PSFs as follows.

>>> from kernel import *
>>> PSF = Kernel(random_noise=False, noise_level=0.01)
>>> example = PSF.Kg2(15, 15, sigma=1.75, muX=0.0, muY = 0.0)

And perform a fit:

>>> import partials
>>> from GD import *
>>> fit = GD(image1
...     , form=partials.g2 
...     , derivatives=[partials.PartialAg2
...         , partials.PartialSigmaXg2
...         , partials.PartialMuXg2]
...     , max_iterations=1000
...     , limit=1e-5
... )
>>> theta, progress = fit.linear()

I could have written the code so the partials are inherent to the class, but I thought it better for the future (in case we wanted to try a different dirstribution) to just submit each partial, as well as the related function. Also note that I’ve passed partials.PartialMuXg2 last – for this class it’s important that the \partial_{\mu_{i}}‘s be in the last position.

I get a convergence for g_{2,c} in ~500 iterations using the above pre-defined rate vector, and ~1200 for m_{2}. Again, these rates aren’t optimal, so it doesn’t make sense to compare. Below are graphs showing the convergence of either function to a generated g_{2,c} with parameters \sigma=1.75, \mu_{1}=7, \mu_{2}=7 (and of course it’s rescaled in to [0,1]).


Convergence of the Moffat function to a Gaussian profile. What’s interesting is that \beta (red) converges to a relatively high value, so the function is closer in shape to a Gaussian. In this plot Blue=A, Red=\beta, Purple, Cyan=x_{1}, x_{2} and Green=\alpha.


Convergence of the Gaussian function to a Gaussian profile. Because I sought to initialize \mu_{1} and \mu_{2} (two indistinguishable lines about 7.0) as close to center as possible, they converge extremely fast, whereas A (blue) and \sigma (green) take a few more iterations.

Now for a bunch of noise to see how well the algorithm copes. The centroid shouldn’t vary much, but I’m expecting A, \sigma, \alpha and \beta to change quite a bit.


I didn’t get a decent convergence from m_{2} inside of 5000 iterations. Setting the rates faster caused a bit of chaos (e.g. see the last section). I’m assuming that a larger array may have been necessary, as in the fits with SDSS data (below), which worked fine.


Convergence of g_{2,c} occurred in about the same number of iterations, probably indicative that it’s a more robust model.

Now that we have definitive results showing that the algorithm has the ability to converge to the approximate expected values, we’ll try it on real data.

Working with FITS Images and Testing on Real Data

We’ll try fitting these two models to single stars in the latest SDSS3 data release, DR9. You can try this on any data you like, however.

Specifically, I have downloaded frame 15 from the G-band. If you would like to use these same data, you may download and uncompress the image as follows.

bzip2 -dk frame-g-008158-6-0015.fits.bz2

We can open these images with PyFITS in the following function.

import pyfits

def get_fits_array(image_name, axis=0, info=False):
    """ Get the data from an image along a specified axis. """
        image =, mode='readonly')
        if (info):
        return image[axis].data

This returns a NumPy array containing the image data. Extracting stars and objects from the frame is a huge statistical problem in itself, but here I’m just going to take the array and crop it nicely about a star that I’ve physically located. By the way, if you’re not already familiar, I’ve found that SExtractor is extremely simple to use for this kind of work too.

>>> image = get_fits_array('frame-g-008158-6-0015.fits')
>>> image = image[1046:1086,530:570]

We can also plot this image and view it as follows.

# be careful logging - there may be -'s or 0's
image[np.where(image < 0.0)] = 0.01
plt.imshow(np.log10(image), interpolation='none', cmap='gray')



A single star in the g-band image. I’ve purposely made it off-center as well.

The \text{cost} function required quite a bit of time to compute in either case because the subset above is now 1600 pixels instead of a mere 225.


The values the algorithm arrived at are A=27.06944038, \sigma=2.11533306, x_{1}=20.71500907, and x_{2}=20.69633224.

The Moffat function didn’t take nearly as long to converge as I was expecting, and its parameters do some interesting things too. The only bottleneck is the extra partial derivative it has to compute at each pixel, which becomes more noticeable as the array becomes larger (not to mention the fact that its derivatives are a bit more complex).


GD with m_{2} arrived at A=27.28057256, \alpha=13.05963548, \beta=20.17842481, 20.71206149, 20.69690351.

Parameters such as A, x_{1} and x_{2} compare favorably with those of the Gaussian. It also appears that the Moffat function becomes far closer to normal during the descent, before very slowly decreasing \beta again. Altogether I do not think this is a good method for solving for \beta as it’s a “weaker” variable, only having an impact after the distribution is centered and A is closer to its optimal value.


I mentioned above that I would like to implement a method that determines a decent “learning rate” for each parameter. It’s rather easy to drive the parameters too fast, causing unstable results like that below.


GD that’s driven too hard makes for chaotic results.

I’m also interested in pursuing one of the many second order GD methods, but again I am still reading the literature and working out the details (especially the computation of the Hessian).

If you can think of any ways to improve this code feel free to let me know in the comments!


[1] The effects of seeing on Seeing on Sérsic profiles – II. The Moffat PSF, by Trujillo et. al. (2001)

[2] A Theoretical Investigation of Focal Stellar Images in the Photographic Emulsion and Application to Photographic Photometry, by A. Moffat (1969)

[3] e as the limit of \left(1+\frac{1}{n}\right)^{n}, by D. Joyce (click here for the .pdf)

[4] Stochastic Optimization, by J. Spall, Handbook of Computational Statistics p. 173

[5] Common-Resolution Convolution Kernels for Space- and Ground-Based Telescopes, by G. Aniano et. al.

[6] Advanced Calculus, D. Widder (1898), p. 371

Posted in Algorithms, Astronomy, Mathematics, Python | Tagged , , , | 1 Comment

The RX Algorithm

Over the past few months I have been writing programs to perform basic computer vision tasks for a drone we built in UAV club. To be blunt, I learned a lot – both about what I’m capable of, and what it means to work on a small team.

Before I explain the algorithm and share the code I wrote, I’ll outline our goals. They were to

  1. obtain usable imagery despite any vibrations and attitude variations of the quadcopter,
  2. search a complete frame for anomalies (colored targets),
  3. and classify any anomalies.

My teammate worked a lot on algorithms and methods related to #3, while I was slightly more interested in #’s 1 & 2. I saw the goals as hierarchical; namely that a) we probably cannot classify anomalies in the data if we have no means to locate them in entire frames, and b) we cannot search the data for anomalies if the data are not clean and unblurred.

In this post I will talk briefly about #2.

After a lot of research I eventually found several papers on anomaly detection, including [1], [2] and [3]. I also went to the library and found a book on color science – a field for which I now have the utmost of respect, because the mathematics involved with color matching and other concepts is quite elegant. Moreover, in each of aforementioned papers the authors cover the RX algorithm, so I thought I’d give it a try.

Consider the multivariate Gaussian distribution

g_{n}\left(X; \mu,\Sigma\right)=\dfrac{1}{\left(2\pi\right)^{n/2}|\Sigma|^{1/2}}\exp\left[-\dfrac{1}{2}\left(X-\mu\right)^{\text{T}}\Sigma^{-1}\left(X-\mu\right)\right],

where |\cdot| denotes the determinant and \Sigma is the covariance matrix. There are a lot of interesting properties of this function that you may read about in [4]. For the RX algorithm we are, for the most part, only interested in the argument of the exponential function


known as the Mahalanobis distance. It is basically a measure of how many standard deviations X, a spectral vector in the RGB tristimulus space, lies from the mean spectral vector mu. For the RX algorithm we hope to compute this distance metric for all pixels in an image (which we may visualize as vectors in the space \text{RGB}=\left[0,255\right]^{3}).

This operation’s complexity is similar to that of convolution (it will be bound below by \Omega\left(N^2\right) at least), but there are a few steps we may take to reduce the complexity of the problem. For instance, to compute the covariance matrix \Sigma for each image it’s not necessary to use every single pixel. In [3] the authors state that \Sigma “is the estimated background covariance matrix.” By extracting a random sample of the image pixels (in practice I found that only ~10^4 worked well, but as few as 10^3 yielded reasonable results) this matrix can be estimated quickly.

See the below figure for an actual comparison (the Frobenius norm of an average of 100 runs for each extraction sample size, ranging from 10^{2}10^{5}).


This process assumes that the background (the ground) is the dominant feature in a given image, and from a high vantage point this is likely the case.

Yet another simplification we can make is from the properties of the matrix \Sigma. It is commonly defined as



since \Sigma is symmetric. We may use this definition so that we only have to compute p\left(p+1\right)/2 values instead of every entry, where p is the dimension of the original image.

Now to compute d_{M} I’ve written the following class, which utilizes many vector and matrix operations available in NumPy.

from math import sqrt
import numpy as np

class dM(object):
    def __init__(self, array):
        self.array = array
        self.Xpix = array.shape[0]
        self.Ypix = array.shape[1]

    def Mahalanobis(self):
        """ Compute the Mahalanobis distance to every point in RGB
        space within the image """
        mean_vector = np.mean(self.array, axis=(0, 1))
        variance_covariance = self.variance_covariance()

        # iterate over the original image and store dM in this new array
        distances = np.zeros([self.Xpix, self.Ypix])
        for i in range(self.Xpix):
            for j in range(self.Ypix):
                distances[i][j] = sqrt(
                self.array[i][j] - mean_vector), variance_covariance), 
                self.array[i][j] - mean_vector))

        return distances

    def variance_covariance(self, number=10000):
        """ create a variance-covariance matrix """
        reshaped_array = self.array.reshape((self.Xpix * self.Ypix, 3))

        # sample the above array to reduce computing time
        choices = np.random.randint(0, len(reshaped_array), number)
        reshaped_array = np.array([reshaped_array[i] for i in choices])
        average = np.mean(reshaped_array)

        # compute the variance-covariance matrix for these RGB data
        matrix = np.array(sum([np.outer(np.array([reshaped_array[i] - 
           average]), np.array(reshaped_array[i] - average)) for i in 
            range(len(reshaped_array))]) / len(reshaped_array))

        return np.linalg.inv(matrix)

This code yields the following results upon a sample image.


Artifacts are due to the image compression and not the algorithm itself. As we can see, the objects that are “different” show up far brighter (like the pink target at upper-right), whereas the background, viz. grass and runway, are much darker.


I would like to try writing this algorithm in pure C to get a sense of how expensive it actually is to compute – as of late I’ve felt quite distant from the “happenings” while using Python.


[1] Anomaly Detection based on a parallel kernel RX algorithm for multicore platforms, Molero, J. et. al.

[2] Is there a best hyperspectral detection algorithm?, Manolakis, D. et. al.

[3] Anomaly Detection for Hypaerspectral Imagery Using Analytical Fusion and RX, Zhang, G. and Yang, C.

[4] The Multivariate Gaussian Distribution, by Do, C.

Posted in Algorithms, Mathematics, Programming, Python | Tagged | 3 Comments


I won’t get the chance to write another post on February 29 for four years, so I wrote the following script in to give us some chaos to remember it by [1].

:Local upperb,pts2save,divs,maxin,minin
:© set variables
:© make domain set x
:Lbl retspot
: augment(x,seq(i,i,1,divs))↦x
: If m≤pts2save Then
:  m+1↦m
:  Goto retspot
: EndIf
:SortA x
:Disp dim(x)
:© make empty range set
:Disp dim(total) ©so we can make sure dim(x)=dim(total)
:For h,minin,maxin,(maxin-minin)/(divs-1)
: .1↦p ©initial condition
: {}↦list
: For i,1,upperb,1
:  (1+10*h)*p-10*h*p^2↦p ©difference equation
:  If i≥upperb-pts2save+1 Then
:   p↦list[i-(upperb-pts2save)]
:  EndIf
: EndFor
: Disp h ©nice to know progress
: For j,1,pts2save,1
:  list[j]↦total[c+j]
: EndFor
: c+pts2save↦c ©adjust next ten entries in total
:DelVar list
:NewPlot 1,1,x,total,,,,5 ©commas are necessary
:ZoomData ©plots

And the output after like 20 minutes:


The two biggest problems were memory overflow (the Titanium has a whopping 188 kB of RAM) and dimension mismatches. The latter may sound like it has a simple fix, but I found it rather difficult to wrestle the lists to the same length; there are just so many places you can mess up! Not to mention how confusing it is to index lists at 1 !

If you have suggestions for improvements please let me know in the comments.

Sources/Additional Resources

[1] Nagle, R., E. Saff, and Snider, A. Fundamentals of Differential Equations. 8th ed. Boston: Pearson/Addison Wesley, 2012. 151-52. Print.

[2] TIGCC Documentation

[3] Rosetta Code TI-89 BASIC

Posted in Programming, Ti 89 | Tagged | Leave a comment

The Method of Least Squares

This post is a primer to the method of least squares, which allows one to fit a line to roughly linear data. In my next post I hope to expand upon these results and attempt to fit a nonlinear model to astronomical data, as outlined toward the end of my video in my first post.

Consider the graph I have drawn below of ten points in the real plane.


A bunch of points we’d like to fit a line to.

Explicitly, we have the finite set



and we wish to fit the continuous function


to this plot such that the distance between f and P is minimal, hence “least squares.” To determine the error or cost between the points and f we sum their squared differences at each point as


wheren=\text{card }P. Squaring each distance makes the problem appear as though we are optimizing the area of the squares with one vertex at the point p_{0}\in P and the other on the line. The benefits are that there can’t be a negative or canceling area.

The equation of a line has two distinct constant parameters a,b that we hope to solve for that represent the slope and the y-intercept. In more complex models, an example of which we will see in my next post (I have two computers processing those results as I’m writing this), there are more parameters and iterations.

When we speak of optimization in Calculus we’re usually talking about derivatives. Taking the partial derivatives ofSwith respect to its parameters yields the following results.

\dfrac{\partial S}{\partial a}=\displaystyle\sum\limits_{k=1}^n 2x_k\left[ax_k+b-y_k\right]

=2a\displaystyle\sum\limits_{k=1}^n x_k^2+2b\displaystyle\sum\limits_{k=1}^n x_k-2\displaystyle\sum\limits_{k=1}^n x_k y_k

=a\displaystyle\sum\limits_{k=1}^n x_k^2+b\displaystyle\sum\limits_{k=1}^n x_k-\displaystyle\sum\limits_{k=1}^n x_k y_k=0,

\dfrac{\partial S}{\partial b}=\displaystyle\sum\limits_{k=1}^n 2\left[ax_k+b-y_k\right]

=2a\displaystyle\sum\limits_{k=1}^n x_k+2bn-2\displaystyle\sum\limits_{k=1}^n y_k

=a\displaystyle\sum\limits_{k=1}^n x_k+bn-\displaystyle\sum\limits_{k=1}^n y_k=0.

I have set them equal to 0 because, after all, optimal values of functions (relative mins and maxes) occur where \partial_{a}S=0 and \partial_{b}S=0. By the last result we can actually solve directly for b as

b=\dfrac{\displaystyle\sum\limits_{k=1}^ny_k-a\displaystyle\sum\limits_{k=1}^n x_k}{n}.

Furthermore, by direct substitution of this value ofbinto the former partial derivative we may solve fora(this is what makes linear regression so “nice”) to obtain

a\displaystyle\sum\limits_{k=1}^n x_k^2+\left(\dfrac{\sum_{k=1}^ny_k-a\sum_{k=1}^n x_k}{n}\right)\displaystyle\sum\limits_{k=1}^n x_k-\displaystyle\sum\limits_{k=1}^n x_k y_k=0

\implies a=\dfrac{n\displaystyle\sum\limits_{k=1}^n x_ky_k-\left(\displaystyle\sum\limits_{k=1}^n y_k\right)\left(\displaystyle\sum\limits_{k=1}^n x_k\right)}{n\displaystyle\sum\limits_{k=1}^n x_k^2 -\left(\displaystyle\sum\limits_{k=1}^n x_k\right)^2}.

These two sums are incredibly simple to program in Python. Applying these equations to the set of points above, we obtain summation results of

10\displaystyle\sum\limits_{k=1}^{10} x_ky_k=407.5,






and a\displaystyle\sum\limits_{k=1}^{10}x_k=1\cdot a.

Therefore,a=\dfrac{407.5-\left(-2.5\right)\left(1\right)}{500-\left(1\right)}\approx 0.82164328\cdots andb=\dfrac{-2.5-a}{10}\approx -0.33216432\cdots, giving us a function

y\approx 0.82164328x-0.33216432.

Graphing this against the points above shows that it is indeed a pretty good fit:


The result of fitting a line to this set of points.

Another verification we can use to determine that this is the optimal value is by taking a look at the Hessian matrix,

\det H_{S}=\det\begin{bmatrix}\displaystyle\sum\limits_{k=1}^n x_k^2 & \displaystyle\sum\limits_{k=1}^n x_k\\\displaystyle\sum\limits_{k=1}^n x_k & \displaystyle\sum\limits_{k=1}^n 1\end{bmatrix}

=\left(\displaystyle\sum\limits_{k=1}^n x_k^2\right)\left(\displaystyle\sum\limits_{k=1}^n 1\right)-\left(\displaystyle\sum\limits_{k=1}^n x_k\right)^2

=\dfrac{1}{2}\displaystyle\sum\limits_{j,k=1}^n\left(x_j-x_k\right)^2>0 (by Lagrange’s Identity)

which verifies that it is in fact the best-fit line.

As I’ve already mentioned, in my next post I hope to expand upon almost everything here with a direct application to fitting functions to discrete stellar profiles in astronomical images. I basically hope to compare Gaussian vs. Moffat function fits to several hundred stars in SDSS data, very similar to the results of Trujillo et. al. [1].


[1] The effects of seeing on Sersic profiles – II. The Moffat PSF, by Trujillo et. al.

Posted in Mathematics | Tagged | 2 Comments

The Relation Between the Laplace Transform and Power Series

Tonight I rewatched lecture 19 from a differential equations class that was taught just over 10 years ago at MIT. As I was highly interested by David Widder’s comments on the similarities between the transform and power series, I thought I would paste a few comments on things that I find remarkable.

First and foremost might be the difference, as mentioned by the professor in the above lecture, between transforms and operators. For example, the mapping \mathcal{D}:C^{n}\left(X\right)\rightarrow C^{n-1}\left(X\right), for some arbitrary n\in\overline{\mathbb{Z}}_{+} (the differential operator), is defined for f\in\text{dom}\left(\mathcal{D}\right) as

f\left(x\right)\mapsto f'\left(x\right),

such that both are functions of x. On the other hand, if \mathcal{T} is to denote a transform, then


the highlight of which is that the outcome is a function of an entirely different domain. In the case of the Laplace transform, it might be such that \Re\left(s\right)> 0, or some value that allows the improper integral to converge.

Power series take the form


The coefficients of x^{k}, a\left(k\right), can be integers, rational functions or even more complicated functions of k. Either way, we will visualize them as mappings of nonnegative integers a\left(k\right):\mathbb{Z}^{+}_{0}\rightarrow \mathbb{R}. For example, as in the lecture above, we see that

a\left(k\right)=1\leadsto\displaystyle\frac{1}{1-x}\,;|x|< 1

a\left(k\right)=\displaystyle\frac{1}{k!}\leadsto e^{x}\,;x\in\mathbb{R},


In the last I cheated a little, adding an x-term, but from the perspective of a\left(k\right) this is little more than an arbitrary constant. The big difference between the power series above and the Laplace transform is that \mathcal{L} is a continuous analog of the power series. To see how, write the above summation over the nonnegative integers for t\in\mathbb{R} (thus replacing the sequence of coefficients \left\{a\left(k\right)\right\}_{k=0}^{\infty} with a continuous function a\left(t\right)) as


Here we restrict the values of x to 0<x<1 to “assert convergence,” in a sense, and we rewrite the exponential x^{t} as

x^{t}=\exp\log x^{t}=\exp\left(t\log\left(x\right)\right).

Since x\in\left(0,1\right), we see that \log\left(x\right) takes on values


By letting s=-\log\left(x\right) (and thus s\in\left(0,\infty\right) by the prior results), we obtain the definition of the Laplace transform:


Widder makes a slightly different move in his derivation but achieves the same results.

At the end of chapter 13, section 1 Widder’s text he provides a lot of examples. Similar to power series, we see now that


f\left(t\right)=e^{at}\leadsto\displaystyle\frac{1}{s-a}\,;a\neq s\,;s>0,


The last result is quite interesting to me because it leads at times to important simple results that can be obtained from Fractional Calculus. For instance, one might notice that


for fractional powers. It (should) also be the case that

t^{n}f\left(t\right)\leadsto\exp\left(n\pi i\right)\displaystyle\frac{d^{n}F\left(s\right)}{{ds}^{n}},

but I’ve not yet proved it.

Posted in Mathematics, Online Lectures | Tagged , , | Leave a comment