Coding in the Sun

24 August 2012

Just a final note

Well, the email from Melange could be less ambiguous, but I'm proud to say I completed Summer of Code. As soon as I get information on where to upload the code sample, I'll be happily sending a .tar.gz of my summer's work! This is just a quick note to say thank you, to everyone who was following my project this summer—and to GNU Octave, for making this excellent experience possible.

I will still be contributing to my code and to Octave in general; I'm going to take a little time off to enjoy the rest of the summer, but I'm going to get back to working on the wavelet functions with some more reading when I have time during the semester. I'll also happily respond to questions and updates about the code that's already present, just send them along to me.

Happy computations!

17 August 2012

Correct functions aren't everything. There's always more.

It's been a little long since my last post here; I've mostly been staring at code and thinking about phrasing recently. The best and worst part of documentation strings is making something sensible out of 80 characters—doubly hard when the name of a function is the Lomb Normalized Periodogram. That's 27 characters right there!

Admittedly, not everything has been documentation. With Carlo's help, I've replaced the guts of the lscomplex and lsreal functions with code that can take advantage of vector processing methods; the differences between the results were minimal, and the code became far more manageable to read, so it's now the main form of the functions. (Thanks, Carlo!)

Also from Carlo are two texts on wavelet transforms that I'm going to look into; I've set aside hopes of completing the wavelet functions currently—other than lswaveletcoeff and lscorrcoeff; those two work just fine, however they're limited in their capabilities, as they each only operate over a single window defined by the user. Carlo provided these references:

Abramovich, F., Bailey, T.C. & Sapatinas, T. (2000). Wavelet analysis and its statistical applications. The Statistician, Vol. 49, 1-29. http://www2.ucy.ac.cy/~fanis/Papers/statistician2000.pdf

Guy Nason, Wavelet Methods in Statistics with R, Springer, 2008.

I intend to continue working on this package after GSoC is over—there are several linearly-independent solution finding methods available, and I'd love to implement at least one in the package—and hopefully I can write at least some form of wavelet transform after studying other wavelet transforms.

As a closing note, the biggest change that I've made in the latest release (lssa-0.1.2) is adding all sorts of input checking code to all of the functions; if you give a function the wrong number of variables, it will complain—but I haven't come up with a way to check the window function for those functions that take a window function as input. So be careful with the window function.

24 July 2012

Coming to grips with the wavelet function

First off, OctConf was awesome! Meeting the other GSoC students and all the core developers was an awesome experience, and I hope I can go to more events in the future.

Now then. It is safe to say that the implementation of nurealwavelet() is in fact completely broken and may never have been meant to work. There's an infinite loop, there's an unused pointer, there's so many things wrong that I wonder how this article made it into print. On that note, I'm going to go over the entire function as it is represented in the paper here, and try to understand it. (In the following, the series is represented as a (t,ξ) series of complex values associated with a time series.)

$\bg_white \zeta _ { t, \omega } = \frac{ \sum _ { k \in K } w ( \sigma \omega ( t _ k - t ) ) e ^ { - i \omega ( t _ k - t ) } \xi _ k }{ \sum _ { k \in K } w ( \sigma \omega ( t _ k - t ) ) }$ From the context in the paper, it's clear that t has some import versus t_k; unfortunately for me, how to use t is not at all clear. I am working from the assumption that t is supposed to be the centre of each window, and as such the proper implementation is to divide the range specified in the input by the number of intervals to determine the width of each interval (and thus the radius of said window being half again the width of the window), thus being able to apply the transform over the whole data set.

The problem that I've been avoiding so far is that the window changes as the frequency drops; a lower frequency requires a much larger window to define it, but this means the number of windows tested decreases, thus a matrix for storing results doesn't make sense, as it will become progressively more full of junk data/excess unused elements. I'm going to look for other wavelet transforms as they're performed with Octave to see what I can learn from others on how to implement this.

17 July 2012

An exciting update on the fast functions

So, fastlscomplex and fastlsreal have had some problems. Lots of problems, in fact, and I wasn't really happy with my code. However, after rewriting most of fastlscomplex, I've got it correct on the first result, while further approximations appear to run afoul of compounding error. At the very least, however, this is far improved from the previous behaviour, and I'll work on getting fastlsreal rewritten.

10 July 2012

Mid-session report

I really haven't updated recently. Then again, most of my recent work has been in fixing minor errors and adding documentation. So, as it stands, none of the tests I have written work if you call test() and the info strings (which I tried to make informative) also don't work. I have no clue why; Jordi thinks it's the negation field and I'm agreeing with him for now. I'm sure I'll find a way to fix it shortly. In the meantime, I'm reading the Introduction to Wavelets article and working on my presentation.

This will all be pretty cool, I think.

30 June 2012

Re-reading (and re-reading again) the wavelet transform

So, a quick update: the lombcoeff function, which implements the Lomb unnormalized transform at one frequency, has been written and has a doc string; there was actually no problem with my code, I needed to reboot Octave; I may try compiling all of the Fortran libraries for x64 (or check if they already are, since I installed a 64-bit only system on the computer in question) and build Octave with 64-bit support; finally, I'll be writing a batch of tests this weekend based around a sum of a few sine curves, just to show how the function works (and I'll apply them across the whole suite, too. Once I've got sample data, it's pretty quick to extend.) The next step there is to write a test script using the Vostok data, and then I need to write a documentation file clearing up the source of that data (collected and documented by JR Petit et al, published in Nature, and available in tab-separated values from the NOAA's paleoclimatology site.) Currently in the /data folder is a CSV export of the .rda archives from the Mathias paper, but I think I may re-export them with a text file explaining each column, since the R export included various nonsense values around the data.

As for the nurealwavelet and fastnurealwavelet commands, I will need to study the code and the paper a few more times (I think I've read it at least five times so far) because neither section actually makes any sense to me. There is no nurealwavelet.R file, so nurealwavelet() in fastnu.c is not actually accessible from R, while the fast- version seems to have other problems. I get the feeling that I'm better off writing using only the paper as a guide when it comes to this transform, and I may not implement the fast- variation, since thus far the "fast" implementations have been orders of magnitude slower than their supposedly slower brethren (although I have a few ideas about accelerating them, mostly by changing how some math is handled and changing to switches from my current control structure. The second is easier to implement, and I will add that change before the more intricate changes I'm planning.)

In short, I'm back to work as usual, except for scratching my head over the wavelet transform. (I'll take some time now to read the paper Mathias cited, An introduction to wavelets, as it is available online and will most likely make my life easier.)

25 June 2012

Examining current results

Well, the methods all run. That's a nice thing. The not-so-nice thing is that their results are not exactly in line with what the R code suggests they should be. Specifically, in the case of lscomplex, all of the R results are compressed into the first 400 values, taking omegamax = 1, 20 octaves and 100 coefficients per octave, while the next 1600 values are all infinitesimally small frequencies which are in essence garbage values ... oh, and my code generates values 283.42 times larger than the R results. (Other oddities are cropping up in the fast methods as well, but I'd rather break it down in the ordinary methods first.)

In short, I'm slightly confused and need to work out what exactly is going wrong here. I have a feeling this will involve the rest of the day.