
24 July 2012

Coming to grips with the wavelet function

First off, OctConf was awesome! Meeting the other GSoC students and all the core developers was a great experience, and I hope I can go to more such events in the future.

Now then. It is safe to say that the implementation of nurealwavelet() is in fact completely broken and may never have been meant to work. There's an infinite loop, there's an unused pointer; there are so many things wrong that I wonder how this article made it into print. On that note, I'm going to go over the entire function as it is presented in the paper and try to understand it. (In the following, the series is represented as a (t,ξ) series of complex values associated with a time series.)

From the context in the paper, it's clear that t carries some meaning distinct from tk; unfortunately for me, how to use t is not at all clear. I am working from the assumption that t is supposed to be the centre of each window. Under that reading, the proper implementation is to divide the range specified in the input by the number of intervals to determine the width of each interval (the radius of each window then being half again the window's width), which makes it possible to apply the transform over the whole data set.
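
To pin that reading down for myself, here is a minimal C++ sketch of the windowing I have in mind; everything in it (the names, and treating the radius as a multiple of the interval width) is my own assumption from the paragraph above, not something fixed by the paper.

#include <vector>

// Sketch: split the requested range into n_intervals windows, recording the
// centre and radius of each.  radius_factor expresses the radius as a
// multiple of the interval width; the "half again the width" reading above
// would make it 1.5, but that is exactly the part I'm unsure of.
struct Window { double centre; double radius; };

std::vector<Window>
make_windows(double t_min, double t_max, int n_intervals, double radius_factor)
{
  std::vector<Window> windows;
  const double width = (t_max - t_min) / n_intervals;
  for (int k = 0; k < n_intervals; k++)
    {
      Window w;
      w.centre = t_min + (k + 0.5) * width;
      w.radius = radius_factor * width;
      windows.push_back(w);
    }
  return windows;
}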

The problem I've been avoiding so far is that the window changes as the frequency drops: a lower frequency requires a much larger window to define it, which means the number of windows tested decreases, so a matrix for storing results doesn't make sense; it would become progressively fuller of junk data and unused elements. I'm going to look at how other wavelet transforms are implemented for Octave to see what I can learn from others about how to implement this.
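
For my own notes, this is the sort of ragged storage I'm leaning towards instead of a matrix; the names (and the stand-in n_windows_for hook) are hypothetical, not anything from the paper or the package.

#include <complex>
#include <vector>

// Sketch: one row of coefficients per analysed frequency, each row sized to
// however many windows actually fit at that frequency.  Lower frequencies
// get wider windows and therefore shorter rows, so nothing needs padding.
typedef std::vector<std::complex<double> > FrequencyRow;

std::vector<FrequencyRow>
allocate_results(int n_frequencies, int (*n_windows_for)(int frequency_index))
{
  std::vector<FrequencyRow> results;
  for (int f = 0; f < n_frequencies; f++)
    results.push_back(FrequencyRow(n_windows_for(f)));
  return results;
}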

17 July 2012

An exciting update on the fast functions

So, fastlscomplex and fastlsreal have had some problems. Lots of problems, in fact, and I wasn't really happy with my code. However, after rewriting most of fastlscomplex, I've got it correct on the first result, while further approximations appear to run afoul of compounding error. At the very least, however, this is far improved from the previous behaviour, and I'll work on getting fastlsreal rewritten.

10 July 2012

Mid-session report

I really haven't updated recently. Then again, most of my recent work has been fixing minor errors and adding documentation. As it stands, none of the tests I have written work when called through test(), and the info strings (which I tried to make informative) don't work either. I have no clue why; Jordi thinks it's the negation field, and I'm agreeing with him for now. I'm sure I'll find a way to fix it shortly. In the meantime, I'm reading the Introduction to Wavelets article and working on my presentation.

This will all be pretty cool, I think.

30 June 2012

Re-reading (and re-reading again) the wavelet transform

So, a quick update. The lombcoeff function, which implements the unnormalized Lomb transform at a single frequency, has been written and has a doc string (the formula involved is sketched below). It turns out there was actually no problem with my code; I just needed to restart Octave. I may try compiling all of the Fortran libraries for x64 (or check whether they already are, since I installed a 64-bit-only system on the computer in question) and build Octave with 64-bit support. Finally, I'll be writing a batch of tests this weekend based around a sum of a few sine curves, just to show how the function works (and I'll apply them across the whole suite, too; once I've got sample data, it's quick to extend). The next step is to write a test script using the Vostok data, and then a documentation file clarifying the source of that data (collected and documented by J. R. Petit et al., published in Nature, and available as tab-separated values from the NOAA's paleoclimatology site). Currently the /data folder holds a CSV export of the .rda archives from the Mathias paper, but I may re-export them with a text file explaining each column, since the R export included various nonsense values around the data.
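
For reference, the quantity lombcoeff is after is the standard unnormalized Lomb periodogram value at a single angular frequency; the C++ sketch below is just my restatement of that textbook formula (variable names are mine), not a copy of what ends up in the package.

#include <cmath>
#include <vector>

// Unnormalized Lomb coefficient of the unevenly sampled series (t, x) at
// angular frequency omega, using the usual phase offset tau that decouples
// the sine and cosine sums.
double
lomb_coefficient(const std::vector<double>& t, const std::vector<double>& x,
                 double omega)
{
  double s2 = 0.0, c2 = 0.0;
  for (size_t j = 0; j < t.size(); j++)
    {
      s2 += std::sin(2.0 * omega * t[j]);
      c2 += std::cos(2.0 * omega * t[j]);
    }
  const double tau = std::atan2(s2, c2) / (2.0 * omega);

  double xc = 0.0, xs = 0.0, cc = 0.0, ss = 0.0;
  for (size_t j = 0; j < t.size(); j++)
    {
      const double c = std::cos(omega * (t[j] - tau));
      const double s = std::sin(omega * (t[j] - tau));
      xc += x[j] * c;
      xs += x[j] * s;
      cc += c * c;
      ss += s * s;
    }
  return xc * xc / cc + xs * xs / ss;
}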

As for the nurealwavelet and fastnurealwavelet commands, I will need to study the code and the paper a few more times (I think I've read it at least five times so far), because neither section actually makes any sense to me. There is no nurealwavelet.R file, so nurealwavelet() in fastnu.c is not actually accessible from R, while the fast- version seems to have other problems. I get the feeling that I'm better off working from the paper alone for this transform, and I may not implement the fast- variation, since so far the "fast" implementations have been orders of magnitude slower than their supposedly slower brethren. (I do have a few ideas about accelerating them, mostly by changing how some of the maths is handled and by switching from my current control structure to switch statements; the second is easier to implement, and I'll make that change before the more intricate ones I'm planning.)

In short, I'm back to work as usual, except for scratching my head over the wavelet transform. (I'll take some time now to read the paper Mathias cited, An introduction to wavelets, as it is available online and will most likely make my life easier.)

25 June 2012

Examining current results

Well, the methods all run. That's a nice thing. The not-so-nice thing is that their results are not exactly in line with what the R code suggests they should be. Specifically, in the case of lscomplex with omegamax = 1, 20 octaves and 100 coefficients per octave, all of the meaningful R results are compressed into the first 400 values, while the remaining 1600 correspond to infinitesimally small frequencies and are in essence garbage ... oh, and my code generates values 283.42 times larger than the R results. (Other oddities are cropping up in the fast methods as well, but I'd rather track them down in the ordinary methods first.)
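
For context on those numbers: 20 octaves times 100 coefficients per octave accounts for the 2000 values, and my working assumption (it is only an assumption about the intended layout) is a logarithmic grid descending from omegamax, roughly like this:

#include <cmath>
#include <vector>

// Hypothetical frequency grid: ncoeff frequencies per octave, noctaves
// octaves descending from omegamax, spaced logarithmically.  With
// omegamax = 1, 20 octaves and 100 coefficients per octave this gives
// 2000 frequencies, the lowest of them around 2^-20.
std::vector<double>
frequency_grid(double omegamax, int noctaves, int ncoeff)
{
  std::vector<double> omega;
  for (int k = 0; k < noctaves * ncoeff; k++)
    omega.push_back(omegamax * std::pow(2.0, -double(k) / ncoeff));
  return omega;
}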

In short, I'm slightly confused and need to work out what exactly is going wrong here. I have a feeling this will involve the rest of the day.

Just a few thoughts this morning

So, I ran mkoctfile on the fast transforms this morning; fastlscomplex builds cleanly and works (and runs fast, on this processor!). Unfortunately, some error remains in fastlsreal, and it segfaults (SIGSEGV, signal 11: address boundary error); I'll figure that out this morning, and I'll check the output of fastlscomplex against the reference R implementation.

UPDATE: Found the problem, fixed it. There was an accidental call to a zero address (whoops.)

UPDATE 2: There's now the excellent question of what to do about the fact that the results produced vary in some areas, occasionally by spectacular amounts. I'm going to do some analysis on the results today, including generating the cosine functions indicated and comparing them to the data.

21 June 2012

Progress? Absolutely.

Well, I'm about to commit some code that compiles without a hitch for fastlsreal, definitely a plus. Once I set up a clean environment to build the necessary R code, I'll generate the results to compare against my .oct files (as I've gotten behind on that front.) Finally, I'll write the documentation for the functions from the past two weeks, and that'll be it!

08 June 2012

Success!

I just uploaded fastlscomplex.cc in its current version, and it works. It generates what look to me to be correct results; I'll do a few spot-tests later. So far it doesn't implement any sanity checks, so if run with bad input it will segfault; I'll add those this afternoon. (The other thing is that it's much slower than lscomplex, most likely because I haven't hard-coded many optimisations yet. Some functions could be accelerated by changing the number of operations used, for instance. I'll look into reducing the complexity of some sections during the buffer time before OctConf, while I'm also cleaning up documentation.)

In the meantime, I'm just happy to have a working function!

07 June 2012

Rewritten code and irregular segfaults

So, I decided on Monday that the best option was to completely rewrite fastlscomplex from scratch; the code took a day to write, and on Wednesday morning it compiled just fine. One minor problem is that it causes segfaults. (Okay, more than just minor. But it runs, sometimes.) After abusing std::cout to check how far execution gets, I've figured out where the segfault is: the error is somewhere in the merging code. So I'm going to remove the debugging output I added and focus my attention on the merging code to determine why it's failing.

The other problem with this code currently is that it doesn't actually output results. When it does run, it generates a long vector of the correct size, all of whose elements are zero. That's ... not exactly the result I was expecting, so while I know the actual answer-generating code is not killing the process, there's obviously a problem in it somewhere. That's what I'll be focusing on this afternoon, after I find a way to keep the code from randomly going ballistic on me.

02 June 2012

Sometimes macros are more of a pain than you expect.

Right now, I have most of the fastlscomplex method structured the way I want it to be upon completion; however, it still requires some tweaking to manage the shift from C to C++. After a few applications of replace-string, most of the mkoctfile errors vanished (apparently it didn't like a macro that was used in the original code). In short, the code I'm working with still doesn't compile, but it's very close at this point, and I should have a fully operational fastlscomplex by early next week. (Even with this slowdown, I'm still almost a week ahead!)

Once I've gotten fastlscomplex running, I'll spend a little time cleaning up the other functions, adding tests and other error handling, so that there's less left to finish later on.

31 May 2012

Towards a full oct-file

Yesterday (well, early this morning) I uploaded a fastlscomplex.cpp file, which is still in its interim state; many of the control structures are taken directly from the fastnu.c source, and I need to adapt how fastlscomplex interacts with what will be the result vector, but it's a solid start. Now to figure out why my test oct-file caused a segfault ...

EDIT: I had left out return octave_value_list (); at the end (it was just a test that wrote the contents of a complex row vector to stdout). Now back to weightier matters.
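
For anyone else tripping over the same thing: the body of a DEFUN_DLD has to end by returning an octave_value_list. A minimal, purely illustrative oct-file (the name lssa_demo is made up for the example) looks roughly like this:

#include <octave/oct.h>

// Print the complex row vector passed in and return an empty result list.
// Omitting the final return is exactly the mistake described above, and
// leaves the behaviour undefined.
DEFUN_DLD (lssa_demo, args, , "Print a complex row vector.")
{
  const ComplexRowVector x = args(0).complex_row_vector_value();
  for (octave_idx_type i = 0; i < x.numel(); i++)
    octave_stdout << x(i) << "\n";
  return octave_value_list();
}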

29 May 2012

Working with Oct files and next steps

I spent yesterday reading about Oct files and examining the code in fastnu.c with an eye towards producing a working oct-file form of fastlscomplex, and I think I've found the proper course to take, in the end. The one painful part of this is that it will involve rewriting fastlscomplex as C++, not C — but, I'd like to point out, this may not be a bad thing.

The greatest driving factor in my decision is the signature on the current fastnucomplex() function:

void fastlscomplex(Real *tptr, Complex *xptr, int *nptr, double *lengthptr, int *ncoeffptr, int *noctaveptr, Real *omegamaxptr, Complex *rp)
which involves quite a few pointers. In the context of the original function this was no problem, since the R code was designed to integrate with this C code and therefore handed it primitive structures like Real[] and Complex[] (Real and Complex being typedefs over C99's floating-point and complex types). Octave, however, exposes its structures to C++ as objects, so my new code has the following:
const RowVector tvals = args(0).row_vector_value();
const ComplexRowVector xvals = args(1).complex_row_vector_value();
which do not behave nicely when treated as raw pointers; they're much better traversed with the methods designed for them. As such, I've determined that it'll be of more use to rewrite fastlscomplex, and, when I get to it, fastlsreal, as C++ code. I don't expect this to introduce too much overhead into my work, since the bulk of it is replacing the relevant pointer arithmetic with the proper methods, while reshaping a few structures into forms I'd prefer.
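
To make the contrast concrete, here's roughly what the traversal looks like on the C++ side; this is a throwaway sketch (the weighted sum is only a stand-in computation), not a piece of fastlscomplex.

#include <octave/oct.h>

// Walk the Octave objects by element access instead of raw pointer
// arithmetic; the accumulated sum is just a placeholder for the real work.
Complex
weighted_sum(const RowVector& tvals, const ComplexRowVector& xvals)
{
  Complex acc = 0.0;
  for (octave_idx_type i = 0; i < tvals.numel(); i++)
    acc += tvals(i) * xvals(i);
  // If an inner routine really insists on a raw pointer, tvals.data()
  // exposes read-only storage, but element access is usually clearer.
  return acc;
}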

I'm starting work on this today, and I'll keep up my SVN commits; I corrected a typo in my commit today, and I'll write some tests and different demo functions whenever I'm too burned out on the C++.

26 May 2012

Not quite the progress I'd hoped for

I had hoped to have mostly written a demonstration script on Thursday, but an error in my lsreal code (one I should have noticed sooner) took up most of my day; on the other hand, that error is now caught and corrected, and I've got a few demonstrations in the works. I also wrote lscomplex, along with some documentation for it. As of right now, I don't know what I'll do for a demo function (in the function itself) unless I create a sample dataset to work from; I'll consider that while I work on the demonstration script.

As for this next week, I'm going to start on fastlscomplex, which may be somewhat time-consuming, particularly as the reference implementation makes broad use of singly linked lists (which I am not sure I can implement in Octave, so I may need an alternative solution; a small sketch of one fallback follows below). As always, commentary is welcome both here and on the various Octave mailing lists.
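
If it comes down to it on the C++ side, a singly linked list is easy enough to hand-roll (or std::list can stand in); the sketch below is only an illustration of that fallback, with made-up names, not code from the reference implementation or the package.

#include <complex>

// Minimal singly linked node, in case the reference code's list-based
// bookkeeping needs to be mirrored in C++.
struct SumNode
{
  std::complex<double> value;
  SumNode *next;
};

// Push a new value onto the front of the list and return the new head.
SumNode *
prepend(SumNode *head, const std::complex<double>& value)
{
  SumNode *node = new SumNode;
  node->value = value;
  node->next = head;
  return node;
}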

24 May 2012

A new naming schema and another function

It struck me that naming my functions exactly like the functions in the nuspectral package was a foolish idea, since my package is named lssa; as such, I'm renaming nureal to lsreal, and all other functions similarly. I'll handle that tomorrow, and all future functions will be named accordingly.

In other news, lsreal (formerly nureal) is now written and has a (mediocre) documentation string (I don't really like it; I'll edit it tomorrow/today). Finally, tomorrow I'll start writing a demonstration suite using the Vostok ice core data (now that I have a least-squares transform implemented).