Coding in the Sun: GNU Octave

Showing posts with label GNU Octave. Show all posts

24 July 2012

Coming to grips with the wavelet function

First off, OctConf was awesome! Meeting the other GSoC students and all the core developers was an awesome experience, and I hope I can go to more events in the future.

Now then. It is safe to say that the implementation of nurealwavelet() is in fact completely broken and may never have been meant to work. There's an infinite loop, there's an unused pointer, there's so many things wrong that I wonder how this article made it into print. On that note, I'm going to go over the entire function as it is represented in the paper here, and try to understand it. (In the following, the series is represented as a (t,ξ) series of complex values associated with a time series.)

$\bg_white \zeta _ { t, \omega } = \frac{ \sum _ { k \in K } w ( \sigma \omega ( t _ k - t ) ) e ^ { - i \omega ( t _ k - t ) } \xi _ k }{ \sum _ { k \in K } w ( \sigma \omega ( t _ k - t ) ) }$ From the context in the paper, it's clear that t has some import versus t_k; unfortunately for me, how to use t is not at all clear. I am working from the assumption that t is supposed to be the centre of each window, and as such the proper implementation is to divide the range specified in the input by the number of intervals to determine the width of each interval (and thus the radius of said window being half again the width of the window), thus being able to apply the transform over the whole data set.

The problem that I've been avoiding so far is that the window changes as the frequency drops; a lower frequency requires a much larger window to define it, but this means the number of windows tested decreases, thus a matrix for storing results doesn't make sense, as it will become progressively more full of junk data/excess unused elements. I'm going to look for other wavelet transforms as they're performed with Octave to see what I can learn from others on how to implement this.

17 July 2012

An exciting update on the fast functions

So, fastlscomplex and fastlsreal have had some problems. Lots of problems, in fact, and I wasn't really happy with my code. However, after rewriting most of fastlscomplex, I've got it correct on the first result, while further approximations appear to run afoul of compounding error. At the very least, however, this is far improved from the previous behaviour, and I'll work on getting fastlsreal rewritten.

10 July 2012

Mid-session report

I really haven't updated recently. Then again, most of my recent work has been in fixing minor errors and adding documentation. So, as it stands, none of the tests I have written work if you call test() and the info strings (which I tried to make informative) also don't work. I have no clue why; Jordi thinks it's the negation field and I'm agreeing with him for now. I'm sure I'll find a way to fix it shortly. In the meantime, I'm reading the Introduction to Wavelets article and working on my presentation.

This will all be pretty cool, I think.

30 June 2012

Re-reading (and re-reading again) the wavelet transform

So, a quick update: the lombcoeff function, which implements the Lomb unnormalized transform at one frequency, has been written and has a doc string; there was actually no problem with my code, I needed to reboot Octave; I may try compiling all of the Fortran libraries for x64 (or check if they already are, since I installed a 64-bit only system on the computer in question) and build Octave with 64-bit support; finally, I'll be writing a batch of tests this weekend based around a sum of a few sine curves, just to show how the function works (and I'll apply them across the whole suite, too. Once I've got sample data, it's pretty quick to extend.) The next step there is to write a test script using the Vostok data, and then I need to write a documentation file clearing up the source of that data (collected and documented by JR Petit et al, published in Nature, and available in tab-separated values from the NOAA's paleoclimatology site.) Currently in the /data folder is a CSV export of the .rda archives from the Mathias paper, but I think I may re-export them with a text file explaining each column, since the R export included various nonsense values around the data.

As for the nurealwavelet and fastnurealwavelet commands, I will need to study the code and the paper a few more times (I think I've read it at least five times so far) because neither section actually makes any sense to me. There is no nurealwavelet.R file, so nurealwavelet() in fastnu.c is not actually accessible from R, while the fast- version seems to have other problems. I get the feeling that I'm better off writing using only the paper as a guide when it comes to this transform, and I may not implement the fast- variation, since thus far the "fast" implementations have been orders of magnitude slower than their supposedly slower brethren (although I have a few ideas about accelerating them, mostly by changing how some math is handled and changing to switches from my current control structure. The second is easier to implement, and I will add that change before the more intricate changes I'm planning.)

In short, I'm back to work as usual, except for scratching my head over the wavelet transform. (I'll take some time now to read the paper Mathias cited, An introduction to wavelets, as it is available online and will most likely make my life easier.)

25 June 2012

Examining current results

Well, the methods all run. That's a nice thing. The not-so-nice thing is that their results are not exactly in line with what the R code suggests they should be. Specifically, in the case of lscomplex, all of the R results are compressed into the first 400 values, taking omegamax = 1, 20 octaves and 100 coefficients per octave, while the next 1600 values are all infinitesimally small frequencies which are in essence garbage values ... oh, and my code generates values 283.42 times larger than the R results. (Other oddities are cropping up in the fast methods as well, but I'd rather break it down in the ordinary methods first.)

In short, I'm slightly confused and need to work out what exactly is going wrong here. I have a feeling this will involve the rest of the day.

Just a few thoughts this morning

So, I ran mkoctfile on the fast transforms this morning; fastlscomplex works fine and cleanly (and fast, on this processor!) Unfortunately, I've left some error in fastlsreal still, and it segfaults with a SIGSEGV 11 — Address Boundary Error; I'll figure that out this morning, and I'll check the output of fastlscomplex against the reference R implementation.

UPDATE: Found the problem, fixed it. There was an accidental call to a zero address (whoops.)

UPDATE 2: There's now the excellent question of what to do about the fact that the results produced vary in some areas, occasionally by spectacular amounts. I'm going to do some analysis on the results today, including generating the cosine functions indicated and comparing them to the data.

21 June 2012

Progress? Absolutely.

Well, I'm about to commit some code that compiles without a hitch for fastlsreal, definitely a plus. Once I set up a clean environment to build the necessary R code, I'll generate the results to compare against my .oct files (as I've gotten behind on that front.) Finally, I'll write the documentation for the functions from the past two weeks, and that'll be it!

08 June 2012

Success!

I just uploaded fastlscomplex.cc in its current version, and it works. It generates what looked to me to be correct results, I'll do a few spot-tests later. So far, it doesn't implement any sanity checks, so if run with bad input, it will segfault right now; again, I'll add sanity checks this afternoon. (The other thing is, it's much slower than lscomplex, most likely because I didn't hard-code many optimisations yet. Some functions could be accelerated by changing the number of operations used, for instance. I'll look into reducing the complexity of some sections during the buffer time before OctConf, while I'm also cleaning up documentation.

In the meantime, I'm just happy to have a working function!

07 June 2012

Rewritten code and irregular segfaults

So, I decided on Monday that the best option was to completely rewrite fastlscomplex from scratch; the code took a day to write, and on Wednesday morning, it compiled just fine. One minor problem is that it causes segfaults. (Okay, more than just minor. But it runs, sometimes.) I've figured out where the segfault is, after abusing std::cout as a way to check that it's running. The segfault-causing error is somewhere in the merging code, so I'm going to remove the output I added to check on what output was generated so far, and focus my attention on the merging code so that I can determine why it's failing.

The other problem with this code currently is that it doesn't actually output results. When it does run, it generates a long vector of the correct size, all of whose elements are zero. That's ... not exactly the result I was expecting, so while I know the actual answer-generating code is not killing the process, there's obviously a problem in it somewhere. That's what I'll be focusing on this afternoon, after I find a way to keep the code from randomly going ballistic on me.

02 June 2012

Sometimes macros are more of a pain than you expect.

Right now, I have most of the fastlscomplex method structured the way I want it to be upon completion, however it still requires some tweaking to manage the shift from C to C++ — after a few applications of replace-string, most of the mkoctfile errors vanished (apparently it didn't like a macro that was used in the original code). In short, the code I'm working with still doesn't compile, but it's very close at this point, and I should have a fully operational fastlscomplex by early next week. (Even with this slowdown, I'm still almost a week ahead!)

Once I've gotten fastlscomplex running, I'll spend a little time cleaning up other functions, adding tests and other error-response components, so that I've got a lesser load later on to finish that.

31 May 2012

Towards a full oct-file

Yesterday (well, early this morning) I uploaded a fastlscomplex.cpp file, which is still in its interim state; many of the control structures are taken directly from the fastnu.c source, and I need to adapt how fastlscomplex interacts with what will be the result vector, but it's a solid start. Now to figure out why my test oct-file caused a segfault ...

EDIT I left out return octave_value_list (); at the end (it was just a test that wrote the contents of a complex row vector to stdout), now back to weightier matters.

29 May 2012

Working with Oct files and next steps

I spent yesterday reading about Oct files and examining the code in fastnu.c with an eye towards producing a working oct-file form of fastlscomplex, and I think I've found the proper course to take, in the end. The one painful part of this is that it will involve rewriting fastlscomplex as C++, not C — but, I'd like to point out, this may not be a bad thing.

The greatest driving factor in my decision is the signature on the current fastnucomplex() function:

void fastlscomplex(Real *tptr, Complex *xptr, int *nptr, double *lengthptr, int *ncoeffptr, int *noctaveptr, Real *omegamaxptr, Complex *rp)

which involves quite a few pointers; in the context of the original function, this proved to be no problem, as the R code was designed to integrate with the C in use here, and as such it used more primitive structures like Real[] and Complex[] (Real, Complex being typedef-d parts of the C'99 spec) — however, Octave reveals structures to C++ as objects, so my new code has the following objects:

const RowVector tvals = args(0).row_vector_value(); 

const ComplexVector xvals = args(1).row_vector_value();

which do not behave nicely in the context of pointers. They're much better traversed with methods designed to do so. As such, in the end, I've determined that it'll be of more use to me to rewrite fastlscomplex, and, when I get to it, fastlsreal as C++ code — but I don't expect this to introduce too much overhead into my work, since the bulk of the work is changing the relevant pointers to the proper methods, while altering a few structures into forms that I'd prefer.

I'm starting working on this today, and I'll keep up my SVN commits; I corrected a typo in my commit today, and I'll write some tests/different demo functions when I'm too burned out on the C++.

26 May 2012

Not quite the progress I'd hoped for

I had hoped to have mostly written a demonstration script on Thursday, but an error in my lsreal code (that I should have noticed sooner) took up most of my day; on the other hand, that error is caught, corrected, and I've got a few demonstrations in the works. I also wrote lscomplex, and wrote some documentation for it. As of right now, I don't know what I'll do for a demo function (in the function itself) unless I create a sample dataset to work from; I'll consider that while I work on the demonstration script.

As for this next week, I'm going to start on fastlscomplex, which may be somewhat time-consuming, particularly as the reference implementation makes broad use of single-linked lists (which I am not sure I can implement in Octave, thus I may need an alternate solution.) As always, commentary is welcome both here and over the various Octave mailing lists.

24 May 2012

A new naming schema and another function

It struck me that naming my functions exactly like the functions in the nuspectral package was a foolish idea, since mine is named lssa; as such, I'm renaming nureal to lsreal, and all other functions similarly. So, I'll handle that tomorrow and all future functions will be appropriately renamed.

In other news, ~~nureal~~ lsreal is now written and has a (mediocre) documentation string (I don't really like it, I'll edit it tomorrow/today). Finally, tomorrow I'll start writing a demonstration suite using the Vostok ice core data (now that I have a Least-Squares transform implemented.)

22 May 2012

The day's development(s)

Sorry for the pun. Really, I am. A little.

In actuality, though, an interesting thing came up in today's work; in testing the wavelet correlation coefficient function, I ran into a problem—minor, but large enough to be considered for wishlist work: the time and frequency inputs need to be scalars, and the time series data needs to be vectors. At the least, that's true with the current code strucutre; there are probably ways to get around it, but it will require either loops or completely restructuring the algorithm as it stands (unless there's something I'm not seeing.) This was the lion's share of today's work, then, testing why exactly some operations failed; as it is, internally, if the input vectors are not row vectors, the code will convert them since it's easier than duplicating other operations to handle the column-vector case. On the other hand, the code I've produced I've tested against the R functions, and they all check out!

With the documentation and tests I've written so far, I'm going to plough on forward and start on the next functions; once I've written at least one implementation of the LSSA algorithm (with nureal being the next function up) I will start writing a demo script using the Vostok data, currently stored in lssa/data, at least until I can find a final location to store the files.

Extracting usable data from RDA files and plotting

This post is just to discuss the R code I've determined works to extract data from the binary RDA files included with the nuspectral package as well as plotting commands for some of the graphs included with the article, done in R:

First, as Nir determined, there need to be some modifications to the firstlib.r file, in the nuspectral/R folder:

 .First.lib <- function(lib, pkg) {
       library.dynam('nuspectral', pkg, lib)
       }

To actually extract the data, after running a test, I just ran these commands, while in the folder above /nuspectral:

> load("./nuspectral/data/ch4.rda")
> load("./nuspectral/data/co2.rda")
> load("./nuspectral/data/o18.rda")
> load("./nuspectral/data/deut.rda")
> load("./nuspectral/data/dust.rda")
> write.csv(ch4,file="./nuspectral/data/ch4.csv")
> write.csv(co2,file="./nuspectral/data/co2.csv")
> write.csv(o18,file="./nuspectral/data/o18.csv")
> write.csv(deut,file="./nuspectral/data/deut.csv")
> write.csv(dust,file="./nuspectral/data/dust.csv")

As for plotting the data in question,

> plot(-(co2[[3]]),co2[[4]],type="l")
> plot(-(deut[[2]]),deut[[4]],type="l")

To generate the other plots, I'm waiting until I've written the necessary tools in Octave. Then I'll tackle a 2D plotting method.

21 May 2012

A short update

As of right now, I've finished writing all three functions I had planned for this week (yes, more than a little ahead of schedule; then again, I may well need that time and then some when it comes to the C functions.) Of more interest to me currently is learning more about testing mechanisms and documentation methods. I went to a cafe to read about that ... but their router was down, for some reason or another. As such, I'll be reading about that tonight, and I will probably do an SVN commit shortly just to have the code I've generated so far in the repository, if anyone wants to have a look at it. (As I noted in one of the files, I've kept the variable names from the original R functions, except where changing the variable made it more comprehensible.)

(I am somewhat distraught by having an O(2n) complexity on a few functions; if anyone has a better solution than my truth vector workaround (in nuwaveletcoeff, nucorrcoeff) please leave a note in the comments!)

Beyond that, I'll look to add the CSV versions of the Vostok data to the folder also, as I'll be building a few tests based on it, as well as example code. There does not seem to be a normal method for including data files so far, so I'm going to see what there is to be done about that, and also ask some questions.

In short, that's where I am as of 16h00, 21 may — (and to everyone in Canada, happy Victoria Day (or Journée des Patriotes for anyone in Québec)!)

Start of coding!

Well, maybe. I must admit, I already wrote and tested cubicwgt; I was so excited to start that on Friday I just put it together and tested it. (I'll do a better test today, I just whipped up a small data set off the top of my head.) This really makes my first task learning to build tests, which I'll handle once it's morning.

17 May 2012

A wishlist, of sorts

In Assembling a roadmap, I put together an order for twelve functions to implement. Here, in no particular order, is a wishlist of functions that are related to those actually in the nuspectral package, but which are not in the package and are wholly additional.

Statistical confidence interval:

nucomplex
nureal
fastnucomplex
fastnureal
nuwavelet
fastnuwavelet

(Further suggestions are welcome; there may not be time for all of them during the summer, but I'll use this list to continue development.)

11 May 2012

State of research and decisions

Recently, I fielded an email which suggested I look at Multi-Taper spectral analysis as a part of my project; I've done some reading about it and found a few useful pieces, such as an implementation in ANSI-C from 1995 by Jonathan M. Lees and Jeffrey Park, as well as an associated article hosted on Yale's servers here. However, as this is a wholly independent method, it falls outside the scope of my current project (but I'll add it to a wishlist for further development at another point.)