24 August 2012

Just a final note

Well, the email from Melange could be less ambiguous, but I'm proud to say I completed Summer of Code. As soon as I get information on where to upload the code sample, I'll be happily sending a .tar.gz of my summer's work! This is just a quick note to say thank you, to everyone who was following my project this summer—and to GNU Octave, for making this excellent experience possible.

I will still be contributing to my code and to Octave in general; I'm going to take a little time off to enjoy the rest of the summer, but I'm going to get back to working on the wavelet functions with some more reading when I have time during the semester. I'll also happily respond to questions and updates about the code that's already present, just send them along to me.

Happy computations!

17 August 2012

Correct functions aren't everything. There's always more.

It's been a little long since my last post here; I've mostly been staring at code and thinking about phrasing recently. The best and worst part of documentation strings is making something sensible out of 80 characters—doubly hard when the name of a function is the Lomb Normalized Periodogram. That's 27 characters right there!

Admittedly, not everything has been documentation. With Carlo's help, I've replaced the guts of the lscomplex and lsreal functions with code that can take advantage of vector processing methods; the differences between the results were minimal, and the code became far more manageable to read, so it's now the main form of the functions. (Thanks, Carlo!)

Also from Carlo are two texts on wavelet transforms that I'm going to look into; I've set aside hopes of completing the wavelet functions currently—other than lswaveletcoeff and lscorrcoeff; those two work just fine, however they're limited in their capabilities, as they each only operate over a single window defined by the user. Carlo provided these references:

Abramovich, F., Bailey, T.C. & Sapatinas, T. (2000). Wavelet analysis and its statistical applications. The Statistician, Vol. 49, 1-29. http://www2.ucy.ac.cy/~fanis/Papers/statistician2000.pdf

Guy Nason, Wavelet Methods in Statistics with R, Springer, 2008.
I intend to continue working on this package after GSoC is over—there are several linearly-independent solution finding methods available, and I'd love to implement at least one in the package—and hopefully I can write at least some form of wavelet transform after studying other wavelet transforms.

As a closing note, the biggest change that I've made in the latest release (lssa-0.1.2) is adding all sorts of input checking code to all of the functions; if you give a function the wrong number of variables, it will complain—but I haven't come up with a way to check the window function for those functions that take a window function as input. So be careful with the window function.

24 July 2012

Coming to grips with the wavelet function

First off, OctConf was awesome! Meeting the other GSoC students and all the core developers was an awesome experience, and I hope I can go to more events in the future.

Now then. It is safe to say that the implementation of nurealwavelet() is in fact completely broken and may never have been meant to work. There's an infinite loop, there's an unused pointer, there's so many things wrong that I wonder how this article made it into print. On that note, I'm going to go over the entire function as it is represented in the paper here, and try to understand it. (In the following, the series is represented as a (t,ξ) series of complex values associated with a time series.)

From the context in the paper, it's clear that t has some import versus tk; unfortunately for me, how to use t is not at all clear. I am working from the assumption that t is supposed to be the centre of each window, and as such the proper implementation is to divide the range specified in the input by the number of intervals to determine the width of each interval (and thus the radius of said window being half again the width of the window), thus being able to apply the transform over the whole data set.

The problem that I've been avoiding so far is that the window changes as the frequency drops; a lower frequency requires a much larger window to define it, but this means the number of windows tested decreases, thus a matrix for storing results doesn't make sense, as it will become progressively more full of junk data/excess unused elements. I'm going to look for other wavelet transforms as they're performed with Octave to see what I can learn from others on how to implement this.

17 July 2012

An exciting update on the fast functions

So, fastlscomplex and fastlsreal have had some problems. Lots of problems, in fact, and I wasn't really happy with my code. However, after rewriting most of fastlscomplex, I've got it correct on the first result, while further approximations appear to run afoul of compounding error. At the very least, however, this is far improved from the previous behaviour, and I'll work on getting fastlsreal rewritten.

10 July 2012

Mid-session report

I really haven't updated recently. Then again, most of my recent work has been in fixing minor errors and adding documentation. So, as it stands, none of the tests I have written work if you call test() and the info strings (which I tried to make informative) also don't work. I have no clue why; Jordi thinks it's the negation field and I'm agreeing with him for now. I'm sure I'll find a way to fix it shortly. In the meantime, I'm reading the Introduction to Wavelets article and working on my presentation.

This will all be pretty cool, I think.

30 June 2012

Re-reading (and re-reading again) the wavelet transform

So, a quick update: the lombcoeff function, which implements the Lomb unnormalized transform at one frequency, has been written and has a doc string; there was actually no problem with my code, I needed to reboot Octave; I may try compiling all of the Fortran libraries for x64 (or check if they already are, since I installed a 64-bit only system on the computer in question) and build Octave with 64-bit support; finally, I'll be writing a batch of tests this weekend based around a sum of a few sine curves, just to show how the function works (and I'll apply them across the whole suite, too. Once I've got sample data, it's pretty quick to extend.) The next step there is to write a test script using the Vostok data, and then I need to write a documentation file clearing up the source of that data (collected and documented by JR Petit et al, published in Nature, and available in tab-separated values from the NOAA's paleoclimatology site.) Currently in the /data folder is a CSV export of the .rda archives from the Mathias paper, but I think I may re-export them with a text file explaining each column, since the R export included various nonsense values around the data.

As for the nurealwavelet and fastnurealwavelet commands, I will need to study the code and the paper a few more times (I think I've read it at least five times so far) because neither section actually makes any sense to me. There is no nurealwavelet.R file, so nurealwavelet() in fastnu.c is not actually accessible from R, while the fast- version seems to have other problems. I get the feeling that I'm better off writing using only the paper as a guide when it comes to this transform, and I may not implement the fast- variation, since thus far the "fast" implementations have been orders of magnitude slower than their supposedly slower brethren (although I have a few ideas about accelerating them, mostly by changing how some math is handled and changing to switches from my current control structure. The second is easier to implement, and I will add that change before the more intricate changes I'm planning.)

In short, I'm back to work as usual, except for scratching my head over the wavelet transform. (I'll take some time now to read the paper Mathias cited, An introduction to wavelets, as it is available online and will most likely make my life easier.)

25 June 2012

Examining current results

Well, the methods all run. That's a nice thing. The not-so-nice thing is that their results are not exactly in line with what the R code suggests they should be. Specifically, in the case of lscomplex, all of the R results are compressed into the first 400 values, taking omegamax = 1, 20 octaves and 100 coefficients per octave, while the next 1600 values are all infinitesimally small frequencies which are in essence garbage values ... oh, and my code generates values 283.42 times larger than the R results. (Other oddities are cropping up in the fast methods as well, but I'd rather break it down in the ordinary methods first.)

In short, I'm slightly confused and need to work out what exactly is going wrong here. I have a feeling this will involve the rest of the day.

Just a few thoughts this morning

So, I ran mkoctfile on the fast transforms this morning; fastlscomplex works fine and cleanly (and fast, on this processor!) Unfortunately, I've left some error in fastlsreal still, and it segfaults with a SIGSEGV 11 — Address Boundary Error; I'll figure that out this morning, and I'll check the output of fastlscomplex against the reference R implementation.

UPDATE: Found the problem, fixed it. There was an accidental call to a zero address (whoops.)

UPDATE 2: There's now the excellent question of what to do about the fact that the results produced vary in some areas, occasionally by spectacular amounts. I'm going to do some analysis on the results today, including generating the cosine functions indicated and comparing them to the data.

21 June 2012

Progress? Absolutely.

Well, I'm about to commit some code that compiles without a hitch for fastlsreal, definitely a plus. Once I set up a clean environment to build the necessary R code, I'll generate the results to compare against my .oct files (as I've gotten behind on that front.) Finally, I'll write the documentation for the functions from the past two weeks, and that'll be it!

17 June 2012

Sorry about the radio silence

So, on Wednesday I had a bunch of stuff stolen out of my locker at the gym, including my laptop and various other important things. This halted my work for a little while as I reassembled my life, but I have a computer again and am currently running ./configure to build Octave; I should be completely set up to get to work again later today.

In code-related news, everything I've written up to this point (except for edits to the lombnormcoeff.m file) was stored in the octave-forge svn, so nothing was lost (other than time, money, a sense of privacy, and my general peace of mind) when the thief got at my stuff. That's the upside of offsite storage! To my mentors, I'll be back at work tomorrow, and once I get everything really rebuilt I'll have more code to submit (I'll test fastlscomplex tomorrow and then write fastlsreal as it's just a few additional steps on top of fastlscomplex.)

08 June 2012

Success!

I just uploaded fastlscomplex.cc in its current version, and it works. It generates what looked to me to be correct results, I'll do a few spot-tests later. So far, it doesn't implement any sanity checks, so if run with bad input, it will segfault right now; again, I'll add sanity checks this afternoon. (The other thing is, it's much slower than lscomplex, most likely because I didn't hard-code many optimisations yet. Some functions could be accelerated by changing the number of operations used, for instance. I'll look into reducing the complexity of some sections during the buffer time before OctConf, while I'm also cleaning up documentation.

In the meantime, I'm just happy to have a working function!

07 June 2012

Rewritten code and irregular segfaults

So, I decided on Monday that the best option was to completely rewrite fastlscomplex from scratch; the code took a day to write, and on Wednesday morning, it compiled just fine. One minor problem is that it causes segfaults. (Okay, more than just minor. But it runs, sometimes.) I've figured out where the segfault is, after abusing std::cout as a way to check that it's running. The segfault-causing error is somewhere in the merging code, so I'm going to remove the output I added to check on what output was generated so far, and focus my attention on the merging code so that I can determine why it's failing.

The other problem with this code currently is that it doesn't actually output results. When it does run, it generates a long vector of the correct size, all of whose elements are zero. That's ... not exactly the result I was expecting, so while I know the actual answer-generating code is not killing the process, there's obviously a problem in it somewhere. That's what I'll be focusing on this afternoon, after I find a way to keep the code from randomly going ballistic on me.

02 June 2012

Sometimes macros are more of a pain than you expect.

Right now, I have most of the fastlscomplex method structured the way I want it to be upon completion, however it still requires some tweaking to manage the shift from C to C++ — after a few applications of replace-string, most of the mkoctfile errors vanished (apparently it didn't like a macro that was used in the original code). In short, the code I'm working with still doesn't compile, but it's very close at this point, and I should have a fully operational fastlscomplex by early next week. (Even with this slowdown, I'm still almost a week ahead!)

Once I've gotten fastlscomplex running, I'll spend a little time cleaning up other functions, adding tests and other error-response components, so that I've got a lesser load later on to finish that.

31 May 2012

Towards a full oct-file

Yesterday (well, early this morning) I uploaded a fastlscomplex.cpp file, which is still in its interim state; many of the control structures are taken directly from the fastnu.c source, and I need to adapt how fastlscomplex interacts with what will be the result vector, but it's a solid start. Now to figure out why my test oct-file caused a segfault ...

EDIT I left out return octave_value_list (); at the end (it was just a test that wrote the contents of a complex row vector to stdout), now back to weightier matters.

29 May 2012

Working with Oct files and next steps

I spent yesterday reading about Oct files and examining the code in fastnu.c with an eye towards producing a working oct-file form of fastlscomplex, and I think I've found the proper course to take, in the end. The one painful part of this is that it will involve rewriting fastlscomplex as C++, not C — but, I'd like to point out, this may not be a bad thing.

The greatest driving factor in my decision is the signature on the current fastnucomplex() function:

void fastlscomplex(Real *tptr, Complex *xptr, int *nptr, double *lengthptr, int *ncoeffptr, int *noctaveptr, Real *omegamaxptr, Complex *rp)
which involves quite a few pointers; in the context of the original function, this proved to be no problem, as the R code was designed to integrate with the C in use here, and as such it used more primitive structures like Real[] and Complex[] (Real, Complex being typedef-d parts of the C'99 spec) — however, Octave reveals structures to C++ as objects, so my new code has the following objects:
const RowVector tvals = args(0).row_vector_value(); 
const ComplexVector xvals = args(1).row_vector_value();
which do not behave nicely in the context of pointers. They're much better traversed with methods designed to do so. As such, in the end, I've determined that it'll be of more use to me to rewrite fastlscomplex, and, when I get to it, fastlsreal as C++ code — but I don't expect this to introduce too much overhead into my work, since the bulk of the work is changing the relevant pointers to the proper methods, while altering a few structures into forms that I'd prefer.

I'm starting working on this today, and I'll keep up my SVN commits; I corrected a typo in my commit today, and I'll write some tests/different demo functions when I'm too burned out on the C++.

26 May 2012

Not quite the progress I'd hoped for

I had hoped to have mostly written a demonstration script on Thursday, but an error in my lsreal code (that I should have noticed sooner) took up most of my day; on the other hand, that error is caught, corrected, and I've got a few demonstrations in the works. I also wrote lscomplex, and wrote some documentation for it. As of right now, I don't know what I'll do for a demo function (in the function itself) unless I create a sample dataset to work from; I'll consider that while I work on the demonstration script.

As for this next week, I'm going to start on fastlscomplex, which may be somewhat time-consuming, particularly as the reference implementation makes broad use of single-linked lists (which I am not sure I can implement in Octave, thus I may need an alternate solution.) As always, commentary is welcome both here and over the various Octave mailing lists.

24 May 2012

A new naming schema and another function

It struck me that naming my functions exactly like the functions in the nuspectral package was a foolish idea, since mine is named lssa; as such, I'm renaming nureal to lsreal, and all other functions similarly. So, I'll handle that tomorrow and all future functions will be appropriately renamed.

In other news, nureal lsreal is now written and has a (mediocre) documentation string (I don't really like it, I'll edit it tomorrow/today). Finally, tomorrow I'll start writing a demonstration suite using the Vostok ice core data (now that I have a Least-Squares transform implemented.)

22 May 2012

The day's development(s)

Sorry for the pun. Really, I am. A little.

In actuality, though, an interesting thing came up in today's work; in testing the wavelet correlation coefficient function, I ran into a problem—minor, but large enough to be considered for wishlist work: the time and frequency inputs need to be scalars, and the time series data needs to be vectors. At the least, that's true with the current code strucutre; there are probably ways to get around it, but it will require either loops or completely restructuring the algorithm as it stands (unless there's something I'm not seeing.) This was the lion's share of today's work, then, testing why exactly some operations failed; as it is, internally, if the input vectors are not row vectors, the code will convert them since it's easier than duplicating other operations to handle the column-vector case. On the other hand, the code I've produced I've tested against the R functions, and they all check out!

With the documentation and tests I've written so far, I'm going to plough on forward and start on the next functions; once I've written at least one implementation of the LSSA algorithm (with nureal being the next function up) I will start writing a demo script using the Vostok data, currently stored in lssa/data, at least until I can find a final location to store the files.

Extracting usable data from RDA files and plotting

This post is just to discuss the R code I've determined works to extract data from the binary RDA files included with the nuspectral package as well as plotting commands for some of the graphs included with the article, done in R:

First, as Nir determined, there need to be some modifications to the firstlib.r file, in the nuspectral/R folder:

 .First.lib <- function(lib, pkg) {
       library.dynam('nuspectral', pkg, lib)
       }

To actually extract the data, after running a test, I just ran these commands, while in the folder above /nuspectral:

> load("./nuspectral/data/ch4.rda")
> load("./nuspectral/data/co2.rda")
> load("./nuspectral/data/o18.rda")
> load("./nuspectral/data/deut.rda")
> load("./nuspectral/data/dust.rda")
> write.csv(ch4,file="./nuspectral/data/ch4.csv")
> write.csv(co2,file="./nuspectral/data/co2.csv")
> write.csv(o18,file="./nuspectral/data/o18.csv")
> write.csv(deut,file="./nuspectral/data/deut.csv")
> write.csv(dust,file="./nuspectral/data/dust.csv")

As for plotting the data in question,
> plot(-(co2[[3]]),co2[[4]],type="l")
> plot(-(deut[[2]]),deut[[4]],type="l")
To generate the other plots, I'm waiting until I've written the necessary tools in Octave. Then I'll tackle a 2D plotting method.

21 May 2012

A short update


As of right now, I've finished writing all three functions I had planned for this week (yes, more than a little ahead of schedule; then again, I may well need that time and then some when it comes to the C functions.) Of more interest to me currently is learning more about testing mechanisms and documentation methods. I went to a cafe to read about that ... but their router was down, for some reason or another. As such, I'll be reading about that tonight, and I will probably do an SVN commit shortly just to have the code I've generated so far in the repository, if anyone wants to have a look at it. (As I noted in one of the files, I've kept the variable names from the original R functions, except where changing the variable made it more comprehensible.)

(I am somewhat distraught by having an O(2n) complexity on a few functions; if anyone has a better solution than my truth vector workaround (in nuwaveletcoeff, nucorrcoeff) please leave a note in the comments!)

Beyond that, I'll look to add the CSV versions of the Vostok data to the folder also, as I'll be building a few tests based on it, as well as example code. There does not seem to be a normal method for including data files so far, so I'm going to see what there is to be done about that, and also ask some questions.

In short, that's where I am as of 16h00, 21 may — (and to everyone in Canada, happy Victoria Day (or Journée des Patriotes for anyone in Québec)!)

Start of coding!

Well, maybe. I must admit, I already wrote and tested cubicwgt; I was so excited to start that on Friday I just put it together and tested it. (I'll do a better test today, I just whipped up a small data set off the top of my head.) This really makes my first task learning to build tests, which I'll handle once it's morning.

17 May 2012

A wishlist, of sorts

In Assembling a roadmap, I put together an order for twelve functions to implement. Here, in no particular order, is a wishlist of functions that are related to those actually in the nuspectral package, but which are not in the package and are wholly additional.

  • Statistical confidence interval:
    • nucomplex
    • nureal
    • fastnucomplex
    • fastnureal
    • nuwavelet
    • fastnuwavelet
(Further suggestions are welcome; there may not be time for all of them during the summer, but I'll use this list to continue development.)

Week-by-week timeline

So, after having had a fairly crazy week, here's my expected week-by-week timeline; note that documentation and testing will be ongoing, even though there is a buffer zone specifically for that. That buffer time is also in case something takes longer than I expect, or other problems/discoveries crop up. In short, the buffer zone is to catch problems and make sure I've got enough room for everything, but if I'm up to date I will stop and document, write tests, and do background work for some wishlist tasks (see the post I'll put up after this for wishlist details.)

  • 21 may: Coding begins
    • 21 may - 25 may: cubicwgt, nucorrcoeff, nuwaveletcoeff
    • 28 may - 1 jun: nureal, nucomplex, start fastnucomplex
    • 4 jun - 8 jun: fastnucomplex
    • 11 jun - 15 jun: fastnucomplex, lombnormcoeff
    • 18 jun - 22 jun: fastnureal
    • 25 jun - 29 jun: nuwavelet, lombcoeff
    • 2  jul - 6 jul: fastnuwavelet
  • 9 jul: Midterm evaluation submission window opens
    • 9 jul - 13 jul: Buffer zone for catching up on documentation, writing OctConf presentation.
  • 13 jul: Midterm evaluation submission window closes
    • 16 jul - 20 jul: [OctConf] — Talk presented at some point
    • 22 jul - 27 jul: nurealcoeff, package and release prototype package. Begin receiving community comments and making community edits.
    • 30 jul - 10 aug: Hopefully, wishlist time. Research on weekends.
  • 13 aug: Suggested "pencils down" date
    • 13 aug - 20 aug: code revision
  • 20 aug: Firm "pencils down" date; final evaluation submission window opens
  • 24 aug: Final evaluation submission window closes
(Other after-coding stuff happens.)

Note that the order has been preserved from Assembling a Roadmap, as that order seemed perfectly rational to me. Also, depending on what my further code reading turns up, nuwavelet may not be done (as I noted in Roadmap) since there is no R function that would appear to use it.

11 May 2012

State of research and decisions

Recently, I fielded an email which suggested I look at Multi-Taper spectral analysis as a part of my project; I've done some reading about it and found a few useful pieces, such as an implementation in ANSI-C from 1995 by Jonathan M. Lees and Jeffrey Park, as well as an associated article hosted on Yale's servers here. However, as this is a wholly independent method, it falls outside the scope of my current project (but I'll add it to a wishlist for further development at another point.)

08 May 2012

Assembling a roadmap

So, the first task in creating a roadmap was considering dependencies within the functions I'll be implementing; another consideration was the current form of the code—is it R or C? As such, I've established this set of dependencies:

R-only code:

  • cubicwgt
    • nucorrcoeff
    • nuwaveletcoeff
  • lombnormcoeff
  • lombcoeff
  • nurealcoeff
C implemented code:

  • nureal
  • nucomplex
  • fastnucomplex
    • fastnureal
  • nuwavelet
  • fastnuwavelet
Now, nureal, nucomplex, and nuwavelet are all fairly short functions, for which probably far less effort will be necessary in order to implement them compared to their fast versions; all of the R-only functions are even shorter; as such, I'm proposing to complete these functions in the following order:

  1. cubicwgt
  2. nucorrcoeff
  3. nuwaveletcoeff
  4. nureal
  5. nucomplex
  6. fastnucomplex
  7. lombnormcoeff
  8. fastnureal
  9. nuwavelet
  10. lombcoeff
  11. fastnuwavelet
  12. nurealcoeff
This distribution of tasks is slightly tail-weighted, as it features three R-only functions right off the bat, but  the structure of all three fast functions is fairly similar at the core (in fact, I will examine them a little more closely to determine if they would be better implemented with shared code, since they all have a similar divide-and-conquer structure.) This leads me to believe that it is not outlandish to consider being somewhere around 7 or 8 by the time midterms come around, with all previous functions completed and tested against their reference implementations, as well as documented.

As a postscript, there is no R function wrapper for nuwavelet; it exists only in the C code, and is not referred to elsewhere. I'll look into that a bit more closely before starting on it.

06 May 2012

Introductory Post

So, hi, I'm Ben Lewis, and this is the blog I'll be using to track my work for Google Summer of Code 2012. In particular, I'm writing a package that implements Least-squares spectral analysis for Octave-Forge, and I'm basing the package off of Adolf Mathias' R and C implementation; the full project page is available on Google Melange.

As of now, I've written up a roadmap of the methods to be implemented, I've read (and re-read, and re-re-read) the article involved and I've commented approximately a third of the associated C source, as well as tested the R code included (I still need to find a way to produce a version of the wavelet graphs in R, which according to the authors were not produced with it) and I'll post the sample code to produce identical results in further posts.

In terms of next steps over the next day or two, I'll be creating an lssa folder in the Octave-Forge svn tree, and I'll get a few files in place just to document where the project will be leading once coding starts. I will also take some time to follow up on the excellent recommendations from Michael Godfrey, and read the articles he suggested.