31 May 2012

Towards a full oct-file

Yesterday (well, early this morning) I uploaded a fastlscomplex.cpp file, which is still in its interim state; many of the control structures are taken directly from the fastnu.c source, and I need to adapt how fastlscomplex interacts with what will be the result vector, but it's a solid start. Now to figure out why my test oct-file caused a segfault ...

EDIT: I had left out return octave_value_list (); at the end (the file was just a test that wrote the contents of a complex row vector to stdout). Now back to weightier matters.
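For reference, here is a minimal sketch of an oct-file of that shape (the function name is hypothetical, and this is not the actual test file). In C++, falling off the end of a function declared to return a value is undefined behavior, hence the segfault:

```cpp
#include <octave/oct.h>

// Sketch only: a test oct-file that prints a complex row vector to stdout.
DEFUN_DLD (print_cvec, args, nargout, "Print a complex row vector to stdout.")
{
  const ComplexRowVector x = args(0).complex_row_vector_value ();
  octave_stdout << x << "\n";
  return octave_value_list ();  // omitting this line is what caused the
                                // segfault: undefined behavior in C++
}
```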

29 May 2012

Working with Oct files and next steps

I spent yesterday reading about oct-files and examining the code in fastnu.c with an eye towards producing a working oct-file form of fastlscomplex, and I think I've found the proper course to take. The one painful part is that it will involve rewriting fastlscomplex in C++, not C; but, I'd like to point out, this may not be a bad thing.

The greatest driving factor in my decision is the signature of the current fastlscomplex() function (fastnucomplex() in the reference code):

void fastlscomplex(Real *tptr, Complex *xptr, int *nptr, double *lengthptr, int *ncoeffptr, int *noctaveptr, Real *omegamaxptr, Complex *rp)
which involves quite a few pointers. In the context of the original function this proved to be no problem: the R code was designed to integrate with the C in use here, and as such it used primitive structures like Real[] and Complex[] (Real and Complex being typedefs built on the complex arithmetic support introduced in C99). Octave, however, exposes its data to C++ as objects, so my new code has the following objects:
const RowVector tvals = args(0).row_vector_value();
const ComplexRowVector xvals = args(1).complex_row_vector_value();
which do not behave nicely in the context of pointers; they are much better traversed with the methods designed for that purpose. As such, I've determined that it will be more useful to rewrite fastlscomplex (and, when I get to it, fastlsreal) as C++ code. I don't expect this to introduce too much overhead into my work, since the bulk of the change is replacing the relevant pointers with the proper methods, while altering a few structures into forms I'd prefer.
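To make the contrast concrete, here is a sketch of what the method-based traversal looks like (the function name and the sum computed are hypothetical; fastlscomplex itself will do considerably more than this):

```cpp
#include <octave/oct.h>
#include <complex>

DEFUN_DLD (lssa_demo, args, nargout, "Sketch: method-based vector access.")
{
  const RowVector tvals = args(0).row_vector_value ();
  const ComplexRowVector xvals = args(1).complex_row_vector_value ();

  // Traverse with the classes' own accessors rather than raw pointers.
  Complex acc (0.0, 0.0);
  for (octave_idx_type k = 0; k < xvals.numel (); k++)
    acc += xvals(k) * std::exp (Complex (0.0, -tvals(k)));

  return octave_value_list (octave_value (acc));
}
```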

I'm starting work on this today, and I'll keep up my SVN commits; I corrected a typo in my commit today, and I'll write some tests and different demo functions when I'm too burned out on the C++.

26 May 2012

Not quite the progress I'd hoped for

I had hoped to have mostly written a demonstration script by Thursday, but an error in my lsreal code (one I should have noticed sooner) took up most of my day. On the other hand, that error is caught and corrected, and I've got a few demonstrations in the works. I also wrote lscomplex, along with some documentation for it. As of right now, I don't know what I'll do for a demo function (in the function itself) unless I create a sample dataset to work from; I'll consider that while I work on the demonstration script.

As for this next week, I'm going to start on fastlscomplex, which may be somewhat time-consuming, particularly as the reference implementation makes broad use of singly-linked lists (which I am not sure I can implement in Octave, so I may need an alternate solution). As always, commentary is welcome both here and over the various Octave mailing lists.
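If the fast functions do end up as C++ oct-files, the standard library removes the need for the reference implementation's hand-rolled linked lists; a contiguous std::vector is usually the simpler replacement. A purely illustrative sketch (the CoeffNode layout and the names here are my invention, not the reference code's):

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Sketch: replace a hand-rolled singly-linked list of coefficient nodes
// with a std::vector. The node layout here is hypothetical.
struct CoeffNode {
  std::complex<double> value;
  double freq;
};

// Sum the coefficient values: the kind of traversal the list was used for.
std::complex<double> sum_coeffs (const std::vector<CoeffNode>& nodes)
{
  std::complex<double> acc (0.0, 0.0);
  for (std::size_t i = 0; i < nodes.size (); i++)
    acc += nodes[i].value;
  return acc;
}
```

A vector keeps the traversal cache-friendly, and insertion-at-end (the common case when accumulating coefficients) is amortized constant time.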

24 May 2012

A new naming schema and another function

It struck me that naming my functions exactly like the functions in the nuspectral package was a foolish idea, since mine is named lssa; as such, I'm renaming nureal to lsreal, and all other functions similarly. So, I'll handle that tomorrow and all future functions will be appropriately renamed.

In other news, nureal (now lsreal) is written and has a (mediocre) documentation string; I don't really like it, so I'll edit it tomorrow. Finally, tomorrow I'll start writing a demonstration suite using the Vostok ice core data (now that I have a least-squares transform implemented).

22 May 2012

The day's development(s)

Sorry for the pun. Really, I am. A little.

In actuality, though, an interesting thing came up in today's work. In testing the wavelet correlation coefficient function, I ran into a problem: minor, but large enough to be considered for wishlist work. The time and frequency inputs need to be scalars, and the time series data needs to be vectors; at least, that's true with the current code structure. There are probably ways around it, but they would require either loops or completely restructuring the algorithm as it stands (unless there's something I'm not seeing). Working out exactly why some operations failed was the lion's share of today's work. As it is, if the input vectors are not row vectors, the code converts them internally, since that's easier than duplicating other operations to handle the column-vector case. On the other hand, I've tested the code I've produced against the R functions, and it all checks out!

With the documentation and tests I've written so far, I'm going to plough on forward and start on the next functions; once I've written at least one implementation of the LSSA algorithm (with nureal being the next function up) I will start writing a demo script using the Vostok data, currently stored in lssa/data, at least until I can find a final location to store the files.

Extracting usable data from RDA files and plotting

This post is just to discuss the R code I've determined works to extract data from the binary RDA files included with the nuspectral package as well as plotting commands for some of the graphs included with the article, done in R:

First, as Nir determined, there need to be some modifications to the firstlib.r file, in the nuspectral/R folder:

.First.lib <- function(lib, pkg) {
    library.dynam('nuspectral', pkg, lib)
}

To actually extract the data, after running a test, I just ran these commands, while in the folder above /nuspectral:

> load("./nuspectral/data/ch4.rda")
> load("./nuspectral/data/co2.rda")
> load("./nuspectral/data/o18.rda")
> load("./nuspectral/data/deut.rda")
> load("./nuspectral/data/dust.rda")
> write.csv(ch4,file="./nuspectral/data/ch4.csv")
> write.csv(co2,file="./nuspectral/data/co2.csv")
> write.csv(o18,file="./nuspectral/data/o18.csv")
> write.csv(deut,file="./nuspectral/data/deut.csv")
> write.csv(dust,file="./nuspectral/data/dust.csv")

As for plotting the data in question:
> plot(-(co2[[3]]),co2[[4]],type="l")
> plot(-(deut[[2]]),deut[[4]],type="l")
To generate the other plots, I'm waiting until I've written the necessary tools in Octave. Then I'll tackle a 2D plotting method.

21 May 2012

A short update


As of right now, I've finished writing all three functions I had planned for this week (yes, more than a little ahead of schedule; then again, I may well need that time and then some when it comes to the C functions.) Of more interest to me currently is learning more about testing mechanisms and documentation methods. I went to a cafe to read about that ... but their router was down, for some reason or another. As such, I'll be reading about that tonight, and I will probably do an SVN commit shortly just to have the code I've generated so far in the repository, if anyone wants to have a look at it. (As I noted in one of the files, I've kept the variable names from the original R functions, except where changing the variable made it more comprehensible.)

(I am somewhat distraught that a few functions make two passes over the data, O(2n) rather than O(n); if anyone has a better solution than my truth-vector workaround (in nuwaveletcoeff and nucorrcoeff), please leave a note in the comments!)

Beyond that, I'll look to add the CSV versions of the Vostok data to the folder also, as I'll be building a few tests based on it, as well as example code. There does not seem to be a normal method for including data files so far, so I'm going to see what there is to be done about that, and also ask some questions.

In short, that's where I am as of 16h00, 21 May (and to everyone in Canada, happy Victoria Day, or Journée des Patriotes for anyone in Québec!)

Start of coding!

Well, maybe. I must admit, I already wrote and tested cubicwgt; I was so excited to start that on Friday I just put it together and tested it. (I'll do a better test today, I just whipped up a small data set off the top of my head.) This really makes my first task learning to build tests, which I'll handle once it's morning.

17 May 2012

A wishlist, of sorts

In Assembling a roadmap, I put together an order for twelve functions to implement. Here, in no particular order, is a wishlist of functions related to those in the nuspectral package, but which are not themselves in the package and would be wholly additional.

  • Statistical confidence interval:
    • nucomplex
    • nureal
    • fastnucomplex
    • fastnureal
    • nuwavelet
    • fastnuwavelet
(Further suggestions are welcome; there may not be time for all of them during the summer, but I'll use this list to continue development.)

Week-by-week timeline

So, after having had a fairly crazy week, here's my expected week-by-week timeline; note that documentation and testing will be ongoing, even though there is a buffer zone specifically for that. That buffer time is also in case something takes longer than I expect, or other problems/discoveries crop up. In short, the buffer zone is to catch problems and make sure I've got enough room for everything, but if I'm up to date I will stop and document, write tests, and do background work for some wishlist tasks (see the post I'll put up after this for wishlist details.)

  • 21 may: Coding begins
    • 21 may - 25 may: cubicwgt, nucorrcoeff, nuwaveletcoeff
    • 28 may - 1 jun: nureal, nucomplex, start fastnucomplex
    • 4 jun - 8 jun: fastnucomplex
    • 11 jun - 15 jun: fastnucomplex, lombnormcoeff
    • 18 jun - 22 jun: fastnureal
    • 25 jun - 29 jun: nuwavelet, lombcoeff
    • 2 jul - 6 jul: fastnuwavelet
  • 9 jul: Midterm evaluation submission window opens
    • 9 jul - 13 jul: Buffer zone for catching up on documentation, writing OctConf presentation.
  • 13 jul: Midterm evaluation submission window closes
    • 16 jul - 20 jul: [OctConf] — Talk presented at some point
    • 22 jul - 27 jul: nurealcoeff, package and release prototype package. Begin receiving community comments and making community edits.
    • 30 jul - 10 aug: Hopefully, wishlist time. Research on weekends.
  • 13 aug: Suggested "pencils down" date
    • 13 aug - 20 aug: code revision
  • 20 aug: Firm "pencils down" date; final evaluation submission window opens
  • 24 aug: Final evaluation submission window closes
(Other after-coding stuff happens.)

Note that the order has been preserved from Assembling a Roadmap, as that order seemed perfectly rational to me. Also, depending on what my further code reading turns up, nuwavelet may not be done (as I noted in Roadmap) since there is no R function that would appear to use it.

11 May 2012

State of research and decisions

Recently, I fielded an email which suggested I look at Multi-Taper spectral analysis as a part of my project; I've done some reading about it and found a few useful pieces, such as an implementation in ANSI-C from 1995 by Jonathan M. Lees and Jeffrey Park, as well as an associated article hosted on Yale's servers here. However, as this is a wholly independent method, it falls outside the scope of my current project (but I'll add it to a wishlist for further development at another point.)

08 May 2012

Assembling a roadmap

So, the first task in creating a roadmap was considering dependencies within the functions I'll be implementing; another consideration was the current form of the code—is it R or C? As such, I've established this set of dependencies:

R-only code:

  • cubicwgt
    • nucorrcoeff
    • nuwaveletcoeff
  • lombnormcoeff
  • lombcoeff
  • nurealcoeff
C implemented code:

  • nureal
  • nucomplex
  • fastnucomplex
    • fastnureal
  • nuwavelet
  • fastnuwavelet
Now, nureal, nucomplex, and nuwavelet are all fairly short functions, which will probably require far less effort to implement than their fast versions, and all of the R-only functions are shorter still. As such, I propose to complete the functions in the following order:

  1. cubicwgt
  2. nucorrcoeff
  3. nuwaveletcoeff
  4. nureal
  5. nucomplex
  6. fastnucomplex
  7. lombnormcoeff
  8. fastnureal
  9. nuwavelet
  10. lombcoeff
  11. fastnuwavelet
  12. nurealcoeff
This distribution of tasks is slightly tail-weighted, as it features three R-only functions right off the bat, but the structure of all three fast functions is fairly similar at the core (in fact, I will examine them a little more closely to determine whether they would be better implemented with shared code, since they all have a similar divide-and-conquer structure). This leads me to believe it is not outlandish to expect to be somewhere around function 7 or 8 by the time midterms come around, with all previous functions completed, tested against their reference implementations, and documented.
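To make the shared-code idea concrete, here is a purely illustrative C++ sketch of the common divide-and-conquer shape; the leaf operation shown is plain summation, not the actual fastnu* computation, and all names here are my own:

```cpp
#include <cassert>
#include <cstddef>

// Illustrative only: if the three fast* routines share this recursive shape,
// a common skeleton parameterized on the per-leaf computation could be
// factored out. Here the "leaf work" is just summation over a small range.
template <typename LeafOp>
double divide_and_conquer (const double* data, std::size_t n, LeafOp leaf)
{
  if (n <= 2)                  // small base case: do the work directly
    return leaf (data, n);
  std::size_t half = n / 2;    // otherwise split, recurse, and combine
  return divide_and_conquer (data, half, leaf)
         + divide_and_conquer (data + half, n - half, leaf);
}

// Example leaf operation: plain summation.
double sum_leaf (const double* data, std::size_t n)
{
  double s = 0.0;
  for (std::size_t i = 0; i < n; i++)
    s += data[i];
  return s;
}
```

Each fast function would then supply its own leaf operation and combine step, keeping the recursion logic in one place.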

As a postscript, there is no R function wrapper for nuwavelet; it exists only in the C code, and is not referred to elsewhere. I'll look into that a bit more closely before starting on it.

06 May 2012

Introductory Post

So, hi, I'm Ben Lewis, and this is the blog I'll be using to track my work for Google Summer of Code 2012. In particular, I'm writing a package that implements Least-squares spectral analysis for Octave-Forge, basing it on Adolf Mathias' R and C implementation; the full project page is available on Google Melange.

As of now, I've written up a roadmap of the methods to be implemented; I've read (and re-read, and re-re-read) the article involved; I've commented approximately a third of the associated C source; and I've tested the R code included (I still need to find a way to produce a version of the wavelet graphs in R, which according to the authors were not produced with it). I'll post sample code to reproduce those results in further posts.

In terms of next steps over the next day or two, I'll be creating an lssa folder in the Octave-Forge svn tree, and I'll get a few files in place just to document where the project will be leading once coding starts. I will also take some time to follow up on the excellent recommendations from Michael Godfrey, and read the articles he suggested.