Sunday, March 10, 2013

Using ROS to save periodic snapshots from Asus Xtion camera

This is a small program that uses ROS (Groovy) to take snapshots from your OpenNI camera and save them in .ppm format (P3).
Internally, it is just a subscriber to the camera node. It is probably best to have worked through the ROS beginner tutorials before following this.

The data coming from the camera is in this format: http://www.ros.org/doc/api/sensor_msgs/html/msg/Image.html
We want to write it in ppm format: http://en.wikipedia.org/wiki/Netpbm_format#PPM_example
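As a quick reminder of the target format, a tiny 2x2 image in P3 is just plain text: a magic number, the image dimensions, the maximum colour value, and then one R G B triplet per pixel:
P3
2 2
255
255 0 0     0 255 0
0 0 255     255 255 255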

We will take the time stamp of the image and generate the file name as YYYYMMDDThhmmss.ppm, with a small detour through posix time, because ros::Time can't be converted directly to a string.
More on posix time here: http://www.boost.org/doc/libs/1_53_0/doc/html/date_time/posix_time.html

First, create and build the package for this project (I've called it "screenshot", a very uninspired name; by the time I realized I was not in fact taking screenshots, I was almost finished, so there you have it), adding image_transport as a dependency (you can also add it later via the manifest file):
$ roscreate-pkg screenshot std_msgs roscpp image_transport
$ rosmake screenshot
Then create the subscriber (mine is called "cam_listener"), not forgetting to add it to CMakeLists.txt:
rosbuild_add_executable(cam_listener src/cam_listener.cpp)
The includes at the top of cam_listener.cpp and the main() function look like this:
#include <ros/ros.h>
#include <image_transport/image_transport.h>
#include <sensor_msgs/Image.h>
#include <boost/date_time/posix_time/posix_time.hpp>
#include <fstream>
#include <iostream>

// Forward declaration; the callback itself is shown further down
void imageCallback(const sensor_msgs::ImageConstPtr& msg);

int main(int argc, char **argv)
{
 ros::init(argc, argv, "cam_listener");
 ros::NodeHandle n;

 image_transport::ImageTransport it(n);
 image_transport::Subscriber sub = it.subscribe("camera/rgb/image_color", 1, imageCallback);

 while (ros::ok())
 {
  ros::spinOnce();
  ros::Duration(15).sleep(); // Take a snapshot every 15 seconds
 }

 return 0;
}
We use image_transport to create a slightly more specialized subscriber for getting image data from the camera. The callback is named "imageCallback", and it looks like this:
void imageCallback(const sensor_msgs::ImageConstPtr& msg)
{
    // Writes the raw image data in a PPM formatted file
    
    // We convert the time stamp that comes in the header of the data from the camera to posix time, in order to use it as part of the ppm filename
    ros::Time stamp = msg->header.stamp;
    boost::posix_time::ptime posix_time = stamp.toBoost();
    // We keep only the date-and-time part of the ISO string (YYYYMMDDThhmmss)
    std::string filename = boost::posix_time::to_iso_string(posix_time).substr(0, 15) + ".ppm";
    std::ofstream f(filename.c_str());
    if (!f)
    {
        std::cerr << "Couldn't open file " << filename << "\n";
        return;
    }

    std::cout << "Writing data to file " << filename << "\n";

    // On the first line, the magic marker for a PPM file -- "P3"
    f << "P3\n";
    // On the second line, the number of columns and the number of rows
    f << msg->width << " " << msg->height << "\n";
    // The max colour value
    f << "255\n";

    // Write the colour triplets, row by row
    for (unsigned int row = 0; row < msg->height; ++row)
    {
        for (unsigned int col = 0; col < msg->width; ++col)
        {
            // Each pixel starts at row * step + 3 * col; the bgr8 encoding stores
            // the bytes as B, G, R, while PPM wants R, G, B, so we write them in
            // reverse order. The data is unsigned char, so cast to int for output.
            unsigned int pixel = row * msg->step + 3 * col;
            f << static_cast<int>(msg->data[pixel + 2]) << " ";
            f << static_cast<int>(msg->data[pixel + 1]) << " ";
            f << static_cast<int>(msg->data[pixel]) << " ";
        }
        f << "\n";
    }
    f.close();
}
Basically, we write the header of the PPM file and then the plain-text values of the pixels. The only tricky aspect is the order of the colour channels. The image data is encoded as bgr8 (check msg->encoding yourself!), whereas the PPM file needs RGB. That's why we write data[pixel+2], data[pixel+1] and data[pixel], i.e. the channels in reverse order.

In order to run the program, we start roscore in one terminal, and the openni camera in another terminal:
$ roslaunch openni_launch openni.launch
In a third terminal, we start our listener:
$ rosrun screenshot cam_listener
Every 15 seconds, a shot from the camera will be saved in the folder you started the listener from (the package's root folder, in my case), so you can see the angry faces you make while debugging the ever-changing ROS packages! :D

Using ROS to view input from Asus Xtion

This post shows how you can use ROS to view the input from your Asus Xtion camera.

My system configuration:
Ubuntu 12.10 32b
ROS Groovy
Asus Xtion


Start roscore in a terminal.

In another terminal, open the openni device (here, the Asus Xtion camera):

$ roslaunch openni_launch openni.launch
Make sure the camera has started OK. You should see something like this at the end:
...
[ INFO] [1362906100.839021507]: Initializing nodelet with 2 worker threads.
[ INFO] [1362906104.029471453]: Number devices connected: 1
[ INFO] [1362906104.029784696]: 1. device on bus 001:09 is a PrimeSense Device (600) from PrimeSense (1d27) with serial id ''
[ INFO] [1362906104.031251209]: Searching for device with index = 1
[ INFO] [1362906104.065753491]: Opened 'PrimeSense Device' on bus 1:9 with serial number ''
[ INFO] [1362906104.097498307]: rgb_frame_id = '/camera_rgb_optical_frame'
[ INFO] [1362906104.097716391]: depth_frame_id = '/camera_depth_optical_frame'
If it doesn't start OK, it might be because the camera is plugged into a USB 3.0 port. In that case, just switch to a normal USB 2.0 port.

In yet another terminal, run the image_view node:
$ rosrun image_view image_view image:=/camera/rgb/image_color
A window should pop up, showing the normal video output of the camera.

Wednesday, October 31, 2012

A little something with coins and without Bayes

Don't have any coins on hand?
Do you need to prove empirically that the more you flip a fair coin, the closer to 50-50% heads-tails you get?

Look no further! You can use the coin simulator OF THE FUTURE that I rustled up in Khan Academy. Feel free to boil it, mash it, stick it in a stew, but you'll probably get the biggest benefits if you just play around with the variables in the source code.

Coin Flip Probability Simulator

Made using: Khan Academy Computer Science.

Saturday, September 15, 2012

Talk: Schraudolph -- Rapid Stochastic Gradient Descent (2006)

This is a talk given by Nicol Schraudolph at the 2006 Machine Learning Summer School in Canberra.
Link to video of talk: http://videolectures.net/mlss06au_schraudolph_aml/

He gives a very clear explanation of the "big picture" of the optimization behind machine learning algorithms as part of the introduction to his research and, in particular, to stochastic gradient descent.

Notation:
n - size of search space (size of Theta)

Optimization methods can be categorized into three big classes:

  • Direct (evolutionary algorithms)
    • Cost per iteration of O(1) - O(n)
    • Basically, running around wildly in the search space
    • Low convergence speed
  • Indirect (gradient descent)
    • Cost per iteration O(n)
    • The gradient (slope of tangent) tells us in which direction to go to reach the minimum
    • Better convergence speed, but still slow
  • 2nd order gradient (for instance, Kalman Filters)
    • Cost per iteration O(n^2) - O(n^3)
    • Tells us in which direction to go, but also how far to go
    • Very fast convergence

In the talk, Schraudolph argues that in the future the growing amount of data will make gradient methods more important, because the size of the search space will make 2nd order gradient methods prohibitively expensive.
He is working on methods to accelerate the convergence of gradient descent, which is the most efficient approach if you set your problem up right.

Stochastic Meta-Descent (SMD)
The gist of the method: to avoid running one loop of optimization over all the samples we have, we run it only on a small subset. This also means we should not be too accurate in each of these mini-optimizations, because we might have picked a subset that leads us away from the real optimum. But over time, all the changes made to the Theta values in these mini-optimizations sum up to a good approximation.

So, at each step we compute the stochastic gradient of the loss on the current subset of samples, and we adapt Theta by gradient descent, scaling the gradient with a gain vector eta.

The key idea is to adapt eta at the same time, using the exponentiated gradient method (the same idea as for Theta, just applied to eta, hence the "meta" descent in the method's title).
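In standard notation, these updates look roughly like this (a sketch of the usual SMD formulation, not the exact formulas from the slides; $L(\theta; x_t)$ is the loss on the current subset $x_t$, $\mu$ is the meta-gain, and $\odot$ denotes element-wise multiplication):

$$g_t = \frac{\partial}{\partial \theta} L(\theta_t; x_t), \qquad \theta_{t+1} = \theta_t - \eta_t \odot g_t$$

$$\ln \eta_{t+1} = \ln \eta_t - \mu \frac{\partial L}{\partial \ln \eta}\bigg|_t \quad\Longleftrightarrow\quad \eta_{t+1} = \eta_t \odot \exp\!\left(-\mu \frac{\partial L}{\partial \ln \eta}\bigg|_t\right)$$

Working in log space for the gains (i.e. exponentiated gradient) keeps every component of eta positive.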

Edited to add link to exponentiated gradient method description.

Tuesday, September 11, 2012

kNN with Euclidean distance on the MNIST digit dataset

I am playing with the kNN algorithm from the mlpy package, applying it to the reduced MNIST digit dataset from Kaggle.

I am using the training data to do a bit of cross-validation and see how the algorithm behaves for various values of k between 1 and 20.
For this purpose, at each run with a different k value I shuffled the data set and split it roughly 2:1 for training and cross-validation. This amounts to 28000 samples for training and 14000 for CV/testing. No preprocessing has been applied to the data.
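The classification itself was done in Python with mlpy and is not shown here; as a rough R equivalent of a single CV run, one could use the class package's knn(), which also uses the Euclidean distance (a sketch; the file name and the "label" plus 784-pixel-column layout of Kaggle's train.csv are assumed):
library(class)
digits <- read.csv("train.csv")              # Kaggle's reduced MNIST training set
shuffled <- digits[sample(nrow(digits)), ]   # randomize before splitting
train <- shuffled[1:28000, ]
cv <- shuffled[28001:42000, ]
pred <- knn(train[, -1], cv[, -1], cl = factor(train$label), k = 5)
accuracy <- mean(as.character(pred) == as.character(cv$label))
error <- 1 - accuracy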

The accuracy and errors I got are plotted in the figures below:




The best results (error 0.0365, accuracy 0.9635) are achieved for k=5. For values k>10 the performance goes downhill.
It would be interesting to use other metrics (Mahalanobis?) and see what effect that would have on the algorithm's performance.

For my own reference as an R apprentice, here is also the script used to generate the plots:
k_values <- 1:20
accuracies <- c(0.963285714286, 0.9365, 0.9625, 0.955571428571, 0.9635, 0.957357142857, 0.959, 0.953714285714, 0.960214285714, 0.955428571429, 0.956571428571, 0.9565, 0.954857142857, 0.952, 0.956428571429, 0.954428571429, 0.953214285714, 0.952857142857, 0.950428571429, 0.952785714286)
errors <- c(0.0367142857143, 0.0635, 0.0375, 0.0444285714286, 0.0365, 0.0426428571429, 0.041, 0.0462857142857, 0.0397857142857, 0.0445714285714, 0.0434285714286, 0.0435, 0.0451428571429, 0.048, 0.0435714285714, 0.0455714285714, 0.0467857142857, 0.0471428571429, 0.0495714285714, 0.0472142857143)
plot(k_values, accuracies, xlab="k value", ylab="Accuracy", col="#00FF00", main="kNN for digit classification: Accuracy vs K value")
lines(k_values, accuracies, col="#00FF00")
plot(k_values, errors, xlab="k value", ylab="Error", col="#FF0000", main="kNN for digit classification: Error vs K value")
lines(k_values, errors, col="#FF0000")

Tuesday, July 10, 2012

Y'all are leeching DVD rips, admit it

In the previous post (refer to it for the R code for the preliminary data reading and processing), I made a naive analysis of the Pirate Bay data set (partly as an excuse to play with R functions). Let's be a bit more structured this time, and answer the question:
What torrent size is, overall, the most leeched?
I am aiming to show that, on average, DVD rips are the most commonly leeched size.

Disclaimer: Don't take this too seriously! I needed a simple data set to trigger my process of learning R and exploratory data analysis. Nevertheless, the data doesn't lie :-D

The argument in this post is still shaky, since I won't be taking torrent categories into account, and they undoubtedly influence the results. Furthermore, one of the assumptions is that a torrent equals a DVD/CD-sized movie or a TV series episode; nothing is done about torrents which might contain a whole season, for instance, whose size is harder to predict because the number of episodes per season varies across TV shows. I am also completely ignoring other types of content.

For these purposes, I will split the torrent data into groups based on their rough size:
  • smaller than 330MB
  • around 350MB, roughly the size of a TV series episode
  • between 370 and 690MB
  • between 690MB and 2GB, roughly the size of an average movie
  • between 2 and 4.6GB
  • around 4.7GB, roughly the size of a DVD
  • larger than 4.8GB
Then, we'll compute the average number of leechers in each of these groups, and plot them for easier analysis.

First, we define the variables mb (holds the number of bytes in 1 MB) and gb (same for 1GB):
mb <- 1048576
gb <- 1073741824

Then, we group the data in the size groups defined above:
tiny <- pirate.data[which(pirate.data$size/mb < 330), ]
episode <- pirate.data[which(pirate.data$size/mb > 330), ]
episode <- episode[which(episode$size/mb <= 370), ]
normal <- pirate.data[which(pirate.data$size/mb > 370), ]
normal <- normal[which(normal$size/mb <= 690), ]
movie <- pirate.data[which(pirate.data$size/mb > 690), ]
movie <- movie[which(movie$size/gb <= 2), ]
large <- pirate.data[which(pirate.data$size/gb > 2), ]
large <- large[which(large$size/gb <= 4.6), ]
dvd <- pirate.data[which(pirate.data$size/gb > 4.6), ]
dvd <- dvd[which(dvd$size/gb <= 4.8), ]
huge <- pirate.data[which(pirate.data$size/gb > 4.8), ]

(I still have to find out whether I can pass a composite condition to the which function in R, as I have not been able to do so yet.)
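A combined condition with the & operator should also do the trick, along these lines (an untested sketch):
episode <- pirate.data[which(pirate.data$size/mb > 330 & pirate.data$size/mb <= 370), ]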

Now we just take the mean value of the leecher numbers in each group and pack them into a vector (we could have done this in one step, as sketched after the listing below, but it's presented like this for readability):
mean_tiny <- mean(tiny$leechers)
mean_eps <- mean(episode$leechers)
mean_normal <- mean(normal$leechers)
mean_movie <- mean(movie$leechers)
mean_large <- mean(large$leechers)
mean_dvd <- mean(dvd$leechers)
mean_huge <- mean(huge$leechers)
means <- c(mean_tiny, mean_eps, mean_normal, mean_movie, mean_large, mean_dvd, mean_huge)
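As an aside, the grouping and averaging could also be done in one go with cut() and tapply(); a sketch along the same size boundaries (the handling of values falling exactly on a boundary differs slightly):
breaks <- c(0, 330*mb, 370*mb, 690*mb, 2*gb, 4.6*gb, 4.8*gb, Inf)
size.group <- cut(pirate.data$size, breaks=breaks)   # a factor with one level per size group
means.grouped <- tapply(pirate.data$leechers, size.group, mean)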

Adding some graphical details to make the result look better and be easier to analyse:
labels <- c("<330", "~350MB", "370-690", "~700MB", "2-4.6GB", "~4.7GB", ">4.8GB")
colors <- c("blue", "red", "blue", "blue", "blue", "red", "green")

Finally, we create a barplot of the data we created:
barplot(means, names.arg=labels, col=colors, xlab="Size groups", ylab="Average number of leechers")

Fig.1: You know who you are, red groups! ;-)

So, as expected, loads of leechers around 350MB and 4.7GB! Above 4.7GB we probably have many hi-res movies and huge packages of another nature, so it's hard to discriminate further.

It would be interesting next to do a similar analysis, but taking the torrent category into account. Which is just another way of saying that we expect the most heavily leeched torrents to be composed, at least partly, of materials of a pronounced adult nature.
(Or, perhaps, some ultimate lolcat compilations! I am ready to be proven wrong ;-)

I'm in your data, detecting your torrent size

aka "what can we find out from anonymized torrent data!"

In particular, we want to see what is the most leeched torrent size: do the stats back up our initial intuition that sizes ~350MB (=standard size of a TV series episode), ~700MB (=standard size of a movie/CD), or ~4.7GB (=standard size of a DVD) would be the most leeched?

The Pirate Bay 2008-12 Dataset can be downloaded here: http://www.csg.uzh.ch/publications/data/piratebay.html

The set is a csv file. Each line describes a torrent, using the following self-explanatory features:
  • idtorrent
  • category
  • size in bytes
  • number of seeders
  • number of leechers
We can load the data set in R like this:
pirate.file <- file.path('C:\\Workspace\\datasets\\20081205_thepiratebay', '20081205_thepiratebay.csv')
pirate.data <- read.csv(pirate.file, header=TRUE, sep=',')

First, let's do a scatterplot of torrent size and leechers:
library(ggplot2)
ggplot(pirate.data, aes(x=size, y=leechers)) + geom_point()
Fig.1: Scatterplot of torrent size against leechers
It seems like we have the most leechers at small sizes, but otherwise the plot isn't very informative. Not much can be readily seen.
But wait! There's more! We can take the log10 of the size (just replace size with log10(size) in the ggplot command above); perhaps that looks better?
Fig.2: Scatterplot of the log10 of size against leechers


Using log10 has indeed helped! We can already recognize some peaks in the plot: for instance, the one around a value of 8.5 for log10(size).

So now we can do some calculations using these equivalences:
1 Megabyte = 1048576 Bytes
1 Gigabyte = 1073741824 Bytes

Roughly, this would mean that the size in MB is around 10^8.55/1048576, or around 330MB.

(Another visible peak is at around a log10(size) of 8.9 or so, which puts us around 750MB.
Yet a third peak lies around 9.7, which gives us ~4.66GB, or roughly the size of a DVD).
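A quick sanity check of these three numbers in R, using the equivalences above:
c(10^8.55 / 1048576, 10^8.9 / 1048576, 10^9.7 / 1073741824)
# approximately 338 (MB), 758 (MB) and 4.67 (GB)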

But we can find the peak much more efficiently and accurately.

For better readability, we first extract the sizes and leecher values in separate variables (lines 1 and 2 below).
Now we need to find out what is the exact value of leechers in the peak. This is simply the maximum among all the leecher values (line 3).
Then we find out the index of the torrent that has so many leechers (line 4).
Finally, we print out the size of that torrent (line 5).
pirate.size <- with(pirate.data, size)
pirate.leechers <- with(pirate.data, leechers)
max_leechers <- max(pirate.leechers)
index <- (1:length(pirate.leechers))[pirate.leechers >= max_leechers]
pirate.size[index]
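As an aside, which.max() collapses the last three lines into a single call, since it directly returns the index of the (first) maximum:
pirate.size[which.max(pirate.leechers)]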

The size is 366708000 bytes (we only plotted it as log10; the size data itself is unchanged).

366708000 / 1048576 gives us a size of 349.72MB, which confirms our initial intuition.

But what can we do about the other two sizes we want to test for? How frequent are those?
They are not the maximum peak, so our naive approach here does not work.

Fortunately, there are some interesting R functions that can help us here. Stay tuned for the next post!