Color Perception could be an Artifact of temporal Self-Supervised Learning

This has been a small proof-of-concept side project derived from our ICLR publication. It turns out that learning visual representations in a self-supervised manner with a temporal coherence loss can explain the phenomenon of color constancy.

 

[1] M. R. Ernst, F. M. López, A. Aubret, R. W. Fleming and J. Triesch, “Self-Supervised Learning of Color Constancy,” 2024 IEEE International Conference on Development and Learning (ICDL), Austin, TX, USA, 2024, pp. 1-7, doi: 10.1109/ICDL61372.2024.10644375.

Using Neural Networks to approximate a dynamical System

Ever since I started studying atmospheric physics I have been fascinated by chaos theory. The idea that even in a deterministic system (as long as it is non-linear) the smallest deviations in the initial conditions can lead to unforeseen consequences is captivating.

Deep neural networks are a fascinating topic in their own right. Not only because of their amazing capabilities in modern AI research, but also more fundamentally because of the universal approximation theorem. Where neural networks are universal function approximators, recurrent neural networks can be seen as universal dynamical system approximators. That means you can learn an arbitrary dynamical system just from data, which makes them incredibly useful for all kinds of scientific questions. For example, you can parameterize a process that you don’t understand but have lots of measurements of.

I wanted to demo this approach and chose a simple RNN to approximate the famous Lorenz ’63 dynamical system.

First we need to define the system itself, which is governed by the Lorenz differential equations; here I defined it as a torch nn.Module to make use of the accelerated matrix multiplications.

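import torch
import torch.nn as nn

# assumption: the original snippet uses `device` without defining it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# lorenz system dynamics
class Lorenz(nn.Module):
    """
    Chaotic Lorenz '63 system (sigma = 10, rho = 28, beta = 8/3).
    """
    def __init__(self):
        super().__init__()
        # the right-hand side is linear in the features (x, y, z, x*z, x*y)
        self.lin = nn.Linear(5, 3, bias=False)
        W = torch.tensor([[-10., 10., 0., 0., 0.],
                          [28., -1., 0., -1., 0.],
                          [0., 0., -8. / 3., 0., 1.]])
        self.lin.weight = nn.Parameter(W)

    def forward(self, t, x):
        # x has shape [1, 3]; t is unused because the system is autonomous
        y = torch.ones([1, 5]).to(device)
        y[0][0] = x[0][0]
        y[0][1] = x[0][1]
        y[0][2] = x[0][2]
        y[0][3] = x[0][0] * x[0][2]
        y[0][4] = x[0][0] * x[0][1]
        return self.lin(y)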

We then use a fourth-order Runge-Kutta scheme to obtain a ground truth by integrating the differential equations forward in time, starting from an initial condition y_0.
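To make the integration step concrete, here is a minimal sketch of a classical fourth-order Runge-Kutta integrator. It is not the code from the gist: the function names, the step size `dt`, the number of steps and the initial condition are all assumptions chosen for illustration, and the right-hand side is written out as a plain function instead of the nn.Module.

```python
import torch

def lorenz_rhs(t, x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz '63 equations for a state tensor of shape [3]."""
    dx = sigma * (x[1] - x[0])
    dy = x[0] * (rho - x[2]) - x[1]
    dz = x[0] * x[1] - beta * x[2]
    return torch.stack([dx, dy, dz])

def rk4_step(f, t, x, dt):
    """Advance x by one classical fourth-order Runge-Kutta step of size dt."""
    k1 = f(t, x)
    k2 = f(t + dt / 2, x + dt / 2 * k1)
    k3 = f(t + dt / 2, x + dt / 2 * k2)
    k4 = f(t + dt, x + dt * k3)
    return x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(f, y_0, dt=0.01, n_steps=2000):
    """Integrate from y_0 and return the trajectory with shape [n_steps + 1, 3]."""
    states = [y_0]
    for i in range(n_steps):
        states.append(rk4_step(f, i * dt, states[-1], dt))
    return torch.stack(states)

y_0 = torch.tensor([1.0, 1.0, 1.0])  # hypothetical initial condition
trajectory = integrate(lorenz_rhs, y_0)
print(trajectory.shape)  # torch.Size([2001, 3])
```

The integrator only needs a callable with the signature f(t, x), so the same loop should also work with a module-based definition of the system as long as the state shapes match.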

So far so good, we have a dynamical system and we are integrating it over time. This ground truth will become our training data for the neural network. In fact we will give some predefined context to the network and ask it to predict the next state of the system. Since we have the data from the ground truth we automatically have our “labels”. The neural network for this simple exercise is quite basic and looks like this:

class fcRNN(nn.Module):
    def __init__(self, input_size, hidden_dim, output_size, n_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers
        # recurrent hidden units
        self.rnn = nn.RNN(input_size, hidden_dim, n_layers,
                          nonlinearity='relu',
                          batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)  # output layer

    def forward(self, x):
        # x: [batch, time, input_size]
        bs, _, _ = x.shape
        # initial hidden state, not part of the computation graph
        h0 = torch.zeros(self.n_layers, bs, self.hidden_dim, device=device)
        out, hidden = self.rnn(x, h0)
        out = out.view(bs, -1, self.hidden_dim)
        out = self.fc(out)
        # only the prediction after the last time step is returned
        return out[:, -1, :]
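The "context in, next state out" setup described above boils down to slicing the ground-truth trajectory into overlapping windows. The following is a small sketch of how that could look; the helper name `make_windows` and the stand-in trajectory are hypothetical, and the gist may organize its data differently.

```python
import torch

def make_windows(trajectory, context_len=15):
    """Slice a [T, 3] trajectory into (context, next state) training pairs."""
    n = trajectory.shape[0] - context_len
    # window i covers the states i .. i + context_len - 1
    contexts = torch.stack([trajectory[i:i + context_len] for i in range(n)])
    # the label for window i is the state right after it
    targets = trajectory[context_len:]
    return contexts, targets

# stand-in trajectory; in practice this would be the integrated ground truth
trajectory = torch.arange(100 * 3, dtype=torch.float32).reshape(100, 3)
contexts, targets = make_windows(trajectory)
print(contexts.shape, targets.shape)  # torch.Size([85, 15, 3]) torch.Size([85, 3])
```

Batches of `contexts` have the [batch, time, features] layout that a `batch_first=True` RNN expects, and the network output can be compared to `targets` with a regression loss. Mean squared error would be a natural choice, but which loss was actually used is not stated here.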

I put a link to my GitHub gist at the end of the post, but for now let’s look at some of the results. The first one is an animation of the training progress over 5000 epochs. The scatter plot shows the previously calculated ground truth. The network gets the first 15 time steps as context and is then tasked with predicting the whole trajectory from there. What you can see is that in the beginning the network fails to represent the system correctly, but over time it becomes more stable and more similar to the ground truth. Keep in mind that fitting all of the scatter points exactly is incredibly difficult due to the “butterfly effect”: if you are only slightly off at one of the previous time steps, the error can grow exponentially over time.
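Predicting a whole trajectory from 15 context steps means running the network in a closed loop, where every prediction is fed back in as input. Below is a hedged sketch of such a rollout. `TinyRNN` is a hypothetical, untrained stand-in for the network, and keeping a sliding window of fixed length (rather than a growing context) is an assumption on my part.

```python
import torch
import torch.nn as nn

class TinyRNN(nn.Module):
    """Hypothetical stand-in for the post's network: RNN plus linear read-out."""
    def __init__(self, input_size=3, hidden_dim=32, output_size=3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_size)

    def forward(self, x):
        out, _ = self.rnn(x)           # out: [batch, time, hidden_dim]
        return self.fc(out[:, -1, :])  # prediction for the next state

@torch.no_grad()
def rollout(model, context, n_steps):
    """Free-running prediction: feed each prediction back in as input."""
    window = context.clone()           # [1, context_len, 3]
    predictions = []
    for _ in range(n_steps):
        nxt = model(window)            # [1, 3]
        predictions.append(nxt)
        # drop the oldest state and append the newest prediction
        window = torch.cat([window[:, 1:, :], nxt.unsqueeze(1)], dim=1)
    return torch.cat(predictions, dim=0)  # [n_steps, 3]

model = TinyRNN()                      # untrained, for shape checking only
context = torch.randn(1, 15, 3)        # placeholder for 15 ground-truth states
predicted = rollout(model, context, n_steps=200)
print(predicted.shape)  # torch.Size([200, 3])
```

Because the model only ever sees its own outputs after the context runs out, small one-step errors accumulate, which is exactly why the free-running trajectories are a much harder test than the one-step training objective.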

 

The second animation shows all of the generated trajectories over the course of training, where early models are depicted in dark colors and late models in bright ones. Here it is quite cool to see that the first two models have not yet trained enough to capture the system’s dynamics and immediately leave the trajectory once the 15 context samples run out. From there you can observe that models that have trained for longer approximate the true state, shown in blue, better. Again, after one loop or so even the trained models diverge due to minor differences at some point in the past, but it is worth noting that the trained recurrent networks (especially the ones with more training time) correctly characterize the dynamical system with its two lobes spanning the 3D space. What that means is that you can use the network to describe the system, and you probably have a useful model when you combine it with a data fusion process like Kalman filtering.
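That divergence is not a flaw of the networks as such; the true system does the same thing to itself. As a rough illustration, the sketch below integrates the Lorenz equations twice from initial conditions that differ by 1e-8 and tracks the distance between the two runs. The step size, run length and starting point are assumptions picked for the demo.

```python
import torch

def lorenz_rhs(x, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Right-hand side of the Lorenz '63 equations for a state of shape [3]."""
    return torch.stack([sigma * (x[1] - x[0]),
                        x[0] * (rho - x[2]) - x[1],
                        x[0] * x[1] - beta * x[2]])

def integrate(y_0, dt=0.01, n_steps=3000):
    """Fourth-order Runge-Kutta integration; returns shape [n_steps + 1, 3]."""
    states = [y_0]
    for _ in range(n_steps):
        x = states[-1]
        k1 = lorenz_rhs(x)
        k2 = lorenz_rhs(x + dt / 2 * k1)
        k3 = lorenz_rhs(x + dt / 2 * k2)
        k4 = lorenz_rhs(x + dt * k3)
        states.append(x + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4))
    return torch.stack(states)

# two initial conditions that differ by 1e-8 in the first coordinate
a = integrate(torch.tensor([1.0, 1.0, 1.0], dtype=torch.float64))
b = integrate(torch.tensor([1.0 + 1e-8, 1.0, 1.0], dtype=torch.float64))
distance = torch.linalg.norm(a - b, dim=1)
print(f"initial distance: {distance[0].item():.1e}")
print(f"largest distance: {distance.max().item():.1e}")
```

Both runs stay on the same two-lobed attractor even though their point-wise distance blows up, which is the sense in which a model can be "wrong" step by step and still characterize the system correctly.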

 

Feel free to have a look at the full code and play around with the model yourself @github-gist.

How Natural Interaction can Bootstrap Self-Supervised Learning

This has been in the works for quite some time now. If you find the ideas teased in the video interesting, take a look at the accompanying paper [1], to be presented at this year’s ICLR.

 

[1] A. Aubret*, M. R. Ernst*, C. Teuliere, and J. Triesch, “Time to augment self-supervised visual representation learning” in International Conference on Learning Representations (ICLR) (2023).

* Equal contribution

PyTorch on Apple Silicon

In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac.

It’s about time. It has been 18 months since the first M1 chips shipped, and we finally got some support for the advanced GPUs on these chips. Being able to leverage your own local machine for simple PyTorch tasks, or even just for local testing of bigger projects, is of tremendous value. One thing I am still confused about is that all of this technology still just uses the ‘standard’ GPU cores and there is still no way to access the custom Neural Engine of the chip. This would make on-device inference even better. But it’s up to Apple to give access to their internal APIs.
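For local testing, picking the Metal backend comes down to a device check. Here is a minimal sketch, assuming a PyTorch build that ships the MPS backend (1.12 or later); the helper name `pick_device` and the fallback order are my own choices.

```python
import torch

def pick_device():
    """Prefer Apple's Metal backend (MPS), then CUDA, then fall back to the CPU."""
    # older PyTorch builds do not have torch.backends.mps at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    if torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")

device = pick_device()
x = torch.randn(4, 3, device=device)  # tensor allocated on the chosen device
print(device, (x @ x.T).shape)
```

Models are moved the same way with `.to(device)`, so code written against a `device` variable like this should run unchanged on a Mac, a CUDA machine or a CPU-only box.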

Designing a Poster Experience in the times of COVID-19

Different Times

I spent the last few months working as a research assistant in artificial intelligence at the Frankfurt Institute for Advanced Studies. With the global COVID-19 situation almost all parts of the scientific workflow went digital. I was working from home, logging into the office computers and high performance clusters via SSH, but all in all this was not that much different from programming and logging on from the office. One thing that has changed significantly, though, is conferences.

As large gatherings of people are potential COVID hotspots it totally makes sense for conferences to go virtual, … at least for this year. And there is tremendous potential in having virtual events: talks get recorded more frequently, allowing for a more flexible schedule; pre-recorded talks make sure that everyone keeps within their time slot; it’s more inclusive for people who otherwise cannot afford to attend; and last but definitely not least, it’s better for the global climate not to have people fly all over the world.

I firmly believe that the academic world should learn from this special situation and incorporate some of the upsides of virtual events into future conferences: maybe an alternating schedule (virtual/in-person) would be a good idea. But right now I feel everything is going to go back to the way it was before.

Why do I think this…? Well, I think people enjoy being with each other and we have not figured out how to do the networking part of conferences yet. For me this becomes most apparent with poster sessions. Sure, you cannot drink a beer at a bar with your colleagues when you are stuck behind your webcam at your desk, but that was never going to be. The poster-session format is partly social interaction with a presenter, and it fails badly at virtual conferences, although there is no real reason why it should.

Fixing Virtual Interaction

So there might be conferences that actually do what I suggest next, so I apologize beforehand, but the three conferences I attended this year either scrubbed the poster session and made posters downloadable or made them short 5 min prerecorded talks. I understand the notion that this is the easiest way but I am kind of disappointed that we have not figured out something better.

In my opinion there are actually two sides to the problem: conceptualization and software.

You have to think about what makes a poster session a poster session. How does it work and why does it work? Take inspiration from the real world and carry it over to the virtual one.

Start with a showcase. There needs to be a website where you can visually scroll through all of the posters. This is a no-brainer: don’t just go with abstracts and links and no, it doesn’t have to be a fancy 3D virtual floor. Just make it a long scrollable website with BIG images. People have spent time designing these posters, this is what catches our attention at the poster session. Have little hearts or thumbs-up icons next to the images to indicate whether some poster creates enthusiasm among the participants.

When you click on a poster you should be pulled into a virtual conference room, like a Zoom call, where the presenter talks about her poster and answers questions. Now here comes the software part: simple screen sharing is not enough, we need a simplified PowerPoint for posters/documents. Conceptually it should be like Prezi, because the poster is the whole canvas, but there should be simple tools that analyze the PDF and make it easy to highlight regions like boxes and rounded rectangles, and of course you should be able to pan and zoom around. Moreover, it should not be a video stream! Something like this needs to be rendered client-side. These posters are smaller than your average HD video and they are vector-based, making them effectively resolution independent. With client-side rendering the participants can decide for themselves which parts of the canvas they would like to focus on.

To end this little blog post I want to showcase what I came up with for this year’s ESANN conference. Let me be clear, this was way more work than a simple poster. But given the right software it should be feasible to transform any PDF into something where you can zoom around and highlight passages in a similar manner [1].

On a little side-note I’m still on the lookout for a cool PhD position, so please get in touch if you want to work with me!

[1] If you liked that video, you can of course take a look at the corresponding paper:

Ernst M.R., Triesch J., Burwick T. (2020). Recurrent Feedback Improves Recognition of Partially Occluded Objects. In Proceedings of the 28th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN)