Project: We See You

The first git check-in for this project dates from June 2016. I had been working since 2008 on the Helena project, where I used the machine learning tools from the excellent C# framework by César Souza. [1] In this period I started learning more about machine learning by watching the excellent online classes of Yaser Abu-Mostafa from the California Institute of Technology (Caltech).

In my first attempts to turn data into abstract paintings I used a very rough collection of machine learning tools in a naive way: k-means cluster analysis to group data, bag-of-words tricks to transform it. But I was never able to leverage neural network techniques for my projects, because of their sheer complexity and my own misunderstanding, or absence of understanding, of the proper math.

Then somewhere in 2015 I took notice of the emergence of the TensorFlow project by the Google Brain team. With the help of the excellent TensorFlow courses on Udemy by the Lazy Programmer, I was able, step by step, to employ neural network techniques to solve some practical problems, like the MNIST problem of recognising handwritten digits, and object detection and classification. Slowly but surely I grew comfortable enough with computer-based neural networks to use them in my art projects more and more.

Already early in my studies I suspected that GANs were very well suited for art projects, and I still have the ambition to implement a GAN project, although there are already some excellent examples of successful GAN art projects out there. [2]

This project, however, is a neural style transfer project, and it tries to comment on the challenges of our modern times: the interconnected world and the digital monitoring of our behaviour. We live in a highly connected era; the internet is reaching, or has already reached, its post-puberty phase. We talk to each other through the internet, we post our images on the internet, we pay the bills we get via e-mail through the bank's app on the internet, and we like and read our news articles via one social or news app or another. Meanwhile the debate on privacy versus security is very much alive.

Data can describe a lot of things, from the economic and welfare state of a country, to the political or sexual preferences of a person, to the expected behaviour in a certain situation based on historical data. Will you, for instance, buy more or fewer vegetables when you have been bombarded with environmental issues, carbon dioxide emission problems and reports on the practices of the mass bio-industry?

A very popular and practical paradigm in machine learning states 'All Data Is The Same'. [3] This is of course not to say that all data means the same thing, but in the end, for our machine learning algorithms, all data used is normalized down to real numbers between minus one and plus one.
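A minimal sketch of what that normalization step can look like, with made-up example values (the function name and the min-max approach are my illustration here, not necessarily the exact scaling used in any particular framework):

```python
def scale_to_unit_range(values):
    """Min-max scale a list of numbers into [-1.0, +1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant feature carries no information
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]

# Pixel intensities, word counts, purchase totals: after scaling,
# they all look the same to the learning algorithm.
pixels = scale_to_unit_range([0, 128, 255])
counts = scale_to_unit_range([1, 5, 3])
```

Once every feature lives in the same numeric range, the algorithm no longer 'knows' whether it is looking at images, text or purchasing behaviour, which is exactly the point of the paradigm.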

With that true statement in mind, it also means that all the data collected about us can be used in any machine learning algorithm to serve that algorithm's purpose, and can therefore be used to analyse, describe and predict us. But how true are this description and prediction then? All data may be the same, but that makes the collection of the proper data even more important. When you feed the wrong data into a machine learning algorithm, you might get results that you can work with, but will they still represent the actual reality of the complexities of our societies and social interactions? [4]

For example, super simplified of course: can you read the irony of my intentions from my purchasing behaviour?

Maybe a little on the nose: but the We See You images are, in essence, a picture taken of a person; that image, or data, is then analysed, processed and used to create a new perception of that person in the form of a new image. But is that new image still a representation of that person, and if so, what does that mean?

The We See You installation

The We See You project is an art installation around a painting of an eye that can be displayed in the traditional art gallery sense. But the eye actually records its observer. Somewhere behind the eye a brain is connected that can interpret the recorded image. In this installation the brain is trained to detect people in the image. When a person is detected, a more creative part of the 'brain' is activated, and the image is mixed together with another image using the neural style transfer principle to create a completely new image.
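The flow just described can be sketched as a single pass of a loop. Every function name below is a hypothetical placeholder standing in for one of the installation's components, not the installation's actual code:

```python
def brain_loop(capture_frame, detect_people, style_transfer, send_to_drone, style_image):
    """One pass of a hypothetical eye -> brain -> drone pipeline.

    Each callable is a placeholder: the camera capture, the person
    detector, the neural style transfer step and the drone display.
    """
    frame = capture_frame()                           # the eye records its observer
    if not detect_people(frame):                      # only frames with people continue
        return None
    new_image = style_transfer(frame, style_image)    # mix with another image
    send_to_drone(new_image)                          # display the new perception
    return new_image
```

The shape of the loop captures the key design decision: the expensive creative step only runs when the detection step says the frame is worth it.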

The eye

The eye is a black-framed, yellow, green and brown wooden layered panel painting with some epoxied computer parts (memory modules) attached. It depicts a face in profile with the focus on the eye, decorated with the text 'We See You' in wooden letters. In the centre of the iris of the eye an Arducam Autofocus Raspberry Pi camera is installed. The images captured by the eye are sent wirelessly over the private network to the 'brain' for processing.

The network

The installation's network is provided by a dedicated router (actually a part of this installation) that is completely disconnected from the internet. So it is a stand-alone network with no access to anything outside itself. This seems counterintuitive for a computer-based and software-driven art installation, but the decision is a conscious one. I wanted, and needed, this installation to be self-sufficient, in the sense that with some minimal instructions, access to the installation's components and, of course, access to electrical power, the whole thing can be started up and still work, even in the year 2268, when the internet is that antique from the 21st century.

Object detection

One of the responsibilities of the 'brain' is to detect whether or not the image is 'interesting' enough to hand over to the neural style transfer part of the 'brain'. Interesting in this context just means that at least one person can be detected in the image.

The technology used for this is a YOLOv3 object detection implementation with pretrained weights from Joseph Redmon's Darknet implementation.
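The 'interesting' test itself reduces to a small piece of logic once the YOLOv3 output has been decoded. A sketch, assuming detections arrive as (class_id, confidence) pairs and the standard COCO label order used by Darknet, where 'person' is class 0; the function name and threshold are my own illustration:

```python
PERSON_CLASS_ID = 0  # 'person' is the first class in the COCO label list used by Darknet

def is_interesting(detections, confidence_threshold=0.5):
    """Return True when at least one person is confidently detected.

    `detections` is assumed to be an iterable of (class_id, confidence)
    pairs already decoded from the YOLOv3 network output.
    """
    return any(
        class_id == PERSON_CLASS_ID and confidence >= confidence_threshold
        for class_id, confidence in detections
    )

# A frame with only a dog (COCO class 16) and a low-confidence person
# is not interesting: is_interesting([(16, 0.9), (0, 0.3)]) -> False
```

A confidence threshold matters here: without it, YOLOv3's many low-confidence candidate boxes would make nearly every frame 'interesting' and keep the GPU busy for no reason.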

Neural style transfer

When an image from the eye is 'interesting', it is handed over to the next process in the 'brain': the neural style transfer process first described by Leon Gatys, Alexander Ecker and Matthias Bethge in 2015.

The neural style transfer is my own implementation based on the VGG19 convolutional model, made with the help of the courses of the Lazy Programmer, TensorFlow tutorials, Keras tutorials, and a lot of 'inspirational' pieces of code and implementations by my IT brothers and sisters on the internet. [5]
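The core idea of the Gatys et al. method is that 'style' can be summarised by the Gram matrices of a network's feature maps. A minimal NumPy sketch of that building block (the function names are mine; the real implementation computes this over several VGG19 layers inside TensorFlow):

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a feature map with shape (height, width, channels).

    Entry (i, j) is the correlation between channels i and j, summed over
    all spatial positions; it captures texture while discarding layout,
    which is what makes it a useful 'style' summary.
    """
    h, w, c = features.shape
    flat = features.reshape(h * w, c)  # one row per pixel position
    return flat.T @ flat / (h * w)     # (c, c) matrix, normalised by area

def style_loss(style_features, generated_features):
    """Mean squared difference between the two Gram matrices."""
    g_style = gram_matrix(style_features)
    g_generated = gram_matrix(generated_features)
    return float(np.mean((g_style - g_generated) ** 2))
```

In the full algorithm this style loss is combined with a content loss on deeper-layer activations, and the generated image's pixels are optimised by gradient descent until both losses are low: the result looks like the content image painted in the style image's texture.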

The neural style transfer process is very CPU-, or better yet GPU-, intensive: for instance, an NVIDIA GTX 1080 with 11 GB of memory needs more than 5 minutes to process a 1500 by 1500 pixel, 72 DPI image. Part of this installation is therefore a GPU-enabled computer dedicated exclusively to this process. [6]

The drones

When the GPU machine is done processing the image, it sends it to the next available drone to display it. The drone in this context is not one of those loud, annoying flying machines, but a cheap micro computer with display capabilities. For this installation I work with Raspberry Pis, each with a simple monitor attached.

What I imagine for this installation to work properly is that the Eye is placed in a central spot of the exhibition, surrounded by initially empty displays with no clear connection to the painting itself. As time goes by and the 'brain' processes the images from the Eye, the room's displays start to show the style-transfer-distorted images of the audience, and eventually not only images of the audience looking at the Eye painting, but also images of the audience looking at a representation of themselves.

Docker images

For this project I invested heavily in designing it to be archivable and easily installable for future use. As stated earlier, the installation is built around its own private network for communication between the different components and does not rely on any internet connection. The 'brain' and drone components are provided through Docker images, which basically means that to run this project you just need a computer with an Nvidia [7] GPU, a Docker installation and access to a Docker registry to pull and start up the proper Docker image.

More reading

The Docker images can be found on Docker Hub.

The technical documentation can be found on GitLab.

[1] Primarily for data analysis and subsequent transformation into a two-dimensional pixel array. See the Helena project description.

[2] Somewhere in 2018 a GAN-created image already sold for a lot of money.

[3] Credits go to the Lazy Programmer for that one.

[4] In the Netherlands in 2021, our Covid-crisis-battling government had to resign because a specific tax collection procedure was registering nationalities, in direct conflict with the Dutch privacy regulation, which made it illegitimate. The most problematic part of this situation was that the algorithm used was able to label people: based on their nationality (data), they were labeled (analysis) as fraudulent (prediction). This decision criterion was introduced in 2012, based on knowledge and experience from years prior, when it was discovered that people with a certain nationality were over-represented in a certain fraud case. A lot of information has been published on this. You can start here (in Dutch).

[5] On that note, I need to thank everyone who ever published any knowledge on the internet; without you I would have been lost a long time ago. Thank you, thank you!

[6] This is the 'brain' I reference throughout this text: an Nvidia GPU-enabled machine.

[7] Of course, Nvidia's future is not guaranteed. Let's hope computing is something that will survive the future.