It has become easier and easier to take pictures of everything, but technology to manage them has not progressed much. Photographs capture precious memories and emotions, yet a computer stores them as files and bits. Effectively, we, the users, are the only ones for whom these images still have a meaning. Maybe not for long: as photos accumulate by the hundreds, or thousands, on our phones, memory cards, hard drives or social networks, the cost of collecting, organizing, and managing them becomes so high that we also lose touch with these captured memories. Our memories could, ultimately, end up forgotten on some computer memory.
How did we get here? Isn’t it paradoxical that the more we capture, the less, in the end, we seem able to recollect? Has quantity defeated quality? Shouldn’t computers be able to help? It always seems to be time for a brand new camera; yet a convenient photo solution has yet to come.
A little over a year ago at Everpix, we introduced a feature called “assisted curation”. Its goal was simple: from a set of hundreds of photos taken during, let’s say, a vacation, extract with one click the ones that matter and clean up the rest. Remove the curation hassle, the “one-by-one” rating of images, and quickly transform a collection into something that users can share right away.
We, at Everpix, liked that idea, yet something was not right. Why would the system pick this photo, instead of that one? Why not the photo of that pet; yes it’s a little blurry, but it’s my cat! This simple observation actually conveys something pretty profound: the system seemed to have some intelligence of its own, but that intelligence was not mine. Only “I” know my life, thus my photos, and an algorithm won’t do.
An algorithm is like a black box which, given an input, produces an output. Computers execute algorithms; can they do anything else anyway? Yes and no, but to clarify all that, let’s jump back in time a little bit. It is not widely known that the birth of computer science can be tied back to the Cybernetics movement of the 1940s and 1950s. Among this movement were some of the brightest minds of the 20th century: the neurophysiologist Warren McCulloch, the logician Walter Pitts, the mathematician Norbert Wiener, John von Neumann and many more. Their vision and convictions deeply shaped the computers we know today.
Their aim was effectively to build machines that work like brains, or to quote Alan Turing, “human computers”. Black boxes which, given some input, would produce an output indistinguishable from the output a human would produce; a rapid description of Turing’s “imitation game”, which became known as the Turing test: if both a machine and a human are hidden in a room and can only communicate with an interrogator via transcripts, can one build a machine smart enough to make the two indistinguishable? Alan Turing gave a fascinating prediction as early as 1950:
"I believe that in about fifty years’ time it will be possible to programme computers, with a storage capacity of about 10^9 [about 100MB], to make them play the imitation game so well that an average interrogator will not have more than 70 per cent. chance of making the right identification after five minutes of questioning."
Not bad, although what do we have today, 63 years later? Siri. Ouch.
To call Siri an algorithm is a shortcut: its technological stack is extremely complex and made of many, many algorithms. Yet it is the closest we have today to a “human machine”, and yes, you can converse with Siri, ask her how she’s doing, the time, or her gender. For small talk, she might pass the Turing test. However, when things get a little more complicated, what will Siri do? A Google search! Turing certainly didn’t see that one coming.
There is something fundamentally different between technologies like Siri and Google, and it is where the model lies. Siri is artificial intelligence (AI). AI descends from the neural networks of McCulloch and Pitts, whose aim was effectively to model the brain. “Model” here is taken in the scientific sense of “modeling something”, like Newtonian physics modeling the dynamics of the world at our scale. It’s an “abstraction [which] consists in replacing the part of the universe in consideration by a model of similar but simpler structure”. When chatting with Siri, she’s the model. When doing a Google search, on the other hand, there’s no model. One could say the whole web becomes the model, but that’s the other meaning of “model”, as “reference”: something to imitate. Endogeneity versus exogeneity. Now, which is more useful, Siri or Google? Which will you trust more? Maybe we should let people do the talking, and computers the computing.
That’s why at Everpix we’ve done something truly radical: we killed the algorithm. We removed the models; or rather, we decided that the best models for managing our users’ photos are the users themselves. If a user, let’s call him John, is a food enthusiast and takes a lot of food close-ups, are we going to tell him that a photo is not a photo of a dish because an algorithm only learned to model some other kinds of dishes? If another user, Sara, mostly takes photos with her camera phone in the evenings, are we going to tell her she should use an expensive DSLR because an algorithm only recognizes well-lit photos? Just like Siri, an algorithm always disappoints.
Let’s say you are entering a room for the first time. What will you see? You’ll see walls, a floor, a ceiling, a bunch of tables and chairs. How did you recognize them? You could very possibly have never seen chairs or tables of that design, yet you instantly recognize what they are. I don’t think you hold a mental model of how every possible chair is constituted and matched it against those you see; neither did you perform the rather involved operation of retrieving a mental model of a four-legged table and stretching it until it matched the table in the room. Maybe, by the way, it’s a designer table which doesn’t even have legs. No, you recognized it all without thinking about it: things reminded you of similar things, seen in similar contexts (a room) and used to perform similar actions (chair: sitting, table: putting things on it), and you identified the new objects from these memories. Well, our technology works just like that.
We want to reengage our users with their photos in a way that makes sense to them. What do users see in their photos? They see content, contexts, objects, and people. They see memories, which bring back to mind memories from other photos. They see a web of memories, as opposed to folders of JPEG files. We navigate this web not by building a complex brain in the machine, but an empty one: it contains no knowledge whatsoever about how meaning is determined in images. Just as Google doesn’t really know what webpages about “buy paperclips” look like but infers it thanks to backlinks, our empty brain infers meaning in your images from “backlinks” between other images. The hard part is creating these image backlinks: Google already has words, we only have pixels. This is called the “semantic gap”.
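Everpix doesn’t publish its actual technique, but the general idea of image “backlinks” can be sketched as nearest neighbors in some feature space. Everything below is an assumption for illustration only: the toy color-histogram features, the L1 distance, and the parameter `k` are stand-ins, not our method.

```python
import numpy as np

def image_features(pixels):
    """Toy feature extractor: a coarse intensity histogram.
    A stand-in for whatever representation actually bridges the semantic gap."""
    hist, _ = np.histogram(pixels, bins=16, range=(0, 256))
    return hist / hist.sum()  # normalize so images of any size compare

def backlinks(features, k=3):
    """Link each image to its k most similar images (smallest L1 distance)."""
    links = {}
    for i, fi in enumerate(features):
        dists = [(np.abs(fi - fj).sum(), j)
                 for j, fj in enumerate(features) if j != i]
        links[i] = [j for _, j in sorted(dists)[:k]]
    return links
```

The result is a graph: each photo points to the photos it most resembles, with no built-in notion of what a “cat” or a “dish” is.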
I can’t go into detail about how we close this gap, but I will let some results speak for themselves. Wayne, our designer and co-founder, loves his cat, food, and taking city photos. Here are some of his memories, instantly found in his collection of 7,000 photos by our system.
Similar content is only one of the many characteristics of images which can be “backlinked”. Sometimes photos remind you of others because they share the same aesthetic characteristics, pose or framing, not only because of content. Navigating along these backlinks, like jumping from one memory to another, is what we call “Similar Photo Exploration”. Wayne can then traverse his collection from a photo to others with the same aesthetics:
Or from a photo of an Arc de Triomphe replica he took in North Korea to one of the Eiffel Tower in Paris and buildings in London:
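A walk like this along image backlinks could be sketched, again purely as an illustration (the `links` mapping and the greedy hop rule are assumptions, not our implementation), as a simple graph traversal:

```python
def explore(links, start, steps):
    """Walk the similarity graph: at each step, hop to the most
    similar photo not yet visited (a toy version of the exploration)."""
    path, current, visited = [start], start, {start}
    for _ in range(steps):
        # links maps each photo id to its similar photos, most similar first
        nxt = next((j for j in links.get(current, []) if j not in visited), None)
        if nxt is None:
            break  # dead end: every similar photo was already seen
        visited.add(nxt)
        path.append(nxt)
        current = nxt
    return path
```

Each hop lands on a photo that resembles the previous one, which is how a replica arch can lead to the Eiffel Tower and then to London.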
The algorithm is dead, and your memories are coming back!