Software development is one of the fields where the constant work of extremely proficient people produces progress you can actually watch happen. We're now building software we couldn't even have imagined a few years back, and Google, of course, is one of the leading developers in the field. In one of its most recent reveals, the tech titan showed how its PlaNet system identifies where in the world a photo was taken.
Where was that photo taken?
Taking advantage of increasingly advanced deep learning techniques, Google created PlaNet, software that can identify where a photograph was taken by analyzing clues in the image. Its accuracy is far from perfect, but it can still outperform most humans in its knowledge of geography.
Currently, the software can place around 48% of images at continent level, 28.4% at country level, 10.1% at city level, and only 3.6% at street level. That may not sound impressive, but PlaNet outperformed every human and AI program it was pitted against. And because it uses deep learning, it gets smarter, and its accuracy improves, with every photo it “sees”.
To test how it fares against well-travelled humans, the developers had PlaNet compete in 50 rounds of the game GeoGuessr against worldwide travellers. PlaNet won 28 of the 50 rounds, with an average localization error of around 1131.7 km, compared to the human players' average of 2320.75 km.
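For context, a localization error of the kind quoted above is simply the great-circle distance between the guessed spot and the photo's true location. Here is a minimal sketch of that calculation using the standard haversine formula; the function name and the Paris-to-Berlin example are illustrative, not taken from Google's write-up.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative example: guessing Paris for a photo actually taken in Berlin
# would count as a localization error of roughly 878 km.
print(round(haversine_km(48.8566, 2.3522, 52.5200, 13.4050)))
```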
How does it work?
The people behind PlaNet decided not to follow the route taken by other image-locating software. Those programs identify a photo's location by searching through the hundreds of billions of photos on the internet for the closest possible match to the input image.
PlaNet, however, does things differently. Its developers divided the planet's surface into thousands of cells and trained the software on geotagged images to predict which cell a given photo was taken in. It relies on photos it has already “seen” to pin down where a new picture was snapped.
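The cell idea is straightforward to sketch in code. The snippet below is a minimal illustration that assumes a fixed 5-degree latitude/longitude grid; Google's actual partitioning scheme and neural network are not reproduced here. It maps coordinates to an integer cell label, which turns geolocation into an ordinary classification problem over (image, cell) training pairs.

```python
# Minimal sketch of the "divide the planet into cells" idea.
# Assumes a fixed 5-degree lat/lon grid, an illustrative stand-in for
# whatever adaptive partitioning the real system used.

CELL_DEG = 5.0  # cell size in degrees

def cell_id(lat, lon):
    """Map a (lat, lon) pair in degrees to an integer cell label."""
    rows = int(180 / CELL_DEG)
    cols = int(360 / CELL_DEG)
    row = min(int((lat + 90) // CELL_DEG), rows - 1)   # clamp the lat = 90 edge
    col = min(int((lon + 180) // CELL_DEG), cols - 1)  # clamp the lon = 180 edge
    return row * cols + col

# Geotagged training photos become (image, cell_id) pairs:
photos = [
    ("eiffel_tower.jpg", 48.8584, 2.2945),
    ("sydney_opera.jpg", -33.8568, 151.2153),
]
labeled = [(name, cell_id(lat, lon)) for name, lat, lon in photos]
print(labeled)
```

In the real system, the network reportedly outputs a probability distribution over the cells for each photo rather than a single hard guess, and the predicted location is read off from the most likely cell.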
This is where the software's limitations stem from. Because it relies on photos it has seen on the internet, it's at a severe disadvantage in rural areas and rarely photographed parts of the world. But since it uses deep learning, feeding it a few pictures of those areas would do the trick.
It's pretty much the same technique videogame testers use to clear a map of bugs: divide it into squares and work square by square. That's more time-consuming in the short run, but in the long run it saves effort by avoiding going over the same area more than once, or missing an area entirely when passing by it.
Image source: Pixabay