Counting beans


How many people make a “crowd”? Ten? Fifty? A hundred? Based on experience, we may be able to distinguish a crowd from a group of friends, but unless some auxiliary figure is attached (“it was a crowd of about 40”, for example), the word is vague by definition and simply means anything from “enough” to “many”. Still, estimating the number of people in a crowd more accurately seems to remain an issue, especially in a mechanized social organization.

For example, the dominant way of evaluating the impact of a protest march or event is to count the participants, and not without reason. The number of people taking to the streets is an indicative benchmark of how many are concerned and willing to devote some of their personal time to the matter. It may not capture all of the “concerned”, but it definitely gives an approximate picture.

But street presence, as an event held in the context of political action, is not simply a matter of numbers. Certainly, below a certain threshold there is not even the possibility of intervention, in the sense of pulling off events intense enough to be considered competitive (there is room for discussion here, though). But is the size of a crowd the beginning and the end of political intervention? If we are talking about how the state perceives competitiveness and power, then yes. Numbers are a key element of its ideology.

We say this, not to start a discussion on political action and its characteristics, but as an introduction to presenting a technology that may prove useful to all those “number-fetishists” for whom measured precision is of more critical importance than the approximate picture.

Images like these were fed to the system: a sports event, an urban center, a commercial exhibition and a concert.

Researchers at the German Aerospace Center have fed images of demonstrations taken from planes, helicopters and drones into an AI system that provides highly accurate estimates of the number of protesters. It is a neural network specializing in visual recognition, called Multi-Resolution Crowd Network (MRCNet), which divides the crowd into squares and counts the individuals within each. It is said to be one of the fastest and most accurate of its kind: it computes the count for each square in 0.03 milliseconds and is 15% more accurate than its predecessor for the job.
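To make the “squares” idea concrete, here is a minimal sketch of the counting step only. It is not MRCNet itself. It assumes that some model has already produced a per-pixel density map, that is, a grid whose values add up to the number of people in view. The names `tile_counts` and `total_count` and the tiny 4x4 map are hypothetical, chosen purely for illustration.

```python
from typing import List

Grid = List[List[float]]


def tile_counts(density: Grid, tile: int) -> Grid:
    """Sum a per-pixel density map over non-overlapping tile x tile squares."""
    if tile <= 0:
        raise ValueError("tile size must be positive")
    rows = len(density)
    cols = len(density[0]) if rows else 0
    counts: Grid = []
    for top in range(0, rows, tile):
        row_counts: List[float] = []
        for left in range(0, cols, tile):
            total = 0.0
            # Squares on the right and bottom edges may be smaller than `tile`.
            for r in range(top, min(top + tile, rows)):
                for c in range(left, min(left + tile, cols)):
                    total += density[r][c]
            row_counts.append(total)
        counts.append(row_counts)
    return counts


def total_count(counts: Grid) -> float:
    """Add up the per-square counts to get the crowd estimate."""
    return sum(sum(row) for row in counts)


# Hypothetical density map (an assumption, not data from the paper).
density = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 1.0],
    [0.0, 0.0, 0.25, 0.25],
    [0.0, 2.0, 0.25, 0.25],
]

squares = tile_counts(density, tile=2)
estimate = total_count(squares)
print(squares)   # per-square counts
print(estimate)  # estimated number of people in the whole image
```

In this toy case each 2x2 square gets its own count, and the overall estimate is simply the sum over all squares. The hard part, which the sketch leaves out entirely, is producing a good density map from an aerial image in the first place.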

As mentioned in the introduction of the relevant paper, “MRCNet: Crowd Counting and Density Map Estimation in Aerial and Ground Imagery”:

“Crowd counting and crowd density estimation play essential roles in safety monitoring and behavior analysis especially in the case of mass events. They can lead to early detection of congestion or security-related abnormalities informing and helping organizers and decisionmakers to avoid crowd disasters. Closed-Circuit Television (CCTV) surveillance cameras have been conventionally used for crowd monitoring and they have become ubiquitous in recent years providing large number of images with various perspectives, scales, and illumination conditions. However, for mass events spread over wide open areas with thousands of people attending, monitoring the crowd from above using aerial imagery (e.g., using air-borne platforms) was shown to be advantageous due to the wider field of view and smaller occlusion effects as compared to CCTV images.”

Up to October 2019 (the publication date of the article we use as a source), this system had only been put into practice in a lab setting. The work is led by researcher Reza Bahmanyar, who stated that he wanted to “get it out” into real-world conditions soon. We have no idea whether it has actually been deployed in real time since then, but a scenario in which such technologies are used for automated “crowd prevention”, for example, may not be so sci-fi after all.