The Flux by Epistemix

From Micro-Behaviors to Macro-Patterns: Exploring Agent-Based Models with Andrew Crooks

Epistemix Season 1 Episode 17

In this episode of The Flux, host John Cordier sits down with Andrew Crooks at the Complex Social Systems Society Conference in Santa Fe. They dive into the world of agent-based modeling (ABM) - what it is, why it matters, and how it helps us simulate and better understand human behavior in complex systems. From simulating traffic jams to modeling social influence on vaccine uptake, Andrew shares how data, geography, and synthetic populations are revolutionizing our ability to forecast and inform decisions. They also explore the growing role of AI tools in democratizing modeling, the evolution of computational capabilities, and even ask: what if we had run a simulation before Brexit?

Whether you're a policy maker, student, or just ABM-curious, this episode is full of insights on how to think more strategically about the future - no flux capacitor required.

Welcome to The Flux, where we hear stories from people who have asked "what if" questions to better understand the world, and talk about how data can help tell stories that influence decisions and create an intentional impact on the future. This is your host, John Cordier, CEO at Epistemix. In a world where the flux capacitor from Back to the Future does not yet exist, people have to make difficult decisions without always knowing how the future will play out.

Our guests are people who've taken risks, made decisions when uncertainty was high, and who have assisted decision makers by using data and models. We hope you can turn lessons from our podcast into foresight, so you or your organization can make better decisions and create an intentional impact for others.

John: Hey there. Welcome to another episode of The Flux. Today we have Andrew Crooks joining us. Andrew, thanks for making the time to step out of the conference and hop onto the podcast.

Andrew: I'm more than happy to be here. Thanks.

John: Cool. So, we're here in Santa Fe at the Complex Social Systems Society conference, talking a lot about agent-based modeling and system dynamics. For people tuning into the podcast for the first time, we’ve got a mix of listeners: some are very technical, some less so, ranging from students all the way up to governors and executive business decision makers. So, to get started, Andrew, why don’t you give us a bit of background on what got you into agent-based modeling and complex social systems in the first place?

Andrew: Yeah. So, many years ago now, back when I was doing my PhD in London at University College London, I was assigned to look at all this data. I spent a year analyzing it, but the data was only giving me the patterns: average household income, mean travel distance, things like that. It didn’t tell us why those patterns exist or how they emerge from individual behavior.

At that point, agent-based modeling was relatively new in the UK. My advisor said, "Why don’t you try doing an agent-based model of residential location?" And it all started from there, really.

That was back in 2003, and since then I’ve just been developing and exploring various types of agent-based modeling applications, with a specific focus on geographic information systems. I try to link a lot of my models to real-world places to better understand what’s actually happening around us.

John: Cool. For people who might be new to agent-based modeling, is there a story or metaphor that you use to help explain it in an accessible way?

Andrew: That’s a good question. When I’m teaching agent-based modeling, I try to get students interested by using analogies. There’s a company called Massive that does the big crowd simulations in movies like The Lord of the Rings. I show images from those battle scenes or migration scenes where the orcs are just running across the screen.

Then I explain that, as social scientists, we make the models simpler. The idea is that we can test ideas and hypotheses that we can’t easily test in reality. For example, I can’t set a building on fire just to study how people react, but I can simulate it.

Another example is traffic. Especially in the U.S., we deal with traffic jams all the time, and we still haven’t figured out how to move people around efficiently. Buffalo’s actually pretty good: no traffic jams, but for a different reason!

Agent-based models are often used in traffic simulations, which is also a great way to explain emergent phenomena. There’s this simple model of a shockwave traffic jam, the kind where you hit a jam on the highway and there’s no accident. It’s just that people can’t drive at a constant speed, or they get distracted.

In a model, you give agents simple rules like: if someone’s ahead of you, slow down; if the road clears, speed up. Even with those two basic rules, you start to see traffic jams emerge. And researchers have recreated this in real life: Japanese researchers had cars drive in a circle at 30 miles per hour, and traffic jams still formed.
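A minimal sketch of those two rules in Python, for anyone who wants to see the shockwave emerge on their own machine; the road length, car count, speed limit, and "distraction" probability are invented for illustration, not values from the Japanese experiment:

```python
import random

# Minimal ring-road sketch: slow down when a car is close ahead, speed up when
# the road clears, plus an occasional random slowdown standing in for a
# distracted driver. All parameters are illustrative.
ROAD_LEN, N_CARS, V_MAX, STEPS = 100, 30, 5, 50

positions = sorted(random.sample(range(ROAD_LEN), N_CARS))
speeds = [0] * N_CARS

for _ in range(STEPS):
    new_speeds = []
    for i in range(N_CARS):
        gap = (positions[(i + 1) % N_CARS] - positions[i] - 1) % ROAD_LEN
        v = min(speeds[i] + 1, V_MAX)   # rule 2: speed up when the road clears
        v = min(v, gap)                 # rule 1: slow down if someone is ahead
        if v > 0 and random.random() < 0.3:
            v -= 1                      # occasional distraction
        new_speeds.append(v)
    speeds = new_speeds
    positions = [(p + v) % ROAD_LEN for p, v in zip(positions, speeds)]
    occupied = set(positions)
    # Clusters of '#' that drift backwards are the phantom jams.
    print("".join("#" if cell in occupied else "." for cell in range(ROAD_LEN)))
```

Even though no single car does anything dramatic, clusters of stopped cars form and propagate backwards along the ring, which is the emergent pattern Andrew describes.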

John: Right, right. I remember that experiment. Even with the 30-mile-per-hour rule, traffic still clogs up.

Andrew: Exactly. It’s a good way to show how individual behavior leads to large-scale patterns.

John: So one of the things your work emphasizes is getting the geographic representation really accurate. What have been some of the challenges in that area, and how has that evolved over the last 10 years?

Andrew: Yeah. So, going back to when I started my PhD, in GIS you generally have two data structures: raster and vector. Raster is image data, and vector is your points, lines, and polygons. Back then, none of the agent-based modeling toolkits (Repast, StarLogo, NetLogo, Swarm) could link raster and vector data very well.

It wasn’t until around 2010, with tools like MASON, that we started to be able to integrate them. That was a big evolution. For example, satellite data gives us land cover as raster, while population data from a census is usually vector polygons.
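To make that raster/vector pairing concrete, here is a rough sketch of how such an overlay might be done today with geopandas and rasterio; the file names, column names, and coordinates are hypothetical, and it assumes every layer shares one coordinate reference system and a reasonably recent geopandas:

```python
import geopandas as gpd
import rasterio
from shapely.geometry import Point

# Hypothetical inputs: a census-tract shapefile (vector polygons + attributes)
# and a land-cover GeoTIFF (raster).
tracts = gpd.read_file("census_tracts.shp")

# Two illustrative agent home locations.
agents = gpd.GeoDataFrame(
    {"agent_id": [1, 2]},
    geometry=[Point(-78.85, 42.90), Point(-78.80, 42.95)],
    crs=tracts.crs,
)

# Point-in-polygon join: attach census-tract attributes to each agent.
agents = gpd.sjoin(agents, tracts, how="left", predicate="within")

# Sample the land-cover class beneath each agent's location.
with rasterio.open("landcover.tif") as src:
    band = src.read(1)
    agents["land_cover"] = [band[src.index(pt.x, pt.y)] for pt in agents.geometry]

print(agents)
```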

Now it's mainstream to overlay both types of data. Another challenge has been computational power. It’s amazing how far we’ve come. I reran my PhD model recently. What took a week to run back then now runs in about 10 minutes on a standard machine.

Also, the availability of data has changed. We used to rely on slow data like census records, but now we have high-frequency data like social media that lets us get a much more dynamic view of the world.

Finally, we have more example models now. When I started, there weren’t many out there. Now, there’s a whole ecosystem of GIS-integrated agent-based models people can learn from. And we’re also seeing growth in synthetic populations, which helps with initializing models. We’re building frameworks to create agents and plug them into different case studies, which saves a lot of time.

John: There was something you mentioned earlier I want to go back to. You said it used to take a week to run your model, and now it takes 10 minutes. How do you think that shift in turnaround time affects innovation, especially for students learning this work?

Andrew: That’s a great point. Today, students say a model takes a “long time” to run, and they mean 10 minutes! But getting faster feedback helps them adjust, retune, and improve the model more quickly.

I’ve also seen a shift in how students approach modeling. Early on, you had two types: computer science folks using Java, and social scientists using NetLogo. When I was teaching agent-based modeling at George Mason, most people were using Java. But around 2015, people started switching to Python. That shift opened up new opportunities.

Now, with Python, students can run the whole modeling pipeline (build, run, and analyze) within one language. It lowers the learning curve and makes the whole process smoother.

John: That makes sense. For some of the agent-based and geospatial modeling use cases, is there a particular field that’s been quicker to adopt these methods?

Andrew: Well, I’m a geographer, so I’d say geographers! GIS and ABMs have a long history together. Even before agent-based modeling, we had cellular automata models back in the ’90s.

Fields like urban growth, pedestrian dynamics, and traffic simulation, all areas where spatial dynamics are key, have been using these models a lot. If you need to understand how people move through a space (streets, subway stations, etc.), you need that spatial data. GIS integration becomes essential.

John: Sometimes people talk about agent-based modeling as a way to run “what if” scenarios where you don’t have to set a real building on fire to test an idea. When it comes to synthetic populations, do you see the future being more about agents reflecting real people or more about those agents existing in a rich digital environment like Google Maps coming to life?

Andrew: Originally, synthetic populations were just trying to create diverse individuals: people with different traits. We’re getting better at that now. For instance, some synthetic populations include households, but historically we didn’t capture structure very well, things like twins or complex family dynamics.

Now, synthetic populations are richer in demographics. Different countries collect different types of data, and we’re mixing that to create better, more representative models.

We’re also seeing better representation of social structures: who lives with whom, who works where. In one of our current projects, we give agents income and assign them to homes and jobs based on that. Richer households live in expensive homes, poorer ones in cheaper housing. Richer people might work in offices, others in more manual roles. That affects daily routines and movement patterns, which we can simulate.
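A toy version of that kind of rank-based assignment might look like the following; the households, prices, and job types are invented, and real synthetic-population pipelines are far richer:

```python
# Assign synthetic households to homes and jobs by income rank: richer
# households get pricier homes, higher earners get office roles first.
households = [
    {"id": 1, "income": 120_000},
    {"id": 2, "income": 35_000},
    {"id": 3, "income": 68_000},
]
homes = [
    {"id": "A", "price": 650_000},
    {"id": "B", "price": 180_000},
    {"id": "C", "price": 320_000},
]
jobs = [
    {"id": "office-1", "type": "office"},
    {"id": "retail-1", "type": "manual"},
    {"id": "office-2", "type": "office"},
]

# Rank-match homes: sort both sides by income/price and pair them off.
by_income = sorted(households, key=lambda h: h["income"], reverse=True)
by_price = sorted(homes, key=lambda h: h["price"], reverse=True)
for hh, home in zip(by_income, by_price):
    hh["home"] = home["id"]

# Crude job assignment: top earners take office roles, the rest take manual ones.
office = [j for j in jobs if j["type"] == "office"]
manual = [j for j in jobs if j["type"] == "manual"]
for hh in by_income:
    pool = office if office else manual
    hh["job"] = pool.pop(0)["id"]

for hh in by_income:
    print(hh)
```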

There’s also growing interest in networks: social networks, household networks, workplace networks. Those connections matter for modeling things like disease spread. For instance, a kid exposed to illness at school comes home, spreads it to their family, and then to their sibling’s high school.

John: Right, and we’re starting to see social contact networks becoming a standard baseline: households, schools, workplaces, time spent in each. That helps people figure out what they’re actually trying to solve on top of that.

I’ve even seen people include information about which social media platforms people are on, or where they get their news (Fox, CNN, etc.), to model how information spreads. Have you seen ways that people are integrating that kind of data into synthetic populations without completely breaking the model?

Andrew: Yeah, that’s another area we’ve started exploring. It’s real research, but still experimental in nature. One project we’ve worked on looks at vaccination uptake. In New York State, we have good data on who got vaccinated and when. But we know that’s a socially and politically charged topic.

We can analyze social media with NLP techniques to see who’s pro- or anti-vaccine. But the question is: why do people actually get vaccinated? Is it because of social media? Their family? Their coworkers?

We explored this in a county south of Buffalo, half rural and half urban. We created three types of networks: kinship (family), relational (people you see regularly), and social media. We found that, in this county, people were more influenced by family and daily contacts than by social media.
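One way such a multi-layer influence rule could be sketched is shown below; the layer weights and the tiny example networks are purely illustrative, not the values from the study Andrew describes:

```python
import random

# Each agent weighs vaccinated contacts across three network layers: kinship,
# relational (daily contacts), and social media. The weights can differ by
# region (e.g. family dominating in a rural county).
WEIGHTS_RURAL = {"kinship": 0.5, "relational": 0.4, "social_media": 0.1}

def adoption_probability(agent, networks, vaccinated, weights):
    """Weighted share of an agent's contacts who are already vaccinated."""
    p = 0.0
    for layer, w in weights.items():
        contacts = networks[layer].get(agent, [])
        if contacts:
            p += w * sum(c in vaccinated for c in contacts) / len(contacts)
    return p

# Toy data: three agents connected across three layers.
networks = {
    "kinship": {"a": ["b"], "b": ["a"], "c": []},
    "relational": {"a": ["c"], "b": ["c"], "c": ["a", "b"]},
    "social_media": {"a": ["b", "c"], "b": ["a"], "c": ["a"]},
}
vaccinated = {"b"}

for step in range(5):
    for agent in ("a", "b", "c"):
        if agent not in vaccinated and random.random() < adoption_probability(
            agent, networks, vaccinated, WEIGHTS_RURAL
        ):
            vaccinated.add(agent)

print(vaccinated)
```

Refitting the weights for a different region, as in the statewide New York run, would just mean swapping in a different weight dictionary.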

Then we scaled the model to all of New York State. The weighting that worked in the rural county didn’t work statewide. In New York City, online influence seemed to play a bigger role.

So, synthetic populations and network models can help us explore those dynamics and see how they vary by region.

John: And you can imagine different network structures like in urban areas where social mobility or peer groups might matter more than family ties, compared to more rural areas. It's fascinating.

Andrew: Yeah, I never thought I’d be exploring things like opinion dynamics or social media influence, but now we have the tools to do it. Networks are so important because they influence behavior. But they also add complexity. Earlier, we talked about how fast models run now, but when we modeled New York State, 20 million people and all their networks, it took 22 hours to run a single iteration on a high-powered server.

Yes, we could probably optimize it more or move it to the cloud, but we’re still hitting bottlenecks with large-scale simulations that involve lots of agent movement and interaction.

John: Yeah, we’ve seen that too. Even with the biggest cloud instances, sometimes it’s hard to get a full simulation to complete. But we’ll keep working on it.

Andrew: Exactly. I used to work on distributed MASON, trying to run models across multiple nodes. It works well for non-spatial models, but when you add GIS, movement, and networks, the communication between nodes gets heavy and slows things down. One challenge for the community is figuring out how to scale up without simplifying the models too much.

John: We’ve probably got time for two or three more questions. One we like to ask everyone: what’s a “what the flux” moment for you? If you could go back in time to a key moment in history and run some simulations to understand potential outcomes, what would it be?

Andrew: That’s a really good one. For me, I’d say Brexit. That decision caused a lot of tension in my family, actually. It would’ve been great to have a model to explore how people might vote, how the outcome could’ve gone differently, and what the long-term consequences might be. That’s a very personal and short-term example, but the cascading impact has been huge.

And it’s also a good reminder that models have limits: we can’t model everything. But it would be fascinating to try to pull together the economic, social, and political threads to see what might’ve played out differently.

John: Absolutely. And to wrap up, two final questions. First, what excites you about the future of agent-based modeling? And second, what’s the best way for someone new to get started?

Andrew: What excites me right now, and maybe it’s cliché, is AI, especially tools like ChatGPT. You can upload a paper, ask it to summarize it, pull out key points, even generate model code in Python or NetLogo.

Yes, the code might not be perfect, but, let’s be honest, we make imperfect models ourselves. These tools lower the barrier to entry. You used to need strong programming skills to build a model. Now, with AI, you still need some, but not nearly as much.

We’re also seeing more ways to build models quickly. You can run NetLogo models in the browser, for example. I’ve had students start by just playing with simple online models. From there, they get curious, and ChatGPT can help them build their first models without too much frustration.

One example: we built a geographically explicit restaurant choice model in St. Louis using Yelp data. We used ChatGPT to read relevant papers, identify the important variables (cost, quality, service), and then help us build a model. In half an hour, we had a working version. That’s huge.
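For a sense of what a quickly assembled version of such a model might look like, here is a toy utility-based (logit) restaurant choice in Python; the restaurants, attribute values, and preference weights are invented, not the St. Louis Yelp data:

```python
import math
import random

# Each agent scores restaurants on cost, quality, and service, then picks one
# with probability proportional to exp(utility), i.e. a multinomial logit.
restaurants = [
    {"name": "Diner A", "cost": 2, "quality": 3.5, "service": 4.0},
    {"name": "Bistro B", "cost": 4, "quality": 4.6, "service": 4.2},
    {"name": "Cafe C", "cost": 1, "quality": 3.9, "service": 3.1},
]
BETA = {"cost": -0.8, "quality": 1.2, "service": 0.6}  # assumed preference weights

def choose(options, beta):
    utilities = [sum(beta[k] * r[k] for k in beta) for r in options]
    weights = [math.exp(u) for u in utilities]
    total = sum(weights)
    return random.choices(options, weights=[w / total for w in weights])[0]

print(choose(restaurants, BETA)["name"])
```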

So yeah, what excites me is how AI is making modeling more accessible. And that accessibility is going to bring in more people, which helps the whole field grow.

John: Awesome. Well, Andrew, thank you so much for stepping out of the conference and joining us on the podcast. Really exciting stuff.

Andrew: Thank you. Happy to be here.


