Fun with Prolog: Six Degrees of Kevin Bacon
I was introduced to Prolog years ago in university, but back then I didn't pay much attention to it or wasn't interested at all (where can you get a job coding in Prolog anyway?). After having met a colleague who has done some serious stuff with the language, production quality code, I decided to have a look again at my lecture notes and assignments from my university course.
I don't want to bore you with what Prolog is and what is its purpose (it's all very detailed in the Wikipedia article). Instead, I wanna show you in this post how to solve with logical programming and Prolog (SWI-Prolog to be more precise) a very fun problem that was part of one of my assignments: calculating an actor's Bacon Number.
The Bacon Number is an application of the Erdős Number, named after the Hungarian mathematician Paul Erdős, but instead of Erdős and mathematicians we use Kevin Bacon and movie actors (in fact, the original assignment used the Erdős Number but I like movies more than math). You've all heard of "Six Degrees of Kevin Bacon", right? From Wikipedia:
The Bacon number of an actor or actress is the number of degrees of separation he or she has from Bacon, as defined by the game. This is an application of the Erdős number concept to the Hollywood movie industry. The higher the Bacon number, the farther away from Kevin Bacon the actor is.
The computation of a Bacon number for actor X is a "shortest path" algorithm, applied to the co-stardom network:
- Kevin Bacon himself has a Bacon number of 0.
- Those actors who have worked directly with Kevin Bacon have a Bacon number of 1.
- If the lowest Bacon number of any actor with whom X has appeared in any movie is N, X's Bacon number is N+1.
I am in no way a Prolog expert and my solution will surely be inefficient and not the best probably. Feel free to email me with comments.
The Facts -> The Cast
Prolog programs are nothing but facts, rules and queries. Facts are unconditionally true, for example we can state the following:
Here we are stating that Kevin Bacon is an actor. For the Bacon Number problem our facts are gonna be movies and their cast. We're going to represent the cast of a movie as a list of actors. Let's take for example Footloose, to keep things simple we're going to list only the first 4 actors in order of appearance in the movie's IMDB page:
%% Footloose (1984) cast(['Kevin Bacon', 'Lori Singer', 'John Lithgow', 'Dianne Wiest']).
If we were to query SWI-Prolog this fact it would return true (duh!):
?- cast(['Kevin Bacon', 'Lori Singer', 'John Lithgow', 'Dianne Wiest']). true .
But we could also query which casts we have in our knowledge base. We do that by using variables, which must start with an uppercase:
?- cast(X). X = ['Kevin Bacon', 'Lori Singer', 'John Lithgow', 'Dianne Wiest'] .
Also starring: Rules
Now we will construct a simple rule (also called clauses). For this problem we need to see if actors worked together on some movie, or co-starred. Here's the rule:
%% tells whether two actors co-starred in some movie co_starred(X, Y) :- cast(L), member(X, L), member(Y, L), X \== Y.
This rule can be read as: actor X co-starred with actor Y if there's a cast L in which X is a member of and Y is a member of as well and we also check that X is different than Y. to make sure we're talking about different people. As you can see, logical "and" is represented as commas. Let's try it out by running some queries:
?- co_starred('Kevin Bacon', 'Val Kilmer'). false. ?- co_starred('Kevin Bacon', 'Tom Cruise'). true .
Great! We know that Val Kilmer hasn't co-starred with Kevin Bacon so when we pose the query we get false, but we do know that Kevin Bacon and Tom Cruise were together in A Few Good Men and as we would expect, our query returns true.
The Bacon Number problem is a classic graph problem in which we want to get from A to B. In this case, the nodes of our graph are movie actors and the edges are the "co-starred" relationship between them. For traversing our actors graph we'll use Depth-first search. I used an implementation I found on Stack Overflow:
%% depth-first search algorithm next_node(C, N, P) :- co_starred(C, N), \+ member(N, P). dfs(G, G, _, [G]). dfs(S, G, V, [S|P]) :- next_node(S, NN, V), dfs(NN, G, [NN|V], P).
next_node(C, N, P) rule we "move" from node to node (from actor to actor I mean). Basically, if given two nodes/actors C (Current) and N (Next) and a path P (a list of actors), it checks whether they're co-stars (i.e. there's an edge between the two nodes) and that the next node/actor (N) is not part of that path (
member(X, L) tells us whether X is part of list L).
Then we have the
dfs(S, G, V, [S|P]) rule. This rule has been defined twice, which is a way of using a logical "or". The following
dfs(G, G, _, [G]) is always true, which means that using depth-first search, when searching for G (Goal) and our starting point is G, the path is going to be a list with nothing but G (
[G]). The general case,
dfs(S, G, V, [S|P]) where S is our starting node, G our goal, V our visited nodes and P the resulting path, where S is the head of that list. The rule checks that there's a node NN next to S and with visited nodes V and also that there's a path from the next node NN to our goal G but we've appended NN to our visited nodes (
This rule will return a multitude of paths, but the one we're interested is the shortest one, that one gives us our Bacon Number, since the path is a list the Bacon Number will be the length of that list minus 1.
Putting it all together
Prolog has a built-in predicate for getting a list's length, but for this problem we would like instead to get that value minus 1, so we create a new rule called
count_edges for this, in which a list with only 1 element corresponds to 0:
%% counts the edges in a path represented by a list count_edges([_], 0). count_edges([_|T], N) :- count_edges(T, N1), N is N1 + 1.
Next up is getting the shortest path. We want to first get all the paths starting from Kevin Bacon to our actor X and starting with an empty list of visited nodes. Having struggled how to get the path with the least amount of nodes (i.e. the shortest one) I found this neat thing on Stack Overflow: use the
setof predicate. I have to admit I still don't quite understand its magic, but what we do is get tuples with the paths and their lengths,
setof deals with the sorting, and we just return the shortest path SP.
Having that shortest path we use
count_edges for the Bacon Number!
bacon_number(X, N) :- setof((L, P), (dfs('Kevin Bacon', X, , P), length(P, L)), [(_, SP)|_]), count_edges(SP, N).
Let's try it out!
?- bacon_number('Kevin Bacon', 0). true . ?- bacon_number(X, 1). X = 'Bill Paxton' ; X = 'Demi Moore' ; X = 'Dianne Wiest' ; X = 'Gary Sinise' ; X = 'Jack Nicholson' ; X = 'John Lithgow' ; X = 'Lori Singer' ; X = 'Tom Cruise' ; X = 'Tom Hanks' ; false. ?- bacon_number('Val Kilmer', N). N = 2 .
We see that Kevin Bacon's own Bacon Number is 0. We can also see the people that have a Bacon Number of 1 (you can check out The Oracle of Bacon to confirm these results) and finally we can ask for Val Kilmer's Bacon Number and we get 2 (Val Kilmer worked with Tom Cruise on Top Gun and Tom Cruise worked with Bacon on A Few Good Men).
I've put all this code in a Gist so you can try it for yourself. I will definitely get back to this problem when I learn more about Prolog. I'm pretty sure I've got stuff wrong, because when I put more and more facts my program gets slow, so there are memory issues for sure, which I don't know how to handle, yet.