Toward Cognitive Models in Explainable AI: Mechanistic Interpretability as a Missing Link?

Date

Sep 20, 2023 — Sep 23, 2023

Event

Biennial Conference of the European Society for Philosophy of Science

Location

Belgrade, Serbia

Because Explainable AI (XAI) faces challenges similar to the ones facing cognitive science, the explanatory strategies of the latter may be useful guides for the former. Top-down and bottom-up strategies are used in cognitive science to create cognitive models, which describe cognitive processes as computational algorithms. However, such models remain elusive in XAI. Mechanistic interpretability is an approach that identifies interpretable structure within a network to explain its global behavior. By studying a network’s internal parameters, this method determines the algorithm the network has learned. This approach has made first steps toward explaining machine vision and natural language processing. Although preliminary, this work resembles cognitive modeling efforts in cognitive science by combining top-down behavioral observations with bottom-up investigations of the underlying mechanisms. Therefore, mechanistic interpretability deserves closer philosophical scrutiny.

Collaborative work with Carlos Zednik.

Céline Budding

PhD candidate in Philosophy of AI