Patterns and Predictability in Borrower Behavior

Readers are idiosyncratic. Recent work with library data has underscored this fact more than ever, but it has also drawn attention to specific patterns in reading behavior – many of them unexpected. What, then, is the relation between borrowing patterns and borrower idiosyncrasy at the broader level of overall library use, across many patrons’ full borrowing histories?

I had the pleasure of presenting on this subject at the excellent Library Circulation Histories Workshop over the past couple of weeks. I explored approaches to logistic modeling with data from What Middletown Read, the database of library checkout records from Muncie, Indiana, in order to test the degree to which patrons’ borrowing histories are predictive of whether they borrowed a book by any given author.
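As a rough illustration of this kind of setup (not the code used for the presentation), the sketch below fits a tiny logistic regression by gradient descent. Everything in it is an assumption made for the example: the two predictor columns stand for whether a patron borrowed two hypothetical authors, the target is whether the patron borrowed a third, and the toy rows are invented rather than drawn from What Middletown Read.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and a bias by batch gradient descent on the log-loss."""
    n = len(X)
    n_features = len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for row, target in zip(X, y):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - target  # gradient of the log-loss with respect to the score
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_proba(weights, bias, row):
    """Probability that a patron with this borrowing history borrowed the target author."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))

# Hypothetical toy data: each row is one patron.
# Columns: [borrowed hypothetical author A, borrowed hypothetical author B]
X = [[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1], [1, 1], [0, 0]]
# Target: 1 if the patron borrowed the (hypothetical) author being modeled.
y = [1, 1, 1, 0, 0, 0, 1, 0]

weights, bias = fit_logistic(X, y)
print(predict_proba(weights, bias, [1, 0]))  # patron who borrowed author A only
print(predict_proba(weights, bias, [0, 1]))  # patron who borrowed author B only
```

In this toy case, borrowing author A lines up perfectly with the target, so the model assigns such patrons a high probability. Real checkout data is far sparser and noisier, which is part of what makes the question of predictability interesting.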

In the process, my goal was to evaluate the assumptions and consequences of modeling borrower behavior itself given the peculiarities of library checkout data and the mechanics of logistic regression. I suggest that we should treat models of borrowing habits or patron predictability as measures of the legibility of taste. In the recording below, I also discuss implications of:

Here is a copy of the slides that I used as well.

Though I won’t go into the detail here that I do in my presentation, the figure below shows concluding results for several important authors (after a few key sampling and method optimizations). Each dot represents a patron who either borrowed (orange) or did not borrow (blue) the author named in each plot; the y-axis marks the probability assigned by the model to each case. The plots thus correspond to a classic confusion matrix: true positives in upper left, false positives in upper right, false negatives in lower left, and true negatives in lower right. I’ve printed the accuracy (proportion of all predictions that are correct) and sensitivity (proportion of actual positive cases that are correctly predicted) for each author model.
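To make the two metrics concrete, here is a minimal sketch of how accuracy and sensitivity can be computed from a model's predicted probabilities. The 0.5 cutoff, the function names, and the example values are assumptions for illustration only; they are not taken from the author models in the figure.

```python
def confusion_counts(probs, borrowed, threshold=0.5):
    """Count true/false positives and negatives at a probability cutoff.

    probs:     model-assigned probabilities, one per patron
    borrowed:  1 if the patron actually borrowed the author, else 0
    threshold: assumed cutoff separating predicted borrowers from non-borrowers
    """
    tp = fp = fn = tn = 0
    for p, actual in zip(probs, borrowed):
        predicted = p >= threshold
        if predicted and actual:
            tp += 1
        elif predicted and not actual:
            fp += 1
        elif not predicted and actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Proportion of all predictions that are correct."""
    return (tp + tn) / (tp + fp + fn + tn)

def sensitivity(tp, fn):
    """Proportion of actual borrowers that the model correctly identifies."""
    return tp / (tp + fn)

# Hypothetical example: three actual borrowers followed by three non-borrowers.
probs = [0.9, 0.7, 0.3, 0.6, 0.2, 0.1]
borrowed = [1, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(probs, borrowed)
print(tp, fp, fn, tn)
print(accuracy(tp, fp, fn, tn), sensitivity(tp, fn))
```

Sensitivity matters here because most patrons never borrow any given author: a model can reach high accuracy simply by predicting "did not borrow" for everyone, while identifying none of the actual borrowers.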

Based on the qualified success of prediction despite limitations outlined above, we can identify four key factors that improve the fit of models of whether patrons borrowed an author’s work – in other words, four key scenarios in which a patron’s underlying taste is more legible or internally consistent.

Several co-participants and I subsequently had a great forum discussion that touched on some additional issues surrounding multicollinearity and temporal unevenness when working with cultural data, which I could only briefly address in my presentation. There’s clearly a lot more exciting work to be done in this area (and if anyone reading this is doing it, I hope you’ll drop me a line).