https://www.science.org/content/blog-post/answers-and-reasons-and-knowing-and-thinking
I spent a day at Williams College last week, which I enjoyed very much, and I found a part of my lecture there overlapping with a big topic in undergraduate education. I have a section in several of my talks where I speak about AlphaFold-type machine learning and its implications for drug discovery, and that seemed to fit rather closely into concerns that many professors are having about the effect of AI systems on coursework and learning. I’m sure that if that’s your line of work, the topic must come up so relentlessly that people are starting to lose their minds at the prospect of dealing with it again, but out here in blog-readership-land I think it might be worth some discussion.
One of the points I make when I talk about AlphaFold gets summed up like this (and it’s something I’ve said here before as well): if the Protein Folding Problem was set by God to force the human race to really understand the mechanisms behind protein structure, then, well. . .we cheated on the exam. Because we don’t understand those factors well enough to calculate such structures de novo, just using what we know about hydrogen bonds, torsional angles, steric hindrance, pi-stacking interactions and all the other things that add up energetically to stable protein conformations. I mean, we know a lot about those things, but we don’t know enough - not enough to take a big sample of protein sequences and derive from first principles the likely protein structures they’ll form. Most definitely we can’t do something like that with anything like the speed and success rate of the pattern-matching provided by AlphaFold-type machine learning.
We used the large and well-curated pile of structural data in the PDB to take that shortcut, and it has turned out that proteins use many of the same tricks and patterns and combinations often enough that this approach really has worked out well. Don’t get me wrong - AlphaFold-type structures are far from infallible, and that’s because there are still a great many interesting and important protein structural motifs that are not well enough represented in our structural data sets. The PDB itself is far, far from a random sampling of protein space, of course (for starters, it is extremely biased towards structures that have a greater propensity to form high-quality crystals!). But it still has a lot of great information in it, and the relentless repetition and re-use of structural types in natural protein space gave the human race a big opportunity to bypass all the first-principles stuff.
Which we took! And that brings up the question of what all this is for: do you want protein structures because they will tell you more about the complex thermodynamic balancing that goes into protein folding in a general sense, or do you want protein structures because you want to do something else with them? Like drug discovery, industrial enzyme design, and all those applications that depend so much more on you just having the answer than on how you got to that answer.
And here of course is where we split off from education. When you’re learning chemistry and biology, or honestly when you’re learning anything at all, the “just gimme the answer” impulse is toxic behavior that one should avoid. This is why so many writers - and I am very definitely one of them - have such an aversion to the sales pitches for LLM writing assistants offering to compose, revise, summarize the things I’m writing about. Like so many other people, I write to think and I think to write. Putting my thoughts down into some sort of order for a blog post, for example, is one of the ways that I organize my thinking. If some chatbot slurps up the source material, runs it through a blender, and excretes it out again for me in little processed nuggets, that does that thinking process no good whatsoever. But so many chatbot pitches seem to just assume that I want to dodge all that haaaard stuff and just get right to a convenient bullet-pointed answer.
So you can see the problem with undergraduate course work, and believe me, professors have been seeing it for quite some time now. You assign your students material to read, digest, and summarize in an assignment because that is supposed to give their minds the experience of taking in this new material, making sense of it, and making enough sense of it that they can then speak or write coherently about it. It’s work! But that’s one of the few reliable ways, in most cases, to learn anything. Having Chat O Matic give you a handy four-paragraph summary to turn in, though, is a reliable way to learn little or nothing. Box checked, you did the assignment, what’s next?
All the situation needs is a professor who’s turned over the first hard steps of grading to chatbot software as well and you can take the darn humans - and their darn brains - right out of the loop. It reminds me of the old Russian joke about “As long as they pretend to pay me, I’ll pretend to work”. This is a tough problem, and the best answers to it are not yet apparent. But everyone seems to be in agreement that “Just let the students fill in the blanks with whatever good answers they can get, however they can get them” has never been a good answer itself, and never will be.
Now, those of us doing research in which protein structures can be helpful, we are glad to have the modeled ones that we get (even if we should always remember to take them only for what they’re worth). We have things to do with them, as mentioned above. But I keep thinking that at some point it would do us all good if we understood the material well enough to be able to generate these answers without pattern-matching to structures that we’ve already determined experimentally. It would be valuable to understand hydrogen bonding and pi-stacking and all the rest of it well enough that we could simulate them computationally without generating great big ol’ error bars on the results, and the techniques that we would have to develop to sum all of these things up and balance them out across entire protein structures would actually be quite impressive (they’d have to be!)
Are we going to ever do that? I think so. . .but there seems little doubt that AlphaFold and its competitors have taken the pressure off those lines of research. They’re hard questions! And if you would just rather have the answers, well, odds are that we can get you some much more quickly and painlessly. Do you want to know, or do you want to understand? Like the old Jack Benny routine where a robber threatens him with “Your money or your life” and he stalls, saying “I’m thinking it over!”, we have the opportunity to think about this one, too. Lucky us?