{"id":82,"date":"2021-04-29T21:37:22","date_gmt":"2021-04-29T21:37:22","guid":{"rendered":"https:\/\/sites.ps.uci.edu\/clivar\/?p=82"},"modified":"2021-04-30T19:37:18","modified_gmt":"2021-04-30T19:37:18","slug":"idk-neural-networks","status":"publish","type":"post","link":"https:\/\/sites.ps.uci.edu\/clivar\/2021\/04\/29\/idk-neural-networks\/","title":{"rendered":"IDK: Neural networks that say \u201cI don\u2019t know\u201d to learn better"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I teach the introductory atmospheric dynamics course for our master\u2019s students. A few years ago I noticed that the incoming students had perfected the art of partial credit: finding ways to get as many points as possible on exams without actually knowing the answers. While this may have led to a better final grade, it sort of missed the point. Furthermore, a critical skill for any scientist is to &#8220;know when they know and know when they don\u2019t know&#8221;, and this class seemed like a great opportunity to get this message across.<\/p>\n\n\n\n<figure class=\"wp-block-pullquote\"><blockquote><p>A critical skill for any scientist is to <em>know when they know and know when they don\u2019t know<\/em><\/p><\/blockquote><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Because of this, for the past few years I have used a new grading system on exams: partial credit is gone, but students may write \u201cIDK\u201d (I don\u2019t know) on any question and turn in their solution 24 hours later for half credit. The idea is to make them pause and assess whether they actually know the material.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">An unexpected outcome of this grading experiment was that it made me reconsider how I use artificial neural networks (ANNs) in my scientific research. Why do I value scientists knowing when they don\u2019t know, but then turn around and train an ANN to provide an answer for everything, all of the time? 
Put another way, I would rather an ANN say \u201cIDK\u201d than get the answer horrendously wrong.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The concept of ANN confidence is nothing new; in fact, uncertainty quantification for ANNs (of both the inputs and the outputs) has recently become a very hot topic. Furthermore, ANNs designed for classification tasks already output a measure of confidence, to a certain extent, through their softmax probabilities. So, what\u2019s new here?<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Well, in many situations we would never expect to get the answer right all of the time; in fact, it may be impossible. Instead, we must look for particular states of the system that are more predictable than others (i.e., <strong>windows of opportunity<\/strong>; see <a rel=\"noreferrer noopener\" href=\"https:\/\/journals.ametsoc.org\/view\/journals\/bams\/101\/5\/bams-d-18-0326.1.xml\" data-type=\"URL\" data-id=\"https:\/\/journals.ametsoc.org\/view\/journals\/bams\/101\/5\/bams-d-18-0326.1.xml\" target=\"_blank\">Mariotti et al. 2019<\/a> for applications to subseasonal-to-seasonal prediction), and do our best to learn from and exploit these periods of predictability when they occur. 
When we train ANNs to provide an answer for every sample, they are likely wasting effort trying to learn samples that may not be predictable in the first place.<\/p>\n\n\n\n<div class=\"wp-block-columns is-layout-flex wp-container-core-columns-is-layout-8f761849 wp-block-columns-is-layout-flex\">\n<div class=\"wp-block-column is-layout-flow wp-block-column-is-layout-flow\" style=\"flex-basis:100%\">\n<div class=\"wp-block-media-text alignwide has-media-on-the-right is-stacked-on-mobile\" style=\"grid-template-columns:auto 40%\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_data_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1024x1024.png\" alt=\"\" class=\"wp-image-83 size-full\" srcset=\"https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_data_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1024x1024.png 1024w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_data_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-300x300.png 300w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_data_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-150x150.png 150w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_data_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-768x768.png 768w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_data_olsr1_AbstentionLogLoss_npSeed99_networkSeed0.png 1500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"wp-block-paragraph\">Take the simple 1D example in the figure to the right. Naively fitting a straight line through all of this data would produce a fit that performs poorly on most samples. 
Instead, I want an ANN that can learn to predict the samples along the line with high accuracy and also identify the samples within the point cloud as highly unpredictable (i.e., say \u201cIDK\u201d). That is, I want an ANN that can say \u201cIDK\u201d <strong>while it is training<\/strong>.<\/p>\n<\/div><\/div>\n<\/div>\n<\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Motivated by this, over the past few months my collaborator Dr. Randal Barnes (yes, there&#8217;s a relation&#8230;he&#8217;s my dad) and I have explored ANN loss functions for <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/2104.08236\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/abs\/2104.08236\" target=\"_blank\">regression<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/2104.08281\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/abs\/2104.08281\" target=\"_blank\">classification<\/a> tasks, and we call the resulting networks Controlled Abstention Networks (CAN). The CAN identifies and learns from the more confident samples and abstains (says \u201cIDK\u201d) on the less confident samples <strong>during training<\/strong>. Our work builds heavily on that of <a rel=\"noreferrer noopener\" href=\"https:\/\/arxiv.org\/abs\/1905.10964\" data-type=\"URL\" data-id=\"https:\/\/arxiv.org\/abs\/1905.10964\" target=\"_blank\">Thulasidasan et al. (2019)<\/a> and <a rel=\"noreferrer noopener\" href=\"https:\/\/digital.lib.washington.edu\/researchworks\/handle\/1773\/45781\" data-type=\"URL\" data-id=\"https:\/\/digital.lib.washington.edu\/researchworks\/handle\/1773\/45781\" target=\"_blank\">Thulasidasan (2020)<\/a>, which first introduced us to the concept of a simple abstention loss function for classification. 
We have found that when the network is able to <strong>abstain during training<\/strong>, it learns the predictable relationships better.<\/p>\n\n\n\n<figure class=\"wp-block-pullquote is-style-default\"><blockquote><p>&#8230;when the network is able to <em>abstain during training<\/em>, it learns the predictable relationships better.<\/p><\/blockquote><\/figure>\n\n\n\n<div class=\"wp-block-media-text alignwide has-media-on-the-right is-stacked-on-mobile\" style=\"grid-template-columns:auto 40%\"><figure class=\"wp-block-media-text__media\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"1024\" src=\"https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_predict_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1-1024x1024.png\" alt=\"\" class=\"wp-image-91 size-full\" srcset=\"https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_predict_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1-1024x1024.png 1024w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_predict_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1-300x300.png 300w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_predict_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1-150x150.png 150w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_predict_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1-768x768.png 768w, https:\/\/sites.ps.uci.edu\/clivar\/wp-content\/uploads\/sites\/30\/2021\/04\/scatter_predict_olsr1_AbstentionLogLoss_npSeed99_networkSeed0-1.png 1500w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><div class=\"wp-block-media-text__content\">\n<p class=\"wp-block-paragraph\">Returning to our 1D example, the CAN identifies, and learns from, the samples that fall along the well-defined line. 
Furthermore, the CAN learns to abstain on the samples within the point cloud. This example is admittedly very simple, but in our papers we go beyond it and demonstrate the utility of the CAN for several complex climate applications.<\/p>\n<\/div><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">What excites me most about the CAN concept is its wide range of applications. We have found it useful for discovering particular input states that lead to predictable behaviour, for identifying labels\/outputs that are more predictable than others, and even as a data cleaner (i.e., filtering out noisy samples during training) so that the network learns the non-noisy samples better.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I never expect to predict the dynamics of the climate system perfectly; chaos theory sort of put that one to bed. Instead, abstention during training can help me search the data for predictable relationships, accommodating &#8220;IDK&#8221; along the way. That\u2019s a good thing, as I doubt the Earth system gives partial credit.<\/p>\n\n\n\n<hr class=\"wp-block-separator\" \/>\n\n\n\n<h5 class=\"wp-block-heading\">References for the CAN<\/h5>\n\n\n\n<ul class=\"wp-block-list\"><li>Barnes, Elizabeth A. and Randal J. Barnes (2021a): Controlled abstention neural networks for identifying skillful predictions for classification problems, submitted to JAMES, 04\/2021, preprint available at&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2104.08281\">https:\/\/arxiv.org\/abs\/2104.08281<\/a>.<\/li><li>Barnes, Elizabeth A. and Randal J. 
Barnes (2021b): Controlled abstention neural networks for identifying skillful predictions for regression problems, submitted to JAMES, 04\/2021, preprint available at&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/2104.08236\">https:\/\/arxiv.org\/abs\/2104.08236<\/a>.<\/li><\/ul>\n\n\n\n<h5 class=\"wp-block-heading\">Fundamental references for this work<\/h5>\n\n\n\n<ul class=\"wp-block-list\"><li>Thulasidasan, S., T. Bhattacharya, J. Bilmes, G. Chennupati, and J. Mohd-Yusof, 2019: Combating Label Noise in Deep Learning Using Abstention.&nbsp;<a href=\"https:\/\/arxiv.org\/abs\/1905.10964\">https:\/\/arxiv.org\/abs\/1905.10964<\/a>.<\/li><li>Thulasidasan, S., 2020: Deep Learning with abstention: Algorithms for robust training and predictive uncertainty.&nbsp;<a href=\"https:\/\/digital.lib.washington.edu\/researchworks\/handle\/1773\/45781\">https:\/\/digital.lib.washington.edu\/researchworks\/handle\/1773\/45781<\/a>.<\/li><\/ul>\n","protected":false},"excerpt":{"rendered":"<p>I teach the introductory atmospheric dynamics course for our master\u2019s students. A few years ago I noticed that the incoming students had perfected the art of partial credit: finding ways to get as many points as possible on exams without actually knowing the answers. 
While this may have led to a better final grade, [&hellip;]<\/p>\n","protected":false},"author":48,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[2,4],"class_list":["post-82","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-machine-learning","tag-uq"],"_links":{"self":[{"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/posts\/82","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/users\/48"}],"replies":[{"embeddable":true,"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/comments?post=82"}],"version-history":[{"count":32,"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/posts\/82\/revisions"}],"predecessor-version":[{"id":117,"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/posts\/82\/revisions\/117"}],"wp:attachment":[{"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/media?parent=82"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/categories?post=82"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/sites.ps.uci.edu\/clivar\/wp-json\/wp\/v2\/tags?post=82"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}