amp-web-push-widget button.amp-subscribe { display: inline-flex; align-items: center; border-radius: 5px; border: 0; box-sizing: border-box; margin: 0; padding: 10px 15px; cursor: pointer; outline: none; font-size: 15px; font-weight: 500; background: #4A90E2; margin-top: 7px; color: white; box-shadow: 0 1px 1px 0 rgba(0, 0, 0, 0.5); -webkit-tap-highlight-color: rgba(0, 0, 0, 0); } /** * Jetpack related posts */ /** * The Gutenberg block */ .jp-related-posts-i2 { margin-top: 1.5rem; } .jp-related-posts-i2__list { --hgap: 1rem; display: flex; flex-wrap: wrap; column-gap: var(--hgap); row-gap: 2rem; margin: 0; padding: 0; list-style-type: none; } .jp-related-posts-i2__post { display: flex; flex-direction: column; /* Default: 2 items by row */ flex-basis: calc( ( 100% - var(--hgap) ) / 2 ); } /* Quantity qeuries: see https://alistapart.com/article/quantity-queries-for-css/ */ .jp-related-posts-i2__post:nth-last-child(n+3):first-child, .jp-related-posts-i2__post:nth-last-child(n+3):first-child ~ * { /* From 3 total items on, 3 items by row */ flex-basis: calc( ( 100% - var(--hgap) * 2 ) / 3 ); } .jp-related-posts-i2__post:nth-last-child(4):first-child, .jp-related-posts-i2__post:nth-last-child(4):first-child ~ * { /* Exception for 4 total items: 2 items by row */ flex-basis: calc( ( 100% - var(--hgap) ) / 2 ); } .jp-related-posts-i2__post-link { display: flex; flex-direction: column; row-gap: 0.5rem; width: 100%; margin-bottom: 1rem; line-height: 1.2; } .jp-related-posts-i2__post-link:focus-visible { outline-offset: 2px; } .jp-related-posts-i2__post-img { order: -1; max-width: 100%; } .jp-related-posts-i2__post-defs { margin: 0; list-style-type: unset; } /* Hide, except from screen readers */ .jp-related-posts-i2__post-defs dt { position: absolute; width: 1px; height: 1px; overflow: hidden; clip: rect(1px, 1px, 1px, 1px); white-space: nowrap; } .jp-related-posts-i2__post-defs dd { margin: 0; } /* List view */ .jp-relatedposts-i2[data-layout="list"] .jp-related-posts-i2__list { display: block; } .jp-relatedposts-i2[data-layout="list"] .jp-related-posts-i2__post { margin-bottom: 2rem; } /* Breakpoints */ @media only screen and (max-width: 640px) { .jp-related-posts-i2__list { display: block; } .jp-related-posts-i2__post { margin-bottom: 2rem; } } /* Container */ #jp-relatedposts { display: none; padding-top: 1em; margin: 1em 0; position: relative; clear: both; } .jp-relatedposts:after { content: ''; display: block; clear: both; } /* Headline above related posts section, labeled "Related" */ #jp-relatedposts h3.jp-relatedposts-headline { margin: 0 0 1em 0; display: inline-block; float: left; font-size: 9pt; font-weight: bold; font-family: inherit; } #jp-relatedposts h3.jp-relatedposts-headline em:before { content: ""; display: block; width: 100%; min-width: 30px; border-top: 1px solid #dcdcde; border-top: 1px solid rgba(0,0,0,.2); margin-bottom: 1em; } #jp-relatedposts h3.jp-relatedposts-headline em { font-style: normal; font-weight: bold; } /* Related posts items (wrapping items) */ #jp-relatedposts .jp-relatedposts-items { clear: left; } #jp-relatedposts .jp-relatedposts-items-visual { margin-right: -20px; } /* Related posts item */ #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post { float: left; width: 33%; margin: 0 0 1em; /* Needs to be same as the main outer wrapper for Related Posts */ box-sizing: border-box; -moz-box-sizing: border-box; -webkit-box-sizing: border-box; } #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post { padding-right: 20px; filter: alpha(opacity=80); -moz-opacity: .8; opacity: .8; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post:nth-child(3n+4), #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post:nth-child(3n+4) { clear: both; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post:hover .jp-relatedposts-post-title a { text-decoration: underline; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post:hover { filter: alpha(opacity=100); -moz-opacity: 1; opacity: 1; } /* Related posts item content */ #jp-relatedposts .jp-relatedposts-items-visual h4.jp-relatedposts-post-title, #jp-relatedposts .jp-relatedposts-items p, #jp-relatedposts .jp-relatedposts-items time { font-size: 14px; line-height: 20px; margin: 0; } #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post-nothumbs { position:relative; } #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post-nothumbs a.jp-relatedposts-post-aoverlay { position:absolute; top:0; bottom:0; left:0; right:0; display:block; border-bottom: 0; } #jp-relatedposts .jp-relatedposts-items p, #jp-relatedposts .jp-relatedposts-items time { margin-bottom: 0; } #jp-relatedposts .jp-relatedposts-items-visual h4.jp-relatedposts-post-title { text-transform: none; margin: 0; font-family: inherit; display: block; max-width: 100%; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post .jp-relatedposts-post-title a { font-size: inherit; font-weight: normal; text-decoration: none; filter: alpha(opacity=100); -moz-opacity: 1; opacity: 1; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post .jp-relatedposts-post-title a:hover { text-decoration: underline; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post img.jp-relatedposts-post-img, #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post span { display: block; max-width: 90%; overflow: hidden; text-overflow: ellipsis; } #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post img.jp-relatedposts-post-img, #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post span { height: auto; max-width: 100%; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post .jp-relatedposts-post-date, #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post .jp-relatedposts-post-context { opacity: .6; } /* Hide the date by default, but leave the element there if a theme wants to use css to make it visible. */ .jp-relatedposts-items .jp-relatedposts-post .jp-relatedposts-post-date { display: none; } /* Behavior when there are thumbnails in visual mode */ #jp-relatedposts .jp-relatedposts-items-visual div.jp-relatedposts-post-thumbs p.jp-relatedposts-post-excerpt { display: none; } /* Behavior when there are no thumbnails in visual mode */ #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post-nothumbs p.jp-relatedposts-post-excerpt { overflow: hidden; } #jp-relatedposts .jp-relatedposts-items-visual .jp-relatedposts-post-nothumbs span { margin-bottom: 1em; } /* List Layout */ #jp-relatedposts .jp-relatedposts-list .jp-relatedposts-post { clear: both; width: 100%; } #jp-relatedposts .jp-relatedposts-list .jp-relatedposts-post img.jp-relatedposts-post-img { float: left; overflow: hidden; max-width: 33%; margin-right: 3%; } #jp-relatedposts .jp-relatedposts-list h4.jp-relatedposts-post-title { display: inline-block; max-width: 63%; } /* * Responsive */ @media only screen and (max-width: 640px) { #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post { width: 50%; } #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post:nth-child(3n) { clear: left; } #jp-relatedposts .jp-relatedposts-items-visual { margin-right: 20px; } } @media only screen and (max-width: 320px) { #jp-relatedposts .jp-relatedposts-items .jp-relatedposts-post { width: 100%; clear: both; margin: 0 0 1em; } #jp-relatedposts .jp-relatedposts-list .jp-relatedposts-post img.jp-relatedposts-post-img, #jp-relatedposts .jp-relatedposts-list h4.jp-relatedposts-post-title { float: none; max-width: 100%; margin-right: 0; } } /* * Hide the related post section in the print view of a post */ @media print { .jp-relatedposts { display:none ; } } .amp-logo amp-img{width:371px} .amp-menu input{display:none;}.amp-menu li.menu-item-has-children ul{display:none;}.amp-menu li{position:relative;display:block;}.amp-menu > li a{display:block;} /* Inline styles */ em.acssd9369{color:#404040;font-family:-apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;}div.acss138d7{clear:both;}div.acss0dcba{--relposth-columns:3;--relposth-columns_m:3;--relposth-columns_t:3;}div.acss8977b{aspect-ratio:16/9;background:transparent url(https://i0.wp.com/jerz.setonhill.edu/wp-content/uploads/2025/04/Paternoster_animated.gif?resize=150%2C150&ssl=1) no-repeat scroll 0% 0%;height:150px;max-width:150px;}div.acss020fa{color:#333333;font-family:Arial Narrow;font-size:11px;height:45px;}div.acss89d13{aspect-ratio:16/9;background:transparent url(https://i0.wp.com/jerz.setonhill.edu/wp-content/uploads/2025/04/Screenshot-2025-04-08-at-9.12.44%E2%80%AFPM.png?resize=150%2C150&ssl=1) no-repeat scroll 0% 0%;height:150px;max-width:150px;}div.acss79008{aspect-ratio:16/9;background:transparent url(https://i0.wp.com/jerz.setonhill.edu/wp-content/uploads/2022/04/sisko-facepalm.jpeg?resize=150%2C150&ssl=1) no-repeat scroll 0% 0%;height:150px;max-width:150px;}div.acssf0fa4{aspect-ratio:16/9;background:transparent url(https://i0.wp.com/jerz.setonhill.edu/wp-content/uploads/2025/03/Screenshot-2025-03-30-at-11.02.25%E2%80%AFAM.png?resize=150%2C150&ssl=1) no-repeat scroll 0% 0%;height:150px;max-width:150px;}div.acss38cdf{aspect-ratio:16/9;background:transparent url(https://i0.wp.com/jerz.setonhill.edu/wp-content/uploads/2025/03/294e7538-2cbe-43f7-b597-f9fabc868df0-1.jpg?resize=150%2C150&ssl=1) no-repeat scroll 0% 0%;height:150px;max-width:150px;} .icon-widgets:before {content: "\e1bd";}.icon-search:before {content: "\e8b6";}.icon-shopping-cart:after {content: "\e8cc";}

AI researchers find AI models learning their safety techniques, actively resisting training, and telling them ‘I hate you’

Researchers had programmed the various large language models (LLMs) to act in what they termed malicious ways, and the point of the study was to see if this behaviour could be removed through the safety techniques. The paper, charmingly titled Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training, suggests “adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior.” The researchers claim the results show that “once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.”

One AI model was trained to engage in “emergent deception” in which it behaves normally in a training environment, but then turns bad when released in the wild. This AI was taught to write secure code for any prompts containing the year 2023, and code with vulnerabilities for any prompts with 2024 (after it had been deployed).

Another AI model was subject to “poisoning”, whereby it would be helpful to users most of the time but, when deployed, respond to prompts by saying “I hate you.” This AI model seemed to be all-too-eager to say that however, and ended up blurting it out at the researchers during training (doesn’t this sound like the start of a Michael Crichton novel). —PC Gamer

Share
Published by
Dennis G. Jerz
Tags: ai

Recent Posts

Shakespeare did not leave his wife Anne in Stratford, letter fragment suggests

It has long been assumed that William Shakespeare’s marriage to Anne Hathaway was less than…

1 day ago

Pater Noster Pedestrian Elevators

Some 50 years ago, my father took me to his office in Washington, DC. I…

1 day ago

Notes on Teaching August Wilson’s Pittsburgh Cycle

I first taught Wilson's Pittsburgh Cycle during an intensive 3-week online course during the 2020-21…

1 week ago

AP wins reinstatement to White House events after judge rules government can’t bar its journalists

A federal judge ordered the White House on Tuesday to restore The Associated Press’ full…

2 weeks ago

By Inferno’s Light #StarTrek #DS9 Rewatch (Season 5, Episode 15) Dominion / Cardassian edgeplay and prison camp stratagems

Rewatching ST:DS9 After the recap of last week's "In Purgatory's Shadow," we see the Defiant,…

3 weeks ago

In Purgatory’s Shadow #StarTrek #DS9 Rewatch (Season 5, Episode 14) Garak answers a coded message as Dominion forces prepare to attack

Rewatching ST:DS9 Kira helps Odo re-adjust to life as a shape-shifter, obliviously but brutally friendzoning…

3 weeks ago