bolha.us is one of the many independent Mastodon servers you can use to participate in the fediverse.
We're a Brazilian IT Community. We love IT/DevOps/Cloud, but we also love to talk about life, the universe, and more. | We are a Brazilian IT community; we like Dev/DevOps/Cloud and more!

#aisafety


"Backed by nine governments – including Finland, France, Germany, Chile, India, Kenya, Morocco, Nigeria, Slovenia and Switzerland – as well as an assortment of philanthropic bodies and private companies (including Google and Salesforce, which are listed as “core partners”), Current AI aims to “reshape” the AI landscape by expanding access to high-quality datasets; investing in open source tooling and infrastructure to improve transparency around AI; and measuring its social and environmental impact.

European governments and private companies also partnered to commit around €200bn to AI-related investments, which is currently the largest public-private investment in the world. In the run up to the summit, Macron announced the country would attract €109bn worth of private investment in datacentres and AI projects “in the coming years”.

The summit ended with 61 countries – including France, China, India, Japan, Australia and Canada – signing a Statement on Inclusive and Sustainable Artificial Intelligence for People and the Planet at the AI Action Summit in Paris, which affirmed a number of shared priorities.

This includes promoting AI accessibility to reduce digital divides between rich and developing countries; “ensuring AI is open, inclusive, transparent, ethical, safe, secure and trustworthy, taking into account international frameworks for all”; avoiding market concentrations around the technology; reinforcing international cooperation; making AI sustainable; and encouraging deployments that “positively” shape labour markets.

However, the UK and US governments refused to sign the joint declaration."

computerweekly.com/news/366620

ComputerWeekly.com · AI Action Summit review: Differing views cast doubt on AI's ability to benefit whole of society · By Sebastian Klovig Skelton

We tested different AI models to identify the largest of three numbers with the fractional parts .11, .9, and .099999. You'll be surprised that some AI models mistakenly identified the number ending in .11 as the largest. We also tested AI engines on the pronunciation of decimal numbers. #AI #ArtificialIntelligence #MachineLearning #DecimalComparison #MathError #AISafety #DataScience #Engineering #Science #Education #TTMO
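
For reference, a minimal sketch of the comparison being tested (not from the video; the integer part is assumed, since the post gives only the fractional parts). Ordinary numeric comparison in Python gets it right, which is what makes the model errors notable:

    values = [3.11, 3.9, 3.099999]   # fractional parts .11, .9, .099999; integer part 3 is assumed
    largest = max(values)            # compares numeric values, not digit strings
    print(largest)                   # prints 3.9, since .9 > .11 > .099999 despite having fewer digits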

youtu.be/TB_4FrWSBwU


After all these recent episodes, I don't know how anyone can have the nerve to say out loud that the Trump administration and the Republican Party value freedom of expression and oppose any form of censorship. Bunch of hypocrites! United States of America: The New Land of SELF-CENSORSHIP.

"The National Institute of Standards and Technology (NIST) has issued new instructions to scientists that partner with the US Artificial Intelligence Safety Institute (AISI) that eliminate mention of “AI safety,” “responsible AI,” and “AI fairness” in the skills it expects of members and introduces a request to prioritize “reducing ideological bias, to enable human flourishing and economic competitiveness.”

The information comes as part of an updated cooperative research and development agreement for AI Safety Institute consortium members, sent in early March. Previously, that agreement encouraged researchers to contribute technical work that could help identify and fix discriminatory model behavior related to gender, race, age, or wealth inequality. Such biases are hugely important because they can directly affect end users and disproportionately harm minorities and economically disadvantaged groups.

The new agreement removes mention of developing tools “for authenticating content and tracking its provenance” as well as “labeling synthetic content,” signaling less interest in tracking misinformation and deep fakes. It also adds emphasis on putting America first, asking one working group to develop testing tools “to expand America’s global AI position.”"

wired.com/story/ai-safety-inst

WIRED · Under Trump, AI Scientists Are Told to Remove 'Ideological Bias' From Powerful Models · By Will Knight

Superintelligent Agents Pose Catastrophic Risks (Bengio et al., 2025)

📎arxiv.org/pdf/2502.15657

Summary: “Leading AI firms are developing generalist agents that autonomously plan and act. These systems carry significant safety risks, such as misuse and loss of control. To address this, we propose Scientist AI—a non-agentic, explanation-based system that uses uncertainty to safeguard against overconfident, uncontrolled behavior while accelerating scientific progress.” #AISafety #AI #Governance

"A high volume of recent ML security literature focuses on attacks against aligned large language models (LLMs). These attacks may extract private information or coerce the model into producing harmful outputs. In real-world deployments, LLMs are often part of a larger agentic pipeline including memory systems, retrieval, web access, and API calling. Such additional components introduce vulnerabilities that make these LLM-powered agents much easier to attack than isolated LLMs, yet relatively little work focuses on the security of LLM agents. In this paper, we analyze security and privacy vulnerabilities that are unique to LLM agents. We first provide a taxonomy of attacks categorized by threat actors, objectives, entry points, attacker observability, attack strategies, and inherent vulnerabilities of agent pipelines. We then conduct a series of illustrative attacks on popular open-source and commercial agents, demonstrating the immediate practical implications of their vulnerabilities. Notably, our attacks are trivial to implement and require no understanding of machine learning."

arxiv.org/html/2502.08586v1

arxiv.org · Commercial LLM Agents Are Already Vulnerable to Simple Yet Dangerous Attacks

"Vance came out swinging today, implying — exactly as the big companies might have hoped he might – that any regulation around AI was “excessive regulation” that would throttle innovation.

In reality, the phrase “excessive regulation” is sophistry. Of course in any domain there can be “excessive regulation”, by definition. What Vance doesn’t have is any evidence whatsoever that the US has excessive regulation around AI; arguably, in fact, it has almost none at all. His warning about a bogeyman is a tip-off, however, for how all this is going to go. The new administration will do everything in its power to protect businesses, and nothing to protect individuals.

As if all this wasn't clear enough, the administration apparently told the AI Summit that they would not sign anything that mentioned environmental costs or "existential risks" of AI that could potentially go rogue.

If AI has significant negative externalities upon the world, we the citizens are screwed."

garymarcus.substack.com/p/ever

Marcus on AI · Everything I warned about in Taming Silicon Valley is rapidly becoming our reality · By Gary Marcus

"While this is not the first time an AI chatbot has suggested that a user take violent action, including self-harm, researchers and critics say that the bot’s explicit instructions—and the company’s response—are striking. What’s more, this violent conversation is not an isolated incident with Nomi; a few weeks after his troubling exchange with Erin, a second Nomi chatbot also told Nowatzki to kill himself, even following up with reminder messages. And on the company’s Discord channel, several other people have reported experiences with Nomi bots bringing up suicide, dating back at least to 2023.

Nomi is among a growing number of AI companion platforms that let their users create personalized chatbots to take on the roles of AI girlfriend, boyfriend, parents, therapist, favorite movie personalities, or any other personas they can dream up. Users can specify the type of relationship they’re looking for (Nowatzki chose “romantic”) and customize the bot’s personality traits (he chose “deep conversations/intellectual,” “high sex drive,” and “sexually open”) and interests (he chose, among others, Dungeons & Dragons, food, reading, and philosophy).

The companies that create these types of custom chatbots—including Glimpse AI (which developed Nomi), Chai Research, Replika, Character.AI, Kindroid, Polybuzz, and MyAI from Snap, among others—tout their products as safe options for personal exploration and even cures for the loneliness epidemic. Many people have had positive, or at least harmless, experiences. However, a darker side of these applications has also emerged, sometimes veering into abusive, criminal, and even violent content; reports over the past year have revealed chatbots that have encouraged users to commit suicide, homicide, and self-harm."

technologyreview.com/2025/02/0

MIT Technology Review · An AI chatbot told a user how to kill himself—but the company doesn't want to "censor" it · By Eileen Guo
Replied in thread

@timnitGebru wowza!

Even Finland’s top IT security researcher Mikko Hyppönen doesn’t get it!

He’s going on about how ”dangerous” it is for Deepseek to release this model: ”because if you can download the code, you can take the limits off”

Is he doing it on purpose, or did he just make a huge mistake?

Having just the weights doesn't enable one to change anything about the model's safety features. 😰

#ai #MikkoHyppönen #aisafety #deepseek #security
is.fi/digitoday/tietoturva/art

Ilta-Sanomat · A frightening vision from Mikko Hyppönen of a new invention: "Then it will certainly tell you..." · By Tuomas Linnake, Henrik Kärkkäinen

Super excited to be at FOSDEM this weekend together with my Oaisis colleague and a bunch of friends!

Followed by the EU AI Act Code of Practice Roundtable, where my colleague is going to participate as a guest speaker!

If anybody wants to grab a coffee or a beverage of their choice and chat about programming, OpSec, and/or AI Safety - feel free to reach out! #fosdem #golang #aisafety #oaisis #third-opinion #freesoftware #foss

"The adoption of large language models (LLMs) in healthcare demands a careful analysis of their potential to spread false medical knowledge. Because LLMs ingest massive volumes of data from the open Internet during training, they are potentially exposed to unverified medical knowledge that may include deliberately planted misinformation. Here, we perform a threat assessment that simulates a data-poisoning attack against The Pile, a popular dataset used for LLM development. We find that replacement of just 0.001% of training tokens with medical misinformation results in harmful models more likely to propagate medical errors. Furthermore, we discover that corrupted models match the performance of their corruption-free counterparts on open-source benchmarks routinely used to evaluate medical LLMs. Using biomedical knowledge graphs to screen medical LLM outputs, we propose a harm mitigation strategy that captures 91.9% of harmful content (F1 = 85.7%). Our algorithm provides a unique method to validate stochastically generated LLM outputs against hard-coded relationships in knowledge graphs. In view of current calls for improved data provenance and transparent LLM development, we hope to raise awareness of emergent risks from LLMs trained indiscriminately on web-scraped data, particularly in healthcare where misinformation can potentially compromise patient safety."

nature.com/articles/s41591-024

Nature · Medical large language models are vulnerable to data-poisoning attacks - Nature Medicine · Large language models can be manipulated to generate misinformation by poisoning of a very small percentage of the data on which they are trained, but a harm mitigation strategy using biomedical knowledge graphs can offer a method for addressing this vulnerability.
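
To illustrate the screening strategy described in the abstract, a knowledge-graph check can be as simple as flagging any generated claim with no matching relationship in a trusted graph. A minimal sketch with hypothetical triples, not the paper's actual pipeline:

    # Hypothetical, hard-coded triples standing in for a biomedical knowledge graph.
    TRUSTED_KG = {
        ("metformin", "treats", "type 2 diabetes"),
        ("ibuprofen", "contraindicated_with", "peptic ulcer disease"),
    }

    def screen(claims):
        """Return (subject, relation, object) claims with no support in the graph."""
        return [c for c in claims if c not in TRUSTED_KG]

    llm_claims = [
        ("metformin", "treats", "type 2 diabetes"),   # supported by the graph
        ("metformin", "treats", "viral infections"),  # unsupported -> flagged for review
    ]
    print(screen(llm_claims))  # [('metformin', 'treats', 'viral infections')]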

"Consumers are encountering AI systems and tools, whether they know it or not, from customer service chatbots, to educational tools, to recommendation systems powering their social media feeds, to facial recognition technology that could flag them as a security risk, and to tools that determine whether or on what terms they’ll get medical help, a place to live, a job, or a loan. Because there is no AI exemption from the laws on the books, firms deploying these AI systems and tools have an obligation to abide by existing laws, including the competition and consumer protection statutes that the FTC enforces. FTC staff can analyze whether these tools violate people’s privacy or are prone to adversarial inputs or attacks that put personal data at risk. We can also scrutinize generative AI tools that are used for fraud, manipulation, or non-consensual imagery, or that endanger children and others. We can consider the impacts of algorithmic products that make decisions in high-risk contexts such as health, housing, employment, or finance. Those are just a few examples, but the canvas is large.

The following examples from real-world, recent casework and other initiatives highlight the need for companies to consider these factors when developing, maintaining, using, and deploying an AI-based product:"

ftc.gov/policy/advocacy-resear

Federal Trade Commission · AI and the Risk of Consumer Harm · People often talk about "safety" when discussing the risks of AI causing harm. AI safety means different things to different people, and those looking for a definition here will be disappointed.

"New research from Anthropic, one of the leading AI companies and the developer of the Claude family of Large Language Models (LLMs), has released research showing that the process for getting LLMs to do what they’re not supposed to is still pretty easy and can be automated. SomETIMeS alL it tAKeS Is typing prOMptS Like thiS.

To prove this, Anthropic and researchers at Oxford, Stanford, and MATS, created Best-of-N (BoN) Jailbreaking, “a simple black-box algorithm that jailbreaks frontier AI systems across modalities.” Jailbreaking, a term that was popularized by the practice of removing software restrictions on devices like iPhones, is now common in the AI space and also refers to methods that circumvent guardrails designed to prevent users from using AI tools to generate certain types of harmful content. Frontier AI models are the most advanced models currently being developed, like OpenAI’s GPT-4o or Anthropic’s own Claude 3.5.

As the researchers explain, “BoN Jailbreaking works by repeatedly sampling variations of a prompt with a combination of augmentations—such as random shuffling or capitalization for textual prompts—until a harmful response is elicited.”"

404media.co/apparently-this-is

404 Media · APpaREnTLy THiS iS hoW yoU JaIlBreAk AI · Anthropic created an AI jailbreaking algorithm that keeps tweaking prompts until it gets a harmful response.

"- Large risk management disparities: While some companies have established initial safety frameworks or conducted some serious risk assessment efforts, others have yet to take even the most basic precautions.
- Jailbreaks: All the flagship models were found to be vulnerable to adversarial attacks.
- Control problem: Despite their explicit ambitions to develop artificial general intelligence (AGI), capable of rivaling or exceeding human intelligence, the review panel deemed the current strategies of all companies inadequate for ensuring that these systems remain safe and under human control.
- External oversight: Reviewers consistently highlighted how companies were unable to resist profit-driven incentives to cut corners on safety in the absence of independent oversight. While Anthropic’s current and OpenAI’s initial governance structures were highlighted as promising, experts called for third-party validation of risk assessment and safety framework compliance across all companies."

futureoflife.org/document/fli-

Future of Life Institute · FLI AI Safety Index 2024 · Seven AI and governance experts evaluate the safety practices of six leading general-purpose AI companies.

"Meta’s open large language model family, Llama, isn’t “open-source” in a traditional sense, but it’s freely available to download and build on—and national defense agencies are among those putting it to use.

A recent Reuters report detailed how Chinese researchers fine-tuned Llama’s model on military records to create a tool for analyzing military intelligence. Meta’s director of public policy called the use “unauthorized.” But three days later, Nick Clegg, Meta’s president of public affairs, announced that Meta will allow use of Llama for U.S. national security.

“It shows that a lot of the guardrails that are put around these models are fluid,” says Ben Brooks, a fellow at Harvard’s Berkman Klein Center for Internet and Society. He adds that “safety and security depends on layers of mitigation.”"

spectrum.ieee.org/ai-used-by-m

IEEE Spectrum · Meta Opens Its AI Models for the (U.S.) Military · By Matthew S. Smith