Large Language Model Standards


April 30, 2024
mike@standardsmichigan.com

 

Perhaps the World Ends Here | Joy Harjo

 

The world begins at a kitchen table. No matter what, we must eat to live.
The gifts of earth are brought and prepared, set on the table.
So it has been since creation, and it will go on.
We chase chickens or dogs away from it. Babies teethe at the corners. They scrape their knees under it.
It is here that children are given instructions on what it means to be human.
We make men at it, we make women.
At this table we gossip, recall enemies and the ghosts of lovers.
Our dreams drink coffee with us as they put their arms around our children.
They laugh with us at our poor falling-down selves and as we put ourselves back together once again at the table.
This table has been a house in the rain, an umbrella in the sun.
Wars have begun and ended at this table. It is a place to hide in the shadow of terror.
A place to celebrate the terrible victory.
We have given birth on this table, and have prepared our parents for burial here.
At this table we sing with joy, with sorrow. We pray of suffering and remorse. We give thanks.
Perhaps the world will end at the kitchen table, while we are laughing and crying, eating of the last sweet bite.

 

Several standards and benchmarks exist for evaluating large language models (LLMs). Some of the most commonly used include:

  1. GLUE (General Language Understanding Evaluation): GLUE is a benchmark designed to evaluate and analyze the performance of models across a diverse range of natural language understanding tasks, such as sentiment analysis, paraphrase detection, textual similarity, and natural language inference.
  2. SuperGLUE: SuperGLUE is an extension of the GLUE benchmark, featuring more difficult language understanding tasks, aiming to provide a more challenging evaluation for models.
  3. CoNLL (Conference on Computational Natural Language Learning): CoNLL has historically hosted shared tasks, including tasks related to coreference resolution, dependency parsing, and other syntactic and semantic tasks.
  4. SQuAD (Stanford Question Answering Dataset): SQuAD is a benchmark dataset for evaluating the performance of question answering systems. It consists of questions posed on a set of Wikipedia articles, where the model is tasked with providing answers based on the provided context.
  5. RACE (Reading Comprehension from Examinations): RACE is a dataset designed to evaluate reading comprehension models. It consists of English exam-style reading comprehension passages and accompanying multiple-choice questions.
  6. WMT (Workshop on Machine Translation): The WMT shared tasks focus on machine translation, providing benchmarks and evaluation metrics for assessing the quality of machine translation systems across different languages.
  7. BLEU (Bilingual Evaluation Understudy): BLEU is a metric used to evaluate the quality of machine-translated text relative to human-translated reference texts. It compares n-gram overlap between the generated translation and the reference translations.
  8. ROUGE (Recall-Oriented Understudy for Gisting Evaluation): ROUGE is a set of metrics used for evaluating automatic summarization and machine translation. It measures the overlap between generated summaries or translations and reference summaries or translations.
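
The n-gram overlap behind items 7 and 8 can be sketched in a few lines of Python. This is a minimal illustration only, assuming whitespace tokenization and a single reference text; it omits BLEU's brevity penalty and its averaging across n-gram orders, and it is not the official implementation of either metric. The function names and sample sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_overlap(candidate, reference, n):
    """Count candidate n-grams that also occur in the reference, clipping each
    n-gram's count at the number of times it appears in the reference."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    return sum(min(count, ref[gram]) for gram, count in cand.items())


def ngram_precision(candidate, reference, n=1):
    """BLEU-style modified precision: overlap divided by candidate n-grams."""
    total = sum(ngrams(candidate, n).values())
    return clipped_overlap(candidate, reference, n) / total if total else 0.0


def ngram_recall(candidate, reference, n=1):
    """ROUGE-N-style recall: overlap divided by reference n-grams."""
    total = sum(ngrams(reference, n).values())
    return clipped_overlap(candidate, reference, n) / total if total else 0.0


# Hypothetical example sentences, tokenized on whitespace.
reference = "the cat is on the mat".split()
candidate = "the the the cat mat".split()

precision = ngram_precision(candidate, reference, n=1)  # 4 of 5 candidate unigrams match
recall = ngram_recall(candidate, reference, n=1)        # 4 of 6 reference unigrams covered
print(f"unigram precision={precision:.2f}, recall={recall:.2f}")
```

Clipping matters here: the candidate repeats "the" three times, but only two occurrences are credited because the reference contains the word twice.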

These benchmarks and standards play a crucial role in assessing the performance and progress of large language models, helping researchers and developers understand their strengths, weaknesses, and areas for improvement.

Yann LeCun & Lex Fridman: Limits of LLMs

This is a new topic for us, and there is time only to cover the basics. We have, however, followed language generally every month, because discovering and promulgating best practice in conceiving, designing, building, occupying and maintaining the architectural character of education settlements depends upon a common vocabulary. The struggle to agree upon vocabulary presents an outsized challenge to the work we do.

Large language models hold significant potential for the building construction industry by streamlining various processes. They can analyze vast amounts of data to aid in architectural design, structural analysis, and project management. These models can generate detailed plans, suggest optimized construction techniques, and assist in cost estimation. Moreover, they facilitate better communication among stakeholders by providing natural language interfaces for discussing complex concepts. By harnessing the power of large language models, the construction industry can enhance efficiency, reduce errors, and ultimately deliver better-designed and more cost-effective buildings.

Join us today at the usual hour.  Use the login credentials at the upper right of our home page.

Related:

print(“Python”)

Standards January: Language

Standard for Large Language Model Agent Interface

 

Artificial Intelligence Standards

April 30, 2024
mike@standardsmichigan.com

On April 29, 2024 NIST released a draft plan for global engagement on AI standards.

Comments are due by June 2. More information is available here.

 

Request for Information Related to NIST’s Assignments

Under Sections 4.1, 4.5 and 11 of the Executive Order Concerning Artificial Intelligence 

The National Institute of Standards and Technology seeks information to assist in carrying out several of its responsibilities under the Executive Order on Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence issued on October 30, 2023. Among other things, the E.O. directs NIST to undertake an initiative for evaluating and auditing capabilities relating to Artificial Intelligence (AI) technologies and to develop a variety of guidelines, including for conducting AI red-teaming tests to enable deployment of safe, secure, and trustworthy systems.

Regulations.GOV Filing: NIST-2023-0009-0001_content

Browse Posted Comments (72 as of February 2, 2024 | 12:00 EST)

Standards Michigan Public Comment


Unleashing American Innovation

Federal Agency Conformity Assessment

Time & Frequency Services

Technical Requirements for Weighing & Measuring Devices

Why You Need Standards

Summer Internship Research Fellowship

A Study of Children’s Password Practices

Human Factors Using Elevators in Emergency Evacuation

Cloud Computing Paradigm

What is time?

Readings / Radio Controlled Clocks

Standard Reference Material

Software Engineering Ethics Education

April 30, 2024
mike@standardsmichigan.com

 

Four Opportunities for SE Ethics Education

Alicia M. Grubb
Smith College, Northampton, Massachusetts

 

Abstract: Many software engineers direct their talents towards software systems which do not fall into traditional definitions of safety critical systems, but are integral to society (e.g., social media, expert advisor systems). While codes of ethics can be a useful starting point for ethical discussions, codes are often limited in scope to professional ethics and may not offer answers to individuals weighing competing ethical priorities. In this paper, we present our vision for improving ethics education in software engineering. To do this, we consider current and past curricular recommendations, as well as recent efforts within the broader computer science community. We lay out challenges with vignettes and assessments in teaching, and give recommendations for incorporating updated examples and broadening the scope of ethics education in software engineering.
CLICK HERE to order complete paper

Smith College | Hampshire County Massachusetts

Sam Altman: OpenAI

There are no generally accepted best practices specifically tailored for Artificial General Intelligence (AGI) development, mainly because AGI remains largely theoretical and hasn’t been achieved yet. However, there are various principles, guidelines, and best practices within the broader field of artificial intelligence and machine learning that could inform AGI development efforts. Some of these include:

Ethical AI Principles: Many organizations and research institutions have proposed ethical principles for AI development, focusing on issues like fairness, transparency, accountability, and safety. These principles could be adapted and extended to AGI development.

Safety Guidelines: Concepts like AI alignment, robustness, and safety engineering are crucial for AGI development to ensure that the system behaves in desirable ways and doesn’t pose risks to humanity.

Interdisciplinary Approach: AGI development may require insights from various fields such as computer science, cognitive science, neuroscience, philosophy, and psychology. Collaborative efforts among experts from different disciplines can help in shaping best practices for AGI.

Research Ethics: Guidelines for conducting ethical research in areas like human subjects research, data privacy, and responsible publication are relevant for AGI development as well, especially considering the potential societal impacts of AGI.

Transparency and Openness: Promoting transparency and open research practices can help in fostering trust and collaboration within the AGI research community. Open access to data, code, and research findings can facilitate progress in AGI development while mitigating risks.

Risk Assessment and Mitigation: AGI researchers should consider potential risks and unintended consequences of their work, such as job displacement, economic disruption, and existential risks. Developing strategies for risk assessment and mitigation is essential.

Continuous Learning and Adaptation: AGI systems are expected to be capable of learning and adapting autonomously. Therefore, best practices for continual learning, model updating, and adaptation in AI systems are relevant for AGI development.

While there may not be specific standards or best practice literature exclusively dedicated to AGI, integrating insights and principles from related fields can guide responsible and effective AGI research and development. Additionally, as progress is made in AI research, new standards and best practices may emerge to address the unique challenges of AGI.

Retrodiction

April 30, 2024
mike@standardsmichigan.com

By design, we do not provide a SEARCH function. We are a niche practice in a subtle, time-sensitive domain with over 30 years of case history in which we have been first movers. We provide links to the most accessed topics in recent days. All queries presented during our “Open Office Hours” every work day, or via email, are gratefully received and prompt a near-immediate response.

Evensong “Brahms – Intermezzo Op. 118, No. 2 in A major”

Lincoln Weather and Climate

Cow to Cone

2028 National Electrical Safety Code

Protecting Animals When Disaster Strikes

United States Air Force Academy Cadet Chapel

“Gelukkige Koningsdag!” Stamppot

Sheep and Goat Farm in Żelazna

Monticello

The Seven Sins of Greenwashing

print(“Python”)

Gallery: Other Ways of Knowing Climate Change

Energy Standard for *Sites* and Buildings

Timon of Athens

Museum Lighting & Lighting for Fine Art

The Kringle, Tulips & Tea

Recently in Washington D.C.

Abiit sed non oblitus | Wisconsin

Electrical Resource Adequacy

Protecting Animals When Disaster Strikes

Passover ‘A Cappella’

Entertainment Occupancies

Steeplechase Water Jump

C++

The Best Student-Friendly Brownies

print(“Python”)

Michigan State University

Oxford College Student Center

Sacred Spaces

Laboratory Fume Hood Safety

University of Iowa | Johnson County

2028 National Electrical Safety Code

National Research Tomsk State University

Robie House

Making Greenwich the centre of the world

Roger Scruton Memorial Lectures

Electrical heat tracing: international harmonization-now and in the future


Winter Vegetable Soup

Branksome Hall Toronto

Fire Alarm & Signaling Code

Ice Swimming

Uniform Plumbing Code


Banished Words 2024

Ædificare


“It is a truth universally acknowledged, that a single man in possession

of a good fortune, must be in want of a wife.”

Pride and Prejudice by Jane Austen

 

 

print(“Python”)

April 30, 2024
mike@standardsmichigan.com

 

 

“Python is the programming equivalent

of a Swiss Army Knife.”

— Some guy

 

The Python Standard Library

Open source standards development is characterized by very open exchange, collaborative participation, rapid prototyping, transparency and meritocracy.   The Python programming language is a high-level, interpreted language that is widely used for general-purpose programming. Python is known for its readability, simplicity, and ease of use, making it a popular choice for beginners and experienced developers alike.  Python has a large and active community of developers, which has led to the creation of a vast ecosystem of libraries, frameworks, and tools that can be used for a wide range of applications. These include web development, scientific computing, data analysis, machine learning, and more.

Another important aspect of Python is its versatility. It can be used on a wide range of platforms, including Windows, macOS, Linux, and even mobile devices. Python also interoperates with many other programming languages and can be integrated with other tools and technologies, making it a powerful tool for software development. Overall, the simplicity, readability, versatility, and large community support of Python make it a valuable programming language to learn for anyone interested in software development, including building automation.

Because Python is open source software, anyone may suggest an improvement to Python (3.x) starting at the link below:

Python Enhancement Proposals

Python Download for Windows

Python can be used to control building automation systems. Building automation systems are typically used to control various systems within a building, such as heating, ventilation, air conditioning, lighting, security, and more. Python can be used to control these systems by interacting with the control systems through the building’s network or other interfaces.

There are several Python libraries available that can be used for building automation, including PyVISA, which communicates with measurement and control instruments over interfaces such as GPIB, serial, USB, and Ethernet, and pymodbus, which communicates with the Modbus devices commonly used in building automation systems. Python can also be used to develop custom applications and scripts to automate building systems, such as scheduling temperature setpoints, turning lights on and off, and adjusting ventilation systems based on occupancy or other variables. Overall, Python’s flexibility and versatility make it well-suited for use in building automation systems.
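
As a rough illustration of the scheduling idea, the sketch below chooses a temperature setpoint from the time of day and an occupancy flag. Everything in it is an assumption made for the example: the class and function names, the schedule hours, and the temperatures are hypothetical, and the step that would actually send the value to a controller (for instance through a Modbus client) is left out.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class SetpointPeriod:
    """One scheduled window with its heating setpoints in degrees Celsius."""
    start: time       # inclusive
    end: time         # exclusive
    comfort_c: float  # used when the space is occupied
    standby_c: float  # used when the space is scheduled but empty


# Hypothetical weekday schedule and night setback value.
SCHEDULE = [SetpointPeriod(time(6, 0), time(18, 0), comfort_c=21.0, standby_c=18.0)]
SETBACK_C = 16.0


def select_setpoint(now, occupied, schedule=SCHEDULE):
    """Return the setpoint for the given time of day and occupancy state."""
    for period in schedule:
        if period.start <= now < period.end:
            return period.comfort_c if occupied else period.standby_c
    # Outside every scheduled window, fall back to the setback temperature.
    return SETBACK_C


def apply_setpoint(writer, now, occupied):
    """Compute the setpoint and hand it to `writer`, any callable that takes
    a float. In a real system `writer` would wrap a protocol client."""
    value = select_setpoint(now, occupied)
    writer(value)
    return value


# Usage: collect the values in a list instead of writing to hardware.
sent = []
apply_setpoint(sent.append, time(9, 30), occupied=True)    # scheduled, occupied
apply_setpoint(sent.append, time(9, 30), occupied=False)   # scheduled, empty
apply_setpoint(sent.append, time(22, 0), occupied=True)    # after hours
print(sent)
```

Passing the writer in as a callable keeps the scheduling logic testable without any hardware attached.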

Subversion®

Building Automation & Control Networks

Pediatric & Daycare

April 29, 2024
mike@standardsmichigan.com


“Kindergarten” 1885 Johann Sperl

Join us today when we examine the state of the literature that governs the safety and performance of occupancies designed and operated for the care of children specifically and for family support generally. There is a fair amount of overlap in the safety and performance principles in the titles, which frequently reference each other; all of them respond to unintended incidents, innovation and new discoveries.

For hospitals and clinics, the titles we follow, and whose proposed revisions we engage with, are listed below:

  1. NFPA 99: Health Care Facilities Code: NFPA 99 provides specific requirements for the safe and effective operation of healthcare facilities, including those serving pediatric patients.
  2. American Academy of Pediatrics Guidelines: While not legally binding, guidelines provided by organizations like the AAP offer best practices for pediatric care, including safety considerations.
  3. The Joint Commission Standards for the Accreditation of Children’s Hospitals: The Joint Commission sets standards for healthcare organizations and programs in the United States. Compliance with these standards ensures the safety and quality of care provided to pediatric patients.
  4. ISO Healthcare Organization Management 
  5. International Building Codes
  6. IEEE Education & Healthcare Facilities Committee

Since the ASHRAE catalog is growing to encompass every occupancy on earth, we keep pace with it. There is always something happening there that is relevant to our work:

Energy Standard for *Sites* and Buildings

Day Care

Hoover Institution: The De-Population Bomb

To repeat a statement made throughout the Standards Michigan facility: We place the Underwriters Laboratories and ASTM International best practice catalogs at a lower priority because the business models of those organizations deal primarily with product standards, not interoperability standards. You will see UL and ASTM labels on many, many products within pediatric and daycare environments but, as a user-interest, we do not have the resources to engage with the UL and ASTM suite product-by-product, essential as those standards may be.

Ensuring the safety of children in daycare centers involves compliance with various codes and standards in the United States. Here are some key ones:

  1. International Fire Code (IFC): The IFC includes provisions for fire prevention and protection measures in buildings, including daycare centers. It addresses fire detection, alarm systems, fire extinguishing equipment, and evacuation planning.
  2. Americans with Disabilities Act (ADA): The ADA sets requirements for accessibility in public accommodations, including daycare centers. It includes provisions for accessible routes, entrances, restrooms, and other facilities to accommodate children with disabilities.
  3. National Fire Protection Association (NFPA) 101: Life Safety Code: NFPA 101 provides requirements for the design, construction, and operation of buildings to protect occupants from fire and other hazards. It covers aspects such as means of egress, fire protection systems, and emergency planning.
  4. NFPA 1: Fire Code: NFPA 1 addresses fire prevention measures in various occupancies, including daycare centers. It includes requirements for fire alarm systems, fire extinguishers, emergency lighting, and other fire safety features.
  5. ASTM F2373 – Standard Consumer Safety Performance Specification for Public Use Play Equipment for Children 6 Months through 23 Months: This standard specifies safety requirements for play equipment commonly found in daycare centers, ensuring the safety of young children during play activities.
  6. National Association for the Education of Young Children (NAEYC) Standards: While not legally binding, NAEYC sets voluntary accreditation standards for childcare programs, focusing on quality, safety, and child development.

Governmental agencies at all levels incorporate these titles, partially or whole cloth, and present additional, typically more rigorous, requirements.

Of course, our primary concern is the presence of reliable, safe and economical electricity. All of the foregoing titles depend upon electricity, so we deal with the technical literature on electricity on a near-continuous basis.

Use the login credentials at the upper right of our homepage.

 

 

American College of Obstetricians and Gynecologists

April 29, 2024
mike@standardsmichigan.com


Founded in 1951, ACOG is a membership organization for obstetrician–gynecologists. The College produces practice guidelines for health care professionals and educational materials for patients, provides practice management and career support, facilitates programs and initiatives to improve women’s health, and advocates for members and patients.

It provides several educational tracks for member certification and licensing, largely derived from federal regulations. It also invites proposals from members about organizational priorities; one such proposal is linked below:

Abortion Misinformation Campaign

The link above also proves that no matter how well educated an organization’s members, the leadership of the organization is capable of shenanigans with federal law that leaves the regulation of abortion to states; closer to the cultural norms of local communities.

Related:

“A half truth is a full lie” — so goes the adage.  In service of telling the full story — only half of which is told in the RFP linked above — a map of states is linked below.

Interactive Map: Abortion Laws by State

 

Lincoln Weather and Climate

April 29, 2024
mike@standardsmichigan.com


Nebraska and U.S. Tornadoes

Nebraska

Storm Shelters
