Tomas Mikolov et al.
Google Inc. Mountain View
Abstract. The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling the frequent words we obtain a significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling.
An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of “Canada” and “Air” cannot be easily combined to obtain “Air Canada”. Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
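The phrase-finding idea can be made concrete with the bigram score given in the body of the Skip-gram paper, score(a, b) = (count(a b) - delta) / (count(a) * count(b)), where delta discounts pairs built from very rare words. The sketch below is only an illustration under assumptions: the toy corpus, the function name `find_phrases`, and the values of `delta` and `threshold` are invented here and are not taken from the paper.

```python
from collections import Counter

def find_phrases(tokens, delta=1.0, threshold=0.1):
    """Score adjacent word pairs and return those above `threshold`.

    score(a, b) = (count(a b) - delta) / (count(a) * count(b))
    `delta` and `threshold` are illustrative values, not tuned ones.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scored = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            scored[(a, b)] = score
    # Highest-scoring candidate phrases first.
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Toy corpus, invented for illustration only.
corpus = (
    "air canada flies to toronto . air canada flies to vancouver . "
    "the air is cold in canada . air canada is an airline ."
).split()

phrases = dict(find_phrases(corpus))
print(phrases)
```

On this tiny corpus "air canada" passes the threshold while "the air" does not, but other repeated pairs such as "flies to" pass as well; separating true phrases from ordinary collocations needs a large corpus and a tuned threshold.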
Large Language Models and Infrastructure Technical Standards
Large Language Models (LLMs) are poised to accelerate and reshape the development of infrastructure standards, including engineering codes; technical specifications for civil works, transportation, energy grids, and water systems; and the related Standards Development Organization (SDO) processes at ASTM, IEEE, ASABE, ISO, and similar bodies. This connection traces back to foundational ideas in distributed representations (Hinton et al.; Mikolov’s Word2Vec) that fed into the transformer architecture, which in turn enabled modern LLMs and the shift from passive generative AI to active, goal-directed agentic AI.
While LLMs will not replace human expertise, consensus-building, or rigorous validation, they will transform traditionally slow, document-heavy workflows into faster, more collaborative, and data-driven processes.
1. Faster Drafting, Summarization, and Gap Analysis
LLMs can rapidly summarize lengthy documents, extract key requirements, identify inconsistencies across related standards, and generate initial draft sections or comparison tables. This is especially valuable for reviewing historical codes, research papers, regulations, and stakeholder inputs.
Infrastructure example: In renewable energy permitting or grid interconnection standards, LLMs can process long environmental impact statements and regulatory texts, shortening the time reviewers spend locating and comparing the relevant passages.
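The inconsistency check described in this section can be sketched as a simple comparison step that runs after extraction. The example assumes an LLM (or any other extractor) has already turned clauses into structured records; the document names, parameter names, and values below are hypothetical and exist only to show the shape of the check.

```python
from collections import defaultdict

# Hypothetical records, in the form an extraction step might emit.
# Sources, parameters, and values are invented for illustration.
extracted = [
    {"source": "Standard A", "parameter": "min_cover_mm", "value": 40},
    {"source": "Standard B", "parameter": "min_cover_mm", "value": 50},
    {"source": "Standard A", "parameter": "max_voltage_deviation_pct", "value": 5},
    {"source": "Standard B", "parameter": "max_voltage_deviation_pct", "value": 5},
]

def find_conflicts(records):
    """Group values by parameter and keep parameters whose sources disagree."""
    by_param = defaultdict(dict)
    for r in records:
        by_param[r["parameter"]][r["source"]] = r["value"]
    return {
        param: values
        for param, values in by_param.items()
        if len(set(values.values())) > 1
    }

conflicts = find_conflicts(extracted)
for param, values in conflicts.items():
    print(f"{param}: {values}")
```

A deterministic comparison like this keeps the final "do these two documents disagree?" decision out of the model, so a human reviewer only has to verify the extracted records against the source text.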
2. Enhanced Requirements Engineering and Consistency Checking
LLMs support formal requirements extraction, flag ambiguities, suggest measurable criteria, and translate between domains. They help maintain alignment between textual standards and digital implementations such as Building Information Modeling (BIM) or simulation tools.
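As a point of comparison for LLM-based extraction, a rule-based baseline can pull out sentences with normative keywords and flag wording that lacks a measurable criterion. This is a minimal sketch, not a production parser: the keyword lists, the function name `extract_requirements`, and the sample clause text are assumptions made for illustration, and the sentence splitter is naive about abbreviations.

```python
import re

# Illustrative keyword lists; a real SDO style guide would define its own.
NORMATIVE = re.compile(r"\b(shall|must|should|may)\b", re.IGNORECASE)
VAGUE = re.compile(
    r"\b(adequate|sufficient|appropriate|reasonable|as needed)\b", re.IGNORECASE
)

def extract_requirements(text):
    """Return normative sentences with their keyword and any vague terms."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    results = []
    for sentence in sentences:
        match = NORMATIVE.search(sentence)
        if match:
            results.append({
                "text": sentence,
                "keyword": match.group(1).lower(),
                "vague_terms": [v.lower() for v in VAGUE.findall(sentence)],
            })
    return results

# Sample clause text, invented for illustration.
sample = (
    "The conduit shall be buried at least 600 mm below grade. "
    "Adequate drainage must be provided. "
    "This clause is informative only."
)

reqs = extract_requirements(sample)
for r in reqs:
    print(r["keyword"], r["vague_terms"], "|", r["text"])
```

Sentences flagged with vague terms are candidates for the "suggest measurable criteria" step, where an LLM could propose a quantified rewording for a committee to accept or reject.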
3. Improved Accessibility, Education, and Stakeholder Participation
LLMs make standards more usable by generating plain-language explanations, FAQs, examples, and tailored training materials. They lower barriers for broader participation in SDO committees by helping non-experts understand and contribute to drafts.
4. Domain-Specific Applications in Infrastructure
- Civil, Structural & Agricultural Engineering: Design ideation, safety analysis, and updating standards for new materials and climate resilience.
- Permitting & Compliance: Summarizing environmental documents and speeding up infrastructure deployment.
- Interoperability & Testing: Verification support for software-heavy systems such as smart grids and autonomous infrastructure.
5. Broader Process Changes for SDOs
- Zero-draft acceleration for preliminary stakeholder review
- Continuous monitoring for maintenance and timely updates
- Multi-agent LLM systems for parallel virtual expert review before human consensus
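The multi-agent review idea in the list above can be sketched as several independent reviewers running in parallel over the same draft, with their comments collected for the human committee. In this sketch each reviewer is a stub using simple text rules; in a real system each would be an LLM call with a role-specific prompt. The reviewer names, rules, and the draft sentence are all hypothetical.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Each function stands in for an LLM agent with a role-specific prompt.
def structural_reviewer(draft):
    return [] if "load" in draft.lower() else ["No design load is stated."]

def units_reviewer(draft):
    return ["Non-SI unit found."] if re.search(r"\b(ft|psi|lbf)\b", draft) else []

def clarity_reviewer(draft):
    if re.search(r"\bshould\b", draft, re.IGNORECASE):
        return ["'should' is not binding; consider 'shall'."]
    return []

REVIEWERS = {
    "structural": structural_reviewer,
    "units": units_reviewer,
    "clarity": clarity_reviewer,
}

def run_parallel_review(draft, reviewers=REVIEWERS):
    """Run every reviewer concurrently and collect comments by reviewer name."""
    with ThreadPoolExecutor(max_workers=len(reviewers)) as pool:
        futures = {name: pool.submit(fn, draft) for name, fn in reviewers.items()}
        return {name: future.result() for name, future in futures.items()}

# Draft sentence invented for illustration.
draft = "The guardrail should withstand 50 psi."
report = run_parallel_review(draft)
for name, comments in report.items():
    print(name, comments)
```

Keeping comments keyed by reviewer preserves traceability: the committee can see which virtual reviewer raised which point before deciding what enters the consensus draft.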
Limitations and Important Caveats
- “Hallucinations” & Validation: Outputs must always be human-verified, especially in safety-critical areas. Domain-specific fine-tuning and retrieval-augmented generation (RAG) help but are not foolproof.
- Bias, Copyright & Accountability: Standards demand traceability and consensus; LLMs can introduce subtle biases or IP concerns.
- Not a Full Replacement: Human judgment remains essential for risk assessment, ethics, and real-world tradeoffs.
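The retrieval step of retrieval-augmented generation (RAG), mentioned in the caveats above, can be illustrated without any model at all. The sketch ranks clauses by bag-of-words cosine similarity and builds a prompt that restricts the answer to the retrieved text. It is a simplification: production systems typically use learned embeddings rather than word counts, and the clause ids, clause text, and prompt wording here are invented for illustration.

```python
import math
import re
from collections import Counter

def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, clauses, k=1):
    """Return the k clauses most similar to the query."""
    q = bow(query)
    ranked = sorted(clauses, key=lambda c: cosine(q, bow(c["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, clauses, k=1):
    """Assemble a prompt that grounds the answer in retrieved clauses."""
    context = "\n".join(f"[{c['id']}] {c['text']}" for c in retrieve(query, clauses, k))
    return (
        "Answer using only the clauses below and cite clause ids.\n"
        f"{context}\nQuestion: {query}"
    )

# Hypothetical clauses, invented for illustration.
clauses = [
    {"id": "4.2", "text": "Buried conduit shall have a minimum cover of 600 mm."},
    {"id": "7.1", "text": "Inverters shall disconnect when grid voltage deviates by more than 10 percent."},
]

question = "What is the minimum cover for buried conduit?"
top = retrieve(question, clauses)
prompt = build_prompt(question, clauses)
print(prompt)
```

Because the prompt carries clause ids, a reviewer can check each generated statement against its cited clause, which supports the human verification the caveats call for but does not replace it.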
The expected payoff is 2–5× faster iteration on drafts, better knowledge management, and more adaptive standards. Early adopters who use LLM-assisted tools under proper governance will be well placed to lead the next generation of infrastructure standards development.



















