newsence
來源篩選

Nemotron-Personas-Singapore: Co-Designed Data for Sovereign AI

Huggingface

NVIDIA, in partnership with AI Singapore (AISG), has released Nemotron-Personas-Singapore, a novel synthetic dataset designed to foster sovereign AI development in Singapore. This privacy-preserving dataset is locally grounded and culturally contextualized, supporting both commercial and public-sector AI initiatives.

newsence

Nemotron-Personas-Singapore:共創數據以實現主權式AI

Huggingface
大約 1 個月前

AI 生成摘要

NVIDIA 與 AI Singapore (AISG) 合作發布了 Nemotron-Personas-Singapore,這是一個創新的合成數據集,旨在促進新加坡的主權式 AI 開發。此數據集注重隱私保護,並具備在地化和文化情境化,可支持商業和公共部門的 AI 專案。

Nemotron-Personas-Singapore: Co-Designed Data for Sovereign AI

Image

Nemotron-Personas-Singapore: Co-Designed Data for Sovereign AI

Image Image Image

Open Data for Singapore AI

Singapore has established itself as a leader in building AI systems that are both innovative and responsibly governed. Through interoperable governance frameworks, applied privacy research, and clear guidance on synthetic data, the country has demonstrated that AI sovereignty is ultimately about trust, transparency, and alignment with local norms.

To support these efforts, NVIDIA is releasing Nemotron-Personas-Singapore, a first-of-its-kind synthetic dataset designed for Singaporean developers and researchers building sovereign AI systems. The dataset provides training and evaluation data that is locally grounded, culturally contextualized, and privacy-preserving.

We are co-launching this initial release with our partner AI Singapore (AISG), a national programme launched by the National Research Foundation (NRF) to scale Singapore’s artificial intelligence capabilities. AISG is also the creator of SEA-LION, an open, multimodal AI model family built to understand Southeast Asia’s languages, cultures, and contexts. Together, we plan to extend this dataset to additional languages across Southeast Asia.

Licensed under CC BY 4.0, Nemotron-Personas-Singapore supports both commercial and public-sector AI development without relying on personally identifiable information (PII). The dataset integrates seamlessly with Nemotron models and other open-source LLMs, enabling developers to fine-tune AI agents and systems for Singapore-specific use cases.

Nemotron-Personas-Singapore extends NVIDIA’s open synthetic personas collection, which includes datasets for the United States, Japan, India, and Brazil.

What’s in the Dataset?

Image

At a glance:

How We Built It

Data Generation Pipeline

Nemotron-Personas-Singapore was built using NeMo Data Designer, NVIDIA’s enterprise-grade synthetic data generation microservice. The pipeline leveraged the following components:

An extended version of Nemotron-Personas-Singapore will be available for use directly within NeMo Data Designer, enabling developers to generate, refine, and extend Singapore-specific personas as part of their own synthetic data pipelines.

Enhanced Cultural Context

To capture the socio-demographic and geographic diversity of Singapore’s population, Nemotron-Personas-Singapore leveraged self-reported, public demographic data from the 2024 Singapore census, as well as English name distribution data from NLB Name Authorities and CEA Salesperson Information on data.gov.sg.

Private by Design

Every persona in the dataset is fully synthetic:

By grounding generation in public statistics rather than personal records, Nemotron-Personas-Singapore enables AI development and evaluation with reduced regulatory friction, supporting alignment with Singapore’s Personal Data Protection Act (PDPA) and emerging global AI governance standards.

Who This Dataset Is For

Nemotron-Personas-Singapore is designed first and foremost for Singaporean model builders developing sovereign AI systems. Global developers may also leverage this data to improve model performance and adoption in Singapore’s diverse, cultural contexts.

Practical AI Applications

Why It Matters

As AI becomes embedded in public services, finance, healthcare, and infrastructure, the question shifts from whether AI is sovereign to how sovereignty is implemented responsibly.

Nemotron-Personas-Singapore supports sovereign AI in three concrete ways:

Start Building with Nemotron-Personas-Singapore

Load the dataset directly from Hugging Face:

Want to learn more about NVIDIA's open data products, or interested in co-designing a future dataset? Join the conversation on NVIDIA's Discord.

About AI Singapore
AI Singapore (AISG) is a national programme launched by the National Research Foundation (NRF), Singapore, to catalyse, synergise and boost Singapore’s artificial intelligence (AI) capabilities to power our future digital economy. AISG will bring together all Singapore-based research institutions and the vibrant ecosystem of AI start-ups and companies developing AI products, to perform use-inspired research, grow the knowledge, create the tools, and develop the talent to power Singapore’s AI efforts.

Community

·
Sign up or
log in to comment