Nemotron-Personas-Singapore: Co-Designed Data for Sovereign AI
Nemotron-Personas-Singapore: Co-Designed Data for Sovereign AI
Open Data for Singapore AI
Singapore has established itself as a leader in building AI systems that are both innovative and responsibly governed. Through interoperable governance frameworks, applied privacy research, and clear guidance on synthetic data, the country has demonstrated that AI sovereignty is ultimately about trust, transparency, and alignment with local norms.
To support these efforts, NVIDIA is releasing Nemotron-Personas-Singapore, a first-of-its-kind synthetic dataset designed for Singaporean developers and researchers building sovereign AI systems. The dataset provides training and evaluation data that is locally grounded, culturally contextualized, and privacy-preserving.
We are co-launching this initial release with our partner AI Singapore (AISG), a national programme launched by the National Research Foundation (NRF) to scale Singapore’s artificial intelligence capabilities. AISG is also the creator of SEA-LION, an open, multimodal AI model family built to understand Southeast Asia’s languages, cultures, and contexts. Together, we plan to extend this dataset to additional languages across Southeast Asia.
Licensed under CC BY 4.0, Nemotron-Personas-Singapore supports both commercial and public-sector AI development without relying on personally identifiable information (PII). The dataset integrates seamlessly with Nemotron models and other open-source LLMs, enabling developers to fine-tune AI agents and systems for Singapore-specific use cases.
Nemotron-Personas-Singapore extends NVIDIA’s open synthetic personas collection, which includes datasets for the United States, Japan, India, and Brazil.
What’s in the Dataset?
At a glance:
How We Built It
Data Generation Pipeline
Nemotron-Personas-Singapore was built using NeMo Data Designer, NVIDIA’s enterprise-grade synthetic data generation microservice. The pipeline leveraged the following components:
An extended version of Nemotron-Personas-Singapore will be available for use directly within NeMo Data Designer, enabling developers to generate, refine, and extend Singapore-specific personas as part of their own synthetic data pipelines.
Enhanced Cultural Context
To capture the socio-demographic and geographic diversity of Singapore’s population, Nemotron-Personas-Singapore leveraged self-reported, public demographic data from the 2024 Singapore census, as well as English name distribution data from NLB Name Authorities and CEA Salesperson Information on data.gov.sg.
Private by Design
Every persona in the dataset is fully synthetic:
By grounding generation in public statistics rather than personal records, Nemotron-Personas-Singapore enables AI development and evaluation with reduced regulatory friction, supporting alignment with Singapore’s Personal Data Protection Act (PDPA) and emerging global AI governance standards.
Who This Dataset Is For
Nemotron-Personas-Singapore is designed first and foremost for Singaporean model builders developing sovereign AI systems. Global developers may also leverage this data to improve model performance and adoption in Singapore’s diverse, cultural contexts.
Practical AI Applications
Why It Matters
As AI becomes embedded in public services, finance, healthcare, and infrastructure, the question shifts from whether AI is sovereign to how sovereignty is implemented responsibly.
Nemotron-Personas-Singapore supports sovereign AI in three concrete ways:
Start Building with Nemotron-Personas-Singapore
Load the dataset directly from Hugging Face:
Want to learn more about NVIDIA's open data products, or interested in co-designing a future dataset? Join the conversation on NVIDIA's Discord.
About AI Singapore
AI Singapore (AISG) is a national programme launched by the National Research Foundation (NRF), Singapore, to catalyse, synergise and boost Singapore’s artificial intelligence (AI) capabilities to power our future digital economy. AISG will bring together all Singapore-based research institutions and the vibrant ecosystem of AI start-ups and companies developing AI products, to perform use-inspired research, grow the knowledge, create the tools, and develop the talent to power Singapore’s AI efforts.
Community
·
Sign up or
log in to comment