@digital4better/data
v1.3.40
Digital4Better Open Data
Open datasets maintained by Digital4Better to describe the environmental footprint of digital services, cloud infrastructure, electricity systems, and AI models.
This repository is meant to be used as a data source, not as developer documentation. The main audience is analysts, sustainability teams, researchers, product teams, and anyone who needs reusable reference data in JSON or CSV.
These reference datasets are used, among other things, by fruggr, Digital4Better's platform for measuring and managing the environmental footprint of digital services.
What You Can Find Here
The repository is organized as a set of reusable data collections:
| Collection | What it covers | Main files |
| --- | --- | --- |
| `data/ai` | AI model catalog across vendors and cloud providers | `models.json` |
| `data/cloud` | Cloud regions, virtual machines, CPUs, accelerators | `*-regions.*`, `*-vms.*`, `cpus.*`, `accelerators.*` |
| `data/country` | Countries, regions, continents, and distance referentials | `regions.*`, `countries.*`, `continents.*`, `*-distances.*` |
| `data/energy` | Environmental impacts of electricity production technologies | `energy-impacts.*` |
| `data/mix` | Electricity mix by geography and time period | `world-*`, `continent-*`, `country-*`, `subdivision-*` |
| `data/factor` | Electricity impact factors derived from energy mix data | `world-*`, `continent-*`, `country-*`, `subdivision-*` |
| `data/equipment` | Equipment energy and embodied impact reference data | `energy.*`, `embodied.*` |
Why This Repository Exists
These datasets are used to:
- estimate the environmental footprint of digital services
- compare cloud infrastructure options across providers and regions
- model electricity-related impacts by country, continent, or subdivision
- enrich internal or public sustainability dashboards
- document AI models and their characteristics in a structured way
Highlights
AI Models
The AI catalog in data/ai/models.json documents model families from providers such as OpenAI, Anthropic, Google, Mistral, Meta, Qwen, DeepSeek, Amazon, Cohere, and others.
This makes it useful for market mapping, observatories, governance, and cloud/AI portfolio analysis.
Main source families:
- cloud provider catalogs such as AWS Bedrock, Azure AI Foundry, Google Cloud Vertex AI, OVHcloud AI Endpoints, and Scaleway Generative APIs
- official model vendor documentation such as OpenAI, Anthropic, Mistral, Qwen, and DeepSeek
- model cards and open model hubs such as Hugging Face
- technical reports and synthesis sources such as LifeArchitect and ApXML Models
Cloud Infrastructure
The cloud referentials in data/cloud provide structured information for major providers including AWS, Azure, GCP, Oracle Cloud Infrastructure, OVHcloud, and Scaleway.
Typical use cases:
- mapping regions and datacenter footprints
- comparing VM families and hardware characteristics
- linking compute infrastructure to sustainability calculations
Main source families:
- provider region and infrastructure documentation from AWS, Microsoft Azure, Google Cloud, Oracle Cloud Infrastructure, OVHcloud, and Scaleway
- manufacturer and hardware reference sources for CPUs and accelerators
- provider sustainability disclosures used for `pue`, `wue`, and `ref`, including AWS regional PUE/WUE, Microsoft regional fact sheets via datacenters.microsoft.com, Google Cloud regional CFE%, Oracle's Corporate Citizenship Report, OVHcloud FY25 KPIs and methodology note, and Scaleway's calculation reference values and impact reports
Current cloud assumptions kept in the datasets:
- `aws`: `pue` and `wue` come from the 2024 AWS regional CSV; `ref` stays at `0` because AWS public renewable matching disclosures are not used as a region-level factor in this referential
- `azure`: values come from Microsoft regional fact sheets, combining still-live PDFs with previously curated fact sheet values for regions whose older PDFs are no longer publicly retrievable
- `gcp`: `ref` comes from regional `CFE%`; `wue` comes from previously derived values based on Google environmental reporting and is kept until Google publishes a clearer general regional water metric
- `oracle`: uses a uniform `pue = 1.07` and a provisional `wue = 0` until OCI publishes region-level metrics
- `ovhcloud`: follows FY25 KPI values for `pue`, `wue`, and `ref`
- `scaleway`: values come from documented datacenter figures, completed where needed with the provider's impact reports
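The `pue` and `wue` fields are commonly defined as in The Green Grid metrics: PUE is total facility energy divided by IT equipment energy, and WUE is site water use divided by IT equipment energy, usually in liters per kWh. Under those definitions, applying them to an IT energy estimate comes down to two multiplications. A minimal sketch, assuming a region entry exposes `pue` and `wue` as plain numbers with `wue` in L/kWh; the `RegionRecord` type and the sample values are hypothetical, not read from `data/cloud`:

```python
from dataclasses import dataclass


@dataclass
class RegionRecord:
    """Hypothetical subset of a cloud region entry (field names assumed)."""
    name: str
    pue: float  # power usage effectiveness: facility energy / IT energy
    wue: float  # water usage effectiveness, assumed in liters per IT kWh


def facility_energy_kwh(it_energy_kwh: float, region: RegionRecord) -> float:
    """Scale IT equipment energy up to total facility energy using PUE."""
    return it_energy_kwh * region.pue


def water_liters(it_energy_kwh: float, region: RegionRecord) -> float:
    """Estimate on-site water use from IT equipment energy using WUE."""
    return it_energy_kwh * region.wue


# Illustrative values only, not taken from the datasets.
example = RegionRecord(name="example-region", pue=1.5, wue=0.25)
print(facility_energy_kwh(100.0, example))  # 150.0
print(water_liters(100.0, example))  # 25.0
```

`ref` is deliberately left out of this sketch: how a renewable or carbon-free share should enter a calculation depends on the accounting methodology you follow.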
Electricity Mix And Impact Factors
The datasets in data/mix and data/factor help translate electricity consumption into environmental impacts.
They are available at several levels:
- world
- continent
- country
- subdivision
And across different time granularities:
- yearly
- monthly
Green-only variants are also available through files ending with `-green`.
Main source families:
- electricity generation data from Ember monthly electricity data and Ember yearly electricity data
- impact factors built from lifecycle assessment literature, including UNECE 2021 - Life cycle assessment of electricity generation options and related academic work such as this Energy paper
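In the usual approach, the conversion is a multiplication: impact = electricity consumed (kWh) × factor (impact per kWh) for the matching geography and period. A minimal sketch, assuming factors have already been loaded into a dictionary keyed by hypothetical geography strings and expressed in gCO2e per kWh (check the actual column names and units in `data/factor` before reusing it). It also shows one possible way to use the different levels: falling back from subdivision to country, continent, and world when a more specific factor is missing.

```python
from typing import Mapping, Sequence


def lookup_factor(factors: Mapping[str, float], keys: Sequence[str]) -> float:
    """Return the factor for the first key present in `factors`.

    `keys` is ordered from most to least specific geography, for example
    subdivision -> country -> continent -> world (key names are hypothetical).
    """
    for key in keys:
        if key in factors:
            return factors[key]
    raise KeyError(f"no factor found for any of {list(keys)}")


def electricity_impact(energy_kwh: float, factor_per_kwh: float) -> float:
    """Impact = electricity consumed (kWh) x impact factor (impact per kWh)."""
    return energy_kwh * factor_per_kwh


# Illustrative factors in gCO2e/kWh, not taken from data/factor.
factors = {"country:XX": 100.0, "world": 400.0}
factor = lookup_factor(
    factors, ["subdivision:XX-01", "country:XX", "continent:europe", "world"]
)
print(electricity_impact(10.0, factor))  # 1000.0
```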
Geography And Distances
The datasets in data/country provide geographic referentials used to map countries, continents, subdivisions, and estimated network distances.
Typical use cases:
- geographic normalization
- country and subdivision mapping
- rough estimation of distances between users, countries, regions, and datacenters
Main source families:
- ISO country and subdivision standards
- internally maintained geographic referentials used to derive administrative mappings and distance approximations
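For rough distance estimates, a common generic technique is the great-circle (haversine) distance between two latitude/longitude points on a spherical Earth. The sketch below illustrates that technique only; it is not necessarily how the `*-distances.*` files were derived, and the coordinates are inputs you supply yourself:

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to guard against rounding pushing the value slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Paris to Frankfurt, using approximate city-center coordinates.
print(round(haversine_km(48.8566, 2.3522, 50.1109, 8.6821)))
```

Real network paths are usually longer than the great-circle distance, so the result is best read as a lower bound rather than a routing estimate.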
Equipment Reference Data
The datasets in data/equipment provide reference values for embodied impacts and operational energy of common digital equipment categories.
Typical use cases:
- footprint modeling at equipment level
- simplified lifecycle modeling for digital services
- comparative analysis of device or infrastructure categories
Main source families:
- Digital4Better internal modeling inputs
- lifecycle assessment literature and equipment reference datasets used for sustainability calculations
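A common simplified lifecycle pattern combines both kinds of values: the embodied impact is spread evenly over the equipment's lifetime and added to the use-phase impact of the electricity it consumes. A minimal sketch of that pattern, where every input is a hypothetical value you provide rather than a field read from `data/equipment`:

```python
def annual_footprint(
    embodied: float,
    lifetime_years: float,
    annual_energy_kwh: float,
    factor_per_kwh: float,
) -> float:
    """Yearly impact = embodied / lifetime + annual energy x electricity factor.

    `embodied` and `factor_per_kwh` must use the same impact unit (e.g. kgCO2e
    and kgCO2e/kWh) for the sum to be meaningful.
    """
    if lifetime_years <= 0:
        raise ValueError("lifetime_years must be positive")
    return embodied / lifetime_years + annual_energy_kwh * factor_per_kwh


# Hypothetical inputs, not values from this repository.
print(annual_footprint(embodied=200.0, lifetime_years=4, annual_energy_kwh=30.0, factor_per_kwh=0.5))
```

Linear amortization is the simplest choice; other allocation rules (by usage hours, by number of users) fit the same structure by changing the first term.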
Formats
Most collections are published in both formats:
- `JSON` for structured or nested data
- `CSV` for tabular exploration, spreadsheets, and BI tools
If a collection is only available in one format, it is usually because that format is the most natural one for the data structure.
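Both formats can be read with the Python standard library alone. A minimal sketch, assuming a local clone of the repository; the helper names are illustrative and the path you pass in is up to you:

```python
import csv
import json
from pathlib import Path
from typing import Any


def load_json(path: Path) -> Any:
    """Parse a JSON data file; the resulting structure depends on the collection."""
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def load_csv(path: Path) -> list[dict[str, Any]]:
    """Read a CSV data file into one dict per row, keyed by the header line.

    Values come back as strings, so numeric columns must be converted
    explicitly (e.g. float(row["pue"])).
    """
    with path.open(encoding="utf-8", newline="") as f:
        return list(csv.DictReader(f))
```

For example, `load_json(Path("data/ai/models.json"))` loads the AI model catalog; inspect the returned structure before relying on specific keys.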
Quick Navigation
- AI models: `data/ai/models.json`
- Cloud regions: `data/cloud`
- Country and region referentials: `data/country`
- Energy impacts: `data/energy/energy-impacts.json`
- Electricity mix: `data/mix`
- Electricity factors: `data/factor`
- Equipment data: `data/equipment`
Notes On Data Quality
This repository aims to provide transparent and reusable reference data, but some values should be interpreted with care.
- Some fields are derived from public documentation, model cards, technical reports, or literature rather than official disclosures.
- Some collections include explicit uncertainty markers such as `estimated`.
- AI and cloud catalogs evolve quickly, so historical and legacy entries may coexist with current ones.
- Environmental factors are based on a mix of primary data, literature, and modeling assumptions.
When available, source URLs are kept directly in the data files themselves.
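When a collection carries an uncertainty marker, it can help to keep estimated values apart from disclosed ones during analysis. A minimal sketch, assuming each record is a parsed JSON object with a boolean `estimated` field; the field name and its shape vary between collections, so treat this as hypothetical:

```python
from typing import Any, Iterable


def split_by_estimate(
    records: Iterable[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into (reported, estimated) using a hypothetical `estimated` flag.

    Records without the flag are treated as reported. Expects JSON booleans:
    CSV rows hold strings such as "false", which are truthy and must be
    converted before calling this function.
    """
    reported: list[dict[str, Any]] = []
    estimated: list[dict[str, Any]] = []
    for record in records:
        (estimated if record.get("estimated") else reported).append(record)
    return reported, estimated
```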
Related Links
- GitHub repository: digital4better/data
- Digital4Better: digital4better.com
- fruggr: fruggr.io
- Contributing guide: CONTRIBUTING.md
License
This repository is published under the ODC Open Database License (ODbL).
