New Public Datasets Added To AWS
Written by Kay Ewbank   
Wednesday, 06 February 2019

Amazon has announced nine new AWS public datasets for researchers and developers interested in machine learning, environmental science, geospatial, astronomy, cybersecurity, and housing.

The AWS Public Dataset Program covers the cost of storage for publicly available high-value cloud-optimized datasets. The datasets within it can be used for analysis on AWS, and the aim is also to develop new cloud-native techniques, formats, and tools that lower the cost of working with data.



The machine learning dataset is the Massively Multilingual Image Dataset from the University of Pennsylvania. It contains images paired with the words they represent in 100 languages, and it is doubly parallel: for each language, words are stored parallel to images that represent the word, as well as parallel to the word's translation into English. The accompanying image shows five images for the Indonesian word "kucing", a word with high predicted concreteness, along with its top 4 ranked translations using CNN features.
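To make the "doubly parallel" idea concrete, here is a minimal Python sketch of how one such record could be modelled. It is only an illustration: the class name `WordEntry`, the helper `build_index`, and the image file names are hypothetical and do not reflect the dataset's actual file layout or any official API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class WordEntry:
    """One word in one language, with its two parallel views (hypothetical model)."""
    word: str
    language: str
    # First parallel view: images that depict the word (made-up paths).
    image_paths: List[str] = field(default_factory=list)
    # Second parallel view: the word's English translations.
    english_translations: List[str] = field(default_factory=list)


def build_index(entries: List[WordEntry]) -> Dict[Tuple[str, str], WordEntry]:
    """Index entries by (language, word) so either view can be looked up directly."""
    return {(e.language, e.word): e for e in entries}


entries = [
    WordEntry(
        word="kucing",
        language="Indonesian",
        image_paths=["kucing/01.jpg", "kucing/02.jpg"],  # illustrative only
        english_translations=["cat"],
    )
]

index = build_index(entries)
entry = index[("Indonesian", "kucing")]
print(entry.image_paths)
print(entry.english_translations)
```

Keeping both views on the same record means that, given a foreign word, you can reach its images and its English translation in a single lookup.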



There are three environmental datasets. The first is a set of atmospheric deterministic and probabilistic forecasts from the UK Meteorological Office. This data was previously available, but it is now updated daily.

The second environmental dataset is a collection of scientific information for land owners from the Queensland Government. The database is made up of Australian climate data from 1889 to the present.

The third collection of environmental data is air quality and radiation data from Safecast. Safecast was started after the Fukushima Daiichi Nuclear Power Plant meltdown, when volunteers began monitoring radiation levels. Air quality measurements were added later, and the project has spread around the world.

There are two new geospatial datasets: the USGS 3D elevation data, which contains elevation data in the form of light detection and ranging (LiDAR) data over the United States, Hawaii, and the U.S. territories, acquired over an 8-year period; and a set of images collected by the China-Brazil Earth Resources Satellite, made available by AMS Kepler.

In the astronomy sector, there's data from the Transiting Exoplanet Survey Satellite (TESS), a two-year survey looking for exoplanets in orbit around bright stars.

The Open City Model data has also been made available. This is an initiative to provide CityGML data for all the buildings in the United States. By using other open datasets in conjunction with the researchers' own code and algorithms, the intention is to provide 3D geometries for every US building.

The final addition is a collection of datasets from QIIME 2. The Microbiome Research User Tutorial Datasets collection contains the user documents and datasets for QIIME 2. QIIME 2 is an extensible and decentralized microbiome analysis package with a focus on data and analysis transparency. It enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.



More Information

Massively Multilingual Image Dataset

Learning Translations via Images with a Massively Multilingual Image Dataset

Atmospheric Deterministic and Probabilistic Forecasts

Scientific Information for Land Owners

Safecast Air Quality and Radiation data

USGS 3DEP LiDAR Point Clouds 

China-Brazil Earth Resources Satellite

Transiting Exoplanet Survey Satellite

Open City Model

Microbiome Research User Tutorial Datasets

Related Articles

Amazon Releases Managed Message Broker Service for ActiveMQ

AWS Lambda for the Impatient Part 1

AWS Lambda for the Impatient Part 2

AWS Lambda for the Impatient Part 3

Amazon Adds Game Dev Options To AWS

Amazon Strengthens Data Offerings

New Amazon Elasticsearch Service

Amazon Introduces Quicksight - Cloud BI

New AWS Managed Services



Last Updated ( Wednesday, 06 February 2019 )