|New Public Datasets Added To AWS|
|Written by Kay Ewbank|
|Wednesday, 06 February 2019|
Amazon has announced nine new AWS public datasets for researchers and developers interested in machine learning, environmental science, geospatial, astronomy, cybersecurity, and housing.
The AWS Public Dataset Program covers the cost of storage for publicly available, high-value, cloud-optimized datasets. Datasets in the program can be analyzed on AWS, and the program also aims to encourage the development of new cloud-native techniques, formats, and tools that lower the cost of working with data.
The machine learning dataset is a massively multilingual image dataset from the University of Pennsylvania. It contains images paired with the words they represent in 100 languages, and it is doubly parallel: for each language, words are stored alongside images that represent them, as well as alongside their English translations. The image below shows five images for the Indonesian word "kucing", a word with high predicted concreteness, along with its top four ranked translations using CNN features.
There are three environmental datasets. The first is a set of deterministic and probabilistic atmospheric forecasts from the UK Meteorological Office. This data was available previously, but it is now updated daily.
The second environmental dataset is a collection of scientific information for landowners from the Queensland Government. The database consists of Australian climate data from 1889 to the present.
The third collection of environmental data is air quality and radiation data from Safecast. Safecast was started after the Fukushima Daiichi Nuclear Power Plant meltdown, when volunteers began monitoring radiation levels. Air quality measurements were added later, and the project has spread around the world.
There are two new geospatial datasets. The first is USGS 3D elevation data, which contains light detection and ranging (LiDAR) elevation data covering the United States, Hawaii, and the U.S. territories, acquired over an 8-year period. The second, from AMS Kepler, is a set of images collected by the China-Brazil Earth Resources Satellite.
In the astronomy sector, there's data from the Transiting Exoplanet Survey Satellite (TESS), a two-year survey looking for exoplanets in orbit around bright stars.
The Open City Model data has also been made available. This is an initiative to provide CityGML data for all the buildings in the United States. The project combines other open datasets with the researchers' own code and algorithms, with the aim of providing 3D geometries for every US building.
The final addition is a collection of datasets from QIIME 2. The microbiome research user tutorial datasets contain the user documentation and datasets for QIIME 2, an extensible and decentralized microbiome analysis package with a focus on data and analysis transparency. It enables researchers to start an analysis with raw DNA sequence data and finish with publication-quality figures and statistical results.
|Last Updated ( Wednesday, 06 February 2019 )|