Louis Bouchard has compiled a great list of research papers covering AI breakthroughs published last year. His introductions, together with links to the papers and even to code, make this a great resource.
I first reported on What's AI, as Louis Bouchard is also known, with his curated list of AI research papers for 2021, and it is good to see that he has continued with his mission to explain artificial intelligence in simple terms and to share new research and its applications with everyone.
Of course, 2022 was the year of ChatGPT and DALL·E 2, but that doesn't preclude quality research in other areas, as can be seen from his list of 32 papers:
[1] Resolution-robust Large Mask Inpainting with Fourier Convolutions
[2] Stitch it in Time: GAN-Based Facial Editing of Real Videos
[3] NeROIC: Neural Rendering of Objects from Online Image Collections
[5] Towards real-world blind face restoration with generative facial prior
[6] 4D-Net for Learned Multi-Modal Alignment
[7] Instant Neural Graphics Primitives with a Multiresolution Hash Encoding
[8] Hierarchical Text-Conditional Image Generation with CLIP Latents
[9] MyStyle: A Personalized Generative Prior
[10] OPT: Open Pre-trained Transformer Language Models
[11] BlobGAN: Spatially Disentangled Scene Representations
[12] A Generalist Agent
[13] Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
[14] Dalle mini
[15] No Language Left Behind: Scaling Human-Centered Machine Translation
[16] Dual-Shutter Optical Vibration Sensing
[17] Make-a-scene: Scene-based text-to-image generation with human priors
[18] BANMo: Building Animatable 3D Neural Models from Many Casual Videos
[19] High-resolution image synthesis with latent diffusion models
[20] Panoptic Scene Graph Generation
[21] An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion
[22] Expanding Language-Image Pretrained Models for General Video Recognition
[23] Make-a-video: text-to-video generation without text-video data
[24] Robust Speech Recognition via Large-Scale Weak Supervision
[25] DreamFusion: Text-to-3D using 2D Diffusion
[26] Imagic: Text-Based Real Image Editing with Diffusion Models
[27] eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
[28] InfiniteNature-Zero: Learning Perpetual View Generation of Natural Scenes from Single Images
[29] Galactica: A Large Language Model for Science
[30] Real-time Neural Radiance Talking Portrait Synthesis via Audio-spatial Decomposition
[31] ChatGPT: Optimizing Language Models for Dialogue
[32] Production-Ready Face Re-Aging for Visual Effects
The 2022 AI Recap on the louisbouchard.ai website includes a short explanation of the research behind each paper and videos explaining the concepts. There's a lot to go through, but from that list I singled out a few of the most notable:
Stitch it in Time: GAN-Based Facial Editing of Real Videos: Add effects to videos, such as smiles, or make the subjects look younger or older, all automatically by using AI-based algorithms, something that previously required expensive software and hardware.
Hierarchical Text-Conditional Image Generation with CLIP Latents: OpenAI's new model DALL·E 2 is amazing. DALL·E could generate images from text inputs, but DALL·E 2 goes further by also editing those images to make them look better.
MyStyle: A Personalized Generative Prior: This new model by Google Research and Tel Aviv University is incredible. You can create very realistic deepfakes, which makes disinformation campaigns even scarier!
OPT: Open Pre-trained Transformer Language Models: GPT-3 is a model developed by OpenAI that you can access through a paid API, without access to the model itself. Meta's new model OPT is GPT-3's closest competitor, and it is open source too!
No Language Left Behind: Scaling Human-Centered Machine Translation: Meta AI’s most recent model, called “No Language Left Behind”, does exactly that: a single model translates across 200 different languages with state-of-the-art quality.
Galactica: A Large Language Model for Science: Galactica is a large language model with a size comparable to GPT-3, but specialized in scientific knowledge. The model can write whitepapers, reviews, Wikipedia pages, and code. It knows how to cite and how to write equations.
And more. You can easily spend hours going through that list, watching the videos and skimming through the papers. If, however, you don't feel like delving that deep, there's also a short 8-minute video that goes through all of the research quickly.