|Google Open Sources Albert NLP|
|Written by Kay Ewbank|
|Tuesday, 21 January 2020|
Google has made ALBERT (A Lite BERT) available in an open source version. ALBERT is a deep-learning natural language processing model that the developers say uses far fewer parameters than BERT without sacrificing accuracy.
Bidirectional Encoder Representations from Transformers, or BERT, is the self-supervised method released by Google in 2018. It has become known for the impressive results the technique has achieved on a range of NLP tasks while relying on un-annotated text drawn from the web. Most similar NLP systems are based on text that has been labeled specifically for a given task.
ALBERT is an upgrade to BERT that offers improved performance on 12 NLP tasks, including the competitive Stanford Question Answering Dataset (SQuAD v2.0) and the SAT-style reading comprehension RACE benchmark. ALBERT is being released as an open-source implementation on top of TensorFlow, and includes a number of ready-to-use pre-trained language representation models.
According to a paper its developers presented at the International Conference on Learning Representations, ALBERT reduces model size in two ways: by sharing parameters across the hidden layers of the network, and by factorizing the embedding layer.
The researchers say that the key to optimizing performance is to allocate the model's capacity more efficiently. Input-level embeddings need to learn only context-independent representations, such as a single representation for the word "bank". In contrast, hidden-layer embeddings need to refine these into context-dependent representations: one for "bank" in the context of financial transactions, and a different one for "bank" in the context of river-flow management.
Using the two techniques of sharing parameters and factorizing the embedding layer cuts the parameter count for a baseline model from BERT's 108M to just 12M. Accuracy drops from an average of 82.3% to 80.1%, but this is a small sacrifice given the sharp reduction in parameters. One benefit of that reduction is the option of scaling the model up further: the developers say that, assuming memory size allows, one can scale up the size of the hidden-layer embeddings by 10-20x.
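As a rough sketch of why these two techniques shrink the model, the savings can be expressed as simple parameter arithmetic. The sizes below are illustrative round numbers chosen for this example, not the exact BERT or ALBERT configurations:

```python
# Illustrative parameter arithmetic for ALBERT's two size-reduction techniques.
# All sizes are assumed values for demonstration, not the published configs.

V = 30_000     # vocabulary size
H = 768        # hidden-layer size
E = 128        # factorized embedding size (E much smaller than H)
L = 12         # number of transformer layers
P = 7_000_000  # assumed parameter count of one transformer layer

# 1) Embedding factorization: instead of one V x H embedding matrix,
#    use a V x E matrix followed by an E x H projection.
bert_embedding_params = V * H            # 23,040,000
albert_embedding_params = V * E + E * H  # 3,938,304

# 2) Cross-layer parameter sharing: instead of L independent layers,
#    reuse a single set of layer weights at every depth.
bert_layer_params = L * P    # 84,000,000
albert_layer_params = P      # 7,000,000

print(bert_embedding_params, albert_embedding_params)
print(bert_layer_params, albert_layer_params)
```

Under these assumed sizes, factorization alone cuts the embedding parameters by roughly a factor of six, and layer sharing divides the transformer-stack parameters by the number of layers, which is consistent with the overall 108M-to-12M reduction the developers report.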
The researchers conclude that:
"The success of ALBERT demonstrates the importance of identifying the aspects of a model that give rise to powerful contextual representations. By focusing improvement efforts on these aspects of the model architecture, it is possible to greatly improve both the model efficiency and performance on a wide range of NLP tasks."