Channel: InsidetheRss – La Biblia de la IA – The Bible of AI™ Journal

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

In this survey, we connect several lines of work from the pre-neural and neural eras by showing how hybrid approaches combining words and characters, as well as subword-based approaches built on learned segmentation, have been proposed and evaluated. We conclude that there is not, and likely never will be, a single silver-bullet solution for all applications, and that thinking seriously about tokenization remains important for many of them.
