Why aren’t you using pretrained models?

Pretrained neural networks have reached the point where they are good enough for many applications without further training. Many models with billions of parameters are freely available. However, not every company has people with machine learning experience. It is true that domain knowledge and care are required to build and deploy robust ML pipelines for end-users.

There is, however, huge potential in applying simple ML solutions to internal or personal challenges. To show how simple this can be, let’s build a semantic search function that could be of use to anyone tasked with writing (English) text.

A simple matter of programming, machine learning, pretraining

A dictionary is a valuable tool for writers. However, modern dictionaries that come with popular operating systems have been found to be “dry, functional, almost bureaucratically sapped of color or pop”.

Another issue with dictionaries is that they are one-directional. To find better words and expressions, you need to think of a word, look it up, and then chase references to explore the possibilities. What if, in addition to this forward search, a computer could look in the other direction (from the definitions to the words)?

We can address these points by applying a pretrained model to build a “semantic” version of Webster’s 1913 dictionary. What follows is a quick overview of the idea. Then you may want to look at the code, even if you are not a programmer: we need about 16 lines of Python to load the data, run it through a neural network, index it, and start searching.

Building a reverse dictionary

We’ll use a technique called sentence embedding to make the definitions and examples from Webster’s dictionary searchable. This is the semantic part of semantic search. Essentially, the meaning of a phrase is encoded into a vector of numbers, the output of a neural network.
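To make this concrete, here is a minimal sketch of the embedding step. It assumes the sentence-transformers library and the all-MiniLM-L6-v2 model as one possible choice; the repo linked below may use a different model, and the definitions shown are placeholders rather than actual Webster entries.

    from sentence_transformers import SentenceTransformer

    # One possible choice of pretrained sentence-embedding model
    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Placeholder (word, definition) pairs standing in for the parsed Webster's 1913 data
    definitions = [
        ("astoundment", "Amazement; astonishment."),
        ("bewildered", "Greatly perplexed; confused."),
        ("blank", "Absence of thought or expression."),
        ("stound", "A state of astonishment; amazement."),
    ]

    # Encode every definition into a fixed-length vector (one row per definition)
    vectors = model.encode([text for _, text in definitions])
    print(vectors.shape)  # (number of definitions, embedding dimension)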

With the Webster-Vectors in memory (we get about 270,000 from the dictionary used), we can now query this dataset by encoding a search phrase into a query vector. To search for words, we compare how close the query is to vectors in the dataset.
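Continuing the sketch above, encoding a query works exactly like encoding a definition, and cosine similarity is one common way to measure how close the query is to the stored vectors (the actual implementation may use a different metric).

    import numpy as np

    # Encode the search phrase with the same model used for the definitions
    query_vec = model.encode(["I'm lost for words"])[0]

    # Cosine similarity: normalize all vectors, then take dot products
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query_vec / np.linalg.norm(query_vec)
    similarities = unit_vectors @ unit_query  # one score per definition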

To find similar vectors, we run a nearest neighbor algorithm. It takes the query vector as input, looks through our dataset, and provides, for example, the top ten closest results. All that’s left for us to do is to return the words associated with these neighbor vectors. This will (ideally) result in a list of words close to the meaning of the search phrase.
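A brute-force scan over the similarities already works at this scale, but a nearest-neighbor index keeps queries fast. The sketch below uses scikit-learn’s NearestNeighbors as one option (the repo may use a dedicated vector index instead) and maps the neighbors back to their words.

    from sklearn.neighbors import NearestNeighbors

    # Index the definition vectors; cosine distance matches the similarity above
    index = NearestNeighbors(n_neighbors=min(10, len(definitions)), metric="cosine")
    index.fit(vectors)

    # Look up the closest definitions and return their words
    distances, neighbor_ids = index.kneighbors([query_vec])
    print([definitions[i][0] for i in neighbor_ids[0]])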

As an example, the phrase “I’m lost for words” yields: astoundment, bewildered, blank, confus, distraught, perplexly, stagger, stound. Please find the implementation in this GitHub repo.

Only a few lines of simple code and some compute are needed to do something useful with pretrained models. Not everything needs to be about “big data”, and by applying ML you may find it can add “big meaning” to your daily challenges. As the methods of “AI” (really, neural networks) have reached a point of consolidation, now is a good time to give them a try even if you haven’t worked with ML before.


You can find me on LinkedIn and GitHub