newsSum is a Google App Engine application that bundles articles from different news sources. To try out ML embeddings, I decided to add a suggestion service.
High-level idea
- There will be no changes to the backend of "newssum". The suggestion service "newssum-sug" will be implemented as a separate service.
- The frontend of "newssum" will check whether the suggestion service "newssum-sug" is available. If so, it will let the user expand an article and query the suggestion service for additional information to display.
Implementation of the suggestion service
- Technically, "newssum-sug" could gather suggestions from any source (e.g. Google search results, a YouTube video, etc.). But for now, it will only process articles from selected "newssum" sources. So, there will be scheduled tasks to collect articles from "newssum" and prepare them for searching.
- Vector embeddings will be used to find similar articles. A machine learning model turns each news headline into a vector of numbers. When a query comes in, an embedding is generated from that query in the same way. By comparing the distance between vectors, we can find the articles that are most closely related to the query.
- The embeddings generated during batch processing are stored in a vector database. The database will also provide the mechanism for searching vectors by distance.
- Since "newssum" is for current news only, embeddings will only be kept for 2 days.
- The suggestion service can also be used for free-text search. But for now, the frontend only uses it for article suggestions.
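
To make the flow above concrete, here is a minimal sketch of the store-search-expire cycle in Python. It is not the actual "newssum-sug" code. A toy bag-of-words vector stands in for the ML embedding model, and a plain in-memory list stands in for the vector database. The names `toy_embed`, `ToyVectorStore` and `RETENTION` are hypothetical and exist only for this illustration.

```python
import math
from collections import Counter
from datetime import datetime, timedelta

# Assumption: mirrors the "keep embeddings for 2 days" rule described above.
RETENTION = timedelta(days=2)


def toy_embed(text):
    """Stand-in for a real embedding model: a unit-length bag-of-words vector.

    The vector is stored sparsely as {word: weight}. A real model would
    return a dense list of floats instead.
    """
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine_distance(a, b):
    """Cosine distance between two unit-length sparse vectors (0 = same direction)."""
    return 1.0 - sum(weight * b.get(word, 0.0) for word, weight in a.items())


class ToyVectorStore:
    """In-memory stand-in for the vector database (hypothetical)."""

    def __init__(self):
        self.rows = []  # list of (headline, vector, added_at)

    def add(self, headline, added_at):
        # Batch step: embed the headline once and keep it with a timestamp.
        self.rows.append((headline, toy_embed(headline), added_at))

    def purge(self, now):
        # Drop embeddings older than the retention window.
        self.rows = [r for r in self.rows if now - r[2] <= RETENTION]

    def search(self, query, k=3):
        # Query step: embed the query, then rank stored vectors by distance.
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine_distance(q, r[1]))
        return [headline for headline, _, _ in ranked[:k]]


if __name__ == "__main__":
    now = datetime(2024, 1, 3, 12, 0)
    store = ToyVectorStore()
    store.add("Central bank raises interest rates", now - timedelta(days=3))
    store.add("Interest rates rise as central bank acts", now)
    store.add("Local team wins football final", now)
    store.purge(now)  # the 3-day-old headline is removed here
    print(store.search("central bank interest rates", k=1))
```

Because `search` accepts any text, the same call serves both article suggestions (pass the expanded article's headline as the query) and free-text search. A real embedding model would also match headlines that share meaning but not words, which this word-count stand-in cannot do.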