Clarence's Wicked Mind: December 2024

Friday, December 27, 2024

Trying out vector embeddings

newsSum is a Google App Engine application that bundles articles from different news sources. To try out ML embeddings, I decided to add a suggestion service.

High-level idea

There will be no changes to the backend of "newssum". The suggestion service "newssum-sug" will be implemented as a separate service
The frontend of "newssum" will check if the suggestion service "newssum-sug" is available. If so, it will allow the user to expand an article and query the suggestion service for additional information to display

Implementation of the suggestion service

Technically, "newssum-sug" could gather suggestions from any source (e.g. Google search results, a YouTube video, etc.). But for now, it only processes articles from selected "newssum" sources. So, there will be scheduled tasks to collect articles from "newssum" and prepare them for searching.
Vector embeddings will be used to find similar articles. A machine learning model turns each news headline into a vector of numbers. When a query comes in, an embedding is generated from that query too. By comparing the distance between vectors, we can find the articles that are related to the query.
The embeddings generated during batch processing are stored in a vector database. The database will also provide the mechanism for searching vectors by distance.
Since "newssum" is for current news only, embeddings will only be kept for 2 days.
The suggestion service can also be used for free-text search. But for now, the frontend only uses it for article suggestions.
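To make the idea concrete, here is a minimal sketch of the embed, store, expire and search steps described above. It is not the actual "newssum-sug" code: `toy_embed` is a hypothetical stand-in that just counts words from a tiny fixed vocabulary (a real service would call a trained embedding model), and `ToyVectorStore` is an in-memory stand-in for a vector database. All names and the vocabulary are assumptions made up for this example.

```python
import math
import time

# Assumption: a tiny fixed vocabulary replaces a real embedding model.
VOCAB = ["bank", "interest", "rates", "team", "final", "election"]
MAX_AGE_SECONDS = 2 * 24 * 3600  # embeddings are only kept for 2 days


def toy_embed(text):
    """Hypothetical embedding: count vocabulary words, then scale to unit length."""
    words = text.lower().split()
    vec = [float(words.count(term)) for term in VOCAB]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine_distance(a, b):
    """1 - cosine similarity; inputs are unit length (or all zeros)."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # each row is (timestamp, headline, vector)

    def add(self, headline, now=None):
        ts = time.time() if now is None else now
        self.rows.append((ts, headline, toy_embed(headline)))

    def expire(self, now=None):
        """Drop every embedding older than MAX_AGE_SECONDS."""
        current = time.time() if now is None else now
        cutoff = current - MAX_AGE_SECONDS
        self.rows = [row for row in self.rows if row[0] >= cutoff]

    def search(self, query, k=3):
        """Return the k headlines whose vectors are closest to the query."""
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine_distance(q, row[2]))
        return [headline for _, headline, _ in ranked[:k]]


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("Central bank raises interest rates again", now=0)
    store.add("Local team wins championship final", now=100_000)
    store.add("Interest rates expected to fall next year", now=200_000)
    store.expire(now=200_000)  # the first headline is now older than 2 days
    print(store.search("interest rates", k=1))
    # prints ['Interest rates expected to fall next year']
```

The same `search` call would also serve a free-text query, since the query is embedded in exactly the same way as a headline.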

While "newssum" is open source, the "newssum-sug" service is still closed source and under development. But the basic functionality has been integrated and is available on the demo site.

Tuesday, December 24, 2024

A machine learning model to identify music albums from photos

I was looking for a home automation project to select and play a specific music album from streaming services. There are similar ideas that use NFC tags. Basically, you prepare some NFC tags with album/movie cover art on them, and putting a tag on the reader triggers the playback of that album/movie. While it brings the joy of handling and selecting a physical collection, it costs money and time to prepare those NFC tags and I wanted to avoid that.

Since we now have all those machine learning models and classifiers, I thought I could just train a model to look at a webcam photo of a record / CD and tell me the Spotify link to play that album.

BTW, I know Microsoft Copilot (or maybe OpenAI too) can do it without any special training, but I don't want to pay extra for that and just wanted to host the model on my own machines.

I imagine it will be something like this:

I put an album in front of a webcam...

... and the model will tell me the Spotify URL to pass on to the music streamer
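As a rough sketch of that last step (not the code in the repository), the snippet below turns raw classifier scores into a Spotify URL. It assumes the model outputs one score per known album, and it returns nothing when the best guess is not confident enough. The labels, the placeholder URLs, the `pick_album_url` helper and the 0.8 threshold are all made up for illustration.

```python
import math

# Hypothetical labels and placeholder URLs; these are not real album links.
LABELS = ["album_a", "album_b"]
ALBUM_URLS = {
    "album_a": "https://open.spotify.com/album/PLACEHOLDER_A",
    "album_b": "https://open.spotify.com/album/PLACEHOLDER_B",
}


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pick_album_url(scores, min_confidence=0.8):
    """Return the URL of the most likely album, or None if the model is unsure."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < min_confidence:
        return None  # better to play nothing than the wrong album
    return ALBUM_URLS[LABELS[best]]
```

For example, `pick_album_url([0.1, 4.0])` picks the second album, while `pick_album_url([1.0, 1.2])` returns `None` because neither album clearly wins.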

Long story short, my model can identify the albums in my music collection with 98% correctness (more on that later). If you are interested in the technical details and the scripts used to train the model, they are available on GitHub: https://github.com/kitsook/AlbumSpotter

But in the end I didn't integrate this into my home automation, for reasons related to the correctness. When I get a new CD / vinyl record, I always add it to my collection on Spotify, so I can just get the cover art from Spotify to train my model. But then I discovered there are at least two problems that affect the correctness:

there are many editions of the same album, e.g. I could have a physical CD of the standard edition while my Spotify collection has the extended edition with a different song list
nowadays artists tend to release an album with several "special" covers, so my physical copy could look totally different from the one on Spotify

That means I will need to clean up the data to get a more accurate result. As procrastination kicks in, I am stopping the project at the machine learning model; the home automation part will be a future project.