CosmosDB Query Optimization: Case-(in)sensitive Searches

October 23, 2018
cosmosdb

I recently added some performance logging to my CosmosDB data access layer and I discovered that a query was using over 2000 RUs. The offending query was something that seemed pretty innocent, merely converting an email field to lowercase to see if a user with that email existing in the system.

CosmosDB doesn’t have support for case-insensitive queries so converting to lowercase seemed like a simple solution for ensuring lookups could successfully match whatever case the value was stored in. After seeing that there would be an ever-increasing penalty as more records are added to the collection I knew something needed to be done. I decided to change my peristence layer to always store the email field in lowercase. This allows me to not have to convert each record to lowercase during the query and saves a ton of RUs. The average cost for this query dropped from ~2090 RUs to ~4 RUs, which is a crazy amount of savings. I did have to write a bit of migration code to convert all existing email fields to lowercase but now everything is much more efficient.

I had a similar issue with a query on a site administration page that would look allow searching for users that partially match a given string within their email, first name, or last name. This also was naively converting all three fields to lowercase to perform the lookup. Since the email field is already lowercase (after the migration), I needed to manage the first and last name fields as well. It didn’t make sense to force these fields to be lowercase in the database because we’d lose any special casing the users would have in their names (e.g. McCormick, etc.). I got around this by adding more fields to my object (firstNameLower and lastNameLower) which could be used in the query and not incur the cost of lowercase conversion for proper matching.

This is further proof that instrumenting your applications is important because the collected data can help test hypotheses and find issues you didn’t know existed. This can translate into real world savings of both time and money as you ensure things are performing as optimimally as possible.