Posts
- Django and manual data prefetch
- Django, window functions and paginator
- How I discovered MPV
- Pake - wrapping websites
- Accordion with animated height
- Nginx: HTTP and HTTPS on a single port
- NeoMutt: new Thunderbird
- How I solved the dilemma of personal notes
- Django model constraints
- Simple keyboard udev rule
- Kitchen LEDs and Home Assistant
- Home Assistant Nginx proxy server
- Rust and it's ability to insert/replace string into another string onto specific position
- Python HTML table parser without dependencies
- Python defaultdict with data
Django and manual data prefetch
Django supports prefething related data based on 1:N or M:N relations. This technique works well and saves database from frequent N+1 queries while all the data are prefetched at once and then kept in memory. But what if you have a complex class - let's say a availability calculator - that operates on many set's of data. Such datasets are related one to each other and you work with them in a bunch of loops. Is it better to fetch all the data at once? Or in batches? Or don't prefetch data at all? Let's look onto this.
Data model
Let's say you have 3 entities
- cars
- trips
- defects
A car can make a trip. Cars are checked regularly and can have some defects which mechanics need to repair.
Now let's say you have a transportation company that owns 100 cars and each car makes 100 trips a month. Also, a car usually has 5 arbitrary defects a week. All of that creates an interesting dataset that with 100k trips and 2k defects a month.
Availability calculator
Let's say we need some type of calculator that takes a car or a list of cars and calculates if the car(s) are operable or not. You don't wanna broken cars transport cargo and this calculator helps you with that.
The calculator needs to work with all 3 types of entities. We know that there is 100 * 100 trips per month. Also 5 * 4 defects per car fer month. Now it's time to think about how to fetch inside the calculator class in regard later data manipulation.
All-at-once fetch
One method how to fetch data is to fetch cars and prefetch trips and defects.
Car.objects.prefetch_related("trip_set", "defect_set").all()
That would give you pretty much all you need but there are some downsides:
- data might be big than expected thus ORM mapping might be slow
- data are nested - trips and defects are hidden under a car
- you cannot work with all defects and trips as a whole
The last point is especially inconvenient. You might need to query trips as a whole over all cars. That's impossible if data are nested. The same is true for defects.
Fetch in batches
Quite unusual (or one might think an oldschool) technique is to fetch all the needed cars, trips and defects. The thing here is to re-store the records in memory under a key that you are going to use the most - probably car ID.
from collections import defaultdict cars = [] # list of car IDs trips = defaultdict(list) for t in Trip.objects.filter(car__in=cars): trips[t.car.pk].append(t)
This small piece of code gives you the possibility to fetch all trips for all the cars at once + gives you the flexibility having them all in one heap yet you can still identify then by it's relation. You can do the same for defects. To query trips or defects is simple:
car_trips = trips[car.pk]
To fetch some specific trips/defects for a car or even multiple cars I recommend using filter():
filter(trips, lamdda t: t.destination = "London") filter(defects[car.pk], lamdda d: d.severity = "light")