13 KiB
13 KiB
| name | description | argument-hint | allowed-tools |
|---|---|---|---|
| lp-query-optimizer | Optimizes Django ORM queries for league-planner with select_related, prefetch_related, only/defer, bulk operations, and N+1 detection. Use when fixing slow queries. | <model-or-view-name> | Read, Write, Edit, Glob, Grep |
League-Planner Query Optimizer
Optimizes Django 6 ORM queries following league-planner patterns: proper relationship loading, avoiding N+1 queries, using bulk operations, and leveraging PostgreSQL-specific features.
When to Use
- Fixing slow database queries identified in logs or profiling
- Optimizing views that load multiple related objects
- Implementing bulk operations for large data sets
- Reviewing query patterns before deployment
Prerequisites
- Django Debug Toolbar or Silk profiler enabled for query analysis
- Understanding of the model relationships in the app
- Access to query logs (
DEBUG=Trueor logging configured)
Instructions
Step 1: Identify the Problem
Enable query logging in settings:
# leagues/settings.py (development)
LOGGING = {
'handlers': {
'console': {
'class': 'logging.StreamHandler',
},
},
'loggers': {
'django.db.backends': {
'level': 'DEBUG',
'handlers': ['console'],
},
},
}
Or use Django Debug Toolbar / Silk to identify:
- Number of queries per request
- Duplicate queries (N+1 problem)
- Slow queries (>100ms)
Step 2: Apply Optimization Patterns
Pattern 1: select_related for ForeignKey/OneToOne
# BAD: N+1 queries
teams = Team.objects.all()
for team in teams:
print(team.season.league.name) # 2 extra queries per team!
# GOOD: 1 query with JOINs
teams = Team.objects.select_related(
'season',
'season__league',
'country',
).all()
for team in teams:
print(team.season.league.name) # No extra queries
Pattern 2: prefetch_related for ManyToMany/Reverse FK
# BAD: N+1 queries
scenarios = Scenario.objects.all()
for scenario in scenarios:
for match in scenario.matches.all(): # Extra query per scenario
print(match.home_team.name) # Extra query per match!
# GOOD: 2 queries total (scenarios + prefetched matches with teams)
scenarios = Scenario.objects.prefetch_related(
Prefetch(
'matches',
queryset=Match.objects.select_related('home_team', 'away_team', 'day')
)
).all()
Pattern 3: Prefetch with Filtering
from django.db.models import Prefetch
# Prefetch only active matches with their related data
seasons = Season.objects.prefetch_related(
Prefetch(
'scenarios',
queryset=Scenario.objects.filter(is_active=True).only('id', 'name', 'season_id')
),
Prefetch(
'teams',
queryset=Team.objects.select_related('country').filter(is_active=True)
),
)
Pattern 4: only() and defer() for Partial Loading
# Load only needed fields (reduces memory and transfer)
teams = Team.objects.only(
'id', 'name', 'abbreviation', 'country_id'
).select_related('country')
# Defer heavy fields
scenarios = Scenario.objects.defer(
'description', # Large text field
'settings_json', # Large JSON field
).all()
# Combining with values() for aggregation queries
team_stats = Match.objects.filter(
scenario_id=scenario_id
).values(
'home_team_id'
).annotate(
total_home_matches=Count('id'),
total_goals=Sum('home_goals'),
)
Pattern 5: Bulk Operations
from django.db import transaction
# BAD: N individual INSERTs
for data in items:
Match.objects.create(**data)
# GOOD: Single bulk INSERT
with transaction.atomic():
Match.objects.bulk_create([
Match(**data) for data in items
], batch_size=1000)
# Bulk UPDATE
Match.objects.filter(
scenario_id=scenario_id,
is_confirmed=False
).update(
is_confirmed=True,
updated_at=timezone.now()
)
# Bulk UPDATE with different values
from django.db.models import Case, When, Value
Match.objects.filter(id__in=match_ids).update(
status=Case(
When(id=1, then=Value('confirmed')),
When(id=2, then=Value('cancelled')),
default=Value('pending'),
)
)
# bulk_update for different values per object (Django 4.0+)
matches = list(Match.objects.filter(id__in=match_ids))
for match in matches:
match.status = calculate_status(match)
Match.objects.bulk_update(matches, ['status'], batch_size=500)
Step 3: Model-Level Optimizations
Add these to your models for automatic optimization:
class Scenario(models.Model):
# ... fields ...
class Meta:
# Indexes for common query patterns
indexes = [
models.Index(fields=['season', 'is_active']),
models.Index(fields=['created_at']),
# Partial index for active scenarios only
models.Index(
fields=['name'],
condition=models.Q(is_active=True),
name='idx_active_scenario_name'
),
]
# Default manager with common prefetches
@classmethod
def get_with_matches(cls, pk: int):
"""Get scenario with all matches pre-loaded."""
return cls.objects.prefetch_related(
Prefetch(
'matches',
queryset=Match.objects.select_related(
'home_team', 'away_team', 'day', 'kick_off_time'
).order_by('day__number', 'kick_off_time__time')
)
).get(pk=pk)
@classmethod
def get_list_optimized(cls, season_id: int):
"""Optimized query for listing scenarios."""
return cls.objects.filter(
season_id=season_id
).select_related(
'season'
).prefetch_related(
Prefetch(
'matches',
queryset=Match.objects.only('id', 'scenario_id')
)
).annotate(
match_count=Count('matches'),
confirmed_count=Count('matches', filter=Q(matches__is_confirmed=True)),
).order_by('-created_at')
Query Optimization Patterns
Common Relationships in League-Planner
League
└─ select_related: (none - top level)
└─ prefetch_related: seasons, managers, spectators
Season
└─ select_related: league
└─ prefetch_related: teams, scenarios, memberships
Scenario
└─ select_related: season, season__league
└─ prefetch_related: matches, optimization_runs
Match
└─ select_related: scenario, home_team, away_team, day, kick_off_time, stadium
└─ prefetch_related: (usually none)
Team
└─ select_related: season, country, stadium
└─ prefetch_related: home_matches, away_matches, players
Draw
└─ select_related: season
└─ prefetch_related: groups, groups__teams, constraints
Group
└─ select_related: super_group, super_group__draw
└─ prefetch_related: teams, teamsingroup
Aggregation Patterns
from django.db.models import Count, Sum, Avg, Max, Min, F, Q
# Count related objects
seasons = Season.objects.annotate(
team_count=Count('teams'),
active_scenarios=Count('scenarios', filter=Q(scenarios__is_active=True)),
)
# Subquery for complex aggregations
from django.db.models import Subquery, OuterRef
latest_run = OptimizationRun.objects.filter(
scenario=OuterRef('pk')
).order_by('-created_at')
scenarios = Scenario.objects.annotate(
latest_score=Subquery(latest_run.values('score')[:1]),
latest_run_date=Subquery(latest_run.values('created_at')[:1]),
)
# Window functions (PostgreSQL)
from django.db.models import Window
from django.db.models.functions import Rank, RowNumber
teams = Team.objects.annotate(
season_rank=Window(
expression=Rank(),
partition_by=F('season'),
order_by=F('points').desc(),
)
)
PostgreSQL-Specific Optimizations
# Array aggregation
from django.contrib.postgres.aggregates import ArrayAgg, StringAgg
seasons = Season.objects.annotate(
team_names=ArrayAgg('teams__name', ordering='teams__name'),
team_list=StringAgg('teams__name', delimiter=', '),
)
# JSON aggregation
from django.db.models.functions import JSONObject
from django.contrib.postgres.aggregates import JSONBAgg
scenarios = Scenario.objects.annotate(
match_summary=JSONBAgg(
JSONObject(
id='matches__id',
home='matches__home_team__name',
away='matches__away_team__name',
),
filter=Q(matches__is_final=True),
)
)
# Full-text search
from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank
teams = Team.objects.annotate(
search=SearchVector('name', 'city', 'stadium__name'),
).filter(
search=SearchQuery('bayern munich')
)
Caching Query Results
from django.core.cache import cache
def get_season_teams(season_id: int) -> list:
"""Get teams with caching."""
cache_key = f'season:{season_id}:teams'
teams = cache.get(cache_key)
if teams is None:
teams = list(
Team.objects.filter(season_id=season_id)
.select_related('country')
.values('id', 'name', 'country__name', 'country__code')
)
cache.set(cache_key, teams, timeout=300) # 5 minutes
return teams
# Invalidate on changes
def invalidate_season_cache(season_id: int):
cache.delete(f'season:{season_id}:teams')
cache.delete(f'season:{season_id}:scenarios')
Examples
Example 1: Optimizing Schedule View
# BEFORE: ~500 queries for 300 matches
def schedule_view(request, scenario_id):
scenario = Scenario.objects.get(pk=scenario_id)
matches = scenario.matches.all() # N+1 for each match's teams, day, etc.
context = {'matches': matches}
return render(request, 'schedule.html', context)
# AFTER: 3 queries total
def schedule_view(request, scenario_id):
scenario = Scenario.objects.select_related(
'season',
'season__league',
).get(pk=scenario_id)
matches = Match.objects.filter(
scenario=scenario
).select_related(
'home_team',
'home_team__country',
'away_team',
'away_team__country',
'day',
'kick_off_time',
'stadium',
).order_by('day__number', 'kick_off_time__time')
context = {
'scenario': scenario,
'matches': matches,
}
return render(request, 'schedule.html', context)
Example 2: Bulk Match Creation
# BEFORE: Slow - one INSERT per match
def create_matches(scenario, match_data_list):
for data in match_data_list:
Match.objects.create(scenario=scenario, **data)
# AFTER: Fast - bulk INSERT
from django.db import transaction
def create_matches(scenario, match_data_list):
matches = [
Match(
scenario=scenario,
home_team_id=data['home_team_id'],
away_team_id=data['away_team_id'],
day_id=data.get('day_id'),
kick_off_time_id=data.get('kick_off_time_id'),
)
for data in match_data_list
]
with transaction.atomic():
Match.objects.bulk_create(matches, batch_size=500)
return len(matches)
Example 3: Complex Reporting Query
def get_season_report(season_id: int) -> dict:
"""Generate comprehensive season report with optimized queries."""
from django.db.models import Count, Avg, Sum, Q, F
from django.db.models.functions import TruncDate
# Single query for team statistics
team_stats = Team.objects.filter(
season_id=season_id
).annotate(
home_matches=Count('home_matches'),
away_matches=Count('away_matches'),
total_distance=Sum('away_matches__distance'),
).values('id', 'name', 'home_matches', 'away_matches', 'total_distance')
# Single query for scenario comparison
scenarios = Scenario.objects.filter(
season_id=season_id,
is_active=True,
).annotate(
match_count=Count('matches'),
confirmed_pct=Count('matches', filter=Q(matches__is_confirmed=True)) * 100.0 / Count('matches'),
avg_distance=Avg('matches__distance'),
).values('id', 'name', 'match_count', 'confirmed_pct', 'avg_distance')
# Matches by day aggregation
matches_by_day = Match.objects.filter(
scenario__season_id=season_id,
scenario__is_active=True,
).values(
'day__number'
).annotate(
count=Count('id'),
).order_by('day__number')
return {
'teams': list(team_stats),
'scenarios': list(scenarios),
'matches_by_day': list(matches_by_day),
}
Common Pitfalls
- select_related on M2M: Only use for FK/O2O; use prefetch_related for M2M
- Chained prefetch: Remember prefetched data is cached; re-filtering creates new queries
- values() with related: Use
values('related__field')carefully; it can create JOINs - Forgetting batch_size: bulk_create/bulk_update without batch_size can cause memory issues
- Ignoring database indexes: Ensure fields in WHERE/ORDER BY have proper indexes
Verification
Use Django Debug Toolbar or these queries to verify:
from django.db import connection, reset_queries
from django.conf import settings
settings.DEBUG = True
reset_queries()
# Run your code here
result = my_function()
# Check queries
print(f"Total queries: {len(connection.queries)}")
for q in connection.queries:
print(f"{q['time']}s: {q['sql'][:100]}...")
Or with Silk profiler at /silk/ when USE_SILK=True.