--- name: lp-query-optimizer description: Optimizes Django ORM queries for league-planner with select_related, prefetch_related, only/defer, bulk operations, and N+1 detection. Use when fixing slow queries. argument-hint: allowed-tools: Read, Write, Edit, Glob, Grep --- # League-Planner Query Optimizer Optimizes Django 5.2 ORM queries following league-planner patterns: proper relationship loading, avoiding N+1 queries, using bulk operations, and leveraging PostgreSQL-specific features. ## When to Use - Fixing slow database queries identified in logs or profiling - Optimizing views that load multiple related objects - Implementing bulk operations for large data sets - Reviewing query patterns before deployment ## Prerequisites - Django Debug Toolbar or Silk profiler enabled for query analysis - Understanding of the model relationships in the app - Access to query logs (`DEBUG=True` or logging configured) ## Instructions ### Step 1: Identify the Problem Enable query logging in settings: ```python # leagues/settings.py (development) LOGGING = { 'handlers': { 'console': { 'class': 'logging.StreamHandler', }, }, 'loggers': { 'django.db.backends': { 'level': 'DEBUG', 'handlers': ['console'], }, }, } ``` Or use Django Debug Toolbar / Silk to identify: - Number of queries per request - Duplicate queries (N+1 problem) - Slow queries (>100ms) ### Step 2: Apply Optimization Patterns #### Pattern 1: select_related for ForeignKey/OneToOne ```python # BAD: N+1 queries teams = Team.objects.all() for team in teams: print(team.season.league.name) # 2 extra queries per team! # GOOD: 1 query with JOINs teams = Team.objects.select_related( 'season', 'season__league', 'country', ).all() for team in teams: print(team.season.league.name) # No extra queries ``` #### Pattern 2: prefetch_related for ManyToMany/Reverse FK ```python # BAD: N+1 queries scenarios = Scenario.objects.all() for scenario in scenarios: for match in scenario.matches.all(): # Extra query per scenario print(match.home_team.name) # Extra query per match! # GOOD: 2 queries total (scenarios + prefetched matches with teams) scenarios = Scenario.objects.prefetch_related( Prefetch( 'matches', queryset=Match.objects.select_related('home_team', 'away_team', 'day') ) ).all() ``` #### Pattern 3: Prefetch with Filtering ```python from django.db.models import Prefetch # Prefetch only active matches with their related data seasons = Season.objects.prefetch_related( Prefetch( 'scenarios', queryset=Scenario.objects.filter(is_active=True).only('id', 'name', 'season_id') ), Prefetch( 'teams', queryset=Team.objects.select_related('country').filter(is_active=True) ), ) ``` #### Pattern 4: only() and defer() for Partial Loading ```python # Load only needed fields (reduces memory and transfer) teams = Team.objects.only( 'id', 'name', 'abbreviation', 'country_id' ).select_related('country') # Defer heavy fields scenarios = Scenario.objects.defer( 'description', # Large text field 'settings_json', # Large JSON field ).all() # Combining with values() for aggregation queries team_stats = Match.objects.filter( scenario_id=scenario_id ).values( 'home_team_id' ).annotate( total_home_matches=Count('id'), total_goals=Sum('home_goals'), ) ``` #### Pattern 5: Bulk Operations ```python from django.db import transaction # BAD: N individual INSERTs for data in items: Match.objects.create(**data) # GOOD: Single bulk INSERT with transaction.atomic(): Match.objects.bulk_create([ Match(**data) for data in items ], batch_size=1000) # Bulk UPDATE Match.objects.filter( scenario_id=scenario_id, is_confirmed=False ).update( is_confirmed=True, updated_at=timezone.now() ) # Bulk UPDATE with different values from django.db.models import Case, When, Value Match.objects.filter(id__in=match_ids).update( status=Case( When(id=1, then=Value('confirmed')), When(id=2, then=Value('cancelled')), default=Value('pending'), ) ) # bulk_update for different values per object (Django 4.0+) matches = list(Match.objects.filter(id__in=match_ids)) for match in matches: match.status = calculate_status(match) Match.objects.bulk_update(matches, ['status'], batch_size=500) ``` ### Step 3: Model-Level Optimizations Add these to your models for automatic optimization: ```python class Scenario(models.Model): # ... fields ... class Meta: # Indexes for common query patterns indexes = [ models.Index(fields=['season', 'is_active']), models.Index(fields=['created_at']), # Partial index for active scenarios only models.Index( fields=['name'], condition=models.Q(is_active=True), name='idx_active_scenario_name' ), ] # Default manager with common prefetches @classmethod def get_with_matches(cls, pk: int): """Get scenario with all matches pre-loaded.""" return cls.objects.prefetch_related( Prefetch( 'matches', queryset=Match.objects.select_related( 'home_team', 'away_team', 'day', 'kick_off_time' ).order_by('day__number', 'kick_off_time__time') ) ).get(pk=pk) @classmethod def get_list_optimized(cls, season_id: int): """Optimized query for listing scenarios.""" return cls.objects.filter( season_id=season_id ).select_related( 'season' ).prefetch_related( Prefetch( 'matches', queryset=Match.objects.only('id', 'scenario_id') ) ).annotate( match_count=Count('matches'), confirmed_count=Count('matches', filter=Q(matches__is_confirmed=True)), ).order_by('-created_at') ``` ## Query Optimization Patterns ### Common Relationships in League-Planner ``` League └─ select_related: (none - top level) └─ prefetch_related: seasons, managers, spectators Season └─ select_related: league └─ prefetch_related: teams, scenarios, memberships Scenario └─ select_related: season, season__league └─ prefetch_related: matches, optimization_runs Match └─ select_related: scenario, home_team, away_team, day, kick_off_time, stadium └─ prefetch_related: (usually none) Team └─ select_related: season, country, stadium └─ prefetch_related: home_matches, away_matches, players Draw └─ select_related: season └─ prefetch_related: groups, groups__teams, constraints Group └─ select_related: super_group, super_group__draw └─ prefetch_related: teams, teamsingroup ``` ### Aggregation Patterns ```python from django.db.models import Count, Sum, Avg, Max, Min, F, Q # Count related objects seasons = Season.objects.annotate( team_count=Count('teams'), active_scenarios=Count('scenarios', filter=Q(scenarios__is_active=True)), ) # Subquery for complex aggregations from django.db.models import Subquery, OuterRef latest_run = OptimizationRun.objects.filter( scenario=OuterRef('pk') ).order_by('-created_at') scenarios = Scenario.objects.annotate( latest_score=Subquery(latest_run.values('score')[:1]), latest_run_date=Subquery(latest_run.values('created_at')[:1]), ) # Window functions (PostgreSQL) from django.db.models import Window from django.db.models.functions import Rank, RowNumber teams = Team.objects.annotate( season_rank=Window( expression=Rank(), partition_by=F('season'), order_by=F('points').desc(), ) ) ``` ### PostgreSQL-Specific Optimizations ```python # Array aggregation from django.contrib.postgres.aggregates import ArrayAgg, StringAgg seasons = Season.objects.annotate( team_names=ArrayAgg('teams__name', ordering='teams__name'), team_list=StringAgg('teams__name', delimiter=', '), ) # JSON aggregation from django.db.models.functions import JSONObject from django.contrib.postgres.aggregates import JSONBAgg scenarios = Scenario.objects.annotate( match_summary=JSONBAgg( JSONObject( id='matches__id', home='matches__home_team__name', away='matches__away_team__name', ), filter=Q(matches__is_final=True), ) ) # Full-text search from django.contrib.postgres.search import SearchVector, SearchQuery, SearchRank teams = Team.objects.annotate( search=SearchVector('name', 'city', 'stadium__name'), ).filter( search=SearchQuery('bayern munich') ) ``` ### Caching Query Results ```python from django.core.cache import cache def get_season_teams(season_id: int) -> list: """Get teams with caching.""" cache_key = f'season:{season_id}:teams' teams = cache.get(cache_key) if teams is None: teams = list( Team.objects.filter(season_id=season_id) .select_related('country') .values('id', 'name', 'country__name', 'country__code') ) cache.set(cache_key, teams, timeout=300) # 5 minutes return teams # Invalidate on changes def invalidate_season_cache(season_id: int): cache.delete(f'season:{season_id}:teams') cache.delete(f'season:{season_id}:scenarios') ``` ## Examples ### Example 1: Optimizing Schedule View ```python # BEFORE: ~500 queries for 300 matches def schedule_view(request, scenario_id): scenario = Scenario.objects.get(pk=scenario_id) matches = scenario.matches.all() # N+1 for each match's teams, day, etc. context = {'matches': matches} return render(request, 'schedule.html', context) # AFTER: 3 queries total def schedule_view(request, scenario_id): scenario = Scenario.objects.select_related( 'season', 'season__league', ).get(pk=scenario_id) matches = Match.objects.filter( scenario=scenario ).select_related( 'home_team', 'home_team__country', 'away_team', 'away_team__country', 'day', 'kick_off_time', 'stadium', ).order_by('day__number', 'kick_off_time__time') context = { 'scenario': scenario, 'matches': matches, } return render(request, 'schedule.html', context) ``` ### Example 2: Bulk Match Creation ```python # BEFORE: Slow - one INSERT per match def create_matches(scenario, match_data_list): for data in match_data_list: Match.objects.create(scenario=scenario, **data) # AFTER: Fast - bulk INSERT from django.db import transaction def create_matches(scenario, match_data_list): matches = [ Match( scenario=scenario, home_team_id=data['home_team_id'], away_team_id=data['away_team_id'], day_id=data.get('day_id'), kick_off_time_id=data.get('kick_off_time_id'), ) for data in match_data_list ] with transaction.atomic(): Match.objects.bulk_create(matches, batch_size=500) return len(matches) ``` ### Example 3: Complex Reporting Query ```python def get_season_report(season_id: int) -> dict: """Generate comprehensive season report with optimized queries.""" from django.db.models import Count, Avg, Sum, Q, F from django.db.models.functions import TruncDate # Single query for team statistics team_stats = Team.objects.filter( season_id=season_id ).annotate( home_matches=Count('home_matches'), away_matches=Count('away_matches'), total_distance=Sum('away_matches__distance'), ).values('id', 'name', 'home_matches', 'away_matches', 'total_distance') # Single query for scenario comparison scenarios = Scenario.objects.filter( season_id=season_id, is_active=True, ).annotate( match_count=Count('matches'), confirmed_pct=Count('matches', filter=Q(matches__is_confirmed=True)) * 100.0 / Count('matches'), avg_distance=Avg('matches__distance'), ).values('id', 'name', 'match_count', 'confirmed_pct', 'avg_distance') # Matches by day aggregation matches_by_day = Match.objects.filter( scenario__season_id=season_id, scenario__is_active=True, ).values( 'day__number' ).annotate( count=Count('id'), ).order_by('day__number') return { 'teams': list(team_stats), 'scenarios': list(scenarios), 'matches_by_day': list(matches_by_day), } ``` ## Common Pitfalls - **select_related on M2M**: Only use for FK/O2O; use prefetch_related for M2M - **Chained prefetch**: Remember prefetched data is cached; re-filtering creates new queries - **values() with related**: Use `values('related__field')` carefully; it can create JOINs - **Forgetting batch_size**: bulk_create/bulk_update without batch_size can cause memory issues - **Ignoring database indexes**: Ensure fields in WHERE/ORDER BY have proper indexes ## Verification Use Django Debug Toolbar or these queries to verify: ```python from django.db import connection, reset_queries from django.conf import settings settings.DEBUG = True reset_queries() # Run your code here result = my_function() # Check queries print(f"Total queries: {len(connection.queries)}") for q in connection.queries: print(f"{q['time']}s: {q['sql'][:100]}...") ``` Or with Silk profiler at `/silk/` when `USE_SILK=True`.