Start_Kernel 2 (Week 1)

Category
⚙️ Start Kernel
Author
박용성
Date
September 5, 2024
Tags
C
Linux
Slug
start-kernel2-1
floatFirstTOC: right


/*
 * This is the main entry point to direct page reclaim.
 *
 * If a full scan of the inactive list fails to free enough memory then we
 * are "out of memory" and something needs to be killed.
 *
 * If the caller is !__GFP_FS then the probability of a failure is reasonably
 * high - the zone may be full of dirty or under-writeback pages, which this
 * caller can't do much about. We kick the writeback threads and take explicit
 * naps in the hope that some of these pages can be written. But if the
 * allocating task holds filesystem locks which prevent writeout this might not
 * work, and the allocation attempt will fail.
 *
 * returns:	0, if no pages reclaimed
 * 		else, the number of pages reclaimed
 */
static unsigned long do_try_to_free_pages(struct zonelist *zonelist,
					  struct scan_control *sc)
{
	int initial_priority = sc->priority;
	pg_data_t *last_pgdat;
	struct zoneref *z;
	struct zone *zone;
retry:
	delayacct_freepages_start();

	if (!cgroup_reclaim(sc))
		__count_zid_vm_events(ALLOCSTALL, sc->reclaim_idx, 1);

	do {
		if (!sc->proactive)
			vmpressure_prio(sc->gfp_mask, sc->target_mem_cgroup,
					sc->priority);
		sc->nr_scanned = 0;
		shrink_zones(zonelist, sc);

		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;

		if (sc->compaction_ready)
			break;

		/*
		 * If we're getting trouble reclaiming, start doing
		 * writepage even in laptop mode.
		 */
		if (sc->priority < DEF_PRIORITY - 2)
			sc->may_writepage = 1;
	} while (--sc->priority >= 0);

	last_pgdat = NULL;
	for_each_zone_zonelist_nodemask(zone, z, zonelist, sc->reclaim_idx,
					sc->nodemask) {
		if (zone->zone_pgdat == last_pgdat)
			continue;
		last_pgdat = zone->zone_pgdat;

		snapshot_refaults(sc->target_mem_cgroup, zone->zone_pgdat);

		if (cgroup_reclaim(sc)) {
			struct lruvec *lruvec;

			lruvec = mem_cgroup_lruvec(sc->target_mem_cgroup,
						   zone->zone_pgdat);
			clear_bit(LRUVEC_CGROUP_CONGESTED, &lruvec->flags);
		}
	}

	delayacct_freepages_end();

	if (sc->nr_reclaimed)
		return sc->nr_reclaimed;

	/* Aborted reclaim to try compaction? don't OOM, then */
	if (sc->compaction_ready)
		return 1;

	/*
	 * We make inactive:active ratio decisions based on the node's
	 * composition of memory, but a restrictive reclaim_idx or a
	 * memory.low cgroup setting can exempt large amounts of
	 * memory from reclaim. Neither of which are very common, so
	 * instead of doing costly eligibility calculations of the
	 * entire cgroup subtree up front, we assume the estimates are
	 * good, and retry with forcible deactivation if that fails.
	 */
	if (sc->skipped_deactivate) {
		sc->priority = initial_priority;
		sc->force_deactivate = 1;
		sc->skipped_deactivate = 0;
		goto retry;
	}

	/* Untapped cgroup reserves?  Don't OOM, retry. */
	if (sc->memcg_low_skipped) {
		sc->priority = initial_priority;
		sc->force_deactivate = 0;
		sc->memcg_low_reclaim = 1;
		sc->memcg_low_skipped = 0;
		goto retry;
	}

	return 0;
}
 

 
__GFP_FS • Requests that filesystem calls be allowed while performing the memory allocation.
This is the main entry point to direct page reclaim. If a full scan of the inactive list fails to free enough memory then we are "out of memory" and something needs to be killed.
When free memory runs short during an allocation, then depending on the allocation flags, the thread that requested the memory performs direct reclaim: it reclaims page frames itself. The comment describes this function as the main entry point for direct page reclaim.
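To make the __GFP_FS distinction concrete, here is a minimal kernel-style sketch (not taken from vmscan.c; GFP_KERNEL and GFP_NOFS are real flags, but the helper name and sizes are invented for illustration):

#include <linux/slab.h>
#include <linux/gfp.h>

/* Illustrative only: shows which callers the "!__GFP_FS" comment refers to. */
static void *alloc_two_ways(void)
{
	/* GFP_KERNEL includes __GFP_FS: if memory is tight, direct reclaim
	 * started from this allocation may call back into filesystem code,
	 * e.g. to write dirty page-cache pages. */
	void *ok_to_touch_fs = kmalloc(4096, GFP_KERNEL);

	/* GFP_NOFS clears __GFP_FS: used on paths that already hold
	 * filesystem locks, so reclaim must not re-enter the filesystem.
	 * This is the "!__GFP_FS" caller the function comment warns about. */
	void *no_fs_allowed = kmalloc(4096, GFP_NOFS);

	kfree(no_fs_allowed);
	return ok_to_touch_fs;
}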

Parameters

@zonelist — the list of zones to scan for reclaimable pages
@sc — scan control: the information the reclaim pass needs (how many pages to reclaim, which pages may be scanned, and so on)

Return value

0 if no pages were reclaimed; otherwise, the number of pages reclaimed.

Call stack

...()
-> __perform_reclaim()
-> try_to_free_pages()
-> throttle_direct_reclaim()
-> do_try_to_free_pages()
-> shrink_zones()
 
Line 6201: the loop. It keeps trying to reclaim pages while lowering the priority down to 0; the actual reclaim work is done in shrink_zones().
	do {
		if (!sc->proactive)
			vmpressure_prio(sc->gfp_mask, sc->target_mem_cgroup,
					sc->priority);
		sc->nr_scanned = 0;
		shrink_zones(zonelist, sc);

		if (sc->nr_reclaimed >= sc->nr_to_reclaim)
			break;

		if (sc->compaction_ready)
			break;

		/*
		 * If we're getting trouble reclaiming, start doing
		 * writepage even in laptop mode.
		 */
		if (sc->priority < DEF_PRIORITY - 2)
			sc->may_writepage = 1;
	} while (--sc->priority >= 0);
Line 6202

	if (!sc->proactive)
		vmpressure_prio(sc->gfp_mask, sc->target_mem_cgroup,
				sc->priority);

If this is not proactive reclaim (sc->proactive is set when reclaim is requested explicitly, e.g. via the cgroup memory.reclaim interface, rather than forced by memory pressure), call vmpressure_prio(). Its comment describes it as the function to call whenever the reclaim priority changes.
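As a rough sketch of what vmpressure tracks (the formula and the 60/95 level thresholds below are my recollection of mm/vmpressure.c, so treat them as approximate): pressure grows as the reclaimed-to-scanned ratio drops, and crossing a threshold is what gets reported to memory-pressure listeners.

#include <stdio.h>

/* Userspace sketch, not kernel code: approximate vmpressure level calculation. */
static const char *pressure_level(unsigned long scanned, unsigned long reclaimed)
{
	unsigned long pressure;

	if (!scanned)
		return "low";
	pressure = 100 - (100 * reclaimed / scanned);

	if (pressure >= 95)		/* assumed critical threshold */
		return "critical";
	if (pressure >= 60)		/* assumed medium threshold */
		return "medium";
	return "low";
}

int main(void)
{
	printf("%s\n", pressure_level(1000, 900)); /* reclaim is easy    -> low */
	printf("%s\n", pressure_level(1000, 200)); /* reclaim struggling -> medium */
	printf("%s\n", pressure_level(1000, 10));  /* nearly stuck       -> critical */
	return 0;
}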
 
Line 6205

	sc->nr_scanned = 0;
	shrink_zones(zonelist, sc);

scan_control::nr_scanned is documented as "Incremented by the number of inactive pages that were scanned", so it is reset to 0 here before shrink_zones() (and shrink_node() beneath it) accumulates the number of inactive pages scanned. [Personal note: since it gets reset every iteration anyway, I wonder whether the reset could simply live inside shrink_zones(); this is its only caller, so that should be safe.] shrink_zones() itself is covered after we finish with the while loop.
 
Line 6208

	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
		break;

	if (sc->compaction_ready)
		break;

sc->nr_reclaimed >= sc->nr_to_reclaim: we have reclaimed as many inactive pages as we wanted (target met).
sc->compaction_ready: the pages we need can be obtained through memory compaction instead (target reachable that way), so reclaim can stop here.
 
Line 6214

	/*
	 * If we're getting trouble reclaiming, start doing
	 * writepage even in laptop mode.
	 */
	if (sc->priority < DEF_PRIORITY - 2)
		sc->may_writepage = 1;

I'm not entirely sure about this one. Going by the comment: Linux caches disk contents in memory (the page cache), and in laptop mode the kernel normally avoids writing dirty pages back to disk (so the disk can stay spun down) even when memory is tight. So if we have already lowered the priority by more than two steps and still cannot free enough memory, reclaim is allowed to start writing pages out (sc->may_writepage = 1) even in laptop mode. (My reading of the comment.)
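To see how the decreasing priority widens the scan window, here is a minimal userspace sketch (all numbers invented; the real scan target is computed per LRU list inside shrink_node(), but it follows the same lru_size >> priority idea):

#include <stdio.h>

#define DEF_PRIORITY 12	/* kernel default: first pass looks at ~1/4096 of the LRU */

int main(void)
{
	unsigned long lru_size = 1UL << 20;	/* pretend 1M pages on the LRU */
	unsigned long nr_to_reclaim = 32;
	unsigned long nr_reclaimed = 0;
	int priority = DEF_PRIORITY;

	do {
		/* how many pages this pass is willing to look at */
		unsigned long scan = lru_size >> priority;

		/* stand-in for shrink_zones(): pretend ~1% of scanned pages get freed */
		nr_reclaimed += scan / 100;
		printf("priority=%2d scan=%8lu reclaimed=%lu\n",
		       priority, scan, nr_reclaimed);

		if (nr_reclaimed >= nr_to_reclaim)
			break;	/* target met, just like the kernel loop */
	} while (--priority >= 0);

	return 0;
}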
 

shrink_zones()

Line 6075

	/*
	 * If the number of buffer_heads in the machine exceeds the maximum
	 * allowed level, force direct reclaim to scan the highmem zone as
	 * highmem pages could be pinning lowmem pages storing buffer_heads
	 */
	orig_mask = sc->gfp_mask;
	if (buffer_heads_over_limit) {
		sc->gfp_mask |= __GFP_HIGHMEM;
		sc->reclaim_idx = gfp_zone(sc->gfp_mask);
	}
 
Line 6086

	for_each_zone_zonelist_nodemask(zone, z, zonelist, sc->reclaim_idx,
					sc->nodemask) {
		...
	}

As the name suggests, this for-each macro iterates over the zones of the zonelist in order, limited by the highest allowed zone index (sc->reclaim_idx) and the nodemask.
mmzone.h
/**
 * for_each_zone_zonelist_nodemask - helper macro to iterate over valid zones in a zonelist at or below a given zone index and within a nodemask
 * @zone: The current zone in the iterator
 * @z: The current pointer within zonelist->_zonerefs being iterated
 * @zlist: The zonelist being iterated
 * @highidx: The zone index of the highest zone to return
 * @nodemask: Nodemask allowed by the allocator
 *
 * This iterator iterates though all zones at or below a given zone index and
 * within a given nodemask
 */
#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \
	for (z = first_zones_zonelist(zlist, highidx, nodemask),	\
		zone = zonelist_zone(z);				\
		zone;							\
		z = next_zones_zonelist(++z, highidx, nodemask),	\
			zone = zonelist_zone(z))

#define for_next_zone_zonelist_nodemask(zone, z, highidx, nodemask) \
	for (zone = z->zone;	\
		zone;		\
		z = next_zones_zonelist(++z, highidx, nodemask),	\
			zone = zonelist_zone(z))
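If the macro's shape is hard to parse, here is a self-contained userspace analog of the same pattern (the zone names, indexes, and helper next_allowed() are all invented for illustration; only the "find first / advance to next / stop at NULL" structure mirrors the kernel macro):

#include <stdio.h>
#include <stddef.h>

struct zone { const char *name; int idx; };

/* return the next entry whose zone index is allowed, or NULL at the end */
static struct zone *next_allowed(struct zone *z, int highidx)
{
	while (z->name && z->idx > highidx)
		z++;
	return z->name ? z : NULL;
}

#define for_each_allowed_zone(zone, z, zlist, highidx)		\
	for (z = next_allowed(zlist, highidx), zone = z;	\
	     zone;						\
	     z = next_allowed(z + 1, highidx), zone = z)

int main(void)
{
	/* NULL-name entry terminates the list, like a sentinel zoneref */
	struct zone zones[] = {
		{ "ZONE_MOVABLE", 3 },
		{ "ZONE_NORMAL",  2 },
		{ "ZONE_DMA32",   1 },
		{ "ZONE_DMA",     0 },
		{ NULL, 0 },
	};
	struct zone *zone, *z;

	/* highidx = 1: only zones at or below index 1 are eligible */
	for_each_allowed_zone(zone, z, zones, 1)
		printf("visiting %s\n", zone->name);
	return 0;
}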
 
Back to shrink_zones()…
Line 6088 (inside the for-each)

		/*
		 * Take care memory controller reclaiming has small influence
		 * to global LRU.
		 */
		if (!cgroup_reclaim(sc)) {

cgroup_reclaim() returns false when memory-cgroup-targeted reclaim is not involved: either the Memory Resource Controller (CONFIG_MEMCG) is not built in at all, or this is global reclaim rather than reclaim aimed at one particular cgroup. In either case we enter the if block.
	/*
	 * The memory cgroup that hit its limit and as a result is the
	 * primary target of this reclaim invocation.
	 */
	struct mem_cgroup *target_mem_cgroup;

(The above is scan_control::target_mem_cgroup.)
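For reference, cgroup_reclaim() itself is tiny. Paraphrased from memory of a v6.x mm/vmscan.c (so treat the exact shape as approximate): with CONFIG_MEMCG it just reports whether this scan targets a specific memcg; without CONFIG_MEMCG it is hard-wired to false.

#ifdef CONFIG_MEMCG
/* true when this reclaim pass targets one memory cgroup */
static bool cgroup_reclaim(struct scan_control *sc)
{
	return sc->target_mem_cgroup;
}
#else
static bool cgroup_reclaim(struct scan_control *sc)
{
	return false;
}
#endif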
Line 6093

		if (!cpuset_zone_allowed(zone,
					 GFP_KERNEL | __GFP_HARDWALL))
			continue;

Skip the zone if the cpuset does not allow it. A cpuset is the mechanism that assigns CPUs and memory nodes to each task.

Line 6097
		/*
		 * If we already have plenty of memory free for
		 * compaction in this zone, don't free any more.
		 * Even though compaction is invoked for any
		 * non-zero order, only frequent costly order
		 * reclamation is disruptive enough to become a
		 * noticeable problem, like transparent huge
		 * page allocations.
		 */
		if (IS_ENABLED(CONFIG_COMPACTION) &&
		    sc->order > PAGE_ALLOC_COSTLY_ORDER &&
		    compaction_ready(zone, sc)) {
			sc->compaction_ready = true;
			continue;
		}

If CONFIG_COMPACTION is enabled, sc->order > PAGE_ALLOC_COSTLY_ORDER, and compaction_ready() reports that this zone already has enough free memory for compaction to succeed, then mark sc->compaction_ready and skip the zone: rather than freeing more pages here, the request can be satisfied by compacting it.
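For scale: PAGE_ALLOC_COSTLY_ORDER is 3, so this branch only matters for requests of more than 8 contiguous pages (with 4 KiB pages, anything above 32 KiB, e.g. transparent huge page allocations). A quick userspace illustration of what the orders mean:

#include <stdio.h>

#define PAGE_ALLOC_COSTLY_ORDER 3	/* kernel constant from mmzone.h */
#define PAGE_SIZE 4096UL		/* assuming 4 KiB pages */

int main(void)
{
	/* an allocation "order" is a power-of-two number of contiguous pages */
	for (int order = 0; order <= 5; order++)
		printf("order %d = %2d pages = %6lu KiB%s\n",
		       order, 1 << order, (PAGE_SIZE << order) / 1024,
		       order > PAGE_ALLOC_COSTLY_ORDER ? "  (costly)" : "");
	return 0;
}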
Line 6113

		/*
		 * Shrink each node in the zonelist once. If the
		 * zonelist is ordered by zone (not the default) then a
		 * node may be shrunk multiple times but in that case
		 * the user prefers lower zones being preserved.
		 */
		if (zone->zone_pgdat == last_pgdat)
			continue;

Skip the zone if its node (zone_pgdat) was already reclaimed in an earlier iteration of this loop; each node should only be shrunk once per pass.

Line 6122
		/*
		 * This steals pages from memory cgroups over softlimit
		 * and returns the number of reclaimed pages and
		 * scanned pages. This works for global memory pressure
		 * and balancing, not for a memcg's limit.
		 */
		nr_soft_scanned = 0;
		nr_soft_reclaimed = mem_cgroup_soft_limit_reclaim(zone->zone_pgdat,
						sc->order, sc->gfp_mask,
						&nr_soft_scanned);
		sc->nr_reclaimed += nr_soft_reclaimed;
		sc->nr_scanned += nr_soft_scanned;
		/* need some check for avoid more shrink_zone() */

This appears to steal pages from memory cgroups that are over their soft limit; the reclaimed and scanned counts are added to sc.

Line 6147
		if (!first_pgdat)
			first_pgdat = zone->zone_pgdat;

		/* See comment about same check for global reclaim above */
		if (zone->zone_pgdat == last_pgdat)
			continue;
		last_pgdat = zone->zone_pgdat;
		shrink_node(zone->zone_pgdat, sc);

The if statement ("See comment about same check for global reclaim above") exists for the same reason as the check above. Then shrink_node() actually performs the reclaim on the node this zone belongs to.
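Since several zones usually belong to the same NUMA node, the last_pgdat check is just per-parent deduplication: the zonelist iterates zone by zone, but shrink_node() must run only once per node. A small userspace sketch of the same pattern (node and zone names invented):

#include <stdio.h>
#include <stddef.h>

struct node { const char *name; };
struct zone { const char *name; struct node *node; };

int main(void)
{
	struct node n0 = { "node0" }, n1 = { "node1" };
	struct zone zones[] = {
		{ "node0/ZONE_DMA32",  &n0 },
		{ "node0/ZONE_NORMAL", &n0 },	/* same node as the zone before it */
		{ "node1/ZONE_NORMAL", &n1 },
	};
	struct node *last = NULL;

	for (size_t i = 0; i < sizeof(zones) / sizeof(zones[0]); i++) {
		if (zones[i].node == last)
			continue;		/* this node was already shrunk */
		last = zones[i].node;
		printf("shrink_node(%s)\n", zones[i].node->name);
	}
	return 0;
}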
Line 6147

	if (first_pgdat)
		consider_reclaim_throttle(first_pgdat, sc);

	/*
	 * Restore to original mask to avoid the impact on the caller if we
	 * promoted it to __GFP_HIGHMEM.
	 */
	sc->gfp_mask = orig_mask;

I didn't get to this part! References:
memory compact (compaction) https://woodz.tistory.com/42
 
