mirror of
				git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
				synced 2025-09-04 20:19:47 +08:00 
			
		
		
		
	 3a7452c5a7
			
		
	
	
		3a7452c5a7
		
	
	
	
	
		
			
			Let's document the magic a bit, especially why device_hotplug_lock is required when adding/removing memory and how it all play together with requests to online/offline memory from user space. [ rppt: moved the text to Documentation/core-api/memory-hotplug.rst ] Link: http://lkml.kernel.org/r/20180925091457.28651-7-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Pavel Tatashin <pavel.tatashin@microsoft.com> Reviewed-by: Rashmica Gupta <rashmica.g@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michal Hocko <mhocko@suse.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: John Allen <jallen@linux.vnet.ibm.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Juergen Gross <jgross@suse.com> Cc: Kate Stewart <kstewart@linuxfoundation.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Len Brown <lenb@kernel.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Mathieu Malaterre <malat@debian.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michael Neuling <mikey@neuling.org> Cc: Nathan Fontenot <nfont@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Philippe Ombredanne <pombredanne@nexb.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net> Cc: Stephen Hemminger <sthemmin@microsoft.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: YASUAKI ISHIMATSU <yasu.isimatu@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net>
		
			
				
	
	
		
			126 lines
		
	
	
		
			4.7 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
			126 lines
		
	
	
		
			4.7 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
| .. _memory_hotplug:
 | |
| 
 | |
| ==============
 | |
| Memory hotplug
 | |
| ==============
 | |
| 
 | |
| Memory hotplug event notifier
 | |
| =============================
 | |
| 
 | |
| Hotplugging events are sent to a notification queue.
 | |
| 
 | |
| There are six types of notification defined in ``include/linux/memory.h``:
 | |
| 
 | |
| MEM_GOING_ONLINE
 | |
|   Generated before new memory becomes available in order to be able to
 | |
|   prepare subsystems to handle memory. The page allocator is still unable
 | |
|   to allocate from the new memory.
 | |
| 
 | |
| MEM_CANCEL_ONLINE
 | |
|   Generated if MEM_GOING_ONLINE fails.
 | |
| 
 | |
| MEM_ONLINE
 | |
|   Generated when memory has successfully brought online. The callback may
 | |
|   allocate pages from the new memory.
 | |
| 
 | |
| MEM_GOING_OFFLINE
 | |
|   Generated to begin the process of offlining memory. Allocations are no
 | |
|   longer possible from the memory but some of the memory to be offlined
 | |
|   is still in use. The callback can be used to free memory known to a
 | |
|   subsystem from the indicated memory block.
 | |
| 
 | |
| MEM_CANCEL_OFFLINE
 | |
|   Generated if MEM_GOING_OFFLINE fails. Memory is available again from
 | |
|   the memory block that we attempted to offline.
 | |
| 
 | |
| MEM_OFFLINE
 | |
|   Generated after offlining memory is complete.
 | |
| 
 | |
| A callback routine can be registered by calling::
 | |
| 
 | |
|   hotplug_memory_notifier(callback_func, priority)
 | |
| 
 | |
| Callback functions with higher values of priority are called before callback
 | |
| functions with lower values.
 | |
| 
 | |
| A callback function must have the following prototype::
 | |
| 
 | |
|   int callback_func(
 | |
|     struct notifier_block *self, unsigned long action, void *arg);
 | |
| 
 | |
| The first argument of the callback function (self) is a pointer to the block
 | |
| of the notifier chain that points to the callback function itself.
 | |
| The second argument (action) is one of the event types described above.
 | |
| The third argument (arg) passes a pointer of struct memory_notify::
 | |
| 
 | |
| 	struct memory_notify {
 | |
| 		unsigned long start_pfn;
 | |
| 		unsigned long nr_pages;
 | |
| 		int status_change_nid_normal;
 | |
| 		int status_change_nid_high;
 | |
| 		int status_change_nid;
 | |
| 	}
 | |
| 
 | |
| - start_pfn is start_pfn of online/offline memory.
 | |
| - nr_pages is # of pages of online/offline memory.
 | |
| - status_change_nid_normal is set node id when N_NORMAL_MEMORY of nodemask
 | |
|   is (will be) set/clear, if this is -1, then nodemask status is not changed.
 | |
| - status_change_nid_high is set node id when N_HIGH_MEMORY of nodemask
 | |
|   is (will be) set/clear, if this is -1, then nodemask status is not changed.
 | |
| - status_change_nid is set node id when N_MEMORY of nodemask is (will be)
 | |
|   set/clear. It means a new(memoryless) node gets new memory by online and a
 | |
|   node loses all memory. If this is -1, then nodemask status is not changed.
 | |
| 
 | |
|   If status_changed_nid* >= 0, callback should create/discard structures for the
 | |
|   node if necessary.
 | |
| 
 | |
| The callback routine shall return one of the values
 | |
| NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, NOTIFY_STOP
 | |
| defined in ``include/linux/notifier.h``
 | |
| 
 | |
| NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
 | |
| 
 | |
| NOTIFY_BAD is used as response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE,
 | |
| MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops
 | |
| further processing of the notification queue.
 | |
| 
 | |
| NOTIFY_STOP stops further processing of the notification queue.
 | |
| 
 | |
| Locking Internals
 | |
| =================
 | |
| 
 | |
| When adding/removing memory that uses memory block devices (i.e. ordinary RAM),
 | |
| the device_hotplug_lock should be held to:
 | |
| 
 | |
| - synchronize against online/offline requests (e.g. via sysfs). This way, memory
 | |
|   block devices can only be accessed (.online/.state attributes) by user
 | |
|   space once memory has been fully added. And when removing memory, we
 | |
|   know nobody is in critical sections.
 | |
| - synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
 | |
| 
 | |
| Especially, there is a possible lock inversion that is avoided using
 | |
| device_hotplug_lock when adding memory and user space tries to online that
 | |
| memory faster than expected:
 | |
| 
 | |
| - device_online() will first take the device_lock(), followed by
 | |
|   mem_hotplug_lock
 | |
| - add_memory_resource() will first take the mem_hotplug_lock, followed by
 | |
|   the device_lock() (while creating the devices, during bus_add_device()).
 | |
| 
 | |
| As the device is visible to user space before taking the device_lock(), this
 | |
| can result in a lock inversion.
 | |
| 
 | |
| onlining/offlining of memory should be done via device_online()/
 | |
| device_offline() - to make sure it is properly synchronized to actions
 | |
| via sysfs. Holding device_hotplug_lock is advised (to e.g. protect online_type)
 | |
| 
 | |
| When adding/removing/onlining/offlining memory or adding/removing
 | |
| heterogeneous/device memory, we should always hold the mem_hotplug_lock in
 | |
| write mode to serialise memory hotplug (e.g. access to global/zone
 | |
| variables).
 | |
| 
 | |
| In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read
 | |
| mode allows for a quite efficient get_online_mems/put_online_mems
 | |
| implementation, so code accessing memory can protect from that memory
 | |
| vanishing.
 |