I wanted to know if the multiprocessing library in Python copies the sys.modules dict.

from multiprocessing import Pool
import os
import sys

def f(_x):
    pid = os.getpid()
    print(pid, id(sys))

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(f, [1, 2, 3])
➜  ~ python multi.py
73924 4458196488
73925 4458196488
73926 4458196488

Looks like the location of the sys package is at the same location in memory for each process. I wonder if we have to grab an actual module in sys.modules to see a change in location ...

from multiprocessing import Pool
import os
import sys

def f(_x):
    pid = os.getpid()
    print(pid, id(sys.modules['os']))

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(f, [1, 2, 3])
➜  ~ python multi.py
74407 4366832248
74408 4366832248
74409 4366832248

Nope. No change. I wonder if there is some magic going on an all python processes of a certain flavor share memory for the core packages in sys.modules? Let's try importing something that is not automatically populated in sys.modules on startup ...

from multiprocessing import Pool
import os
import sys
import statistics

def f(_x):
    pid = os.getpid()
    print(pid, id(sys.modules['statistics']))

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(f, [1, 2, 3])
➜  ~ python multi.py
74550 4465539176
74551 4465539176
74552 4465539176

Nope. sys.modules['statistics'] shares the same location in memory too. Is that because we imported it in the first python process? If we were to import statistics in a "pooled" python process, would we get a different location in memory?

from multiprocessing import Pool
import os
import sys

def f(_x):
    import statistics
    pid = os.getpid()
    print(pid, id(sys.modules['statistics']))

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(f, [1, 2, 3])
➜  ~ python multi.py
74625 4526424936
74624 4526424776
74626 4526425096

Yes! Look at that! So a pool of Python processes share their "parent's" memory until something needs to be added to that process's memory. Then, I assume that the children ask for more personalized memory to contain new imports. Now I wonder, can I force a change in the location of memory of a package that one one of the pooled processes inherited?

from multiprocessing import Pool
import os
import sys
import statistics

def f(_x):
    pid = os.getpid()
    sys.modules['statistics'] = 12345678902345678
    print(pid, id(sys.modules['statistics']))

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(f, [1, 2, 3])
➜  ~ python multi.py
75027 4388645936
75028 4388645936
75029 4388645936

Well, that didn't do it. But maybe it did. Maybe the sys.modules['statistics'] was changed for each pooled process equally? Maybe if we used a different number for each process, we'd get different locations in memory for sys.modules['statistics']. Let's see ...

from multiprocessing import Pool
import os
import sys
import statistics

def f(x):
    pid = os.getpid()
    n = 1234 ** x
    sys.modules['statistics'] = n
    print(pid, id(sys.modules['statistics']))

if __name__ == '__main__':
    with Pool(5) as p:
        p.map(f, [1, 2, 3])
➜  ~ python multi.py
75507 4388120016
75508 4388120144
75509 4388120080

Yup. Looks like copy on write. https://en.wikipedia.org/wiki/Copy-on-write