Quite a few NumPy routines release the GIL, so they can run efficiently in parallel across threads (info). Maybe you don't need to do anything special!
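As a rough illustration, here is a minimal sketch of running NumPy work from plain threads. It assumes the heavy calls inside `work` (a matrix product and a norm) release the GIL in your build; the function names `work` and `run_threaded`, the array size, and the worker count are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def work(seed):
    # Hypothetical workload: build a matrix and reduce it to one number.
    # The time is spent inside NumPy, not in Python bytecode.
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((200, 200))
    return float(np.linalg.norm(a @ a.T))


def run_threaded(seeds, max_workers=4):
    # Ordinary threads; no multiprocessing and no pickling of arrays.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(work, seeds))


seeds = list(range(8))
threaded = run_threaded(seeds)
serial = [work(s) for s in seeds]
```

Timing `run_threaded` against the serial loop on your own workload is the quickest way to see whether the routines you call actually overlap.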
You can use this question to find out whether the routines you need are among the ones that release the GIL. In short, search for ALLOW_THREADS or nogil in the source.
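If you would rather script that search than grep by hand, something like the following sketch would do. The helper name `find_gil_markers` and the list of file extensions are my own assumptions, and you need a source checkout to point it at, since an installed NumPy ships compiled binaries rather than the C and Cython sources.

```python
import os

# The two markers mentioned above.
MARKERS = ("ALLOW_THREADS", "nogil")


def find_gil_markers(root, extensions=(".c", ".h", ".pyx", ".pxd", ".src")):
    """Map each source file under `root` to the line numbers that mention a marker."""
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            # Ignore decoding errors; only plain-text matches matter here.
            with open(path, encoding="utf-8", errors="ignore") as fh:
                for lineno, line in enumerate(fh, start=1):
                    if any(marker in line for marker in MARKERS):
                        hits.setdefault(path, []).append(lineno)
    return hits
```

A hit only tells you the file releases the GIL somewhere, so you still have to read the surrounding code to confirm that the routine you care about is the one being covered.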
(Also note that MKL can use multiple threads within a single routine, so that's another easy way to get parallelism, although possibly not the fastest kind.)