
CPU usage is low during optimization


Question:

Why doesn't OpticStudio utilize 100% of my CPU during optimization to speed up the process?


Best answer by Zach Derocher 8 May 2019, 22:52



Answer:

This is reasonably common and usually not a problem. There are several possible reasons, and the exact answer depends largely on the machine hardware, the .ZMX file itself, the number of variables, and the number and complexity of Merit Function operands. The important thing to remember is that OpticStudio always tries to use your resources as efficiently as possible, and this may mean using fewer than 100% of the available cores.

Firstly, depending on the .ZMX file in question and the hardware, there may be memory limitations that would slow down or even prevent the optimization if all cores were used. There is overhead in multi-threading: for each core used in optimization, OpticStudio has to copy the optical system and store that copy in memory. If you have a memory-intensive system (some complex CAD objects, a high-density grid sag surface, etc.), then it might be slow to create each copy, or there may simply not be enough system memory for a copy on every core.
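To illustrate the memory point, here is a minimal sketch of how a per-copy memory cost could cap the number of useful threads. The function name, the numbers, and the assumption that every thread needs one full copy of the system are illustrative only; this is not OpticStudio's actual logic.

```python
def threads_that_fit_in_memory(available_gb: float, per_copy_gb: float, cores: int) -> int:
    """Rough cap on threads when each thread needs its own copy of the system.

    All inputs are hypothetical estimates supplied by the user.
    """
    if per_copy_gb <= 0 or available_gb < 0 or cores < 1:
        raise ValueError("expected positive copy size, non-negative memory, cores >= 1")
    copies_that_fit = int(available_gb // per_copy_gb)
    # Assume at least one thread always runs, and never more threads than cores.
    return max(1, min(cores, copies_that_fit))


# Hypothetical example: 16 cores, 24 GB free, about 2 GB per copy of the system.
print(threads_that_fit_in_memory(24.0, 2.0, 16))  # 12
```

With these made-up numbers, only 12 of the 16 cores could hold a copy, so about a quarter of the CPU would sit idle even though nothing is wrong.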

Secondly, it's important to consider how the optimizer is threaded. In optimization, OpticStudio will only use as many cores as you have variables assigned. So, if you have a 16-core machine but only 3 optimization variables, OpticStudio will use at most 3 cores for the optimization.
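The rule above can be written as a one-line bound. This is only a sketch of the rule as stated in this answer, with a hypothetical function name; the real thread-selection algorithm also weighs memory and Merit Function complexity.

```python
def optimizer_core_cap(cores: int, num_variables: int) -> int:
    """Upper bound on cores used: no more cores than optimization variables."""
    if cores < 1 or num_variables < 0:
        raise ValueError("cores must be >= 1 and num_variables >= 0")
    return min(cores, num_variables)


# The example from the answer: 16 cores, 3 variables.
print(optimizer_core_cap(16, 3))  # 3
```

On that 16-core machine, 3 busy cores show up as under 20% total CPU usage, which is expected behavior for this file.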

Finally, the complexity of the Merit Function is important. As I mentioned before, there is overhead in launching threads. If the Merit Function is simple and easy to compute (e.g. a Gaussian Quadrature Wavefront MF might be a few dozen or a few hundred ray traces), then it might be more efficient to simply run the optimization on a single core or just a couple of cores.

Over the past few years, there have been a couple of releases in which there were improvements to the algorithm to decide how many threads to launch for optimization. For this reason, you might observe differences in CPU usage for the same file (and MF, variables) between versions. However, it should be the case that optimization in the newer versions is more efficient and runs more quickly than older iterations. 

Not sure where to put this information, but this seems to fit with the discussion.

I have an AMD Ryzen Threadripper 3990X 64-core processor with 128 threads. I opened two instances of a file and started to hammer optimize on both at the same time, thinking I would get around the 64-thread limitation of OpticStudio. I noticed that I was only getting up to 67% processor utilization. When I looked at the NUMA nodes, I saw that both instances of OpticStudio were being run on the same NUMA node.

After some fiddling around, I found that if I run OpticStudio as an administrator then I can change the affinity of the two instances (in Task Manager → Details), with one assigned to node 0 and the other to node 1, all cores. I believe you have to start hammer optimizing one of the instances first and then change the affinity of the second one before starting to hammer optimize it as well. Unfortunately, once you start hammer optimizing you no longer have access to change the affinity, so don’t stop the hammer optimizing until you’re done! If you do stop, then when you start hammer optimizing again, that instance of OpticStudio will be reassigned to the same NUMA node as the other instance.

Is there a way Zemax could fix this?

Anyway, for deep optimization this isn’t too much of a pain. Currently running at 94% utilization for the CPU, 64 threads on NUMA node 0 and 42 threads on NUMA node 1. Easily could have done 64 and 64, but wanted to leave some threads open for other tasks.

 

Update on utilizing 100% of a 64-core/128-thread CPU during Hammer Optimization:

The exact procedure to get this to work every time is as follows.

  1. Open Task Manager on the Details tab, sort by Name.
  2. Run two instances of Zemax as an administrator for the same design (recommend renaming one of the files something else).
  3. Start hammer optimization on both instances for 64 threads in rapid succession.
  4. In Task Manager on the Details tab, right click on the second instance of OpticStudio and click Set Affinity (not sure if it matters if you click the first or second instance first?).
  5. Set Affinity to NUMA Node 1 and click all cores.
  6. Go to the Task Manager Performance tab.
  7. Right click and “Change graph to” NUMA nodes. You should see both NUMA nodes being fully utilized.
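For anyone curious what the Set Affinity dialog in step 5 is doing: Windows stores a process's affinity within a processor group as a bitmask, one bit per logical processor, with at most 64 logical processors per group. Below is a small sketch of that arithmetic. The function name is made up, and it is an assumption that each NUMA node shown in Task Manager on this machine corresponds to one 64-processor group. It only computes the mask and does not change any process.

```python
def affinity_mask(num_logical_processors: int) -> int:
    """Bitmask with the lowest `num_logical_processors` bits set.

    A Windows processor group holds at most 64 logical processors,
    so the count is limited to 1..64 here.
    """
    if not 1 <= num_logical_processors <= 64:
        raise ValueError("a processor group holds between 1 and 64 logical processors")
    return (1 << num_logical_processors) - 1


# "All cores" on a 64-thread node:
print(hex(affinity_mask(64)))  # 0xffffffffffffffff
# Leaving some threads free, e.g. 42 threads as in the earlier reply:
print(hex(affinity_mask(42)))  # 0x3ffffffffff
```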

Notes:

  • As soon as both instances of OpticStudio are loaded you have roughly one minute before you won’t be able to Set Affinity on either instance. Don’t know why, but it’s something I observed.
  • If you would like to more actively optimize with one of the instances you will need to identify which instance is on NUMA node 0, preferably before you start hammer optimizing (this should be the second instance you opened).
    • Feel free to start and stop optimizing on the instance assigned to NUMA node 0 as much as you like. As soon as you stop hammer optimizing on the instance assigned to NUMA node 1 you will need to close out of both instances of OpticStudio and redo all the steps above to utilize 100% of the CPU again.
  • I have only done this with a Merit Function that takes several seconds for all 64 threads to start optimizing. On simpler merit functions this procedure may not be possible and/or beneficial.
  • It may take a bit of trial and error to set affinity for the correct instance the first time. But the benefits are significant. I improved my CPU utilization from 67% (overclocking first 64 threads) to 100%. Deep optimizations that take 12+ hours are now searching through 50% more designs.
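The "50% more designs" figure follows from the utilization numbers, assuming (as a simplification) that the number of designs searched scales linearly with CPU utilization. A quick check, with a hypothetical helper name:

```python
def relative_throughput_gain(before_pct: float, after_pct: float) -> float:
    """Fractional increase in work done, assuming work scales with utilization."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    return after_pct / before_pct - 1.0


gain = relative_throughput_gain(67, 100)
print(f"{gain:.0%}")  # 49%
```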

Best of luck!

 
