Program speed of ZPL vs ZOS-API

  • 12 July 2023
  • 9 replies
  • 217 views

Userlevel 2

Hello,

 

During my efforts to speed up a simulation of a moderately complex NSC system, I noticed that if I write two identical ray-tracing programs, one in ZPL and one in Python via the ZOS-API, the ZPL version runs much faster.

 

As a test case, I wrote a program that moves two elements and then ray traces in a nested loop.

# of Analysis Rays: 1E4

# Cores: 3

ZOS-API Runtime: 220 seconds

ZPL Runtime: 70 seconds

 

My machine spec:

CPU: i9-13900KF

GPU: GeForce RTX 4080

RAM: 128GB

 

Have you noticed this behavior as well? What is the cause of this difference?


9 replies

Userlevel 3

Hi Oran,

In general I would expect the API to be a bit slower, as it adds a layer of communication that takes some processing time (but it also brings a lot of flexibility and other features). The exact overhead depends on the number of API calls and the amount of data that has to be transferred.

That being said, the factor of three that you observe surprises me, as in my experience the ray tracing itself is generally the most time-consuming step. Could you share the code you used for the comparison, so we can have a better look?

Userlevel 2


Hi,

Thanks for your reply; that significant difference in runtime is what prompted me to post this case.

Have a look at the snippets below:

Python ZOS-API

for ii, x0 in enumerate(X0):
    Object_0.X = x0
    for jj, x1 in enumerate(X1):
        Object_1.X = x1
        NSCRayTrace = TheSystem.Tools.OpenNSCRayTrace()
        NSCRayTrace.SplitNSCRays = True
        NSCRayTrace.UsePolarization = True
        NSCRayTrace.ScatterNSCRays = False
        NSCRayTrace.IgnoreErrors = True
        NSCRayTrace.NumberOfCores = 3
        NSCRayTrace.ClearDetectors(0)
        NSCRayTrace.RunAndWaitForCompletion()
        NSCRayTrace.Close()
        Y[ii, jj] = TheNCE.GetDetectorData(detector_num, pixel, data, 0)[1]

ZPL

FOR II, 1, NUM_STEPS_1, 1
    X_1(II) = X_I_1 + STEP_SIZE_1 * (II - 1)
    SETNSCPOSITION 1, OBJ_NUM_1, POSITION_CODE_1, X_1(II)
    FOR JJ, NUM_STEPS_2, 1, -1
        X_2(JJ) = X_F_2 - STEP_SIZE_2 * (JJ - 1)
        SETNSCPOSITION 1, OBJ_NUM_2, POSITION_CODE_2, X_2(JJ)
        clear = NSDD(1, 0, 0, 0)
        NSTR 1, 18, 1, 0, 1, 1, 0
        RESULT(II,JJ) = NSDD(1, OUTPUT_DET_NUM, 0, 0)
        PRINT RESULT(II,JJ), " ",
    NEXT
NEXT

 

Best regards,

Oran

Userlevel 7
Badge +3

I think it will need someone from Zemax to really get to the bottom of it, but I suspect that the ZOS-API case is not threading properly. You have three cores, and ZPL is 3x faster... that's where I'd look, anyway.

 

  • Mark
Userlevel 6
Badge +2

Hi Oran,

Two things that jump out immediately:

  • Python is notoriously slow with basic for loops. I would suggest either converting to compiled code (C#) or running your speed tests purely within the ZOS-API (i.e., remove the for loop and just look at tracing rays).
  • I would pull TheSystem.Tools.OpenNSCRayTrace() outside of the loops... there is no reason to reinitialize the same tool on every iteration of the for loops.
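The restructure suggested in the second bullet can be sketched as follows. This is a minimal illustration of the pattern (open and configure the tool once, re-run it inside the loops); the real ZOS-API calls from the snippet above are shown as comments, and MockTool is a stand-in so the loop structure can be exercised without an OpticStudio connection.

```python
class MockTool:
    """Stand-in for the NSC ray trace tool; counts opens and runs."""
    opens = 0

    def __init__(self):
        MockTool.opens += 1   # one increment per (re)initialization
        self.runs = 0

    def RunAndWaitForCompletion(self):
        self.runs += 1        # one increment per trace

    def Close(self):
        pass


X0 = [0.0, 0.5, 1.0]
X1 = [0.0, 1.0]

# tool = TheSystem.Tools.OpenNSCRayTrace()   # real ZOS-API call
tool = MockTool()                            # stand-in for this sketch
# Configure once, outside the loops:
# tool.SplitNSCRays = True
# tool.UsePolarization = True
# tool.NumberOfCores = 3

for x0 in X0:
    # Object_0.X = x0
    for x1 in X1:
        # Object_1.X = x1
        # tool.ClearDetectors(0)
        tool.RunAndWaitForCompletion()       # only the trace repeats

tool.Close()
print(MockTool.opens, tool.runs)             # 1 open, 6 runs
```

With the original structure the tool would be opened once per inner iteration (6 times here) instead of once overall.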
Userlevel 2


 

I forgot to include in the ZPL snippet the operand:

SYSP 901, 3

as 3 is the best number of cores for this use case.

 

 


Thanks for your suggestions. The results for a single ray trace (1E4 rays, 3 cores):

Python ZOS-API: 3.35 sec

ZPL: 0.87 sec

Regular UI: 1.18 sec

 

Since ZPL is seemingly faster than the UI, I suspect the NSTR operand operates somewhat differently. The Python ray trace is still significantly slower.

I took note of your comments about the FOR loops and initializations.
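For reference, single-call timings like the ones above can be taken with a small helper around time.perf_counter. This is a generic sketch; in the ZOS-API script it would wrap the trace call itself (e.g. NSCRayTrace.RunAndWaitForCompletion), while here a dummy workload stands in so the helper can be run anywhere.

```python
import time

def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call to fn."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# In the ZOS-API script:
# _, elapsed = time_call(NSCRayTrace.RunAndWaitForCompletion)
result, elapsed = time_call(sum, range(1000))   # dummy stand-in workload
print(f"{elapsed:.3f} s")
```

perf_counter is preferable to time.time for this, since it is monotonic and has the highest available resolution for short intervals.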

 

Thanks for your responses,

Oran

Userlevel 7
Badge +3

Can you see if the ZOS-API ray trace is using more than one core?

Userlevel 2


Yes, multiple threads are definitely operating. If I increase the # Cores parameter, even more of them run. I chose 3 because it is optimal for this simple task; I guess it's a low number because each trace is very fast.

Userlevel 3
Badge

Hi @Oran ,

Did you connect to OpticStudio in extension mode, or did you create a new standalone application? This has a significant impact on run times.

I did a few quick tests with the ZOS-API (with the connection managed by ZOSPy). I performed a single ray trace on 10 different sequential eye models, for 18 different input beams. For each model, the optical system was completely reinitialized. In standalone mode, these simulations took 10 seconds; in extension mode, they took 118 seconds. The only difference is that you see a lot of stuff happening in the GUI when connected as extension, which apparently takes most of the time.

This difference is even larger than the difference you observe. You probably don't reinitialize your model as often as I did, and I expect the non-sequential ray trace simulations to take longer than the sequential ones (which means less GUI actions overall).

@MichaelH I don't think the impact of for loops is significant here, since the calculations are offloaded to OpticStudio. It would be a different story if these were implemented in Python as well. Your point about reinitializing the NSC Ray Trace Tool is very useful; the results from my tests suggest that this is likely to reduce some of the overhead from the GUI (when connected as extension).

Userlevel 2


 

Hi,

The tests were done in the interactive extension mode, mostly because I want to be able to visually monitor the simulation once in a while, but I guess that once I finish debugging, working in standalone mode will be quicker.

 

Oran
