Hi Oran,
In general I would expect the API to be a bit slower, as it adds an additional layer of communication that takes some processing time (but it also brings a lot of flexibility and other features). The exact overhead depends on the number of API calls and the amount of data that has to be transferred.
That being said, the factor of three that you observe does surprise me, as in my experience the ray tracing itself is generally the most time-consuming step. Could you share the code you used for the comparison, so we can have a better look?
Hi,
Thanks for your reply; that significant difference in runtime is what made me post this case.
Have a look at the snippets below:
Python ZOS-API

for ii, x0 in enumerate(X0):
    Object_0.X = x0
    for jj, x1 in enumerate(X1):
        Object_1.X = x1
        # Open and configure the NSC ray trace tool for this iteration
        NSCRayTrace = TheSystem.Tools.OpenNSCRayTrace()
        NSCRayTrace.SplitNSCRays = True
        NSCRayTrace.UsePolarization = True
        NSCRayTrace.ScatterNSCRays = False
        NSCRayTrace.IgnoreErrors = True
        NSCRayTrace.NumberOfCores = 3
        NSCRayTrace.ClearDetectors(0)
        NSCRayTrace.RunAndWaitForCompletion()
        NSCRayTrace.Close()
        # Read the detector value for this (x0, x1) combination
        Y[ii, jj] = TheNCE.GetDetectorData(detector_num, pixel, data, 0)[1]
ZPL
FOR II, 1, NUM_STEPS_1, 1
    X_1(II) = X_I_1 + STEP_SIZE_1 * (II - 1)
    SETNSCPOSITION 1, OBJ_NUM_1, POSITION_CODE_1, X_1(II)
    FOR JJ, NUM_STEPS_2, 1, -1
        X_2(JJ) = X_F_2 - STEP_SIZE_2 * (JJ - 1)
        SETNSCPOSITION 1, OBJ_NUM_2, POSITION_CODE_2, X_2(JJ)
        clear = NSDD(1, 0, 0, 0)
        NSTR 1, 18, 1, 0, 1, 1, 0
        RESULT(II,JJ) = NSDD(1, OUTPUT_DET_NUM, 0, 0)
        PRINT RESULT(II,JJ), " ",
    NEXT
NEXT
Best regards,
Oran
I think it will need someone from Zemax to really get to the bottom of it, but I suspect that the ZOS-API case is not threading properly. You have three cores, and ZPL is 3x faster... that's where I'd look anyway.
Hi Oran,
Two things that jump out immediately:
- Python is notoriously slow with basic for loops. I would suggest either converting to compiled code (C#) or running your speed tests purely off of the ZOS-API (i.e., remove the for loop and just look at tracing rays).
- I would pull TheSystem.Tools.OpenNSCRayTrace() outside of the loops... there is no reason to reinitialize the same tool on every iteration of the for loops (see the sketch below).
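A minimal sketch of that restructuring follows. It assumes the open ray trace tool still allows the object positions to be updated and can be re-run without reopening; if OpticStudio locks the editors while a tool is open, the open/close calls would need to stay inside the loop. A timer is wrapped around the ray trace call itself so the time spent tracing can be separated from any loop overhead:

import time

# Open and configure the NSC ray trace tool once, outside the loops
NSCRayTrace = TheSystem.Tools.OpenNSCRayTrace()
NSCRayTrace.SplitNSCRays = True
NSCRayTrace.UsePolarization = True
NSCRayTrace.ScatterNSCRays = False
NSCRayTrace.IgnoreErrors = True
NSCRayTrace.NumberOfCores = 3

trace_time = 0.0
for ii, x0 in enumerate(X0):
    Object_0.X = x0
    for jj, x1 in enumerate(X1):
        Object_1.X = x1
        NSCRayTrace.ClearDetectors(0)                 # reset detectors between runs
        t0 = time.perf_counter()
        NSCRayTrace.RunAndWaitForCompletion()
        trace_time += time.perf_counter() - t0        # time spent in the ray trace only
        Y[ii, jj] = TheNCE.GetDetectorData(detector_num, pixel, data, 0)[1]

NSCRayTrace.Close()                                   # release the tool once, at the end
print("total ray trace time:", trace_time, "s")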
I forgot to include one operand in the ZPL snippet:
SYSP 901, 3
as 3 is the best number of cores for this use case.
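For reference, a sketch of where that line would sit, assuming it is placed once at the top of the script before the nested FOR loops (placement assumed here, not stated in the original post):

! Set the number of cores used by the subsequent NSTR ray traces
SYSP 901, 3
! ... nested FOR/NEXT loops from the ZPL snippet above follow here ...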
Thanks for your proposal. The results for a single ray trace run (1E4 rays, 3 cores):
Python ZOS-API: 3.35 sec
ZPL: 0.87 sec
Regular UI: 1.18 sec
Since ZPL is seemingly faster than the UI, I suspect the NSTR operand operates somewhat differently? The Python ray trace is still significantly slower.
I took note of your comments about the FOR loops and initializations.
Thanks for your responses,
Oran
Can you see if the ZOS-API ray trace is using more than one core?
Yes, multiple threads are definitely running. If I increase the #Cores parameter, even more of them are used. I chose 3 as it is optimal for this simple task; I guess it is a low number since my individual traces are very fast.
Hi @Oran ,
Did you connect to OpticStudio in extension mode, or did you create a new standalone application? This has a significant impact on run times.
I did a few quick tests with the ZOS-API (with the connection managed by ZOSPy). I performed a single ray trace on 10 different sequential eye models, for 18 different input beams. For each model, the optical system was completely reinitialized. In standalone mode, these simulations took 10 seconds; in extension mode, they took 118 seconds. The only difference is that you see a lot of stuff happening in the GUI when connected as an extension, which apparently takes most of the time.
This difference is even larger than the difference you observe. You probably don't reinitialize your model as often as I did, and I expect the non-sequential ray trace simulations to take longer than the sequential ones (which means less GUI actions overall).
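For reference, a minimal sketch of the two connection modes using ZOSPy (the exact call may differ between ZOSPy versions, so treat this as an illustration rather than the definitive API):

import zospy as zp

zos = zp.ZOS()

# Standalone mode: OpticStudio runs in the background without GUI updates,
# which is why it was much faster in the tests above.
oss = zos.connect(mode="standalone")

# Extension mode: connect to a running OpticStudio instance with
# "Interactive Extension" enabled; convenient for visual monitoring, but slower.
# oss = zos.connect(mode="extension")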
@MichaelH I don't think the impact of the for loops is significant here, since the calculations are offloaded to OpticStudio. It would be a different story if they were implemented in Python as well. Your point about reinitializing the NSC Ray Trace Tool is very useful; the results from my tests suggest that this removes some of the GUI overhead (when connected as an extension).
Hi,
The tests were done in interactive extension mode, mostly because I want to be able to visually monitor the simulation once in a while, but I guess that once I finish debugging, working in standalone mode will be quicker.
Oran