Encryption Projects as SU group

sdrfgs · Feb 21, 2024

30 series cards use compute capability 8.6
40 series cards use compute capability 8.9

The enhancements of compute capability 8.9 over compute capability 8.6 are as follows:

FP32 Operations: Devices of compute capability 8.9 have 2x more FP32 operations per cycle per SM than devices of compute capability 8.0 [1][2]; devices of compute capability 8.6 already have the same per-SM FP32 rate as 8.9. While a binary compiled for 8.0 will run as-is on 8.9, it is recommended to compile explicitly for 8.9 to benefit from the increased FP32 throughput [1][2].
Tensor Core Operations: The NVIDIA Ada GPU architecture includes new Ada Fourth Generation Tensor Cores featuring the Hopper FP8 Transformer Engine [2].
Memory System: Compute capability 8.9 has an increased L2 capacity [2].

Perhaps a new version should be compiled that supports only compute capability 8.9 and up, so that it is better optimized for the 40 series?
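For reference, nvcc chooses target architectures through `-gencode arch=compute_XY,code=sm_XY` pairs. The hypothetical helper below (`gencode_flags` is not part of CudaBISS or the CUDA toolkit) builds those arguments for a list of compute capabilities and, optionally, also embeds PTX for the newest one so that later GPUs can JIT-compile it. It is only a sketch; the actual build settings used for CudaBISS are not stated in this thread.

```python
from typing import List, Sequence, Tuple


def gencode_flags(archs: Sequence[Tuple[int, int]], embed_ptx: bool = True) -> List[str]:
    """Build nvcc -gencode arguments for the given compute capabilities.

    Hypothetical helper for illustration; archs are (major, minor) pairs,
    e.g. (8, 9) for RTX 40 series cards.
    """
    flags: List[str] = []
    for major, minor in sorted(set(archs)):
        sm = f"{major}{minor}"
        # Native machine code (SASS) for this exact architecture.
        flags += ["-gencode", f"arch=compute_{sm},code=sm_{sm}"]
    if embed_ptx and archs:
        major, minor = max(archs)
        sm = f"{major}{minor}"
        # PTX for the newest target, which the driver can JIT-compile on newer GPUs.
        flags += ["-gencode", f"arch=compute_{sm},code=compute_{sm}"]
    return flags


# An "8.9 and up" build versus an 8.0/8.6/8.9/9.0 build like the one described in the thread.
ada_only = gencode_flags([(8, 9)])
multi = gencode_flags([(8, 0), (8, 6), (8, 9), (9, 0)])
print(" ".join(ada_only))
print(" ".join(multi))
```

Note that an 8.9-only build like `ada_only` would not run on an 8.6 (RTX 30 series) card at all, because neither `sm_89` code nor `compute_89` PTX can target an older compute capability.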

Lak7 · Feb 21, 2024

Compiled to run on Compute Capability 8.0, 8.6, 8.9, 9.0.
When launched, it checks the compute capability and uses the appropriate code. That's why the size of the build changes: the fewer card architectures included, the smaller the app (in megabytes).
If I included all cards, the app would be around 14 MB.
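With a multi-architecture ("fat") binary, the CUDA driver performs this matching itself, following the documented rule that a cubin built for compute capability X.y runs only on devices X.z with z ≥ y, while PTX can be JIT-compiled for an equal or newer capability. Whether CudaBISS also does its own check is not stated. The sketch below models that rule in simplified form; `pick_code` is a hypothetical name, preferring the newest compatible cubin is an assumption about the driver, and architecture-specific targets such as `sm_90a` are ignored.

```python
from typing import Optional, Sequence, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor), e.g. (8, 9)


def pick_code(device: CC, cubins: Sequence[CC], ptx: Sequence[CC] = ()) -> Optional[str]:
    """Simplified model of matching a fat binary's contents to a GPU (hypothetical helper)."""
    # A cubin for X.y runs on device X.z only when the major versions match and z >= y.
    compatible = [cc for cc in cubins if cc[0] == device[0] and cc[1] <= device[1]]
    if compatible:
        major, minor = max(compatible)  # assumption: prefer the newest compatible cubin
        return f"sm_{major}{minor}"
    # Otherwise, PTX for an equal or older capability can be JIT-compiled by the driver.
    jit_ok = [cc for cc in ptx if cc <= device]
    if jit_ok:
        major, minor = max(jit_ok)
        return f"compute_{major}{minor} (JIT)"
    return None  # no usable code: the kernel launch would fail on this device


build = [(8, 0), (8, 6), (8, 9), (9, 0)]
print(pick_code((8, 9), build))  # RTX 40 series
print(pick_code((8, 6), build))  # RTX 30 series
```

Each extra entry in `build` adds another copy of the device code to the executable, which is consistent with the app growing as more card architectures are included.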
The only real difference between the two is that v12_3_VS22 limits the number of registers per thread to 167, while v12_3_64 is set to the maximum (255).
Apart from that, the difference is just the SM architecture change from 3xxx to 4xxx. The same happened from 1xxx to 2xxx.
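A register cap trades registers per thread against how many warps can be resident on an SM. The simplified calculation below assumes the usual figures for compute capability 8.6 and 8.9 (65,536 32-bit registers per SM, allocation in units of 256 registers per warp, at most 48 resident warps per SM) and ignores shared memory, block-size limits, and register spilling. With nvcc such a cap is typically set via `-maxrregcount`; which mechanism CudaBISS actually uses is not stated, and `register_limited_warps` is a hypothetical helper.

```python
import math


def register_limited_warps(regs_per_thread: int,
                           regs_per_sm: int = 65536,
                           alloc_unit: int = 256,
                           warp_size: int = 32,
                           max_warps_per_sm: int = 48) -> int:
    """Warps per SM allowed by register usage alone (simplified occupancy model).

    Defaults are assumed values for compute capability 8.6 / 8.9.
    """
    # Registers are reserved per warp, rounded up to the allocation unit.
    regs_per_warp = math.ceil(regs_per_thread * warp_size / alloc_unit) * alloc_unit
    return min(regs_per_sm // regs_per_warp, max_warps_per_sm)


for cap in (255, 167):
    warps = register_limited_warps(cap)
    print(f"{cap} regs/thread -> {warps} warps/SM ({warps / 48:.0%} of the 48-warp maximum)")
```

Under these assumptions, capping at 167 raises the register-limited ceiling from 8 to 12 warps per SM. If the kernel really needs more registers than the cap allows, though, the compiler has to spill to local memory, and that cost can outweigh the extra occupancy.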

Lak7 · Feb 21, 2024

Test using 4xxx cards ....

CudaBiss12_3_VS22_4xxx

www.mediafire.com

SHA256: 785AB8A8E810E7433D5203DB1BCDC8F5C92C737232A4F345A6325174B1076E00

moonbase · Feb 21, 2024

Lak7 said:
The only real difference between the two is that v12_3_VS22 limits the number of registers per thread to 167, while v12_3_64 is set to the maximum (255).
Apart from that, the difference is just the SM architecture change from 3xxx to 4xxx. The same happened from 1xxx to 2xxx.

The SM architecture change from 3xxx to 4xxx cards did not affect CudaBISS speed on an RTX 4090 card until v12_3_VS22.
With each new release up to v12_3_64, the RTX 4090 ran faster, peaking with v12_3_64.
Release v12_3_VS22 was the first time a speed reduction was observed compared with the preceding version.

Could the reduced number of registers in v12_3_VS22 be causing it to run slower with an RTX 4090 compared to v12_3_64?

moonbase · Feb 21, 2024

Lak7 said:
Test using 4xxx cards ....

CudaBiss12_3_VS22_4xxx

www.mediafire.com

Thank you for sharing the new version.
Is this version specifically for 4xxx series cards or will it run on 3xxx series cards and earlier series?

Lak7 · Feb 21, 2024

moonbase said:
Thank you for sharing the new version.
Is this version specifically for 4xxx series cards or will it run on 3xxx series cards and earlier series?

It will run, just a little slower on 3xxx cards.

Lak7 · Feb 21, 2024

moonbase said:
Could the reduced number of registers in v12_3_VS22 be causing it to run slower with an RTX 4090 compared to v12_3_64?

Yes, exactly. It's a setting I forgot about.

moonbase · Feb 21, 2024

Lak7 said:
Test using 4xxx cards ....

A quick test shows that with an RTX 4090 card the new CudaBISS version of v12_3_VS22 (4xxx) runs faster than v12_3_64 with the same card.
For an RTX 4090, v12_3_VS22 (4xxx) is the fastest version to date.

Lak7 · Feb 21, 2024

moonbase said:
A quick test shows that with an RTX 4090 card the new CudaBISS version of v12_3_VS22 (4xxx) runs faster than v12_3_64 with the same card.
For an RTX 4090, v12_3_VS22 (4xxx) is the fastest version to date.

About 10% or so?

moonbase · Feb 21, 2024

Lak7 said:
About 10% or so?

Less than 10%

sdrfgs · Feb 23, 2024

Also confirmed: on 3000 series cards, the 4xxx build is slower than the VS22 build.

spot · Feb 26, 2024

May I ask which CudaBISS build you use for a 3060 Ti card? If so, where could I find it? Thank you.

moonbase · Feb 26, 2024

spot said:
May I ask which CudaBISS build you use for a 3060 Ti card? If so, where could I find it? Thank you.

Reply 256 of this topic contains a link to the 3xxx card build.
Will you be reporting back to share your results?

spot · Feb 26, 2024

moonbase said:
Reply 256 of this topic contains a link to the 3xxx card build.
Will you be reporting back to share your results?

I saw that before, but the link doesn't work for me. Thanks, I found a file to use elsewhere. About 4.8B running 2 instances.

sdrfgs · Feb 27, 2024

spot said:
I saw that before, but the link doesn't work for me. Thanks, I found a file to use elsewhere. About 4.8B running 2 instances.

Please break down your result with more info:
system specs, etc.

sdrfgs · Feb 27, 2024

sdrfgs said:
Please break down your result with more info:
system specs, etc.

including the input file used

spot · Feb 29, 2024

I'm using an Intel i7-12700K / MSI Z690 PRO-A / 32 GB RAM / MSI 3060 Ti / Corsair 850 W PSU.
109999999999
C00000000000
474065926C7C1938C608F122FF3AE37E
4740659CAED1E977D78AB1AE1BCF68C4
4740659AF73EB8CA49C814B3B4599447
1
1

moonbase · Feb 29, 2024

spot said:
I'm using an Intel i7-12700K / MSI Z690 PRO-A / 32 GB RAM / MSI 3060 Ti / Corsair 850 W PSU.

That PC spec is probably as fast as it gets for CudaBISS. The GPU can run at its full PCIe 4.0 speed, since the board should provide PCIe 5.0 on PCIe x16 slot 1.
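A PCIe link trains to the lower of the card's and the slot's generation, so a 3060 Ti (a PCIe 4.0 card) in a PCIe 5.0 x16 slot runs at PCIe 4.0 x16. The back-of-the-envelope sketch below estimates the resulting one-direction bandwidth; it accounts only for 128b/130b line encoding and ignores packet and protocol overhead, and `link_bandwidth_gb_s` is a hypothetical helper, not a measured figure from this system.

```python
# Per-lane transfer rate in GT/s for PCIe generations that use 128b/130b encoding.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}


def link_bandwidth_gb_s(card_gen: int, slot_gen: int, lanes: int = 16) -> float:
    """Approximate one-direction PCIe bandwidth in GB/s (hypothetical helper).

    The link runs at the lower of the two generations; only line encoding
    overhead is included.
    """
    gen = min(card_gen, slot_gen)
    return GT_PER_S[gen] * lanes * (128 / 130) / 8


# RTX 3060 Ti (PCIe 4.0 card) in a PCIe 5.0 x16 slot.
print(f"{link_bandwidth_gb_s(card_gen=4, slot_gen=5):.1f} GB/s")
```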

sdrfgs · Feb 29, 2024

Thanks for the extra data. What speed is your RAM?

sdrfgs · Mar 7, 2024

Well, this thread has died again. Is there nothing to report?