GPGPU using OpenCL

Me2019H

Registered
Messages
101


My changes

#Selection of Device
DETECTDEVICEENABLE:1
SELECTEDDEVICE:10
LOOPSPERTHREAD:1024 16384 #128 1024 2048 4096 8192 16384 32768 65536
LOCALTHREADS:256 64 0 256
GLOBALTHREADS:1536 1536

Reading from OCLBiss.cfg...Done...


Inspecting System for OpenCL devices...Platforms found: 2
00 Intel(R) Corporation Intel(R) HD Graphics 4400 GPU
01 Intel(R) Corporation Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz CPU
10 NVIDIA Corporation GeForce 820M GPU


Today is Fri Sep 15 16:15:10 2023
Connected to device:NVIDIA Corporation => GeForce 820M

Device Kernel properties:
number of cores: 2
recommended work group size (local threads): 32
max work group size: 512


Number Loops per thread: 1024
Number of keys per thread(x32/4096): 1024
Local threads: 256
Global threads: 1536
Keys per kernel: 0x180000(1572864)

*****Important: To stop program and save current status*******
PRESS a LONG [ESC] Key and then say 'Y'

Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
Please ensure it is off before continuing

File Log from Last Searched present: 11C16F204790101
Do you what to continue from here?
If you type 'N' or 'n' will start from config.ini value: 0000385F624F0000
Press Enter to Continue (Y)y

base = 385F904F0000
Looking for 0x385F62F94FCABCD5

BruteForcing for:
SB01: 3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05 3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000000180000

Loop From To kernel Time Keys per seconds #Keys Found
00000001 0000385F904F0000 0000385F9066FFFF [...] 16:15:16 kps:2.06e+006 00000000 Total 0:
00000002 0000385F90670000 0000385F907EFFFF [...] 16:15:17 kps:2.35e+006 00000000 Total 0:
00000003 0000385F907F0000 0000385F9096FFFF [...] 16:15:17 kps:2.62e+006 00000000 Total 0:
00000004 0000385F90970000 0000385F90AEFFFF [...] 16:15:18 kps:2.68e+006 00000000 Total 0:
00000005 0000385F90AF0000 0000385F90C6FFFF [...] 16:15:19 kps:2.67e+006 00000000 Total 0:
00000006 0000385F90C70000 0000385F90DEFFFF [...] 16:15:19 kps:2.68e+006 00000000 Total 0:
00000007 0000385F90DF0000 0000385F90F6FFFF [...] 16:15:20 kps:2.67e+006 00000000 Total 0:
00000008 0000385F90F70000 0000385F910EFFFF [...] 16:15:20 kps:2.68e+006 00000000 Total 0:
00000009 0000385F910F0000 0000385F9126FFFF [...] 16:15:21 kps:2.67e+006 00000000 Total 0:
0000000A 0000385F91270000 0000385F913EFFFF [...] 16:15:22 kps:2.68e+006 00000000 Total 0:
0000000B 0000385F913F0000 0000385F9156FFFF [...] 16:15:22 kps:2.68e+006 00000000 Total 0:
0000000C 0000385F91570000 0000385F916EFFFF [...] 16:15:23 kps:2.68e+006 00000001 Key 001:38 5F 91 28 64 D7 7A B5

0000000D 0000385F916F0000 0000385F9186FFFF [...] 16:15:23 kps:2.68e+006 00000000 Total 1:
0000000E 0000385F91870000 0000385F919EFFFF [...] 16:15:24 kps:2.68e+006 00000000 Total 1:
0000000F 0000385F919F0000 0000385F91B6FFFF [...] 16:15:25 kps:2.67e+006 00000000 Total 1:
00000010 0000385F91B70000 0000385F91CEFFFF [...] 16:15:25 kps:2.67e+006 00000000 Total 1:
00000011 0000385F91CF0000 0000385F91E6FFFF [...] 16:15:26 kps:2.68e+006 00000000 Total 1:
00000012 0000385F91E70000 0000385F91FEFFFF [...] 16:15:26 kps:2.68e+006 00000000 Total 1:
00000013 0000385F91FF0000 0000385F9216FFFF [...] 16:15:27 kps:2.68e+006 00000000 Total 1:

***********************************************************************************
Paused: You pressed [ESC] Key. Do you Really want to Quit? PRESS 'Y' or 'y'


I also tried LOOPSPERTHREAD:8192 16384 #128 1024 2048 4096 8192 16384 32768 65536

It works, but after 4 or more seconds the program hangs.
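
The header values in the log above fit together arithmetically. As a minimal sketch (the function names are made up for illustration and are not taken from OCLBiss), keys per kernel is loops per thread times global threads, and each round ends that many keys after its start, minus one:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: keys searched by one kernel launch. */
uint64_t keys_per_kernel(uint64_t loops_per_thread, uint64_t global_threads)
{
    return loops_per_thread * global_threads;
}

/* Hypothetical helper: last key of a round that starts at `from`. */
uint64_t round_last_key(uint64_t from, uint64_t keys)
{
    assert(keys > 0);
    return from + keys - 1;
}
```

With the values from the log, keys_per_kernel(1024, 1536) is 1572864 (0x180000), and round_last_key(0x385F904F0000, 0x180000) is 0x385F9066FFFF, which matches the "To" column of the first row.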
 

C0der

Senior Member
Messages
270
Let's compare:
820M: 96 CUDA cores 700 MHz kps:2.68e+006 -> 25074 clocks/key
4090: 16384 CUDA cores 2235 MHz kps:5.711e+008 -> 64242 clocks/key
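
For anyone who wants to redo this comparison with their own numbers, here is a small sketch of how the clocks/key figures appear to be derived. The formula is my reading of the two lines above (cores times clock divided by keys per second), and the function name is only illustrative:

```c
#include <assert.h>

/* Aggregate core clocks spent per key: cores * clock (Hz) / keys per second. */
double clocks_per_key(double cores, double clock_mhz, double kps)
{
    assert(kps > 0.0);
    return cores * clock_mhz * 1e6 / kps;
}
```

clocks_per_key(96, 700, 2.68e6) comes out at about 25074.6, which matches the 820M line. The 4090 figure of 64242 is reproduced with a rate of about 5.70e+008.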
 

cayoenrique

Senior Member
Messages
476
C0der, Me2019H is running the test program. It does not do CSA; it only adds a set of 64 random numbers. It was just a test I posted at the beginning to show how my OpenCL skeleton/template can be modified to run any kind of program. Do not worry, he may test the real program soon.

Now the question: I assumed the Nvidia 820M would be a lot faster than the HD 4400. Is it?

Now, I know the logic in my program is not totally correct. Where you see it say:
Code:
number of cores: 								6
That is incorrect; it should read compute units.
See, there is no query command that will report cores/shading units.
Instead, the only thing we have is Compute Units/ TMOS. Conclusion: without manually adding how many cores there are per CU, there is no easy way to make a better estimate of how many global threads to launch. I need to add this function to my config. And recently I found that we can also send NULL and the GPU will choose values for us. Now, NULL is not valid for a split kernel like CSA, as there is no guarantee that the GPU will assign an equal total number of threads. They need to be equal in order to XOR them!!

Now look at:
Code:
Intel HD Graphics 4400
Render Config
Shading Units    	160 
TMUs    			20 
ROPs				2 
Execution Units    	20 
Theoretical Performance
Pixel Rate    		2.300 GPixel/s 
Texture Rate    	23.00 GTexel/s 
FP32 (float)    	368.0 GFLOPS 
FP64 (double)		92.00 GFLOPS (1:4)

Code:
Dell GeForce 820M 2 GB
Render Config
Shading Units    	96 
TMUs    			16 
ROPs    			8 
SM Count    		2 
L1 Cache    		64 KB (per SM) 
L2 Cache    		128 KB 
Theoretical Performance
Pixel Rate    		2.500 GPixel/s 
Texture Rate    	10.00 GTexel/s 
FP32 (float)    	240.0 GFLOPS 
FP64 (double)    	20.00 GFLOPS (1:12)

It seems the Intel GPU inside the CPU is more powerful than the discrete alternative, the Nvidia.
Why would a company add an extra, costly part to a laptop if it provides less power than the default one it already has?
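
Since OpenCL only reports compute units, the per-unit core count has to be added by hand, as said above. A minimal sketch of such a lookup, using only the two spec sheets quoted in this post; the table, struct and function names are hypothetical and not part of OCLBiss:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hand-maintained table: the per-unit core count comes from a spec sheet. */
struct gpu_entry {
    const char *name_fragment;   /* matched against the device name */
    unsigned cores_per_cu;
};

static const struct gpu_entry gpu_table[] = {
    { "HD Graphics 4400", 8 },   /* 160 shading units / 20 execution units */
    { "GeForce 820M",    48 },   /* 96 shading units / 2 SMs */
};

/* Returns the estimated core count, or 0 if the device is not in the table. */
unsigned estimate_cores(const char *device_name, unsigned compute_units)
{
    assert(device_name != NULL);
    for (size_t i = 0; i < sizeof gpu_table / sizeof gpu_table[0]; i++) {
        if (strstr(device_name, gpu_table[i].name_fragment) != NULL)
            return compute_units * gpu_table[i].cores_per_cu;
    }
    return 0;
}
```

A device that is missing from the table returns 0, so the caller can fall back to the plain compute-unit count instead of guessing.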
 

cayoenrique

Senior Member
Messages
476
The last one I posted is

OCL_TEST_02.zip (55.06 KB)

Code:
https://workupload.com/file/QQbyQ6fSVb6

Code:
LOOPSPERTHREAD:4096 1024 16384   #128 1024 2048 4096 8192 16384 32768 65536
LOCALTHREADS:0 64 0 256
GLOBALTHREADS:0 1536

Your GPU does not have many cores.
First, test it as it is.
If you see that rounds are slower than 1 per second, then start lowering LOOPSPERTHREAD. I always do it in powers of 2, but this kernel accepts any value.
Here is a good rule-of-thumb table of powers of 2: # 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536

Once you have a cadence, look at the screen output, just as in your images.

Code:
Local threads:                        32
Global threads:                     1024

That screen gives you an idea of the values suggested by the GPU. Now go to the config and start playing with those.

Then you can start putting values in and re-run the program:
Code:
LOCALTHREADS: 32
GLOBALTHREADS:4096

NOTE: when you change the config you DO NOT NEED to recompile the program. The config is read by the program each time it starts.

As you see, you need to play with those to keep your GPU busy enough that the final KPS gets bigger, which means it is faster.

Now, your Nvidia GPU has only 96 cores, which is a multiple of 32. I guess 32 is a good multiple, in the hope that all 96 will do work.


On the other hand, if you install the Nvidia drivers from DELL, you may get an option to switch GPU. If you plan to use the Nvidia for OpenCL, tell your driver to use the Intel for the screen, so the Nvidia is FREE for use ONLY by you.

But if in reality the Intel HD 4400 is faster, then go to your Nvidia driver and tell the PC to use the Nvidia for the screen, so that the Intel is FREE for you to use.


If you use the same GPU for both the screen and OpenCL, you will notice that when your OpenCL kernel takes too long the GPU will reset (BLACK SCREEN) and then the OpenCL program will be killed.
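
One detail when playing with LOCALTHREADS and GLOBALTHREADS by hand: in OpenCL 1.x the global size has to be a whole multiple of the local size, otherwise the kernel launch is rejected. A small sketch of rounding a requested value up; the function name is only illustrative, and whether OCLBiss already does this internally is not shown in this thread:

```c
#include <assert.h>
#include <stddef.h>

/* Round global_threads up to the next multiple of local_threads. */
size_t round_up_global(size_t global_threads, size_t local_threads)
{
    assert(local_threads > 0);
    size_t rem = global_threads % local_threads;
    return rem == 0 ? global_threads : global_threads + (local_threads - rem);
}
```

For example, round_up_global(1000, 32) gives 1024, while 4096 with a local size of 32 is already a multiple and stays as it is.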
 

Me2019H

Registered
Messages
101
When I run OCLBiss.exe (OCLBiss_014) it uses the CPU!!
When I changed this:
#Selection of Device
DETECTDEVICEENABLE:1
SELECTEDDEVICE:10
Reading from OCLBiss.cfg...Done...


Inspecting System for OpenCL devices...Platforms found: 2
00 Intel(R) Corporation Intel(R) HD Graphics 4400 GPU
01 Intel(R) Corporation Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz CPU
10 NVIDIA Corporation GeForce 820M GPU


Today is Sat Sep 16 00:50:37 2023
Connected to device:NVIDIA Corporation => GeForce 820M
<kernel>:195:13: error: 'long long' type is not supported
t = A & 0x2018004200LL;
^
<kernel>:198:13: error: 'long long' type is not supported
t = A & 0x4201480000LL;
^
<kernel>:201:13: error: 'long long' type is not supported
t = A & 0x8040122000LL;
^
<kernel>:204:13: error: 'long long' type is not supported
t = A & 0x1082010040LL;
^
<kernel>:207:13: error: 'long long' type is not supported
t = A & 0x0004a00180LL;
^
<kernel>:210:13: error: 'long long' type is not supported
t = A & 0x0100048820LL;
^
<kernel>:213:13: error: 'long long' type is not supported
t = A & 0x0c20001400LL;
^


Process returned 1 (0x1) execution time : 1.812 s
Press any key to continue.
When I ran it as it is, I saw a black screen and it stopped:
Reading from OCLBiss.cfg...Done...

Today is Sat Sep 16 00:58:49 2023
Connected to device:Intel(R) Corporation => Intel(R) HD Graphics 4400

Device Kernel properties:
number of cores: 20
recommended work group size (local threads): 8
max work group size: 512


Number Loops per thread: 4096
Number of keys per thread: 4096
Local threads: 512
Global threads: 10240
Keys per kernel: 0x2800000(41943040)

*****Important: To stop program and save current status*******
PRESS a LONG [ESC] Key and then say 'Y'

Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
Please ensure it is off before continuing

File Log from Last Searched present: 11C16EC8A790101
Do you what to continue from here?
If you type 'N' or 'n' will start from config.ini value: 0000385F624F0000
Press Enter to Continue (Y)y

base = 385F68AF0000
Looking for 0x385F62F94FCABCD5

BruteForcing for:
SB01: 3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000002800000

Loop From To kernel Time Keys per seconds #Keys Found
00000001 0000385F68AF0000 0000385F6B2EFFFF [...] 00:59:11 kps:3.52e+006 00000002 Key 001:38 5F 69 00 80 07 67 EE
Key 002:38 5F 69 00 D5 E2 D4 8B

00000002 0000385F6B2F0000 0000385F6DAEFFFF [...] 00:59:23 kps:3.52e+006 00000001 Key 001:38 5F 6B 02 3B 33 8D FB

00000003 0000385F6DAF0000 0000385F702EFFFF [...] 00:59:35 kps:3.52e+006 00000001 Key 001:38 5F 6F 06 AB 2C 37 0E

00000004 0000385F702F0000 0000385F72AEFFFF [..

I lowered LOOPSPERTHREAD to 128; it is the best for me.
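
To see why lowering it helps, the time per round can be estimated from the two numbers the program prints. This is just a sketch and the function name is illustrative:

```c
#include <assert.h>

/* Estimated duration of one kernel round in seconds. */
double seconds_per_round(double keys_per_kernel, double kps)
{
    assert(kps > 0.0);
    return keys_per_kernel / kps;
}
```

For the first HD 4400 run above, seconds_per_round(41943040, 3.52e6) is about 11.9 s, which fits the 12 second steps in the Time column and the black screen. With LOOPSPERTHREAD at 128, seconds_per_round(1310720, 3.98e6) is about 0.33 s.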

Reading from OCLBiss.cfg...Done...

Today is Sat Sep 16 01:11:01 2023
Connected to device:Intel(R) Corporation => Intel(R) HD Graphics 4400

Device Kernel properties:
number of cores: 20
recommended work group size (local threads): 8
max work group size: 512


Number Loops per thread: 128
Number of keys per thread: 128
Local threads: 512
Global threads: 10240
Keys per kernel: 0x140000(1310720)

*****Important: To stop program and save current status*******
PRESS a LONG [ESC] Key and then say 'Y'

Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
Please ensure it is off before continuing

File Log from Last Searched present: 11C16EC8A790101
Do you what to continue from here?
If you type 'N' or 'n' will start from config.ini value: 0000385F624F0000
Press Enter to Continue (Y)y

base = 385F68AF0000
Looking for 0x385F62F94FCABCD5

BruteForcing for:
SB01: 3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000000140000

Loop From To kernel Time Keys per seconds #Keys Found
00000001 0000385F68AF0000 0000385F68C2FFFF [...] 01:11:06 kps:3.98e+006 00000000 Total 0:
00000002 0000385F68C30000 0000385F68D6FFFF [...] 01:11:06 kps:3.99e+006 00000000 Total 0:
00000003 0000385F68D70000 0000385F68EAFFFF [...] 01:11:06 kps:4.36e+006 00000000 Total 0:
00000004 0000385F68EB0000 0000385F68FEFFFF [...] 01:11:07 kps:4.31e+006 00000000 Total 0:
00000005 0000385F68FF0000 0000385F6912FFFF [...] 01:11:07 kps:3.96e+006 00000000 Total 0:
00000006 0000385F69130000 0000385F6926FFFF [...] 01:11:07 kps:4.03e+006 00000000 Total 0:
00000007 0000385F69270000 0000385F693AFFFF [...] 01:11:08 kps:4.05e+006 00000000 Total 0:
00000008 0000385F693B0000 0000385F694EFFFF [...] 01:11:08 kps:4.69e+006 00000000 Total 0:
00000009 0000385F694F0000 0000385F6962FFFF [...] 01:11:08 kps:3.98e+006 00000000 Total 0:
0000000A 0000385F69630000 0000385F6976FFFF [...] 01:11:08 kps:4.04e+006 00000000 Total 0:
0000000B 0000385F69770000 0000385F698AFFFF [...] 01:11:09 kps:4.04e+006 00000001 Key 001:38 5F 69 00 80 07 67 EE

0000000C 0000385F698B0000 0000385F699EFFFF [...] 01:11:09 kps:4.34e+006 00000000 Total 1:
0000000D 0000385F699F0000 0000385F69B2FFFF [...] 01:11:09 kps:4.69e+006 00000000 Total 1:
0000000E 0000385F69B30000 0000385F69C6FFFF [...] 01:11:10 kps:4.33e+006 00000000 Total 1:
0000000F 0000385F69C70000 0000385F69DAFFFF [...] 01:11:10 kps:3.97e+006 00000001 Key 001:38 5F 69 00 D5 E2 D4 8B

00000010 0000385F69DB0000 0000385F69EEFFFF [...] 01:11:10 kps:4.35e+006 00000000 Total 2:
00000011 0000385F69EF0000 0000385F6A02FFFF [...] 01:11:11 kps:4.69e+006 00000000 Total 2:
00000012 0000385F6A030000 0000385F6A16FFFF [...] 01:11:11 kps:3.95e+006 00000000 Total 2:
00000013 0000385F6A170000 0000385F6A2AFFFF [...] 01:11:11 kps:3.97e+006 00000000 Total 2:
00000014 0000385F6A2B0000 0000385F6A3EFFFF [...] 01:11:12 kps:3.98e+006 00000000 Total 2:
00000015 0000385F6A3F0000 0000385F6A52FFFF [...] 01:11:12 kps:4.30e+006 00000000 Total 2:
00000016 0000385F6A530000 0000385F6A66FFFF [...] 01:11:12 kps:4.04e+006 00000000 Total 2:
00000017 0000385F6A670000 0000385F6A7AFFFF [...] 01:11:13 kps:3.98e+006 00000000 Total 2:
00000018 0000385F6A7B0000 0000385F6A8EFFFF [...] 01:11:13 kps:4.34e+006 00000000 Total 2:
00000019 0000385F6A8F0000 0000385F6AA2FFFF [...] 01:11:13 kps:3.98e+006 00000000 Total 2:
0000001A 0000385F6AA30000 0000385F6AB6FFFF [...] 01:11:14 kps:3.98e+006 00000000 Total 2:
0000001B 0000385F6AB70000 0000385F6ACAFFFF [...] 01:11:14 kps:3.98e+006 00000000 Total 2:
0000001C 0000385F6ACB0000 0000385F6ADEFFFF [...] 01:11:14 kps:4.34e+006 00000000 Total 2:
0000001D 0000385F6ADF0000 0000385F6AF2FFFF [...] 01:11:15 kps:4.33e+006 00000000 Total 2:
0000001E 0000385F6AF30000 0000385F6B06FFFF [...] 01:11:15 kps:3.98e+006 00000000 Total 2:
0000001F 0000385F6B070000 0000385F6B1AFFFF [...] 01:11:15 kps:3.99e+006 00000000 Total 2:
00000020 0000385F6B1B0000 0000385F6B2EFFFF [...] 01:11:16 kps:4.04e+006 00000000 Total 2:
00000021 0000385F6B2F0000 0000385F6B42FFFF [...] 01:11:16 kps:4.04e+006 00000001 Key 001:38 5F 6B 02 3B 33 8D FB

00000022 0000385F6B430000 0000385F6B56FFFF [...] 01:11:16 kps:4.69e+006 00000000 Total 3:
00000023 0000385F6B570000 0000385F6B6AFFFF [...] 01:11:17 kps:4.69e+006 00000000 Total 3:
00000024 0000385F6B6B0000 0000385F6B7EFFFF [...] 01:11:17 kps:4.33e+006 00000000 Total 3:
00000025 0000385F6B7F0000 0000385F6B92FFFF [...] 01:11:17 kps:4.03e+006 00000000 Total 3:
00000026 0000385F6B930000 0000385F6BA6FFFF [...] 01:11:18 kps:4.04e+006 00000000 Total 3:
00000027 0000385F6BA70000 0000385F6BBAFFFF [...] 01:11:18 kps:4.67e+006 00000000 Total 3:
00000028 0000385F6BBB0000 0000385F6BCEFFFF [...] 01:11:18 kps:4.04e+006 00000000 Total 3:
00000029 0000385F6BCF0000 0000385F6BE2FFFF [...] 01:11:19 kps:4.17e+006 00000000 Total 3:
0000002A 0000385F6BE30000 0000385F6BF6FFFF [...] 01:11:19 kps:4.27e+006 00000000 Total 3:
0000002B 0000385F6BF70000 0000385F6C0AFFFF [...] 01:11:19 kps:4.04e+006 00000000 Total 3:
0000002C 0000385F6C0B0000 0000385F6C1EFFFF [...] 01:11:20 kps:4.04e+006 00000000 Total 3:
0000002D 0000385F6C1F0000 0000385F6C32FFFF [...] 01:11:20 kps:4.25e+006 00000000 Total 3:
0000002E 0000385F6C330000 0000385F6C46FFFF [...] 01:11:20 kps:3.98e+006 00000000 Total 3:
0000002F 0000385F6C470000 0000385F6C5AFFFF [...] 01:11:21 kps:4.09e+006 00000000 Total 3:
00000030 0000385F6C5B0000 0000385F6C6EFFFF [...] 01:11:21 kps:4.03e+006 00000000 Total 3:
00000031 0000385F6C6F0000 0000385F6C82FFFF [...] 01:11:21 kps:4.27e+006 00000000 Total 3:
00000032 0000385F6C830000 0000385F6C96FFFF [...] 01:11:22 kps:4.35e+006 00000000 Total 3:
00000033 0000385F6C970000 0000385F6CAAFFFF [...] 01:11:22 kps:3.99e+006 00000000 Total 3:
00000034 0000385F6CAB0000 0000385F6CBEFFFF [...] 01:11:22 kps:4.69e+006 00000000 Total 3:
00000035 0000385F6CBF0000 0000385F6CD2FFFF [...] 01:11:22 kps:4.64e+006 00000000 Total 3:
00000036 0000385F6CD30000 0000385F6CE6FFFF [...] 01:11:23 kps:4.04e+006 00000000 Total 3:
00000037 0000385F6CE70000 0000385F6CFAFFFF [...] 01:11:23 kps:4.10e+006 00000000 Total 3:
00000038 0000385F6CFB0000 0000385F6D0EFFFF [...] 01:11:23 kps:4.03e+006 00000000 Total 3:
00000039 0000385F6D0F0000 0000385F6D22FFFF [...] 01:11:24 kps:3.98e+006 00000000 Total 3:
0000003A 0000385F6D230000 0000385F6D36FFFF [...] 01:11:24 kps:4.03e+006 00000000 Total 3:
0000003B 0000385F6D370000 0000385F6D4AFFFF [...] 01:11:24 kps:4.04e+006 00000000 Total 3:
0000003C 0000385F6D4B0000 0000385F6D5EFFFF [...] 01:11:25 kps:4.65e+006 00000000 Total 3:

***********************************************************************************
Paused: You pressed [ESC] Key. Do you Really want to Quit? PRESS 'Y' or 'y'
 

cayoenrique

Senior Member
Messages
476
@Me2019H
Code:
<kernel>:195:13: error: 'long long' type is not supported

It seems it does not like to see LL. I should be able to change that so that it does not complain. Ignore it for the moment. Not this time, as I already uploaded the new sample, but in the next one.
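
Some background on the error: OpenCL C defines long and ulong as 64 bits wide and reserves long long, so the LL suffix is most likely what the Nvidia compiler rejects. The kernel source is not shown here, so the following is only a host-side C analogue of one of the failing lines, using the names A and t from the error message and one of the masks:

```c
#include <assert.h>
#include <stdint.h>

/* Host-side C analogue of the kernel line `t = A & 0x2018004200LL;`.
   In the .cl file the constant could be written 0x2018004200L or
   (ulong)0x2018004200, since OpenCL C `long` is already 64 bits wide. */
uint64_t mask_bits(uint64_t A)
{
    const uint64_t mask = UINT64_C(0x2018004200);
    assert((mask >> 32) != 0);  /* upper bits would be lost in a 32-bit type */
    uint64_t t = A & mask;
    return t;
}
```

The point is only that the constant needs a 64-bit type; which spelling the next release uses is up to the author.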


SOOOoooo... Back to school.

Here is the sample with split kernels. I learned a lot with it.

OCLBiss_021.zip (38.38 KB)
Code:
https://workupload.com/file/bgyvPEHNyzT
Pass:www.sat-universe.com

Now listen carefully. This is my experience on my laptop. I cannot say that it will work for you 100%, but try it.

Remember the settings:
Code:
#Main Speed Adjustments
LOCALTHREADS:64 0 64 128 256        # 1) Set recommended, multiple of 32 or 64; new fast GPUs can do 256
GLOBALTHREADS:6144 0 1536 3072 6144 # 2) Set recommended; take note of how many CUs. First suggested value is CU x 256, then keep doubling. In my case 6 * 256 = 1536, then 1536 3072 6144
LOOPSPERTHREAD:1024                 # 3) Adjust LOOPSPERTHREAD to get a cadence of about 1 second

See how I placed some notes: 1, 2, 3.

0) First you set both to 0 so that you get what the GPU prefers.
Code:
LOCALTHREADS:0
GLOBALTHREADS:0
LOOPSPERTHREAD:1024

And you watch the first screen. Here is @dvlajkovic's sample.
Code:
Device Kernel properties:
    number of cores:                 128
    recommended work group size (local threads):     32
    max work group size:                256

1) We can all see recommended=32, maximum=256. As his GPU is good, I guess 256 is OK. You may want to start with 32 or 64.


2) "number of cores:" should have read "number of Compute Units (CU)". So this number is 128 for him.

From note 2): #CU x 256 = 128 x 256 = 32768. Wow, a big number!!! Maybe he should start lower! In any case, I removed the 65536 limit.
Now he can start with 32768, then 65536 ... 131072 262144 524288 ...

This is important because you want to have enough work to keep his CUs busy; if not, those CUs will go to the next shop to have some beers.

3) Adjust LOOPSPERTHREAD to have a nice cadence of 1 per second.
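
The GLOBALTHREADS values from note 2) can be generated instead of typed. A minimal sketch, assuming the CU x 256 starting point and the doubling described above; the function name is made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill `out` with CU * 256, then successive doublings. */
void global_thread_candidates(uint64_t compute_units, uint64_t *out, size_t count)
{
    assert(out != NULL || count == 0);
    uint64_t value = compute_units * 256u;
    for (size_t i = 0; i < count; i++) {
        out[i] = value;
        value *= 2u;
    }
}
```

For 6 CUs the first three values are 1536, 3072 and 6144; for 128 CUs the list starts 32768, 65536, 131072, 262144, 524288.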

Now with this setup I get my 10X improvement. I suspect @dvlajkovic will be happier this time.

This was my result with 480 cores
Code:
Today is Sat Sep 16 14:08:23 2023
Connected Platform:									Advanced Micro Devices, Inc.
Connected Device:									Turks
PROGRAM_FILE selected:								csa_decrypt_1block_006.cl
Device Kernel properties:
	number of cores: 								6
	recommended work group size (local threads): 	64
	max work group size:							256
	Global Memory size:				536870912
Number Loops per thread: 							512
Number of keys per thread: 							512
Local threads: 										64
Global threads: 									16384
Keys per kernel: 									8388608(800000)
OS:													Linux mumble 4.19.0-19-amd64 #1 SMP Debian 4.19.232-1 (2022-03-07) x86_64 GNU/Linux
File Log from Last Searched present:				(11C16EC6EF80101)
Start from config.ini value:						(0000385F624F0000)
Base Selected:										(385F624F0000)
Looking for:										(0x385F62F94FCABCD5)

BruteForcing for:
SB01:		3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05 

Range: 0000000000800000

Loop             From             To                kernel Time  Keys per seconds        #Keys Found 

00000001 00385F624F0000 00385F62CEFFFF [...] 14:08:26  kps:1.11e+07 00000001          Key 001:38 5F 62 F9 4F CA BC D5  Total 1:
00000002 00385F62CF0000 00385F634EFFFF [...] 14:08:27  kps:1.13e+07 00000000  Total 1:
00000003 00385F634F0000 00385F63CEFFFF [...] 14:08:27  kps:1.13e+07 00000001          Key 001:38 5F 63 FA 58 04 89 E5  Total 2:
00000004 00385F63CF0000 00385F644EFFFF [...] 14:08:28  kps:1.13e+07 00000001          Key 001:38 5F 63 FA E9 6B 60 B4  Total 3:
00000005 00385F644F0000 00385F64CEFFFF [...] 14:08:29  kps:1.13e+07 00000001          Key 001:38 5F 64 FB AF 01 B0 60  Total 4:
00000006 00385F64CF0000 00385F654EFFFF [...] 14:08:30  kps:1.13e+07 00000001          Key 001:38 5F 65 FC 40 06 0F 55  Total 5:
00000007 00385F654F0000 00385F65CEFFFF [...] 14:08:30  kps:1.13e+07 00000000  Total 5:
00000008 00385F65CF0000 00385F664EFFFF [...] 14:08:31  kps:1.13e+07 00000000  Total 5:
00000009 00385F664F0000 00385F66CEFFFF [...] 14:08:32  kps:1.13e+07 00000000  Total 5:
0000000A 00385F66CF0000 00385F674EFFFF [...] 14:08:33  kps:1.13e+07 00000000  Total 5:         Key 001:38 5F 67 FE 7F 08 D6 5D          Key 001:38 5F 68 FF 3F 9E 6A 47          Key 001:38 5F 69 00 80 07 67 EE          Key 001:38 5F 69 00 D5 E2 D4 8B          Key 001:38 5F 6B 02 3B 33 8D FB          Key 001:38 5F 6F 06 AB 2C 37 0E          Key 001:38 5F 70 07 8C FE B9 43          Key 001:38 5F 71 08 AF 65 B3 C7 Key 002:38 5F 71 08 C5 4A 37 46 
00000020 00385F71CF0000 00385F724EFFFF [...] 14:08:49  kps:1.13e+07 00000000  Total 14:
00000021 00385F724F0000 00385F72CEFFFF [...] 14:08:50  kps:1.14e+07 00000000  Total 14:
00000022 00385F72CF0000 00385F734EFFFF [...] 14:08:50  kps:1.13e+07 00000000  Total 14:
00000023 00385F734F0000 00385F73CEFFFF [...] 14:08:51  kps:1.13e+07 00000000  Total 14:         Key 001:38 5F 75 0C 15 F6 4B 56          Key 001:38 5F 76 0D 23 BB 58 36          Key 001:38 5F 76 0D CA C6 CA 5A          Key 001:38 5F 77 0E 5C E4 31 71 Key 002:38 5F 77 0E 76 5F 8F 64          Key 001:38 5F 78 0F 55 15 39 A3          Key 001:38 5F 7A 11 DC 4A 79 9F


If this works for you, then go back to the old OCLBiss_014 and try those settings. You will see that the old version improves too.
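
For step 3), a rough starting value for LOOPSPERTHREAD can be estimated from a measured kps instead of by trial and error. This is only a sketch under the assumption that kps stays roughly constant when the loop count changes; the function name is illustrative:

```c
#include <assert.h>
#include <stdint.h>

/* Largest power of two not above kps * seconds / global_threads (minimum 1). */
uint64_t loops_for_cadence(double kps, uint64_t global_threads, double target_seconds)
{
    assert(kps > 0.0 && global_threads > 0 && target_seconds > 0.0);
    double budget = kps * target_seconds / (double)global_threads;
    uint64_t loops = 1;
    /* The cap on `loops` keeps the doubling from overflowing 64 bits. */
    while (loops < (UINT64_C(1) << 62) && (double)(loops * 2) <= budget)
        loops *= 2;
    return loops;
}
```

With the numbers from the log above (kps 1.13e+07, 16384 global threads, 1 second target) this gives 512, the value used in that run.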

Enjoy
 

dvlajkovic

Senior Member
Messages
498
If I try to build OCLBiss_021 it gives the following error:

||=== Build: Debug in OCLBiss (compiler: OpenCl_MSYS2) ===|
C:\Apps\home\OCLBiss_021_test\OCLBiss.c||In function 'main':|
C:\Apps\home\OCLBiss_021_test\OCLBiss.c|534|error: 'CL_MEM_HOST_NO_ACCESS' undeclared (first use in this function); did you mean 'CL_MEM_HOST_PTR'?|
C:\Apps\home\OCLBiss_021_test\OCLBiss.c|534|note: each undeclared identifier is reported only once for each function it appears in|
||=== Build finished: 1 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|
 

dvlajkovic

Senior Member
Messages
498
Here are the results of OCLBiss_014 with new settings:
Code:
Today is Sun Sep 17 12:24:12 2023
Connected to device:NVIDIA Corporation => NVIDIA GeForce RTX 4090

Device Kernel properties:
    number of cores:                 128
    recommended work group size (local threads):     32
    max work group size:                256


Number Loops per thread:                 8192
Number of keys per thread:                 8192
Local threads:                         256
Global threads:                     1073741824
Keys per kernel:                     0x80000000000(8796093022208)

 base = A013E8AF0000
 Looking for 0x385F62F94FCABCD5

BruteForcing for:
SB01:        3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000080000000000

Loop             From             To                kernel Time  Keys per seconds        #Keys Found

00000001 00A013E8AF0000 00A813E8AEFFFF [...] 12:24:20  kps:8.04e+012 00000032          Key 001:A0 14 03 B7 C2 41 20 23 ... Total 32:
00000002 00A813E8AF0000 00B013E8AEFFFF [...] 12:24:21  kps:9.99e+012 00000015          Key 001:A8 14 05 C1 87 E2 55 BE ... Total 47:
00000003 00B013E8AF0000 00B813E8AEFFFF [...] 12:24:22  kps:1.00e+013 00000011          Key 001:B0 13 E9 AC CD 83 59 A9 ... Total 68:
00000005 00C013E8AF0000 00C813E8AEFFFF [...] 12:24:23  kps:1.00e+013 00000031          Key 001:C0 14 07 DB BF 61 5E 7E ... Total 99:
00000006 00C813E8AF0000 00D013E8AEFFFF [...] 12:24:24  kps:1.00e+013 00000019          Key 001:C8 14 07 E3 57 A1 3F 37 ... Total 118:
00000007 00D013E8AF0000 00D813E8AEFFFF [...] 12:24:25  kps:9.98e+012 00000017          Key 001:D0 13 FF E2 5A 81 9F 7A ... Total 135:
00000008 00D813E8AF0000 00E013E8AEFFFF [...] 12:24:26  kps:9.96e+012 00000019          Key 001:D8 14 06 F2 68 60 4E 16 ... Total 154:
00000009 00E013E8AF0000 00E813E8AEFFFF [...] 12:24:27  kps:1.00e+013 00000023          Key 001:E0 13 FE F1 F4 C2 83 39 ... Total 177:
0000000A 00E813E8AF0000 00F013E8AEFFFF [...] 12:24:28  kps:1.00e+013 00000024          Key 001:E8 14 02 FE 2E 00 51 7F ... Total 201:
0000000B 00F013E8AF0000 00F813E8AEFFFF [...] 12:24:29  kps:9.98e+012 00000027          Key 001:F0 14 01 05 CE 00 B6 84 ... Key 027:F0 13 F1 F4 11 FD 67 75
Finish

I would go with Global threads:2147483648, but the app would not accept it and reset it to 256.
Dunno if it was due to some hw or sw limit, but the highest figure I've reached was Global threads:1073741824.
The KPS is as above.
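
One possible explanation, offered only as a guess because the config parsing code of OCLBiss is not shown in this thread: 2147483648 is exactly one more than the largest signed 32-bit integer, while 1073741824 still fits. The parser below is entirely hypothetical and only illustrates that kind of limit:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FALLBACK_THREADS 256  /* the value the app reportedly reset to */

/* Hypothetical parser, NOT taken from OCLBiss: accept values that fit
   a signed 32-bit int, otherwise fall back. */
int32_t parse_global_threads(const char *text)
{
    assert(text != NULL);
    unsigned long long value = strtoull(text, NULL, 10);
    if (value > (unsigned long long)INT32_MAX)
        return FALLBACK_THREADS;
    return (int32_t)value;
}
```

With this sketch "1073741824" is accepted as is and "2147483648" falls back to 256.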

 

cayoenrique

Senior Member
Messages
476
Just to stop rumors before they even start.
This is not TRUE. There was a limit in the old programs that stops the cores from going higher.
What happens is that our main program believes all the requests it submitted succeeded, so it calculates the time wrongly. But in reality the cores see that they are over the 65536 limit and they quit and do nothing. So the computation time is false: you ask them to work and the limit says do not work, just return.

Sorry..... But he did get an improvement.

@C0der
Sorry if I misled you. 10X is from my original. I was already at about 5X, so the extra gain was 2X. Now that is also confusing, as the % of occupancy was low. That is why I started looking into how to make them feel busy.

Now, what we learned: we need to submit more jobs, so that the scheduler has no opportunity to send its CUs on the 10-minute break, and the workers go to happy hour for beers. ;)
 

azboxgo

Registered
Messages
9
Hi guys, I'm playing on my 4090 on Ubuntu 22.04 (I downloaded OCLBiss_021.zip and I had to add #define CL_VERSION_1_2 to find CL_MEM_HOST_NO_ACCESS and fix size_t kernelLocalMemSize = 0;) but it's reporting less kps than on your systems (I'm reusing Mr. dvlajkovic's configuration below):

Loop From To kernel Time Keys per seconds #Keys Found
00000001 0000385F81E04000 0000385F81FF7FFF [...] 17:51:08 kps:3.76e+06 00000000 Total 0:
00000002 0000385F81FF8000 0000385F821EBFFF [...] 17:51:08 kps:3.90e+06 00000000 Total 0:
00000003 0000385F821EC000 0000385F823DFFFF [...] 17:51:09 kps:3.90e+06



PROGRAM_FILE:"csa_decrypt_1block_004.cl"
#PROGRAM_FILE:"csa_decrypt_1block_005.cl"
#PROGRAM_FILE:"csa_decrypt_1block_006.cl"
KERNEL_FUNC:"csa_block_sb0"
KERNEL_FUNC2:"csa_stream_sb0";
#
#Selection of Device
DETECTDEVICEENABLE:1
SELECTEDDEVICE:00#

Parallel OpenCL Parameters #32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288
16384
SINGLETHREADENABLE:0

#Main Speed Adjustments
LOCALTHREADS: 256 64 128 256 # 1) Set recomended, multiple of 32 or 64, new fast GPU can do 256
GLOBALTHREADS: 10737418240 1536 3072 6144 # 2) Set Recomended, take not of how many CU. Then 1rst sugest value CU x 256. Then multiples of 2. In my case 6 * 256 = 1536, then multiples 1536 302 6144
LOOPSPERTHREAD: 8000 # 3) Adjust LOOPSPERTHREAD to gest a cadence of about 1 second

SPEEDTESTENABLE:0
SPEEDTESTKEY:00007578FAD3B720 00007578FAD3B728 00007578FAD30000
 

azboxgo

Registered
Messages
9
Sorry, discard my previous message, I cannot edit it anymore.
The following is what I wanted to edit.

Hi guys, I'm playing on my 4090 on Ubuntu 22.04 (I downloaded OCLBiss_021.zip and I had to add #define CL_VERSION_1_2 to find CL_MEM_HOST_NO_ACCESS and fix size_t kernelLocalMemSize = 0) but it's reporting less kps than on your systems. I'm reusing Mr. dvlajkovic's configuration, and I'm obtaining these results:
Inspecting System for OpenCL devices...Platforms found: 1
00 NVIDIA Corporation NVIDIA GeForce RTX 4090 GPU


Today is Sun Sep 17 18:36:42 2023
Connected Platform: NVIDIA Corporation
Connected Device: NVIDIA GeForce RTX 4090

PROGRAM_FILE selected: csa_decrypt_1block_004.cl

Device Kernel properties:
number of Compute Units (CU): 128
recommended work group size (local threads): 32
max work group size: 256
Global Memory size: 25390809088

Number Loops per thread: 8000
Number of keys per thread: 8000
Local threads: 256
Global threads: 16128
Keys per kernel: 129024000(7B0C000)

*****Important: To stop program and save current status*******
PRESS [CTL]+C and then say 'Y'

Scroll Lock LED in KeyBoard will be used to Signal Key Found event.
Please ensure it is off before continuing

File Log from Last Searched present: (11C1937911F0101)
Do you what to continue from here?
If you type 'N' or 'n' will start from config.ini value: (0000385F624F0000)
Press Y/N + [ENTER] to Continue (Y)

base = (3861B903C000)
Looking for (0x385F62F94FCABCD5)

BruteForcing for:
SB01: 3C EB DC 17 3C 2B D6 4F 65 16 88 F2 58 D5 97 05

Range: 0000000007B0C000

Loop From To kernel Time Keys per seconds #Keys Found
00000001 00003861B903C000 00003861C0B47FFF [...] 18:36:43 kps:2.38e+08 00000009 Total 9: Key 001:38 61 BE 57 2E 05 EC 1F Key 002:38 61 C0 59 0F 1B E0 0A Key 003:38 61 BA 53 5E 77 1B F0 Key 004:38 61 BA 53 A0 49 FF E8 Key 005:38 61 BA 53 E5 04 AE 97 Key 006:38 61 BA 53 74 5B 49 18 Key 007:38 61 BF 58 E0 6E 8A D8 Key 008:38 61 BA 53 C0 C9 68 F1 Key 009:38 61 BF 58 39 D4 18 25
00000002 00003861C0B48000 00003861C8653FFF [...] 18:36:44 kps:2.46e+08 00000011 Total 20: Key 001:38 61 C3 5C 9D BF D7 33 Key 002:38 61 C7 60 46 65 C8 73 Key 003:38 61 C0 59 D0 1C B8 A4 Key 004:38 61 C7 60 19 DE D0 C7 Key 005:38 61 C5 5E 2F 84 86 39 Key 006:38 61 C2 5B 09 61 90 FA Key 007:38 61 C6 5F B9 3A 25 18 Key 008:38 61 C3 5C 6D BB 53 7B Key 009:38 61 C6 5F D9 D4 81 2E Key 010:38 61 C7 60 C3 41 0C 10 Key 011:38 61 C1 5A 4B 9E C6 AF
00000003 00003861C8654000 00003861D015FFFF [...] 18:36:44 kps:2.47e+08 00000005 Total 25: Key 001:38 61 CC 65 E6 54 3E 78 Key 002:38 61 CB 64 C2 76 E7 1F Key 003:38 61 CD 66 47 87 9D 6B Key 004:38 61 CB 64 DB 27 84 86 Key 005:38 61 C8 61 EA 8C C4 3A
00000004 00003861D0160000 00003861D7C6BFFF [...] 18:36:45 kps:2.46e+08 00000007 Total 32: Key 001:38 61 D0 69 AB 17 8B 4D Key 002:38 61 D2 6B FB 26 87 A8 Key 003:38 61 D7 70 83 53 59 2F Key 004:38 61 D3 6C AE 5F 48 55 Key 005:38 61 D2 6B 59 8D 33 19 Key 006:38 61 D1 6A 9E E8 A3 29 Key 007:38 61 D1 6A 5D 25 2B AD
00000005 00003861D7C6C000 00003861DF777FFF [...] 18:36:45 kps:2.46e+08 00000002 Total 34: Key 001:38 61 DD 76 93 BF 20 72 Key 002:38 61 D7 70 F5 FB C7 B7

...
 

cayoenrique

Senior Member
Messages
476
I need to say something that is important to me. Hopefully it is important to you too.

I know moonbase is going to get upset with me.

We ALL depend on CSA to survive. If CSA is broken there will be no more FREE TV!!!!! Even this forum will have no use. Do you want to lose CSA?

So if any one of you finds a way to crack CSA in hours, PLEASE do not post the results. ALL users here depend on you keeping it secret so that CSA lives. Long live CSA....
Now, what you do with it privately is up to you.

Regarding the posted results: I already mentioned this, but I will repeat my past comment.

014 has a line that reads if (get_global_id(0) > 65536) return; So the cores see that and just stop working. You are just measuring the time to load the kernel and read the results, not any time spent looking for a key, as no time was spent doing that!!

Now comment out that line and retest for speed.
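
To put a number on it, here is a sketch of how many work-items really get past that guard. It assumes the guard is the first thing the kernel does, which is not shown in this thread, and the function names are only illustrative:

```c
#include <assert.h>
#include <stdint.h>

#define GUARD_LIMIT 65536u  /* from: if (get_global_id(0) > 65536) return; */

/* Work-items with id 0..65536 pass the guard; the rest return at once. */
uint64_t active_work_items(uint64_t global_threads)
{
    uint64_t passing = (uint64_t)GUARD_LIMIT + 1u;
    return global_threads < passing ? global_threads : passing;
}

/* Keys really tested in one launch, as opposed to the reported figure. */
uint64_t keys_really_tested(uint64_t global_threads, uint64_t loops_per_thread)
{
    assert(loops_per_thread > 0);
    return active_work_items(global_threads) * loops_per_thread;
}
```

For the posted run (1073741824 global threads, 8192 loops) only 65537 work-items do any work, so about 536879104 keys are tested per launch instead of the reported 8796093022208. That is an overcount by a factor of roughly 16384.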

Yes, you can go into OCLBiss.c and change this at the top:

Code:
#define CL_TARGET_OPENCL_VERSION 120
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS

And you should get past the 'CL_MEM_HOST_NO_ACCESS' error. This will work for most of you. Those who only have OpenCL 1.1 will not be able to.
 

cayoenrique

Senior Member
Messages
476
@dvlajkovic
Code:
I would go with Global threads:2147483648 but the app could not accept it, reseting to 256.
Dunno was it due to some hw or sw limit, but the highest figure I've reached was Global threads:1073741824.

On the new OCLBiss_021, "1073741824" (1G) would be close to your real maximum for the 4090. Why?
Because every key has to be saved in global memory as one 64-bit value, which is 8 bytes. You have 24 GB of memory, but you need 8 bytes per saved key. That means, considering ONLY this save, you can do at most 24/8 = 3G. But we use other memory too, and the scheduler also needs memory. So I guess that 1/3 of the memory could be close to what you can do. See, the scheduler needs to reserve almost all your memory, a task that takes time, which makes it slower. So it is kind of stupid: we try to go faster, but the mechanism we use ends up making it slower for other reasons. This is why we need to keep testing many different ways to do it, until you find the one that best fits your card.
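
The same memory arithmetic as a small sketch, assuming one 8-byte slot per global thread as described above; the function names are made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_KEY 8u  /* one 64-bit value per work-item */

/* Bytes of global memory needed to store one value per thread. */
uint64_t result_buffer_bytes(uint64_t global_threads)
{
    assert(global_threads <= UINT64_MAX / BYTES_PER_KEY);
    return global_threads * BYTES_PER_KEY;
}

/* Upper bound on threads if ALL global memory held nothing but results. */
uint64_t max_threads_for_memory(uint64_t global_mem_bytes)
{
    return global_mem_bytes / BYTES_PER_KEY;
}
```

result_buffer_bytes(1073741824) is 8589934592 bytes (8 GiB), about a third of the 25390809088 bytes that the 4090 log in this thread reports as Global Memory size. max_threads_for_memory(25390809088) is 3173851136, the "3G" ceiling.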


Now, OCLBiss_014 does not save temporary values to memory. Once you comment out //if (get_global_id(0) > 65536) return; or just delete that line, the kernel should be fine.

For OCLBiss_021,
at the top of OCLBiss.c just leave:
Code:
#define CL_TARGET_OPENCL_VERSION 120
#define CL_USE_DEPRECATED_OPENCL_1_2_APIS
It seems 'CL_MEM_HOST_NO_ACCESS' needs at least OpenCL 1.2, and I had left it indicating that I wanted 1.1.
 

cayoenrique

Senior Member
Messages
476
@azboxgo
Welcome friend.
I told @dvlajkovic that once he posted those high numbers, the Silent City would be silent no more. Or phantoms would start showing up. See, this phantom has always been there, but in silence. Yes, just like the UFO.... I know I have a screw missing in my head.

And @azboxgo has some knowledge: he found the solution for the kernels himself. Congratulations.

Now, I was hoping to see you guys getting closer to the output of cudabiss, where closer does not mean equal or faster. Like 10 times what you have seen. What I mean is approaching 10^9.

Do not worry, we go back to OCLBiss_014 (no memory) and we should find a way to schedule more work before reading. So instead of saving every half result, we save the fake findings in global memory. Once we have enough, we submit a different kernel, one that can read those fake keys and evaluate whether they can clear a second 16-byte PUSI TS. At that point we save the results, and expect to read the output on the PC.
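
The two-stage idea can be sketched on the CPU in plain C. This is only an illustration of the data flow: the two predicates are toy stand-ins for the real CSA checks, which are not shown in this thread, and all names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*key_check)(uint64_t key);

/* Toy stand-ins for the two real checks. */
int toy_first_check(uint64_t key)  { return key % 4 == 0; }
int toy_second_check(uint64_t key) { return key % 3 == 0; }

/* Stage 1: scan a key range and store every key that passes the first
   check (the "fake findings"). Stops early if the buffer fills up.
   Returns how many keys were stored. */
size_t collect_candidates(uint64_t first, uint64_t count, key_check check1,
                          uint64_t *candidates, size_t capacity)
{
    assert(check1 != NULL && candidates != NULL);
    size_t stored = 0;
    for (uint64_t i = 0; i < count && stored < capacity; i++) {
        if (check1(first + i))
            candidates[stored++] = first + i;
    }
    return stored;
}

/* Stage 2: keep only the candidates that also pass the second check,
   compacting them in place. Returns the number of survivors. */
size_t confirm_candidates(uint64_t *candidates, size_t stored, key_check check2)
{
    assert(check2 != NULL && (candidates != NULL || stored == 0));
    size_t kept = 0;
    for (size_t i = 0; i < stored; i++) {
        if (check2(candidates[i]))
            candidates[kept++] = candidates[i];
    }
    return kept;
}
```

On a GPU the two functions would be two separate kernels sharing a buffer in global memory, and only the survivors of the second stage would be read back by the host.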

Now, I have tried this approach in the past and failed. See, the compiler thinks he is smarter than us. As soon as he sees that the result is not needed, the smart m07h3r f*** deletes the code; he thinks that if we do not read the output, the code is not needed!! In the past when I tried this I got the same result as @dvlajkovic, 10^12. Wow, I was happier than a two-tailed dog. And it all happened because the kernels were doing nothing, thanks to the code optimization by the compiler!! I think I should have better luck this time.
 