2021-08-06, 21:07  #3499
"Seth"
Apr 2019
369_{10} Posts 
Math is hard
Code:
$ cat worktodo.txt
Factor=N/A,960477823,66,67
Factor=N/A,960477823,67,68

$ ./mfaktc.exe
mfaktc v0.21 (64bit built)
...
got assignment: exp=960477823 bit_min=66 bit_max=67 (0.02 GHz-days)
Starting trial factoring M960477823 from 2^66 to 2^67 (0.02 GHz-days)
 k_min =  38411594760
 k_max =  76823196254
Using GPU kernel "barrett76_mul32_gs"
M960477823 has a factor: 147602823780943516039
found 1 factor for M960477823 from 2^66 to 2^67 [mfaktc 0.21 barrett76_mul32_gs]

WARNING: ignoring line 1 in "worktodo.txt"! Reason: doesn't begin with Factor=
WARNING: ignoring line 2 in "worktodo.txt"! Reason: doesn't begin with Factor=
got assignment: exp=960477823 bit_min=67 bit_max=68 (0.03 GHz-days)
Starting trial factoring M960477823 from 2^67 to 2^68 (0.03 GHz-days)
 k_min =  76823194140
 k_max =  153646392509
M960477823 has a factor: 147602823780943516039
found 1 factor for M960477823 from 2^67 to 2^68 [mfaktc 0.21 barrett76_mul32_gs]

$ python -c 'import math; print(math.log2(147602823780943516039))'
67.00028221952357
2021-08-07, 00:47  #3500
"Viliam Furík"
Jul 2018
Martin, Slovakia
1010101011_{2} Posts 
Yes, there is an overlap in the k_max (76823196254) of the 67bit range and k_min (76823194140) of the 68bit range. So the k of this composite factor, being 76838225853, can be found in both ranges.
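For anyone who wants to reproduce the k quoted above, here is a minimal sketch. It assumes only the standard form of a Mersenne factor, f = 2*k*p + 1; the helper name `k_from_factor` is made up for illustration and is not part of mfaktc.

```python
def k_from_factor(factor: int, exponent: int) -> tuple[int, int]:
    """Return (k, remainder) such that factor - 1 == 2 * exponent * k + remainder.

    A remainder of 0 means the factor has the form 2*k*exponent + 1.
    (Hypothetical helper, not taken from the mfaktc sources.)
    """
    return divmod(factor - 1, 2 * exponent)


p = 960477823
f = 147602823780943516039

k, remainder = k_from_factor(f, p)
print(k, remainder)      # 76838225853 0
print(f.bit_length())    # 68, i.e. 2^67 < f < 2^68
```

The resulting k can then be compared by hand with the k_min and k_max lines that mfaktc printed for each bit range.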

2021-08-12, 15:45  #3501
Mar 2014
2^{2}·13 Posts 
Attempting to set up new system
I have just received a shiny new laptop, and am having some trouble getting it set up.
I have run mfaktc and mfakto, once each, on previous machines, and remember it mostly being a matter of having drivers up to date and picking the right one of mfaktc or mfakto. This time has been harder. Would appreciate some advice:

Windows 10, i7-10750H CPU @ 2.60GHz, 32GB RAM, NVIDIA GeForce GTX 1660 Ti video card.

I downloaded and unzipped mfaktc0.21.win_cuda11.22047.zip. I downloaded and installed NVIDIA's latest set of tools (cuda_11.4.1_471.41_win10.exe), and when that didn't work, uninstalled it and tried again with cuda_11.2.0_460.89_win10.exe. I grabbed cudart64_110.dll off the web (I think off this forum!) and tried placing it in various places: in the system32 directory, in the mfaktc directory, and in the same place as the other NVIDIA dlls.

Each time the selftest exits with error 209: no kernel image is available for execution on the device.

Any suggestions on what to try next are welcome. Is there a cudart64_112.dll I need? (I didn't run across one on the web.) A different directory I need to place the cudart file in? Something else obvious I did wrong?
Complete selftest result pasted below:

D:\grb\math\mfaktc>mfaktc-win-64 -st
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                3
  CPUStreams                3
  GridSize                  3
  GPU Sieving               enabled
  GPUSievePrimes            82486
  GPUSieveSize              2047Mi bits
  GPUSieveProcessSize       16Ki bits
  Checkpoints               enabled
  CheckpointDelay           30s
  WorkFileAddDelay          600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  (none)
  ComputerID                (none)
  AllowSleep                no
  TimeStampInResults        no

CUDA version info
  binary compiled for CUDA  11.20
  CUDA runtime version      11.20
  CUDA driver version       11.20

CUDA device info
  name                      GeForce GTX 1660 Ti
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 24
  clock rate (CUDA cores)   1590MHz
  memory clock rate:        6001MHz
  memory bus width:         192 bit

Automatic parameters
  threads per grid          786432
  GPUSievePrimes (adjusted) 82486
  GPUsieve minimum exponent 1055144

########## testcase 1/2867 ##########
Starting trial factoring M50804297 from 2^67 to 2^68 (0.59 GHz-days)
Using GPU kernel "75bit_mul32_gs"
Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait
Aug 12 09:26 | 3387   0.1% |  0.001    n.a. |      n.a.    82485    n.a.%
ERROR: cudaGetLastError() returned 209: no kernel image is available for execution on the device

D:\grb\math\mfaktc>
2021-08-12, 16:12  #3502
"James Heinrich"
May 2004
ex-Northern Ontario
3506_{10} Posts 

2021-08-12, 16:12  #3503
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
5,813 Posts 
Quote:
Use the reference info, such as
https://www.mersenneforum.org/showpo...18&postcount=1
https://www.mersenneforum.org/showpo...1&postcount=11
to better understand compatibility requirements. Good luck.
Code:
mfaktc v0.21 (64bit built)

Compiletime options
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  SievePrimesMin            5000
  SievePrimesMax            100000
  NumStreams                4
  CPUStreams                3
  GridSize                  3
  GPUSievePrimes            92000
  GPUSieveSize              2047Mi bits
  GPUSieveProcessSize       32Ki bits
  Checkpoints               enabled
  CheckpointDelay           600s
  WorkFileAddDelay          3600s
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 full
  V5UserID                  kriesel
  ComputerID                asrockgtx1650Super
  ProgressHeader            "Date    Time | class   Pct |   time     ETA | GHz-d/day    Sieve     Wait"
  ProgressFormat            "%d %T | %C %p%% | %t %e | %g %s %W%%"
  AllowSleep                yes
  TimeStampInResults        yes

CUDA version info
  binary compiled for CUDA  10.0
  CUDA runtime version      10.0
  CUDA driver version       11.0

CUDA device info
  name                      GeForce GTX 1650 SUPER
  compute capability        7.5
  max threads per block     1024
  max shared memory per MP  65536 byte
  number of multiprocessors 20
  clock rate (CUDA cores)   1740MHz
  memory clock rate:        6001MHz
  memory bus width:         128 bit

Automatic parameters
  threads per grid          655360
  random selftest offset    11535
  GPUSievePrimes (adjusted) 92726
  GPUsieve minimum exponent 1197042

running a simple selftest...
Selftest statistics
  number of tests           107
  successfull tests         107

selftest PASSED!

2021-08-12, 16:26  #3504
Mar 2014
2^{2}×13 Posts 
Thanks, James and kriesel.
I am up and running with the CUDA 10 version. GRB
2021-08-27, 21:54  #3505
"Seth"
Apr 2019
3^{2}×41 Posts 
Quote:
Quote:
I want to check whether <factor> mod (2 * k * exponent + 1) == 0, where <factor> doesn't fit in an int64. I break the factor up into its base-10 representation (which is what I have in the char*): digit_1 * 10^0 + digit_2 * 10^1 + digit_3 * 10^2 + digit_4 * 10^3 + ... I then sum each digit_n * (10^(n-1) mod (2*k*exponent+1)) to get a congruent sum. I only need to check a handful of divisions to remove 99% of composites. I tested with -st and -st2 and also verified that a bunch of previously found "factors" are no longer found. Let me know how I can help get this committed.
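The digit-by-digit reduction described above can be sketched roughly as follows. This is a Python illustration of the idea, not the actual C patch: `decimal_mod` is a hypothetical helper, and the body of `is_small_composite` is an assumption apart from its name and the k <= 10000 bound. Python integers never overflow, so the sketch ignores the 64-bit width concerns a C version has to handle.

```python
def decimal_mod(factor: str, d: int) -> int:
    """Return int(factor) % d using only the decimal digits of `factor`.

    Every intermediate value stays below 10 * d, so the full factor is
    never materialised as one big number.
    """
    total = 0
    power = 1 % d                      # 10^0 mod d
    for ch in reversed(factor):        # least significant digit first
        total = (total + int(ch) * power) % d
        power = (power * 10) % d       # next power of ten, reduced mod d
    return total


def is_small_composite(exponent: int, factor: str, k_limit: int = 10000) -> bool:
    """True if some d = 2*k*exponent + 1 with k <= k_limit properly divides factor."""
    for k in range(1, k_limit + 1):
        d = 2 * k * exponent + 1
        if str(d) == factor:           # d is the factor itself, not a proper divisor
            continue
        if decimal_mod(factor, d) == 0:
            return True
    return False


p = 960477823
product = str((2 * 1 * p + 1) * (2 * 3 * p + 1))          # built to be composite
print(is_small_composite(p, product))                     # True
print(decimal_mod("147602823780943516039", 2 * p))        # 1, since f = 2*k*p + 1
```

Reducing the running power of ten modulo d at every step is what keeps the arithmetic small; in C the product digit * power would additionally need to fit in the chosen integer type.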

2021-08-28, 03:53  #3506
"Seth"
Apr 2019
3^{2}×41 Posts 
I tested this over a wider range of assignments and realized a mistake. I mistakenly assumed k had to be odd.
And my patch needs this tiny change.

--- a/src/output.c
+++ b/src/output.c
@@ -403,8 +403,7 @@ int is_small_composite(uint64_t exponent, char *factor)
      * composites > (4 * 10^8 * exponent^2) can pass, but require much high bitlevels.
      */
     int len = strlen(factor);
-
-    for (uint64_t k = 1; k <= 10000; k += 2)
+    for (uint64_t k = 1; k <= 10000; k++)
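A small worked case shows why even k matter. M29 = 233 × 1103 × 2089, where 233 = 2·4·29 + 1 and 2089 = 2·36·29 + 1, so both of those factors have even k. The sketch below uses a made-up helper, `small_divisor_k`, and plain Python integers instead of the char* arithmetic in the patch, to compare the two loop strides on the composite 233 × 2089.

```python
def small_divisor_k(factor: int, exponent: int, k_limit: int, step: int):
    """Return the first k (1, 1+step, ...) for which 2*k*exponent + 1
    properly divides `factor`, or None if no such k <= k_limit exists.
    (Hypothetical helper for illustration only.)
    """
    for k in range(1, k_limit + 1, step):
        d = 2 * k * exponent + 1
        if d >= factor:                # no proper divisor can be this large
            break
        if factor % d == 0:
            return k
    return None


p = 29
composite = 233 * 2089                 # both primes divide M29, both have even k

print(small_divisor_k(composite, p, 10000, 2))   # None: odd k only misses it
print(small_divisor_k(composite, p, 10000, 1))   # 4:    233 = 2*4*29 + 1
```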
2021-09-18, 00:04  #3507
Aug 2002
5×1,663 Posts 

2021-10-22, 05:57  #3508
Bemusing Prompter
"Danny"
Dec 2002
California
2×17×71 Posts 
I found a paper on GPU modular exponentiation that I don't think has been mentioned here before: https://eprint.iacr.org/2007/187.pdf
However, the paper is 14 years old. Does it contain anything that may be useful for us, or is it only stuff we already know? 
2021-10-22, 11:31  #3509
Sep 2006
The Netherlands
5^{2}×31 Posts 
Quote:
In short, it can run as a completely trivial, embarrassingly parallel job on the GPU. Super simple.
