Are there checkpoints?

Questions and Answers : Bugs : Are there checkpoints?

Vester
Joined: 13 May 22
Posts: 19
Credit: 146,233
RAC: 0
Message 126 - Posted: 14 May 2022, 23:00:12 UTC

I have restarted my computer several times, and all tasks begin at 00.000% at startup instead of resuming from a checkpoint. Some tasks also fail at restart.

Windows 11 Pro (x64)
BOINC 7.16.20
LODA 2205.14
Intel i9-10850
ID: 126 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 133 - Posted: 15 May 2022, 6:46:06 UTC - in response to Message 126.  

The mining is stateless. Tasks don't need to finish to produce results. All findings are sent immediately to the server. So it is no problem if you restart in between.
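
To illustrate the "stateless" design described here, a minimal Python sketch follows. It is not LODA's actual miner: the function `mine`, the `submit` callback, and the "finding" criterion are hypothetical stand-ins. It only shows why an interrupted task loses none of its findings when each one is reported the moment it is found.

```python
import random

def mine(seed, steps, submit):
    """Toy stateless miner: report each finding as soon as it is found."""
    rng = random.Random(seed)          # fixed seed only to make the demo repeatable
    for _ in range(steps):
        candidate = rng.randrange(10_000)
        if candidate % 7 == 0:         # hypothetical "finding" criterion
            submit(candidate)          # sent immediately, nothing is buffered

full, partial = [], []
mine(seed=1, steps=1000, submit=full.append)     # task runs to completion
mine(seed=1, steps=300, submit=partial.append)   # same task, interrupted early
```

The findings of the interrupted run are a prefix of those of the full run, so everything already submitted is kept. What a restart costs is the time needed to get back to the point of interruption.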
ID: 133 · Rating: 0
Catchercradle
Joined: 13 May 22
Posts: 28
Credit: 16,466
RAC: 0
Message 139 - Posted: 15 May 2022, 7:32:44 UTC - in response to Message 133.  

Quote: "The mining is stateless. Tasks don't need to finish to produce results. All findings are sent immediately to the server. So it is no problem if you restart in between."
Except that if you exit BOINC, shutting down the client, the task then starts again from the beginning. It might not be a problem for the project, but it does mean that for the tasks I had running yesterday evening, over twelve hours total of computation time was wasted. Adding checkpoints would allow tasks to be resumed from the point they had reached before the computer was shut down for the night.
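
As a rough illustration of the request, here is a minimal Python sketch of checkpoint and resume. It assumes a task whose progress is a single step counter; the file name, the JSON layout, and the helper names are hypothetical and not taken from LODA or BOINC.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)              # swap the new file in as a single step

def load_checkpoint(path):
    """Return the saved state, or a fresh one if no usable checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"step": 0}

def run_task(path, total_steps, every=10, stop_after=None):
    """Run (or resume) a task; `stop_after` simulates the machine shutting down."""
    state = load_checkpoint(path)
    executed = 0
    while state["step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            break                      # simulated shutdown: unsaved steps are lost
        state["step"] += 1             # stand-in for one unit of real work
        executed += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state["step"], executed

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
run_task(path, 100, stop_after=25)         # shut down at step 25; last save was at step 20
resumed_to, redone = run_task(path, 100)   # resumes from step 20 instead of 0
```

In this sketch only the five steps since the last save have to be repeated after the restart, rather than the whole task.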
ID: 139 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 145 - Posted: 15 May 2022, 15:03:41 UTC - in response to Message 139.  

Yes, you're right, it's a valid feature request. For now, you can use the BOINC feature "No new tasks" when you plan to shut down your machine. Regarding the checkpoints: we can implement this, but I don't know how to do it in a secure way. Hackers could manipulate checkpoints to do a "fast-forward": this could let miners jump from 0% to 100% without having done any work. I don't know how to prevent this.

PS: the tasks should take only about 4h.
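
One common way to at least detect edited checkpoint files is to attach a keyed hash. The Python sketch below assumes a checkpoint that is a small JSON payload; the field name `fraction_done`, the helper names, and the key are hypothetical, and nothing here reflects how LODA handles this. Note the caveat: a key shipped inside a client program can be extracted by a determined user, so this only catches casual edits and does not by itself solve the problem raised in this post.

```python
import hashlib
import hmac
import json

def seal(state, key):
    """Attach an HMAC-SHA256 tag to a checkpoint payload."""
    payload = json.dumps(state, sort_keys=True)
    tag = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "tag": tag}

def unseal(sealed, key):
    """Return the state if the tag matches, else None (treat as 'no checkpoint')."""
    expected = hmac.new(key, sealed["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, sealed["tag"]):
        return None
    return json.loads(sealed["payload"])

key = b"demo-key-not-secret"           # hypothetical key, for illustration only
good = seal({"fraction_done": 0.158}, key)
# A "fast-forward" attempt: the payload is edited but the old tag is kept.
forged = dict(good, payload=json.dumps({"fraction_done": 1.0}))
```

Here `unseal(good, key)` returns the saved state, while `unseal(forged, key)` returns `None`, so the task would restart from the beginning instead of accepting the edited progress.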
ID: 145 · Rating: 0
crashtech
Joined: 17 May 22
Posts: 8
Credit: 1,773,690
RAC: 1
Message 232 - Posted: 21 May 2022, 0:44:04 UTC
Last modified: 21 May 2022, 0:44:54 UTC

In my experience, lack of checkpointing can deter users from participating. I also tend to think that implementing checkpoints shows the best regard for volunteers' time and resources. These are just my opinions. I will participate for a while, but as a cruncher who has several computers, I get discouraged when, say, a power outage causes several hundred hours of computing to be lost. This has happened to me in the past. It may seem trivial to some, but it is more important to others.

Most mature projects have checkpointing. I do not know how they implement it in a tamper-proof way. If it can't be implemented in LODA, then perhaps shorter task durations would mitigate the loss to volunteers caused by unscheduled reboots.
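
The trade-off between task length and checkpointing can be put in numbers with a small Python sketch. It assumes an interruption hits at a uniformly random moment during a task, so on average half of the unsaved interval is lost. The 4 h figure is the task length mentioned earlier in the thread; the 1 h task and the 15 minute checkpoint interval are hypothetical values chosen for illustration.

```python
def expected_loss_hours(task_hours, checkpoint_hours=None):
    """Average work lost per running task for one interruption at a uniformly random time."""
    # Without checkpoints the whole task is the unsaved interval.
    interval = task_hours if checkpoint_hours is None else min(checkpoint_hours, task_hours)
    return interval / 2

no_checkpoints = expected_loss_hours(4.0)           # 4 h task, no checkpoints
shorter_tasks = expected_loss_hours(1.0)            # hypothetical 1 h task
with_checkpoints = expected_loss_hours(4.0, 0.25)   # hypothetical 15 min interval
```

Under these assumptions the average loss per running task drops from 2 h to 0.5 h with shorter tasks, and to 0.125 h with checkpoints.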
ID: 232 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 249 - Posted: 22 May 2022, 10:13:29 UTC - in response to Message 232.  

As announced in the last News item, we plan to add checkpoints to LODA. See also this issue on Github:
https://github.com/loda-lang/loda-cpp/issues/143
ID: 249 · Rating: 0
Catchercradle
Joined: 13 May 22
Posts: 28
Credit: 16,466
RAC: 0
Message 250 - Posted: 22 May 2022, 10:35:34 UTC - in response to Message 249.  

Quote: "As announced in the last News item, we plan to add checkpoints to LODA. See also this issue on Github:
https://github.com/loda-lang/loda-cpp/issues/143"

I checked the link; it says, "No description provided."
ID: 250 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 253 - Posted: 22 May 2022, 13:55:57 UTC - in response to Message 250.  

There is no description in the ticket, but the goal is to implement checkpoints. We plan to add it in the next app version.
ID: 253 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 272 - Posted: 23 May 2022, 19:58:00 UTC - in response to Message 253.  

The new app version (220523) supports checkpointing.
ID: 272 · Rating: 0
fzs600
Joined: 13 May 22
Posts: 9
Credit: 10,580,204
RAC: 3,043
Message 279 - Posted: 24 May 2022, 6:25:29 UTC - in response to Message 272.  

Quote: "The new app version (220523) supports checkpointing."

thank you
ID: 279 · Rating: 0
Aurum
Joined: 23 May 22
Posts: 11
Credit: 462,342
RAC: 0
Message 282 - Posted: 24 May 2022, 14:27:05 UTC

So far BoincTasks has yet to indicate a checkpoint has been made on 2205.23 WUs.
Sometimes the checkpointing does not report properly but it is actually checkpointing.
I've requested Suspend on Checkpoint but have yet to see a WU suspend.
How frequent are the checkpoints?
ID: 282 · Rating: 0
[AF>Le_Pommier] Jerome_C2005
Joined: 13 May 22
Posts: 18
Credit: 1,192,438
RAC: 1,253
Message 283 - Posted: 24 May 2022, 16:19:40 UTC

Depending on the project, you can try to open the folder containing the files in the slot directory of your task and you may find some txt (or xml) files including more information on the on-going checkpointing.

But I don't know how it is managed for LODA (and don't have access to a computer with LODA running on it at the moment).
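
For anyone who wants to try this, here is a minimal Python sketch that lists recently modified files in the slot directories. It assumes the usual BOINC layout with a `slots/` folder inside the data directory. The data directory path must be supplied by the reader because it differs between installations, and the function name and the one-hour window are hypothetical. Whether LODA writes a recognizable checkpoint file there is not known from this thread.

```python
import time
from pathlib import Path

def recent_slot_files(data_dir, max_age_s=3600):
    """List files under <data_dir>/slots/* modified within the last `max_age_s` seconds."""
    cutoff = time.time() - max_age_s
    hits = []
    for f in sorted(Path(data_dir, "slots").glob("*/*")):
        if f.is_file() and f.stat().st_mtime >= cutoff:
            # (slot number, file name, size in bytes)
            hits.append((f.parent.name, f.name, f.stat().st_size))
    return hits
```

Calling `recent_slot_files(...)` with the path of the BOINC data directory while a task is running would show which files the application keeps touching. A file whose modification time advances periodically is a likely checkpoint candidate.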
ID: 283 · Rating: 0
Aurum
Joined: 23 May 22
Posts: 11
Credit: 462,342
RAC: 0
Message 284 - Posted: 24 May 2022, 17:38:16 UTC - in response to Message 283.  

Quote: "Depending on the project, you can try to open the folder containing the files in the slot directory of your task and you may find some txt (or xml) files including more information on the on-going checkpointing."

I just clicked through all my slot folders and I didn't find anything for LODA. I also looked through the LODA project folder.
ID: 284 · Rating: 0
Werinbert
Joined: 14 May 22
Posts: 7
Credit: 100,055
RAC: 0
Message 285 - Posted: 24 May 2022, 20:41:53 UTC

Checkpointing may have been implemented but tasks are failing if they have been suspended and then restarted.
ID: 285 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 292 - Posted: 25 May 2022, 6:54:12 UTC - in response to Message 285.  

We had an unplanned downtime of our API server. It is now up and running again. Sorry for the inconvenience.
ID: 292 · Rating: 0
Aurum
Joined: 23 May 22
Posts: 11
Credit: 462,342
RAC: 0
Message 295 - Posted: 25 May 2022, 12:34:52 UTC

Still have not seen a checkpoint after restart stabilization. Elapsed Time = Time Since Last Checkpoint.
ID: 295 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 301 - Posted: 25 May 2022, 19:39:55 UTC - in response to Message 295.  

When you restart the BOINC Manager, do the tasks resume from 0% or from where you left off? They should continue from the previous percentage.
ID: 301 · Rating: 0
Aurum
Joined: 23 May 22
Posts: 11
Credit: 462,342
RAC: 0
Message 310 - Posted: 26 May 2022, 13:48:55 UTC - in response to Message 301.  

Quote: "When you restart the BOINC Manager, do the tasks resume from 0% or from where you left off? They should continue from the previous percentage."

I suspended 3 WUs: 00:38:47 15.8%, 00:43:48 11.1%, and 00:01:53 0.4%.
Then rebooted computer and unsuspended those 3 WUs. They restarted with elapsed time at zero and the same percent complete they suspended with: 00:00:00 15.8%, 00:00:00 11.1%, and 00:00:00 0.4%.
Does being "stateless" mean you don't have to write a snapshot of some break-point to storage?
ID: 310 · Rating: 0
Christian Krause
Project administrator
Joined: 9 May 22
Posts: 250
Credit: 449,267
RAC: 198
Message 312 - Posted: 27 May 2022, 13:49:14 UTC - in response to Message 310.  

We do write checkpoints, and they are working on your machine too: the computation of the work units is resumed. The elapsed time is computed by BOINC directly (not by our app).
ID: 312 · Rating: 0
frankhagen
Joined: 14 May 22
Posts: 6
Credit: 220
RAC: 0
Message 315 - Posted: 27 May 2022, 15:03:42 UTC - in response to Message 312.  

Quote: "We write checkpoints and it is working also on your machine. The computation of the work units is resumed. The elapsed time is computed by BOINC directly (not our app)."


no!

it simply is that silly wrapper which is unable to check this.

create a native boinc app and you will be fine.
ID: 315 · Rating: 0

©2024 LODA Language