Posts by AnandBhat

1) Questions and Answers : Bugs : Short run times for some workunits (Message 660)
Posted 6 Dec 2022 by AnandBhat
Post:
As I run LODA with resource share 0, I only get the tasks needed to keep my CPUs busy (16 at a time for the 16 threads) and have no queued LODA tasks at any given time. However, if I get work from another project with an earlier deadline, BOINC suspends the LODA tasks while it works on the ones that expire earlier.

Until this is addressed, I will stop requesting work from other projects while I'm running LODA.
2) Questions and Answers : Bugs : Short run times for some workunits (Message 656)
Posted 6 Dec 2022 by AnandBhat
Post:
I noticed something that may be related.

1. When LODA tasks are in the "Waiting to run" or "Suspended" state in BOINC, Windows Task Manager shows the process as Suspended. Tasks for other projects do not appear to do this.

2. The memory for the LODA tasks in the Suspended state as seen in Task Manager drops over time.

3. When resumed, the LODA tasks "jump" in % completed, either directly to 100% or to a higher percent complete. It's almost as if the system thinks the task has been processing something while it was suspended.

Here's a snippet from when this LODA task fluctuated between Running and Waiting to run, with my indicators for when the switch happened. The progress completed %s appeared to increase proportionally to the time the task was not running (related to the standard 2 hour runtime?):
2022-12-06 12:40:55|INFO |Starting LODA v22.12.2. See https://loda-lang.org/
2022-12-06 12:40:55|INFO |Found environment variable: PROJECT_DIR=.\
2022-12-06 12:40:55|INFO |Loading init data from file: .\init_data.xml
2022-12-06 12:40:55|INFO |Platform: windows, system memory: 15734 MiB
2022-12-06 12:40:55|INFO |User name: AnandBhat, host ID: 139
2022-12-06 12:40:55|INFO |Using LODA home directory "C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\"
2022-12-06 12:40:55|INFO |Checking environment
2022-12-06 12:40:55|WARN |Setting environment variable: COMSPEC=C:\WINDOWS\system32\cmd.exe
2022-12-06 12:40:55|WARN |Setting environment variable: SYSTEMROOT=C:\WINDOWS
2022-12-06 12:40:55|WARN |Setting environment variable: PATH=C:\WINDOWS\system32;C:\WINDOWS\system32\WindowsPowerShell\v1.0;C:\Program Files\Git\cmd;C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\git\cmd;C:\Program Files\Git\usr\bin;C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\git\usr\bin
2022-12-06 12:40:55|WARN |Setting environment variable: TMP=C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\
2022-12-06 12:40:55|WARN |Setting environment variable: TEMP=C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\
2022-12-06 12:40:55|INFO |Loading sequences from the OEIS index
2022-12-06 12:40:59|INFO |Loaded 337318/358544 sequences in 4.45s
2022-12-06 12:41:06|INFO |Initialized 5 matchers (ignoring 123223 sequences)
2022-12-06 12:41:07|INFO |Loaded 500 patterns
2022-12-06 12:41:07|INFO |Initialized 1 generators (profile: pattern, overwrite: none)
2022-12-06 12:41:07|INFO |Mining programs in client mode (extended validation mode)
2022-12-06 12:42:18|INFO |Processed 46 programs, 1.2%  <=====Entered Waiting to Run, suspended for ~20 minutes
2022-12-06 13:01:14|INFO |Processed 147 programs, 16.9% <=====Entered Running, progress jumped by 15%
2022-12-06 13:03:21|INFO |Processed 414 programs, 18.7% <=====Entered Waiting to Run, suspended for ~40 minutes
2022-12-06 13:42:05|INFO |Processed 57 programs, 51.0% <=====Entered Running, progress jumped by 32%
2022-12-06 13:42:41|INFO |Processed 536 programs, 51.5%
2022-12-06 13:43:24|INFO |Processed 149 programs, 52.1%
2022-12-06 13:44:01|INFO |Processed 261 programs, 52.6%
2022-12-06 13:46:41|INFO |Processed 62 programs, 54.8%
2022-12-06 13:47:22|INFO |Processed 15 programs, 55.4%
2022-12-06 13:48:26|INFO |Processed 682 programs, 56.3%
2022-12-06 13:49:02|INFO |Processed 257 programs, 56.8% <=====Entered Waiting to Run, suspended for ~30 minutes
2022-12-06 14:19:14|INFO |Processed 133 programs, 81.9% <=====Entered Running, progress jumped by 25%
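
One way to test the "proportional to time not running" observation is to compare each reported percentage with the wall-clock time since the first log line, divided by the 2 hour runtime. The sketch below is only a check on this one log and assumes a 120-minute target; it says nothing definite about how LODA computes progress internally. The names TARGET_SECONDS, wall_clock_progress and compare are made up for this example.

```python
from datetime import datetime

# Assumption: each task targets a fixed 2-hour (120-minute) wall-clock budget.
TARGET_SECONDS = 120 * 60
FMT = "%Y-%m-%d %H:%M:%S"

# Timestamp of the "Starting LODA" line in the log above.
START = "2022-12-06 12:40:55"

# (timestamp, reported progress %) pairs copied from the log above.
SAMPLES = [
    ("2022-12-06 12:42:18", 1.2),
    ("2022-12-06 13:01:14", 16.9),
    ("2022-12-06 13:03:21", 18.7),
    ("2022-12-06 13:42:05", 51.0),
    ("2022-12-06 13:49:02", 56.8),
    ("2022-12-06 14:19:14", 81.9),
]


def wall_clock_progress(start, now, target_seconds=TARGET_SECONDS):
    """Percent of the target budget elapsed on the wall clock, running or not."""
    elapsed = datetime.strptime(now, FMT) - datetime.strptime(start, FMT)
    return 100.0 * elapsed.total_seconds() / target_seconds


def compare(start, samples):
    """Return (timestamp, reported %, wall-clock % rounded to 0.1) per sample."""
    return [(ts, reported, round(wall_clock_progress(start, ts), 1))
            for ts, reported in samples]


if __name__ == "__main__":
    for ts, reported, predicted in compare(START, SAMPLES):
        print(f"{ts}  reported={reported:5.1f}%  wall-clock={predicted:5.1f}%")
```

For these six samples the two columns agree to within 0.1 percentage points, which is consistent with progress being tied to elapsed wall-clock time rather than to time actually spent running.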
3) Questions and Answers : Bugs : Short run times for some workunits (Message 655)
Posted 6 Dec 2022 by AnandBhat
Post:
I saw this happen again today. I have LODA running with resource share 0 (i.e., only download work when my other projects do not have any work). I had 16 LODA tasks running when I received some LHC tasks with an earlier deadline. My system automatically paused the LODA tasks (status: "Waiting to run") and proceeded to run and complete the LHC tasks. After a couple of hours, when the LODA tasks that were at varying %s of progress got their chance to run, the tasks immediately "completed" and were reported as valid.

E.g.,
https://boinc.loda-lang.org/loda/workunit.php?wuid=2685909
https://boinc.loda-lang.org/loda/workunit.php?wuid=2685639

I have the "Leave non-GPU tasks in memory when suspended" option checked in BOINC.

I managed to freeze network communications and captured this output file for wu_1670076947_34367_0. It appears the task abruptly "completed" after resuming:
2022-12-06 10:39:13|INFO |Starting LODA v22.12.2. See https://loda-lang.org/
2022-12-06 10:39:13|INFO |Found environment variable: PROJECT_DIR=.\
2022-12-06 10:39:13|INFO |Loading init data from file: .\init_data.xml
2022-12-06 10:39:13|INFO |Platform: windows, system memory: 15734 MiB
2022-12-06 10:39:13|INFO |User name: AnandBhat, host ID: 139
2022-12-06 10:39:13|INFO |Using LODA home directory "C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\"
2022-12-06 10:39:13|INFO |Checking environment
2022-12-06 10:39:13|WARN |Setting environment variable: COMSPEC=C:\WINDOWS\system32\cmd.exe
2022-12-06 10:39:13|WARN |Setting environment variable: SYSTEMROOT=C:\WINDOWS
2022-12-06 10:39:13|WARN |Setting environment variable: PATH=C:\WINDOWS\system32;C:\WINDOWS\system32\WindowsPowerShell\v1.0;C:\Program Files\Git\cmd;C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\git\cmd;C:\Program Files\Git\usr\bin;C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\git\usr\bin
2022-12-06 10:39:13|WARN |Setting environment variable: TMP=C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\
2022-12-06 10:39:13|WARN |Setting environment variable: TEMP=C:\ProgramData\BOINC/projects/boinc.loda-lang.org_loda\
2022-12-06 10:39:13|INFO |Loading sequences from the OEIS index
2022-12-06 10:39:19|INFO |Loaded 337318/358544 sequences in 5.74s
2022-12-06 10:39:27|INFO |Initialized 5 matchers (ignoring 123210 sequences)
2022-12-06 10:39:27|INFO |Initialized 1 generators (profile: mutate3, overwrite: none)
2022-12-06 10:39:27|INFO |Mining programs in client mode (extended validation mode)
2022-12-06 10:39:49|INFO |Processed 49 programs, 0.5%
2022-12-06 10:40:48|INFO |Processed 2104 programs, 1.3%
2022-12-06 10:41:30|INFO |Processed 53 programs, 1.9%
2022-12-06 10:44:04|INFO |Processed 9 programs, 4.0%
2022-12-06 10:44:53|INFO |Processed 159 programs, 4.7%
2022-12-06 10:45:46|INFO |Processed 469 programs, 5.4%
2022-12-06 10:46:22|INFO |Processed 229 programs, 5.9%
2022-12-06 10:46:58|INFO |Processed 2035 programs, 6.4%
2022-12-06 10:47:11|INFO |Fetched http://api.loda-lang.org/miner/v1/oeis/b245433.txt.gz
2022-12-06 10:47:34|INFO |Processed 3391 programs, 6.9%
2022-12-06 10:48:10|INFO |Processed 229783 programs, 7.4%
2022-12-06 10:49:04|INFO |Processed 308607 programs, 8.2%
2022-12-06 10:50:30|INFO |Processed 893 programs, 9.4%
2022-12-06 10:51:06|INFO |Processed 439748 programs, 9.9%
2022-12-06 10:52:49|INFO |Processed 335372 programs, 11.3%
2022-12-06 12:41:00|INFO |Finished mining after 121 minutes
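
The suspension window can be read straight from the log as a long silence between consecutive timestamps. This is a minimal sketch of such a gap finder; the 10-minute threshold and the names find_gaps and LOG_TAIL are arbitrary choices for illustration, not part of LODA or BOINC.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (previous, next, gap) for consecutive log lines further apart than threshold."""
    stamps = []
    for line in lines:
        head = line.split("|", 1)[0].strip()
        try:
            stamps.append(datetime.strptime(head, FMT))
        except ValueError:
            continue  # skip lines that do not start with a timestamp
    return [(a, b, b - a) for a, b in zip(stamps, stamps[1:]) if b - a > threshold]


# Last three lines of the output file above.
LOG_TAIL = [
    "2022-12-06 10:51:06|INFO |Processed 439748 programs, 9.9%",
    "2022-12-06 10:52:49|INFO |Processed 335372 programs, 11.3%",
    "2022-12-06 12:41:00|INFO |Finished mining after 121 minutes",
]

if __name__ == "__main__":
    for prev, nxt, gap in find_gaps(LOG_TAIL):
        print(f"silent from {prev} to {nxt} ({gap})")
```

On this log the only gap is the 1 hour 48 minutes between the 11.3% line and the "Finished mining" line, so most of the reported 121 minutes fell inside the period in which nothing was logged.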
4) Questions and Answers : Bugs : Tasks failed with error code -131 (file size too big) (Message 654)
Posted 5 Dec 2022 by AnandBhat
Post:
I have a few tasks that failed with the error code -131 (file size too big)

https://boinc.loda-lang.org/loda/result.php?resultid=3369836
https://boinc.loda-lang.org/loda/result.php?resultid=3369645
https://boinc.loda-lang.org/loda/result.php?resultid=3369594
https://boinc.loda-lang.org/loda/result.php?resultid=3369338
https://boinc.loda-lang.org/loda/result.php?resultid=3369499

None of these have been reassigned yet.

The error in the task log is of the form:
<message>
upload failure: <file_xfer_error>
  <file_name>wu_1670076947_13832_0_r1089728081_0</file_name>
  <error_code>-131 (file size too big)</error_code>
</file_xfer_error>
</message>
I see references to <max_nbytes>102400.000000</max_nbytes> in client_state.xml for LODA tasks. However, I was unable to check how big the result files were, as they appear to have been deleted when the status was communicated to the server.
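
For reference, 102400 bytes is 100 KiB. The sketch below shows how the limit could be read and compared against a result file's size before the client cleans it up. Only the <max_nbytes> value and the file name come from this post; the surrounding <file> element layout is an assumption, and the exact comparison the BOINC client performs is not confirmed here. The function names upload_limit and exceeds_limit are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the element layout is assumed, the values are from the post.
FRAGMENT = """
<file>
  <name>wu_1670076947_13832_0_r1089728081_0</name>
  <max_nbytes>102400.000000</max_nbytes>
</file>
"""


def upload_limit(fragment):
    """Return (file name, size limit in whole bytes) from one file entry."""
    node = ET.fromstring(fragment)
    return node.findtext("name"), int(float(node.findtext("max_nbytes")))


def exceeds_limit(size_bytes, limit_bytes):
    """True if a file of size_bytes is larger than the limit (assumed strict comparison)."""
    return size_bytes > limit_bytes


if __name__ == "__main__":
    name, limit = upload_limit(FRAGMENT)
    print(f"{name}: limit {limit} bytes ({limit / 1024:.0f} KiB)")
```

On a live machine, os.path.getsize() on the output file in the BOINC slot or project directory would supply size_bytes, provided the file is inspected before the task is reported.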
5) Questions and Answers : Bugs : Short run times for some workunits (Message 652)
Posted 1 Dec 2022 by AnandBhat
Post:
A few of my workunits have had abnormally short run times. I thought the run time for all workunits was set to 2 hours, so I'm reporting this here in case it warrants a closer look.

https://boinc.loda-lang.org/loda/workunit.php?wuid=2599141 - 297s
https://boinc.loda-lang.org/loda/workunit.php?wuid=2599156 - 270s
https://boinc.loda-lang.org/loda/workunit.php?wuid=2599261 - 153s
https://boinc.loda-lang.org/loda/workunit.php?wuid=2599315 - 53s
etc.

I'm not sure if these were marked completed due to an out-of-memory condition or some other system/network error, as they appear to have completed at the same time. However, they've all passed validation and I've received credit for them. The task outputs do not show any errors and the LODA logs page is blank.
6) Questions and Answers : Getting started : Wingman task assignments to same computer (Message 154)
Posted 16 May 2022 by AnandBhat
Post:
If the nature of work done at LODA makes replication redundant, I would suggest setting initial replication and minimum quorum to 1 instead. Or at the very least, use Adaptive Replication.
7) Questions and Answers : Getting started : Work Unit Logs available (Message 104)
Posted 14 May 2022 by AnandBhat
Post:
Thanks! This is very useful
8) Questions and Answers : Getting started : Wingman task assignments to same computer (Message 101)
Posted 14 May 2022 by AnandBhat
Post:
I have several workunits where both initial replication tasks have been sent to the same computer.

Examples:
Workunit 18294 - Tasks 41024 and 41025 have both been assigned to computer 139.
Workunit 18357 - Tasks 41150 and 41151 have both been assigned to computer 139.

I'm assuming this is an assignment error, as it'd be kinda pointless to have a quorum of 2 from the same computer.
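
The condition being reported can be stated as a small check: group tasks by workunit and flag any workunit where one computer holds more than one replica. This is an illustrative sketch over the two examples above, not how the BOINC scheduler is implemented; same_host_workunits and ASSIGNMENTS are names invented for the example.

```python
from collections import defaultdict

# (workunit id, task id, computer id) taken from the examples above.
ASSIGNMENTS = [
    (18294, 41024, 139),
    (18294, 41025, 139),
    (18357, 41150, 139),
    (18357, 41151, 139),
]


def same_host_workunits(assignments):
    """Map workunit -> {computer: [tasks]} for computers holding 2+ replicas of it."""
    by_wu = defaultdict(lambda: defaultdict(list))
    for wu, task, host in assignments:
        by_wu[wu][host].append(task)
    flagged = {}
    for wu, by_host in by_wu.items():
        dupes = {host: tasks for host, tasks in by_host.items() if len(tasks) > 1}
        if dupes:
            flagged[wu] = dupes
    return flagged


if __name__ == "__main__":
    for wu, dupes in same_host_workunits(ASSIGNMENTS).items():
        for host, tasks in dupes.items():
            print(f"workunit {wu}: tasks {tasks} all on computer {host}")
```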
9) Questions and Answers : Getting started : New version available for better problem analysis (Message 100)
Posted 14 May 2022 by AnandBhat
Post:
Thanks. I'll remove it from that machine until (if) a workaround is found.

I have got it working on another machine where I've installed Git as an admin under Program Files.
10) Questions and Answers : Getting started : New version available for better problem analysis (Message 96)
Posted 14 May 2022 by AnandBhat
Post:
Christian Krause wrote:
Hi All,
since some of the boinc clients ran into errors, I uploaded a new version (220513) which provides better logs. If your workers fail immediately after the start, please respond to this thread. Then I can dig into the logs and try to analyze the problem.

Cheers,
Christian
I continue to see failures with LODA 2205.14. See Task 41129

I have Git installed in my user folder. I left the location at the default during installation, which was C:\Users\anandbha\AppData\Local\Programs\Git (it probably chose this because I'm not the administrator and cannot install under Program Files anyway). I can run git from the command prompt without having to navigate to this path, so it's definitely in the environment path.

C:\Users\anandbha>git --version
git version 2.36.1.windows.1



