Server

Launch Methods

  • MS Windows script

    start\AFANASY\_afserver.cmd

  • UNIX script

    start/AFANASY/_afserver.sh

  • Linux daemon when Linux packages are installed

    sudo systemctl start afserver

  • Setup CGRU environment and launch a command:

    cd cgru
    source ./setup.sh
    afserver
    

System Job

The system job is designed to execute system tasks on the render farm, such as post commands. When the server needs to execute a command, it appends a task to the system job.

Note

Your farm should be configured to have system services, so that job post commands (for example, removing rendered scenes) can be executed.

You can explore the system job in the Watch GUI in super user mode and change its parameters to control how it runs.

../_images/sysjob_job.png

System Job

../_images/sysjob_tasks.png

System Job Tasks

If an error system task can’t be restarted (the number of error retries has reached the maximum value), it will be deleted. This is needed to prevent the number of system tasks from growing.

You can watch the system job log and its task logs. When an error occurs, the command output is appended to the log.

To reset the system commands queue, you can restart a block or a task.

Configuration

  • "af_sysjob_tasklife": 1800

    Maximum system task age in seconds. When the age of a task reaches this number, it is treated as an error task. This is needed to prevent the number of tasks from growing if some tasks can’t be executed (restarted).

  • "af_sysjob_tasksmax": 1000

    Maximum number of running or ready tasks. When the number of tasks reaches this number, no new tasks are created. Commands are not lost: they are stored in a special list and wait until some tasks are done. This is needed to prevent the number of tasks from growing if the system job stops running for some time (for example, when all hosts are in black lists). Tasks need more memory and CPU time than a simple list of commands.

  • "af_sysjob_postcmd_service": "postcmd"

    Service type for Post Commands system block.

  • "af_sysjob_events_service":"events"

    Service type for Events system block.

  • "af_sysjob_wol_service": "wakeonlan"

    Service type for Wake-On-LAN system block.

  • "af_render_cmd_wolsleep": "wolsleep"

    Sleep command performed by a render client.

  • "af_render_cmd_wolwake": "wolwake"

    Wake command constructed by the server and performed on an online client by the system job.
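
How af_sysjob_tasksmax and af_sysjob_tasklife interact can be illustrated with a small Python model. This is only a sketch of the behaviour described above, not Afanasy server code: the class SystemQueue and all of its method names are hypothetical.

```python
from collections import deque


class SystemQueue:
    """Toy model of a system job block (hypothetical, not Afanasy code)."""

    def __init__(self, tasks_max=1000, task_life=1800):
        self.tasks_max = tasks_max  # mirrors "af_sysjob_tasksmax"
        self.task_life = task_life  # mirrors "af_sysjob_tasklife", in seconds
        self.tasks = []             # (command, creation_time) pairs
        self.waiting = deque()      # commands stored until a task slot is free

    def add_command(self, command, now):
        # At the limit no new task is created, but the command is not lost.
        if len(self.tasks) >= self.tasks_max:
            self.waiting.append(command)
        else:
            self.tasks.append((command, now))

    def task_done(self, command, now):
        # Remove the first matching task, then promote one waiting command.
        for index, (cmd, _created) in enumerate(self.tasks):
            if cmd == command:
                del self.tasks[index]
                break
        if self.waiting and len(self.tasks) < self.tasks_max:
            self.tasks.append((self.waiting.popleft(), now))

    def expired(self, now):
        # Tasks whose age has reached task_life are treated as error tasks.
        return [cmd for cmd, created in self.tasks
                if now - created >= self.task_life]
```

With tasks_max=2, a third command goes to the waiting list and becomes a task only after one of the first two is done.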

Post Commands

Post commands are executed on job deletion. They are designed to clean up temporary files that are not needed without the job. In the most common case, this is a temporary scene file to render.

Most submission scripts copy (save) the current scene to a temporary file. This way the artist can continue to modify and save the currently opened scene during the render. The scene is rendered in the state in which it was submitted.

Post commands are executed by renders via the post_commands block of the server system job.
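
The submission pattern described above can be sketched in Python. This is an illustration only: the function make_render_copy, the timestamp-based file name, and the rm cleanup command are assumptions, not what any particular CGRU submission script does, and attaching the command to a job is left out.

```python
import os
import shlex
import shutil
import tempfile
import time


def make_render_copy(scene_path, tmp_dir=None):
    """Copy the scene to a temporary file; return (copy_path, post_command)."""
    tmp_dir = tmp_dir or tempfile.gettempdir()
    name, ext = os.path.splitext(os.path.basename(scene_path))
    # A timestamp keeps copies from different submissions apart.
    copy_path = os.path.join(
        tmp_dir, "%s.%d%s" % (name, int(time.time()), ext))
    shutil.copy2(scene_path, copy_path)
    # Hypothetical cleanup command to register as the job post command.
    post_command = "rm " + shlex.quote(copy_path)
    return copy_path, post_command
```

The job would render copy_path, and post_command would remove the copy once the job is deleted.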

Wake-On-LAN

You can set up Afanasy to wake machines using Wake-On-LAN.

Wake-On-LAN workflow:

  • The server sends a message to a client asking it to sleep.
  • The client receives the sleep message from the server.
  • The client executes the wolsleep command, which can be customized in the Afanasy configuration.
  • The client falls asleep.
  • The server stops receiving updates from the client and marks it offline.
  • The server “decides” to wake the render up.
  • The server adds a task wolwake mac1 .. macN to the wake-on-lan block of the system job. The command can be customized in the Afanasy configuration.
  • Another online and ready render executes the task.
  • The task sends a magic packet to a broadcast address for each MAC address of the sleeping render. It is a small Python script provided with CGRU.
  • The render wakes up.

You can put renders to sleep and wake them using the afwatch GUI or the afcmd command.
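
For reference, a magic packet is six 0xFF bytes followed by the target MAC address repeated 16 times, usually sent over UDP broadcast. The sketch below shows the idea in Python. It is not the script shipped with CGRU: the function names, the default broadcast address, and UDP port 9 are assumptions made for the example.

```python
import socket


def build_magic_packet(mac):
    """Return the 102-byte Wake-on-LAN magic packet for one MAC address."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError("invalid MAC address: %r" % mac)
    mac_bytes = bytes.fromhex(digits)
    # Six 0xFF bytes followed by the MAC address repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def wake(macs, broadcast="255.255.255.255", port=9):
    """Send one magic packet per MAC address to the broadcast address."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcasting must be enabled explicitly on a UDP socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        for mac in macs:
            sock.sendto(build_magic_packet(mac), (broadcast, port))
```

A render with several network interfaces has several MAC addresses, which is why the task receives a list (mac1 .. macN) and sends one packet for each.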

Events

Events are generated by the server. When an event happens, job and user data are passed to the event service as a command in JSON format. If the event is emitted by a render, the render and all its parent pools are written too. The event service Python class reads its command (the JSON data) and can generate any command to execute. So an event task receives data as a command, processes this data, and can construct a real command to execute as the task process.

JOB_DONE

A job is done.

JOB_ERROR

A job task produced an error.

JOB_DELETED

Job has been deleted.

RENDER_ZOMBIE

The render has stopped sending updates to the server for zombie_time seconds.

RENDER_SICK

The render produced sick_errors_count errors in a row from different users and got the SICK state.

RENDER_NO_TASK

The render has had no task for no_task_event_time seconds.

RENDER_OVERLOAD

The render is low on free memory, disk space, or swap. What counts as low can be configured with these JSON config parameters:

  • af_render_overflow_mem - percentage of free memory.
  • af_render_overflow_swap - percentage of free swap.
  • af_render_overflow_hdd - percentage of free disk space.

By default these parameters are equal to -1, which means that the resource check is disabled. In practice, a good free percentage at which to emit the event is 1, as an overloaded machine never reaches zero free memory or disk space.

After an event is emitted, the next one will be emitted only after overload_event_time seconds.
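
The threshold rule can be written down as a short Python sketch. It only illustrates the description above and is not server code: the function names and the dictionary keys are hypothetical, and treating a free percentage equal to the threshold as low is an assumption.

```python
def is_low(free_percent, threshold):
    """True if a resource counts as low; a negative threshold disables it."""
    if threshold < 0:
        return False
    # Assumption: a value equal to the threshold already counts as low.
    return free_percent <= threshold


def is_overloaded(free, limits):
    """True if any checked resource is low.

    free:   free percentage per resource, e.g. {"mem": 40, "hdd": 25}
    limits: threshold per resource, -1 (the default) disables the check
    """
    return any(is_low(value, limits.get(name, -1))
               for name, value in free.items())
```

For example, with limits {"mem": 1, "swap": -1, "hdd": 1} a render with no free swap at all is not reported, because the swap check is disabled.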

There is a default Python service class: cgru/afanasy/python/services/events.py. It is designed to send emails.

Example of custom data to send emails:

{
    "emails":["some@email.com"],
    "events":
    {
        "JOB_ERROR":{"methods":["email"]},
        "JOB_DONE":{"methods":["email"]}
    }
}

User and job custom data objects are simply merged. So the user can hold the email information and the job the events. If a user has both emails and events in custom data, all of that user's jobs will send emails.
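
A minimal Python sketch of such a merge follows. It is an illustration, not the code of events.py: the function names are hypothetical, and the assumption that job keys override user keys on conflict is not taken from the source.

```python
import json


def merge_custom_data(user_data, job_data):
    """Shallow merge of user and job custom data objects."""
    merged = dict(user_data)
    merged.update(job_data)  # assumption: job keys win on conflict
    return merged


def email_recipients(merged, event):
    """Return the addresses to notify for an event, or an empty list."""
    methods = merged.get("events", {}).get(event, {}).get("methods", [])
    if "email" in methods:
        return merged.get("emails", [])
    return []


# The user carries the address, the job carries the events.
user = json.loads('{"emails": ["some@email.com"]}')
job = json.loads('{"events": {"JOB_ERROR": {"methods": ["email"]}}}')
merged = merge_custom_data(user, job)
```

Here email_recipients(merged, "JOB_ERROR") yields the user's address, while an event that the job does not list yields an empty list.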

You can write any custom Python service class, for example: cgru/afanasy/python/services/events_local.py

And set it as System job events block service name in your configuration file: "af_sysjob_events_service":"events_local"

Statistics

Afanasy server can store job and task statistics in an SQL database. It uses the PostgreSQL engine. On job deletion and task finish (with any result), the server inserts some job and task data into database tables.

Database Schema

afanasy=# \d jobs;
                           Table "public.jobs"
     Column     |          Type          | Collation | Nullable | Default
----------------+------------------------+-----------+----------+---------
 annotation     | character varying(512) |           |          |
 blockname      | character varying(512) |           |          |
 capacity       | integer                |           |          | 0
 description    | character varying(512) |           |          |
 folder         | character varying(512) |           |          |
 jobname        | character varying(512) |           |          |
 hostname       | character varying(512) |           |          |
 service        | character varying(512) |           |          |
 tasks_done     | integer                |           |          | 0
 tasks_quantity | integer                |           |          | 0
 run_time_sum   | bigint                 |           |          | 0
 time_done      | bigint                 |           |          | 0
 time_started   | bigint                 |           |          | 0
 username       | character varying(512) |           |          |
 serial         | bigint                 |           |          | 0
 id_block       | integer                |           |          | 0

afanasy=# \d tasks;
                           Table "public.tasks"
    Column     |          Type           | Collation | Nullable | Default
---------------+-------------------------+-----------+----------+---------
 annotation    | character varying(512)  |           |          |
 blockname     | character varying(512)  |           |          |
 capacity      | integer                 |           |          | 0
 command       | character varying(4096) |           |          |
 description   | character varying(512)  |           |          |
 error         | integer                 |           |          | 0
 errors_count  | integer                 |           |          | 0
 folder        | character varying(512)  |           |          |
 frame_pertask | bigint                  |           |          | 0
 hostname      | character varying(512)  |           |          |
 jobname       | character varying(512)  |           |          |
 resources     | character varying(4096) |           |          |
 service       | character varying(512)  |           |          |
 starts_count  | integer                 |           |          | 0
 time_done     | bigint                  |           |          | 0
 time_started  | bigint                  |           |          | 0
 username      | character varying(512)  |           |          |
 serial        | bigint                  |           |          | 0
 id_block      | integer                 |           |          | 0
 id_task       | integer                 |           |          | 0
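
As an illustration of what can be computed from the tasks table, the sketch below aggregates task count, average run time, and errors per service. It rests on assumptions: time_started and time_done are taken to be timestamps in seconds, an in-memory sqlite3 table with a subset of the columns stands in for PostgreSQL so that the example runs without a server, and the sample rows are made up.

```python
import sqlite3

# Plain SQL that uses only columns from the tasks table shown above.
QUERY = """
SELECT service,
       COUNT(*) AS tasks,
       AVG(time_done - time_started) AS avg_seconds,
       SUM(errors_count) AS errors
FROM tasks
GROUP BY service
ORDER BY service
"""


def service_summary(conn):
    """Return (service, tasks, avg_seconds, errors) rows."""
    return conn.execute(QUERY).fetchall()


# Stand-in database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (service TEXT, errors_count INTEGER,"
    " time_started INTEGER, time_done INTEGER)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [("blender", 0, 100, 160), ("blender", 1, 200, 300), ("nuke", 0, 50, 80)])
rows = service_summary(conn)
```

The same query text can be run in psql against the afanasy database.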

Database Setup

  • Edit the PostgreSQL client authentication configuration file pg_hba.conf.

    Its location depends on the Linux distribution. For example:

    Debian, Ubuntu: /etc/postgresql/[version]/main/pg_hba.conf

    CentOS, Fedora, openSUSE: /var/lib/pgsql/data/pg_hba.conf

    make install: /usr/local/pgsql/data/pg_hba.conf

    Add this line:

    local afanasy afadmin password

    Read the comments in this file to learn what it means. (If you have problems with authentication, try trust for all methods.)

  • Restart database

  • Create afanasy database and user

    sudo su - postgres
    createdb afanasy
    psql afanasy
    CREATE USER afadmin PASSWORD 'AfPassword';
    

Create Tables

  • Go into CGRU root folder: cd /opt/cgru
  • Source setup: source ./setup.sh
  • Check database connection: afcmd db_check
  • The program prints an error, or “Database connection is working” if everything is ok.
  • Create required tables: afcmd db_reset_all
  • This command also deletes old tables if they exist.

Server setup

You need to install a web server with PHP and PGSQL modules. Any Linux distribution has these packages.

On most Linux distributions all this is provided by the packages: apache2 libapache2-mod-php php php-pgsql

The site is located in cgru/afanasy/statistics folder.

Web Page

There is a Web page to view Afanasy SQL statistics database.

../_images/stat_tasks.png

Statistics Tasks Graph Page

TIME-WAIT

TIME-WAIT is a special socket state, needed to ensure that no packets are lost. If the server calls the close() function first, its socket falls into this state. To ensure that the last packet of the connection is processed, it waits:

TIME-WAIT = 2 * MSL (Maximum Segment Lifetime)

This is the reason why the server should not call close() first. With a large number of clients (~1000), the application can reach the 2^16 ports limit. Afanasy waits about 2 seconds for the client to close the socket first. To check whether a socket is still connected, we just try to write to it. SIGPIPE is ignored by Afanasy.
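
A minimal Python sketch of letting the client close first is given here. It is not Afanasy's implementation, which checks the connection by trying to write to it. This sketch instead waits for end-of-file on recv(), a common way to detect that the peer has closed. The function name wait_for_peer_close and the 4096-byte read size are assumptions.

```python
import socket


def wait_for_peer_close(sock, timeout=2.0):
    """Wait for the peer to close first, then close our side.

    Returns True if the peer closed within the timeout, so that the
    TIME-WAIT state lands on the peer, and False if we had to close first.
    The timeout applies to each recv() call separately.
    """
    sock.settimeout(timeout)
    try:
        while True:
            # recv() returns an empty bytes object once the peer has closed.
            if not sock.recv(4096):
                return True
    except socket.timeout:
        return False
    finally:
        sock.close()
```

A server would call this after sending its answer, instead of calling close() right away.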

To check the state of sockets you can run:

netstat -nat | grep 51000 | wc -l
netstat -nat | egrep ':51000.*:.*TIME_WAIT' | wc -l
ss -tan state time-wait | wc -l
ss -tan 'sport = :51000' | awk '{print $(NF)" "$(NF-1)}' | sed 's/:[^ ]*//g' | sort | uniq -c