I am not sure what is causing the problem. I opened a bug, but there has been
no response from the SLURM team.

Regards,
Triveni

-----Original Message-----
From: gasper.ku...@ung.si [mailto:gasper.ku...@ung.si]
Sent: Friday, October 16, 2015 1:32 PM
To: slurm-dev <slurm-dev@schedmd.com>
Subject: [slurm-dev] Re: Problem while updating to new slurm version


Yes, our database still holds all of the data it had before this upgrade to 
version 15.08. For AccountingStorageEnforce we are using 
"AccountingStorageEnforce=limits,associations". Is the associations storage 
enforcement producing this issue?
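
One way to check whether there is any association data for the controller to load is to count the association rows that sacctmgr reports for the cluster. The sketch below is only an illustration: `assoc_count` is a hypothetical helper, not part of Slurm, and it assumes the pipe-separated output of `sacctmgr -n -P` with the cluster name in the first field.

```shell
# Hypothetical helper (not part of Slurm): count association rows for one cluster.
# Reads pipe-separated rows on stdin and expects the cluster name in field 1.
assoc_count() {
    awk -F'|' -v cluster="$1" '$1 == cluster { n++ } END { print n + 0 }'
}

# Intended usage on the controller host (assumes sacctmgr can reach slurmdbd):
#   sacctmgr -n -P show associations format=Cluster,Account,User | assoc_count ung
```

A count of 0 would suggest that the database holds no associations for the cluster, regardless of how AccountingStorageEnforce is set.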

Cheers,
Gašper Kukec Mezek

>
> I see a similar bug reported:
> http://bugs.schedmd.com/show_bug.cgi?id=1942
>
> We also use AccountingStorageType=accounting_storage/mysql. Our
> accounting data was not deleted from the database after we started
> Slurm without the accounting settings.
>
> Are you using associations for AccountingStorageEnforce?
> Is there any data in the database?
>
> Cheers,
> Barbara
>
> On 10/15/2015 07:47 PM, gasper.ku...@ung.si wrote:
>> This solved the problem I was having earlier (thank you), but another
>> error has now come up. Without the accounting settings, slurmctld
>> started up without a problem, but after adding the accounting
>> settings back again, I get:
>> --------------------------
>> # slurmctld -Dvvvv
>> slurmctld: pidfile not locked, assuming no running daemon
>> slurmctld: slurmctld version 15.08.1 started on cluster ung
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/crypto_munge.so
>> slurmctld: Munge cryptographic signature plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/select_cons_res.so
>> slurmctld: Consumable Resources (CR) Node Selection plugin loaded
>> with argument 20
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/preempt_none.so
>> slurmctld: preempt/none loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/checkpoint_none.so
>> slurmctld: debug3: Success.
>> slurmctld: debug:  Checkpoint plugin loaded: checkpoint/none
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/acct_gather_energy_none.so
>> slurmctld: debug:  AcctGatherEnergy NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/acct_gather_profile_none.so
>> slurmctld: debug:  AcctGatherProfile NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/acct_gather_infiniband_none.so
>> slurmctld: debug:  AcctGatherInfiniband NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/acct_gather_filesystem_none.so
>> slurmctld: debug:  AcctGatherFilesystem NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug2: No acct_gather.conf file
>> (/etc/slurm/acct_gather.conf)
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/jobacct_gather_linux.so
>> slurmctld: debug:  Job accounting gather LINUX plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/ext_sensors_none.so
>> slurmctld: ExtSensors NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/switch_none.so
>> slurmctld: debug:  switch NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug:  No backup controller to shutdown
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/accounting_storage_mysql.so
>> slurmctld: debug2: mysql_connect() called for db slurm_db
>> slurmctld: debug2: It appears the table conversions have already
>> taken place, hooray!
>> slurmctld: Accounting storage MYSQL plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug2: acct_storage_p_get_connection: request new
>> connection
>> 0
>> slurmctld: error: _get_assoc_mgr_tres_list: no list was made.
>> slurmctld: error: Association database appears down, reading from
>> state file.
>> slurmctld: debug3: Version in assoc_mgr_state header is 7424
>> slurmctld: debug:  Recovered 4 tres
>> slurmctld: debug:  Recovered 0 associations
>> slurmctld: debug3: Version in assoc_usage header is 7424
>> slurmctld: debug:  Reading slurm.conf file: /etc/slurm/slurm.conf
>> slurmctld: debug3: layouts: layouts_init()...
>> slurmctld: layouts: no layout to initialize
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/topology_none.so
>> slurmctld: topology NONE plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug:  No DownNodes
>> slurmctld: debug3: Version in last_conf_lite header is 7424
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/jobcomp_none.so
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/sched_backfill.so
>> slurmctld: sched: Backfill scheduler plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/route_default.so
>> slurmctld: route default plugin loaded
>> slurmctld: debug3: Success.
>> slurmctld: layouts: loading entities/relations information
>> slurmctld: debug3: layouts: loading node drevo1
>> slurmctld: debug3: layouts: loading node drevo2
>> slurmctld: debug3: layouts: loading node zorro
>> slurmctld: debug3: layouts: loading node zorro2
>> slurmctld: debug:  layouts: 4/4 nodes in hash table, rc=0
>> slurmctld: debug:  layouts: loading stage 1
>> slurmctld: debug:  layouts: loading stage 1.1 (restore state)
>> slurmctld: debug:  layouts: loading stage 2
>> slurmctld: debug:  layouts: loading stage 3
>> slurmctld: debug3: Version string in node_state header is
>> PROTOCOL_VERSION
>> slurmctld: Recovered state of 4 nodes
>> slurmctld: debug3: Version string in job_state header is
>> PROTOCOL_VERSION
>> slurmctld: debug3: Job id in job_state header is 12275
>> slurmctld: debug3: Set job_id_sequence to 12275
>> slurmctld: Recovered information about 0 jobs
>> slurmctld: cons_res: select_p_node_init
>> slurmctld: cons_res: preparing for 1 partitions
>> slurmctld: debug:  Ports available for reservation 12000-12500
>> slurmctld: debug2: init_requeue_policy: kill_invalid_depend is set to
>> 0
>> slurmctld: debug:  Updating partition uid access list
>> slurmctld: debug3: Version string in resv_state header is
>> PROTOCOL_VERSION
>> slurmctld: Recovered state of 0 reservations
>> slurmctld: State of 0 triggers recovered
>> slurmctld: read_slurm_conf: backup_controller not specified.
>> slurmctld: cons_res: select_p_reconfigure
>> slurmctld: cons_res: select_p_node_init
>> slurmctld: cons_res: preparing for 1 partitions
>> slurmctld: Running as primary controller
>> slurmctld: Registering slurmctld for cluster ung at port 6817 in
>> database.
>> slurmctld: debug3: Trying to load plugin
>> /usr/lib64/slurm/priority_multifactor.so
>> slurmctld: fatal: It appears you don't have any association data from
>> your database.  The priority/multifactor plugin requires this
>> information to run correctly.  Please check your database connection
>> and try again.
>> --------------------------
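
With this much debug output, it can help to reduce the log to its error and fatal lines. The filter below is a minimal sketch, assuming the `slurmctld: error:` and `slurmctld: fatal:` prefixes seen in the log above; `slurmctld_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only slurmctld error and fatal messages.
# Reads the named files, or stdin when no file is given.
slurmctld_errors() {
    grep -E '^slurmctld: (error|fatal):' "$@"
}

# Intended usage (2>&1 is added in case the messages are written to stderr):
#   slurmctld -Dvvvv 2>&1 | slurmctld_errors
```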
>>
>> The first guess I have is that it is having some problems registering
>> the cluster. Here is the cluster information through sacctmgr:
>> --------------------------
>> # sacctmgr list cluster
>>    Cluster     ControlHost  ControlPort   RPC     Share GrpJobs       GrpTRES GrpSubmit MaxJobs       MaxTRES MaxSubmit     MaxWall                  QOS   Def QOS
>> ---------- --------------- ------------ ----- --------- ------- ------------- --------- ------- ------------- --------- ----------- -------------------- ---------
>>        ung    193.2.120.31         6817  7424         1                                                                                           normal
>> --------------------------
>>
>> Also, we are using MySQL for accounting storage, so slurmctld may
>> only have started after I deleted the accounting settings because it
>> then fell back to a different accounting plugin (error: You are not
>> running a supported accounting_storage plugin
>> (accounting_storage/none)). The mysql and slurmdbd daemons are both
>> running, and the table structure inside the MySQL database is as
>> expected:
>> --------------------------
>> mysql> show databases;
>> +--------------------+
>> | Database           |
>> +--------------------+
>> | information_schema |
>> | mysql              |
>> | slurm_db           |
>> | test               |
>> +--------------------+
>>
>> mysql> show tables;
>> +-----------------------------+
>> | Tables_in_slurm_db          |
>> +-----------------------------+
>> | acct_coord_table            |
>> | acct_table                  |
>> | clus_res_table              |
>> | cluster_table               |
>> | qos_table                   |
>> | res_table                   |
>> | table_defs_table            |
>> | tres_table                  |
>> | txn_table                   |
>> | ung_assoc_table             |
>> | ung_assoc_usage_day_table   |
>> | ung_assoc_usage_hour_table  |
>> | ung_assoc_usage_month_table |
>> | ung_event_table             |
>> | ung_job_table               |
>> | ung_last_ran_table          |
>> | ung_resv_table              |
>> | ung_step_table              |
>> | ung_suspend_table           |
>> | ung_usage_day_table         |
>> | ung_usage_hour_table        |
>> | ung_usage_month_table       |
>> | ung_wckey_table             |
>> | ung_wckey_usage_day_table   |
>> | ung_wckey_usage_hour_table  |
>> | ung_wckey_usage_month_table |
>> | user_table                  |
>> +-----------------------------+
>> --------------------------
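
The table listing shows that the tables exist, but not whether the association table has any rows. A minimal sketch for checking that follows; `assoc_count_query` is a hypothetical helper, and it assumes the `<cluster>_assoc_table` naming seen in the listing above.

```shell
# Hypothetical helper: build a row-count query for a cluster's association table.
# Assumes the <cluster>_assoc_table naming seen in the table listing.
assoc_count_query() {
    printf 'SELECT COUNT(*) FROM %s_assoc_table;\n' "$1"
}

# Intended usage (user and database names taken from this thread's configuration):
#   mysql -u slurm -p slurm_db -e "$(assoc_count_query ung)"
```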
>>
>> I also made sure that the privileges for the slurm user in the
>> database are correct.
>>
>> Cheers,
>> Gašper Kukec Mezek
>>
>>
>>>
>>>
>>> <Gasper.Kukec@...> writes:
>>>
>>>>
>>>> Here is the sacctmgr return value:
>>>> --------------------------
>>>> # sacctmgr show config
>>>> Configuration data as of 2015-10-13T09:05:05
>>>> AccountingStorageBackupHost  = (null)
>>>> AccountingStorageHost  = zorro
>>>> AccountingStorageLoc   = slurm_db
>>>> AccountingStoragePass  = secret
>>>> AccountingStoragePort  = 3306
>>>> AccountingStorageType  = accounting_storage/mysql
>>>> AccountingStorageUser  = slurm
>>>> AuthType               = auth/munge
>>>> MessageTimeout         = 10 sec
>>>> PluginDir              = /usr/lib64/slurm
>>>> PrivateData            = none
>>>> SlurmUserId            = slurm(106)
>>>> SLURM_CONF             = /etc/slurm/slurm.conf
>>>> SLURM_VERSION          = 15.08.1
>>>> TrackWCKey             = 0
>>>> --------------------------
>>>>
>>>>  From slurmdbd, I get:
>>>
>>> I think it's a bug.
>>> We had the same problem when I ran the Slurm upgrade on our test
>>> cluster (also CentOS 6). After I removed the accounting settings
>>> from slurm.conf and restarted slurmdbd and slurmctld, both services
>>> started to work. I then stopped them again, added the same
>>> accounting settings back to the config file, and started the
>>> services. Now it works.
>>> Unfortunately, I don't know why this happens.
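
The "remove the accounting settings" step of this workaround could be sketched as a filter that comments the settings out instead of deleting them, so they are easy to restore. This is an assumption-laden illustration: the thread does not list which settings were removed, so the sketch simply targets lines starting with `AccountingStorage`, and `disable_accounting` is a hypothetical helper name.

```shell
# Hypothetical helper: print a slurm.conf with AccountingStorage* lines commented out.
# Assumption: "accounting settings" means the lines starting with AccountingStorage.
# Reads the named file, or stdin when no file is given; the input is not modified.
disable_accounting() {
    sed -E 's/^[[:space:]]*AccountingStorage/#&/' "$@"
}

# Intended usage (writes a modified copy and leaves the original untouched):
#   disable_accounting /etc/slurm/slurm.conf > /tmp/slurm.conf.noacct
```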
>>>
>>> Cheers,
>>> Barbara
>>>
>>>
>