2018-03-16 §
22:44<zhuyifei1999_>suspended process 22825 (BotOrderOfChapters.exe) on tools-bastion-03. Threads continuously going to D-state & R-state. Also sent message via $ write on pts/10
12:13<arturo>reboot tools-webgrid-lighttpd-1420 due to almost full /tmp
2018-03-15 §
16:56<zhuyifei1999_>granted elasticsearch credentials to tools.denkmalbot T185624
2018-03-14 §
20:57<bd808>Upgrading elasticsearch on tools-elastic-01 (T181531)
20:53<bd808>Upgrading elasticsearch on tools-elastic-02 (T181531)
20:51<bd808>Upgrading elasticsearch on tools-elastic-03 (T181531)
12:07<arturo>reboot tools-webgrid-lighttpd-1415, almost full /tmp
12:01<arturo>repool tools-webgrid-lighttpd-1421, /tmp is now empty
11:56<arturo>depool tools-webgrid-lighttpd-1421 for reboot due to /tmp almost full
2018-03-12 §
20:09<madhuvishy>Run clush -w @all -b 'sudo umount /mnt/nfs/labstore1003-scratch && sudo mount -a' to remount scratch across all of tools
17:13<arturo>T188994 upgrading packages from `stable`
16:53<arturo>T188994 upgrading packages from stretch-wikimedia
16:33<arturo>T188994 upgrading packages from jessie-wikimedia
14:58<zhuyifei1999_>building, publishing, and deploying misctools 1.31 5f3561e T189430
13:31<arturo>tools-exec-1441 and tools-exec-1442 rebooted fine and are repooled
13:26<arturo>depool tools-exec-1441 and tools-exec-1442 for reboots
13:19<arturo>T188994 upgrade packages from jessie-backports in all jessie servers
12:49<arturo>T188994 upgrade packages from trusty-updates in all ubuntu servers
12:34<arturo>T188994 upgrade packages from trusty-wikimedia in all ubuntu servers
2018-03-08 §
16:05<chasemp>tools-clushmaster-01:~$ clush -g all 'sudo puppet agent --test'
14:02<arturo>T188994 upgrading trusty-tools packages in all the cluster, this includes jobutils, openssh-server and openssh-sftp-server
2018-03-07 §
20:42<chicocvenancio>killed io intensive recursive zip of huge folder
18:30<madhuvishy>Killed php-cgi job run by user 51242 on tools-webgrid-lighttpd-1413
14:08<arturo>just merged NFS package pinning https://gerrit.wikimedia.org/r/#/c/416943/
13:47<arturo>deploying more apt pinnings: https://gerrit.wikimedia.org/r/#/c/416934/
2018-03-06 §
16:15<madhuvishy>Reboot tools-docker-registry-02 T189018
15:50<madhuvishy>Rebooting tools-worker-1011
15:08<chasemp>tools-k8s-master-01:~# kubectl uncordon tools-worker-1011.tools.eqiad.wmflabs
15:03<arturo>drain and reboot tools-worker-1011
15:03<chasemp>rebooted tools-worker 1001-1008
14:58<arturo>drain and reboot tools-worker-1010
14:27<chasemp>multiple tools running on k8s workers report issues reading replica.my.cnf file atm
14:27<chasemp>reboot tools-worker-100[12]
14:23<chasemp>downtime icinga alert for k8s workers ready
13:21<arturo>T188994 in some servers there was some race in the dpkg lock between apt-upgrade and puppet. Also, I forgot to use DEBIAN_FRONTEND=noninteractive, so debconf prompts happened and stalled dpkg operations. Already solved, but some puppet alerts were produced
12:58<arturo>T188994 upgrading packages in jessie nodes from the oldstable source
11:42<arturo>clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoclean" <-- free space in filesystem
11:41<arturo>aborrero@tools-clushmaster-01:~$ clush -w @all "sudo DEBIAN_FRONTEND=noninteractive apt-get autoremove -y" <-- we did this on canary servers last week and it went fine, so now running it fleet-wide
11:36<arturo>(ubuntu) removed linux-image-3.13.0-142-generic and linux-image-3.13.0-137-generic (T188911)
11:33<arturo>removing unused kernel packages in ubuntu nodes
11:08<arturo>aborrero@tools-clushmaster-01:~$ clush -w @all "sudo rm /etc/apt/preferences.d/* ; sudo puppet agent -t -v" <--- rebuild directory, it contains stale files across all the cluster
2018-03-05 §
18:56<zhuyifei1999_>also published jobutils_1.30_all.deb
18:39<zhuyifei1999_>built and published misctools_1.30_all.deb T167026 T181492
14:33<arturo>delete `linux-image-4.9.0-6-amd64` package from stretch instances for T188911
14:01<arturo>deleting old kernel packages in jessie instances for T188911
13:58<arturo>running `apt-get autoremove` with clush in all jessie instances
12:16<arturo>apply role::toollabs::base to tools-paws prefix in horizon for T187193
12:10<arturo>apply role::toollabs::base to tools-prometheus prefix in horizon for T187193
2018-03-02 §
13:41<arturo>doing some testing with puppet classes in tools-package-builder-01 via horizon
2018-03-01 §
13:27<arturo>deploy https://gerrit.wikimedia.org/r/#/c/415057/