2018-05-25 §
05:31<zhuyifei1999_>Edit /data/project/.system/gridengine/default/common/sge_request, h_vmem 256M -> 512M, release precise -> trusty T195558
2018-05-22 §
11:53<arturo>running puppet to deploy https://gerrit.wikimedia.org/r/#/c/433996/ for T194665 (mono framework update)
2018-05-18 §
16:36<bd808>Restarted bigbrother on tools-services-02
2018-05-16 §
21:01<zhuyifei1999_>maintain-kubeusers is stuck in an infinite loop of 10-second sleeps
2018-05-15 §
04:28<andrewbogott>depooling, rebooting, re-pooling tools-exec-1414. It's hanging for unknown reasons.
04:07<zhuyifei1999_>Draining unresponsive tools-exec-1414 following Portal:Toolforge/Admin#Draining_a_node_of_Jobs
04:05<zhuyifei1999_>Force deletion of grid job 5221417 (tools.giftbot sga), host tools-exec-1414 not responding
2018-05-12 §
10:09<Hauskatze>tools.quentinv57-tools@tools-bastion-02:~$ webservice stop | T194343
2018-05-11 §
14:34<andrewbogott>repooling labvirt1001 tools instances
13:59<andrewbogott>depooling a bunch of things before rebooting labvirt1001 for T194258: tools-exec-1401 tools-exec-1407 tools-exec-1408 tools-exec-1430 tools-exec-1431 tools-exec-1432 tools-exec-1435 tools-exec-1438 tools-exec-1439 tools-exec-1441 tools-webgrid-lighttpd-1402 tools-webgrid-lighttpd-1407
2018-05-10 §
18:55<andrewbogott>depooling, rebooting, repooling tools-exec-1401 to test a kernel update
2018-05-09 §
21:11<Reedy>Added Tim Starling as member/admin
2018-05-07 §
21:02<zhuyifei1999_>re-building all docker images T190893
20:48<zhuyifei1999_>building, signing, and publishing toollabs-webservice 0.39 T190893
00:25<zhuyifei1999_>`renice -n 15 -p 28865` (`tar cvzf` of `tools.giftbot`) on tools-bastion-02, been hogging the NFS IO for a few hours
2018-05-05 §
23:37<zhuyifei1999_>regenerate k8s creds for tools.zhuyifei1999-test because I messed up while testing
2018-05-03 §
14:48<arturo>uploaded a new ruby docker image to the registry with the libmysqlclient-dev package T192566
2018-05-01 §
14:05<andrewbogott>moving tools-webgrid-lighttpd-1406 to labvirt1016 (routine rebalancing)
2018-04-27 §
18:26<zhuyifei1999_>`$ write` doesn't seem to be able to write to their tmux tty, so echoed into their pts directly: `# echo -e '\n\n[...]\n' > /dev/pts/81`
18:17<zhuyifei1999_>SIGTERM tools-bastion-03 PID 6562 tools.zoomproof celery worker
2018-04-23 §
14:41<zhuyifei1999_>`chown tools.pywikibot:tools.pywikibot /shared/pywikipedia/` Prior owner: tools.russbot:project-tools T192732
2018-04-22 §
13:07<bd808>Kill orphan php-cgi processes across the job grid via `clush -w @exec -w @webgrid -b 'ps axwo user:20,ppid,pid,cmd | grep -E " 1 " | grep php-cgi | xargs sudo kill -9'`
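Note: the pipeline logged above selects orphans by matching the text " 1 " anywhere in each `ps` line. A field-based selection is sketched below for reference; it is an illustration, not the command that was run. The helper name `orphan_pids` is hypothetical, and it assumes input in the `user ppid pid cmd` column order produced by `ps axwo user:20,ppid,pid,cmd`.

```shell
# Hypothetical helper (not part of any Toolforge tooling).
# Reads "user ppid pid cmd..." lines on stdin and prints the PID of every
# process whose parent is PID 1 and whose command word contains the name
# given as the first argument.
orphan_pids() {
    awk -v name="$1" '$2 == 1 && index($4, name) { print $3 }'
}

# Dry run on one host: list candidate PIDs without killing anything.
# ps axwo user:20,ppid,pid,cmd | orphan_pids php-cgi
```

Printing only the PID column makes the output safe to review before it is passed to `kill`.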
2018-04-15 §
17:51<zhuyifei1999_>forced puppet runs across tools-elastic-0[1-3] T192224
17:44<zhuyifei1999_>granted elasticsearch credentials to tools.flaky-ci T192224
2018-04-11 §
13:25<chasemp>clean up exim frozen messages in an effort to alleviate queue pressure
2018-04-06 §
16:30<chicocvenancio>killed job in bastion, tools.gpy affected
14:30<arturo>add puppet class `toollabs::apt_pinning` to tools-puppetmaster-01 using horizon, to add some apt pinning related to T159254
11:23<arturo>manually upgrade apache2 on tools-puppetmaster for T159254
2018-04-05 §
18:46<chicocvenancio>killed wget that was hogging io
2018-03-29 §
20:09<chicocvenancio>killed interactive processes in tools-bastion-03
19:56<chicocvenancio>several interactive jobs running in bastion-03. I am writing to connected users and will kill the jobs once done
2018-03-28 §
13:06<zhuyifei1999_>SIGTERM PID 30633 on tools-bastion-03 (tool 3d2commons's celery). Please run this on grid
2018-03-26 §
21:34<bd808>clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete'
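Note: `-atime +1` matches files last accessed more than one full 24-hour period ago, with fractional days ignored, so in practice files untouched for at least two days. A preview of what such a cleanup would remove can be sketched as below; the helper name `stale_files` is hypothetical and not part of the logged command.

```shell
# Hypothetical helper (not part of any Toolforge tooling).
# Lists regular files under a directory whose last access is older than
# the given number of full days (default 1). Nothing is deleted.
stale_files() {
    dir="$1"
    days="${2:-1}"
    find "$dir" -type f -atime "+$days" -print
}

# Preview on one host before running the real cleanup with -delete:
# stale_files /tmp 1
```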
2018-03-23 §
23:26<bd808>clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete'
19:43<bd808>tools-proxy-* Forced puppet run to apply https://gerrit.wikimedia.org/r/#/c/421472/
2018-03-22 §
22:04<bd808>Forced puppet run on tools-proxy-02 for T130748
21:52<bd808>Forced puppet run on tools-proxy-01 for T130748
21:48<bd808>Disabled puppet on tools-proxy-* for https://gerrit.wikimedia.org/r/#/c/420619/ rollout
03:50<bd808>clush -w @exec -w @webgrid -b 'sudo find /tmp -type f -atime +1 -delete'
2018-03-21 §
17:50<bd808>Cleaned up stale /project/.system/bigbrother.scoreboard.* files from labstore1004
01:09<bd808>Deleting /tmp files owned by tools.wsexport with -mtime +2 across grid (T190185)
2018-03-20 §
08:28<zhuyifei1999_>unmount dumps & remount on tools-bastion-02 (can someone clush this?) T189018 T190126
2018-03-19 §
11:02<arturo>reboot tools-exec-1408, to balance load. Server is unresponsive due to high load by some tools
2018-03-16 §
22:44<zhuyifei1999_>suspended process 22825 (BotOrderOfChapters.exe) on tools-bastion-03. Threads continuously going to D-state & R-state. Also sent message via $ write on pts/10
12:13<arturo>reboot tools-webgrid-lighttpd-1420 due to almost full /tmp
2018-03-15 §
16:56<zhuyifei1999_>granted elasticsearch credentials to tools.denkmalbot T185624
2018-03-14 §
20:57<bd808>Upgrading elasticsearch on tools-elastic-01 (T181531)
20:53<bd808>Upgrading elasticsearch on tools-elastic-02 (T181531)
20:51<bd808>Upgrading elasticsearch on tools-elastic-03 (T181531)
12:07<arturo>reboot tools-webgrid-lighttpd-1415, almost full /tmp