[Wiki Loves Monuments] [Labs-l] Reboot of virt11 Friday Sept 6 at 20:00 UTC

Thu Sep 5 22:01:16 UTC 2013

On Fri, Sep 6, 2013 at 5:46 AM, Maarten Dammers <maarten at mdammers.nl> wrote:

> Hi Ryan,
>
> On 4-9-2013 23:38, Ryan Lane wrote:
>
>> During Wikimania I was cleaning up some base images that were eating up a
>> large amount of disk space and caused an issue on virt11 that requires a
>> reboot. This will cause a reboot of about 45 instances. Here's a list of
>> the instances that will be affected:
>>
>> <https://wikitech.wikimedia.org/w/index.php?title=Special:Ask&q=[[Resource+Type%3A%3Ainstance]][[Instance+Host%3A%3Avirt11]]&p=format%3Dbroadtable%2Flink%3Dall%2Fheaders%3Dshow%2Fsearchlabel%3Dinstances%2Fclass%3Dsortable-20wikitable-20smwtable&po=%3FInstance+Name%0A%3FInstance+Type%0A%3FProject%0A%3FImage+Id%0A%3FFQDN%0A%3FLaunch+Time%0A%3FPuppet+Class%0A%3FModification+date%0A%3FInstance+Host%0A%3FNumber+of+CPUs%0A%3FRAM+Size%0A%3FAmount+of+Storage%0A&limit=100&eq=no>
>>
> How long will the downtime be, and can you please announce it earlier? A week
> is a normal notice period.
> The Wiki Loves Monuments tools and applications (like the mobile app) rely
> on this so please keep it as short as possible.
>
>
The reboot will take about 10 minutes.

That said, relying on labs for something like this is legitimately insane.
Have you talked with the Wikimedia Foundation about getting production-level
support for WLM? That's what you actually need.

What will you do if the node hosting your instance completely dies? Is your
work puppetized? Can you just bring up a new instance to replace it? Are
you doing backups?

Outside of tools (and deployment-prep, which is rather ephemeral) we don't
consider any project "semi-production" and the failure model is meant to be
handled at the instance level. The underlying infrastructure will just fail
and will not recover for you. You have to assume that your instances can
simply disappear at any moment (this is the traditional cloud computing
model, btw).

- Ryan