
Things I learned as a Sysadmin

by Jon.Hudson on 06-11-2012 02:53 AM




1.) Some Quantum effects exist at the macro level. You can NOT touch
something in a data center without impacting it.


2.) Losing lights when you have floor panels pulled is REALLY dangerous.


3.) A system that halts when you remove a panel is never documented in a large
enough font or in enough places.


4.) Serial devices at boot will sometimes send a "reset" signal. PUT THEM ON
UPS.


5.) "We'll do it right later" never ever happens.


6.) Somewhere right now, someone is hard-coding an IP address into some code
or script.


7.) A port that is listening can be told things.


8.) If you always do things the way everyone else does it, you don't
learn.


9.) If you do things in ways no one else does, you will find things that QA did
not. (see #8)


10.) Things that have been spinning for a long time may not be as healthy as you think.
Stop them, and they may never spin again.


11.) Never refer to a process that your customer will hear about as "shocking
the drive" (see #10)


12.) Any device whose console you can't reach remotely will fail between 2am and
8am on weekends and holidays.


13.) Applying redo logs can take longer than restoring the drive from
tape.


14.) Anyone that says "nothing changed" is a damn liar.


15.) Developers should never ever under any circumstances have root
access.


16.) Sudo is your friend. (see #15)


17.) Bandwidth, Storage and Closets are somehow cosmically connected. If
there is room, it will be used.


18.) The most universal thing in all datacenters in every corner of Earth is
"Keep It Simple, Stupid."


19.) NO one puts those little plastic stoppers back into Fibre
cables.


20.) A fibre cable placed "just for a second" on top of a server that does
not have those little plastic stoppers inserted is actually a very small vacuum
cleaner. (see #19)


21.) Once a system has been running for a while, there is really no such
thing as sudden death. Something somewhere complained, perhaps even screamed
before death, you just weren't listening.


22.) TTLs are very, very important. So is DNS caching. You will think about
this if you ever move datacenters and don't remember what is
important.


23.) When testing that an application is back up after maintenance, a positive
test from within your network does NOT mean the Users can really get to
it.


24.) When you are on call, the best way to ensure everything goes smoothly is
to be totally bored, because if you start having fun, something will crash.


25.) The person who volunteers to write the Post Mortem Report has a
statistically proven lower probability of being at
fault.


26.) If any of these commands are foreign to you, you are working too hard:
mtr, lsof, nmap, telnet ipaddr port, awk, xargs, fink, %s/string1/string2/g
inside vi, strings. (Obviously there are others, which are available for the
right price ;-)


27.) Just because you have the tape does not mean the data on it is still
good.


28.) If as much energy were spent on documentation as is spent figuring out
whose fault something is, we wouldn't need to figure out whose fault something
was nearly as often.


29.) Incorrect documentation is much more dangerous than no
documentation.


30.) Anything that can go wrong eventually will go wrong. It's a question of
when, not if.


31.) Owning a product from your top vendor's competitor (even if you never plug
it in) magically lowers your top vendor's prices.


32.) UNIX Rules.


33.) Caffeine helps code compile.


34.) Don't put compilers on boxes with external ports open.


35.) "Temporary" cabling techniques have a tendency to become permanent.


36.) When in a co-lo, BTUs per sq. ft. matter more than any superpowers you can
fit in 1U.


37.) If your storage rep comes to regular meetings, you are buying too much
storage.


38.) Once an exploit is published, someone has already been enjoying it for
quite a while.


39.) It really is impossible to have too many monitors.


40.) No matter how sure the Users are, the Internet is never down.


41.) Just because you can, does not mean you should. (See #18)


42.) Never hire anyone who doesn't know the answer to the Ultimate
Question.
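Item #6 in practice: here is a minimal Python sketch of keeping the address out of the script. The environment variable `APP_DB_HOST` and the default name `db.example.internal` are hypothetical; the point is only that the script asks configuration and DNS for the address at runtime, so renumbering becomes a DNS or config change instead of a code hunt.

```python
import os
import socket

# Hypothetical default name; the real value belongs in config or DNS, not in code.
DEFAULT_DB_HOST = "db.example.internal"


def db_host() -> str:
    """Return the database host name, overridable via a (hypothetical) env var."""
    return os.environ.get("APP_DB_HOST", DEFAULT_DB_HOST)


def resolve(host: str, port: int) -> list[str]:
    """Resolve a host name at runtime instead of baking in an IP address."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

A caller would do something like `resolve(db_host(), 5432)` each time it connects, rather than carrying a literal address around.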
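Item #22 is easiest to see with a toy model. This hypothetical Python sketch is not any real resolver; it is a cache that honors TTLs, with the clock passed in so the timing is explicit. An answer cached under a one-day TTL keeps being served for up to a day after you change the record, which is why the usual advice is to lower the TTL at least one old-TTL period before a move.

```python
from dataclasses import dataclass


@dataclass
class CachedAnswer:
    address: str
    expires_at: float  # seconds, on the same clock as `now`


class CachingResolver:
    """Toy caching resolver that honors TTLs (all times in seconds)."""

    def __init__(self, authoritative: dict[str, tuple[str, int]]):
        # Maps name -> (address, ttl); stands in for the authoritative zone.
        self.authoritative = authoritative
        self.cache: dict[str, CachedAnswer] = {}

    def lookup(self, name: str, now: float) -> str:
        hit = self.cache.get(name)
        if hit is not None and now < hit.expires_at:
            return hit.address  # served from cache, possibly stale
        address, ttl = self.authoritative[name]
        self.cache[name] = CachedAnswer(address, now + ttl)
        return address


def earliest_safe_cutover(ttl_lowered_at: float, old_ttl: int) -> float:
    """Caches may hold answers under the old TTL until it runs out."""
    return ttl_lowered_at + old_ttl
```

With an 86400-second TTL, a lookup at t=0 pins the old address until t=86400, even if the record changes an hour in.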
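One common way item #23 bites is split-horizon DNS: from inside, the name resolves to an internal address, so your green check never touched the path Users take. This Python sketch (function names hypothetical) only flags that one trap by checking whether every address you tested is non-globally-routable, using the standard `ipaddress` module. It does not replace running the test from outside your network.

```python
import ipaddress
import socket


def resolved_addresses(host: str) -> list[str]:
    """Addresses this machine's resolver returns for `host`."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


def only_internal_addresses(addresses: list[str]) -> bool:
    """True if there is at least one address and none of them is globally routable."""
    parsed = [ipaddress.ip_address(a) for a in addresses]
    return bool(parsed) and all(not ip.is_global for ip in parsed)
```

If `only_internal_addresses(resolved_addresses("your.public.name"))` is true from where you ran the check, the check proved nothing about outside reachability.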
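The `telnet ipaddr port` trick from item #26 is just "does anything complete a TCP handshake here?", which is also what item #7 is warning about. A rough Python equivalent is sketched below; the function name is hypothetical, and it only checks the handshake without speaking whatever protocol is behind the port.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name didn't resolve.
        return False
```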
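For item #27, having the tape proves little until you restore from it and compare. This sketch assumes you recorded a manifest at backup time mapping each relative path to its SHA-256 hex digest; that manifest format and the function names are hypothetical. It reports every file that is missing or doesn't match after a test restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large restores don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(manifest: dict[str, str], restore_root: Path) -> list[str]:
    """Return relative paths that are missing or whose digest differs from the manifest."""
    problems = []
    for rel_path, expected in sorted(manifest.items()):
        candidate = restore_root / rel_path
        if not candidate.is_file() or sha256_of(candidate) != expected:
            problems.append(rel_path)
    return problems
```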
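The arithmetic behind item #36: essentially all the power a server draws ends up as heat, and 1 W is about 3.412 BTU/h. The sketch below uses hypothetical function names and example figures; your co-lo's actual limits may be stated per rack in kW rather than per square foot, so check the contract.

```python
# 1 watt of IT load dissipates roughly 3.412 BTU per hour of heat.
WATTS_TO_BTU_PER_HOUR = 3.412


def heat_load_btu_per_hour(watts: float) -> float:
    """Heat output in BTU/h for a given electrical load in watts."""
    return watts * WATTS_TO_BTU_PER_HOUR


def btu_per_hour_per_sq_ft(watts: float, footprint_sq_ft: float) -> float:
    """Heat density for equipment drawing `watts` over `footprint_sq_ft` of floor."""
    if footprint_sq_ft <= 0:
        raise ValueError("footprint must be positive")
    return heat_load_btu_per_hour(watts) / footprint_sq_ft
```

For example, a rack drawing 5,000 W over a 10 sq. ft. footprint works out to roughly 1,706 BTU/h per sq. ft., no matter how clever the 1U boxes inside it are.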