Keynote: “Frying Squirrels & Unspun Gyros” – (Yahoo)
All about HA planning
29% of DataCenter outages caused by UPS failures - point: some defenses create their own failure modes
If you solve for Datacenter-sized outages, the ‘little stuff’ is mostly taken care of
Keynote: Performance of Web vs Apps
Apps are trending to have more impact/viewership/etc. than the web
Fragmentation is a scary future with Apps
“Pushing Pixels” (i.e. highly graphical apps) seems to be best served by Apps right now
Responsiveness is very compelling when considering your choice
App lifecycle (approval, etc) is MUCH SLOWER than Web
“Conditional Tier Rendering” – use Node.js to choose client/server rendering
In many scenarios, Web seems to have the advantage right now
Keynote: [Political|Commercial] [Threats|Opportunities] for Faster & Stronger Web
(Union Square Ventures)
Ops as viewed by Investors: “Speed is a feature”. Invest in it.
Why has web been so successful:
- Supportive Legislation
- Decentralized Internet
- Open Standards
Now (+ Possible Future)
- Restrictive Legislation
- Walled Gardens
- Controlled Internet
Lame claim by ISPs: “We haven’t invested enough in infrastructure, so therefore we should be able to control what crosses it.” >:-|
App: No “View Source”, No Linking
Historically Recognized Patterns in Technological Adoption: ~20 years in there’s a temporary crash while the world adapts to a structure of Networks over the older Hierarchies
Combat the threats to the internet:
- Engage (politically)
- Build!
[? slashawesome.net ?]
Keynote: Lightning Talks
Intel – Agile Application Performance Management
SNOOZE!!
Akamai – Akamai Internet Observatory
Basically, they are opening up some subsets of their data to be analyzed by the ‘open data’ community.
Example: in 60 seconds, gather 1.3 BILLION log lines
Their data should give a view of 20-30% of world internet usage
http://www.akamai.io
Compuware – UX Management
SNOOZE
Citrix – Blah blah Network Utility pitch…
SNOOZE
GoDaddy – Blah blah
Unfortunately, it seems that misogyny is a point of company pride for them.
Lightning Demos
YSlow
- now has a cmd-line version
- now open-source
- new URL: yslow.org
- integrates with PhantomJS; this can then integrate with Continuous Integration tools!!!
Chrome
about://tracing
you can see low-level chrome processing info
(can help you speed up a site)
“CrRendererMain” is the thread to keep an eye on
FYI, a redraw should happen about every 16ms (1000ms / 60Hz) at current refresh rates
to see examples: jankfree.com
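A quick sketch of the ~16ms arithmetic, in Python just to keep it runnable anywhere. The sample timestamps, the `tolerance` threshold, and the function names are all made up for illustration; nothing here comes from Chrome's tracing tool itself.

```python
def frame_budget_ms(refresh_hz=60.0):
    """Time available per frame, in milliseconds, at a given refresh rate."""
    return 1000.0 / refresh_hz


def janky_frames(timestamps_ms, refresh_hz=60.0, tolerance=1.5):
    """Return (index, duration_ms) for frames that overran the frame budget.

    timestamps_ms: frame start times in milliseconds (hypothetical sample data).
    tolerance: multiple of the budget allowed before a frame counts as jank
               (an arbitrary threshold chosen for this sketch).
    """
    budget = frame_budget_ms(refresh_hz)
    overruns = []
    for i in range(1, len(timestamps_ms)):
        duration = timestamps_ms[i] - timestamps_ms[i - 1]
        if duration > budget * tolerance:
            overruns.append((i, duration))
    return overruns


if __name__ == "__main__":
    # 60Hz leaves roughly 16.7ms per frame.
    print(round(frame_budget_ms(60), 1))
    # The jump from 33.4 to 83.4 is one long frame; the rest are on budget.
    print(janky_frames([0.0, 16.7, 33.4, 83.4, 100.1]))
```

Anything that shows up in the overrun list is a frame the user probably saw as a stutter.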
BrowserScope
Pretty cool tool to see actual API adherence from many browsers
Changing Culture and Being a Force for Awesome… – (Jesse Robbins)
“The right culture is a requirement for survival and success at web scale.”
Bad news: changing culture takes time
Jesse’s Rule: Don’t Fight Stupid, Make More Awesome
Changing Culture:
- start small, build trust & safety
- create champions
- use metrics to build confidence
- celebrate successes
- exploit compelling events
Hack for Change
- Starting Small
- small is safe
- call it an experiment
- Creating Champions
- get exec sponsors, start @ your boss
- give *everyone else* the credit
- give ‘special status’ (e.g. pins/shirts/hats/etc)
Metrics
- find KPI that supports change
- track & use that KPI ruthlessly
- first, use it to show value
- later, use it to show cost of lagging
- tell a story with your data, not just a number
Celebrate Success
- tell a powerful story
- always be positive about people and talk about how they overcame a problem
- NEVER about people who created a problem
- leave room for people to come to your side
- Exploit Compelling Events
just wait, it will happen
no “I told you so”, just “what do we do now?”
re: Permission - “No” frequently means “I don’t know how to say Yes to you”: Use creativity to hack whatever the barriers are
Wisdom of the Crowd: Real User Measurement
- “RUM”
- Look into using BOTH of these: “window.performance” and “window.onerror”
- Google Analytics has been doing this for ~1 year, you may already have the data!
Leveling Up – Take Your [Ops|Eng] Role to the Next Level
Work: “It’s not just *what* you do, it’s also *how* you do it.”
Decent talk, but I didn’t take many notes. Hopefully the slides will be put up later, but this blog post from the speaker covers a lot of the same ground: http://katemats.com/2012/05/29/rocking-your-job-leveling-up/
Logging as Event Streams – (Rackspace)
Logging meets the cloud
- many multi-tenant services
- many servers across many teams
Logging is an ‘Event Emitter’ (as realized when working with Node.js)
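To make the 'Event Emitter' idea concrete, here is a minimal sketch in Python rather than Node.js. The `LogEmitter` class and the event name are hypothetical, not anything the speaker showed; the point is only that producers emit events and any number of consumers subscribe to them.

```python
from collections import defaultdict


class LogEmitter:
    """Tiny event emitter: producers call emit(), consumers register with on()."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def on(self, event, handler):
        """Register a consumer callback for a named event."""
        self._handlers[event].append(handler)

    def emit(self, event, payload):
        """Deliver one log event to every consumer registered for it."""
        for handler in list(self._handlers.get(event, [])):
            handler(payload)


if __name__ == "__main__":
    emitter = LogEmitter()
    archive = []
    # Two independent consumers of the same event stream.
    emitter.on("request", archive.append)
    emitter.on("request", lambda e: print("alert") if e["status"] >= 500 else None)
    emitter.emit("request", {"path": "/", "status": 502})
    print(archive)
```

The producer never needs to know who is listening, which is what makes it easy to add a new consumer later.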
Structured Logging
- since 1995
- many producers, many consumers
- many programming languages
JSON
pretty much all languages can use it
can set up as newline terminated
*grep* works on it
can do hierarchical data (recommended: only go 1 level deep)
Message Tags / IDs
- Easy to search for strings (e.g. “AH02571”)
- Important for internal usage, and especially for Open Source
Ref: Zipkin uses a 64-bit integer as ID
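The JSON and message-ID notes above can be sketched together in a few lines of Python. The field names (`id`, `msg`, `ts`, `trace`, `http`) and the sample message are assumptions for illustration only; the talk did not prescribe a schema.

```python
import json
import secrets
import time


def new_trace_id():
    """Random 64-bit integer, the ID width the notes attribute to Zipkin."""
    return secrets.randbits(64)


def format_event(msg_id, message, **fields):
    """Serialize one event as a single newline-terminated JSON object.

    Extra fields should stay flat or at most one level deep.
    """
    record = {"id": msg_id, "msg": message, "ts": time.time()}
    record.update(fields)
    # json.dumps escapes embedded newlines, so each event stays on one line.
    return json.dumps(record) + "\n"


def parse_events(text):
    """Parse newline-delimited JSON back into dictionaries."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    line = format_event(
        "AH02571",                # stable message ID that is easy to search for
        "upstream timed out",     # hypothetical message text
        trace=new_trace_id(),
        http={"status": 504},     # one level of nesting, no deeper
    )
    print(line, end="")
    # A plain substring search (what grep would do) finds the event by its ID.
    print('"id": "AH02571"' in line)
```

Because every event is one line, line-oriented tools keep working and a real JSON parser is only needed when you want the fields.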
Shipping Logs
- ALWAYS write to local disk first (ALWAYS!)
- svlogd (?)
- Scribe – Abandoned by Facebook
- Flume – Hadoop focused
- syslog – Solid contender, very robust
Chose Scribe
- does bulk log mv, can buffer / retry / etc
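A rough Python sketch of the "local disk first, then ship in bulk with retry" pattern. This is not Scribe's API; `send` is a hypothetical stand-in for whatever transport you use, and the offset bookkeeping is just one simple way to make the local file act as the buffer.

```python
import os


def append_local(path, line):
    """Append a log line to local disk and sync it before doing anything else."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
        f.flush()
        os.fsync(f.fileno())


def ship_batch(path, offset, send, max_attempts=3):
    """Send everything written since `offset`; return the new offset.

    `send` is a hypothetical callable that takes a list of lines and raises
    OSError on failure. If every attempt fails, the offset is returned
    unchanged, so the same batch is retried on the next call and nothing is
    lost: the local file is the buffer.
    """
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    if not data:
        return offset
    lines = data.decode("utf-8").splitlines()
    for _ in range(max_attempts):
        try:
            send(lines)
            return offset + len(data)
        except OSError:
            continue
    return offset
```

A real shipper would add a delay between attempts and persist the offset somewhere; both are left out to keep the sketch short.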
Graylog2
- opensource
- many imports
- indexes into ElasticSearch
- use regex to build pre-fab notification sets
Log Data: You don’t need to keep it forever
Scalable System Ops – (Tumblr)
[Note: Lots of great info in this talk, but a lot was out of my experience. (Provisioning servers, etc.) Not gonna recreate it here, but I suggest anyone interested in provisioning look for this talk.]
“Collins” – tool to help w/provisioning
Use a lot of Unix principles when building tools
[Note: By the afternoon I was super tired and worn out, so my notes got very terse, to the point that I will only sum up what I can about the talk.]
Down With the Fancy Pants
“Complex systems fail in complex ways.”
“Premature optimization is the root of all evil.”
Basically, this talk was a caution to critically look at your stack to see if you can reduce weaknesses by removing unproven technologies.
Solving IT Issues with Differential Diagnosis
The speaker talked about his success in adopting learnings from the medical community to apply Differential Diagnosis and the Hypothetico-Deductive Model to IT problems.
Basically, he encourages us to use more scientific rigor when trying to find the root causes of system malfunctions.
Choose Your Own Adventure
This was a very fun talk, appropriately placed at the end of the conference.
Adam brought long experience in Ops/SysAdmin/Dev to speak about a range of topics chosen by audience members yelling out their preferences from a list of choices.
Challenges to Cultural Change
These are slides from a talk that I didn’t attend, but they echo a bunch of good advice from Jesse Robbins’ Keynote presentation. I’m including them because the talk was well received: http://www.slideshare.net/lnxchk/challenges-to-cultural-change
EDIT: 28 June 2300h – light editing for formatting