AWS builds its own hardware at labs including this one in Austin, Texas.
You probably know AWS as a company that offers cloud computing services, not one that builds its own hardware. In fact, AWS has its own family of custom chips and accelerators, with each new generation building on, and improving on, what came before. All the chips are designed and built by the Annapurna Labs team, whose members work at sites around the world, including Tel Aviv, Israel; Toronto, Canada; and this lab in Austin, Texas.
A custom silicon “system on a chip” (SoC) for machine-learning acceleration, built by the Annapurna Labs team, is about to be placed on a tester. “Our guiding principles are to offer customers more choice, lower cost, and higher performance,” said Rami Sinno, director of silicon engineering. “By integrating all of our silicon development in-house, and not relying on third parties, we can deliver silicon products [on] an accelerated timeline.”
Sinno lifts the lid on an array of accelerators, all custom designed by Annapurna Labs. Eyal Freund, senior manager for silicon, and his team develop the machine-learning SoC, while Eran Jurman, senior manager for hardware, and his team develop everything from the cards up to the complete server.
Karim Syed, system validation manager, troubleshoots an issue in a mini data center, where the team can test and trial new equipment or processes before rolling them out for real. “Things that might sound mundane—like the possibility of components coming loose in a server in transit—become critical issues when you’re dealing with such complex hardware,” said Tony Hagale, senior systems manager. “Maintenance becomes its own competency.”
Senior principal engineer Ali Saidi is technical lead for AWS’s Graviton range of processors. Graviton3—the latest iteration—offers 25% higher compute performance than its predecessors. Graviton3-based virtual servers running in the Amazon Elastic Compute Cloud (EC2) use up to 60% less energy for the same performance than comparable EC2 instances. That helps customers reduce their carbon footprint, while contributing to Amazon’s broader efforts to reduce carbon emissions as part of The Climate Pledge.
Saidi and team continuously innovate across the design, manufacturing, and packaging of the Graviton range. With Graviton3, they placed seven die (tiny pieces of custom silicon) and around 55 billion transistors into one central processing unit (CPU) to deliver another leap in performance for EC2 customers.
One of senior test product engineer Brendan Tully’s responsibilities is testing “wafers”—thin slices of semiconducting material used for the production of integrated circuits (in this case, Annapurna Labs’ custom-designed silicon).
Tully and Syed develop tests to make sure chips can perform under a variety of conditions. “There’s no ‘partial’ testing or sampling,” said Syed. “At the chip level, card level, and server level, nothing goes out the door without being tested.”
Validation engineer Sarah Nasser brings up a test card. Nasser is part of the software integration team, which is responsible for making sure the card and server function according to specification. “We test all the software components necessary to make a server available to AWS customers in the cloud,” she said.
A software bug can be easily fixed, but a hardware problem often means the team needs to start over again—and cycles can be long. “People would be surprised at how much interplay there is between hardware and software,” said Hagale. “People don’t think about how much goes in to get it all to work.”