IBM leaps two hurdles for next-gen memory

IBM has solved two related problems with phase-change memory and now says the fast next-generation data-storage technology will be ready for use in 2016 in servers.

In a paper for the IEEE International Memory Workshop, Big Blue researchers describe how they squeezed two bits of data into each phase-change memory cell rather than just one. Though that’s not the first incarnation of this idea, called multilevel storage, the researchers said they’ve made it practical by sidestepping a problem called “drift” that otherwise causes data errors the longer data is stored.

IBM’s prototype multilevel cell phase-change memory (MLC PCM) chip

(Credit: IBM)

The engineering advancements help overcome significant barriers to introducing a technology that could substantially change computer designs. Phase-change memory (PCM) could snuggle up alongside conventional dynamic random access memory (DRAM) to improve computer performance in ways that flash memory so far can’t. It’s not as fast as DRAM, but IBM says it’s 100 times faster at reading and writing data than flash memory, its chief competitor today.

IBM’s PCM technology isn’t yet ready for real-world use, but the improvements in multilevel storage and drift tolerance mean the technology should be competitive in 2016 for the server applications IBM has in mind, said Haris Pozidis, one of the IBM Research paper authors.

“Our main application, being in the server business, is enterprise storage and memory applications,” Pozidis said. “In the consumer market, the most important attribute is cost per bit. In enterprise applications, the most important attributes are speed, because [PCM will be] sitting close to the main memory where there are lots of transactions per second, and the endurance of device. We must make sure the device can write and read many numbers of times.”

A slow industry change
IBM isn’t the only one working on PCM–others include memory manufacturing leaders such as Hynix, Samsung, and Micron. Intel, which stopped making memory decades ago, is researching PCM. And many academic researchers are tackling engineering challenges, too. Two recent examples: a Stanford group is working on technology using carbon nanotubes to make PCM cells more compact, and researchers at the University of California-San Diego have built a 10GB PCM storage drive prototype called Onyx.

IBM doesn’t intend to manufacture phase-change memory chips, but instead plans to license its technology to other makers, Pozidis said.

PCM has been a very long time coming. None other than Intel co-founder Gordon Moore wrote about the phase-change memory idea in a 1970 paper. Intel has used the term ovonics to describe the technology, but other names include PRAM, PCRAM, and chalcogenide RAM–the last named after the special material at the heart of phase-change memory.

Server-grade PCM could arrive in 2016, but other markets with different requirements are moving faster. For example, Samsung sells PCM chips for use in mobile phones as a replacement for the “NOR” type of flash memory.

But servers–the powerful networked computers that host Web sites, exchange e-mail, and conduct financial transactions–are a huge market ripe for transition. Flash memory has made some inroads into the server market in the form of solid-state disks (SSDs) that offer significant performance increases over hard drives. But in addition to being expensive, flash SSDs essentially wear out as data is read and written over and over.

Flash degrades at about 30,000 write cycles for business-grade storage products and 3,000 write cycles for consumer-grade flash, IBM said. Flash memory controllers sidestep this problem by moving data to fresh flash memory cells, but performance drops over time. In comparison, PCM can endure at least 10 million write cycles, IBM said.
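To put those endurance figures side by side, here is a minimal arithmetic sketch. The cycle counts are the ones IBM cites above; the rewrite rate of 100 writes per cell per day is a made-up workload chosen only for illustration, and the calculation ignores the wear-leveling tricks that real controllers use.

```python
# Write-cycle limits quoted by IBM in the article.
FLASH_ENTERPRISE_CYCLES = 30_000
FLASH_CONSUMER_CYCLES = 3_000
PCM_CYCLES = 10_000_000  # "at least" this many, per IBM


def days_until_worn(cycle_limit, rewrites_per_day):
    """Days until one cell hits its limit at a constant rewrite rate.

    rewrites_per_day is a hypothetical workload figure, not a measurement.
    """
    return cycle_limit / rewrites_per_day


if __name__ == "__main__":
    rate = 100  # hypothetical rewrites per cell per day
    for name, limit in [
        ("consumer flash", FLASH_CONSUMER_CYCLES),
        ("enterprise flash", FLASH_ENTERPRISE_CYCLES),
        ("PCM", PCM_CYCLES),
    ]:
        print(f"{name}: {days_until_worn(limit, rate):,.0f} days")
```

By these numbers PCM tolerates roughly 333 times as many writes as business-grade flash, whatever the workload.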

Pozidis doesn’t expect PCM to replace DRAM, which can read and write data much more quickly. But it could boost DRAM performance by caching data for fast access when it’s needed again.

How’s it work?
Phase-change memory has a simple basic design for recording data: heat changes the electrical properties of a tiny patch of the glasslike chalcogenide material.

At left, a schematic showing how phase-change memory (PCM) cells can be addressed, and at right, a close-up view of the phase-change element (PCE) and its contacts with electrodes.

(Credit: IBM)

When cooled quickly, the material’s molecules stay in the jumbled, amorphous state they’re in when the material is hot. When cooled relatively slowly, though, the molecules align into a crystalline lattice that happens to transmit electricity much better. By measuring this electrical resistance, a device can figure out what number the cell is storing, and by heating it up and cooling it in a controlled way, new data can be written.

With the multilevel approach, the cell is cooled at intermediate rates so that four different states between crystalline and amorphous can be used. With four states, two bits of data can be stored in each cell–00, 01, 10, and 11 in binary terms–doubling the density of a memory chip and reducing the cost to store a given amount of data.
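As a rough illustration of how four resistance states become two bits, the sketch below reads a cell by comparing its measured resistance against three boundaries. The threshold values and the assignment of bit patterns to levels are assumptions made up for this example; the article does not give IBM's actual figures.

```python
import bisect

# Hypothetical boundaries (ohms) between the four resistance levels,
# from most crystalline (low resistance) to most amorphous (high).
THRESHOLDS = [1e4, 1e5, 1e6]
# Assumed bit pattern for each level; the real mapping is not stated.
SYMBOLS = ["00", "01", "10", "11"]


def read_cell(resistance_ohms):
    """Return the two-bit symbol for one measured resistance."""
    level = bisect.bisect_right(THRESHOLDS, resistance_ohms)
    return SYMBOLS[level]


def read_cells(resistances):
    """Concatenate the symbols read from a sequence of cells."""
    return "".join(read_cell(r) for r in resistances)
```

For example, `read_cells([5e3, 5e6])` yields `"0011"`: four bits from two cells, where a single-level design would need four cells.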

That’s not enough, though. Today’s flash memory can use eight states, meaning that three bits of data can be stored per cell.

“PCM has to get there,” Pozidis said. “We believe we can get there.”

In fact, he thinks PCM could go even farther. “Potentially using different materials, I believe we can go to four bits per cell,” he said.

IBM demonstrated its multilevel cell technology on a chip with 256 million cells; by storing two bits per cell, its capacity is 512 megabits. The drift-tolerant technology was used on a smaller 2 megabit version, Pozidis said. Both were built with an older 90-nanometer manufacturing process that can create features as small as 90 billionths of a meter.
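The capacity arithmetic is simple enough to check directly. This short sketch assumes the number of levels is a power of two, so that 4, 8, and 16 levels give two, three, and four bits per cell; the function names are invented for the example.

```python
def bits_per_cell(levels):
    """Bits stored by a cell with the given number of distinct levels."""
    if levels < 2 or levels & (levels - 1):
        raise ValueError("levels must be a power of two, at least 2")
    return levels.bit_length() - 1


def capacity_bits(cells, levels):
    """Total capacity in bits of a chip with `cells` multilevel cells."""
    return cells * bits_per_cell(levels)


if __name__ == "__main__":
    cells = 256_000_000  # the prototype's cell count, per the article
    for levels in (2, 4, 8, 16):
        megabits = capacity_bits(cells, levels) / 1_000_000
        print(f"{levels} levels: {megabits:.0f} megabits")
```

With four levels the same 256 million cells hold 512 megabits; the three-bit and four-bit designs Pozidis mentions would hold 768 and 1,024 megabits on the same cell count.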

Key to making PCM compete on cost will be shrinking it to modern manufacturing processes; flash today is built with a 24nm process. Pozidis is confident.

“Phase change can scale to much lower dimensions,” he said.

One problem with PCM is that the electrical resistance level that records data drifts over time. This graph shows drift in two memory cells, each able to store data with four levels of resistance. The resistance of one cell, storing level 3, drifts upward faster than the average shown with the blue dotted line until it’s actually greater than that of another cell storing level 2. That cell’s resistance, shown with the pink line, drifted more slowly than the average level-2 cell.

(Credit: IBM)

Catch the drift
The finer the distinctions between different levels of PCM resistance, though, the sooner the problem called resistance drift becomes serious. With drift, the electrical resistance of a particular cell changes over time, blurring the boundaries between different levels and risking data corruption. It’s hard to handle, because different cells drift at different rates.
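To see why uneven drift defeats a fixed read threshold, consider the sketch below. It assumes a power-law drift model, R(t) = R0 * (t / t0) ** nu, a form often used to describe PCM drift; the article gives no formula, and the starting resistance, threshold, and drift exponents here are invented for illustration.

```python
# Assumption: a power-law drift model, R(t) = R0 * (t / t0) ** nu.
# The model and every number below are illustrative, not IBM's data.

SECONDS_PER_DAY = 24 * 60 * 60


def drifted_resistance(r0_ohms, t_seconds, nu, t0_seconds=1.0):
    """Resistance after t_seconds for a cell programmed to r0_ohms."""
    return r0_ohms * (t_seconds / t0_seconds) ** nu


def misread(r0_ohms, t_seconds, nu, upper_threshold_ohms):
    """True if drift pushes the cell past the fixed threshold above it."""
    return drifted_resistance(r0_ohms, t_seconds, nu) > upper_threshold_ohms


if __name__ == "__main__":
    threshold = 1e5            # hypothetical boundary to the next level
    t = 37 * SECONDS_PER_DAY   # the 37-day span mentioned in the paper
    for nu in (0.02, 0.10):    # a slow-drifting and a fast-drifting cell
        r = drifted_resistance(6e4, t, nu)
        print(f"nu={nu}: {r:.3g} ohms, misread={misread(6e4, t, nu, threshold)}")
```

Under these assumed numbers, two cells programmed to the same 60,000 ohms end up on opposite sides of the boundary: the slow one still reads correctly, while the fast one is mistaken for the next level up.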

IBM’s approach to the problem uses some cells to record what IBM calls a codeword rather than actual data. The approach, which IBM calls modulation coding, lets IBM rely on measuring relative properties of the cells, not the absolute electrical resistance itself.

“We’ve designed modulation coding so it stores information not on the absolute resistance of levels, which we know will change, but on the relative ordering,” Pozidis said.
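The idea of storing data in relative ordering can be shown with a toy rank-based code. This is not IBM's modulation code, whose details the article doesn't describe; it is a simplified sketch in which a group of four cells encodes one of 24 values by which cell has the lowest resistance, which the next lowest, and so on. The resistance levels and function names are hypothetical.

```python
from itertools import permutations

# Hypothetical programmed resistances in ohms, one per rank (low to high).
LEVELS = [1e4, 1e5, 1e6, 1e7]
# Every possible ranking of four cells: 24 codewords.
CODEBOOK = list(permutations(range(len(LEVELS))))


def encode(value):
    """Return four cell resistances whose ranking represents value (0-23)."""
    return [LEVELS[rank] for rank in CODEBOOK[value]]


def decode(resistances):
    """Recover the value from the relative ordering of the cells alone."""
    order = sorted(range(len(resistances)), key=lambda i: resistances[i])
    ranks = [0] * len(resistances)
    for rank, cell in enumerate(order):
        ranks[cell] = rank
    return CODEBOOK.index(tuple(ranks))


if __name__ == "__main__":
    cells = encode(17)
    # Each cell drifts upward by a different, made-up factor.
    drifted = [r * f for r, f in zip(cells, [1.3, 4.5, 2.0, 3.1])]
    print(decode(cells), decode(drifted))  # both decode to 17
```

Because the decoder never looks at absolute values, the data survives as long as drift doesn't reorder the cells. The price in this toy version is capacity: 24 orderings carry fewer bits than four cells read against fixed thresholds would.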

The upshot is an error rate that makes the technology practical.

“It is quite impressive that drift-tolerant coding exhibits a raw error rate around 10^-5 [one error in 100,000 memory cells] even after 37 days at room temperature,” the paper said. “Simple, low-redundancy error-correction codes could then be sufficient to bring the overall error rate down to levels around 10^-15 [one error in 1,000,000,000,000,000 cells] or less, which are required for practical memory devices.”

Even within the stilted language of academic papers, the excitement comes through.