Thursday, April 20, 2017

Hidden Figures: Lessons in Race, Gender, and the Importance of Funding Science

When I finished reading Rise of the Rocket Girls, I had to stop in my local book shop to grab Hidden Figures. While Rocket Girls details the women who worked as human computers at the Jet Propulsion Lab on the west coast, Hidden Figures focuses on a group of African American women working as human computers on the east coast at the Langley Research Center, first as part of the National Advisory Committee for Aeronautics and later part of NASA.

Hidden Figures is a bit denser in terms of the history and a bit less focused on the narratives of the individual women at the heart of the story. The author, Margot Lee Shetterly, has beautifully described how she came to tell this story, which is deeply connected to her life in Virginia, where her father worked for NASA's Langley Research Center and where all the family friends were connected to NASA in some way. As a result, the author has a unique perspective on the lives of the young women who broke the boundaries at the NACA/NASA. Based on her initial research and interviews, she started The Human Computer Project, which aimed to document all the people who worked in this capacity at NACA/NASA. As she did this, she started to find the story that would become the book Hidden Figures. This careful research and detail is evident on every page of the book. In the epilogue, the author describes how hard it was to choose what stories to tell: if you're finally bringing people's contributions to the space race to light, it can be hard to leave anything out!

As a result, the book is structured around the historical events that the women are connected with and driving forward. Interestingly, the movie complements the story perfectly in this respect. I admit that, in an unusual turn, I decided to watch the film as I was reading the book; this really helped me put the stories together into a linear (albeit much too simple) narrative. Don't get me wrong: the film was not without fault, though its oversimplification of the story was not as glaring as its white savior problem. (History vs. Hollywood has a good summary of the things the film got right and wrong.)

Because Hidden Figures has achieved such strong critical acclaim, there are plenty of excellent reviews of the content (e.g., The New York Times and The Atlantic), so I decided to focus instead on why it's so important that this story has been brought to light in 2017:
- We need to remember the realities of segregation.
- We need to remember the limited opportunities available for women in mid-century America.
- We need to remember what can be achieved when we invest in science and technology.

Remembering the realities of segregation
Here is where I need to ask forgiveness for my ignorance of history. I have been studying science for many years, and this focus has meant that I have missed a lot of other important things, with history probably being at the top of the list. In my defense, I bet that lots of people, especially white folks or those educated in the South, were never taught about the realities of segregation. This is the first reason I think this book seems particularly relevant right now: we are at a time in our history when we seem to be moving backwards in terms of civil rights, and we are at risk of wiping out the truth of these events that happened only 60 years ago. Just as some deny the moon landing or the roundness of Earth, some want to erase this history, which they paint as being exaggerated (e.g., Betsy DeVos's suggestion that HBCUs were pioneers in school choice and the revision of Texas state history textbooks to downplay Jim Crow).

Hidden Figures also reminds us that even after the obvious signs of segregation were removed, the less obvious and more pernicious elements of segregation persisted. For example, the book details how computer and engineer Mary Jackson's son was the "first colored boy to win" the local All American Soap Box Derby. While there was no explicit rule that black boys couldn't enter the race, it just wasn't something that would register to people in the black community. Even if someone did hear about such an event, there were still barriers to participation, which Shetterly describes perfectly:
"The electrified fence of segregation and the centuries of shocks it delivered so effectively circumscribed the lives of American blacks that even after the current was turned off, the idea of climbing over the fence inspired dread."
Unfortunately, people of color continue to be under-represented in STEM fields. Data from the National Science Foundation show that minorities (of any gender) make up only 10% of the STEM work force, despite being about a quarter of the population (statistics from 2015). A 2015 survey of women of color in STEM reports on the "Double Jeopardy" that these women face in their jobs. Women reported being mistaken for janitors and that they were likely to have to "prove themselves again and again" (covered here).

Katherine Johnson (Taraji Henson) at work in Hidden Figures

Remembering the limited options for women in mid-century America
The story of the female computers at Langley Research Center also highlights the contribution of women in STEM before the age of women's liberation. Before that time, women were generally limited to jobs as nurses, teachers, or secretaries. Even the jobs that women filled during the war were chosen for their simple and repetitive nature. This is why so many women joined computing groups: these were considered to be menial and repetitive tasks that required attention to detail. Of course, even when women got the jobs that were suitable for their skills, they found their employers willing to say goodbye as soon as they got married or pregnant. (Remember that employers in the US were not required to give family leave until 1993.)

Even today, women are still not equally represented in STEM fields. Perhaps most troubling is the "leaky pipeline", where women are earning as many advanced degrees as men but are not securing high-level jobs. There are a number of causes for these inequities (e.g., the confidence gap), but few solutions are clear.

Implicit bias is another problem. The Implicit Bias test can help uncover your blind spots. As a woman of science, I was surprised to learn that I had some bias in associating men with science, particularly engineering and physics, where women are still in the lowest numbers. Interestingly, the author of Hidden Figures always thought scientists and engineers were black and middle class because that is what she grew up around. Hidden Figures reflects Shetterly's upbringing by portraying many positive black women in STEM. Recent data shows that girls as young as 6 start self-selecting out of STEM fields and already associate boys with inherent smartness and girls with hard work (coverage in the LA Times). Thus, it seems we need to start raising awareness of the prevalence of women in science and engineering. Some suggest that seeing more women and people of color in roles like this can encourage young women to choose STEM careers.

It is frustrating to consider how many stories like Hidden Figures and Rise of the Rocket Girls still need to be uncovered. I recently read a post from Hilda Bastian, who has started a crowd-sourced project to get pictures of female scientists for their Wikipedia pages. She explains her rationale: "Pictures are one of the main drivers for whose stories get told and shared. So expanding the pool of women we can 'see' matters." Going forward, I plan to contribute to this project and look for ways to highlight the accomplishments of women in my daily work.

Remembering what we can achieve when we fund science and technology
Finally, Hidden Figures highlights a time when America invested significant financial resources into science and engineering. The "space race" was clearly driven by a competitive and antagonistic spirit rather than an adventurous and scientific one; our entire purpose for going into space was to beat the Russians. (Ah, mid-century America, when our relationship with Russia was less complex.)

Of course, the decision to put "Whitey on the Moon" rather than focus on the problems at home received criticism, especially when the competition for resources was an ideological war thousands of miles away (link checks out!). It's always been difficult for scientists to make the case for scientific funding. Our elected representatives might not understand why a project on the axolotl salamander could have benefits for regenerative medicine in humans, why a study on fruit flies could have an impact on cancer, or how understanding the way an animal poisons its prey could help us develop painkillers. But now more than ever, we need to invest in science.

The Apollo missions gave us many important, indirect benefits: cell phone cameras and baby formula, improved athletic footwear and solar panels, and memory foam and precision GPS. Reminding people of what great things can be accomplished when we adequately fund science is an important part of doing science these days. Funding for NASA and other scientific organizations in the government has an excellent return on investment, which I plan to detail further in my next post about the upcoming Science March.

Every day in this post-truth world, I really try to stay positive about the future. For me, Hidden Figures helped. I think the story highlights a time that is considered to be the height of the American empire (read: when America was Great), when these young, gifted, and black women put pencil to paper to propel us to the moon and beyond. I plan to watch Hidden Figures with my son and remind him of these important lessons. I recommend reading or watching Hidden Figures so you can benefit from Shetterly's perspective and knowledge as well.

Wednesday, March 15, 2017

Your next generation data storage solution: DNA

I recently stumbled upon the image on the right, which distills the changes in data storage in the past 40-plus years. Perhaps even more amazing to consider is that the future of data storage could become even smaller. The genetic material that stores all the information required to build a person or a pear or a penguin may be the key to creating even smaller data storage that never reaches obsolescence.

Our genome is often compared to a computer, where DNA is the code. In fact, DNA is a proven data storage system with billions of years of reliable use. While your old floppy disks may now be unreadable, the tools required to read and copy DNA are present in every genome, making it unlikely that we would lose the ability to decode DNA. These advantages led scientists to ask: could DNA also be used to store other types of data? Perhaps the information that would normally be encoded by 0's and 1's in your hard drive could be stored in sequences based on ACGT's.

The first publication to propose that DNA could be used for purposes other than building an organism came in 1999 from Bancroft and colleagues in the journal Nature. They suggested that genomic steganography could be a method for storing coded messages in DNA for use in espionage. Using a simple substitution cipher where each codon equals an alphanumeric value, the researchers synthesized a DNA sequence to encode the message "June 6 invasion: Normandy". The message was flanked by sequences to allow the recipient to decode the message. The final sequence of just 109 nucleotides of DNA was hidden within denatured human DNA and, just like the predecessor microdots used in espionage, embedded on top of a period in a typewritten message.
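
To make the substitution-cipher idea concrete, here is a minimal Python sketch. The codon table is a hypothetical one invented for illustration (characters are simply assigned to codons in alphabetical order); it is not the table the researchers used, and the flanking sequences are left out.

```python
from itertools import product

# Hypothetical alphabet: letters, digits, and a little punctuation (40 symbols).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 :.,"

# All 64 possible codons in alphabetical order: AAA, AAC, ..., TTT.
CODONS = ["".join(bases) for bases in product("ACGT", repeat=3)]

# Assumed mapping: the n-th symbol gets the n-th codon (unused codons are ignored).
ENCODE = dict(zip(ALPHABET, CODONS))
DECODE = {codon: symbol for symbol, codon in ENCODE.items()}


def encode_message(message):
    """Replace every character with its three-base codon."""
    return "".join(ENCODE[ch] for ch in message.upper())


def decode_message(dna):
    """Read the strand three bases at a time and look each codon up."""
    return "".join(DECODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


secret = encode_message("June 6 invasion: Normandy")
print(secret)
print(decode_message(secret))  # JUNE 6 INVASION: NORMANDY
```

With three bases per character, the 25-character message needs 75 nucleotides before any flanking sequences are added.
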
Subsequent work from Bancroft's group and others in the early 2000s suggested that DNA could help to address the need for increasing data storage. Computer scientists estimate that by 2020, there will be 4.4 x 10^22 bytes (44 zettabytes) of digital data; to give you a sense of scale, 1 ZB would be about 152 million years of high resolution video. Even with the advances in storage potential, storing just 1 ZB requires more than 1000 kilograms of the cobalt alloy used to make hard drives. In contrast, 1 gram of DNA could store 4.6 x 10^18 bytes.
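
Taking the figures in this paragraph at face value, a quick back-of-the-envelope calculation in Python shows how the two media compare. The only number not taken from the text is the SI definition of a zettabyte (10^21 bytes); the function name is made up for this sketch.

```python
ZETTABYTE = 10**21  # bytes in one zettabyte (SI definition)

projected_data_bytes = 44 * ZETTABYTE  # 44 ZB of digital data projected for 2020
dna_bytes_per_gram = 4.6e18            # DNA capacity per gram quoted above
cobalt_kg_per_zb = 1000                # lower bound quoted above for hard drives


def grams_of_dna_needed(total_bytes, bytes_per_gram=dna_bytes_per_gram):
    """Mass of DNA (in grams) needed to hold total_bytes at the quoted density."""
    return total_bytes / bytes_per_gram


dna_kg = grams_of_dna_needed(projected_data_bytes) / 1000
cobalt_kg = 44 * cobalt_kg_per_zb

print(f"DNA: about {dna_kg:.1f} kg")                  # DNA: about 9.6 kg
print(f"Cobalt alloy: more than {cobalt_kg:,} kg")    # Cobalt alloy: more than 44,000 kg
```
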

Early publications were proof-of-principle experiments that aimed to generate increasingly bigger data files encoded in DNA. The general approach, outlined above, converts a digital file to binary and then to DNA. The beginnings were admittedly small, just as scientists had to sequence the genome of E. coli before they could complete the human genome. One problem is that DNA sequencing technology is improving at much faster rates than DNA synthesis techniques. Essentially, you could read the data you stored faster and cheaper than you could write it. Creating long, accurate strands of DNA had technical and financial limitations. To circumvent this problem, George Church's lab used multiple copies of short DNA sequences to encode an entire book (53,246 words), 11 JPG images, and a JavaScript program. The paper, published in Science in 2012, also describes the recovery and reassembly process. The following year, a Nature paper from Ewan Birney's lab at the European Bioinformatics Institute reported a similar approach that increased the file size and decreased decoding errors. The final DNA file consisted of 739 KB of information, including text, pictures, videos, and audio files; they also added a PDF of the classic Nature paper from Watson and Crick describing the structure of DNA.
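
The file-to-binary-to-DNA step can be sketched in Python as follows. This is only an illustration under simple assumptions: it uses a naive mapping of two bits per base and plain index tags on each short fragment, which is not the encoding used in either paper, and all function names are hypothetical.

```python
# Assumed mapping for illustration: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data):
    """Write each byte as 8 bits, then turn every pair of bits into one base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(dna):
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def split_into_fragments(dna, fragment_len=20):
    """Cut a long strand into short pieces, each tagged with its position."""
    starts = range(0, len(dna), fragment_len)
    return [(index, dna[s:s + fragment_len]) for index, s in enumerate(starts)]


def reassemble(fragments):
    """Sort the pieces by their position tag and join them back together."""
    return "".join(piece for _, piece in sorted(fragments))


print(bytes_to_dna(b"Hi"))  # CAGACGGC

strand = bytes_to_dna(b"Hidden Figures")
fragments = split_into_fragments(strand)
fragments.reverse()  # pretend the pieces were read back out of order
print(dna_to_bytes(reassemble(fragments)))  # b'Hidden Figures'
```

Splitting the strand into short, indexed fragments mirrors the workaround described above: short sequences are easier to synthesize, and the index lets the reader put them back in order.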

In July 2016, researchers from the University of Washington collaborated with Microsoft to push the limits of DNA storage again (coverage in The Verge). Their storage reached 200 MB and included copies of the Universal Declaration of Human Rights, the top 100 books from Project Gutenberg, and the Crop Trust seed database; for fun, they encoded a video from the band OK Go for the song "This Too Shall Pass".

Most recently, a paper in Science from Yaniv Erlich and Dina Zielinski, who are working at the intersection of molecular biology and computer science, details a new storage architecture for more efficient DNA storage. They adapted fountain coding, which is currently used by streaming services like Netflix and Spotify to eliminate gaps in playback. The method greatly improved the storage density, getting closer to the theoretical limit for DNA storage (1.83 bits per nucleotide). Their DNA storage sample included the movie The Arrival of a Train, an entire computer operating system, a computer virus, and an Amazon gift card (which was quickly decoded by one of the researchers' Twitter followers). While the size of the data was smaller than previous attempts (only 2.2 MB), the method greatly improved data density and readability. One problem with previous storage methods is that reading the DNA leads to loss of the original sample. While it is easy to amplify DNA, it can sometimes introduce mistakes. Erlich and Zielinski's fountain technique permitted error-free amplification even after 10 complete reads. Their work achieved a density of 2.15 x 10^18 bytes, which would allow storage of all the world's data in the trunk of a car.
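
The general idea behind a fountain code can be sketched in Python. This is a toy version under loud assumptions: it picks how many segments to combine uniformly at random (real fountain codes use a carefully tuned distribution), it ignores any DNA-specific constraints, and it does not attempt to reproduce the authors' pipeline. All names are hypothetical. Each "droplet" is the XOR of a random subset of file segments, and the subset can be rebuilt from the droplet's seed alone, so the receiver can recover the file from whichever droplets happen to arrive.

```python
import random


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def pick_segments(seed, n_segments):
    """Rebuild the subset of segment indices from the seed alone."""
    rng = random.Random(seed)
    degree = rng.randint(1, n_segments)  # assumed uniform degree, for simplicity
    return set(rng.sample(range(n_segments), degree))


def make_droplet(segments, seed):
    """XOR together the segments selected by this seed."""
    payload = bytes(len(segments[0]))
    for i in pick_segments(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def decode(droplets, n_segments):
    """Peeling decoder: returns all segments, or None if more droplets are needed."""
    known = {}
    pending = [(pick_segments(seed, n_segments), data) for seed, data in droplets]
    progress = True
    while progress and len(known) < n_segments:
        progress = False
        still_pending = []
        for indices, data in pending:
            # Remove the contribution of segments that are already recovered.
            solved = {i for i in indices if i in known}
            for i in solved:
                data = xor_bytes(data, known[i])
            unsolved = indices - solved
            if len(unsolved) == 1:
                known[unsolved.pop()] = data  # exactly one unknown left: solved
                progress = True
            elif unsolved:
                still_pending.append((unsolved, data))
        pending = still_pending
    if len(known) < n_segments:
        return None
    return [known[i] for i in range(n_segments)]


message = b"DNA is a proven data storage system."
SEGMENT_LEN = 6
padded = message.ljust(-(-len(message) // SEGMENT_LEN) * SEGMENT_LEN, b"\0")
segments = [padded[i:i + SEGMENT_LEN] for i in range(0, len(padded), SEGMENT_LEN)]

received, recovered, seed = [], None, 0
while recovered is None:
    droplet = make_droplet(segments, seed)
    seed += 1
    if seed % 3 == 0:  # simulate losing every third droplet
        continue
    received.append(droplet)
    recovered = decode(received, len(segments))

print(b"".join(recovered).rstrip(b"\0"))
print(f"recovered from {len(received)} droplets")
```

The point of the design is that no particular droplet matters: the sender keeps producing droplets, some are lost, and decoding succeeds once enough of them have arrived.
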
Another stumbling block was that DNA was writable, but not re-writable, which limited the applications to archival data storage. Two recent papers (in Nature Communications and PNAS) report on a method that allows re-writing of DNA (bringing us from 8 track to cassette tapes) as well as reading from any point in the sample, rather than from a set starting spot (bringing us from cassette to CD).
1 gram of DNA can store 4.5 x 10^18 bytes

While there has been tremendous progress in increasing the amount and density of data storage, the major roadblock continues to be the amount of time it takes to encode and decode data in DNA. Another place where inorganic data storage beats carbon-based products is in the cost, especially of synthesizing DNA. In the most recent paper, the cost was $3,500/MB, while in the 2012 paper it was $12,400/MB.

Despite these limitations, biologists are teaming up with computer scientists to explore the future of DNA data storage. This is largely driven by the need to store increasing amounts of digital data with decreasing resources. Estimates indicate that by 2040 global memory demand (3 x 10^24 bytes) will exceed the supply of silicon necessary to build traditional data storage devices.

Obsolescence is another shortcoming of current storage methods. Just as it has become difficult to play your cassette tape collection (much to my chagrin), your old floppies and ZIP disks are not readable either. Scientists conjecture that because DNA is the basis for life on Earth, we will always have methods for DNA sequencing. This gives DNA a huge advantage for long-term archival storage. Luckily, DNA also has great fidelity over the long term. Scientists are increasingly able to recover readable sequences from ancient samples of DNA, with the best results coming from samples stored at low temperature. Thus, you could imagine a long-term storage system, like a secure server in a remote tundra, where the DNA backup disk to re-start civilization would be stable and safe.

This isn't completely crazy. The Svalbard Global Seed Vault is a huge storage site in the frozen tundra of Norway where scientists and governments are making contributions of plant seeds. The idea is to keep a stock of the original seed in case of the collapse of civilization. I am sure we could rent a shoebox-sized space there for storing all the relevant files from humankind (that means there probably won't be room for cat videos). It is certain that resource limitations will continue to make digital DNA storage, born of a thought experiment over beer, not just a reality but a necessity.

Scientific American, Tech Turns to Biology as Data Storage Needs Explode
George Church interview in Popular Science
Ed Yong covered the DNA fountain technique in The Atlantic