ESPA – Storing Data on Web3

– So Shoah is the Hebrew word for the Holocaust. We are an organization that was founded by Steven Spielberg after the movie Schindler's List, which came out in 1993, and then in 1994 we began collecting interviews around the world. We actually opened up in 56 countries and ended up collecting 52,000 interviews from 1994 through 2000, recording as many Holocaust survivors and witnesses as possible. That data is 25 petabytes uncompressed, but compressed it's six petabytes of content. We collected in 44 languages, and in '94 the web was first coming out, so our goal was to make every minute of video searchable like it was a webpage: by keywords, people's names, latitude and longitude, images, et cetera. I can show you a little bit about what that looks like.

And then in 2006, as we became more and more focused on education and on our mission to develop empathy, understanding and respect through testimony (we're in half the high schools in the United States with our tolerance education programs, and we're all over the world with access), we were gifted to the University of Southern California. USC was very interested in bringing in large digital collections that were of interest to its faculty and community. One of the things they did when they brought us in was expand our scope beyond the Holocaust to other genocides, and we're now on our 14th mass atrocity. As we know, the heat is sort of growing in our world, so there's no limit to the number of survivors who want to tell their stories that we're able to record, or to the lessons we can learn from them.

So, going to the next slide, let me talk a little bit about the life of a testimony. Again, we are a use case for loading content onto Filecoin, and I'll tell you where we are on that in a moment. What happens is the tapes come in. We don't only do new interviews; we also go to museums and other organizations and work with their interviews. They bring them in to get them cataloged and preserved, and the reason we do that is because everything rots. The conservative numbers are that you get 50 years for film before you see age-based rot, 20 years on videotape, five years on hard drive, three years on LTO, and two years on optical. The newer the technology, unfortunately, the faster it has been rotting, and that's been an issue for many organizations.
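The conservative lifespans quoted above can be turned into a migration deadline for each copy. The sketch below is a minimal illustration that treats those figures as fixed year counts; the `LIFESPAN_YEARS` table, the `migrate_by` function and its `margin_years` parameter are hypothetical names, not part of any Shoah Foundation system.

```python
from datetime import date

# Hypothetical table: conservative years before age-based rot, as quoted in the talk.
LIFESPAN_YEARS = {
    "film": 50,
    "videotape": 20,
    "hard_drive": 5,
    "lto_tape": 3,
    "optical": 2,
}

def migrate_by(medium: str, written: date, margin_years: int = 0) -> date:
    """Latest date to copy data off `medium`, optionally leaving a safety margin."""
    years = LIFESPAN_YEARS[medium] - margin_years
    if years <= 0:
        raise ValueError("margin is as long as the medium's lifespan")
    try:
        return written.replace(year=written.year + years)
    except ValueError:
        # Written on Feb 29 and the target year is not a leap year: fall back to Feb 28.
        return written.replace(year=written.year + years, day=28)

if __name__ == "__main__":
    for medium in LIFESPAN_YEARS:
        print(medium, migrate_by(medium, date(2022, 5, 1)))
```

Sorting the table by lifespan makes the speaker's point visible: the newer the medium, the shorter the window before a copy has to move.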

So we help organizations get their older tapes and other media containing interviews digitized, preserved and cataloged for use in schools and other places, as well as collecting new interviews. We do the preparation and the digitization, and we do all kinds of quality control, because video is a very manual, mechanical process and all kinds of different things can go wrong.

We restore as needed, and then we catalog and index, again making every minute searchable like it's its own webpage. We have the Visual History Archive as a sort of middleware, on top of which we build many different user interfaces for our customer base: teachers and educators, scholars and researchers around the world, and communities of users, such as second- and third-generation descendants of survivors of the Armenian Genocide, the Holocaust and the other genocides in our collections. Then we get into digital preservation, which is making sure the bits stay around. Before I go into the preservation part, which is where we are leveraging Filecoin, let me talk about the size again. It's 116,000 hours, searchable by almost 70,000 keywords. We also have a very sizable database of images shown during the videos, which lets you search into the video, and of the names of people, which helps a lot with various genealogy efforts, with getting survivors back together, and with helping people find family they may have lost.
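As a rough check on the figures quoted in the talk (25 petabytes uncompressed, six petabytes compressed, about 116,000 hours), the arithmetic below computes the compression ratio and the average storage per hour of testimony. It assumes decimal petabytes and that both totals cover the same 116,000 hours; the talk states neither, so treat the results as orders of magnitude.

```python
# Figures quoted in the talk; decimal units (1 PB = 10**15 bytes) are an assumption.
PB = 10**15
GB = 10**9

uncompressed_bytes = 25 * PB
compressed_bytes = 6 * PB
hours = 116_000

def compression_ratio(raw: int, packed: int) -> float:
    """How many times smaller the compressed archive is."""
    return raw / packed

def gb_per_hour(total_bytes: int, total_hours: int) -> float:
    """Average storage per hour of testimony, in decimal gigabytes."""
    return total_bytes / total_hours / GB

if __name__ == "__main__":
    print(f"compression ratio: {compression_ratio(uncompressed_bytes, compressed_bytes):.2f}:1")
    print(f"uncompressed: {gb_per_hour(uncompressed_bytes, hours):.0f} GB/hour")
    print(f"compressed:   {gb_per_hour(compressed_bytes, hours):.0f} GB/hour")
```

Under those assumptions the ratio is roughly 4.2:1, or about 216 GB per hour uncompressed versus about 52 GB per hour compressed.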

So the top-level concept around preservation that we use is one called a Supertext. It was coined by Steve Sample, who was the president of USC a few presidents ago. The basic concept is that we've had reading and writing for 5,000 years, so we have content like Shakespeare, Cicero, the Quran, the Bible and the Torah that has made it through hundreds or thousands of years. The question is, what do you have to do with moving images to give them the same opportunity to become a Supertext that makes it through time? Especially since we've only had moving images for a little over 140 years, there are no moving-image Supertexts yet, because moving images simply haven't been around that long. So with all of these testimonies, we want to give them a chance to make it through time with the lessons that they teach, and that is our challenge when it comes to preservation.

How do we give audiovisual material a chance to become a Supertext? We've developed a Bit-Level Preservation Continuum. On one side of the continuum, we use the internet, which has a lot of geographic distribution and uses a lot of electricity, to store and preserve the actual content. We have a project that we started with Stanford called Starling Lab, and I'm sure you've heard some things about that lab. It's been a wonderful partnership for us, and it uses blockchain, and Filecoin specifically, to attempt to store and preserve the testimonies. Then we have the things we've been doing for many, many years: data centers filled with tape robots in different places around the world (East Coast, West Coast, Europe). We've also added cloud data centers, Microsoft Azure and AWS, synchronizing them, running fixity checks on the health of the video files in those environments, and updating the files as different pieces of content rot or get out of sync over time. That is the solution we currently have in place.
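The synchronize-and-check loop described above can be sketched as a fixity check across replicas: each site's copy is hashed and compared with a checksum recorded at ingest, and any copy that no longer matches is overwritten from one that does. This is a minimal illustration assuming SHA-256 checksums and in-memory byte strings; the function names and site names are hypothetical, and a real system would stream large files and log every repair.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest used as the fixity value."""
    return hashlib.sha256(data).hexdigest()

def check_and_repair(replicas: dict, expected: str) -> list:
    """Verify every site's copy against the checksum recorded at ingest and
    overwrite corrupted copies with an intact one. Returns the repaired sites."""
    intact = [site for site, data in replicas.items() if sha256_hex(data) == expected]
    if not intact:
        raise RuntimeError("no intact replica left; restore from another source")
    good_copy = replicas[intact[0]]
    repaired = []
    for site in list(replicas):
        if site not in intact:
            replicas[site] = good_copy
            repaired.append(site)
    return repaired

if __name__ == "__main__":
    testimony = b"...video bytes..."
    checksum = sha256_hex(testimony)
    # Hypothetical sites; one copy has silently rotted (its last byte changed).
    sites = {
        "east_coast_tape": testimony,
        "west_coast_tape": testimony[:-1] + b"X",
        "azure": testimony,
    }
    print(check_and_repair(sites, checksum))  # ['west_coast_tape']
```

Keeping the checksum from ingest, rather than comparing replicas only against each other, means the check still works when most copies happen to agree on a corrupted value.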

We are at the research level with Filecoin now, and things are moving forward. Then on the other side of the continuum, we're looking at things that could actually hold audiovisual content for a very long time, like DNA, stone and various forms of glass. When you get there, there are a couple of different things you worry about, such as the reader used to retrieve content stored for a long time on glass or silica.

With these see-through media you can use a microscope to read the data. But with DNA or stone, you need a different technology to read the bits of information off of them, and it's not clear how that will make it through time. Although, since we are made of DNA, we assume there will be readers and writers of DNA going into the future, so that feels like a pretty safe bet, but that is also in the research area. Basically, our idea for allowing audiovisual material to become a Supertext is to diversify within each of these areas, and across all of them, as much as possible to let the content make its way through time. From a data center point of view, we currently have a six-petabyte database for the Shoah Foundation.

At USC, we have actually used the Shoah Foundation architecture and the libraries to bring in all kinds of collections that don't necessarily have to do with genocide, and have turned it into a cloud archive for academic collections. We have 65 petabytes of data managed there, of which the Shoah Foundation is a piece. We do bit-level fixity checking, meaning that at a minimum every six months we check each and every file. That's nothing compared to what you see on the Filecoin network, but it's what we've done to date in the tape data centers. Then we migrate to brand-new media at a minimum every three years, keeping various copies around the world. What we're adding, hopefully soon, is a Filecoin copy that can be synchronized with our Microsoft Azure copies and our data center copies, to continue to allow the content to move through time in its entirety and unchanged.
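The policy described above (a fixity check on every file at least every six months, and migration to new media at least every three years) reduces to a per-copy schedule check. The sketch below approximates those intervals as 182 days and 3 × 365 days; `Copy`, `due_actions` and the location label are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Approximations of the stated minimums (assumption: 182 days and 3 * 365 days).
FIXITY_INTERVAL = timedelta(days=182)
MIGRATION_INTERVAL = timedelta(days=3 * 365)

@dataclass
class Copy:
    location: str            # hypothetical label, e.g. "west_coast_tape"
    last_fixity_check: date  # when every file in this copy was last verified
    media_written: date      # when this copy was written to its current media

def due_actions(copy: Copy, today: date) -> list:
    """List the maintenance actions that are due for one copy on `today`."""
    actions = []
    if today - copy.last_fixity_check >= FIXITY_INTERVAL:
        actions.append("fixity_check")
    if today - copy.media_written >= MIGRATION_INTERVAL:
        actions.append("migrate")
    return actions

if __name__ == "__main__":
    tape = Copy("west_coast_tape", date(2022, 1, 1), date(2021, 6, 1))
    print(due_actions(tape, date(2024, 7, 1)))  # ['fixity_check', 'migrate']
```

A Filecoin copy would slot in as one more `Copy` entry, so the same schedule can drive synchronization across tape, cloud and decentralized storage.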

I briefly mentioned Starling before, which is a framework for preservation, covering capturing, storing and verifying content, that we're working on with Stanford Engineering. I want to give a special shout-out and thanks to PiKNiK, who has been working with Starling and with us on onboarding the Shoah Foundation data. We're hoping to have all of the petabytes of Shoah Foundation content uploaded into Filecoin by the end of May. All the hard drives have been shipped to PiKNiK, the miner who's doing the uploading for us, so thank you. Thank you also to Protocol Labs and the Filecoin Foundation for supporting this initial foray of ours into blockchain and putting our content up there. We do have the content indexed down to the minute, and I thought I would quickly show an example here, if I could.

So this is a simple search engine that we have. You can go in and search by experience groups, which are the various genocides, by keywords, or by the 66,000 people or places we have, but I can just start typing in words like food, hiding, Auschwitz, and this is telling me there are 5,000 interviews that mention those three concepts. I'm then able to browse through those 5,000 interviews, and if I open one of them, what you'll see are the minutes of video where those topics are talked about, like the index at the back of a book. Let's say that we queried on food, hiding, Auschwitz; I might then be interested in identity concealment.
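The "index at the back of a book" behaviour described above is essentially an inverted index from keyword to the segments where that keyword is discussed. The sketch below is a toy illustration with made-up interview and segment numbers, not the Visual History Archive's actual schema; it treats a multi-keyword query as requiring every keyword somewhere in the interview, which is one plausible reading of "interviews that mention those three concepts".

```python
from collections import defaultdict

# Hypothetical toy data: (interview_id, segment_number, keywords indexed for that segment).
SEGMENTS = [
    (101, 22, {"food", "Auschwitz"}),
    (101, 23, {"hiding", "identity concealment"}),
    (202, 5, {"food"}),
    (202, 9, {"Auschwitz"}),
    (303, 14, {"food", "hiding", "Auschwitz"}),
]

def build_index(segments):
    """Map each keyword to the (interview, segment) pairs where it is discussed."""
    index = defaultdict(set)
    for interview, segment, keywords in segments:
        for kw in keywords:
            index[kw].add((interview, segment))
    return index

def interviews_matching_all(index, keywords):
    """Interviews mentioning every keyword somewhere (not necessarily in the same segment)."""
    sets = [{interview for interview, _ in index.get(kw, set())} for kw in keywords]
    return set.intersection(*sets) if sets else set()

def segments_for(index, interview, keyword):
    """Segment numbers in one interview tagged with `keyword`, like a book's index."""
    return sorted(seg for iv, seg in index.get(keyword, set()) if iv == interview)

if __name__ == "__main__":
    index = build_index(SEGMENTS)
    print(interviews_matching_all(index, ["food", "hiding", "Auschwitz"]))
    print(segments_for(index, 101, "identity concealment"))
```

With the toy data, the three-keyword query returns interviews 101 and 303, and looking up "identity concealment" in interview 101 returns segment 23, mirroring the jump shown in the demo.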

I would click here. It would jump to the 23rd segment of this interview, and you'd be able to start watching and see the keywords for that segment. I can now move around this interview minute by minute and jump to the various topics. Again, our goal was to have every minute searchable like it was a webpage. (indistinct) There are various ways to move through the content now: you can move through all the people mentioned, minute by minute, in the interview; you can move through by the indexing terms, again like the index at the back of a book, going minute by minute by topic; and you can go to a transcript, search on words you're interested in, and it'll tell you what timeframe they're at in the video and jump to that moment if you like. You can also move around by images: if there are any pictures or images in the archive, you can go there and click on the image.

And it'll jump to the moment in the video where that image is shown. – He is Mark's son and he is eight months old. – So again, we do this not just for Holocaust survivors: on the left here you'll see all of the various genocides for which we've been collecting content and making it available. I'm going to switch back now. That was the cataloging. We then have a number of different ways of accessing the material. IWitness is our learning management system for grades 4 through 12, with lots of activities, hundreds built by us and thousands built by teachers, for moving through the content and learning from it. For researchers we have the Visual History Archive; we have our website, which is a general tool for the community at large; we have some publicly available content on YouTube and other places; and we've been doing different kinds of testimonies where we add an interactive element, basically combining what you get from an Alexa with a testimony, so you can have a conversation with the testimonies and the survivors. We've been shooting those interviews in 3D as well as 2D so that museums can offer a more personal experience with a survivor, who isn't actually there, through their AI.
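The interactive testimonies are described only at a high level here. One simple way to picture the idea is retrieval: match a visitor's question to the closest question the survivor answered on camera, and play that pre-recorded clip. The sketch below uses word-overlap (Jaccard) similarity as a deliberately simple stand-in; the clip IDs, prompts and 0.2 threshold are hypothetical, and this is not a description of how the Foundation's system actually works.

```python
import re
from typing import Optional

# Hypothetical pre-recorded answers: clip id -> question the survivor answered on camera.
CLIPS = {
    "clip_017": "Where were you born?",
    "clip_042": "How did you survive in hiding?",
    "clip_108": "What happened to your family after the war?",
}

def tokens(text: str) -> set:
    """Lowercase word set; punctuation is dropped."""
    return set(re.findall(r"[a-z']+", text.lower()))

def jaccard(a: set, b: set) -> float:
    """Shared words divided by total distinct words (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def best_clip(question: str, clips: dict = CLIPS, threshold: float = 0.2) -> Optional[str]:
    """Return the clip whose prompt best overlaps the question, or None if nothing is close."""
    q = tokens(question)
    score, clip = max((jaccard(q, tokens(prompt)), clip_id) for clip_id, prompt in clips.items())
    return clip if score >= threshold else None

if __name__ == "__main__":
    print(best_clip("How did you survive while hiding?"))  # clip_042
    print(best_clip("Tell me about the weather"))          # None
```

Returning `None` below the threshold lets an exhibit fall back to a generic "I wasn't asked about that" clip instead of playing an unrelated answer.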

For us, we're just starting out. Well, we've been working on this for a year, doing our research and working with PiKNiK and other groups, and we think that by the middle of this year we'll be uploaded and ready to begin integrating Filecoin into our preservation infrastructure.

As found on YouTube
