

89 comment(s) - last by leftcheek.. on May 12 at 5:36 PM

"All of that stuff is being captured as we speak whether we know it or like it or not" -- former FBI agent

Source: CNN on YouTube








Let's Look At The Data
By Stiggalicious on 5/6/2013 1:17:10 PM , Rating: 2
To shed some light on this, I thought I might calculate some things. I will assume some data, and gather some other data that I can find, so this is obviously just a ballpark figure.

# phone users in US: ~230 million
Average phone usage per person: 459 minutes/month
Data usage for compressed audio: 14.4Kbits/s

So when you work out the aggregate data rate needed to store those phone conversations, you get about 4.19GB/s, or 362.5TB of data added per day.

That's adding a 2TB hard drive to the stack every 8 minutes. Seagate must be getting some sweet business...

If the government has been doing this since 2002 (when the largest HDD was around 250GB) and has been continually storing data since then, that's a total of 1.45 MILLION Terabytes of data. Assuming the average hard drive size is 750GB, that's almost two million hard drives.
How many racks of servers would this take to fit?
The most dense commercially available storage system (excluding Backblaze, which is too new and too awesome for government) can fit 16 3.5" HDDs in a 3U case. A standard rack is 44U high, so that leaves us with 224 HDDs per rack.

Therefore, if you were to store every phone conversation in the US since 2002, you'd need over Eight and a half Thousand server racks of hard drives, which would fit in a building that's about 3.5 acres in size.
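
The estimate above can be re-derived in a few lines. This is only a rough sketch using the inputs quoted in the comment; the 30-day month is an assumption (the comment does not say which month length it used), and the function names are made up for illustration. With those inputs the daily figure comes out near 380TB, in the same ballpark as the 362.5TB quoted.

```python
import math

# Inputs quoted in the comment; everything else is derived from them.
USERS = 230e6               # phone users in the US
MINUTES_PER_MONTH = 459     # average usage per person
BITRATE_BPS = 14_400        # 14.4 kbit/s compressed audio
DAYS_PER_MONTH = 30         # assumption, not stated in the comment


def bytes_per_day():
    """Bytes of compressed audio generated per day."""
    seconds_per_month = USERS * MINUTES_PER_MONTH * 60
    bytes_per_month = seconds_per_month * BITRATE_BPS / 8
    return bytes_per_month / DAYS_PER_MONTH


def racks_needed(total_bytes, drive_bytes=750e9,
                 drives_per_case=16, case_u=3, rack_u=44):
    """Racks required if every case is full and only whole cases fit."""
    drives = math.ceil(total_bytes / drive_bytes)
    drives_per_rack = (rack_u // case_u) * drives_per_case  # 14 cases * 16 = 224
    return math.ceil(drives / drives_per_rack)


if __name__ == "__main__":
    print(f"{bytes_per_day() / 1e12:.0f} TB/day")
    # 1.45 million TB is the 2002-2013 total quoted in the comment.
    print(racks_needed(1.45e6 * 1e12), "racks")
```

Plugging in the comment's own 1.45 million TB total gives a rack count just over 8,600, which agrees with the "over eight and a half thousand" figure.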




RE: Let's Look At The Data
By BRB29 on 5/6/2013 1:59:33 PM , Rating: 2
http://en.wikipedia.org/wiki/Bit_rate

800 bit/s – minimum necessary for recognizable speech, using the special-purpose FS-1015 speech codecs.
1400 bit/s – lowest bitrate open-source speech codec Codec2.[13]
2.15 kbit/s – minimum bitrate available through the open-source Speex codec.
8 kbit/s – telephone quality using speech codecs.

If I were recording just to capture the content of the speech, then I only need recognizable speech, as using it for evidence is not allowed anyway. 800 bit/s is good enough.
All conversations are then screened and filtered. Most of them probably get deleted. Of course, a computer does all this based on keywords. The amount of data stored is nowhere near what anybody here is speculating.
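
To see how much the codec choice matters, here is a rough sketch that recomputes the daily volume at each bitrate listed above. The user count and minutes per month are taken from the first comment in this thread, the 30-day month is an assumption, and the figures are before any filtering or deletion.

```python
USERS = 230e6             # from the first comment in the thread
MINUTES_PER_MONTH = 459   # from the first comment in the thread
DAYS_PER_MONTH = 30       # assumption

# Bitrates quoted from the Wikipedia list above, in bit/s.
CODEC_BITRATES_BPS = {
    "FS-1015 (minimum recognizable)": 800,
    "Codec2": 1_400,
    "Speex (minimum)": 2_150,
    "telephone quality": 8_000,
    "14.4 kbit/s (first comment)": 14_400,
}


def tb_per_day(bitrate_bps):
    """Terabytes of audio per day at the given bitrate."""
    seconds_per_month = USERS * MINUTES_PER_MONTH * 60
    bytes_per_month = seconds_per_month * bitrate_bps / 8
    return bytes_per_month / DAYS_PER_MONTH / 1e12


if __name__ == "__main__":
    for name, bps in CODEC_BITRATES_BPS.items():
        print(f"{name:32s} {tb_per_day(bps):8.1f} TB/day")
```

At 800 bit/s the raw volume drops to roughly 21TB per day, about one eighteenth of the 14.4 kbit/s estimate.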


RE: Let's Look At The Data
By 91TTZ on 5/6/2013 2:29:29 PM , Rating: 2
quote:
The most dense commercially available storage system (excluding Backblaze, which is too new and too awesome for government) can fit 16 3.5" HDDs in a 3U case.


We have disk arrays here at work that have 24 drives in a 2U chassis. They're the smaller drives that are becoming more popular.

http://www8.hp.com/us/en/products/disk-storage/pro...


RE: Let's Look At The Data
By BRB29 on 5/6/2013 2:59:30 PM , Rating: 2
It's a no-brainer for any IT director that recorded phone convos are high-capacity, extremely low-access data. The most cost-effective way to store them is tape. Tapes can last a long time if they are not constantly accessed. This technology has been around for a long time.

http://www-03.ibm.com/systems/storage/tape/ts3500/...

Provide up to 900 PB of automated, low-cost storage under a single library image, improving floor space utilization and reducing storage cost per TB with IBM 3592 JC Enterprise Advanced Data Cartridges

One base frame and up to 15 expansion frames per library; up to 15 libraries interconnected per complex
Up to 12 drives per frame (up to 192 per library, up to 2,700 per complex)
Up to 224 I/O slots (16 I/O slots standard)
IBM 3592 JA/JJ/JB/JC and JW/JR/JX/JY write-once-read-many (WORM) cartridges or LTO Ultrium 6, 5 and 4 cartridges
Up to 125 PB compressed with LTO Ultrium 6 cartridges per library, up to 1.875 EB compressed per complex
Up to 180 PB compressed with 3592 extended capacity cartridges per library, up to 2.7 EB compressed per complex


RE: Let's Look At The Data
By 91TTZ on 5/6/2013 4:04:28 PM , Rating: 2
Yeah, the actual calls themselves would probably be stored on tapes, while the database containing the transcripts of the calls would be on disk. Then they can bring them up whenever they need to.


RE: Let's Look At The Data
By MrBlastman on 5/6/2013 3:09:57 PM , Rating: 2
Why would they use hard drives? Tapes man, tapes! They've been around for decades and they're quite a bit cheaper than hard drives. DAT tapes, digital tapes, tapes of all types. You use the hard drive to record a database of what tape goes with what set of calls and where it is stored. You use the tapes to record the actual conversations.


RE: Let's Look At The Data
By 91TTZ on 5/6/2013 3:22:29 PM , Rating: 2
quote:
If the government has been doing this since 2002 (when the largest HDD was around 250GB) and has been continually storing data since then, that's a total of 1.45 MILLION Terabytes of data...
Therefore, if you were to store every phone conversation in the US since 2002, you'd need over Eight and a half Thousand server racks of hard drives, which would fit in a building that's about 3.5 acres in size.


That's actually not that big for a datacenter. Where I work we have multiple datacenters, each of which can handle this amount of data, and my company isn't huge. Apple has a 20 acre datacenter and the government is working on things even bigger, such as this 35 acre monster capable of storing yottabytes (trillions of terabytes) worth of data.

http://en.wikipedia.org/wiki/Utah_Data_Center

To put it into perspective, at the data rate you mentioned, that new Utah Data Center would be able to store 7.5 million years' worth of phone calls. And that's if it's only 1 yottabyte. The article mentions it's many yottabytes.
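
The 7.5 million year figure can be checked with a short sketch. It assumes a decimal yottabyte (10^24 bytes), the 362.5TB/day rate quoted earlier in the thread, and a 365-day year; the function name is made up for illustration.

```python
YOTTABYTE = 1e24          # bytes, decimal definition
DAILY_BYTES = 362.5e12    # 362.5 TB/day, as quoted earlier in the thread


def years_to_fill(capacity_bytes, bytes_per_day=DAILY_BYTES, days_per_year=365):
    """Years needed to fill the given capacity at a constant daily rate."""
    return capacity_bytes / bytes_per_day / days_per_year


if __name__ == "__main__":
    print(f"{years_to_fill(YOTTABYTE) / 1e6:.2f} million years")
```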


RE: Let's Look At The Data
By Jeffk464 on 5/6/2013 5:40:20 PM , Rating: 2
I live next to that data center in your link, you want me to knock on the door and ask them?

PS should be good for property value, this area is turning into Silicon Valley.


RE: Let's Look At The Data
By Mint on 5/7/2013 10:57:28 AM , Rating: 2
We may not have kept everything since 2002, but your 400TB/day figure tells us how easy it is to archive this stuff today.

Even at retail prices, that costs less than $20k per day for raw storage, which is peanuts for the US gov't. Sure, processing and using that data in a meaningful way costs more, but it's well inside the realm of possibility.

After the Patriot Act passed, I gave up on all hope of protecting privacy from the gov't. At this point, though, I'd rather have them covering it up than make it public, because when trying to hide it they will be much more careful with the data to prevent egg on their face.

The last thing I want is for them to treat data as carelessly as corporations do, where we almost expect weekly breaches.


RE: Let's Look At The Data
By gmyx on 5/7/2013 3:31:17 PM , Rating: 2
I don't think the math is correct at all:

Let's take your numbers: 230m users using 459 minutes a month, stored at 14.4kbit/s (I know it can be much lower)

That is 105 570 000 000 minutes of audio per month. But bit rates are in seconds: 844 560 000 000 seconds of audio.

Using 14.4kbits / second: 12 161 664 000 000 kbits required to store that. But that is bits... 1 520 208 000 000 kbytes. Full number: 1 520 208 000 000 000 bytes.

Space required per month: 1500TB -> 1.5PB.

Unless my math is wrong as well and you can down rate me ;)
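
A quick check of the steps above, as a sketch with a made-up function name: the 844 560 000 000 figure equals the minute count multiplied by 8, whereas a minute has 60 seconds. The code below reproduces the 1.5PB result with a factor of 8 and shows what the same inputs give with a factor of 60.

```python
USERS = 230_000_000
MINUTES_PER_MONTH = 459
BITRATE_BPS = 14_400      # 14.4 kbit/s


def monthly_bytes(seconds_per_minute=60):
    """Bytes of audio per month; the factor is exposed to reproduce the comment."""
    seconds = USERS * MINUTES_PER_MONTH * seconds_per_minute
    return seconds * BITRATE_BPS // 8


if __name__ == "__main__":
    print("minutes per month:", USERS * MINUTES_PER_MONTH)
    print("factor of 8  :", monthly_bytes(8) / 1e15, "PB/month")
    print("factor of 60 :", monthly_bytes(60) / 1e15, "PB/month")
```

With 60 seconds per minute the total is about 11.4PB per month, or roughly 380TB per day over a 30-day month, which is close to the 362.5TB/day in the original comment.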


"It seems as though my state-funded math degree has failed me. Let the lashings commence." -- DailyTech Editor-in-Chief Kristopher Kubicki










Copyright 2016 DailyTech LLC.