Chaff is Chaff

A good number of people think that chaff is an important place for us to concentrate. It is not. Here's why.

You cannot hide a packet among a group of others pointing to some other server. It is trivially easy to sniff the destination of a packet, or sniff for patterns (even very slow ones that only repeat once every 1000 packets). A packet sniffer on your ISP would tear this idea to shreds. There are now "deep packet sniffers" available that can handle millions of simultaneous connections.

A major problem here is that this offers a false sense of security. To a newbie, chaff sounds like a reasonable way to hide something when, in fact, it is not safe and offers no real protection from anything.

There are tools that actually do hide the destination of your packets. Tor is one such tool. This is FAR more important than attempting to move a haystack to hide a needle.

Please guys, let's concentrate on the things that actually improve privacy!

Comments

It seems to me that the "disagreement" is really just people talking about different things.

"Chaff" can mean a lot of different things, as we've seen here. To some, it means steganography (which, btw, is not the same as stenography ;-) To others it means normalizing the data type ratios to decrease the chances of standing out on a histogram. To a completely opposite purpose, Tinfoil Hat Linux uses it as such:

"Power usage & other side channels.
If you start the Paranoid options, a copy of GPG runs in the background
generating keys & encrypting random documents. This makes it harder to
determine When your REAL encryption is taking place. See the TEMPEST section
below."

I think this topic is getting so much attention because it hits on a very fundamental question: what exactly are we planning to protect against? We can't assume the illuminati is already watching our users' computers, so we need to enable them to fly under the data mining radar. However, we also can't assume the illuminati is NOT watching any given computer, and therefore we can't focus all our efforts on blending in.

Perhaps PL should have a switch the user can operate to choose what type of paranoia is necessary. Options could include:

  • They're not watching me specifically yet, to my knowledge, but they're data mining everyone
  • They're watching me (check all of the following that apply)
    • With a keylogger
    • Reading my email
    • Security cameras
    • Tempest
    • Reading my hard drive when I'm not around
  • I suspect they might be watching me but I wouldn't bet my histogram profile on it.

There are flaws in this breakdown, I'm sure, but this type of approach will help secure the OS much more than just picking some average and sticking with it.
________________________
http://softpixel.com/~bbinkovitz

This has gotten more attention than all PL threads combined!

I have suggested that Paranoid Linux be a series of packages that could be emerged (in Gentoo for example) to automatically configure the system with our specification in mind: deniability, cryptography, and anonymity. These three are accomplished almost entirely by dozens of pre-existing packages. I would suggest that chaffing be a package of scripts that can extend the system into chaffing behavior - since chaffing can certainly be useful.

Of course to create the ultimate distro in terms of deniability, cryptography, and anonymity - a modified system from the ground up is required - so Paranoid Linux cannot just be a set of packages and scripts. It must be a distribution variant that installs from the ground up to be deniable, encryptable, and anonymous.

As I think fletch does - I would hope we can keep Paranoid Linux development free from inexperienced user influence. This is not to say inexperienced users shouldn't be a valuable part of the development process - they are the target audience! This paradox does not have to plague our distribution as it has others.

Something that has to be said though: If Paranoid Linux is going to utilize open development, then we require a better system for contribution in order to decide upon our goals and specifications.

Up yours normality!

OK, time for theory, deeply delving into the 'what if?' zone.
In Little Brother, it is said that you could use really basic Bayesian maths to find your dissidents if you looked for those with far more encrypted traffic than anyone else. While m1k3y comes up with temporary or half-baked solutions, JoLu suggests a very profound idea: change the definition of normal.
Now, this works well in the book because JoLu is conveniently employed by 'Frisco's most popular ISP.
I'm just guessing here, but would I be right if I said no-one at PL is employed by China's most popular ISP?
So we don't have that option.
However, I have an idea, but one which comes with an ethical dilemma. Make a double-ended piece of software which senses copies of itself, so that if another copy is found, it encrypts all traffic to and from each source; a piece of software which creeps in unseen. In short, a virus. Now the ethics come into play: say Akamai was infected with this 'virus'. It would send out encrypted data to everywhere, sufficiently masking the needle with a haystack at least 3 billion strong. But it is still a virus: is this the right way? I mean, people spend heinous amounts of money trying to get rid of the damn things, even if this one works for the 'greater good'. And what exactly is the greater good? It is using many for the good of few.

So yes, I would say do it but what about everybody else?
This is a wild idea, but I think there is some potential in it, so please have a free for all figuring it out.

Thoughtfully,
f3l1x
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

A more thoughtful idea would be to build add-ons for commercial applications (Skype), that would be widely used, or get some of these applications (ie. Firefox, Chrome or other web browsers) to build them in with a defaulted "on" state.

I like your thinking

The issue is that yes, that would be a virus. Our best bet is to concentrate on helping TPB with their plan to encrypt the entire internet. It would be too hard to pull something like IndieNet out of nowhere. Though, if we got internet radio stations to start encrypting their streams....

Take it back.

squeeze something out of this

Viruses get detected. This particular one would slow traffic, which is a big deal, and believe me, you'd be convicted for making it eventually. It's a silly (desperate?) idea. In my opinion, if someone studied Tor, users could be identified some other way. As for encryption, encrypted data looks the same as random bits. Those are much like binary files. So pretending somehow that they are binaries/compressed/audio/whatever would help. There are always places like lyrics in ID3 tags for mp3 audio, or comments in zips, but a program may need to be made on the side you're sending to, to unveil the true requests, plus if someone was to check you out he might find out. But if it's identification of secured users alone that you worry about, I don't think it matters that much. If they count random streams of binary codes (let's call encryption that) for each IP (and alert someone if that count exceeds a certain number, number = encrypted packets / overall packets), you could here and there send a request to a site that's not encrypted. But then you need site lists, which aren't secure... wah. It all just gets more and more complicated... I've got one more idea, then you can trash this or whatever you want. Use encryption whose output stays within the byte range of ASCII plain text. That way robots would see it as not encrypted.
Anyway, people from cryptography would always say you need an open standard; your key must be secret, not the way of encrypting. And we're discussing just that: hiding that it's encrypted. People don't develop that stuff... I don't know.
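
To make the "encrypted data is the same as random bits" point concrete, here is a minimal sketch of the kind of check a monitoring robot could run: the Shannon entropy of a payload in bits per byte. The function name is made up for illustration, and a real classifier would look at far more than entropy.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum of -p * log2(p) over every byte value that actually occurs.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```

Good ciphertext and compressed files both sit close to 8 bits per byte, while English text is much lower. Output restricted to a 64-symbol alphabet such as Base64 tops out at roughly 6 bits per byte, so it looks less random by this one measure, though other tests would likely still tell it apart from ordinary text.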

someman7@gmail.com

P.S. I don't care, I'm sending this.

Fair Call

(see title)
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

Peer-to-peer is suspicious

The problem with a lot of these ideas is that any direct (tcp/udp) peer-to-peer activity is suspicious, because an ordinary user spends most of their time surfing the web (i.e. visiting pages on servers hosted at a data center). There are not that many peer-to-peer apps that have enough bandwidth and the right usage patterns to hide data. If you wanted to covertly deliver information to one person, you could probably hide it encrypted in a video stream (or invent some secret hand gestures), but you wouldn't be able to Tor all your Internet traffic since it's hard to claim you spend all day video chatting with people. And of course, the other big peer-to-peer application, file sharing, is bound to attract attention simply because most people use it to illegally share copyrighted materials.

Digital Forensics

"Chaff" as you call it, is more to skew statistical analysis and manual forensics.
I do on-site digital forensics (I don't often do networking) and an indicator of odd activity is lack of activity altogether, which in my line of work could be something like an empty browsing history or lack of automated log files.
Same applies for networking. If I were in charge in developing an automated monitoring system for the purposes of spotting 'devious' elements, I would collect statistics on the type of traffic generated from a user.
If the connection is residential, I would be expecting a high amount of HTTP connections and POP3 traffic. If I start seeing a lot of unidentified traffic which contains encrypted elements, I would flag it for further investigation.
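
A rough sketch of that flagging rule, assuming each flow has already been labelled with a protocol name. The label set, the 0.3 threshold, and the function names are all invented for illustration; a real system would tune them against measured traffic.

```python
from collections import Counter

# Hypothetical labels for what a residential line is expected to carry.
EXPECTED = {"http", "pop3"}


def unidentified_ratio(flows: list[str]) -> float:
    """Fraction of flows whose protocol label is outside EXPECTED."""
    if not flows:
        return 0.0
    counts = Counter(flows)
    odd = sum(n for proto, n in counts.items() if proto not in EXPECTED)
    return odd / len(flows)


def should_flag(flows: list[str], threshold: float = 0.3) -> bool:
    """Flag a connection when unidentified traffic exceeds the threshold."""
    return unidentified_ratio(flows) > threshold
```

This is the statistic that chaff is meant to skew: adding expected-looking flows pushes the ratio back under the threshold without touching the encrypted flows themselves.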

Missing the point.

The true purpose of the chaff system is not to hide packets. Chaff does diddley squat for you if you're already being watched by the suits. But that's because it's not designed to do that. Chaff is a tool to keep you out from under that microscope in the first place. The system should throw up chaff to make any histogram they take of your internet connection look like the norm. You throw off enough unencrypted chaff to balance out your encrypted signals, and you look a lot less like a wolf and more like another one of the sheep. I have a conceptual spec for a chaff-generation system, if anybody thinks it's worth a shot:

Unencrypted Chaff Design:

Overview: An AI user sends traffic to "standard usage" domains in the clear to make the enc/unenc profile of the machine more average. A list of permanent sites to visit aids the illusion of an actual person being behind the surfing.

The AI system has a "web" of interest areas. These areas are connected to each other with logical links, by categories and by associative links. These interest areas provide the basic regions for the AI to visit. Of course this would require that a great many sites be indexed into the web of subjects, but with the example of StumbleUpon, we have seen that this can be done. Once every 20-100 days the AI will add and remove a few subjects on its list, based on the history of visited sites. Also, once per month, the human user will check the list, to make it more realistic.

The AI selects a certain number of sites to visit each day out of the list of sites in the interest groups. To further mimic human site selections, the AI tiers its interest areas. The top interest area (by # of permanents) gets 5% of traffic, the rest of the top 20% gets 40%, the next 20% gets 24%, the next 20% gets 18.2%, the next 20% gets 9.12%, and the final 20% gets 3.68%.
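
The tiering above could be sketched like this. The spec does not say how a tier's share is divided among its members or what happens when there are fewer than five areas, so the even split and the final normalisation are my assumptions, as are the names.

```python
TOP_SHARE = 0.05
# Shares for the five 20% bands, best first (the first excludes the top area).
QUINTILE_SHARES = [0.40, 0.24, 0.182, 0.0912, 0.0368]


def tier_weights(ranked_areas: list[str]) -> dict[str, float]:
    """Map each interest area (ranked best first, names unique) to its traffic share."""
    n = len(ranked_areas)
    if n == 0:
        return {}
    # Place every area except the top one into its 20% band by rank.
    tiers = [[] for _ in QUINTILE_SHARES]
    for rank, area in enumerate(ranked_areas[1:], start=1):
        tiers[rank * 5 // n].append(area)
    weights = {ranked_areas[0]: TOP_SHARE}
    for members, share in zip(tiers, QUINTILE_SHARES):
        for area in members:
            weights[area] = share / len(members)  # assumed: even split in a tier
    # Assumed: rescale so the shares still sum to 1 when some tiers are empty.
    total = sum(weights.values())
    return {area: w / total for area, w in weights.items()}
```

With ten ranked areas this gives the top area 5%, the second 40%, and the remaining pairs 12%, 9.1%, 4.56% and 1.84% each.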

Sites visited in each 24-hour period are broken down into the following categories:

newVisits - Sites drawn from the web of interests to visit on any particular day. Number of newVisits should be from 25 to 150 sites minus the square root of the permSites and superPermSites. Time spent on a newVisit should be generated by the following method: For each page, spend 1 sec to 2 min per 2500 words. When finished with a page, 45% chance to visit another page on the same site linked from the page. 5% chance to visit another site linked to by the page. These percentages should be adjusted based on the interest web's relative strengths.
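
A small sketch of the per-page behaviour for a newVisit, to show how little code the rule needs. Drawing the dwell time uniformly is an assumption (the spec only gives the range), and the function and action names are made up.

```python
import random


def dwell_seconds(word_count: int, rng: random.Random) -> float:
    """Time on a page: 1 s to 120 s per 2500 words, scaled by page length."""
    # Assumed: a uniform draw; a skewed distribution may mimic people better.
    return rng.uniform(1.0, 120.0) * (word_count / 2500)


def next_action(rng: random.Random) -> str:
    """Choose what to do after finishing a page."""
    roll = rng.random()
    if roll < 0.45:
        return "same_site"    # 45%: follow a link within the same site
    if roll < 0.50:
        return "linked_site"  # 5%: follow a link to another site
    return "stop"             # remaining 50%: end this visit
```

Passing the random generator in makes the behaviour repeatable for testing; the adjustment for the interest web's relative strengths is left out here.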

permSites - Each day, the AI should select from 2-5 sites out of the newVisits to elevate to permSites. This selection should be made based on the strengths in the interest web. All permSites are visited 1-3 times per day, and from 2 min to 25 min should be spent on each permSite. Every 72-96 hours after the elevation to permSite, the AI gives the permSite a 50% chance to be de-elevated.

superPermSites - If a site remains a permSite for 3 weeks, it is elevated to superPermSite status. superPermSites are visited 3-7 times each day, for 5 min to 2 hours at a time. superPermSites undergo the same automatic removal process as permSites, except that when selected, there is another probability test: 65% chance to remain a superPermSite, 25% chance to become a permSite, and 10% chance of removal.
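
The promotion and removal rules for permSites and superPermSites amount to a small state machine. A sketch under my own naming follows; whether a superPermSite must first pass the 50% selection before the three-way test is not spelled out above, so that gate is left to the caller.

```python
import random


def review_perm(rng: random.Random) -> str:
    """Review held every 72-96 hours: 50% chance the permSite is de-elevated."""
    return "dropped" if rng.random() < 0.5 else "perm"


def review_super(rng: random.Random) -> str:
    """Three-way test for a selected superPermSite."""
    roll = rng.random()
    if roll < 0.65:
        return "super"    # 65%: remains a superPermSite
    if roll < 0.90:
        return "perm"     # 25%: falls back to permSite
    return "dropped"      # 10%: removed


def promote_if_due(status: str, days_as_perm: int) -> str:
    """A permSite that has lasted 3 weeks becomes a superPermSite."""
    return "super" if status == "perm" and days_as_perm >= 21 else status
```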

When visiting a page that has posting enabled, there will be a 0.1% chance to pass the posting form to the human user, generating human content. This chance becomes 2% for permSites and 5% for superPermSites.

The AI can have up to 25 "tabs" worth of content open at one time, and may make up to 3 page fetch requests at one time. Pages do not need to be active to tick away time on their counters.

It's more than likely that my numbers deserve some tweaking, and hopefully someone out here has a novel idea that I never even considered. But here's my starting point.

~~~~~

Take it back.

Kind semi-chaff idea. Please shout down if it doesn't work

OK, just a wild idea I had all of 3 seconds ago.
For chaff 'history', talk to the nice folks at TOR, and ask them to modify their program some what like this:

Those who use TOR for a school-based purpose *cough me cough* probably aren't doing anything too illicit with it. Ergo, we figure out a list of 'censor friendly' sites and use TOR to shift around these sites and send them to PL to use as history, to help with chaff.
This *should* work around those oh-so-pesky packet sniffers, because we are actually using real history. Sneaky.
(Or possibly very stupid and 100% detectable, as I really don't know my donkey from my elbow when it comes to this kind of stuff)
F3l1x
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

Of course it's not.

Now I'm no expert, but the best way to encrypt something is to hide it in public. The two easiest ways to do this are revolving cleartext (see Digital Fortress by Dan Brown), or a ridiculously complex translation logic program. Take this idea: Find a complex, preferably pictogrammatic language, eg. Cantonese. Now write a little program that does something like this: Take English text, and single out the consonants. Next find the romaji equivalent of your characters. Create a logic script (sounds painful) that finds and strings together sensical and innocent sentences using the consonants in the romaji (flinches). Then finally, figure out some super sneaky way of weaving the encryption code into it. This is where the other side comes in. The encryption in the sentence is an open ended 'lock' and the recipient's code is a 'pick', that uses hints in the 'innocuous' message to decode it. Now, the masterstroke; make a 'skeleton key' that all the 'locks' recognise, and which, when used, causes the 'lock' to think itself to be an actual message.
Now this is probably far too complex, as you can tell, I don't really code, but that doesn't stop it being a better idea than chaff.

Anyways, woohoo 1st post and all that.
Va^b0
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

reality check

Good f3l1x,

The best way to encrypt something is to use a validated instance of a publicly inspected open-source algorithm, e.g., AES. Home-brew schemes like yours invariably fall to the attentions of experienced, professional cryptologists (who not only do the math, they also write the code).

As for Digital Fortress, you can learn more about cryptography and data-hiding by watching Hackers. Digital Fortress is so full of holes (both technical and plot) that it can only serve as a counter-example in a college freshman-level creative writing course.

I am minded of the hoary joke (dating back, quite possibly, to the Eniac age): The first bi-directional English-Chinese translation computer was finally completed. The computer scientists and electrical engineers fed the phrase "Out of sight; Out of mind" into the machine, then fed the output Chinese translation back in. The response was "Invisible idiot." (BTW, "romaji" [sic] is a Japanese, not Cantonese, alphabetic presentation of the phonetic equivalent of the individual words.)

Sp00ky

"No matter how paranoid you are, it isn't paranoid enough." -- X-files

The Theorem

As for the whole thing of data exchange for the internet, just wondering how easy it is to fake time stamps on email (Or IMs)?
Because using an awesomeified version of my theory which actually works, would it be a possibility to deliver data through either IMs or emails (make them look spammy to be realistic) with fake time stamps? This should not really alert Big Brother as much, because everyone gets spam, and if someone was spammed really badly, would they be able to delete it all? If the time stamps were backwards from the current time to the computer's connection (although having spam appear every x milliseconds wouldn't work) would it be possible? Probably not, but it is worth a look...
If anyone has anything to say on this, including insults to my intelligence, please let them loose...

Va^bo
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

Point taken

I see what you are saying, but hey, as a theory it could work.
I only used Digital fortress because it sprang to mind at the time, I totally agree with you though.
Oh, and as for ロマジ I speak 日本語, but not Cantonese, so I just used the same word.

F3l1x
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

cool!

I don't speak/write any Oriental languages (loved the ideogram insertions); do speak/write a smattering of German. But when I'm wearing my tech writer hat I do need to know about technical terms for linguistic components. Oh, the curse of the eclectic mind!

Sp00ky

"No matter how paranoid you are, it isn't paranoid enough." -- X-files

VPNs from ParanoidNetMockup

Hello,

I have seen the ParanoidNetMockup.pdf (http://www.filedropper.com/paranoidnetmockup) and I think that it could be interesting (whether there is chaff being sent or not) to use something like P2P in VPN context, as can be found here: http://www.ntop.org/n2n/

If this does not feel like the right place to post this comment, please feel free to move it where it belongs, or tell me to post it where more appropriate. Since fletch's ending comment was to "concentrate on the things that actually improve privacy", I thought this could be interesting here.

Chaff is a red rag to the totalitarian bull

Chaff is inherently dangerous. In a totalitarian society, different rules apply and there's no real safe way to do automated traffic. Chaff looks a bit different from normal traffic. For example, normal traffic will include intelligent postings to web sites: responses to articles, etc. If I just go through the list of all people who are sending lots of web traffic and select those that don't post comments, I probably have a good set of targets. Now I just go and raid/torture them all. Some of the time I get an innocent victim. If so, I kill him so he doesn't complain. But quite often I'll be able to catch and torture a user of your system and use his configuration to learn how to find more.

Now, your response will, of course, be to start generating comments. Unfortunately this is like software patching. You make your move, then the other side gets to see your move, then they choose their move and you can't even see what it was and you certainly can't react directly. Whatever it is which makes your system different from a human will be spotted and used to find it.

So, the ideal system should generate NO automated traffic which wouldn't be generated by a legitimate system in that state. You probably want to use Windows in a VM which thinks it is directly internet connected (or a real system for more hardcore usage) and have the paranoid system as a transparent proxy between the Windows system and the rest of the world.

There are lots of covert channels you can use

  • timing changes on outgoing packets
  • changing salts in encrypted packets
  • hijacking SSL sessions

but doing it right will be difficult. Chaff is not the way to go.

Presumes no user.

Your comments about the lack of intelligent content in chaff assumes no human user of the system. This may not always be the case.

I think that the primary use of chaff would be less in distracting someone from seeing a covert channel than in disguising the fact that a channel is used primarily for covert purposes. That is, say I want to use flickr to post photos that can contain hidden signals (a picture of a bench means X, a trashcan means Y, etc, or some other steganographic content). If I never post to that flickr account except when sending a covert message, then the mere act of posting to that account is unusual and may give me away. In this case, the chaff would be the automated and occasional posting of random crap to that account. Thus, when I need to post covert content, it doesn't look unusual.

more pro-chaff rhetoric

The RSS and Last.fm combination is brilliant. Just for the sake of sticking it to the man, why not TheLastRipper? Further, I'm still a fan of dummy social network accounts... spend five minutes in a Panera within fifteen minutes of a college campus and that's all you'll see.

Again, let me reiterate that this is useless to someone who doesn't understand how crypto protects them-- it still strikes me as a useful layer of plausible deniability though.

Personally I like to think of chaff like this:

Say if an Agent wanted to courier a message to some other Agent somewhere.

He could take the message along as it is, but if he were captured, they would be able to read his message.

He instead could try encrypting his message first, but his captors would still easily know it was a message of value, and begin trying to decrypt it.

If he used invisible ink then if he were captured, they would find him with a blank piece of paper, but would then likely suspect there was a message hidden in it and begin trying to find it.

But if he were to create a normal "decoy letter", just talking about unimportant things, then put the invisible ink in between the lines of the letter, the enemy would be much more likely to not realise anything suspicious was going on (depending on the quality of the decoy letter).

This "decoy letter" would be the chaff. Used by itself it is useless. That would be like writing the message in plain, visible text in between the lines of the decoy letter. If used with encryption, we still have the same problem, because it's obvious there's something else there as well.

The only problem is finding a way to hide the encrypted communication from suspicion with some technology to replace the "invisible ink". The chaff would then serve to make it seem like it was just a normal communication.

(hope somebody understands this, I'm probably rambling rubbish)

Yep. (I think)

Although it's beyond me on the whole.
"The only thing that hasn't changed since 9/11 is that the government is still *u**ing up"
-Steal This Book Today

Huge amounts of data is not the key

Huge volume should not be the key with chaff; quality also matters. Ensure that whatever communications are going on are not distinguishable by pattern, and also make them look like communications of interest, so that anyone analysing them wastes cycles.

Additionally, chaff is bullshit because it is chaff: security through obscurity. Do not rely on it or trust it.

Agreed.

Cory is a good writer, but despite the buzzword recognition Little Brother is a science fiction novel, not a technical manual. You can't put out chaff that looks like normal internet use in order to hide traffic that *doesn't* look like normal internet use.

Remember, Cory is the guy who suggested (on Instructables) that you try to hide the noise pattern in your camera by compressing it and blurring it. There's a professional comment in reply that points out it's much better to capture the noise by photographing blank coloured sheets, subtract the noise, then take 10 overlapping pictures and blend them. It's easier than the amount of compression and fiddling he suggested as appropriate, and you actually get pictures that don't look like crap.

That's a simple problem. Hiding network traffic is a very complex problem. ParanoidLinux was artistic license for "there is probably a reasonable solution". For all Cory knows there *isn't* one. It is entirely dependent on what "normal internet use" looks like and how much traffic you have to hide, at what bit-rate, with what latency, and to which people.

The only way to hide darknet traffic is to actually tunnel it over your "normal internet use". Unfortunately even BitTorrent probably wouldn't be any good for that. IIRC the .torrent file includes a cryptographic checksum of each block, so you can't pretend to download a .torrent and then exchange arbitrary packets with peers - your packets wouldn't match the checksums. It'd be trivial for an attacker to pattern match for unencrypted .torrent files and randomly test packet checksums.
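
For reference, a sketch of the check being described. In the original BitTorrent metainfo format the hashes are SHA-1 digests taken per piece rather than per network packet; the helper names here are made up, and real clients read the digests from the .torrent file instead of computing them locally.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """Split data into fixed-size pieces and return the SHA-1 digest of each."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, expected: list[bytes]) -> bool:
    """True when a received piece matches the published digest for its index."""
    return hashlib.sha1(piece).digest() == expected[index]
```

An observer holding the same .torrent file can run `verify_piece` on captured data, so arbitrary payloads sent in place of real pieces would show up as hash failures.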

Email, on the other hand... Perhaps I have an obsession with email. But it's not suspicious to access an email server using TLS/SSL encryption, and to maintain a constant connection.

Hard crypto works. Steganography - hiding stuff in plain sight like Cory's DNS video hack - is another thing altogether. Your normal DNS traffic runs at what - 100x lower rate than real-time video? And then you tunnel video over one bit in 100 out of that? It may be possible, but it's not proven. If they can crack your crypto, then you're screwed and they can 0wn the world anyway, because you're using the same crypto that protects bank websites. Steganography has no track record. How can you have confidence in a given chaff generation algorithm? How many people are trying to break it and willing to publish, as opposed to the attackers who are trying to break it and will pretend that it's still secure?

So: email. Obviously you have to keep the traffic down to a normal email-like level. Obviously you should choose your server very carefully. You especially don't want a server that automatically copies messages to your sent folder (auto-BCC functionality). A simple safe solution would be to run a darknet where you all use the same email server, but that makes you easier to detect. I believe some servers attempt to negotiate encryption on *outgoing SMTP* (TLS again), which would let you extend that trust domain across multiple servers.

Note that this is transport layer encryption. GPG is cool but not enough people use it - so if you send GPG encrypted messages over unencrypted SMTP, attackers will notice immediately.

Have a mail client running in IMAP mode without a cache, which will help you account for a reasonable traffic level if challenged by an attacker. You might keep your real email application on a gigabyte+ micro-SD card - I think it depends on what sort of threats you're trying to defend against.

The traffic limits on this aren't as bad as they might sound. In a Little Brother like scenario, you could still e.g. develop an entire Paranoid Linux. Time critical stuff goes by email. In Little Brother everyone lives close together anyway, so you can swap gigabyte SD cards with massive binaries or source repositories, or meet up for longer and sync laptops.

If you're doing something like that, why do you need your machine to generate chaff? Anything going over the email channel is encrypted; pushing more data over that only makes it more suspicious. Automatically generating extra traffic over unencrypted channels is not smart. If you want normal-looking traffic, maybe use an RSS feed reader so you can be more productive at browsing news / blogs / facebook (and hence spend more time using your encrypted channels). Use a web accelerator that does prefetching for a bit more traffic - and more productivity for your *manually generated* chaff. Use Last.FM.

The point of this manual chaff is to make sure that you preserve a reasonably normal and explainable traffic profile. If the ONLY thing you use is encrypted email - then Big Brother knows you must be tunnelling all your web traffic through it.

BitTorrent

> Unfortunately even BitTorrent probably wouldn't be any good for that. IIRC the .torrent file includes a cryptographic checksum of each block, so you can't pretend to download a .torrent and then exchange arbitrary packets with peers - your packets wouldn't match the checksums.

Actually, I would put the payload in failed packets. Have a BT client that can stash and reassemble the failed packets and you're set. The data in the failed packets could be a function of a shared key and the corresponding valid packet. Or whatever. Just mix in some bad packets that are truly noise as a layer of chaff.

The value of chaff

I think the true value of chaff is to hide steganography. It is correct that Tor and Freenet nodes will be known to any slightly competent eavesdropper, and use of those systems will be detected despite any amount of chaff.

Consider the case where an encrypted stream is hidden inside a different, seemingly innocuous data stream. For example, you're sending and receiving messages by encoding them in JPEGs which are uploaded to and downloaded from some webserver on some IP that has been set up by the other party in the communication. Uploading and downloading JPEGs all day would look a lot less suspicious when interspersed with a bit of web traffic.

I'm assuming a lot here; that someone has set up a webserver, and has shared the IP address without tipping off the eavesdroppers, and also that the eavesdropper cares enough to statistically analyze the user's traffic. But my central point stands, I think: it is not the case that chaff is categorically useless. It is not a solution by itself, but it does have a small role to play.

Use of mass-market services is unsuspicious.

Someone has set up that webserver - it's called Flickr. The hypothetical You just needs to develop a reputation as a photo-obsessive who posts everything up there, and spends ages browsing others' photos. Some of the photos you browse, of course, will be steganophried messages; most won't, if you've any sense. In this case, the non-message photos would be chaff. It's an example of where chaff is useful; poisoning behavioural analysis.

Obscuring crypto

My understanding was that the point of chaff is to obscure just how much encrypted data is being flung around. I mean, if every single packet you send out is encrypted, you would appear to have something to hide.

However, a zombie that bumbles around facebook all day, fills out Livejournal quizzes, watches youtube videos, and IMs other robots all day would make the overall traffic analysis look less suspicious.

I agree that it does nothing to keep your cipher data safe, but a sufficiently random and frequently updated library of random requests to send out would serve to obscure what's actually going on. Obviously nobody is going to calculate 2^1024 possible ciphers and find out what you're actually doing, but they just might notice that you're doing something unusual. Once that happens, it's a matter of trusting Time Warner not to help track you down (unless you live in Manhattan, you can only steal so much wi-fi before everyone starts encrypting their networks).

Please tell me if I'm wrong-- and a big thanks from the demigeeks who don't have the skills to take on a project like this.

Encrypted traffic can't be

Encrypted traffic can't be hidden by chaff. Not possible. Since the development of Deep Packet Inspection, even application-level encryption is obvious.
And so long as encryption is obvious (and uncommon), even if it works it allows your identity to be determined. Result? Name on a list, at best.
Steganography is a better solution. The message can be encrypted before hiding, but the essential idea is to prevent others even knowing the communication is taking place.

Hiding crypto is the point of steganography

> Encrypted traffic can't be hidden by chaff. Not possible.

Sure it can. That's really the point of steganography, in essence. Yes, you can't hide encrypted packets by sending a bunch of plaintext packets to other people, but you can hide encrypted packets by using steganography to hide them in innocent looking data.

Saying that "chaff" is useless really just shows a misunderstanding. The original idea from Cory's book is pretty simplistic and vaguely put, but the underlying idea makes some sense. You just need to understand the idea as "I need to hide information using steganography inside my ordinary looking activities. The chaff is the ordinary activities bit."

The flickr idea above is an excellent example of this. Regular people upload a lot of photos to flickr. Regular people download a lot of photos from flickr. Using flickr doesn't make you suspicious.

So, given a program that automatically mangles photos slightly by encoding encrypted messages into the low bits of the data (any good stega scheme will start with strong encryption, of course, with the proviso that the encoded message has to look like white noise), you just need to take a bunch of random innocuous photos ("Look ma! I baked a cake to celebrate the gloriousness of our wise leaders!") and upload them using your mangler tool.

Your mom gets a nice photo of a cake, and your sister's computer will automatically recognize the hidden message in the photo because she has the right key and import it into her secret environment. When she checks her messages, she'll find that you've sent her important news clippings from Web Free Fascististan, and routed various encoded messages to her from her friends and message boards.
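
For what it's worth, here is a minimal sketch of the "low bits" trick the mangler tool above would rely on. It is only an illustration under some loud assumptions: the function names `embed` and `extract` are made up, the cover is treated as a raw buffer of uncompressed pixel bytes (a lossy format like JPEG would need a different embedding, because recompression destroys the low bits), and the payload is assumed to be already encrypted so that it looks like white noise.

```python
def embed(pixels: bytes, payload: bytes) -> bytearray:
    """Hide payload in the least significant bit of successive pixel bytes.

    Assumption: payload is already encrypted, so its bits look random.
    """
    # Unpack the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the pixel byte, then set it to the payload bit.
        out[i] = (out[i] & 0xFE) | bit
    return out


def extract(pixels: bytes, n_bytes: int) -> bytes:
    """Recover n_bytes of payload from the low bits of the pixel bytes."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for pixel in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (pixel & 1)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    cover = bytes(range(256))          # stand-in for real pixel data
    stego = embed(cover, b"hi")
    print(extract(stego, 2))
```

Each pixel byte changes by at most 1, which is why the photo still looks like a cake. A real tool would also have to tell the receiver how long the message is and survive whatever the photo site does to uploaded images; neither is handled here.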

See? The basic idea of "chaff" is sound, but you need to understand it to really mean "steganography." It also probably implies all sorts of things about how the system can work. For example, you need to actually be a producer of information for this to work -- photos, videos, video game packets, something. There needs to be something you can encapsulate encrypted data in which looks otherwise benign. And this also probably implies a store-and-forward network, not a real-time one, since it'll be very difficult for most people to create enough data in the near future. Perhaps one day, you'll be able to hide information in a video conferencing chat session or in your phone calls, but right now that would stand out.

Instead of Flickr. Porn.

Or something that we all look at, like porn. Say persons A, B and C want some information from person D. Now, them all going to the same Flickr page is going to be suspicious, so person D throws it all up on, say, Iporno.whatever. Persons A, B and C go to this webpage, download some photos and get the information they want. Plus it is reusable and easy to hide, since you are using chaff to mask this website amongst a whole torrent of others. Hell, you don't even need to do more than just go onto the webpage to get the photo (in this case it is masked in the header pic), since it would get downloaded straight to your cache. Just have a program monitor the cache folder and try to decrypt every new file.

The perception of security

I agree with that. I keep trying to explain to people that it's better to assume you have no protection, than to rely on bogus protection. The illusion of security encourages you to be careless - after all, the X will protect you (X= firewall, proxy, encryption, whatever). In fact, it often *won't* protect you. Moreover, some of these services could be bait by various agencies to ensnare dissidents and others.

There are many people in prison or dead, who relied on security that didn't protect them.

Just curious

If the entire idea of chaff, even as presented at its best, so to speak, is to spider from site to site and behave as a human would be expected to, how will this idea succeed in fooling an outside observer? Anyone watching would see you going to various sites that would be suspicious, and traffic from you going to innocuous locations and doing nothing, moving too fast, too slow, or too predictably to be a human being and not an automaton. So how does this fool anyone with any amount of time on their hands at all? Simple pattern analysis would foil simpler generators, even if they added new sites to their "spidering", and generators that added sites more actively would be obviously inhuman, as they would be too sporadic or random.

And if the whole thing is run through TOR, what does chaff do other than put a strain on the network?

tl;dr I really like the idea of chaff, but how is fletch not 100% right on this?

It could be done

The task of having a bot mimic human activity isn't that difficult. You can easily factor in a random time so that the bot pauses occasionally on a page, or skips through a couple, unpredictably. It might even scan for certain words (like "sexy", or other triggers) and add a pause, as though a person was interested in that topic. Maybe skip through pages with lots of math. That sort of thing. It should be easy to examine real records of browsing, to ascertain what sorts of pages slow a person down, which ones tend to get passed by quickly.

Spidering can easily mimic human patterns. Just look for phrases or keywords in a consistent manner. So instead of randomly spidering to Gregorian Chants, WoW, baseball scores and Urdu, make the bot move within themes - say, "sports", especially baseball but not exclusively; and with perhaps beer and SUV's, whatever. It really shouldn't be too difficult to create "profiles" that would give some parameters of how a human would wander through the Internet.
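
To make the pausing and "profile" idea a bit more concrete, here is a rough sketch. Everything in it is an assumption for illustration: the function name `dwell_seconds`, the 8 second base pause, the half-step bonus per interesting word and the log-normal jitter are invented numbers, not measurements of real browsing.

```python
import random


def dwell_seconds(page_text, interests, rng, base=8.0):
    """Return how long the bot should linger on a page, in seconds.

    page_text: the visible text of the page
    interests: set of lowercase keywords that define the bot's "profile"
    rng:       a random.Random instance (pass a seeded one for repeatable runs)
    base:      assumed typical pause on a boring page (made-up figure)
    """
    words = page_text.lower().split()
    hits = sum(1 for word in words if word in interests)
    # Linger longer on pages that match the profile, up to a cap.
    scale = 1.0 + 0.5 * min(hits, 5)
    # Log-normal jitter: mostly short pauses, occasionally a long one.
    return base * scale * rng.lognormvariate(0.0, 0.5)


if __name__ == "__main__":
    profile = {"baseball", "beer", "suv"}
    rng = random.Random()
    for text in ("baseball scores and beer reviews", "urdu grammar tables"):
        print(text, "->", round(dwell_seconds(text, profile, rng), 1), "s")
```

Whether pauses like these would survive serious statistical analysis is exactly the open question in this thread; the sketch only shows that the mechanics are cheap.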

I don't think chaff is a good idea, though. It's similar to "security through obscurity", relying on the ignorance of others to protect you. Someone will figure it out, and find a way around it, and then you'll have worse than no protection.

As fletch pointed out, it's simple enough to pull a thread out of the various packets and to read it. No amount of chaff is going to suffice, because the important bits are all clearly identified. Ethereal does this.

You have an interesting point:

If you are preparing to fend off a potential threat, you must think as an attacker - make probe tools first, to test your potential solutions. If you want an automaton to mimic yourself, you have to create a mold of yourself somehow. So, the first thing to create would be a set of behavior analysis tools, then try to fool them with your chaffbot. Of course government(s) use such tools on monster computers, but they have much tighter demands: real time, massively parallel, all-encompassing. However, if your target is only you yourself, one of your beloved boxes will be more than sufficient to extract your statistical profile. Always assume you are already scanned, characterized, and categorized (which is probably true for each person on Earth, of whichever nationality, who is regularly online). You need to keep the tracker happy, not raise alarms. If you are curious and into "Gregorian Chants, WoW, baseball scores and Urdu", you are interesting on general principle, just for the fun of it (if the observers are any good), and you will draw more attention by pretending to have become a Johnny Sixpack. That would clearly be suspicious.

No. Chaff will not hide

No. Chaff will not hide 'undesirable activity'. It just buries it in a haystack. And it's exactly like looking for a needle in a haystack, and "they" have some REALLY big magnets...

The big thing with chaff

Is that any solution can be countered. Any encryption can be broken. The bad guys will *always* find a way to get the info they want. Chaff is not the panacea, but none of what you all are describing is a panacea.

What chaff is is just an extra bit of static that they have to sort through to find what they want. If the chaff engine is designed in such a way that the chaff is generated at random, by following links, bookmarking, following bookmarked links and other methods to form its own random use profile (or even multiple user profiles), and it also uses Tor, encryption, etc., it will be a lot more difficult for the bad guys to figure out what's really going on.

Every bit of energy they waste sorting through chaff is another point towards making it not worthwhile. Eventually we'll get to a point where it's so costly to figure out what exactly is going on that they'll either give up or have to devote so many resources to one user that they reach the breaking point.

We will never create an unbreakable fully private system, therefore we should strive to create a system that just isn't worth breaking.

Chaff is easily sniffable, no matter how you try to hide it.

This is because the WHOLE POINT of chaff is to mask the fact that you ARE doing subversive stuff. Really. But the subversive stuff will be TOR (all nodes are published, easily locatable by The Man), and Freenet (or a custom implementation of something similar). Freenet is TRIVIALLY detectable. No, the encryption on TOR and Freenet is not going to be broken any time in the next millennia (so long as we get the freshly patched RSA keygen), but the point of chaff is to hide that you're using them at all, and, sorry, that is actually un-doable.

Chaff is noise...

While I agree that the chaff is there to mask what you are really doing, I don't think it is as useless as you make out. If your chaff includes some noise that cannot be easily broken (SSH, HTTPS, etc.) and also some "broken" Tor traffic (i.e. use Tor to read facebook, slashdot and others in plain text) then anyone sniffing your traffic has to sort through the traffic to find Tor, then sort through the Tor to find encrypted stuff and even then they shouldn't be able to tell if it is something subversive, or just you checking your bank balance online via SSL.

So I don't think chaff should be ruled out entirely, but it does need to be implemented in a smart manner...

I would also like to echo concerns from others over the bandwidth usage of chaff. While more traffic may make finding the signal harder, it will be annoying if the chaff bogs the system down.

Undoable?

What about masking those communications through a side-channel like Cory wrote about pushing video through DNS requests?

It's possible to mask single

It's possible to mask single messages using steganography - think email - but a Tor connection?

And that "pushing video through DNS" is absolute bullshit.
"200 MB should get you about 1,000,000 DNS queries. A typical site uses 5 MB per year of DNS bandwidth." (ZoneEdit.com)
At 4 MB a video (doesn't sound too far off YouTube average, on a minute's research), assuming all the DNS data is video (which isn't steganography, or any kind of secure), that's 20,000 queries per video. How long does it take you to visit 20,000 unique websites you haven't been to recently? (DNS queries cache, you know.)
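
The arithmetic behind that 20,000 figure, spelled out as a small sketch. The only inputs are the ZoneEdit numbers quoted above and the 4 MB guess for a video; the function name `queries_needed` is made up, and megabytes are assumed to be decimal.

```python
MB = 1_000_000  # assumption: decimal megabytes


def queries_needed(payload_bytes, quota_bytes=200 * MB, quota_queries=1_000_000):
    """Number of DNS queries needed to carry payload_bytes.

    Assumes every byte of every query is payload, which is the most
    generous case possible for the "video over DNS" idea.
    """
    bytes_per_query = quota_bytes // quota_queries   # 200 bytes per query
    return -(-payload_bytes // bytes_per_query)      # ceiling division


if __name__ == "__main__":
    video = 4 * MB
    print("queries per video:", queries_needed(video))
    # Compare with the quoted 5 MB per year of DNS traffic for a typical site.
    print("years of a typical site's DNS traffic:", video / (5 * MB))
```

So one short video is about 0.8 years' worth of a typical site's DNS traffic, which is hardly blending in.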

I have seen IP over DNS implemented (some people just can't leave anything untested); it's painfully slow (think 1900 baud dialup).

Encryption is breakable?

Read up on 1024-bit RSA, and 128-bit or 256-bit AES. The FBI recently admitted that it's easier to guess passwords than break modern encryption.
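
A back-of-the-envelope sketch of why guessing passwords is the easier route. The guess rate of 10^12 per second is purely an assumption picked for illustration, and so is the example of an 8-letter lowercase password. Note this only covers trying every key of a symmetric cipher like AES; RSA is attacked by factoring, not by enumerating keys.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
ASSUMED_GUESSES_PER_SECOND = 1e12  # hypothetical attacker speed, not a measurement


def seconds_to_exhaust(keyspace, rate=ASSUMED_GUESSES_PER_SECOND):
    """Worst-case time to try every candidate in the keyspace."""
    return keyspace / rate


if __name__ == "__main__":
    aes128_years = seconds_to_exhaust(2 ** 128) / SECONDS_PER_YEAR
    password_seconds = seconds_to_exhaust(26 ** 8)  # 8 lowercase letters
    print(f"AES-128 keyspace: about {aes128_years:.1e} years")
    print(f"8-letter lowercase password: about {password_seconds:.2f} seconds")
```

Under these assumptions the key search takes on the order of 10^19 years, while the weak password falls in well under a second.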

Beware

Encryption relies on primes and factorization of large numbers. Supposedly a proper encryption method, using large primes, can resist a brute force attack; and no publicly known polynomial method exists to factor the product of two large primes. Unfortunately, your security relies on several important pieces. If one or more of these is weak, your protection is diminished.

First, your numbers may not be primes at all. The numbers you'll need to use will be too large for you to check directly. You'll need to rely on tests for primality, but those tests will be probabilistic. That means you may have a pseudoprime. If that's the case, you're hosed, because a powerful computer could find the factors of that pseudoprime, and then your house of cards comes tumbling down.

The encryption algorithm also needs to be robust. Some of the older methods have been shown to be far less robust than believed, by many orders of magnitude. Newer algorithms are not *known* to have these defects, but that's not the same as saying they are known not to have the defects. All it means is that the defects, if any, haven't yet been uncovered. They may exist; those who are spying on you may know of them; and then you're hosed again.

Finally, a long shot, it may be that some agency has found a deterministic test for primality in polynomial time. Not likely - the problem is thought to be intractable or "hard" - but much of the research is classified. You can't rely on encryption.

I would not rely on what the FBI says about its ability to decrypt. Think of it this way. If they claim to be able to decrypt your messages, you'll either seek even better encryption, or you'll be very careful about what you say. But if they throw up their hands and claim they're beaten - well, hell, you can say whatever you want. How convenient for the FBI, if they happened to be lying about their abilities...

FUD alert

You're wrong, misinformed or too vague on several points:

Encryption relies on primes and factorization of large numbers.

Sure, RSA and friends rely on prime factorization or related mathematical puzzles (e.g. discrete logarithms). But these are all asymmetric (i.e. public key) cryptography systems, and only a subset of all of them. No common symmetric cipher relies on prime factorization, so your "critique" does not apply to DES, AES et al., nor to elliptic curve and other types of PKC.

First, your numbers may not be primes at all. The numbers you'll need to use will be too large for you to check directly. You'll need to rely on tests for primality, but those tests will be probabilistic.

There's no problem here. The randomized (or "probabilistic") primality testing algorithms used today (e.g. Miller-Rabin) are one-sided Monte Carlo tests: they will never be wrong when they say that a number is composite. They can only err by saying that a certain number is probably prime when it is in fact composite. And the probability of that error can be made arbitrarily small by running the algorithm on the number enough times (for Miller-Rabin, at most 4^-k after k rounds). If the algorithm says that something isn't prime when generating a key, discard it and randomly choose a new candidate and test it. With enough rounds, the chance that this strategy hands you a composite is negligible.
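
A minimal sketch of that generate-and-test loop, using the Miller-Rabin test. The names `is_probable_prime` and `random_prime` and the default of 40 rounds are illustrative choices, not anyone's production key generator. In this test a "composite" verdict is always correct, while a "probably prime" verdict is wrong with probability at most 4^-rounds.

```python
import random

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)


def is_probable_prime(n, rounds=40, rng=random):
    """Miller-Rabin: False means certainly composite, True means probably prime."""
    if n < 2:
        return False
    for p in SMALL_PRIMES:
        if n % p == 0:
            return n == p
    # Write n - 1 as d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    for _ in range(rounds):
        a = rng.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False  # a is a witness: n is definitely composite
    return True


def random_prime(bits, rng=random):
    """Draw random odd candidates of the given size until one passes the test."""
    while True:
        # Force the top bit (full length) and the bottom bit (odd).
        candidate = rng.getrandbits(bits) | (1 << (bits - 1)) | 1
        if is_probable_prime(candidate, rng=rng):
            return candidate


if __name__ == "__main__":
    # For real keys, pass rng=random.SystemRandom(); the default generator
    # is not cryptographically secure.
    print(random_prime(256, rng=random.SystemRandom()))
```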

Newer algorithms are not *known* to have these defects, but that's not the same as saying they are known not to have the defects.

What defects are you talking about? Resistance against differential cryptanalysis? All widely used symmetric ciphers have that today, e.g. AES and even DES (thanks to NSA for that). All modern crypto systems are designed to defend against all old general cryptanalytic attacks, otherwise they would never be accepted by any standards body. New attacks are required to break modern ciphers, plain and simple.

Finally, a long shot, it may be that some agency has found a deterministic test for primality in polynomial time. Not likely - the problem is thought to be intractable or "hard" - but much of the research is classified.

A deterministic polynomial-time primality test was in fact published just a few years ago^1. While that was a theoretical breakthrough, in practice it means nothing. The randomized poly-time Monte Carlo algorithms for primality testing of today (e.g. Miller-Rabin) work well for the purpose of generating strong RSA keys.

Now, what you probably meant was that some spy agency might have discovered a poly-time prime factorization algorithm^2, which is a different beast altogether. Now that would be something...

^1 Agrawal, Kayal, Saxena, 2002.
^2 That runs on classical computers -- we already know such an algorithm for quantum computers.

Brute-forcing RSA should

Brute-forcing RSA should take longer than the heat-death of the universe. Even if there's a flaw, it's likely to be computationally intensive to break anyway. If they know who you are, it's likely to be cheaper and yield a better return to just torture the info out of you. The job here is to prevent the "knowing who you are" part.

Depends on how you define "breaking".

The strength of encryption itself aside, the weakest part of any process will always be the end user.

In my mind, I see the FBI not as admitting a weakness, but instead stating a simple fact: poorly designed passwords make encryption immaterial. By this logic, even 256-bit AES is "breakable" in the same sense that instead of smashing down the door to my house, I could always just use a key to open it instead.

And of course, all the onion routing and encryption in the world won't help against something as simple as a screen-reading backdoor operating locally.

Paranoia is the point...

... we are talking about network communication. In these cases, neither Tor nor Freenet has a "password". It's RSA-encrypted on the fly with keys that the user (generally) will never see or know.

Privacy in dyne:bolic

Hi!
There is a distro called dyne:bolic (http://dynebolic.org) that supports keeping private data (all of it) in one encrypted file. It is also clutter-free, clean, easy to "install" on a FAT partition, and easy to move to another machine. In pure:dyne (a fork of d:b), Tor and Privoxy are installed by default.

I've heard that Skype is hard to crack or block.

IMHO some solution for activists is also needed in the areas of: making press-ready documents (Scribus? TeX?), blogging/Internet publishing (maybe with a server?), audio editing and streaming (podcasting, radio), and video editing.

All should be not only "untrackable", but also EASY TO REMOVE AND BREAK, so preferably small and easy to use from a pendrive install.

Chaff isn't a solution to me in any case - just a waste of bandwidth that should be used to make strong privacy, not just acting like a teenager. Steganography is a much better way.

How about running a blogging platform with an option to use steganographic material as an input (in place of plain text)?

How about changing MAC with each reboot?

Skype was hard to block.

Skype was hard to block. There's a 2meg app available for free to do it now.
Your MAC is generally set in your network hardware. You can spoof it, sure, but since your MAC doesn't leave the local LAN (you to the router) what's the point?
A blogsite capable of turning steganographic comms into posts would be good; point it at the "drop box," give it the algorithm, and away you go...

I think...

I think we should keep something similar to Chaff on the table, but possibly save it for a future release. I would hate to see a perfectly good release held up because of a system that might or might not work. Let's focus on the essentials for now.

/2cents