Hi Nick,

The great thing about *unsalted* hashes is that you can precompute them
ahead of time; after that, finding the password that matches a given hash
is just a lookup, done in seconds -- which always makes for a more exciting
demo than "come back in a few hours".

It is a no-brainer to write a generator function that creates every
possible password from a charset like
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", hashes
each one, and stores the results for lookup later (sketched below). It is,
however, incredibly wasteful of storage space.
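
For illustration, here is a rough sketch of such a generator in Scala (all
the names are mine, nothing standard): treat each number in the keyspace as
a base-62 index into the charset, so the whole space can be enumerated
without ever materialising it.

import java.security.MessageDigest

val charset = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

// Interpret an index in [0, 62^len) as a base-62 number over the charset
// and return the corresponding candidate password.
def indexToPassword(index: Long, len: Int): String = {
  var i = index
  val sb = new StringBuilder(len)
  for (_ <- 0 until len) {
    sb.append(charset((i % 62).toInt))
    i /= 62
  }
  sb.toString
}

// Hex-encoded SHA-1 using only the JDK, no extra dependencies.
def sha1Hex(s: String): String =
  MessageDigest.getInstance("SHA-1")
    .digest(s.getBytes("UTF-8"))
    .map("%02x".format(_))
    .mkString

This also answers your RDD question: parallelize the index range rather
than the passwords themselves, e.g.
sc.parallelize(0L until 62L*62*62*62*62, 256)
  .map(i => (sha1Hex(indexToPassword(i, 5)), indexToPassword(i, 5)))
for the 5-character slice of the space.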

- all passwords from 1 to 9 characters long
- using the 62-character charset above = 13,759,005,997,841,642 passwords
- assuming 20 bytes to store the SHA-1 digest and up to 9 bytes to store
the password, that works out to roughly 400 petabytes (arithmetic below)
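
For anyone checking my numbers, the back-of-envelope arithmetic:

// Keyspace: the sum of 62^k for k = 1..9 (a geometric series)
val keyspace = (1 to 9).map(len => BigInt(62).pow(len)).sum
// = 13759005997841642

// Naive storage: 20-byte SHA-1 digest + up to 9 bytes of password per entry
val storage = keyspace * (20 + 9)
// ~= 3.99e17 bytes, i.e. roughly 400 petabytes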

Thankfully there is already a more efficient, compact mechanism to achieve
this: Rainbow Tables <http://en.wikipedia.org/wiki/Rainbow_table> -- better
still, there is an active community of people who have already precomputed
many of these datasets. The dataset above is readily available to download
and is just 864GB -- much more feasible.
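
If you have not met rainbow tables before, the trick is hash-reduce
chains: a "reduction" function maps a hash back into the password space,
and you store only the first and last password of each long chain,
recomputing the middle on demand. A toy illustration, reusing the helpers
above -- the reduction here is deliberately simplistic, real tables define
their own:

// Fold a digest (plus the step number, so each chain position uses a
// different reduction) back into the space of len-character passwords.
def reduce(digestHex: String, step: Int, len: Int): String = {
  val space = BigInt(62).pow(len).toLong
  val seed = java.lang.Long.parseLong(digestHex.take(12), 16) + step
  indexToPassword(seed % space, len)
}

// Walk a full chain from its starting password to its endpoint; only the
// (start, end) pairs ever get written to disk.
def chainEnd(start: String, chainLength: Int, len: Int): String =
  (0 until chainLength).foldLeft(start) { (pw, step) =>
    reduce(sha1Hex(pw), step, len)
  }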

All you then need to do is write a rainbow-table lookup function in Spark
and leverage the precomputed files stored in HDFS.  Done right, you should
be able to achieve interactive (few-second) lookups.
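
A sketch of what that lookup might look like with the RDD API -- assuming,
purely for illustration, chains stored as "start end" text lines in HDFS
under a made-up path (a real table format like .rti would need its own
parser), and the reduce/sha1Hex helpers from the sketches above:

import org.apache.spark.SparkContext

def crack(sc: SparkContext, targetHash: String,
          chainLength: Int, len: Int): Option[String] = {
  // endpoint -> chain start, from the precomputed files
  val chains = sc.textFile("hdfs:///rainbow/sha1-mixalpha-numeric/*")
    .map(_.split(" "))
    .map(parts => (parts(1), parts(0)))

  // If the target hash sits at position pos of some chain, rolling it
  // forward tells us the endpoint that chain must have.
  val candidateEnds = (0 until chainLength).map { pos =>
    var pw = reduce(targetHash, pos, len)
    for (step <- pos + 1 until chainLength)
      pw = reduce(sha1Hex(pw), step, len)
    pw
  }.toSet
  val ends = sc.broadcast(candidateEnds)

  // Re-walk only the matching chains from their starts, looking for a
  // password whose SHA-1 is the target.
  chains.filter { case (end, _) => ends.value.contains(end) }
    .flatMap { case (_, start) =>
      Iterator.iterate((start, 0)) { case (pw, step) =>
        (reduce(sha1Hex(pw), step, len), step + 1)
      }.take(chainLength).map(_._1)
        .find(pw => sha1Hex(pw) == targetHash)
    }
    .take(1)
    .headOption
}

The expensive part parallelizes naturally: the driver computes the
chainLength candidate endpoints once, and the cluster filters and re-walks
the chains in parallel.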

Have fun!

MC


*Michael Cutler*
Founder, CTO


Mobile: +44 789 990 7847
Email:  mich...@tumra.com
Web:    tumra.com <http://tumra.com/>
Visit us at our offices in Chiswick Park <http://goo.gl/maps/abBxq>
Registered in England & Wales, 07916412. VAT No. 130595328


On 12 June 2014 01:24, Nick Chammas <nicholas.cham...@gmail.com> wrote:

> Spark is obviously well-suited to crunching massive amounts of data. How
> about to crunch massive amounts of numbers?
>
> A few years ago I put together a little demo for some co-workers to
> demonstrate the dangers of using SHA1
> <http://codahale.com/how-to-safely-store-a-password/> to hash and store
> passwords. Part of the demo included a live brute-forcing of hashes to show
> how SHA1's speed made it unsuitable for hashing passwords.
>
> I think it would be cool to redo the demo, but utilize the power of a
> cluster managed by Spark to crunch through hashes even faster.
>
> But how would you do that with Spark (if at all)?
>
> I'm guessing you would create an RDD that somehow defined the search space
> you're going to go through, and then partition it to divide the work up
> equally amongst the cluster's cores. Does that sound right?
>
> I wonder if others have already used Spark for computationally-intensive
> workloads like this, as opposed to just data-intensive ones.
>
> Nick
>
>
> ------------------------------
> View this message in context: Using Spark to crack passwords
> <http://apache-spark-user-list.1001560.n3.nabble.com/Using-Spark-to-crack-passwords-tp7437.html>
> Sent from the Apache Spark User List mailing list archive
> <http://apache-spark-user-list.1001560.n3.nabble.com/> at Nabble.com.
>
