javascript 哈希_如何在JavaScript中实现简单的哈希表-白红宇

javascript 哈希_如何在JavaScript中实现简单的哈希表

阅读量：2524 次

发布时间：2019-05-11

本文共 14348 字，大约阅读时间需要 47 分钟。

javascript 哈希

by Alex Nadalin

通过亚历克斯·纳达林

How beautiful is {}?

{}有多美？

It lets you store values by key, and retrieve them in a very cost-efficient manner (O(1), more on this later).

它使您可以按键存储值，并以非常经济高效的方式检索它们( O(1) ，稍后会介绍更多)。

In this post I want to implement a very basic hash table, and have a look at its inner workings to explain one of the most ingenious ideas in computer science.

在这篇文章中，我想实现一个非常基本的哈希表，并看一下它的内部工作原理，以解释计算机科学中最巧妙的想法之一。

问题 (The problem)

Imagine you’re building a new programming language: you start by having pretty simple types (strings, integers, floats, …) and then proceed to implement very basic data structures. First you come up with the array ([]), then comes the hash table (otherwise known as dictionary, associative array, hashmap, map, and…the list goes on).

想象一下，您正在构建一种新的编程语言：首先要拥有非常简单的类型(字符串，整数，浮点数，…)，然后继续实现非常基本的数据结构。首先，您要提出数组( [] )，然后是哈希表(否则称为字典，关联数组，哈希图，地图，等等。)。

Ever wondered how they work? How they’re so damn fast?

有没有想过它们如何工作？他们怎么这么快

Well, let’s say that JavaScript did not have have {} or new Map(), and let’s implement our very own DumbMap!

好吧，假设JavaScript没有{}或new Map() ，让我们实现我们自己的DumbMap ！

关于复杂性的说明 (A note on complexity)

Before we get the ball rolling, we need to understand how complexity of a function works: Wikipedia has a good refresher on , but I’ll add a brief explanation for the lazy ones.

在开始研究之前，我们需要了解函数的复杂性是如何工作的：Wikipedia对了很好的复习，但是我将为懒惰的人添加一个简短的解释。

Complexity measures how many steps are required by our function — the fewer steps, the faster the execution (also known as “running time”).

复杂度衡量我们的功能需要多少步骤-步骤越少，执行速度越快(也称为“运行时间”)。

Let’s take a look at the following snippet:

让我们看一下以下代码片段：

function fn(n, m) {  return n * m}

The computational complexity (from now simply “complexity”) of fn is O(1), meaning that it’s constant (you can read O(1) as “the cost is one”): no matter what arguments you pass, the platform that runs this code only has to do one operation (multiply n into m). Again, since it’s one operation, the cost is referred as O(1).

fn的计算复杂度(从现在开始简称为“复杂度”)为O(1) ，这意味着它是常量(您可以将O(1)读为“ 成本为一 ”)：无论您传递什么参数，平台都会运行此代码只需执行一个操作(将n乘以m )。同样，由于这是一次操作，因此成本称为O(1) 。

Complexity is measured by assuming arguments of your function could have very large values. Let’s look at this example:

通过假设函数的参数可以具有非常大的值来衡量复杂性。让我们来看这个例子：

function fn(n, m) {  let s = 0

for (i = 0; i < 3; i++) {    s += n * m  }

return s}

You would think its complexity is O(3), right?

您会认为它的复杂度是O(3) ，对吗？

Again, since complexity is measured in the context of very large arguments, we tend to “drop” constants and consider O(3) the same as O(1). So, even in this case, we would say that the complexity of fn is O(1). No matter what the value of n and m are, you always end up doing three operations — which, again, is a constant cost (therefore O(1)).

同样，由于复杂度是在非常大的参数的上下文中测量的，因此我们倾向于“丢弃”常量，并认为O(3)与O(1)相同。因此，即使在这种情况下，我们也可以说fn的复杂度为O(1) 。不管n和m的值是多少，您总是要进行三个操作-这又是一个不变的成本(因此为O(1) )。

Now this example is a little bit different:

现在这个例子有些不同：

function fn(n, m) {  let s = []

for (i = 0; i < n; i++) {    s.push(m)  }

return s}

As you see, we’re looping as many times as the value of n, which could be in the millions. In this case, we define the complexity of this function as O(n), as you will need to do as many operations as the value of one of your arguments.

如您所见，我们循环的次数是n的值，这可能是数百万。在这种情况下，我们将此函数的复杂度定义为O(n) ，因为您将需要执行与其中一个参数的值一样多的运算。

Other examples?

还有其他例子吗？

function fn(n, m) {  let s = []

for (i = 0; i < 2 * n; i++) {    s.push(m)  }

return s}

This examples loops 2 * n times, meaning the complexity should be O(2n). Since we mentioned that constants are “ignored” when calculating the complexity of a function, this example is also classified as O(n).

此示例循环2 * n次，这意味着复杂度应为O(2n) 。由于我们提到在计算函数的复杂度时常量被“忽略”，因此此示例也归类为O(n) 。

One more?

多一个？

function fn(n, m) {  let s = []

for (i = 0; i < n; i++) {    for (i = 0; i < n; i++) {      s.push(m)    }  }

return s}

Here we are looping over n and looping again inside the main loop, meaning the complexity is “squared” (n * n): if n is 2, we will run s.push(m) 4 times, if 3 we will run it 9 times, and so on.

在这里，我们在n循环并在主循环内再次循环，这意味着复杂度是“平方”( n * n )：如果n为2，则将运行s.push(m) 4次，如果3，则将运行s.push(m) 9次，依此类推。

In this case, the complexity of the function is referred as O(n²).

在这种情况下，函数的复杂度称为O(n²) 。

One last example?

最后一个例子？

function fn(n, m) {  let s = []

for (i = 0; i < n; i++) {    s.push(n)  }

for (i = 0; i < m; i++) {    s.push(m)  }

return s}

In this case we don’t have nested loops, but we loop twice over two different arguments: the complexity is defined as O(n+m). Crystal clear.

在这种情况下，我们没有嵌套循环，但是我们在两个不同的参数上循环了两次：复杂度定义为O(n+m) 。晶莹剔透。

Now that you’ve just gotten a brief introduction (or refresher) on complexity, it’s very easy to understand that a function with complexity O(1) is going to perform much better than one with O(n).

现在，您已经对复杂性进行了简要介绍(或复习)，很容易理解，复杂度为O(1)的函数的性能将比O(n)更好。

Hash tables have a O(1) complexity: in layman’s terms, they’re superfast. Let’s move on.

哈希表的复杂度为O(1) ：用外行的话来说，它们是超快的 。让我们继续。

(I’m kinda lying on hash tables always having O(1)complexity, but just read on ;))

(我有点躺在总是具有O(1)复杂性的哈希表上，但请继续阅读；))

让我们建立一个(哑)哈希表 (Let’s build a (dumb) hash table)

Our hash table has 2 simple methods — set(x, y) and get(x). Let’s start writing some code:

我们的哈希表有2个简单的方法set(x, y)和get(x) 。让我们开始编写一些代码：

And let’s implement a very simple, inefficient way to store these key-value pairs and retrieve them later on. We first start by storing them in an internal array (remember, we can’t use {} since we are implementing {} — mind blown!):

并且让我们实现一种非常简单，效率低下的方法来存储这些键值对并在以后检索它们。我们首先通过一个内部数组存储它们开始(请记住，我们不能使用{}因为我们正在实施{} -记吹！)：

Then it’s simply a matter of getting the right element from the list:

然后，只需从列表中获取正确的元素即可：

Our full example:

我们的完整示例：

Our DumbMap is amazing! It works right out of the box, but how will it perform when we add a large amount of key-value pairs?

我们的DumbMap很棒！它开箱即用，但是当我们添加大量键值对时它将如何执行？

Let’s try a simple benchmark. We will first try to find a non-existing element in a hash table with very few elements, and then try the same in one with a large quantity of elements:

让我们尝试一个简单的基准。我们将首先尝试在具有很少元素的哈希表中找到一个不存在的元素，然后在具有大量元素的一个哈希表中尝试相同的元素：

The results? Not so encouraging:

结果？不太令人鼓舞：

with very few records in the map: 0.118mswith lots of records in the map: 14.412ms

In our implementation, we need to loop through all the elements inside this.list in order to find one with the matching key. The cost is O(n), and it’s quite terrible.

在我们的实现中，我们需要遍历this.list中的所有元素，以便找到具有匹配键的元素。成本是O(n) ，这非常可怕。

快一点 (Make it fast(er))

We need to find a way to avoid looping through our list: time to put hash back into the hash table.

我们需要找到一种避免循环遍历列表的方法：是时候将哈希值放回到哈希表中了 。

Ever wondered why this data structure is called a hash table? That’s because a hashing function is used on the keys that you set and get. We will use this function to turn our key into an integer i, and store our value at index i of our internal list. Since accessing an element, by its index, from a list has a constant cost (O(1)), then the hash table will also have a cost of O(1).

有没有想过为什么将此数据结构称为哈希表？这是因为在设置和获取的键上使用了哈希函数。我们将使用此函数将键转换为整数i ，并将值存储在内部列表的索引i处。由于从列表访问元素的索引具有恒定的成本( O(1) )，因此哈希表的成本也为O(1) 。

Let’s try this out:

让我们尝试一下：

Here we are using the module, which simply converts a string to a numeric hash. We use it to store and fetch elements at index hash(key) of our list. The results?

在这里，我们使用模块，该模块仅将字符串转换为数字哈希。我们使用它在列表的索引hash(key)上存储和获取元素。结果？

with lots of records in the map: 0.013ms

W — O — W. This is what I’m talking about!

W — O —W。这就是我在说的！

We don’t have to loop through all elements in the list, and retrieving elements from DumbMap is super fast!

我们不必遍历列表中的所有元素，并且从DumbMap检索元素非常快！

Let me put this as straightforwardly as possible: hashing is what makes hash tables extremely efficient. No magic. Nothing more. Nada. Just a simple, clever, ingenious idea.

让我尽可能直截了当地说： 哈希是使哈希表非常高效的原因 。没魔术而已。娜达只是一个简单，聪明，巧妙的想法。

选择正确的哈希函数的成本 (The cost of picking the right hashing function)

Of course, picking a fast hashing function is very important. If our hash(key) runs in a few seconds, our function will be quite slow regardless of its complexity.

当然， 选择快速哈希函数非常重要。 如果我们的hash(key)在几秒钟内运行，无论其复杂程度如何，我们的函数都会变得非常慢。

At the same time, it’s very important to make sure that our hashing function doesn’t produce a lot of collisions, as they would be detrimental to the complexity of our hash table.

同时， 确保我们的哈希函数不会产生大量冲突非常重要 ，因为它们将不利于哈希表的复杂性。

Confused? Let’s take a closer look at collisions.

困惑？让我们仔细看看碰撞。

碰撞 (Collisions)

You might think “Ah, a good hashing function never generates collisions!”: well, come back to the real world and think again. , and it’s just a matter of time, or computational power, before a hashing function cracks and returns the same hash for 2 different inputs. Always assume your hashing function generates collisions, and implement the right defense against such cases.

您可能会认为“ 啊，好的哈希函数永远不会产生冲突！ ”：恩，回到现实世界再三思。，而散列函数破解并为2个不同的输入返回相同的散列只是时间或计算能力的问题。始终假设您的哈希函数会产生冲突，并对这种情况实施正确的防御。

Case in point, let’s try to use a hash() function that generates a lot of collisions:

举例来说，让我们尝试使用会产生大量冲突的hash()函数：

This function uses an array of 10 elements to store values, meaning that elements are likely to be replaced — a nasty bug in our DumbMap:

此函数使用10个元素组成的数组存储值，这意味着元素很可能会被替换— DumbMap一个讨厌的bug：

In order to resolve the issue, we can simply store multiple key-value pairs at the same index. So let’s amend our hash table:

为了解决此问题，我们可以简单地在同一索引中存储多个键值对。因此，让我们修改哈希表：

As you might notice, here we fall back to our original implementation: store a list of key-value pairs and loop through each of them. This is going to be quite slow when there are a lot of collisions for a particular index of the list.

您可能会注意到，这里我们回到原始的实现方式：存储键值对的列表并遍历每个键值对。当列表的特定索引存在很多冲突时，这将非常慢。

Let’s benchmark this using our own hash() function that generates indexes from 1 to 10:

让我们使用我们自己的hash()函数对其进行基准测试，该函数生成从1到10的索引：

with lots of records in the map: 11.919ms

and by using the hash function from string-hash, which generates random indexes:

并使用string-hash的哈希函数生成随机索引：

with lots of records in the map: 0.014ms

Whoa! There’s the cost of picking the right hashing function — fast enough that it doesn’t slow our execution down on its own, and good enough that it doesn’t produce a lot of collisions.

哇！选择正确的哈希函数需要付出代价-足够快以至于不会自行降低执行速度，并且足够好以至于不会产生很多冲突。

通常为O(1) (Generally O(1))

Remember my words?

记得我的话吗？

Hashtables have a O(1) complexity

哈希表的复杂度为O(1)

Well, I lied: the complexity of a hash table depends on the hashing function you pick. The more collisions you generate, the more the complexity tends toward O(n).

好吧，我撒谎：哈希表的复杂度取决于您选择的哈希函数。生成的冲突越多，则复杂度趋于O(n) 。

A hashing function such as:

哈希函数，例如：

function hash(key) {  return 0}

would mean that our hash table has a complexity of O(n).

这意味着我们的哈希表的复杂度为O(n) 。

This is why, in general, computational complexity has three measures: best, average, and worst-case scenarios. Hashtables have a O(1) complexity in best and average case scenarios, but fall to O(n)in their worst-case scenario.

这就是为什么通常情况下计算复杂度具有三个度量的原因：最佳，平均和最坏情况。哈希表在最佳和平均情况下的复杂度为O(1) ，但在最坏情况下的复杂度为O(n) 。

Remember: a good hashing function is the key to an efficient hash table — nothing more, nothing less.

请记住： 良好的哈希函数是高效哈希表的关键 -仅此而已。

有关碰撞的更多信息…… (More on collisions…)

The technique we used to fix DumbMap in case of collisions is called : we store all the key-pairs that generate collisions in a list and loop through them.

我们在发生冲突时用于修复DumbMap的技术称为：我们将所有生成冲突的键对存储在一个列表中，并循环遍历它们。

Another popular technique is :

另一种流行的技术是：

at each index of our list we store one and one only key-value pair
在列表的每个索引处，我们存储一个和一个唯一的键值对

when trying to store a pair at index x, if there’s already a key-value pair, try to store our new pair at x + 1
尝试将一对存储在索引x ，如果已经有一个键值对，请尝试将新对存储在x + 1

if x + 1 is taken, try x + 2 and so on…
如果采用x + 1 ，请尝试x + 2 ，依此类推...

when retrieving an element, hash the key and see if the element at that position (x) matches our key
检索元素时，对键进行哈希处理，看看该位置( x )上的元素是否与我们的键匹配

if not, try to access the element at position x + 1
如果不是，请尝试访问位置x + 1的元素

rinse and repeat until you get to the end of the list, or when you find an empty index — that means our element is not in the hash table
冲洗并重复直到您到达列表的末尾，或者当您找到空索引时-这意味着我们的元素不在哈希表中

Smart, simple, elegant and !

聪明，简单，优雅，！

常见问题解答(或TL; DR) (FAQs (or TL;DR))

哈希表会哈希我们存储的值吗？ (Does a hash table hash the values we’re storing?)

No, keys are hashed so that they can be turned into an integer i, and both keys and values are stored at position i in a list.

不，键是散列的，以便可以将它们转换为整数i ，并且键和值都存储在列表中的位置i 。

哈希表使用的哈希函数会产生冲突吗？ (Do the hashing functions used by hash tables generate collisions?)

Absolutely — so hash tables are implemented with to avoid nasty bugs.

绝对是这样-因此哈希表是采用来实现的，以避免讨厌的错误。

哈希表在内部使用列表还是链接列表？ (Do hash tables use a list or a linked list internally?)

It depends, . In our examples, we use the JavaScript array ([]) that can be :

这取决于。在我们的示例中，我们使用可以JavaScript数组( [] )：

> a = []

> a[3] = 1

> a[ <3 empty items>, 1 ]

您为什么选择JavaScript作为示例？ JS数组是哈希表！ (Why did you pick JavaScript for the examples? JS arrays ARE hash tables!)

For example:

例如：

>  a = [][]

> a["some"] = "thing"'thing'

> a[ some: 'thing' ]

> typeof a'object'

I know, damn JavaScript.

我知道，该死JavaScript。

JavaScript is “universal” and probably the easiest language to understand when looking at some sample code. JS might not be the best language, but I hope these examples are clear enough.

JavaScript是“通用的”，并且可能是查看某些示例代码时最容易理解的语言。 JS可能不是最好的语言，但是我希望这些例子足够清楚。

您的示例确实是哈希表的良好实现吗？真的那么简单吗？ (Is your example a really good implementation of a hash table? Is it really THAT simple?)

No, not at all.

一点都不。

Have a look at “” by , as it will give you a bit more context. There’s a lot more to learn, so I would also suggest that you check out:

看看 “用 ”，因为它将为您提供更多的上下文。还有很多东西要学习，所以我也建议您检查一下：

到底… (In the end…)

Hash tables are a very clever idea we use on a regular basis: no matter whether you create a , an or a . They all share the same concepts and beautifully work to let us store and retrieve element by an identifier, at a (most likely) constant cost.

哈希表是我们经常使用的一个非常聪明的主意：无论您是创建的还是的。它们都具有相同的概念，并且通过精美的工作使我们能够以(最可能的)固定成本通过标识符存储和检索元素。

Hope you enjoyed this article, and feel free to share your feedback with me.

希望您喜欢这篇文章，并随时与我分享您的反馈。

A special thanks goes to who helped me by reviewing this article.

特别感谢回顾了这篇文章对我的帮助。

Adios!

阿迪奥斯！