request.  This can be conceived as an alternate, more capable resolution of
  https://github.com/nim-lang/Nim/issues/12200
than
  https://github.com/nim-lang/Nim/pull/12208

The code re-org idea here is to upgrade tablimpl.nim:`delImpl`/`delImplIdx`
to abstract client code conventions for cell emptiness & cell hashing via
three new template arguments - `makeEmpty`, `cellEmpty`, `cellHash` which
all take a single integer argument and clear a cell, test if clear or
produce the hash of the key stored at that index in `.data[]`.

Then we update the 3 call sites (`Table`, `CountTable`, `SharedTable`) of
`delImpl`/`delImplIdx` by defining define those arguments just before the
first invocation as non-exported templates.

Because `CountTable` does not save hash() outputs as `.hcode`, it needs a
new tableimpl.nim:`delImplNoHCode` which simply in-lines the hash search
when no `.hcode` field is available for "prefix compare" acceleration.
It is conceivable this new template could be used by future variants, such
as one optimized for integer keys where `hash()` and `==` are fast and
`.hcode` is both wasted space & time (though a small change to interfaces
there for a sentinel key meaning "empty" is needed for maximum efficiency).

We also eliminate the old O(n) `proc remove(CountTable...)` in favor of
simply invoking the new `delImpl*` templates and take care to correctly
handle the case where `val` is either zero for non-existent keys in `inc`
or evolves to zero over time in `[]=` or `inc`.

The only user-visible changes from the +-42 delta here are speed, iteration
order post deletes, and relaxing the `Positive` constraint on `val` in
`proc inc` again, as indicated in the `changelog.md` entry.

(cherry picked from commit b2a1944587)
This commit is contained in:
c-blake
2020-07-28 17:48:50 -04:00
committed by narimiran
parent 378dc7c9fb
commit c804e559ad
4 changed files with 90 additions and 42 deletions

View File

@@ -78,6 +78,55 @@
- Fix a bug where calling `close` on io streams in osproc.startProcess was a noop and led to
hangs if a process had both reads from stdin and writes (eg to stdout).
- The callback that is passed to `system.onThreadDestruction` must now be `.raises: []`.
- The callback that is assigned to `system.onUnhandledException` must now be `.gcsafe`.
- `osproc.execCmdEx` now takes an optional `input` for stdin, `workingDir` and `env`
parameters.
- Added a `ssl_config` module containing lists of secure ciphers as recommended by
[Mozilla OpSec](https://wiki.mozilla.org/Security/Server_Side_TLS)
- `net.newContext` now defaults to the list of ciphers targeting
["Intermediate compatibility"](https://wiki.mozilla.org/Security/Server_Side_TLS#Intermediate_compatibility_.28recommended.29)
per Mozilla's recommendation instead of `ALL`. This change should protect
users from the use of weak and insecure ciphers while still provides
adequate compatibility with the majority of the Internet.
- A new module `std/jsonutils` with hookable `jsonTo,toJson,fromJson` operations for json
serialization/deserialization of custom types was added.
- A new proc `heapqueue.find[T](heap: HeapQueue[T], x: T): int` to get index of element ``x``
was added.
- Added `rstgen.rstToLatex` convenience proc for `renderRstToOut` and `initRstGenerator`
with `outLatex` output.
- Added `os.normalizeExe`, e.g.: `koch` => `./koch`.
- `macros.newLit` now preserves named vs unnamed tuples; use `-d:nimHasWorkaround14720`
to keep old behavior.
- Added `random.gauss`, that uses the ratio of uniforms method of sampling from a Gaussian distribution.
- Added `typetraits.elementType` to get element type of an iterable.
- `typetraits.$` changes: `$(int,)` is now `"(int,)"` instead of `"(int)"`;
`$tuple[]` is now `"tuple[]"` instead of `"tuple"`;
`$((int, float), int)` is now `"((int, float), int)"` instead of `"(tuple of (int, float), int)"`
- Added `macros.extractDocCommentsAndRunnables` helper
- `strformat.fmt` and `strformat.&` support `= specifier`. `fmt"{expr=}"` now
expands to `fmt"expr={expr}"`.
- deprecations: `os.existsDir` => `dirExists`, `os.existsFile` => `fileExists`
- Added `jsre` module, [Regular Expressions for the JavaScript target.](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions)
- Made `maxLines` argument `Positive` in `logging.newRollingFileLogger`,
because negative values will result in a new file being created for each logged
line which doesn't make sense.
- Changed `log` in `logging` to use proper log level on JavaScript target,
e.g. `debug` uses `console.debug`, `info` uses `console.info`, `warn` uses `console.warn`, etc.
- Tables, HashSets, SharedTables and deques don't require anymore that the passed
initial size must be a power of two - this is done internally.
Proc `rightSize` for Tables and HashSets is deprecated, as it is not needed anymore.
`CountTable.inc` takes `val: int` again not `val: Positive`; I.e. it can "count down" again.
- Removed deprecated symbols from `macros` module, deprecated as far back as `0.15`.
## Language changes
- In newruntime it is now allowed to assign discriminator field without restrictions as long as case object doesn't have custom destructor. Discriminator value doesn't have to be a constant either. If you have custom destructor for case object and you do want to freely assign discriminator fields, it is recommended to refactor object into 2 objects like this:
```nim

View File

@@ -140,6 +140,10 @@ proc hasKeyOrPut*[A, B](t: var SharedTable[A, B], key: A, val: B): bool =
withLock t:
hasKeyOrPutImpl(enlarge)
template tabMakeEmpty(i) = t.data[i].hcode = 0
template tabCellEmpty(i) = isEmpty(t.data[i].hcode)
template tabCellHash(i) = t.data[i].hcode
proc withKey*[A, B](t: var SharedTable[A, B], key: A,
mapper: proc(key: A, val: var B, pairExists: var bool)) =
## Computes a new mapping for the ``key`` with the specified ``mapper``
@@ -181,7 +185,7 @@ proc withKey*[A, B](t: var SharedTable[A, B], key: A,
if pairExists:
mapper(t.data[index].key, t.data[index].val, pairExists)
if not pairExists:
delImplIdx(t, index)
delImplIdx(t, index, tabMakeEmpty, tabCellEmpty, tabCellHash)
else:
var val: B
mapper(key, val, pairExists)
@@ -202,7 +206,7 @@ proc add*[A, B](t: var SharedTable[A, B], key: A, val: B) =
proc del*[A, B](t: var SharedTable[A, B], key: A) =
## deletes `key` from hash table `t`.
withLock t:
delImpl()
delImpl(tabMakeEmpty, tabCellEmpty, tabCellHash)
proc len*[A, B](t: var SharedTable[A, B]): int =
## number of elements in `t`

View File

@@ -78,22 +78,22 @@ template hasKeyOrPutImpl(enlarge) {.dirty.} =
maybeRehashPutImpl(enlarge)
else: result = true
template delImplIdx(t, i) =
template delImplIdx(t, i, makeEmpty, cellEmpty, cellHash) =
let msk = maxHash(t)
if i >= 0:
dec(t.counter)
block outer:
while true: # KnuthV3 Algo6.4R adapted for i=i+1 instead of i=i-1
var j = i # The correctness of this depends on (h+1) in nextTry,
var j = i # The correctness of this depends on (h+1) in nextTry
var r = j # though may be adaptable to other simple sequences.
t.data[i].hcode = 0 # mark current EMPTY
makeEmpty(i) # mark current EMPTY
t.data[i].key = default(type(t.data[i].key))
t.data[i].val = default(type(t.data[i].val))
while true:
i = (i + 1) and msk # increment mod table size
if isEmpty(t.data[i].hcode): # end of collision cluster; So all done
if cellEmpty(i): # end of collision cluster; So all done
break outer
r = t.data[i].hcode and msk # "home" location of key@i
r = cellHash(i) and msk # initial probe index for key@slot i
if not ((i >= r and r > j) or (r > j and j > i) or (j > i and i >= r)):
break
when defined(js):
@@ -101,10 +101,19 @@ template delImplIdx(t, i) =
else:
t.data[j] = move(t.data[i]) # data[j] will be marked EMPTY next loop
template delImpl() {.dirty.} =
template delImpl(makeEmpty, cellEmpty, cellHash) {.dirty.} =
var hc: Hash
var i = rawGet(t, key, hc)
delImplIdx(t, i)
delImplIdx(t, i, makeEmpty, cellEmpty, cellHash)
template delImplNoHCode(makeEmpty, cellEmpty, cellHash) {.dirty.} =
if t.dataLen > 0:
var i: Hash = hash(key) and maxHash(t)
while not cellEmpty(i):
if t.data[i].key == key:
delImplIdx(t, i, makeEmpty, cellEmpty, cellHash)
break
i = nextTry(i, maxHash(t))
template clearImpl() {.dirty.} =
for i in 0 ..< t.dataLen:

View File

@@ -499,6 +499,10 @@ proc add*[A, B](t: var Table[A, B], key: A, val: B) =
## (key, value) pair in the table without introducing duplicates.
addImpl(enlarge)
template tabMakeEmpty(i) = t.data[i].hcode = 0
template tabCellEmpty(i) = isEmpty(t.data[i].hcode)
template tabCellHash(i) = t.data[i].hcode
proc del*[A, B](t: var Table[A, B], key: A) =
## Deletes ``key`` from hash table ``t``. Does nothing if the key does not exist.
##
@@ -512,7 +516,7 @@ proc del*[A, B](t: var Table[A, B], key: A) =
a.del('z')
doAssert a == {'b': 9, 'c': 13}.toTable
delImpl()
delImpl(tabMakeEmpty, tabCellEmpty, tabCellHash)
proc pop*[A, B](t: var Table[A, B], key: A, val: var B): bool =
## Deletes the ``key`` from the table.
@@ -540,7 +544,7 @@ proc pop*[A, B](t: var Table[A, B], key: A, val: var B): bool =
result = index >= 0
if result:
val = move(t.data[index].val)
delImplIdx(t, index)
delImplIdx(t, index, tabMakeEmpty, tabCellEmpty, tabCellHash)
proc take*[A, B](t: var Table[A, B], key: A, val: var B): bool {.inline.} =
## Alias for:
@@ -2218,19 +2222,6 @@ proc enlarge[A](t: var CountTable[A]) =
if t.data[i].val != 0: ctRawInsert(t, n, move t.data[i].key, move t.data[i].val)
swap(t.data, n)
proc remove[A](t: var CountTable[A], key: A) =
var n: seq[tuple[key: A, val: int]]
newSeq(n, len(t.data))
var removed: bool
for i in countup(0, high(t.data)):
if t.data[i].val != 0:
if t.data[i].key != key:
ctRawInsert(t, n, move t.data[i].key, move t.data[i].val)
else:
removed = true
swap(t.data, n)
if removed: dec(t.counter)
proc rawGet[A](t: CountTable[A], key: A): int =
if t.data.len == 0:
return -1
@@ -2244,7 +2235,7 @@ template ctget(t, key, default: untyped): untyped =
var index = rawGet(t, key)
result = if index >= 0: t.data[index].val else: default
proc inc*[A](t: var CountTable[A], key: A, val: Positive = 1)
proc inc*[A](t: var CountTable[A], key: A, val = 1)
# ----------------------------------------------------------------------
@@ -2287,6 +2278,10 @@ proc `[]`*[A](t: CountTable[A], key: A): int =
assert(not t.isSorted, "CountTable must not be used after sorting")
ctget(t, key, 0)
template cntMakeEmpty(i) = t.data[i].val = 0
template cntCellEmpty(i) = t.data[i].val == 0
template cntCellHash(i) = hash(t.data[i].key)
proc `[]=`*[A](t: var CountTable[A], key: A, val: int) =
## Inserts a ``(key, value)`` pair into ``t``.
##
@@ -2297,7 +2292,7 @@ proc `[]=`*[A](t: var CountTable[A], key: A, val: int) =
assert(not t.isSorted, "CountTable must not be used after sorting")
assert val >= 0
if val == 0:
t.remove(key)
delImplNoHCode(cntMakeEmpty, cntCellEmpty, cntCellHash)
else:
let h = rawGet(t, key)
if h >= 0:
@@ -2305,11 +2300,8 @@ proc `[]=`*[A](t: var CountTable[A], key: A, val: int) =
else:
insertImpl()
proc inc*[A](t: var CountTable[A], key: A, val: Positive = 1) =
proc inc*[A](t: var CountTable[A], key: A, val = 1) =
## Increments ``t[key]`` by ``val`` (default: 1).
##
## ``val`` must be a positive number. If you need to decrement a value,
## use a regular ``Table`` instead.
runnableExamples:
var a = toCountTable("aab")
a.inc('a')
@@ -2320,9 +2312,11 @@ proc inc*[A](t: var CountTable[A], key: A, val: Positive = 1) =
var index = rawGet(t, key)
if index >= 0:
inc(t.data[index].val, val)
if t.data[index].val == 0: dec(t.counter)
if t.data[index].val == 0:
delImplIdx(t, index, cntMakeEmpty, cntCellEmpty, cntCellHash)
else:
insertImpl()
if val != 0:
insertImpl()
proc smallest*[A](t: CountTable[A]): tuple[key: A, val: int] =
## Returns the ``(key, value)`` pair with the smallest ``val``. Efficiency: O(n)
@@ -2383,8 +2377,6 @@ proc len*[A](t: CountTable[A]): int =
proc del*[A](t: var CountTable[A], key: A) {.since: (1, 1).} =
## Deletes ``key`` from table ``t``. Does nothing if the key does not exist.
##
## O(n) complexity.
##
## See also:
## * `pop proc<#pop,CountTable[A],A,int>`_
## * `clear proc<#clear,CountTable[A]>`_ to empty the whole table
@@ -2397,7 +2389,7 @@ proc del*[A](t: var CountTable[A], key: A) {.since: (1, 1).} =
a.del('c')
assert a == toCountTable("aa")
remove(t, key)
delImplNoHCode(cntMakeEmpty, cntCellEmpty, cntCellHash)
proc pop*[A](t: var CountTable[A], key: A, val: var int): bool {.since: (1, 1).} =
## Deletes the ``key`` from the table.
@@ -2405,8 +2397,6 @@ proc pop*[A](t: var CountTable[A], key: A, val: var int): bool {.since: (1, 1).}
## mapping of the key. Otherwise, returns ``false``, and the ``val`` is
## unchanged.
##
## O(n) complexity.
##
## See also:
## * `del proc<#del,CountTable[A],A>`_
## * `clear proc<#clear,CountTable[A]>`_ to empty the whole table
@@ -2423,7 +2413,7 @@ proc pop*[A](t: var CountTable[A], key: A, val: var int): bool {.since: (1, 1).}
result = index >= 0
if result:
val = move(t.data[index].val)
remove(t, key)
delImplIdx(t, index, cntMakeEmpty, cntCellEmpty, cntCellHash)
proc clear*[A](t: var CountTable[A]) =
## Resets the table so that it is empty.
@@ -2717,8 +2707,6 @@ proc len*[A](t: CountTableRef[A]): int =
proc del*[A](t: CountTableRef[A], key: A) {.since: (1, 1).} =
## Deletes ``key`` from table ``t``. Does nothing if the key does not exist.
##
## O(n) complexity.
##
## See also:
## * `pop proc<#pop,CountTableRef[A],A,int>`_
## * `clear proc<#clear,CountTableRef[A]>`_ to empty the whole table
@@ -2730,8 +2718,6 @@ proc pop*[A](t: CountTableRef[A], key: A, val: var int): bool {.since: (1, 1).}
## mapping of the key. Otherwise, returns ``false``, and the ``val`` is
## unchanged.
##
## O(n) complexity.
##
## See also:
## * `del proc<#del,CountTableRef[A],A>`_
## * `clear proc<#clear,CountTableRef[A]>`_ to empty the whole table