fixes#22479, fixes#24374, depends on #24429 and #24430
When instantiating generic types which directly have nominal types
(object, distinct, ref/ptr object but not enums[^1]) as their values,
the nominal type is now copied (in the case of ref objects, its child as
well) so that it receives a fresh ID and `typeInst` field. Previously
this only happened if it contained any generic types in its structure,
as is the case for all other types.
This solves #22479 and #24374 by virtue of the IDs being unique, which
is what destructors check for. Technically types containing generic
param fields work for the same reason. There is also the benefit that
the `typeInst` field is correct. However issues like #22445 aren't
solved because the compiler still uses structural object equality checks
for inheritance etc. which could be removed in a later PR.
Also fixes a pre-existing issue where destructors bound to object types
with generic fields would not error when attempting to define a user
destructor after the fact, but the error message doesn't show where the
implicit destructor was created now since it was only created for
another instance. To do this, a type flag is used that marks the generic
type symbol when a generic instance has a destructor created. Reusing
`tfCheckedForDestructor` for this doesn't work.
Maybe there is a nicer design that isn't an overreliance on the ID
mechanism, but the shortcomings of `tyGenericInst` are too ingrained in
the compiler to use for this. I thought about maybe adding something
like `tyNominalGenericInst`, but it's really much easier if the nominal
type itself directly contains the information of its generic parameters,
or at least its "symbol", which the design is heading towards.
[^1]: See [this
test](21420d8b09/lib/std/enumutils.nim (L102))
in enumutils. The field symbols `b0`/`b1` always have the uninstantiated
type `B` because enum fields don't expect to be generic, so no generic
instance of `B` matches its own symbols. Wouldn't expect anyone to use
generic enums but maybe someone does.
fixes#17571
Objects in the VM are represented as object constructor nodes that
contain every single field, including ones in different case branches.
This is so that every field has a unique invariant index in the object
constructor that can be written to and read from. However when
converting this node back into semantic code, fields from inactive case
branches can remain in the constructor which causes bad codegen,
generating assignments to fields from other case branches.
To fix this, fields from inactive branches are now detected in
`semmacrosanity.annotateType` (called in `fixupTypeAfterEval`) and
marked to prevent the codegen of their assignments. In #24441 these
fields were excluded from the resulting node, but this causes issues
when the node is directly supposed to go back into the VM, for example
as `const` values. I don't know if this is the only case where this
happens, so I wasn't sure about how to keep that implementation working.
fixes#24434
In https://github.com/nim-lang/Nim/pull/24432
```nim
let ex = "NIM_EXTERNC N_NIMCALL(void, nimLoadProcs$1)(void) {$2}$N$N" %
[(i.ord - '0'.ord).rope, extract(el)]
```
```nim
procs.addDeclWithVisibility(ExternC):
procs.addProcHeader(ccNimCall, "nimLoadProcs" & $(i.ord - '0'.ord), "void", cProcParams())
```
extern "C" makes a function-name in C++ have C linkage; it should be
effaced with C compiler
Follows up #24423, needed more refactoring than I expected, sorry for
ugly diff.
With this pretty much all of the raw C code generating parts of the
codegen are abstracted into the cbuilder API (to my knowledge at least).
The current design of NIFC does not implement everything the codegen
generates, such things have mostly not been adapted, they are the
following along with how I'm guessing they could be implemented:
* C++ specific codegen: Maybe a dialect of NIFC for generating C++?
* `codegenDecl` pragma: Could be passed as a pragma to NIFC
* C macros, currently only used for line info IIRC i.e. `nimln_(123)`:
Just inline them when generating NIFC
* Other C defines & `#line`: Maybe as NIFC directives or line infos?
* There is also [this
`#ifndef`](21420d8b09/compiler/cgen.nim (L2249))
when generating headers but NIFC shouldn't need it
* `alignof`/`offsetof`: Is in `cbuilder` but not implemented in NIFC,
should be easy
For now we can disable C++ and the `codegenDecl` pragma when generating
NIFC but since cbuilder is mostly designed to generate NIFC as a flag
when booting the compiler, this hinders the ability to run the CI
against NIFC. Maybe we could also make cbuilder able to generate both C
and NIFC at runtime, this would be a large refactor but wouldn't be too
difficult.
Other missing abstractions before being able to generate NIFC are:
* Primitive types and symbols i.e. `int`, `void*`, `NI`, `NIM_NULL` are
currently still constant string literals, `NU8`, `NU16` etc are also
sometimes generated like `"NU" & $bits`.
* NIFC identifiers, i.e. adding `.c` to imported symbols and properly
mangling generated ones. Not sure how difficult this is going to be.
fixes#24402
```nim
iterator myPairsInline*[T](twoDarray: seq[seq[T]]): (int, seq[T]) {.inline.} =
for indexValuePair in twoDarray.pairs:
yield indexValuePair
proc innerTestTotalMem() =
var my2dArray: seq[seq[int32]] = @[]
# fill with some data...
for i in 0'i32..100:
var z = @[i, i+1]
my2dArray.add z
for oneDindex, innerArray in myPairsInline(my2dArray):
discard
innerTestTotalMem()
```
In `for oneDindex, innerArray in myPairsInline(my2dArray)`, `oneDindex`
and `innerArray` becomes `cursors` because they satisfy the criterion of
`isSimpleIteratorVar`. On the one hand, it is not correct to have them
point to the temporary generated by tuple unpacking, which left the
memory of the temporary uncleaned up. On the other hand, we don't need
to generate a temporary for a symbol node when unpacking the tuple.
split from #24425
Matching `tyGenericBody` performs a match on the last child of the
generic body, in this case the uninstantied `tyObject` type. If the
object contains no generic fields, this ends up being the same type as
all instantiated ones, but if it does, a new type is created. This fails
the `sameObjectTypes` check that subtype matching for object types uses.
To fix this, also consider that the pattern type could be the generic
uninstantiated object type of the matched type in subtype matching.
fixes#17958
In `transf`, conversions in subscript expressions are skipped (with
`skipConv`'s rules). This is because array indexing can produce
conversions to the range type that is the array's index type, which
causes a `RangeDefect` rather than an `IndexDefect` (and also
`--rangeChecks` and `--indexChecks` are both considered). However this
causes problems when explicit conversions are used, between types of
different bitsizes, because those also get skipped.
To fix this, we only skip the conversion if:
* it's a hidden (implicit) conversion
* it's a range check conversion (produces `nkChckRange`)
* the subscript is on an array type and the result type of the
conversion has the same bounds as the array index type
And `skipConv` rules also still apply (int/float classification).
Another idea would be to prevent the implicit conversion to the array
index type from being generated. But there is no good way to do this:
matching to the base type instead prevents types like `uint32` from
implicitly converting (i.e. it can convert to `range[0..3]` but not
`int`), and analyzing whether this is an array bound check is easier in
`transf`, since `sigmatch` just produces a type conversion.
The rules for skipping the conversion could also receive some other
tweaks: We could add a rule that changing bitsizes also doesn't skip the
conversion, but this breaks the `uint32` case. We could simplify it to
only removing implicit skips to specifically fix#17958, but this is
wrong in general.
We could also add something like `nkChckIndex` that generates index
errors instead but this is weird when it doesn't have access to the
collection type and it might be overkill.
The lower half of cgen contains the main proc and HCR init code which
cause a large diff, so they are excluded from this PR. In general things
like generated defines, line directives, the stacktrace macros (`nimfr_`
etc) are also not done, since there are not exact equivalents for these
in NIFC (NIFC does generate line directives but based on the NIF line
info mechanism).
fixes#14067, fixes#15004, fixes#19019
Anonymous procs are [added to
scope](8091d76306/compiler/semstmts.nim (L2466))
with the name `:anonymous`. This means that if they have the same
signature in a scope, they can consider each other as redefinitions. To
prevent this, mark their symbols as `sfGenSym` so they do not get added
to scope or cause any name conflicts. The commented out `and not isAnon`
check wouldn't work because `isAnon` would not be true if the proc is
being resemmed, in which case the name field in the proc AST would have
the symbol of the anonymous proc rather than being empty.
There is a separate problem of default values in generic/normal procs
not opening new scopes which is partially responsible for #19019.
This adapts most of ccgstmts to cbuilder, missing C++ exceptions and
`genCaseGeneric`/`genIfForCaseUntil`. Rebased version of #24399 which
was split into some other PRs and since has had conflicts with #24416.
fixes#24415
Since #23978 (which is in 2.0), all generic types that alias to another
type now insert a `tyAlias` layer in their value. However the
`skipGenericAlias` etc code which `sigmatch` uses is not updated for
this, so `tyAlias` is now skipped in these.
The relevant code in sigmatch is:
67ad1ae159/compiler/sigmatch.nim (L1668-L1673)
This behavior is also suspicious IMO, not skipping a structural
`tyGenericInst` alias can be useful for code like #10220, but this is
currently arbitrarily decided based on "depth" and whether the alias is
to another `tyGenericInst` type or not. Maybe in the future we could
enforce the use of a nominal type.
fixes issue described in https://forum.nim-lang.org/t/12579
In #24065 explicit generic parameter matching was made to fail matches
on arguments with unresolved types in generic contexts (the sigmatch
diff, following #24010), similar to what is done for regular calls since
#22029. However unlike regular calls, a failed match in a generic
context for a standalone explicit generic instantiation did not convert
the expression into one with `tyFromExpr` type, which means it would
error immediately given any unresolved parameter. This is now done to
fix the issue.
For explicit generic instantiations on single non-overloaded symbols, a
successful match is still instantiated. For multiple overloads (i.e.
symchoice), if any of the overloads fail the match, the entire
expression is considered untyped and any instantiations are not used, so
as to not void overloads that would match later. This means even
symchoices without unresolved arguments aren't instantiated, which may
be too restrictive, but it could also be too lenient and we might need
to make symchoice instantiations always untyped. The behavior for
symchoice is not sound anyway given it causes #9997 so this is something
to consider for a redesign.
Diff follows #24276.
Not all uses of switch statements are implemented, `genCaseGeneric` and
`genIfForCaseUntil` still exist. The main purpose is to finish
`ccgreset` and `ccgtrav`.
Finishes `genEnumInfo` as followup to #24351. As #24381 mentions this
covers every use of `for` loops in the codegen.
---------
Co-authored-by: Andreas Rumpf <rumpf_a@web.de>
Doing this early is useful so we can move the indentation logic into
`Builder` itself rather than mix it with the block logic in `ccgstmts`
(the `if` statements in #24381 have not been indented properly either).
However it also means `Builder` is now used for code that still
generates raw C code, so the diff won't be as clean when these get
updated.
follows up #24381
This is about another 30% of ccgexprs (remaining is up to around line
3000), stopped right before the point where #24391 is required.
The `Genode::log` and `Genode::Cstring` calls in `genEcho` are left as
is but could be mangled in the future.
C++ member procs are not implemented, and `codegenDecl` in general is
also not adapted, although it had to be included in the generation of
proc params for simplicity. Guessing `codegenDecl` and C++ stuff are
supposed to map to pragmas on NIFC, or are just not supported.
Most of what ccgexprs uses is now ported to cbuilder, so this PR makes
around ~25% of ccgexprs use it, along with adding `if` stmts (no
`while`/`switch` and `for` which is only used as `for (tmp = a; tmp < b;
tmp++)`). The `if` builder does not add indents for blocks since we
can't make `Builder` an object yet rather than an alias to `string`,
this will likely be one of the last refactors.
Somewhat unrelated but `ccgtypes` is not ready yet because proc
signatures are not implemented.
fixes#24378
```nim
type Win = typeof(`body`)
doAssert not supportsCopyMem((int, Win))
```
`semAfterMacroCall` doesn't skip the children aliases types in the tuple
typedesc construction while the normal program seem to skip the aliases
types somewhere
`(int, Win)` is kept as `(int, alias string)` instead of expected `(int,
string)`
The types of the typed operations are not necessarily correct, the mixed
types of the operands in some cases are left the same to keep the C
integer promotion working as expected. To fix any wrong types the
integer promotion can be done explicitly instead, but this is a behavior
change to codegen.
Different from `intLiteral`, we add procs that just try to generate
integer values, assuming they're not an edge case like `<= low(int32)`
or `> high(int32)`.
refs #23625, refs #24289
Encountered in #24360 but could not reproduce minimally: overloading on
static parameters can work with the normal compile commands but crash
`nim check`. Static overloading relies on `tryConstExpr` which recovers
from things like `globalError` and fails softly, in this case this can
happen when a variable etc. is not available to evaluate in the VM. But
with `nim check`, the compiler does not throw an exception in this case,
and instead tries to keep generating the entire expression in the VM,
which can cause crashes.
To fix this, when the compiler has no error outputs even on `nim check`,
we raise a global error so that the VM code generation stops early. This
fixes both `tryConstExpr` and speeds up `nim check`, because no error
outputs means we don't need cascading errors.
pre-existing issues:
```nim
block:
type
FooObj = object
data: int
Foo = ref ref FooObj
proc delete(self: Foo) =
echo self.data
var s: Foo
new(s, delete)
```
it crashed with arc/orc in 1.6.x and 2.x.x
```nim
block:
type
Foo = ref int
proc delete(self: Foo) =
echo self[]
var s: Foo
new(s, delete)
```
The simple fix is to add a type restriction for the type `T` for arc/orc
versions
```nim
proc new*[T: object](a: var ref T, finalizer: proc (x: T) {.nimcall.})
```
fixes#13417, fixes#19703
When passing an expression to an `openarray` iterator parameter: If the
expression is a statement list (considered "complex"), it's assigned in
a non-deep-copying way to a temporary variable first, then this variable
is used as a parameter. If it's not a statement list, i.e. a call or a
symbol, the parameter is substituted directly with the given expression.
In the case of calls, this results in the call potentially being
executed more than once, or can cause redefined variables in the
codegen.
To fix this, calls are also considered as "complex" assignments to
openarrays, as long as the return type of the call is not `openarray` as
the generated assignment in that case has issues/is unimplemented
(caused a segfault [here in
datamancer](47ba4d81bf/src/datamancer/dataframe.nim (L1580))).
As for why creating a temporary isn't the default only with exceptions
for things like `nkSym`, the "non-deep-copying" way of assignment
apparently still causes arrays to be copied according to a comment in
the code. I'm not sure to what extent this is true: if it still happens
on ARC/ORC, if it happens for every array length, or if we can fix it by
passing arrays by reference. Otherwise, a more general way to assign to
openarrays might be needed, but I'm not sure if the compiler can easily
do this.
This PR is somewhat large, worst case it can be split into one with just
assignments and one with just fields/derefs etc.
Assignments with calls as values have not been touched so they can be
done when calls are implemented. Similarly codegen with complex logic
i.e. `genEnumInfo`, `genTypeInfoV2`, `unaryExpr` is not completely
ported yet so they can be done in standalone PRs.
closes https://github.com/nim-lang/RFCs/issues/380, fixes#4773, fixes
#14729, fixes#16755, fixes#18150, fixes#22984, refs #11167 (only some
comments fixed), refs #12620 (needs tiny workaround)
The compiler gains a concept of root "nominal" types (i.e. objects,
enums, distincts, direct `Foo = ref object`s, generic versions of all of
these). Exported top-level routines in the same module as the nominal
types that their parameter types derive from (i.e. with
`var`/`sink`/`typedesc`/generic constraints) are considered attached to
the respective type, as the RFC states. This happens for every argument
regardless of placement.
When a call is overloaded and overload matching starts, for all
arguments in the call that already have a type, we add any operation
with the same name in the scope of the root nominal type of each
argument (if it exists) to the overload match. This also happens as
arguments gradually get typed after every overload match. This restricts
the considered overloads to ones attached to the given arguments, as
well as preventing `untyped` arguments from being forcefully typed due
to unrelated overloads. There are some caveats:
* If no overloads with a name are in scope, type bound ops are not
triggered, i.e. if `foo` is not declared, `foo(x)` will not consider a
type bound op for `x`.
* If overloads in scope do not have enough parameters up to the argument
which needs its type bound op considered, then type bound ops are also
not added. For example, if only `foo()` is in scope, `foo(x)` will not
consider a type bound op for `x`.
In the cases of "generic interfaces" like `hash`, `$`, `items` etc. this
is not really a problem since any code using it will have at least one
typed overload imported. For arbitrary versions of these though, as in
the test case for #12620, a workaround is to declare a temporary
"template" overload that never matches:
```nim
# neither have to be exported, just needed for any use of `foo`:
type Placeholder = object
proc foo(_: Placeholder) = discard
```
I don't know what a "proper" version of this could be, maybe something
to do with the new concepts.
Possible directions:
A limitation with the proposal is that parameters like `a: ref Foo` are
not attached to any type, even if `Foo` is nominal. Fixing this for just
`ptr`/`ref` would be a special case, parameters like `seq[Foo]` would
still not be attached to `Foo`. We could also skip any *structural* type
but this could produce more than one nominal type, i.e. `(Foo, Bar)`
(not that this is hard to implement, it just might be unexpected).
Converters do not use type bound ops, they still need to be in scope to
implicitly convert. But maybe they could also participate in the nominal
type consideration: if `Generic[T] = distinct T` has a converter to `T`,
both `Generic` and `T` can be considered as nominal roots.
The other restriction in the proposal, being in the same scope as the
nominal type, could maybe be worked around by explicitly attaching to
the type, i.e.: `proc foo(x: T) {.attach: T.}`, similar to class
extensions in newer OOP languages. The given type `T` needs to be
obtainable from the type of the given argument `x` however, i.e.
something like `proc foo(x: ref T) {.attach: T.}` doesn't work to fix
the `ref` issue since the compiler never obtains `T` from a given `ref
T` argument. Edit: Since the module is queried now, this is likely not
possible.
---------
Co-authored-by: Andreas Rumpf <rumpf_a@web.de>