[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-17 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-921945722 Thanks @manupa-arm. I agree that putting TVMBAW as a peer to heap is not right (that was meant as an example to demonstrate the viewpoint. I do not necessarily want to enforce heap as a

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-17 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-921820526 Please allow me to explain the overall rationale here, in particular over the term "constraint" - C0: On one hand, we want a "default" memory to be generically accessible (per

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-16 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-921220669 Thanks @manupa-arm. I understand that proposal R4 can also work by having a pass convert "global" to something more specialized (essentially R1 and R4 are not that different

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-16 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-921134533 Thanks @manupa-arm. Trying to capture some of the discussions. - Right now the "global" scope translates to something that can be accessed by CPU, and there was no requirement of

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-16 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-920896447 So in the above post I tried to summarize the state. Now let me try to share some of my thoughts based on the summary. First of all, R0 and R1 are not that different in nature.

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-16 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-920882592 Thanks for the discussions. Before we suggest a resolution, it would be helpful to summarize the discussions so far. # Semantics of Allocate and storage_scope Allocate

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-16 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-920841982 @manupa-arm On CPU we do not necessarily differentiate local from global for now, as they are from the same namespace. I can understand the need from the micro side, and I believe

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-15 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-920486224 Right, this gets into the target-dependent generation regime, where the TargetKind attribute is indeed the right solution. We should also send a PR to add comments to that code block so we

[GitHub] [tvm] tqchen commented on issue #9022: [Bug] BuiltinLower does not use alloca for storage on kDLCPU target devices

2021-09-15 Thread GitBox
tqchen commented on issue #9022: URL: https://github.com/apache/tvm/issues/9022#issuecomment-920463412 @mbs-octoml I believe the current behavior is intended. In the context of CPU, we want to preserve small alloca until the code generation point. And then the code will generate