-
Notifications
You must be signed in to change notification settings - Fork 467
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
re-entrancy, vm bugs, double spending oh my #375
Comments
some interesting reading from ethereum land to avoid reentrancy issues like the dao Static Call: ethereum/EIPs#214 |
sth important to talk about is the structure of our code implemening the VM vs what sort of contracts can be written given the defined execution structure. Currently this is very conflated in the code base making it very hard to reason about things. Do independent of how we want to move forward I believe it is critical to transition the current protoype impl of the vm into one that is well structured using abstractions that make it easy to reason about such things + test for them.
|
@dignifiedquire specifically, what is conflated? Be as precise as possible
…On Sun, Apr 29, 2018, 8:49 PM Friedel Ziegelmayer ***@***.***> wrote:
sth important to talk about is the structure of our code implemening the
VM vs what sort of contracts can be written given the defined execution
structure. Currently this is very conflated in the code base making it very
hard to reason about things. Do independent of how we want to move forward
I believe it is critical to transition the current protoype impl of the vm
into one that is well structured using abstractions that make it easy to
reason about such things + test for them.
—
You are receiving this because you were assigned.
Reply to this email directly, view it on GitHub
<#375 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABL4HN2Ay6duprjxEUwe0jL4-HjQdZt5ks5ttai4gaJpZM4Trsmu>
.
|
on my phone atm, will write up sth more detailed later
…On 29. Apr 2018, 14:00 +0200, Whyrusleeping ***@***.***>, wrote:
@dignifiedquire specifically, what is conflated? Be as precise as possible
On Sun, Apr 29, 2018, 8:49 PM Friedel Ziegelmayer ***@***.***>
wrote:
> sth important to talk about is the structure of our code implemening the
> VM vs what sort of contracts can be written given the defined execution
> structure. Currently this is very conflated in the code base making it very
> hard to reason about things. Do independent of how we want to move forward
> I believe it is critical to transition the current protoype impl of the vm
> into one that is well structured using abstractions that make it easy to
> reason about such things + test for them.
>
>
> —
> You are receiving this because you were assigned.
> Reply to this email directly, view it on GitHub
> <#375 (comment)>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/ABL4HN2Ay6duprjxEUwe0jL4-HjQdZt5ks5ttai4gaJpZM4Trsmu>
> .
>
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
This is me trying to make sense of the questions at hand with a few observations: assumptionsI'm making the following assumptions about the future of the VM (which may be wrong):
store vs. cacheIf these are true, we could pull the WithStorage outside of the method execution so that the VM has a reference to the memory (I think we should do this anyway). If we were to persist the memory at the beginning of each call to send and restore the persisted value (which may have changed), we should have all our bases covered. If we use actor caching, each method invocation will use the same actor, so there will never be a question about which version of an actor's memory you are using. understanding re-entrancySo with:
The first call to f() will get Mb' and the second will get Mb''. This is sane in the sense that, at each step, there is only one answer to the question of "what is B's Memory?". There is a deeper question of whether it is sane for B to message C and have it's memory changed out from under it by the time Send returns. If the answer to this is no, then I think banning re-entrancy is the only answer. partial banIf we do ban re-entrancy, there's still a question about whether we ban messages that have no method. I can imagine an actor short on funds could try calling in some debts before acting. This would require a message that looks re-entrant, but doesn't actually call code that could alter memory. |
It doesnt appear that anyone is actually talking about re-entrancy here. The worry seems to be cyclic actor invocations. In that case, this should be treated in the same way that it would if this were all running inside one program, and each actor were an object. all modifications to memory should persist throughout the entire call stack, even across actors. If a revert occurs (the program 'throws'), then all changes for this entire message processing event are reverted. |
The issue was that we were saving in both places, first in the inner and then it would get overwritten by the outer save. In addition there was one place where we weren't saving (transfer).
We do have too little tests covering this issue, I will not dispute that. But, and this is very important, we are not writing and designing a user facing VM right now. So the even if our implementation allows in theory for these kinds of bugs, they are not exploitable, unless a majority of the miners suddenly adopts a new actor.
This means you are likely using WithStorage(func() {
// change things in the storage
})
// do a Send to another actor, which should be able to view changes if exposed
// though there might be a bug in the current implementation
ret, err := ctx.Send(..)
WithStorage(func() {
// use ret to change the storage
})
Yes, and for now I think that is fine, as we are not implementing a user facing VM.
I don't think this should be the first step. Documenting the current model, as the current intentions are should be the first step.
Could you explain what you mean with this?
I think this is a must, the current implementation was not meant to stay in it's form, with manual
I think this is an important discussion to be had, but not today. Figuring the right path forward here is not going to be simple and will take time and research. We should schedule this as part of the work to be done to ship a user facing VM I would suggest.
For now, ever is a very strong word that I wouldn't use quite yet.
This is enforced right now, in the sense that the exposed ABI does not allow you to retrieve other actors.
No, Yes, I agree the actual worries called reentrancy in this thread seem to focus on cyclic/recursive actor invocations, not on the actual reentrancy problem. Expanding on the issues with the code that I see today.
Moving forward:The work that needs to be done has not really changed from my perspective. The things that need to be figured out, as described in #325 are stil pressing and have always entailed for me ensuring that the implementation of our VM is modeled according to the abstractions that the filled out design that #325 gives, is done well. The main additional thing, as mentioned above, is that we need to document the intended execution model for V1 (initial release, only builtin actors, no user actors, no real VM). After that we should
|
We don't have to 'save' in the inner, but if we don't, the outer save has to include the inner changes. This will happen if we're always talking to the same memory -- or if we ensure a nesting discipline which reloads after events which can change the underlying storage. The former is probably preferable, since it avoids superfluously saving. However, it assumes that both inner and outer processes have the same interpretation of the raw memory. That's not true in the very general case I think we aim for. To support that, we would need to get the save/load discipline right.
Specifically, there cannot be any
Right -- so we should probably forbid |
Here are some notes that I wrote up from our discussions in Lisbon: https://docs.google.com/document/d/1UgVjJ5yInTuya9cAbO6lmqTqH-l1_rxRKAUOXfu-7pk/edit# |
The immediate issue has been fixed and this is on everyone's minds so closing. |
We hit a few VM problems recently that rise to the level of a mini-postmortem. Please comment on anything that I missed or got wrong.
Background
The problem was that when sending messages between actors we were saving actors in an outer context (ApplyMessage) rather than in the inner context (Send). ApplyMessage is the entrypoint and calls Send. Send is used by actors to send messages to each other. However! Actors (in particular their balance) were saved in ApplyMessage. There were two problems with this. First, we should be saving in Send so when actors transfer value in an inner call it is recorded. Second, in ApplyMessage we continue to use an original copy of the actors after calling Send. Send and downstream callers could (do) mutate the actors, so we should reload the actor state if we want to use it after calling Send. @dignifiedquire fixed the proximate issue by moving the saves in #363.
Two potentially useful bits surfaced in discussion in slack. First, since the current vm model is to save and load actors while sending messages (as opposed to accumulating changes on one instance when sending messages) I proposed that our current implementation needed to make the following guarantees with respect to re-entrancy (here updated a bit for clarity):
I commented that I wasn't even sure if this was right because tricky [I'm still not sure it is right; and there might also be a smaller basis that spans the space of concerns]. I added the last bullet point a few messages after the first three so it was easy to miss, which is a contributing factor to why we ended up with the second problem below.
Also in that thread @dignifiedquire noted that there are three separate things we need to save:
These happen in three different places (ApplyMessage, Send, in the actor with WithStorage).
https://youtu.be/l-O5IHVhWj0?t=1m25s
Root causes & contributing factors
Given that we are still early in development it's not surprising that there are issues. Let's shake em out and see what we can do in the future to avoid this situation again.
Root causes:
Contributing factors:
If you see others please add them to this issue.
Potential for the same thing with actor memory
I'm not sure exactly what the plan is for actor memory but there is no mechanism guaranteeing that the re-entrancy guidelines [that I am not sure are correct] are followed for it. It's up to the Actor's implementer to get it right. The tooling doesn't help at the moment. For example when using the predominant pattern (WithStorage) if an actor A calls out to other actors which call back into A the changes to A earlier in the call are not visible.
Proposed Actions
Potential future directions
Why and others have suggested that we move to a model more like Ethereum's where a single instance of an actor is propagated through the call chain, accumulating changes. Basically, have the vm use an actor cache. This has the massive advantage of seeming at first blush to be much easier to reason about than the load-reload model. It also could bring potential performance improvements with it.
I would to propose that we have an additional path forward that we should consider, which is to just not allow re-entrant calls at all, or not allow them by default. The history of smart contract hacks is the history of re-entrancy bugs. It's so hard to get right that I think we should consider exploring the stance that re-entrancy considered dangerous, therefore let's not allow it, or at least not by default. Yes but contracts get more complicated. In some ways: I guess they are more complicated in that they need to be refactored into more pieces, or require users to sometimes use an extra address. But also they are way less complicated if you don't have re-entrancy because you don't have to worry about mind-bending edge cases and ordering quite as much. And it's probably less complicated to implement in the vm. Plus I'm not convinced that most applications even need re-entrancy (I'm not convinced the other way either, I see it as a open question), maybe those that do would better off factored into a small number of highly specialized services. God this stick I keep poking into my eye really hurts...
In any case I would like to have the discussion about re-entrancy based on concrete evidence: let's take all of our contracts, some contracts we hear may be hard without re-entrancy, and a few popular ethereum contracts and see how they would need to change if we didn't allow re-entrancy, and use that to guide the conversation. So, proposed additional action item:
Edit: sorry I accidentally hit create. Since then I added the final paragraphs, proofread, and fixed some typos in the text.
The text was updated successfully, but these errors were encountered: