Fix duplicated entries for resolver classes that use inheritance #499

Merged (6 commits) on Jan 12, 2020

Conversation

@nphyatt (Contributor) commented Dec 19, 2019

This is an attempt to at least partially solve what I believe is a memory leak issue regarding metadata-storage.

If you're using a workflow where you build the schema dynamically on every request, the queries and mutations arrays that are properties of the global metadata storage instance quickly grow in size for classes utilizing inheritance. From what I can tell, the existing code filters for the objects it's looking for in order to copy and modify metadata from the parent resolver class, but it creates new entries and adds them to an ever-growing list. This change simply modifies those objects in place, relying on pass by reference, so that the array doesn't grow forever and cause memory issues as well as degraded performance on each subsequent request.
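
Conceptually, the change is along these lines (a simplified sketch of mapSuperResolverHandlers from src/metadata/utils.ts, not the exact diff; the metadata shape is trimmed to the fields that matter here):

interface HandlerMetadata {
  target: Function;
  schemaName: string;
}

// Before: a modified copy of each inherited handler was unshifted onto the global
// array, so the array grew on every schema build.
// After (roughly): the matching entries are modified in place, relying on pass by
// reference, so the array length stays constant.
function mapSuperResolverHandlers(
  definitions: HandlerMetadata[],
  superResolver: Function,
  childResolver: Function,
) {
  definitions
    .filter(def => def.target === superResolver)
    .forEach(def => {
      def.target = childResolver; // mutate the existing entry instead of copying it
    });
}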

In my workflow, simply clearing the global metadata cache didn't work because I have two schemas: one created statically when the app starts, and an additional schema created dynamically on each request. Clearing the cache broke my static schema.

This PR only partially solves the issue, though, as the queries list does seem to continue to grow (albeit much more slowly), adding a single duplicate item for each @Query or @Mutation decorator that I call during the dynamic creation of the schema. For example:

import { ClassType, Query, Resolver } from "type-graphql";
import { GraphQLString } from "graphql";

function makeResolver(type: ClassType, msg: string) {
  @Resolver(() => type)
  class DynamicResolver {
    @Query(() => GraphQLString, { name: `hello${msg}` })
    hello() {
      return `hello ${msg}`;
    }
  }
  // the class is handed to buildSchema() and thrown away after the request
  return DynamicResolver;
}

Every time I prepare the resolvers before calling buildSchema, the Query decorator executes and adds a new item to my list of queries. This would be fine and necessary if I intended to keep the schemas around for future use, but they are meant to be thrown away after each request.

I'm using Containers and resetting after each request, but the metadata storage appears to be global, and it's not clear to me whether there is even a way to associate metadata items with the id of a container so that they could be cleared individually at the same time you reset the container.

Perhaps my use case is rather uncommon, or perhaps I'm missing something, but I think this PR fixes a bug that exists regardless of my particular use case, as you can see duplicate items without even having built the schema multiple times. Any guidance on how I could clear other items from the metadata storage based on container id, or what approach I might take in a PR to support this, would be greatly appreciated!

Thanks for the amazing library!

@codecov bot commented Dec 19, 2019

Codecov Report

Merging #499 into master will decrease coverage by 0.24%.
The diff coverage is 81.81%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #499      +/-   ##
==========================================
- Coverage   95.14%   94.89%   -0.25%     
==========================================
  Files          75       76       +1     
  Lines        1318     1333      +15     
  Branches      252      259       +7     
==========================================
+ Hits         1254     1265      +11     
- Misses         61       65       +4     
  Partials        3        3
Impacted Files                      Coverage Δ
src/metadata/metadata-storage.ts    100% <100%> (ø) ⬆️
src/metadata/utils.ts               90% <71.42%> (-4.12%) ⬇️
src/helpers/isThrowing.ts           20% <0%> (-60%) ⬇️
src/resolvers/helpers.ts            100% <0%> (ø) ⬆️
src/resolvers/create.ts             100% <0%> (ø) ⬆️
src/resolvers/convert-args.ts       100% <0%> (ø) ⬆️
src/utils/isPromiseLike.ts          100% <0%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2ea5691...d0f9690.

@MichalLytek (Owner) commented:

If you're using a workflow where you build the schema dynamically on every request

This is a huge antipattern, as TypeGraphQL was never designed to work that way.
The design goal was to use classes to statically describe the shape of the types and the schema's queries and mutations, just as the schema-first approach is about designing a static schema SDL.

The only "dynamic" part was added for generics support, so you can provide the schema name of a field or query to prevent duplicates in the schema when property names are used by default.

TypeGraphQL is not designed for that case: schema building is a super slow process, and the decorator metadata is stored globally so that definitions can be shared across separate npm packages.

What is your use case that you can't just "prebuild" a few schemas on bootstrap and then redirect to the proper Apollo Server instance/port based on the request user?

@MichalLytek added the Community 👨‍👧, Discussion 💬, and Out of scope ✖️ labels on Dec 20, 2019
@nphyatt (Contributor, Author) commented Dec 20, 2019

Sorry, I probably should have addressed the two things separately, but this PR fixes a bug that exists regardless of whether you use the workflow I'm describing. It's relatively harmless for most people, as 20-30 additional entries in the array is pretty benign, but it's there nonetheless. The easiest way to reproduce is to create a couple of Resolver classes using inheritance and log this.queries.length when you build; you'll see more entries than queries, though perhaps that's intentional for some reason I'm missing.
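
For reference, a minimal reproduction sketch (illustrative type and resolver names; the getMetadataStorage import reaches into the package internals, so treat that path as an assumption):

import "reflect-metadata";
import { buildSchema, ClassType, Field, ObjectType, Query, Resolver } from "type-graphql";
import { getMetadataStorage } from "type-graphql/dist/metadata/getMetadataStorage";

@ObjectType()
class Dog {
  @Field()
  name!: string;
}

function createBaseResolver<T extends ClassType>(suffix: string, objectType: T) {
  @Resolver({ isAbstract: true })
  abstract class BaseResolver {
    @Query(() => [objectType], { name: `getAll${suffix}` })
    getAll(): T[] {
      return [];
    }
  }
  return BaseResolver;
}

@Resolver(() => Dog)
class DogResolver extends createBaseResolver("dogs", Dog) {}

async function main() {
  await buildSchema({ resolvers: [DogResolver] });
  // Before this PR, two entries share the schema name "getAlldogs": the one
  // registered by the @Query decorator on the base class plus the copy created
  // while mapping the inherited metadata onto DogResolver.
  const queries = getMetadataStorage().queries;
  console.log(queries.filter(q => q.schemaName === "getAlldogs").length);
}

main();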

Obviously this was much easier to see when using my workflow, because every time I built the schema the array would grow (and by a lot more than the number of queries I had), and the filtering of the arrays etc. would get slower and slower.

The reason I'm building the one schema on every request is that the schema depends on database entries that can change field names, available resolvers, etc. So, for example, if a user goes into an admin UI and clicks to hide a query property or rename a field, I'd like the resulting schema to update.

The easiest way I can think of to make sure the graph is always in sync with the database is to build the schema on every request, but then I run into the expanding-arrays issue, and the performance of buildSchema goes down with every additional call to it. The next easiest way, which would also avoid the performance issues, would be to keep the schema cached in memory and only rebuild and replace it when someone modifies the database. That still has the same memory leak, and buildSchema gets worse every time someone modifies the database, but I suppose the memory leaks more slowly and buildSchema's performance doesn't affect the API requests. The instance could be restarted every time the database is modified, but doing a rolling restart of any containers running the service is cumbersome, and ideally I'd want the user to see their changes take effect ASAP.
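
A rough sketch of that cache-until-modified idea (loadConfig and buildSchemaForConfig are hypothetical helpers standing in for the real config lookup and schema builder):

import { GraphQLSchema } from "graphql";

declare function loadConfig(graphId: string): Promise<{ version: number }>;
declare function buildSchemaForConfig(config: { version: number }): Promise<GraphQLSchema>;

interface CacheEntry {
  version: number;
  schema: GraphQLSchema;
}

const schemaCache = new Map<string, CacheEntry>();

async function getSchema(graphId: string): Promise<GraphQLSchema> {
  const config = await loadConfig(graphId);
  const cached = schemaCache.get(graphId);
  if (cached && cached.version === config.version) {
    return cached.schema; // reuse until the stored config changes
  }
  // Rebuilding still grows the global metadata storage (the leak described above);
  // the cache only makes that happen less often.
  const schema = await buildSchemaForConfig(config);
  schemaCache.set(graphId, { version: config.version, schema });
  return schema;
}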

Sounds like supporting this sort of workflow is out of scope for your vision of the library, which seems fair. Thanks for the cool library!

@MichalLytek (Owner) commented:

The reason I'm building the one schema on every request is that the schema depends on database entries that can change field names, available resolvers, etc. So, for example, if a user goes into an admin UI and clicks to hide a query property or rename a field, I'd like the resulting schema to update.

Don't feel offended, but I think it's a terrible design 😱
So as you rename a db column (sick!), you rename the schema field, so to make the client query work you have to build the query string on every request based on metadata fetched from the server, like the list of User type field names? Whoa! 😵

Maybe it would be better if you just built the dynamic nature of the fields into the schema? Like for this kind of query:

{
  user(id: 2137) {
    fields {
      name
      value
    }
  }
}

So you will return this kind of response:

{
  "user": {
    "fields": [
      { "name": "firstname", "value": "John" },
      { "name": "age", "value": 12 }
    ]
  }
}
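
In TypeGraphQL terms, that shape could be modelled statically along these lines (an illustrative sketch; value is kept as a string here, though a JSON scalar could carry mixed value types):

import "reflect-metadata";
import { Field, ID, ObjectType } from "type-graphql";

@ObjectType()
class DynamicField {
  @Field()
  name!: string;

  @Field()
  value!: string;
}

@ObjectType()
class User {
  @Field(() => ID)
  id!: string;

  @Field(() => [DynamicField])
  fields!: DynamicField[];
}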

The whole point of GraphQL by design is a constant contract between client and server, so you can be 100% sure what you can ask for and what you will receive. With your dynamic rebuild on every request, that's no longer true, and you need to do a hackish introspection query before every request 🙅‍♂

@nphyatt (Contributor, Author) commented Dec 22, 2019

Lol. None taken; I oversimplified my use case. I think you're right that it would be a terrible design if I were only making names dynamic. The reality is that I'm making the entire graph dynamic, creation and all: types, queries, and mutations. The service I'm actually working on is a service that allows users to build graphql APIs via a UI. So, for example, in the UI you enter some credentials for a SQL database and I can connect to it, introspect it, turn the tables into types and the columns into fields, and then generate some basic CRUD resolvers around the tables. Obviously, as the user adds more tables, makes migrations, or adds an auth provider and auth rules to a field or query, I'll need to rebuild the schema with the new configuration.

I agree it would be self-defeating to change the schema all the time, but that's not really the goal. The goal is that they can build and test their schema as they go, and of course once it's built they get the benefits of the graphql contract. I know your library wasn't built with this use case in mind, but it actually works surprisingly well for it. I was originally using the raw graphql classes and wrapping resolvers with Auth etc., but I've been playing around with using this library since I already use it, in the more traditional way, for the static API I built to import the databases.

Anyway, building the user's schema on every single request is probably unnecessary. Like I said, I could build it and cache it until I detect that they've made changes. However, I'll still need to rebuild the same schema on occasion, and I'd prefer not to restart the instance each time. And then there is the problem that I'd need to build different graphs for each user in the same instance, which has a similar effect to rebuilding the same graph, in that the arrays in this PR grow very quickly due to the fact that all the schemas share a global storage singleton.

Looking at the code, it seems like it would be feasible to add a way to register a unique id for the schema and metadata calls, then create a new instance of storage for each id and allow the decorators to register themselves against an id. I could very well be missing some of the complexity, though, since I've mostly just looked at the metadata storage and the Query decorator at this point. The thing I'm proposing would look something like:

...
@Query(() => type, { schemaId: "xxxx" })
async createRow(...) {
...
}
...
const schema = buildSchema({ resolvers: [...], schemaId: "xxxx" });
...
resetSchemaMetadata({ schemaId: "xxxx" });

Something along those lines, anyway. The idea would be the ability to scope and clear the metadata cache for folks building more than one schema or needing to rebuild a schema occasionally, without having to reboot the process.

@MichalLytek (Owner) commented:

Lol. None taken; I oversimplified my use case. (...) The service I'm actually working on is a service that allows users to build graphql APIs via a UI.

Ok, that makes sense 👍 Now we know your use case and the real root of the problem, instead of just applying a fix for something 😉

I was originally using the raw graphql classes and wrapping resolvers with Auth etc

That's what I wanted to propose - it's really weird that you have the metadata (the SQL database layout) and then dynamically create kinda-static classes with decorators that are again translated to metadata and then into graphql-js objects 😄

Maybe you should try different wrappers around the builder pattern, like graphql-nexus or something 😉

the arrays in this PR grow very quickly due to the fact that all the schemas share a global storage singleton

It's not that the schemas need to have something global or read the metadata at runtime. It's that Reflect.metadata and other decorator-based tools have to put the data somewhere when a decorator is called. So we have a singleton storage that registers the metadata related to the classes, so the classes can then be used at runtime as the resolvers' execution model.

So the problem is that when you create new classes, the decorators run and register new metadata, but the old, no-longer-used classes (as they are replaced by a new schema) are still in the storage, while they should be cleaned up. One way to achieve that is to have some imperative API like you proposed, which is error-prone. A second way is to make the metadata storage garbage-collectable, so that, e.g., instead of storing the metadata and references to classes in a static array, it's a WeakMap that can be cleaned up automatically by the V8 runtime when no longer needed.
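
A sketch of what that could look like (illustrative only, not the actual vNext code):

// Keying handler metadata by the resolver class in a WeakMap means that once a
// dynamically created class is no longer referenced by any schema, its metadata
// becomes garbage-collectable along with it.
interface HandlerMetadata {
  schemaName: string;
  methodName: string;
}

class WeakMetadataStorage {
  private readonly queriesByResolver = new WeakMap<Function, HandlerMetadata[]>();

  collectQuery(target: Function, metadata: HandlerMetadata) {
    const existing = this.queriesByResolver.get(target) ?? [];
    existing.push(metadata);
    this.queriesByResolver.set(target, existing);
  }

  getQueries(target: Function): HandlerMetadata[] {
    // No global array to filter through; lookup is per resolver class.
    return this.queriesByResolver.get(target) ?? [];
  }
}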

As the first version of TypeGraphQL was a proof of concept, I hadn't thought about such things and didn't even consider the resolvers option of buildSchema during schema building. I just put everything into an array and then emitted all the things from the array. v0.18 changed that by filtering the metadata array, but I think that in vNext I should just make the metadata storage a weak map, as the whole pipeline is now built around resolver classes.

What do you think? And why didn't you create an issue first, as I asked in the readme, instead of starting with a PR without even describing the context and use cases that motivated the changes? 😕

@nphyatt (Contributor, Author) commented Dec 23, 2019

Sorry for not opening an issue. In trying to figure out what was going on with my use case, I thought I had found a legitimate bug. The PR here doesn't actually fix my issue, as the global metadata storage is still used and so every time a @Query or @Mutation decorator is called the array still grows. The issue this PR attempts to resolve is probably not particularly troublesome to most people, but it exacerbates the issue I was having. I created a quick demo here if you're interested: https://github.com/nphyatt/type-graphql-demo

Just run npm i && npm start and you'll see in the logs that the metadata storage has 2 entries each for the schema names getAlldogs and getAllpersons:

Using ts-node version 8.5.4, typescript version 3.7.4
Query Count By Name { getAlldogs: 2, getAllpersons: 2, getAge: 1, getBreed: 1 }
Listening on 4000!

I would expect 1 entry per schema entry point; instead you end up with one for the decorator call and 1 for each class it inherits from. Maybe it's not really a bug, since the logic still works because you're using arr.find and adding the additional items via unshift to the front of the array. I wasn't sure whether this was intentional or not, but it seemed unnecessary, and while it doesn't fix my use case I thought you might want the PR, as in my static schema I use this style of resolver creation for pagination (and I assume others do as well) and it seemed unnecessary to have the duplicate entries. If it's intentional for some reason, that's fine, and I can close the PR.
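
To illustrate why the duplicates go unnoticed (illustrative objects, not the real metadata shape):

class BaseResolver {}
class DogResolver extends BaseResolver {}

// The inheritance mapping unshifts the modified copy onto the front of the array,
// so find() returns the child-class entry before ever reaching the stale one.
const queries: Array<{ schemaName: string; target: Function }> = [
  { schemaName: "getAlldogs", target: DogResolver }, // copy created for the child class
  { schemaName: "getAlldogs", target: BaseResolver }, // original entry from the decorator
];
const match = queries.find(q => q.schemaName === "getAlldogs");
console.log(match!.target === DogResolver); // true, so schema building still works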

As for using something like graphql-nexus, I definitely looked into it, but I think your library has more mature and easier ways to do things like auth (on both objects and fields), input validation (if a user specifies a column should be an email or something), middleware (for additional processing, etc.), and dependency injection. I know it's kind of weird, but each sort-of-static class is really the same underlying implementation with different auth privileges, middlewares, and "data sources" as determined by the user.

That means I can write a function that returns a SqlPlugin class which has all the same methods and the same types of args, but adds different query fields that return different types, injects different db connections into the class, and attaches different auth rules and middlewares. To me it actually ended up easier to read and modularize when using your library vs graphql-nexus or the graphql-js APIs directly. I could be wrong, though, and making poor design choices... it's certainly happened before 😄

The WeakMap idea sounds like a solid idea to me. I'd be happy to take a stab at it if you like, or open an issue to discuss it further, and close or merge this PR based on your assessment of what I mentioned above.

@MichalLytek (Owner) commented:

I would expect 1 entry per schema entry point; instead you end up with one for the decorator call and 1 for each class it inherits from.

The reason why I don't:

modify objects by reference so as not to create duplicates

is that I try to follow FP and immutability rules. So rather than modifying the object, I would make a copy with the changes (like now) and then delete the metadata of the superclass, as it's not needed for the resolvers, only for type inheritance.

The WeakMap idea sounds like a solid idea to me.

I've tried that on the vNext branch and it's working well, much better than arrays, as there's no need to find or filter:
a195e8a

The classes used as keys would be referenced by the schema, so when you create a new schema and delete the old one, the old entries can be GC'd 😉

BTW, how do you swap the schema after applying the changes? Do you close the Apollo server and create a new instance?

@nphyatt (Contributor, Author) commented Dec 30, 2019

Awesome! I'll check out the vNext branch.

Currently I'm simply creating a new ApolloServer (using apollo-server-express) instance on each request, and then, rather than listening on a port, I grab the router via server.getMiddleware, handle the request, and then let go of the reference to the server. I'm not sure whether that would be enough to cause node to release everything for gc or not? I intend to run this in a lambda environment, so recreating the server on each request doesn't seem that unreasonable given that environment.

You can "warm up" a lambda function and keep some things in the global memory space, and if I did that I would probably need to use redis or something to set a flag that marks a graph as "stale" and forces creation of a new instance. I'm not sure what the best data structure would be for that cache, as I probably wouldn't want to keep too many graph instances in memory, but if I only keep a couple then I probably won't get a ton of hits either. If the overhead of building the schema isn't huge, it's probably easiest just to construct it on every request. In my demos it's been maybe 40-60ms, and I'd imagine it's better with a WeakMap, as you probably don't have to filter through arrays so much anymore, but I've not done any real testing on this, so I'll defer to your thoughts.

@MichalLytek (Owner) commented:

I'm not sure whether that would be enough to cause node to release everything for gc or not?

If you just register a new one without unregistering the old middleware, it may still be attached to the express router but just never called, as the new middleware doesn't call next()? I don't know how to check that. Maybe you should have your own express middleware based on graphql-express that will use the cached schema or rebuild it?

@nphyatt (Contributor, Author) commented Dec 30, 2019

Maybe you should have your own express middleware based on graphql-express that will use the cached schema or rebuild it?

That's basically what I do. I don't actually attach the router to the express app. My middleware looks something like this:

app.use("/:graphId/graphql", async (req, res, next) => {
  const { graphId } = req.params;
  const config = await db.find(Config, graphId);
  if (!config) return res.sendStatus(404);

  const resolvers = await buildResolvers(config);
  const authChecker = await buildAuthChecker(config);

  // buildSchema returns a Promise, hence the async handler and the awaits
  const schema = await buildSchema({
    resolvers,
    authChecker,
    authMode: "error",
  });

  const server = new ApolloServer({
    schema,
    context: async ({ req }) => { ... },
  });

  const router = server.getMiddleware({ path: "/" });

  router(req, res, next);
});

I think that should be enough to allow things to be GC'd, as getMiddleware just returns a new express.Router() that isn't attached to the app instance, but I'd need to test more to verify.

So rather than modifying the object, I would make a copy with the changes (like now) and then delete the metadata of the superclass, as it's not needed for the resolvers, only for type inheritance.

I can update this PR to do this if you would like these changes. I guess I'd make mapSuperResolverHandlers and mapSuperFieldResolverHandlers map over the full array while updating the relevant entries, rather than updating a filtered list, and then you could assign the entire array rather than unshifting the filtered entries. Something like:

export function mapSuperResolverHandlers<T extends BaseResolverMetadata>(
  definitions: T[],
  superResolver: Function,
  resolverMetadata: ResolverClassMetadata,
): T[] {
  return definitions.map(metadata =>
    metadata.target === superResolver
      ? {
          ...metadata,
          target: resolverMetadata.target,
          resolverClassMetadata: resolverMetadata,
        }
      : { ...metadata },
  );
}

...

private buildExtendedResolversMetadata() {
...
  this.queries = mapSuperResolverHandlers(this.queries, superResolver, def);
...
}

src/metadata/utils.ts (review thread, outdated, resolved)
@MichalLytek (Owner) commented:

@nphyatt Could you add a test case that ensures that future refactors won't affect this fix and we won't have a regression in that scope?

@nphyatt (Contributor, Author) commented Jan 7, 2020

Ok, I added tests that fail on master but pass with this PR. I added a metadata-storage.ts test file since I didn't really see any other logical place for them. Let me know if you'd prefer something else.

@MichalLytek (Owner) left a review:

Now it looks great! 🎉

@MichalLytek changed the title from "fix(metadata-storage): fix duplicate entries for resolver classes that use inheritance" to "Fix duplicated entries for resolver classes that use inheritance" on Jan 12, 2020
@MichalLytek merged commit 6e66faa into MichalLytek:master on Jan 12, 2020
@MichalLytek added the Enhancement 🆕 and Solved ✔️ labels and removed the Discussion 💬 and Out of scope ✖️ labels on Jan 12, 2020
@MichalLytek added this to the 1.0.0 release milestone on Jan 12, 2020