Nick Fishman

I propose a revised version of the famous proverb: “The way to a man’s heart is through his startup.”

Speeding up Mongoose queries by requesting only the fields you need

I’m currently building a startup (ampcloud) with Node.js, MongoDB, Mongoose, and a handful of other tools. After spending quite a few years in the Django world, it’s been fun doing a mental context switch into the land of JavaScript, callbacks, and closures. Occasionally I’ve run into some gotchas, and this particular one is a great example.

Let’s say you’re building a blog, and part of your database schema looks something like this:

var CommentSchema = new Schema({
  title: {type: String},
  body: {type: String},
  createdAt: {type: Date}
});

var PostSchema = new Schema({
  author: {type: String},
  title: {type: String},
  createdAt: {type: Date},
  slug: {type: String},
  comments: [CommentSchema]
});

module.exports.Post = mongoose.model('Post', PostSchema);

Every post is stored as a separate document in MongoDB, but all comments are embedded within it. This means that when you fetch a post, you’ll get all the comments back with it.

Now let’s say you want to display a list of the 20 most recent blog posts on your home page. Assuming you’re using Express, you would write a route handler like:

app.get('/', function(req, res) {
  Post
    .find()
    .desc('createdAt') // newest first, so limit(20) keeps the 20 most recent posts
    .limit(20)
    .run(function(err, posts) {
      if (err) {
        res.render('error', {status: 500});
      } else {
        res.render('allposts', {posts: posts});
      }
    });
});

You’d also want to add an index so that sorting by creation date stays efficient:

PostSchema.index({createdAt: 1});

Your blog will probably work well at first, but you’ll run into problems as soon as one of your amazing posts goes viral and gets thousands of comments. You’ll notice that your main page starts taking a lot longer to load. Even when you’re the only one browsing your blog, it just won’t feel as snappy anymore.

Beware: Mongoose fetches all fields by default

The culprit is the comments field. Because a Mongoose query requests all fields of a document by default, every site visitor will cause it to request and parse the entire list of comments. Every time. You don’t even need the list of comments to render the main page.
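To get a rough feel for how much extra data the comments add, here is a small standalone sketch in plain Node.js (no database involved). The makePost and withoutFields helpers and the sample values are made up for illustration, and JSON length is only a crude stand-in for the data MongoDB actually sends over the wire, so treat the numbers as indicative at best.

```javascript
// Build a hypothetical post document with many embedded comments.
function makePost(commentCount) {
  var comments = [];
  for (var i = 0; i < commentCount; i++) {
    comments.push({
      title: 'Comment ' + i,
      body: 'Some comment text.',
      createdAt: new Date(0)
    });
  }
  return {
    author: 'nick',
    title: 'A viral post',
    createdAt: new Date(0),
    slug: 'a-viral-post',
    comments: comments
  };
}

// Copy every field except the listed ones, imitating what a query
// that excludes those fields would hand back.
function withoutFields(doc, excluded) {
  var out = {};
  Object.keys(doc).forEach(function (key) {
    if (excluded.indexOf(key) === -1) {
      out[key] = doc[key];
    }
  });
  return out;
}

var post = makePost(5000);
var fullSize = JSON.stringify(post).length;
var trimmedSize = JSON.stringify(withoutFields(post, ['comments'])).length;

console.log('with comments:', fullSize, 'characters');
console.log('without comments:', trimmedSize, 'characters');
```

Multiply the first number by 20 posts per page load and it becomes clear why the home page slows down.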

Let’s get rid of the comments field by adding the following line to the query chain:

    .exclude('comments')

The final result:

app.get('/', function(req, res) {
  Post
    .find()
    .desc('createdAt') // newest first, so limit(20) keeps the 20 most recent posts
    .limit(20)
    .exclude('comments')
    .run(function(err, posts) {
      if (err) {
        res.render('error', {status: 500});
      } else {
        res.render('allposts', {posts: posts});
      }
    });
});
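A note for readers on newer versions: as far as I know, Mongoose 3.x dropped asc/desc, exclude, and run in favor of sort, select, and exec. Below is a sketch of the same query in that style. The findRecentPosts name is mine, and the function takes the Post model as an argument just so the snippet stands alone; check it against the Mongoose version you have installed.

```javascript
// Sketch only: assumes `Post` is a Mongoose model from a release that
// provides sort(), select(), and exec() (3.x or later, to my knowledge).
function findRecentPosts(Post) {
  return Post
    .find()
    .sort({createdAt: -1})  // newest first
    .limit(20)
    .select('-comments')    // leading "-" excludes the field
    .exec();                // hands back whatever exec() returns (a promise in recent versions)
}
```

In a route handler you would call findRecentPosts(Post), render 'allposts' when it resolves, and render the error page if it rejects.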

You’ll find that this performs a lot better. The problem isn’t so much that MongoDB can’t return the data quickly enough. Rather, Node.js has to spend much of its time deserializing the extra BSON into JavaScript objects, which Mongoose then wraps in documents. All of that work is both unnecessary and time-consuming.

Not surprisingly, I recently encountered this issue in production. I made the fix right at 3:00 GMT, and the load dropped dramatically.

Takeaway: think about your queries

When your models start accumulating lots of data, think about whether you can request a subset of fields when making queries. See the Mongoose query documentation for details.
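Instead of excluding the one heavy field, you can also go the other way and list only the fields the page needs. The sketch below assumes a newer Mongoose (3.x or later, where select and lean are available, as far as I know). findPostSummaries is a hypothetical name, and the field list is just what a home page like the one above might use.

```javascript
// Sketch only: `Post` is assumed to be a Mongoose model passed in by the caller.
function findPostSummaries(Post) {
  return Post
    .find()
    .sort({createdAt: -1})
    .limit(20)
    // Inclusion list: only these fields come back
    // (_id is also returned unless you exclude it explicitly).
    .select('title author slug createdAt')
    // lean() asks for plain JavaScript objects instead of full Mongoose
    // documents, which skips some per-document overhead.
    .lean()
    .exec();
}
```

The trade-off with an inclusion list is that a new field in the template also needs a change to the query, so it is worth reserving for hot paths.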

Caveat: Keep in mind that you won’t gain much by excluding fields that store primitive types like Strings, Numbers, or Dates. Even worse, your code will probably get harder to read and maintain. Only make such optimizations when you have to.

Some final notes

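The above schema suffers from a fundamental flaw: it doesn’t scale well. If a blog post gets thousands of comments, you’ll probably want to paginate the comments and only show several hundred at a time. With this schema, that is awkward. MongoDB’s $slice projection operator can return a positional range of an embedded array, but beyond that, sorting or filtering embedded comments generally means doing the work in application code. And a post that collects enough comments will eventually run into MongoDB’s 16 MB limit on document size.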

To make this production ready, you’d probably want to split Comment and Post into separate Mongoose models, instead of nesting Comments within Posts as embedded documents. Each Comment would be its own MongoDB document, you’d store the Post id within the Comment, and you could efficiently query for arbitrary subsets of comments on a particular blog post.
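Here is a rough sketch of what querying the split-out comments could look like. It assumes a Comment model whose documents carry a post field holding the parent Post’s id. That field name, findCommentsPage, and the zero-based page argument are my own choices for illustration, and the query uses the sort/skip/limit/exec methods from newer Mongoose versions.

```javascript
// Sketch only: `Comment` is assumed to be a Mongoose model whose schema
// includes a hypothetical `post` field storing the parent Post's id.
// `page` is zero-based.
function findCommentsPage(Comment, postId, page, perPage) {
  return Comment
    .find({post: postId})
    .sort({createdAt: 1})   // oldest comments first
    .skip(page * perPage)   // jump over the earlier pages
    .limit(perPage)
    .exec();
}
```

For example, findCommentsPage(Comment, post._id, 2, 100) would ask for the third page of 100 comments. A compound index on the post and createdAt fields would likely help this query, in the same spirit as the createdAt index on posts.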


About

I'm a software engineer and entrepreneur. I like to solve high-impact problems with technology. I'm also the CTO and co-founder of sonicpanther.

© 2013 Nick Fishman. All rights reserved.
