This repository has been archived by the owner on Mar 5, 2020. It is now read-only.

shawnbot/unist-sitter

unist-sitter

This repo is not yet a thing, but I hope that it will be... soon.

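What the heck does "unist-sitter" mean? It's a marriage of two awesome pieces of software: tree-sitter, a parser generator and incremental parsing library, and unist, the universal syntax tree specification used by the unified ecosystem.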

Basically, it should be possible to do this:

const TreeSitter = require('tree-sitter')
const Ruby = require('tree-sitter-ruby')
const {UnistNode} = require('unist-sitter')
const {select} = require('unist-util-select')

const parser = new TreeSitter()
parser.setLanguage(Ruby)

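// parse() is synchronous in the Node bindings, so no await is needed
const tree = parser.parse(`puts "yo #{ENV['DOG']}"`)
const unistTree = UnistNode(tree.rootNode)
// unist-util-select takes the selector first, then the tree
console.log(select('interpolation', unistTree))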

Syntax nodes

The syntax node API would only need to be a single function that either:

  1. Converts a tree-sitter SyntaxNode object into a unist Node. For instance:

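    function UnistNode(from) {
      const {type, children = []} = from
      return {
        type,
        // createUnistPosition is a helper (not shown) that maps the node's
        // start and end points to a unist Position
        position: createUnistPosition(from),
        // recurse so that every descendant is converted as well
        children: children.map(child => UnistNode(child))
      }
    }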

    If references to the "original" nodes weren't preserved, the resulting object model could be smaller in memory and more easily serializable.

  2. Creates a wrapper (proxy) for the SyntaxNode, exposing the unist Node interface (a la unist-util-parents). This might be more memory-hungry, but it would also allow for editing of the underlying tree-sitter nodes while preserving the ability to navigate the tree with unist utilities.

    Wrapping offers more opportunities for dynamic getters that make it easier to form CSS-like selectors that match syntax nodes, e.g. by identifier:

    class UnistNode {
      // instead of: select('method_call:has(:scope > identifier[text=foo])')
      // you can do: select('method_call[identifier=foo]')
      get identifier() {
        // find the direct identifier child itself, not the node that has one
        const node = select(':scope > identifier', this)
        return node ? node.text : undefined
      }
    }
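To make the two options a bit more concrete, here is a minimal sketch of both. It assumes that a tree-sitter node exposes `type`, `text`, `children`, `startPosition` and `endPosition` (zero-based `row` and `column`), and `startIndex` and `endIndex`, and that unist points use a one-based `line` and `column` plus a zero-based `offset`. The names `toUnist` and `wrap` are made up for this example, and the `fake` object stands in for a real `SyntaxNode` so the snippet runs without any grammar installed.

```javascript
// Map tree-sitter's zero-based {row, column} points to unist's one-based
// {line, column} points, carrying the zero-based offset along.
function createUnistPosition(node) {
  const point = ({row, column}, offset) => ({line: row + 1, column: column + 1, offset})
  return {
    start: point(node.startPosition, node.startIndex),
    end: point(node.endPosition, node.endIndex)
  }
}

// Option 1: copy the tree into plain, serializable objects.
function toUnist(node) {
  return {
    type: node.type,
    position: createUnistPosition(node),
    children: node.children.map(child => toUnist(child))
  }
}

// Option 2: wrap the original node and compute the unist fields on demand,
// keeping a reference to the underlying node (hypothetical `syntaxNode` key).
function wrap(node) {
  return {
    get type() { return node.type },
    get position() { return createUnistPosition(node) },
    get children() { return node.children.map(child => wrap(child)) },
    get text() { return node.text },
    syntaxNode: node
  }
}

// Stand-ins for real tree-sitter nodes (assumed shape, see above).
const leaf = {
  type: 'identifier', text: 'puts', children: [],
  startPosition: {row: 0, column: 0}, endPosition: {row: 0, column: 4},
  startIndex: 0, endIndex: 4
}
const fake = {
  type: 'program', text: 'puts', children: [leaf],
  startPosition: {row: 0, column: 0}, endPosition: {row: 0, column: 4},
  startIndex: 0, endIndex: 4
}

const copy = toUnist(fake)
console.log(JSON.stringify(copy.children[0].position.end))
// {"line":1,"column":5,"offset":4}
console.log(wrap(fake).children[0].text) // puts
```

The copy can be passed straight to `JSON.stringify`, while the wrapper keeps the original node reachable for edits.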

File objects

At the simplest level, this would map all of the information necessary to create a unist-sitter parse tree from a single file to unified's vfile object model. Something like:

const {FileNode} = require('unist-sitter')

const foo = new FileNode({path: 'foo.rb'})

This would make it possible to use lots of cool utilities, like vfile-find-down to traverse the file system.

Presumably, they would also be easy to read from the filesystem and parse with a given tree-sitter grammar:

const Ruby = require('tree-sitter-ruby')
await foo.read() // sets .contents
await foo.parse(Ruby) // sets .syntax ?

Directory objects

This is where things get kind of wild. Now imagine a unist tree structure that represents both the file system and file-level syntax trees. Suddenly, searching for an ambiguously named method call becomes a whole lot easier:

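const {DirectoryNode} = require('unist-sitter')
const {selectAll} = require('unist-util-select')

const root = new DirectoryNode({path: process.cwd()})
// unist-util-visit tests match a node type or a function, not a CSS selector,
// so the query goes through unist-util-select instead
const calls = selectAll('file[lang=ruby] method_call:has(:scope > identifier[text=get])', root)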

What sorcery is this? Well, if a directory node lazily evaluates its children property...

const {readdirSync} = require('fs')
const {join} = require('path')
const VFile = require('vfile')

class DirectoryNode extends VFile {
  get type() { return 'directory' }
  
  // read the directory only when the children are actually asked for
  get children() {
    return readdirSync(this.path, {withFileTypes: true})
      .map(dirent => {
        const path = join(this.path, dirent.name)
        // classes must be instantiated with `new`
        return dirent.isDirectory()
          ? new DirectoryNode({path})
          : dirent.isFile() ? new FileNode({path}) : null
      })
      .filter(Boolean)
  }
}

and a file node does the same (assuming some knowledge of how filename extensions map to parser grammars)...

class FileNode extends VFile {
  get type() { return 'file' }
  
  parse() {
    const parser = getParser(this.lang)
    const tree = parser.parse(this.contents)
    return UnistNode(tree.rootNode)
  }

  // returns "rb" if the filename ends with ".rb", etc.
  get lang() {
    return this.extname.replace(/^\./, '')
  }  

  get children() {
    return [this.parse()]
  }  
}
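The lazy directory and file nodes above can be sketched without vfile or tree-sitter, using only Node's standard library. Everything here is an assumption made for illustration: the `LANGUAGES` table, the `languageFor` and `filesByLang` helpers, and the `LazyDirectory` class are hypothetical names, and file nodes are plain objects rather than parsed trees. The table also maps an extension such as "rb" to a language name such as "ruby", which is one way a selector like `file[lang=ruby]` could be made to match.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical table mapping filename extensions to language names.
const LANGUAGES = new Map([['rb', 'ruby'], ['js', 'javascript'], ['erb', 'erb']])

// Returns the language name for a filename, or undefined if it is unknown.
function languageFor(filename) {
  const ext = path.extname(filename).slice(1).toLowerCase()
  return LANGUAGES.get(ext)
}

// A directory node whose children are only read from disk when accessed.
class LazyDirectory {
  constructor(dirPath) {
    this.type = 'directory'
    this.path = dirPath
  }

  get children() {
    return fs.readdirSync(this.path, {withFileTypes: true})
      .filter(dirent => dirent.isDirectory() || dirent.isFile())
      .map(dirent => {
        const childPath = path.join(this.path, dirent.name)
        return dirent.isDirectory()
          ? new LazyDirectory(childPath)
          : {type: 'file', path: childPath, lang: languageFor(childPath)}
      })
  }
}

// Depth-first walk that collects the paths of files in a given language.
function filesByLang(node, lang) {
  if (node.type === 'file') return node.lang === lang ? [node.path] : []
  return node.children.flatMap(child => filesByLang(child, lang))
}

// Build a throwaway directory so the sketch can run anywhere.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'unist-sitter-'))
fs.mkdirSync(path.join(root, 'lib'))
fs.writeFileSync(path.join(root, 'lib', 'foo.rb'), 'puts "yo"\n')
fs.writeFileSync(path.join(root, 'lib', 'bar.js'), 'console.log("yo")\n')
fs.writeFileSync(path.join(root, 'README.md'), '# hi\n')

const rubyFiles = filesByLang(new LazyDirectory(root), 'ruby')
  .map(file => path.relative(root, file))
console.log(rubyFiles)

// Clean up the temporary directory.
fs.rmSync(root, {recursive: true, force: true})
```

Because `children` is a getter, nothing is read from disk until the walk reaches a directory, so a query that stops early never touches the rest of the tree.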

Implementing custom node types for different languages would super-charge the unist utilities and allow you to do things like parse ERB as both HTML and Ruby:

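const {selectAll} = require('unist-util-select')

const ERB = getParser('embedded-template')
const HTML = getParser('html')
const Ruby = getParser('ruby')

class ERBNode extends FileNode {
  get lang() {
    return 'erb'
  }
  
  parse() {
    // convert each tree-sitter tree to unist before querying or returning it
    const tree = UnistNode(ERB.parse(this.contents).rootNode)
    const html = selectAll('output', tree).map(node => node.text).join('')
    // erbDirectiveToRuby is a helper (not shown) that extracts the Ruby source
    const ruby = selectAll('directive', tree).map(erbDirectiveToRuby).join('\n')
    return {
      html: UnistNode(HTML.parse(html).rootNode),
      ruby: UnistNode(Ruby.parse(ruby).rootNode)
    }
  }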
  
  get children() {
    const {html, ruby} = this.parse()
    return [html, ruby]
  }
}
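The splitting step can be illustrated on its own. The sketch below is a stand-in, not the approach described above: it uses a regular expression instead of the embedded-template grammar, and `splitERB` is a hypothetical helper name. It only shows how one ERB source could be separated into an HTML string and a Ruby string, each of which could then be handed to its own parser.

```javascript
// Split an ERB source string into its template text and its Ruby code.
// NOTE: the regular expression is a stand-in for a real ERB parse.
function splitERB(source) {
  // Matches <% ... %>, <%= ... %>, <%# ... %> and the <%- ... -%> trim forms.
  const tag = /<%([=#-]?)([\s\S]*?)-?%>/g
  const html = []
  const ruby = []
  let last = 0
  for (const match of source.matchAll(tag)) {
    const [whole, marker, code] = match
    // Text between the previous tag and this one is template output.
    html.push(source.slice(last, match.index))
    // Skip <%# comment %> tags; keep everything else as Ruby.
    if (marker !== '#') ruby.push(code.trim())
    last = match.index + whole.length
  }
  html.push(source.slice(last))
  return {html: html.join(''), ruby: ruby.join('\n')}
}

const {html, ruby} = splitERB(
  '<ul><% dogs.each do |dog| %><li><%= dog.name %></li><% end %></ul>'
)
console.log(html) // <ul><li></li></ul>
console.log(ruby)
```

A grammar-based split would also preserve source positions, which this string-joining shortcut throws away.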

Anyway, more soon! 🚀

About

tree-sitter + unist = 🌈🦄❤️