
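Language Server Index Format Specification - 0.5.0

Version 0.5.0 of LSIF is currently under construction.

Language Server Index Format

The purpose of the Language Server Index Format (LSIF) is to define a standard format in which language servers or other programming tools can dump their knowledge about a workspace. The dump can later be used to answer LSP requests for the same workspace without running the language server itself. Because much of this information is invalidated when the workspace changes, the dump typically excludes requests that are used while editing a document. For example, the result of a code completion request is typically not part of such a dump.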

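Changelog

Version 0.5.0

Version 0.4.0 added support for dumping larger systems project by project (in their reverse dependency order) and then combining the dumps again in a database by linking result sets through their corresponding monikers. Use of the format has shown that a few features were missing to make this work well:

Support for logical grouping of projects. For this, a Group vertex was added.
Knowing how unique a moniker is. For this, a unique property was added to Moniker.
The nextMoniker edge was replaced by a more generic attach edge. This is possible because the moniker now carries a unique property, which was previously encoded in the direction of the nextMoniker edge.
In programming languages that support polymorphism, calls at runtime can be bound to a different type than the statically known one. An example is an overridden method in an object-oriented programming language. Since dumps can be created per project, additional information must be added to the dumps so that these polymorphic bindings can be captured. The general concept of reference links was therefore introduced (see the section on multi-project systems). In short, it allows a tool to annotate an item edge with the property value referenceLinks.
To allow the output to be sharded better, item edges carry an additional property shard. In earlier versions of the 0.5 specification this property was named document.

An old version of the 0.4.0 specification is available here

Version 0.4.0

Up to version 0.4.0, the focus of the LSIF format was to ease the generation of dumps for language tool providers. However, this made it very hard for consumers of a dump to import it efficiently into a database unless the database format mapped one-to-one onto the LSIF format. This version of the specification tries to balance this by asking tool providers to emit additional events when certain data is ready for consumption. It also adds support for partitioning data per document.

Since version 0.4.0 changes some aspects of LSIF more deeply, an old version of the 0.3.x specification is available here.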

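Motivation

Principal design goals

The format should not imply the use of a certain persistence technology.
The data defined should be modeled as closely as possible on the Language Server Protocol, so that it can be served through LSP without further transformation.
The data stored is result data usually returned from an LSP request. The dump does not contain any program symbol information, nor does LSIF define any symbol semantics (for example, where a symbol is defined or referenced, or when a method overrides another method). LSIF therefore does not define a symbol database. Note that this is consistent with LSP itself, which does not define any symbol semantics either.
The output format is based on JSON, as with LSP.

LSP requests that are good candidates for LSIF support include textDocument/documentSymbol, textDocument/foldingRange, textDocument/documentLink, textDocument/definition, textDocument/declaration, textDocument/typeDefinition, textDocument/hover, textDocument/references, and textDocument/implementation, each of which is covered below.

The corresponding LSP requests have one of the following two forms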

request(uri, method) -> result
request(uri, position, method) -> result

where method is the JSON-RPC request method.

Concrete examples are

request(
  'file:///Users/dirkb/sample/test.ts',
  'textDocument/foldingRange'
) -> FoldingRange[];
request(
  'file:///Users/dirkb/sample/test.ts',
  { line: 10, character: 17 },
  'textDocument/hover'
) -> Hover;

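The input tuple of a request is either [uri, method] or [uri, position, method], and the output is some form of result. For the same uri and [uri, position] tuple, many different requests can be executed.

The dump format should therefore support the following features

Input data must be easy to query (for example, the document and the position).
Each element has a unique id (which may be a string or a number).
It should be possible to emit data as early as possible so that it can be streamed rather than held in large amounts of memory. For example, data that is based on document syntax should be emitted as each file is parsed.
It should be easy to add further requests later on.
It should be easy for a tool to consume a dump, for example to import it into a database without holding the whole dump in memory.

We came to the conclusion that the most flexible way to emit this is a graph, where edges represent the method and vertices are [uri], [uri, position], or a request result. This data can then be stored as JSON or read into a database that can represent these vertices and relationships.

Assume there is a file /Users/dirkb/sample.ts and we want to store the folding range information for it. The indexer then emits two vertices: one representing the document with the URI file:///Users/dirkb/sample.ts, the other representing the folding result. In addition, an edge representing the textDocument/foldingRange request is emitted.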

{ id: 1, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 2, type: "vertex", label: "foldingRangeResult",
  result: [ { ... }, { ... }, ... ]
}
{ id: 3, type: "edge", label: "textDocument/foldingRange", outV: 1, inV: 2 }

The corresponding graph looks like this

[Figure: Folding Range Result]
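As a rough illustration of how a consumer might answer a document-level request from such a dump, the sketch below keeps the elements in memory and follows the edge whose label equals the method name. The names `DumpVertex`, `DumpEdge` and `queryDocument`, and the folding range values in the sample, are assumptions made for this example and are not defined by LSIF.

```typescript
// Hypothetical in-memory model of dump elements; names are illustrative only.
interface DumpVertex { id: number; type: 'vertex'; label: string; uri?: string; result?: unknown; }
interface DumpEdge { id: number; type: 'edge'; label: string; outV: number; inV: number; }
type DumpElement = DumpVertex | DumpEdge;

// Answer request(uri, method) by following the edge labeled `method`
// from the document vertex to its result vertex.
function queryDocument(elements: DumpElement[], uri: string, method: string): unknown {
  const vertices: { [id: number]: DumpVertex } = {};
  for (const element of elements) {
    if (element.type === 'vertex') { vertices[element.id] = element; }
  }
  for (const element of elements) {
    if (element.type !== 'edge' || element.label !== method) { continue; }
    const source = vertices[element.outV];
    if (source !== undefined && source.label === 'document' && source.uri === uri) {
      const target = vertices[element.inV];
      return target !== undefined ? target.result : null;
    }
  }
  return null;
}

// Sample data shaped like the three elements above (result values are made up).
const foldingDump: DumpElement[] = [
  { id: 1, type: 'vertex', label: 'document', uri: 'file:///Users/dirkb/sample.ts' },
  { id: 2, type: 'vertex', label: 'foldingRangeResult', result: [{ startLine: 0, endLine: 1 }] },
  { id: 3, type: 'edge', label: 'textDocument/foldingRange', outV: 1, inV: 2 }
];
const foldingResult = queryDocument(
  foldingDump, 'file:///Users/dirkb/sample.ts', 'textDocument/foldingRange');
```

A real consumer would more likely load the elements into a database and index edges by source vertex and label instead of scanning an array.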

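Ranges

For requests that take a position as input, the position needs to be stored as well. Usually LSP requests return the same result for all positions that point to the same word or name in a document. Take the following TypeScript example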

function bar() {
}

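A hover request for the position of the b in bar returns the same result as one for the position of the a or the r. To make the dump more compact, it uses ranges instead of single positions to capture this. In this case the following vertex is emitted. Note that line and character are zero-based, as in LSP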

{ id: 4, type: "vertex", label: "range",
  start: { line: 0, character: 9}, end: { line: 0, character: 12 }
}

To bind the range to a document, we use a special edge labeled contains, which points from a document to a set of ranges.

{ id: 5, type: "edge", label: "contains", outV: 1, inVs: [4] }

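LSIF supports 1:n edges for the contains relationship, which in a graph can easily be mapped to n 1:1 edges. LSIF supports this for two reasons: (a) it makes the output more compact, since a document usually contains hundreds of such ranges, and (b) it simplifies import and batching for consumers of an LSIF dump.

To bind the hover result to the range, we use the same pattern as for folding ranges. We emit a vertex representing the hover result and an edge representing the textDocument/hover request.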

{
  id: 6,
  type: "vertex",
  label: "hoverResult",
  result: {
    contents: [
      { language: "typescript", value: "function bar(): void" }
    ]
  }
}
{ id: 7, type: "edge", label: "textDocument/hover", outV: 4, inV: 6 }

The corresponding graph looks like this

[Figure: Hover Result]

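The ranges emitted for a document in the contains relationship must follow these rules

A given range id can be contained in only one document. In other words, ranges must not be shared between documents even if they have the same start and end values.
No two ranges can be equal.
No two ranges can overlap, that is, claim the same position in a document, unless one range is entirely contained in the other.

If a position in a document is mapped to a range and more than one range covers the position, the following algorithm should be used

Sort the ranges by containment, innermost first
For each range in ranges do:
1. Check whether the range has an outgoing edge textDocument/${method}
2. If it exists, use it
End
Return null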
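The first step of this algorithm, finding the ranges that cover a position and ordering them innermost first, could be sketched as follows. The type and function names are assumptions for illustration, and the sketch treats a range's end position as exclusive, as LSP does for ranges.

```typescript
// Illustrative types; the names are assumptions, not LSIF definitions.
interface Pos { line: number; character: number; }
interface RangeVertex { id: number; start: Pos; end: Pos; }

// Negative if a is before b, positive if after, zero if equal.
function comparePos(a: Pos, b: Pos): number {
  return a.line !== b.line ? a.line - b.line : a.character - b.character;
}

// True if `range` covers `pos` (start inclusive, end exclusive).
function covers(range: RangeVertex, pos: Pos): boolean {
  return comparePos(range.start, pos) <= 0 && comparePos(pos, range.end) < 0;
}

// Ranges covering `pos`, innermost first. Because ranges in a document either
// nest or are disjoint, a later start (or, on a tie, an earlier end) means
// "more inner".
function rangesAt(ranges: RangeVertex[], pos: Pos): RangeVertex[] {
  return ranges
    .filter(r => covers(r, pos))
    .sort((a, b) => comparePos(b.start, a.start) || comparePos(a.end, b.end));
}

const sampleRanges: RangeVertex[] = [
  { id: 10, start: { line: 0, character: 0 }, end: { line: 2, character: 1 } },
  { id: 11, start: { line: 1, character: 2 }, end: { line: 1, character: 5 } }
];
const innermostFirst = rangesAt(sampleRanges, { line: 1, character: 3 }).map(r => r.id);
```

With the sorted list in hand, a consumer checks each range in turn for an outgoing `textDocument/${method}` edge.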

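Usually the hover result is the same whether you hover over the definition of a function or over a reference to it. The same is true for many LSP requests such as textDocument/definition, textDocument/references, or textDocument/typeDefinition. In a naive model, each range would have outgoing edges for all of these LSP requests, pointing to the corresponding results. To optimize this and to make the graph easier to understand, the concept of a ResultSet is introduced. A result set acts as a hub that can store information common to many ranges. The ResultSet itself does not carry any information, so it looks like this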

export interface ResultSet {
}

The corresponding output for the hover example above, using a result set, looks like this

{ id: 1, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 2, type: "vertex", label: "resultSet" }
{ id: 3, type: "vertex", label: "range",
  start: { line: 0, character: 9}, end: { line: 0, character: 12 }
}
{ id: 4, type: "edge", label: "contains", outV: 1, inVs: [3] }
{ id: 5, type: "edge", label: "next", outV: 3, inV: 2 }
{ id: 6, type: "vertex", label: "hoverResult",
  result: {
    "contents":[ {
      language: "typescript", value:"function bar(): void"
    }]
  }
}
{ id: 7, type: "edge", label: "textDocument/hover", outV: 2, inV: 6 }

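[Figure: Result Set]

Result sets are linked to ranges using a next edge. A result set can also forward information to another result set by linking to it with a next edge.

The pattern of storing results with a ResultSet is used for other requests as well. The lookup algorithm for a request [document, position, method] is therefore as follows

Find all ranges for [document, position]. If none exist, return null as the result.
Sort the ranges by containment, innermost first.
For each range in ranges do:
1. Assign the range to out.
2. While out !== null
  1. Check whether out has an outgoing edge textDocument/${method}. If it exists, use it and return the corresponding result.
  2. Check whether out has an outgoing next edge. If it exists, set out to the target vertex. Otherwise set out to null.
3. End
End
Otherwise return null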
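A minimal sketch of the traversal part of this lookup is shown below. It assumes the covering ranges have already been found and sorted, and that the dump has been loaded into a simple adjacency structure; `Graph`, `OutEdges` and `resolveRequest` are hypothetical names. The cycle guard is an addition of this sketch and is not required by the algorithm above.

```typescript
// Hypothetical adjacency view of a dump: for every vertex id, its labeled
// outgoing edges (edge label -> target vertex id).
interface OutEdges { [label: string]: number; }
type Graph = { [vertexId: number]: OutEdges };

// Walk from each range along `next` edges until a vertex with an outgoing
// `method` edge is found. Returns the id of the result vertex, or null.
function resolveRequest(graph: Graph, sortedRangeIds: number[], method: string): number | null {
  for (const rangeId of sortedRangeIds) {
    let out: number | null = rangeId;
    const seen: { [id: number]: boolean } = {}; // guards against malformed cyclic dumps
    while (out !== null && !seen[out]) {
      seen[out] = true;
      const edges: OutEdges = graph[out] || {};
      const target = edges[method];
      if (target !== undefined) { return target; }
      const nextId = edges['next'];
      out = nextId !== undefined ? nextId : null;
    }
  }
  return null;
}

// Mirrors the hover example: range 3 -> next -> result set 2 -> hover result 6.
const hoverGraph: Graph = {
  3: { next: 2 },
  2: { 'textDocument/hover': 6 }
};
const hoverResultId = resolveRequest(hoverGraph, [3], 'textDocument/hover');
```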

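Language Features

Request: `textDocument/definition`

The same pattern of connecting a range, result set, or document to a method result through a request edge is used for other requests as well. Next we look at the textDocument/definition request, using the following TypeScript example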

function bar() {
}

function foo() {
  bar();
}

This emits the following vertices and edges to model the textDocument/definition request

// The document
{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}

// The result set
{ id: 6, type: "vertex", label: "resultSet" }

// The bar declaration
{ id: 9, type: "vertex", label: "range",
  start: { line: 0, character: 9 }, end: { line: 0, character: 12 }
}
{ id: 10, type: "edge", label: "next", outV: 9, inV: 6 }


// The bar reference
{ id: 20, type: "vertex", label: "range",
  start: { line: 4, character: 2 }, end: { line: 4, character: 5 }
}
{ id: 21, type: "edge", label: "next", outV: 20, inV: 6}

// The definition result linked to the bar result set
{ id: 22, type: "vertex", label: "definitionResult" }
{ id: 23, type: "edge", label: "textDocument/definition", outV: 6, inV: 22 }
{ id: 24, type: "edge", label: "item", outV: 22, inVs: [9], shard: 4 }

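[Figure: Definition Result]

The definition result above has only one value (the range with id '9'), and we could have emitted it directly. However, we introduced the definition result vertex for two reasons

To be consistent with all other requests that point to a result.
To support languages in which a definition can be spread over multiple ranges or even multiple documents. To support multiple documents, ranges are added to a definition result using a 1:N item edge. Conceptually, a definition result is an array to which the item edge adds items.

Consider the following TypeScript example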

interface X {
  foo();
}
interface X {
  bar();
}
let x: X;

Running Go to Definition on X in let x: X shows a dialog that lets the user choose between the two definitions of interface X. The emitted JSON in this case looks like this

{ id : 38, type: "vertex", label: "definitionResult" }
{ id : 40, type: "edge", label: "item", outV: 38, inVs: [9, 13], shard: 4 }

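The item edge carries an additional property shard, which indicates the vertex that is the source of these declarations (for example, a document or a project). We added this information so that data is still easy to emit while also being easy to process and shard when it is stored in a database. Without it, we would either need to specify the order in which data is emitted (for example, that an item edge may only refer to a range that has already been added to a document using a contains edge), or force processing tools to keep large numbers of vertices and edges in memory. Having the shard property looks like a fair balance.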
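To illustrate how a consumer might use the shard property, the sketch below groups the ranges that item edges contribute to one result vertex by the shard they belong to. `ItemEdge` and `itemsByShard` are names made up for this example; only the edge properties themselves come from the examples above.

```typescript
// Shape of an item edge as used in the examples above (illustrative name).
interface ItemEdge {
  id: number; type: 'edge'; label: 'item';
  outV: number; inVs: number[]; shard: number;
}

// Collect the range ids added to result vertex `resultId`, grouped by the
// shard (document or project vertex id) that the ranges belong to.
function itemsByShard(edges: ItemEdge[], resultId: number): { [shard: number]: number[] } {
  const grouped: { [shard: number]: number[] } = {};
  for (const edge of edges) {
    if (edge.outV !== resultId) { continue; }
    const bucket = grouped[edge.shard] || (grouped[edge.shard] = []);
    for (const inV of edge.inVs) { bucket.push(inV); }
  }
  return grouped;
}

// The item edge from the interface X example: ranges 9 and 13 in document 4.
const definitionItems = itemsByShard(
  [{ id: 40, type: 'edge', label: 'item', outV: 38, inVs: [9, 13], shard: 4 }],
  38
);
```

Grouping by shard lets an importer write all items for one document in a single batch without first resolving which document each range belongs to.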

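Request: `textDocument/declaration`

Some programming languages have the concept of both declarations and definitions (for example C/C++). If this is the case, the dump can contain a corresponding declarationResult vertex and a textDocument/declaration edge to store the information. They are handled analogously to the entities emitted for the textDocument/definition request.

More about Request: `textDocument/hover`

In LSP, the hover is defined as follows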

export interface Hover {
  /**
   * The hover's content
   */
  contents: MarkupContent | MarkedString | MarkedString[];

  /**
   * An optional range
   */
  range?: Range;
}

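where the optional range is the name range of the word hovered over.

Side note: this is a pattern used for other LSP requests as well, where the result contains the word range of the word the position parameter points to.

This makes the hover different for every location, so we cannot really store it with the result set. However, the range is the range of one of the bar references that has already been emitted and was used to start computing the result. To keep the hover reusable, we ask the index server to fill in the starting range if no range is defined in the result. So for a hover request executed on the range { start: { line: 4, character: 2 }, end: { line: 4, character: 5 } }, the hover result will be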

{ id: 6, type: "vertex", label: "hoverResult",
  result: {
    contents: [ { language: "typescript", value: "function bar(): void" } ],
    range: { start: { line: 4, character: 2 }, end: { line: 4, character: 5 } }
  }
}
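The fill-in step could be sketched as below, under the assumption that it happens when the stored hover is served: if the shared hover result carries no range, the range the request started from is used. The names `StoredHover` and `withRequestRange` are hypothetical.

```typescript
// Illustrative types mirroring the LSP Hover shape; names are assumptions.
interface HoverPosition { line: number; character: number; }
interface HoverRange { start: HoverPosition; end: HoverPosition; }
interface StoredHover { contents: unknown; range?: HoverRange; }

// Return a hover whose range is the range the request started from when the
// stored (shared) hover result has no range of its own. The stored result is
// left untouched so it can be reused for other ranges.
function withRequestRange(stored: StoredHover, requestRange: HoverRange): StoredHover {
  if (stored.range !== undefined) { return stored; }
  return { contents: stored.contents, range: requestRange };
}

const sharedHover: StoredHover = {
  contents: [{ language: 'typescript', value: 'function bar(): void' }]
};
const servedHover = withRequestRange(sharedHover, {
  start: { line: 4, character: 2 }, end: { line: 4, character: 5 }
});
```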

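Request: `textDocument/references`

References are stored in the same way as a hover or Go to Definition ranges. A reference result vertex is used, and item edges add ranges to the result.

Consider the following example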

function bar() {
}

function foo() {
  bar();
}

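Request: `textDocument/implementation`

Support for the textDocument/implementation request is provided by reusing what was implemented for the textDocument/references request. In most cases, textDocument/implementation returns the declaration values of the reference result that a symbol declaration points to. For cases where the result differs, LSIF provides an ImplementationResult. To nest implementation results, the item edge supports the property value "implementationResults".

The corresponding ImplementationResult looks like this

interface ImplementationResult {

  label: 'implementationResult';
}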

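Request: `textDocument/typeDefinition`

Supporting textDocument/typeDefinition is straightforward. The edge is recorded either on the range or on the ResultSet.

The corresponding TypeDefinitionResult looks like this

interface TypeDefinitionResult {

  label: 'typeDefinitionResult';
}

Consider the following TypeScript example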

interface I {
  foo(): void;
}

let i: I;

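Document requests

The Language Server Protocol also supports requests for documents only (without any position information). These requests are textDocument/foldingRange, textDocument/documentLink, and textDocument/documentSymbol. We follow the same pattern as before to model them, the difference being that the result is linked to the document instead of to a range.

Request: `textDocument/foldingRange`

For the folding range result this looks like this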

function hello() {
  console.log('Hello');
}

function world() {
  console.log('world');
}

function space() {
  console.log(' ');
}
hello();space();world();

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 112, type: "vertex", label: "foldingRangeResult", result:[
  { startLine: 0, startCharacter: 16, endLine: 2, endCharacter: 1 },
  { startLine: 4, startCharacter: 16, endLine: 6, endCharacter: 1 },
  { startLine: 8, startCharacter: 16, endLine: 10, endCharacter: 1 }
]}
{ id: 113, type: "edge", label: "textDocument/foldingRange", outV: 2, inV: 112 }

The corresponding FoldingRangeResult is defined as follows

export interface FoldingRangeResult {
  label: 'foldingRangeResult';

  result: lsp.FoldingRange[];
}

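Request: `textDocument/documentLink`

For document links, too, we define a result type and a corresponding edge that links it to a document. Since link locations usually appear in comments, the ranges do not denote any symbol declarations or references. We therefore inline the range into the result, as we do for folding ranges.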

export interface DocumentLinkResult {
  label: 'documentLinkResult';

  result: lsp.DocumentLink[];
}

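Request: `textDocument/documentSymbol`

Next we look at the textDocument/documentSymbol request. This request usually returns an outline view of the document in hierarchical form. However, not all programming symbols declared or defined in a document are part of the result (for example, locals are usually omitted). In addition, an outline item needs to provide extra information such as the full range and the symbol kind. There are two ways to model this: either we do the same as for folding ranges and document links and store the information as literals in a document symbol result, or we extend the range vertex with some additional information and refer to these ranges in the document symbol result. Since the additional information on ranges may be helpful in other scenarios as well, we support adding tags to these ranges by defining a tag property on the range vertex.

The following tags are currently supported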

/**
 * The range represents a declaration
 */
export interface DeclarationTag {

  /**
   * A type identifier for the declaration tag.
   */
  type: 'declaration';

  /**
   * The text covered by the range
   */
  text: string;

  /**
   * The kind of the declaration.
   */
  kind: lsp.SymbolKind;

  /**
   * The full range of the declaration not including leading/trailing whitespace
   * but everything else, e.g comments and code. The range must be included in
   * fullRange.
   */
  fullRange: lsp.Range;

  /**
   * Optional detail information for the declaration.
   */
  detail?: string;
}

/**
 * The range represents a definition
 */
export interface DefinitionTag {
  /**
   * A type identifier for the definition tag.
   */
  type: 'definition';

  /**
   * The text covered by the range
   */
  text: string;

  /**
   * The symbol kind.
   */
  kind: lsp.SymbolKind;

  /**
   * The full range of the definition not including leading/trailing whitespace
   * but everything else, e.g comments and code. The range must be included in
   * fullRange.
   */
  fullRange: lsp.Range;

  /**
   * Optional detail information for the definition.
   */
  detail?: string;
}

/**
 * The range represents a reference
 */
export interface ReferenceTag {

  /**
   * A type identifier for the reference tag.
   */
  type: 'reference';

  /**
   * The text covered by the range
   */
  text: string;
}

/**
 * The type of the range is unknown.
 */
export interface UnknownTag {

  /**
   * A type identifier for the unknown tag.
   */
  type: 'unknown';

  /**
   * The text covered by the range
   */
  text: string;
}

Emitting the tags for the following TypeScript example

function hello() {
}

hello();

looks like this

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 4, type: "vertex", label: "resultSet" }
{ id: 7, type: "vertex", label: "range",
  start: { line: 0, character: 9 }, end: { line: 0, character: 14 },
  tag: {
    type: "definition", text: "hello", kind: 12,
    fullRange: {
      start: { line: 0, character: 0 }, end: { line: 1, character: 1 }
    }
  }
}

The document symbol result is then modeled as follows

export interface RangeBasedDocumentSymbol {

  id: RangeId

  children?: RangeBasedDocumentSymbol[];
}

export interface DocumentSymbolResult extends V {

  label: 'documentSymbolResult';

  result: lsp.DocumentSymbol[] | RangeBasedDocumentSymbol[];
}

The following TypeScript example

namespace Main {
  function hello() {
  }
  function world() {
    let i: number = 10;
  }
}

produces the following output

// The document
{ id: 2 , type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
// The declaration of Main
{ id: 7 , type: "vertex", label: "range",
  start: { line: 0, character: 10 }, end: { line: 0, character: 14 },
  tag: {
    type: "definition", text: "Main", kind: 7,
    fullRange: {
      start: { line: 0, character: 0 }, end: { line: 6, character: 1 }
    }
  }
}
// The declaration of hello
{ id: 18 , type: "vertex", label: "range",
  start: { line: 1, character: 11 }, end: { line: 1, character: 16 },
  tag: {
    type: "definition", text: "hello", kind: 12,
    fullRange: {
      start: { line: 1, character: 2 }, end: { line: 2, character: 3 }
    }
  }
}
// The declaration of world
{ id: 29 , type: "vertex", label: "range",
  start: { line: 3, character: 11 }, end: { line: 3, character: 16 },
  tag: {
    type: "definition", text: "world", kind: 12,
    fullRange: {
      start: { line: 3, character: 2 }, end: { line: 5, character: 3 }
    }
  }
}
// The document symbol
{ id: 39 , type: "vertex", label: "documentSymbolResult",
  result: [ { id: 7 , children: [ { id: 18 }, { id: 29 } ] } ]
}
{ id: 40 , type: "edge", label: "textDocument/documentSymbol",
  outV: 2, inV: 39
}
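A consumer that wants to serve this as a regular LSP document symbol response has to expand the range ids again. The sketch below shows one way this might be done, mapping the tag onto the `name`, `kind`, `range` and `selectionRange` fields of `lsp.DocumentSymbol`. The local type names and the `toOutline` helper are assumptions for this example.

```typescript
// Local stand-ins for the LSIF/LSP types used above; names are illustrative.
interface SymPos { line: number; character: number; }
interface SymRange { start: SymPos; end: SymPos; }
interface TaggedRange {
  id: number; start: SymPos; end: SymPos;
  tag: { type: string; text: string; kind: number; fullRange: SymRange; detail?: string };
}
interface RangeSymbol { id: number; children?: RangeSymbol[]; }
interface OutlineSymbol {
  name: string; kind: number; range: SymRange; selectionRange: SymRange;
  children?: OutlineSymbol[];
}

// Expand a range-based document symbol tree using the tags on the ranges.
function toOutline(symbols: RangeSymbol[], ranges: { [id: number]: TaggedRange }): OutlineSymbol[] {
  return symbols.map(symbol => {
    const range = ranges[symbol.id];
    if (range === undefined) { throw new Error('unknown range id ' + symbol.id); }
    const outline: OutlineSymbol = {
      name: range.tag.text,
      kind: range.tag.kind,
      range: range.tag.fullRange,                           // whole declaration
      selectionRange: { start: range.start, end: range.end } // just the name
    };
    if (symbol.children !== undefined) {
      outline.children = toOutline(symbol.children, ranges);
    }
    return outline;
  });
}

const outlineRanges: { [id: number]: TaggedRange } = {
  7: { id: 7, start: { line: 0, character: 10 }, end: { line: 0, character: 14 },
       tag: { type: 'definition', text: 'Main', kind: 7,
              fullRange: { start: { line: 0, character: 0 }, end: { line: 6, character: 1 } } } },
  18: { id: 18, start: { line: 1, character: 11 }, end: { line: 1, character: 16 },
        tag: { type: 'definition', text: 'hello', kind: 12,
               fullRange: { start: { line: 1, character: 2 }, end: { line: 2, character: 3 } } } }
};
const outline = toOutline([{ id: 7, children: [{ id: 18 }] }], outlineRanges);
```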

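Request: `textDocument/diagnostic`

The only information missing from a dump that would be useful is the diagnostics associated with documents. Diagnostics in LSP are modeled as push notifications sent from the server to the client. This does not fit well with a dump that is modeled on request method names. However, the push notification can be emulated as a request whose result is the value sent as the parameter during the push.

In the dump we model diagnostics as follows

We introduce a pseudo request textDocument/diagnostic.
We introduce a diagnostic result that contains the diagnostics associated with a document.

The result looks like this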

export interface DiagnosticResult {

  label: 'diagnosticResult';

  result: lsp.Diagnostic[];
}

The following TypeScript example

function foo() {
  let x: string = 10;
}

produces the following output

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 18, type: "vertex", label: "diagnosticResult",
  result: [
    {
      severity: 1, code: 2322,
      message: "Type '10' is not assignable to type 'string'.",
      range: {
        start : { line: 1, character: 5 }, end: { line: 1, character: 6 }
      }
    }
  ]
}
{ id: 19, type: "edge", label: "textDocument/diagnostic", outV: 2, inV: 18 }

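Since diagnostics are not very common in dumps, no effort has been made to reuse ranges in diagnostics.

The Project vertex

Usually language servers operate in some sort of project context. In TypeScript, a project is defined using a tsconfig.json file. C# and C++ have their own means. The project file usually contains information about compile options and other parameters, and having this information in a dump can be valuable. LSIF therefore defines a project vertex. In addition, all documents that belong to the project are connected to it using a contains edge. If there were a tsconfig.json in the previous examples, the first emitted edges and vertices would look like this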

{ id: 1, type: "vertex", label: "project",
  resource: "file:///Users/dirkb/tsconfig.json", kind: "typescript"
}
{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 3, type: "edge", label: "contains", outV: 1, inVs: [2] }

The definition of the project vertex looks like this

export interface Project extends V {

	/**
	 * The label property.
	 */
	label: VertexLabels.project;

	/**
	 * The project kind like 'typescript' or 'csharp'. See also the language ids
	 * in the specification.
	 * See https://msdocs.cn/language-server-protocol/specification
	 */
	kind: string;

	/**
	 * The resource URI of the project file.
	 */
	resource?: Uri;

	/**
	 * Optional the content of the project file, `base64` encoded.
	 */
	contents?: string;
}

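Embedding contents

It can be valuable to embed the contents of a document or project file in the dump as well, for example if the document is a virtual document generated from program metadata. The index format therefore supports an optional contents property on the document and project vertices. If used, the content needs to be base64 encoded.

Advanced Concepts

Events

To ease the processing of an LSIF dump (for example, importing it into a database), the dump emits begin and end events for documents and projects. After the end event of a document has been emitted, the dump must not contain any further data referencing that document. For example, no ranges from that document can be referenced in item edges, nor can result sets or other vertices linked to ranges in that document. The document can, however, be referenced in a contains edge that adds the document to a project. The begin and end events for documents look like this: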

// The actual document
{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript",
  contents: "..."
}
// The begin event
{ id: 5, type: "vertex", label: "$event",
  kind: "begin", scope: "document" , data: 4
}
// The end event
{ id: 53, type: "vertex", label: "$event",
  kind: "end", scope: "document" , data: 4
}

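Between the document vertex 4 and the document begin event 5, no information specific to document 4 can be emitted. Note that more than one document can be open at a given point in time, meaning that there have been n different document begin events without corresponding document end events.

The events for projects look similar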

{ id: 2, type: "vertex", label: "project", kind: "typescript" }
{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript",
  contents: "..."
}
{ id: 5, type: "vertex", label: "$event",
  kind: "begin", scope: "document" , data: 4
}
{ id: 3, type: "vertex", label: "$event",
  kind: "begin", scope: "project", data: 2
}
{ id: 53, type: "vertex", label: "$event",
  kind: "end", scope: "document", data: 4
}
{ id: 54, type: "edge", label: "contains", outV: 2, inVs: [4] }
{ id: 55, type: "vertex", label: "$event",
  kind: "end", scope: "project", data: 2
}
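As an illustration of how a consumer might track these events, the sketch below replays document events in emission order and reports which documents are still open. The interface and function names are assumptions for this example; a real importer would typically flush a document's buffered data when its end event arrives.

```typescript
// Illustrative shape of an LSIF `$event` vertex; the interface name is an assumption.
interface LsifEvent {
  id: number; type: 'vertex'; label: '$event';
  kind: 'begin' | 'end'; scope: 'document' | 'project'; data: number;
}

// Replay document events in emission order and return the ids of documents
// that are still open (begin seen, no matching end yet).
function openDocuments(events: LsifEvent[]): number[] {
  const openIds: number[] = [];
  for (const ev of events) {
    if (ev.scope !== 'document') { continue; }
    const index = openIds.indexOf(ev.data);
    if (ev.kind === 'begin') {
      if (index === -1) { openIds.push(ev.data); }
    } else {
      if (index === -1) { throw new Error('end event without begin for document ' + ev.data); }
      openIds.splice(index, 1); // from here on the document's data could be flushed
    }
  }
  return openIds;
}

// Two documents are opened; only document 4 is closed again.
const sampleEvents: LsifEvent[] = [
  { id: 5, type: 'vertex', label: '$event', kind: 'begin', scope: 'document', data: 4 },
  { id: 8, type: 'vertex', label: '$event', kind: 'begin', scope: 'document', data: 7 },
  { id: 53, type: 'vertex', label: '$event', kind: 'end', scope: 'document', data: 4 }
];
const stillOpen = openDocuments(sampleEvents);
```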

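Project exports and external imports (Monikers)

Changed in version 0.5.0

One use case of LSIF is to create dumps for released versions of a product, either a library or a program. If a project P2 references a library P1, it would also be useful if the information in these two dumps could be related. To make this possible, LSIF introduces optional monikers, which can be linked to ranges using a corresponding edge. The monikers can be used to describe what is exported from a project and what is imported. Let us first look at the export case.

Consider the following TypeScript file called index.ts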

export function func(): void {
}

export class Emitter {
  private doEmit() {
  }

  public emit() {
    this.doEmit();
  }
}

{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/index.ts", languageId: "typescript",
  contents: "..."
}
{ id: 11, type: "vertex", label: "resultSet" }
{ id: 12, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:func", unique: "group"
}
{ id: 13, type: "edge", label: "moniker", outV: 11, inV: 12 }
{ id: 14, type: "vertex", label: "range",
  start: { line: 0, character: 16 }, end: { line: 0, character: 20 }
}
{ id: 15, type: "edge", label: "next", outV: 14, inV: 11 }

{ id: 18, type: "vertex", label: "resultSet" }
{ id: 19, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:Emitter",
  unique: "group"
}
{ id: 20, type: "edge", label: "moniker", outV: 18, inV: 19 }
{ id: 21, type: "vertex", label: "range",
  start: { line: 3, character: 13 }, end: { line: 3, character: 20 }
}
{ id: 22, type: "edge", label: "next", outV: 21, inV: 18 }

{ id: 25, type: "vertex", label: "resultSet" }
{ id: 26, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:Emitter.doEmit",
  unique: "group"
}
{ id: 27, type: "edge", label: "moniker", outV: 25, inV: 26 }
{ id: 28, type: "vertex", label: "range",
  start: { line: 4, character: 10 }, end: { line: 4, character: 16 }
}
{ id: 29, type: "edge", label: "next", outV: 28, inV: 25 }

{ id: 32, type: "vertex", label: "resultSet" }
{ id: 33, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:Emitter.emit",
  unique: "group"
}
{ id: 34, type: "edge", label: "moniker", outV: 32, inV: 33 }
{ id: 35, type: "vertex", label: "range",
  start: { line: 7, character: 9 }, end: { line: 7, character: 13 }
}
{ id: 36, type: "edge", label: "next", outV: 35, inV: 32 }

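This describes the exported declarations inside index.ts, each with a moniker (for example, a handle in string format) that is bound to the corresponding range declaration. The generated moniker must be position independent and stable so that it can be used to identify the symbol in other projects or documents. It should be sufficiently unique to avoid matching other monikers in other projects unless they actually refer to the same symbol. A moniker therefore has the following properties:

scheme indicates how the identifier is to be interpreted.
identifier actually identifies the symbol. Its structure is opaque outside the scheme that owns it. In the example above the monikers are created by the TypeScript compiler tsc and can only be compared to monikers that also have the scheme tsc.
kind indicates whether the moniker is exported, imported, or local to the project.
unique indicates how unique the moniker is. See the multi-project section for more information.

Note also that the method Emitter#doEmit has an export moniker although the method is private. Whether private elements have monikers depends on the programming language. Since TypeScript cannot enforce visibility (it compiles to JS, which does not have the concept), we treat them as visible. Even the TypeScript language server does so: Find All References finds all references to a private method even where a reference is flagged as a visibility violation.

Multi-Project Systems

New in version 0.5.0

Most software systems today are made up of multiple projects. Always creating LSIF dumps for all projects of a system, even if only one project has changed, is not feasible, especially if only project internals changed. Since version 0.4.0, LSIF has therefore allowed a dump to be created per project and linked again into a larger system in a database. However, 0.4.0 lacked some concepts needed to make this work. To illustrate these concepts, consider the following example:

Project P1

Project P1 consists of a single file p1Main.ts with the following content: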

export interface Disposable {
	dispose(): void;
}

let d: Disposable;
d.dispose();

Project P2

Project P2 depends on P1 and consists of a single file p2Main.ts with the following content:

import { Disposable } from 'p1';

class Widget implements Disposable {
	public dispose(): void {
	}
}

let w: Widget;
w.dispose();

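Now, if a user searches for references to Widget#dispose, the result is expected to include the reference d.dispose in P1. However, when P1 is processed the tool does not know about P2, and when P2 is processed it usually does not know the source of P1. It only knows its API shape (in TypeScript, for example, the corresponding d.ts file).

To make this work, we first need to group projects into larger units so that we know for which projects d.dispose is actually a match. Assume there is a completely unrelated project PX that also uses Disposable from P1, but P2 is never linked into the same system as PX. An object of type Widget can then never flow into code in PX, so the references in PX should not be listed. We therefore introduce the concept of a group to logically combine projects into larger systems. A project belongs to a group, and a group is identified by a URI. Let us look at the concrete dumps for P1 and P2: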

{id: 2, type: "vertex", label: "group",
  uri: "https://github.com/microsoft/lsif-node.git/samples/ts-cascade",
  conflictResolution: "takeDB", name: "ts-cascade",
  rootUri: "file:///Users/dirkb/samples/ts-cascade"
}
{id: 4, type: "vertex", label: "project", kind: "typescript", name: "p1" }
{id: 5, type: "edge", label: "belongsTo", outV: 4, inV:2 }

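A path into the GitHub repository is used as the group URI. However, if the URI should be independent of the repository, it could also take a form like lsif-group:://com.microsoft/vscode/lsif-node/samples/ts-cascade. This is useful if a company stores code in many different repository systems. The edge with id 5 binds the project to the group.

The dump for project P2 looks like this: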

{id: 2, type: "vertex", label: "group",
  uri: "https://github.com/microsoft/lsif-node.git/samples/ts-cascade",
  conflictResolution: "takeDB", name: "ts-cascade",
  rootUri: "file:///Users/dirkb/samples/ts-cascade"
}
{id: 4, type: "vertex", label: "project", kind: "typescript", name: "p2" }
{id: 5, type: "edge", label: "belongsTo", outV: 4, inV: 2 }

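Note that this binds P2 to the same group that P1 belongs to. To avoid any kind of group management, a group carries a property conflictResolution that tells the database which group information to use if the database already contains a group with the given URI. takeDB means that the information already stored in the database is used, and takeDump means that the information in the dump should override the database value.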
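The conflict resolution rule could be applied by an importer roughly as sketched below. The in-memory `GroupTable` stands in for whatever database is used, and all names other than the group properties themselves are assumptions of this example.

```typescript
// Group properties as used in the dumps above; the interface name is illustrative.
interface GroupVertexInfo {
  uri: string; name: string; rootUri: string;
  conflictResolution: 'takeDump' | 'takeDB';
}
// Stand-in for the database table of known groups, keyed by group URI.
type GroupTable = { [uri: string]: GroupVertexInfo };

// Store the group from a dump, honoring conflictResolution when a group with
// the same URI already exists. Returns the group information that is now stored.
function importGroup(table: GroupTable, fromDump: GroupVertexInfo): GroupVertexInfo {
  const existing = table[fromDump.uri];
  if (existing === undefined || fromDump.conflictResolution === 'takeDump') {
    table[fromDump.uri] = fromDump;
    return fromDump;
  }
  return existing; // takeDB: keep what the database already has
}

const groupTable: GroupTable = {};
const groupUri = 'https://github.com/microsoft/lsif-node.git/samples/ts-cascade';
const groupRoot = 'file:///Users/dirkb/samples/ts-cascade';
importGroup(groupTable,
  { uri: groupUri, name: 'ts-cascade', rootUri: groupRoot, conflictResolution: 'takeDB' });
const keptGroup = importGroup(groupTable,
  { uri: groupUri, name: 'renamed', rootUri: groupRoot, conflictResolution: 'takeDB' });
const replacedGroup = importGroup(groupTable,
  { uri: groupUri, name: 'renamed', rootUri: groupRoot, conflictResolution: 'takeDump' });
```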

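Where possible, group URIs should be organized hierarchically so that projects can be grouped into a wider scope. For example, the URI https://github.com/microsoft should capture all projects under the Microsoft organization on GitHub.

Now let us look at how we ensure that searching for references to Widget#dispose also finds the d.dispose() match in P1. First, consider what information the dump of P1 contains about Disposable#dispose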

// The result set for the Disposable#dispose symbol
{ id: 21, type: "vertex", label: "resultSet" }
// The export moniker of Disposable#dispose in P1 (note kind export).
{ id: 22, type: "vertex", label: "moniker",
  scheme: "tsc", identifier: "p1/lib/p1Main:Disposable.dispose",
  unique: "group", kind:"export"
}
{ id: 23, type: "edge", label: "moniker", outV: 21, inV: 22 }
// The actual definition of the symbol
{ id: 24, type: "vertex", label: "range",
  start: { line: 1, character: 1 }, end: { line: 1, character: 8 },
  tag: {
    type: "definition", text: "dispose", kind: 7,
    fullRange: {
      start : { line: 1, character:1 }, end: { line: 1, character: 17 }
    }
  }
}
// Bind the reference result to the result set
{ id: 57, type: "vertex", label: "referenceResult" }
{ id: 58, type: "edge", label: "textDocument/references", outV: 21, inV: 57 }

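The interesting part here is the vertex with id 22, which defines the moniker for Disposable#dispose. It has a new property unique, which states that the moniker is unique within the group of the project but not necessarily outside the group. The possible values for unique are

document indicates that the moniker is only unique within a document. Used, for example, for locals or private members.
project indicates that the moniker is only unique within a project. Used, for example, for project-internal symbols.
group indicates that the moniker is unique within a group of projects. Used, for example, for exported members.
scheme indicates that the moniker is unique within its scheme. For example, if a moniker is generated for a specific package manager (see the npm example below), it is usually unique within its scheme (all monikers generated for npm carry the npm scheme and are unique).
global indicates that the moniker is globally unique (that is, its identifier is unique independent of the scheme or kind).

When the dump for P2 is generated, the information for Widget#dispose looks as follows: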

// The import moniker for importing Disposable#dispose into P2
{ id: 22, type: "vertex", label: "moniker",
  scheme: "tsc", identifier: "p1/lib/p1Main:Disposable.dispose",
  unique: "group", kind: "import"
}

// The result set for Widget#dispose
{ id: 78, type: "vertex", label: "resultSet" }
// The moniker for Widget#dispose. Note that the moniker is local since the
// Widget class is not exported
{ id: 79, type: "vertex", label: "moniker",
  scheme: "tsc", identifier: "2Q46RTVRZTuVW1ajf68/Vw==",
  unique: "document", kind: "local"
}
{ id: 80, type: "edge", label: "moniker", outV: 78, inV: 79 }
// The actual definition of the symbol
{ id: 81, type: "vertex", label: "range",
  start: { line: 3, character: 8 }, end: { line: 3, character: 15 },
  tag: {
    type: "definition", text: "dispose", kind: 6,
    fullRange: {
      start: { line: 3, character: 1 }, end: { line: 4, character: 2 }
    }
  }
}
// Bind the reference result to Widget#dispose
{ id: 116, type: "vertex", label: "referenceResult" }
{ id: 117, type: "edge", label: "textDocument/references", outV: 78, inV: 116}
{ id: 118, type: "edge", label: "item",
  outV: 116, inVs: [43], shard: 52, property: "referenceResults"
}
// Link the reference result set of Disposable#dispose to this result set
// using a moniker
{ id: 119, type: "edge", label: "item",
  outV: 116, inVs: [22], shard: 52, property: "referenceLinks"
}
{ id: 120, type: "edge", label: "item",
  outV: 43, inVs: [81], shard: 52, property: "definitions"
}
{ id: 121, type: "edge", label: "item",
  outV: 43, inVs: [96], shard: 52, property: "references"
}

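The noteworthy parts are

The vertex with id 22: this is the import moniker for Disposable#dispose from P1.
The edge with id 119: this adds a reference link to the reference result of Widget#dispose. Item edges with the property referenceLinks are conceptually comparable to item edges with the property referenceResults. They allow reference results to be composed. The difference is that a referenceResults item edge refers to another result using a vertex id, since the reference result is part of the same dump, whereas a referenceLinks item edge refers to another result using a moniker. The actual resolving therefore needs to happen in a database that holds the data for both P1 and P2. As with referenceResults item edges, a language server is responsible for de-duplicating the final ranges.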
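One possible way a database-backed server might resolve such links is sketched below. The record layout, the location strings, and the function names are all assumptions of this example; LSIF does not prescribe how a database stores or resolves reference links.

```typescript
// Hypothetical database records; field names are assumptions for this sketch.
interface MonikerRef { scheme: string; identifier: string; }
interface ReferenceRecord {
  moniker?: MonikerRef;  // moniker attached to the symbol's result set
  ranges: string[];      // reference locations, written here as "file:line:character"
  links: MonikerRef[];   // targets of item edges with property "referenceLinks"
}

// Monikers are matched on scheme and identifier only, because the kind differs
// between the exporting dump ("export") and the importing dump ("import").
function sameMoniker(a: MonikerRef, b: MonikerRef): boolean {
  return a.scheme === b.scheme && a.identifier === b.identifier;
}

// Own ranges plus the ranges of every record whose moniker is linked,
// with duplicates removed.
function resolveReferences(start: ReferenceRecord, all: ReferenceRecord[]): string[] {
  const found: string[] = start.ranges.slice();
  for (const link of start.links) {
    for (const record of all) {
      if (record.moniker === undefined || !sameMoniker(record.moniker, link)) { continue; }
      for (const range of record.ranges) {
        if (found.indexOf(range) === -1) { found.push(range); }
      }
    }
  }
  return found;
}

const disposeMoniker: MonikerRef = {
  scheme: 'tsc', identifier: 'p1/lib/p1Main:Disposable.dispose'
};
// d.dispose() in P1 and w.dispose() in P2 (locations are illustrative).
const p1Dispose: ReferenceRecord = { moniker: disposeMoniker, ranges: ['p1Main.ts:5:2'], links: [] };
const p2WidgetDispose: ReferenceRecord = { ranges: ['p2Main.ts:8:2'], links: [disposeMoniker] };
const widgetReferences = resolveReferences(p2WidgetDispose, [p1Dispose, p2WidgetDispose]);
```

The sketch follows links one level deep only. A full implementation would also have to respect the unique property and the group when deciding which monikers may match.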

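Package managers

Changed in version 0.5.0

In most programming languages, how exported elements are visible in other projects depends on how files are packaged into a library or program. In TypeScript, the standard package manager is npm.

Assume the following package.json file exists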

{
  "name": "lsif-ts-sample",
  "version": "1.0.0",
  "description": "",
  "main": "lib/index.js",
  "author": "MS",
  "license": "MIT"
}

For the following TypeScript file (the same as above)

export function func(): void {
}

export class Emitter {
  private doEmit() {
  }

  public emit() {
    this.doEmit();
  }
}

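these monikers can then be translated into monikers that are npm dependent. Instead of replacing the monikers, we emit a second set of monikers and link the tsc monikers to the corresponding npm monikers using attach edges.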

{ id: 991, type: "vertex", label: "packageInformation",
  name: "lsif-ts-sample", manager: "npm", version: "1.0.0"
}

{ id: 987, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm", identifier: "lsif-ts-sample::func",
  unique: "scheme"
}
{ id: 986, type: "edge", label: "packageInformation", outV: 987, inV: 991 }
{ id: 985, type: "edge", label: "attach", outV: 987, inV: 12 }

{ id: 984, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm", identifier: "lsif-ts-sample::Emitter",
  unique: "scheme"
}
{ id: 983, type: "edge", label: "packageInformation", outV: 984, inV: 991 }
{ id: 982, type: "edge", label: "attach", outV: 984, inV: 19 }

{ id: 981, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm",
  identifier: "lsif-ts-sample::Emitter.doEmit", unique: "scheme"
}
{ id: 980, type: "edge", label: "packageInformation", outV: 981, inV: 991 }
{ id: 979, type: "edge", label: "attach", outV: 981, inV: 26 }

{ id: 978, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm",
  identifier: "lsif-ts-sample::Emitter.emit", unique: "scheme"
}
{ id: 977, type: "edge", label: "packageInformation", outV: 978, inV: 991 }
{ id: 976, type: "edge", label: "attach", outV: 978, inV: 33 }

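Things to observe

A special packageInformation vertex is emitted, pointing to the corresponding npm package information.
The npm moniker refers to the package name.
Its unique value is scheme, indicating that the moniker is unique among all npm monikers.
Since the file index.ts is the npm main file, the moniker identifier has no file path. This is comparable to importing this module into TypeScript or JavaScript, where only the module name is used and no file path (for example, import * as lsif from 'lsif-ts-sample').
The attach edge points from the npm moniker vertex to the tsc moniker vertex.

For LSIF we recommend using a second tool to make the monikers emitted by the indexer package manager dependent. This supports the use of different package managers and allows custom build tools to be incorporated. In the TypeScript implementation, this is done by an npm-specific tool that attaches monikers based on the npm package information.

Reporting the import of external symbols uses the same approach. LSIF emits monikers of kind import. Consider the following TypeScript example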

import * as mobx from 'mobx';

let map: mobx.ObservableMap = new mobx.ObservableMap();

where mobx is the npm mobx package. Running the tsc index tool produces

{ id: 41, type: "vertex", label: "document",
  uri: "file:///samples/node_modules/mobx/lib/types/observablemap.d.ts",
  languageId: "typescript", contents: "..."
}
{ id: 55, type: "vertex", label: "resultSet" }
{ id: 57, type: "vertex", label: "moniker",
  kind: "import", scheme: "tsc",
  identifier: "node_modules/mobx/lib/mobx:ObservableMap", unique: 'group'
}
{ id: 58, type: "edge", label: "moniker", outV: 55, inV: 57 }
{ id: 59, type: "vertex", label: "range",
  start: { line: 17, character: 538 }, end: { line: 17, character: 551 }
}
{ id: 60, type: "edge", label: "next", outV: 59, inV: 55 }

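Three things should be noted here. First, TypeScript uses declaration files for externally imported symbols. This has the nice effect that the moniker information can be attached to the declaration ranges in these files. In other languages, the information might be attached to the file that actually references the symbol, or a virtual document might be generated for the referenced item. Second, the tool generates this information only for symbols that are actually referenced, not for all available symbols. Third, these monikers are tsc specific and point to the node_modules folder.

However, piping this information through the npm tool produces the following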

{ id: 991, type: "vertex", label: "packageInformation",
  name: "mobx", manager: "npm", version: "5.6.0",
  repository: { type: "git", url: "git+https://github.com/mobxjs/mobx.git" }
}
{ id: 978, type: "vertex", label: "moniker",
  kind: "import", scheme: "npm", identifier: "mobx::ObservableMap",
  unique: 'scheme'
}
{ id: 977, type: "edge", label: "packageInformation", outV: 978, inV: 991 }
{ id: 976, type: "edge", label: "attach", outV: 978, inV: 57 }

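This makes the moniker specific to the npm mobx package. In addition, information about the mobx package itself is emitted.

Usually monikers are attached to result sets, since they are the same for all ranges pointing to the result set. However, for dumps that do not use result sets, monikers can also be emitted on ranges.

For tools that process a dump and import it into a database, it is sometimes useful to know whether a result is local to a file or not (for example, function arguments can only be navigated inside the file). To help postprocessing tools decide this efficiently, LSIF generation tools should generate monikers for locals as well. The corresponding kind to use is local. The identifier should still be unique within the document.

For the following example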

function foo(x: number): void {
}

the moniker for x looks like this

{ id: 13, type: "vertex", label: "resultSet" }
{ id: 14, type: "vertex", label: "moniker",
  kind: "local", scheme: "tsc", identifier: "SfeOP6s53Y2HAkcViolxYA==",
  unique: 'document'
}
{ id: 15, type: "edge", label: "moniker", outV: 13, inV: 14 }
{ id: 16, type: "vertex", label: "range",
  start: { line: 0, character: 13 }, end: { line: 0, character: 14 },
  tag: {
    type: "definition", text: "x", kind: 7,
    fullRange: {
      start: { line: 0, character: 13 }, end: { line: 0, character: 22 }
    }
  }
}
{ id: 17, type: "edge", label: "next", outV: 16, inV: 13 }

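In addition, moniker schemes starting with $ are reserved and should not be used by an LSIF tool.

Result ranges

Ranges in LSIF currently have two meanings

They act as LSP request sensitive areas in a document (for example, we use them to decide whether an LSP request result exists for a given position).
They act as navigation targets (for example, they are the result of a Go To Declaration navigation).

To fulfill the first meaning, LSIF specifies that ranges cannot overlap or be equal. However, this constraint is not necessary for the second meaning. To support equal or overlapping target ranges, we introduce a vertex resultRange. A resultRange is not allowed to be used as a target of a contains edge.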

Meta Data Vertex

Changed in version 0.5.0

To support versioning, LSIF defines a meta data vertex as follows

export interface MetaData {

  /**
   * The label property.
   */
  label: 'metaData';

  /**
   * The version of the LSIF format using semver notation. See
   * https://semver.org/. Please note that version numbers starting with 0
   * don't adhere to semver and adopters have to assume that each new version
   * is breaking.
   */
  version: string;

  /**
   * The string encoding used to compute line and character values in
   * positions and ranges. Currently only 'utf-16' is supported due to the
   * limitations in LSP.
   */
  positionEncoding: 'utf-16';

  /**
   * Information about the tool that created the dump
   */
  toolInfo?: {
    name: string;
    version?: string;
    args?: string[];
  }
}
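A consumer might use the version property to decide whether it can read a dump. The sketch below is one conservative interpretation of the comment above: for versions starting with 0 only an exact match is accepted, and for later versions equal major numbers are accepted as in semver. The function name is hypothetical, and pre-release suffixes are deliberately rejected to keep the sketch short.

```typescript
// Decide whether a dump with `dumpVersion` can be read by a consumer that
// implements `supportedVersion`. Both are expected as plain "major.minor.patch".
function isReadableVersion(dumpVersion: string, supportedVersion: string): boolean {
  const dumpParts = dumpVersion.split('.');
  const supportedParts = supportedVersion.split('.');
  if (dumpParts.length !== 3 || supportedParts.length !== 3) { return false; }
  if (dumpParts[0] === '0' || supportedParts[0] === '0') {
    // 0.x versions do not follow semver: assume every new version is breaking.
    return dumpVersion === supportedVersion;
  }
  return dumpParts[0] === supportedParts[0];
}

const sameVersionReadable = isReadableVersion('0.5.0', '0.5.0');
const olderVersionReadable = isReadableVersion('0.4.0', '0.5.0');
```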

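Emitting constraints

The following emitting constraints exist (some of which have already been mentioned in this document)

A vertex must be emitted before it is referenced in an edge.
A range and a resultRange can only be contained in one document.
A resultRange cannot be used as a target in a contains edge.
After a document end event has been emitted, only result sets, reference results, or implementation results emitted through that document can be referenced in edges. For example, it is not allowed to reference ranges or result ranges from that document. This also includes adding monikers to ranges or result sets. The document data, so to speak, cannot be altered anymore.
If ranges point to result sets and monikers are emitted, they must be emitted on the result set and cannot be emitted on individual ranges.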
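The first constraint lends itself to a simple streaming check. The sketch below reports every edge that refers to a vertex not yet emitted. The names are assumptions for this example, and ids are restricted to numbers for brevity although LSIF also allows string ids.

```typescript
// Minimal element shapes for the check; names are illustrative.
interface EmittedVertex { id: number; type: 'vertex'; label: string; }
interface EmittedEdge {
  id: number; type: 'edge'; label: string;
  outV: number; inV?: number; inVs?: number[];
}

// Return the ids of edges that reference a vertex before it has been emitted.
function findForwardReferences(elements: (EmittedVertex | EmittedEdge)[]): number[] {
  const seen: { [id: number]: boolean } = {};
  const offending: number[] = [];
  for (const el of elements) {
    if (el.type === 'vertex') { seen[el.id] = true; continue; }
    // Collect targets of both 1:1 (inV) and 1:n (inVs) edges.
    const targets = (el.inVs || []).concat(el.inV !== undefined ? [el.inV] : []);
    const allKnown = seen[el.outV] === true && targets.every(t => seen[t] === true);
    if (!allKnown) { offending.push(el.id); }
  }
  return offending;
}

// Edge 3 is fine; edge 5 refers to range 6, which is only emitted afterwards.
const forwardReferences = findForwardReferences([
  { id: 1, type: 'vertex', label: 'document' },
  { id: 2, type: 'vertex', label: 'range' },
  { id: 3, type: 'edge', label: 'contains', outV: 1, inVs: [2] },
  { id: 5, type: 'edge', label: 'contains', outV: 1, inVs: [6] },
  { id: 6, type: 'vertex', label: 'range' }
]);
```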

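Additional Information

Tools

lsif-protocol: the protocol defined as TypeScript interfaces
lsif-util: utility tools for LSIF development
lsif-tsc: an LSIF indexer for TypeScript
lsif-npm: a linker for npm monikers

Open Questions

While implementing this for TypeScript and npm, we collected a list of open questions that we are already aware of, in the form of GitHub issues.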