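Language Server Index Format Specification - 0.6.0

The 0.6.0 version of LSIF is currently under construction.

Language Server Index Format

The purpose of the Language Server Index Format (LSIF) is to define a standard format for language servers or other programming tools to dump their knowledge about a workspace. This dump can later be used to answer language server LSP requests for the same workspace without running the language server itself. Since much of the information would be invalidated by a change to the workspace, the dumped information typically excludes requests used when mutating a document. For example, the result of a code completion request is typically not part of such a dump.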

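Changelog

Version 0.6.0

Feedback from storage implementors showed that the concept of grouping projects into larger storage units should not be defined in LSIF itself. It should be left to the storage backend. Therefore the Group vertex introduced in version 0.5.0 has been removed again. Since some of the information captured in the Group vertex is generally useful, a Source vertex was introduced to store it.

Semantic coloring of files is supported via LSIF. To support this, the "textDocument/semanticTokens/full" request was added.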

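Version 0.5.0

Version 0.4.0 added support for dumping larger systems project by project (in reverse dependency order) and then gluing the dumps together again in a database by linking result sets through their corresponding monikers. Use of the format showed that a few features were missing to make this work well:

Support for logically grouping projects. To support this, a Group vertex was added.
Knowing how unique a moniker is. To support this, a unique property was added to the Moniker.
The nextMoniker edge was replaced by a more generic attach edge. This was possible because monikers now carry a unique property, which was previously encoded in the direction of the nextMoniker edge.
In programming languages that support polymorphism, a call can bind at runtime to a different type than the one statically known. An example is an overridden method in an object-oriented programming language. Since dumps can be created per project, additional information has to be added to the dumps so that these polymorphic bindings can be captured. The generic concept of reference links was therefore introduced (see the section Multi Project Systems). In short, it allows a tool to annotate an item edge with the property value referenceLinks.
To ease sharding of the output, item edges carry an additional property shard. This property was named document in earlier versions of the 0.5 specification.

The old 0.4.0 version of the specification is available here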

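Version 0.4.0

Up to version 0.4.0 the focus of the LSIF format was on easing the generation of the dump for language tool providers. However, this made it hard for consumers of a dump to import it efficiently into a database, unless the database format mapped one-to-one onto the LSIF format. This version of the specification tries to balance this by requiring tool providers to emit additional events when certain data is ready for consumption. It also adds support for partitioning data per document.

Since version 0.4.0 changed some aspects of LSIF more deeply, the old 0.3.x version of the specification is available here.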

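Motivation

Principal design goals

The format should not imply the use of a particular persistence technology.
The data defined should be modeled as closely as possible on the Language Server Protocol, so that it can be served through LSP without further transformation.
The data stored is result data usually returned from an LSP request. The dump does not contain any program symbol information, nor does LSIF define any symbol semantics (for example, where a symbol is defined or referenced, or when a method overrides another method). LSIF therefore does not define a symbol database. Note that this is consistent with LSP itself, which does not define any symbol semantics either.
The output format will be based on JSON, as with LSP.

LSP requests that are good candidates to be supported in LSIF are: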

textDocument/documentSymbol
textDocument/foldingRange
textDocument/documentLink
textDocument/definition
textDocument/declaration
textDocument/typeDefinition
textDocument/hover
textDocument/references
textDocument/implementation

The corresponding LSP requests have one of the following two forms:

request(uri, method) -> result
request(uri, position, method) -> result

where method is the JSON-RPC request method.

Concrete examples are:

request(
  'file:///Users/dirkb/sample/test.ts',
  'textDocument/foldingRange'
) -> FoldingRange[];
request(
  'file:///Users/dirkb/sample/test.ts',
  { line: 10, character: 17 },
  'textDocument/hover'
) -> Hover;

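The input tuple of a request is either [uri, method] or [uri, position, method], and the output is some form of result. For the same uri and [uri, position] tuple, many different requests can be executed.

The dump format should therefore support the following features:

Input data must be easily queryable (for example, the document and the position).
Each element has a unique id (which may be a string or a number).
It should be possible to emit data as soon as it is available, to allow streaming rather than spending large amounts of memory. For example, emitting data based on document syntax should be done for each file as parsing progresses.
It should be easy to add additional requests later on.
It should be easy for a tool to consume a dump, for example to import it into a database, without holding the dump in memory.

We came to the conclusion that the most flexible way to emit this is a graph, where edges represent the method and vertices are [uri], [uri, position] or a request result. This data can then be stored as JSON or read into a database that can represent these vertices and relationships.

Assume there is a file /Users/dirkb/sample.ts and we want to store the folding range information with it. The indexer then emits two vertices: one representing the document with its URI file:///Users/dirkb/sample.ts, the other representing the folding result. In addition, an edge is emitted representing the textDocument/foldingRange request.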

{ id: 1, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 2, type: "vertex", label: "foldingRangeResult",
  result: [ { ... }, { ... }, ... ]
}
{ id: 3, type: "edge", label: "textDocument/foldingRange", outV: 1, inV: 2 }
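
As a rough sketch of how a consumer might hold these elements in memory, the following TypeScript indexes vertices by id and edges by their outV. It assumes the dump is serialized as one strict JSON object per line, which is an assumption of this sketch (the listing above uses a relaxed JavaScript-style notation). The names Vertex, Edge, Graph and loadDump are hypothetical and not part of LSIF.

```typescript
// Illustrative model of the vertex/edge graph; all names are hypothetical.
type Id = number | string;

interface Vertex { id: Id; type: "vertex"; label: string; [key: string]: unknown; }
interface Edge {
  id: Id; type: "edge"; label: string; outV: Id;
  inV?: Id;    // target of a 1:1 edge
  inVs?: Id[]; // targets of a 1:n edge
}

interface Graph { vertices: Map<Id, Vertex>; outEdges: Map<Id, Edge[]>; }

// Assumption: one strict JSON object per non-empty line.
function loadDump(text: string): Graph {
  const graph: Graph = { vertices: new Map(), outEdges: new Map() };
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    const element = JSON.parse(line) as Vertex | Edge;
    if (element.type === "vertex") {
      graph.vertices.set(element.id, element);
    } else {
      // Group edges by their source vertex so requests can be followed quickly.
      const list = graph.outEdges.get(element.outV) ?? [];
      list.push(element);
      graph.outEdges.set(element.outV, list);
    }
  }
  return graph;
}
```

Because elements are processed line by line, the same loop could feed a database writer instead of in-memory maps.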

The corresponding graph looks like this

Folding Range Result

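Ranges

For requests that take a position as input, the position needs to be stored as well. Usually LSP requests return the same result for positions that point to the same word / name in a document. Take the following TypeScript example: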

function bar() {
}

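A hover request for a position denoting the b in bar returns the same result as one for a position denoting the a or the r. To make the dump more compact, it uses ranges instead of single positions to capture this. In this case the following vertex is emitted. Note that lines and characters are zero-based, as in LSP: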

{ id: 4, type: "vertex", label: "range",
  start: { line: 0, character: 9}, end: { line: 0, character: 12 }
}

To bind the range to a document, we use a special edge labeled contains, which points from a document to a set of ranges.

{ id: 5, type: "edge", label: "contains", outV: 1, inVs: [4] }

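LSIF supports 1:n edges for the contains relationship, which in a graph map easily to n 1:1 edges. LSIF supports this for two reasons: (a) to make the output more compact, since a document usually contains hundreds of such ranges, and (b) to ease import and batching for consumers of an LSIF dump.

To bind the hover result to the range, we use the same pattern as for the folding ranges. We emit a vertex representing the hover result and an edge representing the textDocument/hover request.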

{
  id: 6,
  type: "vertex",
  label: "hoverResult",
  result: {
    contents: [
      { language: "typescript", value: "function bar(): void" }
    ]
  }
}
{ id: 7, type: "edge", label: "textDocument/hover", outV: 4, inV: 6 }

The corresponding graph looks like this

Hover Result

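The ranges emitted for a document in the contains relationship must follow these rules:

A given range id can only be contained in one document. In other words, ranges must not be shared between documents, even if they have the same start / end value.
No two ranges can be equal.
No two ranges can overlap, that is, claim the same position in a document, unless one range is entirely contained in the other.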
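
The last two rules can be checked mechanically. The sketch below is one possible validator for the ranges of a single document. It assumes that range ends are exclusive, as in LSP, so two ranges that merely touch are treated as disjoint. The function names are hypothetical.

```typescript
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }

// Negative if a is before b, zero if equal, positive if a is after b.
function compare(a: Position, b: Position): number {
  return a.line !== b.line ? a.line - b.line : a.character - b.character;
}

// True if `outer` fully contains `inner`.
function contains(outer: Range, inner: Range): boolean {
  return compare(outer.start, inner.start) <= 0 && compare(inner.end, outer.end) <= 0;
}

// Returns a description of the first violated rule, or undefined if none is found.
function checkDocumentRanges(ranges: Range[]): string | undefined {
  for (let i = 0; i < ranges.length; i++) {
    for (let j = i + 1; j < ranges.length; j++) {
      const a = ranges[i];
      const b = ranges[j];
      const equal = compare(a.start, b.start) === 0 && compare(a.end, b.end) === 0;
      if (equal) return `ranges ${i} and ${j} are equal`;
      // Assumption: ends are exclusive, so touching ranges do not overlap.
      const disjoint = compare(a.end, b.start) <= 0 || compare(b.end, a.start) <= 0;
      if (!disjoint && !contains(a, b) && !contains(b, a)) {
        return `ranges ${i} and ${j} partially overlap`;
      }
    }
  }
  return undefined;
}
```

The pairwise comparison is quadratic. For documents with many ranges, sorting by start position first would be a natural refinement.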

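If a position in a document is mapped to a range and more than one range covers the position, the following algorithm should be used:

sort the ranges by containment, innermost first
for each range in ranges do
1. check if the range has an outgoing edge textDocument/${method}
2. if it exists, use it
end
return null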
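
The algorithm above can be sketched in TypeScript as follows. The sketch assumes that a range covers a position when start <= position < end (an exclusive end is an assumption here), and that the outgoing textDocument/${method} edges have already been collected into a map from range id to result id. Both the map and the function names are hypothetical.

```typescript
interface Position { line: number; character: number; }
interface RangeVertex { id: number; start: Position; end: Position; }

function before(a: Position, b: Position): boolean {
  return a.line < b.line || (a.line === b.line && a.character < b.character);
}

// Assumption: the end of a range is exclusive.
function covers(range: RangeVertex, pos: Position): boolean {
  return !before(pos, range.start) && before(pos, range.end);
}

// Ranges covering `pos`, innermost first. Covering ranges of one document are
// nested, so the innermost range starts last (or, on a tie, ends first).
function rangesAt(ranges: RangeVertex[], pos: Position): RangeVertex[] {
  return ranges
    .filter(r => covers(r, pos))
    .sort((a, b) => {
      if (before(a.start, b.start)) return 1;
      if (before(b.start, a.start)) return -1;
      if (before(a.end, b.end)) return -1;
      if (before(b.end, a.end)) return 1;
      return 0;
    });
}

// `resultEdges` maps a range id to the result vertex reached through the
// outgoing textDocument/${method} edge (hypothetical pre-built index).
function lookup(
  ranges: RangeVertex[], resultEdges: Map<number, number>, pos: Position
): number | null {
  for (const range of rangesAt(ranges, pos)) {
    const result = resultEdges.get(range.id);
    if (result !== undefined) return result;
  }
  return null;
}
```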

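Usually the hover result is the same whether you hover over the definition of a function or over a reference to it. The same holds for many LSP requests such as textDocument/definition, textDocument/references or textDocument/typeDefinition. In a naive model each range would have outgoing edges for all of these LSP requests, pointing to the corresponding results. To optimize this and to make the graph easier to understand, the concept of a ResultSet is introduced. A result set acts as a hub that stores information common to many ranges. The ResultSet itself carries no information, so it looks like this: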

export interface ResultSet {
}

The corresponding output for the hover example above, using a result set, looks like this:

{ id: 1, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 2, type: "vertex", label: "resultSet" }
{ id: 3, type: "vertex", label: "range",
  start: { line: 0, character: 9}, end: { line: 0, character: 12 }
}
{ id: 4, type: "edge", label: "contains", outV: 1, inVs: [3] }
{ id: 5, type: "edge", label: "next", outV: 3, inV: 2 }
{ id: 6, type: "vertex", label: "hoverResult",
  result: {
    "contents":[ {
      language: "typescript", value:"function bar(): void"
    }]
  }
}
{ id: 7, type: "edge", label: "textDocument/hover", outV: 2, inV: 6 }

Result Set

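Result sets are linked to ranges using a next edge. A result set can also forward information to another result set by linking to it using a next edge.

The pattern of storing results with a ResultSet is used for other requests as well. The lookup algorithm for a request [document, position, method] is therefore as follows:

find all ranges for [document, position]. If none exist, return null as the result.
sort the ranges by containment, innermost first.
for each range in ranges do
1. assign range to out.
2. while out !== null
  1. check if out has an outgoing edge textDocument/${method}. If it does, use it and return the corresponding result.
  2. check if out has an outgoing next edge. If it does, set out to the target vertex. Otherwise set out to null.
3. end
end
otherwise return null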
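
A minimal sketch of the inner part of this lookup, assuming the candidate ranges have already been found and sorted innermost first. The EdgeIndex structure and the resolve function are hypothetical. The cycle guard is an addition of this sketch for malformed dumps and is not required by the algorithm.

```typescript
type Id = number;

// Hypothetical in-memory index: outV -> (edge label -> inV) for 1:1 edges
// such as `next` or `textDocument/hover`.
interface EdgeIndex { out: Map<Id, Map<string, Id>>; }

// Walks from each range along `next` edges until a result edge is found.
function resolve(index: EdgeIndex, sortedRangeIds: Id[], method: string): Id | null {
  for (const rangeId of sortedRangeIds) {
    let out: Id | null = rangeId;
    const seen = new Set<Id>(); // guards against cyclic `next` chains
    while (out !== null && !seen.has(out)) {
      seen.add(out);
      const edges = index.out.get(out);
      const result = edges?.get(`textDocument/${method}`);
      if (result !== undefined) return result;
      out = edges?.get("next") ?? null;
    }
  }
  return null;
}
```

With the hover example above (range 3, result set 2, hover result 6), resolving "hover" from range 3 follows the next edge to the result set and returns 6.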

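Language Features

Request: `textDocument/definition`

The same pattern of connecting a range, result set or document to a method result with a request edge is used for other requests as well. Next, consider the textDocument/definition request using the following TypeScript sample: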

function bar() {
}

function foo() {
  bar();
}

This emits the following vertices and edges to model the textDocument/definition request:

// The document
{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}

// The result set
{ id: 6, type: "vertex", label: "resultSet" }

// The bar declaration
{ id: 9, type: "vertex", label: "range",
  start: { line: 0, character: 9 }, end: { line: 0, character: 12 }
}
{ id: 10, type: "edge", label: "next", outV: 9, inV: 6 }


// The bar reference
{ id: 20, type: "vertex", label: "range",
  start: { line: 4, character: 2 }, end: { line: 4, character: 5 }
}
{ id: 21, type: "edge", label: "next", outV: 20, inV: 6}

// The definition result linked to the bar result set
{ id: 22, type: "vertex", label: "definitionResult" }
{ id: 23, type: "edge", label: "textDocument/definition", outV: 6, inV: 22 }
{ id: 24, type: "edge", label: "item", outV: 22, inVs: [9], shard: 4 }

Definition Result

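The definition result above has only one value (the range with id '9'), and we could emit it directly. However, we introduced the definition result vertex for two reasons:

To be consistent with all other requests that point to a result.
To support languages in which a definition can be spread over multiple ranges or even multiple documents. To support multiple documents, ranges are added to a definition result using a 1:N item edge. Conceptually, a definition result is an array to which the item edges add items.

Consider the following TypeScript example: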

interface X {
  foo();
}
interface X {
  bar();
}
let x: X;

Running Go to Definition on X in let x: X shows a dialog that lets the user choose between the two definitions of interface X. In this case the emitted JSON looks like this:

{ id : 38, type: "vertex", label: "definitionResult" }
{ id : 40, type: "edge", label: "item", outV: 38, inVs: [9, 13], shard: 4 }

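The item edge has an additional property shard, which indicates the vertex that is the source (for example a document or a project) of these declarations. We added this information to keep the data easy to emit while also making it easy to process and shard when storing it in a database. Without it, we would either have to prescribe an order in which data is emitted (for example, an item edge may only refer to ranges that were already added to a document using a contains edge), or force processing tools to keep large numbers of vertices and edges in memory. The approach of having this shard property looks like a fair balance.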
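
To illustrate how a consumer might use the shard property, the following sketch groups the ranges of one result vertex by the shard they belong to, so that each group can be written to the storage partition of its document. The ItemEdge shape mirrors the JSON above. The function name itemsByShard is hypothetical.

```typescript
type Id = number;
interface ItemEdge { type: "edge"; label: "item"; outV: Id; inVs: Id[]; shard: Id; }

// Groups the range ids attached to `resultId` by shard (for example a document id).
function itemsByShard(edges: ItemEdge[], resultId: Id): Map<Id, Id[]> {
  const grouped = new Map<Id, Id[]>();
  for (const edge of edges) {
    if (edge.outV !== resultId) continue;
    const ranges = grouped.get(edge.shard) ?? [];
    ranges.push(...edge.inVs);
    grouped.set(edge.shard, ranges);
  }
  return grouped;
}
```

For the edge with id 40 above, the definition result 38 yields a single group for shard 4 containing the ranges 9 and 13.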

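Request: `textDocument/declaration`

Some programming languages distinguish between declarations and definitions (for example C/C++). If this is the case, the dump can contain a corresponding declarationResult vertex and a textDocument/declaration edge to store the information. They are handled analogously to the entities emitted for the textDocument/definition request.

More about Request: `textDocument/hover`

In LSP, the hover is defined as follows: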

export interface Hover {
  /**
   * The hover's content
   */
  contents: MarkupContent | MarkedString | MarkedString[];

  /**
   * An optional range
   */
  range?: Range;
}

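where the optional range is the name range of the word hovered over.

Side note: this is a pattern used for other LSP requests as well, where the result contains the word range of the word the position parameter pointed to.

This makes the hover different for every location, so we cannot really store it with the result set. But wait, the range is the range of one of the bar references we already emitted and used to start computing the result. To keep the hover reusable, we ask the index server to fill in the starting range if no range is defined in the result. So for a hover request executed on the range { start: { line: 4, character: 2 }, end: { line: 4, character: 5 } } the hover result will be: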

{ id: 6, type: "vertex", label: "hoverResult",
  result: {
    contents: [ { language: "typescript", value: "function bar(): void" } ],
    range: { start: { line: 4, character: 2 }, end: { line: 4, character: 5 } }
  }
}
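
One way an index server could fill in the starting range is sketched below. The stored hover is left untouched so that it stays shareable through the result set, and a copy with the range is returned only when the stored result has none. The function name hoverFor and the reduced Hover shape are assumptions of this sketch.

```typescript
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }
// Reduced shape of the LSP Hover, sufficient for this sketch.
interface Hover { contents: unknown; range?: Range; }

// `requestRange` is the range vertex from which the lookup started.
function hoverFor(stored: Hover, requestRange: Range): Hover {
  // Keep an explicitly stored range; otherwise answer with the starting range.
  return stored.range !== undefined ? stored : { ...stored, range: requestRange };
}
```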

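Request: `textDocument/references`

References are stored in the same way as a hover or go to definition range. It uses a reference result vertex and item edges to add ranges to the result.

Look at the following example: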

function bar() {
}

function foo() {
  bar();
}

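Request: `textDocument/implementation`

Supporting the textDocument/implementation request is done by reusing what we implemented for the textDocument/references request. In most cases, textDocument/implementation returns the declaration values of the reference result that a symbol declaration points to. For cases where the result differs, LSIF provides an ImplementationResult. To nest implementation results, the item edge supports the property value "implementationResults".

The corresponding ImplementationResult looks like this: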

interface ImplementationResult {

  label: `implementationResult`
}

Request: `textDocument/typeDefinition`

Supporting textDocument/typeDefinition is straightforward. The edge is recorded either on the range or on the ResultSet.

The corresponding TypeDefinitionResult looks like this:

interface TypeDefinitionResult {

  label: `typeDefinitionResult`
}

For the following TypeScript example:

interface I {
  foo(): void;
}

let i: I;

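Document requests

The Language Server Protocol also supports requests for documents only (without any position information). These requests are textDocument/foldingRange, textDocument/documentLink, textDocument/documentSymbol and textDocument/semanticTokens/full. We follow the same pattern as before to model these, with the difference that the result is linked to the document instead of to a range.

Request: `textDocument/foldingRange`

For the folding range result this looks as follows: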

function hello() {
  console.log('Hello');
}

function world() {
  console.log('world');
}

function space() {
  console.log(' ');
}
hello();space();world();

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 112, type: "vertex", label: "foldingRangeResult", result:[
  { startLine: 0, startCharacter: 16, endLine: 2, endCharacter: 1 },
  { startLine: 4, startCharacter: 16, endLine: 6, endCharacter: 1 },
  { startLine: 8, startCharacter: 16, endLine: 10, endCharacter: 1 }
]}
{ id: 113, type: "edge", label: "textDocument/foldingRange", outV: 2, inV: 112 }

The corresponding FoldingRangeResult is defined as follows:

export interface FoldingRangeResult {
  label: 'foldingRangeResult';

  result: lsp.FoldingRange[];
}

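Request: `textDocument/documentLink`

Again, for document links we define a result type and a corresponding edge to link it to a document. Since link locations usually appear in comments, the ranges do not denote any symbol declarations or references. We therefore inline the range into the result, as we do with folding ranges.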

export interface DocumentLinkResult {
  label: 'documentLinkResult';

  result: lsp.DocumentLink[];
}

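Request: `textDocument/documentSymbol`

Next we look at the textDocument/documentSymbol request. This request usually returns an outline view of the document in hierarchical form. However, not all programming symbols declared or defined in a document are part of the result (for example, locals are usually omitted). In addition, an outline item needs to provide additional information such as the full range and a symbol kind. There are two ways to model this: either we do the same as for folding ranges and document links and store the information in a document symbol result as literals, or we extend the range vertex with some additional information and refer to these ranges in the document symbol result. Since the additional information for ranges might be helpful in other scenarios as well, we support adding additional tags to these ranges by defining a tag property on the range vertex.

The following tags are currently supported: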

/**
 * The range represents a declaration
 */
export interface DeclarationTag {

  /**
   * A type identifier for the declaration tag.
   */
  type: 'declaration';

  /**
   * The text covered by the range
   */
  text: string;

  /**
   * The kind of the declaration.
   */
  kind: lsp.SymbolKind;

  /**
   * The full range of the declaration not including leading/trailing whitespace
   * but everything else, e.g comments and code. The range must be included in
   * fullRange.
   */
  fullRange: lsp.Range;

  /**
   * Optional detail information for the declaration.
   */
  detail?: string;
}

/**
 * The range represents a definition
 */
export interface DefinitionTag {
  /**
   * A type identifier for the definition tag.
   */
  type: 'definition';

  /**
   * The text covered by the range
   */
  text: string;

  /**
   * The symbol kind.
   */
  kind: lsp.SymbolKind;

  /**
   * The full range of the definition not including leading/trailing whitespace
   * but everything else, e.g comments and code. The range must be included in
   * fullRange.
   */
  fullRange: lsp.Range;

  /**
   * Optional detail information for the definition.
   */
  detail?: string;
}

/**
 * The range represents a reference
 */
export interface ReferenceTag {

  /**
   * A type identifier for the reference tag.
   */
  type: 'reference';

  /**
   * The text covered by the range
   */
  text: string;
}

/**
 * The type of the range is unknown.
 */
export interface UnknownTag {

  /**
   * A type identifier for the unknown tag.
   */
  type: 'unknown';

  /**
   * The text covered by the range
   */
  text: string;
}

Emitting the tags for the following TypeScript example:

function hello() {
}

hello();

looks like this:

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 4, type: "vertex", label: "resultSet" }
{ id: 7, type: "vertex", label: "range",
  start: { line: 0, character: 9 }, end: { line: 0, character: 14 },
  tag: {
    type: "definition", text: "hello", kind: 12,
    fullRange: {
      start: { line: 0, character: 0 }, end: { line: 1, character: 1 }
    }
  }
}

The document symbol result is then modeled as follows:

export interface RangeBasedDocumentSymbol {

  id: RangeId

  children?: RangeBasedDocumentSymbol[];
}

export interface DocumentSymbolResult extends V {

  label: 'documentSymbolResult';

  result: lsp.DocumentSymbol[] | RangeBasedDocumentSymbol[];
}

The given TypeScript example:

namespace Main {
  function hello() {
  }
  function world() {
    let i: number = 10;
  }
}

produces the following output:

// The document
{ id: 2 , type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
// The declaration of Main
{ id: 7 , type: "vertex", label: "range",
  start: { line: 0, character: 10 }, end: { line: 0, character: 14 },
  tag: {
    type: "definition", text: "Main", kind: 7,
    fullRange: {
      start: { line: 0, character: 0 }, end: { line: 6, character: 1 }
    }
  }
}
// The declaration of hello
{ id: 18 , type: "vertex", label: "range",
  start: { line: 1, character: 11 }, end: { line: 1, character: 16 },
  tag: {
    type: "definition", text: "hello", kind: 12,
    fullRange: {
      start: { line: 1, character: 2 }, end: { line: 2, character: 3 }
    }
  }
}
// The declaration of world
{ id: 29 , type: "vertex", label: "range",
  start: { line: 3, character: 11 }, end: { line: 3, character: 16 },
  tag: {
    type: "definition", text: "world", kind: 12,
    fullRange: {
      start: { line: 3, character: 2 }, end: { line: 5, character: 3 }
    }
  }
}
// The document symbol
{ id: 39 , type: "vertex", label: "documentSymbolResult",
  result: [ { id: 7 , children: [ { id: 18 }, { id: 29 } ] } ]
}
{ id: 40 , type: "edge", label: "textDocument/documentSymbol",
  outV: 2, inV: 39
}
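
To answer a textDocument/documentSymbol request from such a dump, the range-based entries have to be expanded back into LSP document symbols. The following sketch shows one way to do that from the tagged range vertices. The DocumentSymbol interface is a reduced copy of the LSP type, and toDocumentSymbols is a hypothetical helper. Skipping entries without a usable tag is a choice of this sketch.

```typescript
interface Position { line: number; character: number; }
interface Range { start: Position; end: Position; }
interface Tag { type: string; text: string; kind?: number; fullRange?: Range; detail?: string; }
interface RangeVertex { id: number; start: Position; end: Position; tag?: Tag; }
interface RangeBasedDocumentSymbol { id: number; children?: RangeBasedDocumentSymbol[]; }
// Shape of lsp.DocumentSymbol, reduced to the fields used here.
interface DocumentSymbol {
  name: string; kind: number; detail?: string;
  range: Range; selectionRange: Range; children?: DocumentSymbol[];
}

function toDocumentSymbols(
  items: RangeBasedDocumentSymbol[], ranges: Map<number, RangeVertex>
): DocumentSymbol[] {
  const result: DocumentSymbol[] = [];
  for (const item of items) {
    const vertex = ranges.get(item.id);
    const tag = vertex?.tag;
    // Skip entries whose range is unknown or has no declaration/definition tag.
    if (!vertex || !tag || tag.kind === undefined || !tag.fullRange) continue;
    const symbol: DocumentSymbol = {
      name: tag.text,
      kind: tag.kind,
      range: tag.fullRange, // the full extent of the symbol
      selectionRange: { start: vertex.start, end: vertex.end } // the name itself
    };
    if (tag.detail !== undefined) symbol.detail = tag.detail;
    if (item.children) symbol.children = toDocumentSymbols(item.children, ranges);
    result.push(symbol);
  }
  return result;
}
```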

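Request: `textDocument/diagnostic`

The only information missing from the dump that would be useful is the diagnostics associated with documents. Diagnostics in LSP are modeled as push notifications sent from the server to the client. This does not fit a dump that is modeled on request method names. However, the push notification can be emulated as a request whose result is the value sent during the push as a parameter.

In the dump we model diagnostics as follows:

We introduce a pseudo request textDocument/diagnostic.
We introduce a diagnostic result that contains the diagnostics associated with a document.

The result looks like this: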

export interface DiagnosticResult {

  label: 'diagnosticResult';

  result: lsp.Diagnostic[];
}

The given TypeScript example:

function foo() {
  let x: string = 10;
}

produces the following output:

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 18, type: "vertex", label: "diagnosticResult",
  result: [
    {
      severity: 1, code: 2322,
      message: "Type '10' is not assignable to type 'string'.",
      range: {
        start : { line: 1, character: 5 }, end: { line: 1, character: 6 }
      }
    }
  ]
}
{ id: 19, type: "edge", label: "textDocument/diagnostic", outV: 2, inV: 18 }

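Since diagnostics are not very common in dumps, no effort has been made to reuse ranges in diagnostics.

Request: `textDocument/semanticTokens/full`

Finally, the textDocument/semanticTokens/full edge and the SemanticTokensResult type define a way to export semantic information about ranges of text. This mechanism is mainly a classification concern, enabling additional, richer semantic coloring of code that conveys information a syntax parser alone cannot determine (coloring resolved types, formatting by visibility, and so on).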

function hello() {
  console.log('Hello');
}

{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 112, type: "vertex", label: "semanticTokensResult", result: {"data": [1, 2, 7, 0, 0] } }
{ id: 113, type: "edge", label: "textDocument/semanticTokens/full", outV: 2, inV: 112 }

The corresponding SemanticTokensResult is defined as follows:

export interface SemanticTokensResult {
  label: 'semanticTokensResult';

  result: lsp.SemanticTokens;
}

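The above example defines a single 5-integer encoded range at "console".

The SemanticTokens type, its data member, and the encoding of the integers in it are all consistent with how these same concepts are represented in the Language Server Protocol.

The 4th and 5th integers of each 5-integer encoded token represent the SemanticTokensType and SemanticTokensModifiers, respectively. Much like in LSP, these integers are mapped to token type names and token modifier names through entries in the capabilities vertex.

For example, the object semantic token type declared below maps to 0. Any token whose 4th integer is 0 will be considered an object for coloring purposes.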

{
  "semanticTokensProvider": {
    "tokenTypes": [ "object" ],
    "tokenModifiers": [ "static" ]
  },
  "label": "capabilities"
}

For more information, see the LSP semantic tokens protocol.
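
The 5-integer encoding can be decoded as sketched below. The decoding follows the relative encoding used by LSP semantic tokens: delta line, delta start character, length, token type index, and a bit set of modifiers. The names DecodedToken and decodeTokens are hypothetical, and the token type and modifier lists are assumed to come from the capabilities vertex.

```typescript
interface DecodedToken {
  line: number; startCharacter: number; length: number;
  tokenType: string | undefined; modifiers: string[];
}

// Each token is [deltaLine, deltaStartChar, length, tokenType, tokenModifiers].
function decodeTokens(
  data: number[], tokenTypes: string[], tokenModifiers: string[]
): DecodedToken[] {
  const tokens: DecodedToken[] = [];
  let line = 0;
  let character = 0;
  for (let i = 0; i + 4 < data.length; i += 5) {
    const deltaLine = data[i];
    const deltaStart = data[i + 1];
    line += deltaLine;
    // The start is relative to the previous token only when on the same line.
    character = deltaLine === 0 ? character + deltaStart : deltaStart;
    const modifierBits = data[i + 4];
    tokens.push({
      line,
      startCharacter: character,
      length: data[i + 2],
      tokenType: tokenTypes[data[i + 3]],
      // Modifiers form a bit set: bit n selects tokenModifiers[n].
      modifiers: tokenModifiers.filter((_, bit) => (modifierBits & (1 << bit)) !== 0)
    });
  }
  return tokens;
}
```

Applied to the data [1, 2, 7, 0, 0] with the capabilities shown above, this yields one token on line 1, starting at character 2, of length 7, with the type object and no modifiers.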

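The Project vertex

Usually language servers operate in some sort of project context. In TypeScript, a project is defined using a tsconfig.json file. C# and C++ have their own means. The project file usually contains information about compile options and other parameters. Having this information in the dump can be valuable. LSIF therefore defines a project vertex. In addition, all documents that belong to that project are connected to the project using a contains edge. If there was a tsconfig.json in the previous examples, the first emitted edges and vertices would look like this: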

{ id: 1, type: "vertex", label: "project",
  resource: "file:///Users/dirkb/tsconfig.json", kind: "typescript"
}
{ id: 2, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript"
}
{ id: 3, type: "edge", label: "contains", outV: 1, inVs: [2] }

The definition of the project vertex looks like this:

export interface Project extends V {

	/**
	 * The label property.
	 */
	label: VertexLabels.project;

	/**
	 * The project kind like 'typescript' or 'csharp'. See also the language ids
	 * in the specification.
   * See https://msdocs.cn/language-server-protocol/specification
	 */
	kind: string;

	/**
	 * The resource URI of the project file.
	 */
	resource?: Uri;

	/**
	 * Optional the content of the project file, `base64` encoded.
	 */
	contents?: string;
}

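Embedding contents

It can be valuable to embed the contents of a document or project file into the dump as well. For example, if the content of the document is a virtual document generated from program metadata. The index format therefore supports an optional contents property on the document and project vertex. If used, the content needs to be base64 encoded.

Advanced Concepts

Events

To ease the processing of an LSIF dump, for example to import it into a database, the dump emits begin and end events for documents and projects. After the end event of a document has been emitted, the dump must not contain any further data referencing that document. For example, no ranges from that document can be referenced in item edges, nor can result sets or other vertices linked to ranges in that document. The document can, however, be referenced in a contains edge that adds the document to a project. The begin / end events for documents look like this: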

// The actual document
{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript",
  contents: "..."
}
// The begin event
{ id: 5, type: "vertex", label: "$event",
  kind: "begin", scope: "document" , data: 4
}
// The end event
{ id: 53, type: "vertex", label: "$event",
  kind: "end", scope: "document" , data: 4
}

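Between the document vertex 4 and the document begin event 5, no information specific to document 4 can be emitted. Note that more than one document can be open at a given point in time, meaning that there can be n different document begin events without corresponding document end events.

The events for projects look similar: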

{ id: 2, type: "vertex", label: "project", kind: "typescript" }
{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/sample.ts", languageId: "typescript",
  contents: "..."
}
{ id: 5, type: "vertex", label: "$event",
  kind: "begin", scope: "document" , data: 4
}
{ id: 3, type: "vertex", label: "$event",
  kind: "begin", scope: "project", data: 2
}
{ id: 53, type: "vertex", label: "$event",
  kind: "end", scope: "document", data: 4
}
{ id: 54, type: "edge", label: "contains", outV: 2, inVs: [4] }
{ id: 55, type: "vertex", label: "$event",
  kind: "end", scope: "project", data: 2
}
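
A consumer can use these events to track which documents and projects are currently open. The sketch below performs a few plausible consistency checks on the event stream: an end without a begin, a repeated begin, and elements that are never ended. These specific checks and the name checkEvents are assumptions of this sketch, not rules quoted from the specification.

```typescript
interface EventVertex {
  id: number; type: "vertex"; label: "$event";
  kind: "begin" | "end"; scope: "document" | "project"; data: number;
}

// Returns a list of human-readable problems; an empty list means none were found.
function checkEvents(events: EventVertex[]): string[] {
  const open = new Set<number>();
  const closed = new Set<number>();
  const problems: string[] = [];
  for (const event of events) {
    if (event.kind === "begin") {
      if (open.has(event.data) || closed.has(event.data)) {
        problems.push(`${event.scope} ${event.data} began more than once`);
      }
      open.add(event.data);
    } else {
      // Set.delete returns false when the id was not open.
      if (!open.delete(event.data)) {
        problems.push(`${event.scope} ${event.data} ended without a begin event`);
      }
      closed.add(event.data);
    }
  }
  open.forEach(id => problems.push(`element ${id} was never ended`));
  return problems;
}
```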

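Project exports and external imports (Monikers)

Changed in 0.5.0

One use case of LSIF is to create dumps for released versions of a product, be it a library or a program. If a project P2 references a library P1, it would also be useful if the information in these two dumps could be related. To make this possible, LSIF introduces optional monikers, which can be linked to ranges using a corresponding edge. Monikers can be used to describe what a project exports and what it imports. Let us first look at the export case.

Consider the following TypeScript file called index.ts: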

export function func(): void {
}

export class Emitter {
  private doEmit() {
  }

  public emit() {
    this.doEmit();
  }
}

{ id: 4, type: "vertex", label: "document",
  uri: "file:///Users/dirkb/index.ts", languageId: "typescript",
  contents: "..."
}
{ id: 11, type: "vertex", label: "resultSet" }
{ id: 12, type: "vertex", label: "moniker", kind: "export",
  scheme: "tsc", identifier: "lib/index:func", unique: "workspace"
}
{ id: 13, type: "edge", label: "moniker", outV: 11, inV: 12 }
{ id: 14, type: "vertex", label: "range",
  start: { line: 0, character: 16 }, end: { line: 0, character: 20 }
}
{ id: 15, type: "edge", label: "next", outV: 14, inV: 11 }

{ id: 18, type: "vertex", label: "resultSet" }
{ id: 19, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:Emitter",
  unique: "workspace"
}
{ id: 20, type: "edge", label: "moniker", outV: 18, inV: 19 }
{ id: 21, type: "vertex", label: "range",
  start: { line: 3, character: 13 }, end: { line: 3, character: 20 }
}
{ id: 22, type: "edge", label: "next", outV: 21, inV: 18 }

{ id: 25, type: "vertex", label: "resultSet" }
{ id: 26, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:Emitter.doEmit",
  unique: "workspace"
}
{ id: 27, type: "edge", label: "moniker", outV: 25, inV: 26 }
{ id: 28, type: "vertex", label: "range",
  start: { line: 4, character: 10 }, end: { line: 4, character: 16 }
}
{ id: 29, type: "edge", label: "next", outV: 28, inV: 25 }

{ id: 32, type: "vertex", label: "resultSet" }
{ id: 33, type: "vertex", label: "moniker",
  kind: "export", scheme: "tsc", identifier: "lib/index:Emitter.emit",
  unique: "workspace"
}
{ id: 34, type: "edge", label: "moniker", outV: 32, inV: 33 }
{ id: 35, type: "vertex", label: "range",
  start: { line: 7, character: 9 }, end: { line: 7, character: 13 }
}
{ id: 36, type: "edge", label: "next", outV: 35, inV: 32 }

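This describes the exported declarations inside index.ts with a moniker (for example, a handle in string format) bound to the corresponding range declaration. The generated moniker must be position independent and stable, so that it can be used to identify the symbol in other projects or documents. It should be unique enough to avoid matching other monikers in other projects unless they actually refer to the same symbol. A moniker therefore has the following properties:

scheme indicates how the identifier is to be interpreted.
identifier actually identifies the symbol. Its structure is opaque to the scheme owner. In the example above the monikers are created by the TypeScript compiler tsc and can only be compared to monikers that also have the scheme tsc.
kind indicates whether the moniker is exported, imported or local to the project.
unique indicates how unique the moniker is. See the section Multi Project Systems for more on this.

Note also that the method Emitter#doEmit has an export moniker although the method is private. Whether private elements have monikers depends on the programming language. Since TypeScript cannot enforce visibility (it compiles to JS, which does not have the concept), we treat them as visible. Even the TypeScript language server does so: find all references does find all references to private methods, even if they are flagged as visibility violations.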

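Multi Project Systems

New in 0.5.0, changed in 0.6.0

Most software systems today consist of more than one project. Always creating LSIF dumps for all projects of a system, even if only one project changed, is not very feasible, especially if the change is internal to a project. Since 0.4.0, LSIF therefore allows creating LSIF dumps per project and then linking them together again into a larger system in a database. However, 0.4.0 lacked a few concepts needed to make this work. To illustrate them, consider the following example:

Project P1

Project P1 consists of one file p1Main.ts with the following content: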

export interface Disposable {
	dispose(): void;
}

let d: Disposable;
d.dispose();

Project P2

Project P2 depends on P1 and consists of one file p2Main.ts with the following content:

import { Disposable } from 'p1';

class Widget implements Disposable {
	public dispose(): void {
	}
}

let w: Widget;
w.dispose();

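Now if a user searches for references to Widget#dispose, the reference d.dispose in P1 is expected to be part of the result. However, when P1 is processed, the tool does not know about P2. And when P2 is processed, it usually does not know the source of P1. It only knows its API shape (for example, the corresponding d.ts file in TypeScript).

To make this work, we first need to put projects into larger units, so that we know in which projects d.dispose is actually a match. Assume there is a completely unrelated project PX that also uses Disposable from P1, but P2 is never linked into a system with PX. An object of type Widget can then never flow into code in PX, so references in PX should not be listed. How projects are grouped depends mostly on the programming language. In addition, whether the information is useful depends on the storage backend the LSIF dump is stored in. However, the source from which a dump was generated is a good indication. We therefore introduce the concept of a Source vertex to indicate the origin of a dump. The source vertex is a root vertex in the dump and is not connected to any other node. Let us look at concrete dumps for P1 and P2: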

{ id:2, type: "vertex", label: "source",
  workspaceRoot: "file:///Users/dirkb/samples/ts-cascade",
  repository: {
    type: "git",
    url: "git+https://github.com/samples/ts-cascade.git"
  }
}

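The source vertex carries the following useful information:

workspaceRoot: the URI of the workspace root used when the dump was created. It allows other URIs in the dump (such as document URIs) to be interpreted relative to it.
repository: if available, it indicates the repository in which the source code is stored.

The dump for project P2 contains the same source vertex: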

{ id:2, type: "vertex", label: "source",
  workspaceRoot: "file:///Users/dirkb/samples/ts-cascade",
  repository: {
    type: "git",
    url: "git+https://github.com/samples/ts-cascade.git"
  }
}

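Note that P1 and P2 have the same source information, which gives a storage backend a good indication to resolve references across these two projects. However, the grouping of projects might not be limited to the source repository. A storage backend should therefore define a way to group projects hierarchically. This would allow, for example, a search such as: find all references to function foo in organization O.

Now let us look at how to make sure that searching for references to Widget#dispose finds the d.dispose() match in P1 as well. First, let us look at what information will be in the dump of P1 for Disposable#dispose: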

// The result set for the Disposable#dispose symbol
{ id: 21, type: "vertex", label: "resultSet" }
// The export moniker of Disposable#dispose in P1 (note kind export).
{ id: 22, type: "vertex", label: "moniker",
  scheme: "tsc", identifier: "p1/lib/p1Main:Disposable.dispose",
  unique: "workspace", kind:"export"
}
{ id: 23, type: "edge", label: "moniker", outV: 21, inV: 22 }
// The actual definition of the symbol
{ id: 24, type: "vertex", label: "range",
  start: { line: 1, character: 1 }, end: { line: 1, character: 8 },
  tag: {
    type: "definition", text: "dispose", kind: 7,
    fullRange: {
      start : { line: 1, character:1 }, end: { line: 1, character: 17 }
    }
  }
}
// Bind the reference result to the result set
{ id: 57, type: "vertex", label: "referenceResult" }
{ id: 58, type: "edge", label: "textDocument/references", outV: 21, inV: 57 }

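The interesting part here is the vertex with id 22, which defines the moniker for Disposable#dispose. It now has a property unique, indicating that the moniker is unique inside the workspace of the projects, but not necessarily outside of it. The possible values for unique are:

document indicates that the moniker is only unique inside a document. Used for example for locals or private members.
project indicates that the moniker is only unique inside a project. Used for example for project-internal symbols.
workspace indicates that the moniker is unique inside a workspace of projects. Used for example for exported members.
scheme indicates that the moniker is unique inside the moniker's scheme. For example, if the moniker is generated for a specific package manager (see the npm example below), then these monikers are usually unique inside the moniker's scheme (for example, all monikers generated for npm carry the npm scheme and are unique).
global indicates that the moniker is globally unique (for example, its identifier is unique independent of the scheme or kind).

When generating the dump for P2, the information for Widget#dispose will look like this: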

// The import moniker for importing Disposable#dispose into P2
{ id: 22, type: "vertex", label: "moniker",
  scheme: "tsc", identifier: "p1/lib/p1Main:Disposable.dispose",
  unique: "workspace", kind: "import"
}

// The result set for Widget#dispose
{ id: 78, type: "vertex", label: "resultSet" }
// The moniker for Widget#dispose. Note that the moniker is local since the
// Widget class is not exported
{ id: 79, type: "vertex", label: "moniker",
  scheme: "tsc", identifier: "2Q46RTVRZTuVW1ajf68/Vw==",
  unique: "document", kind: "local"
}
{ id: 80, type: "edge", label: "moniker", outV: 78, inV: 79 }
// The actual definition of the symbol
{ id: 81, type: "vertex", label: "range",
  start: { line: 3, character: 8 }, end: { line: 3, character: 15 },
  tag: {
    type: "definition", text: "dispose", kind: 6,
    fullRange: {
      start: { line: 3, character: 1 }, end: { line: 4, character: 2 }
    }
  }
}
// Bind the reference result to Widget#dispose
{ id: 116, type: "vertex", label: "referenceResult" }
{ id: 117, type: "edge", label: "textDocument/references", outV: 78, inV: 116}
{ id: 118, type: "edge", label: "item",
  outV: 116, inVs: [43], shard: 52, property: "referenceResults"
}
// Link the reference result set of Disposable#dispose to this result set
// using a moniker
{ id: 119, type: "edge", label: "item",
  outV: 116, inVs: [22], shard: 52, property: "referenceLinks"
}
{ id: 120, type: "edge", label: "item",
  outV: 43, inVs: [81], shard: 52, property: "definitions"
}
{ id: 121, type: "edge", label: "item",
  outV: 43, inVs: [96], shard: 52, property: "references"
}

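The noteworthy parts are:

The vertex with id: 22 is the import moniker for Disposable#dispose from P1.
The edge with id: 119 adds a reference link to the reference result of Widget#dispose. Item edges with referenceLinks are conceptually comparable to item edges with the referenceResults property. They allow composing reference results. The difference is that a referenceResults item edge refers to another result using a vertex id, since the reference result is part of the same dump, whereas a referenceLinks item edge refers to another result using a moniker. The actual resolution therefore needs to happen in a database that has the data for both P1 and P2. As with referenceResults item edges, a language server is responsible for de-duplicating the final ranges.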
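
The resolution step in such a database could look roughly like the sketch below. It assumes a hypothetical store that maps a moniker key (scheme plus identifier) to the reference locations collected from all imported dumps. The ReferenceStore type, the string form of the locations and both function names are illustrative only.

```typescript
interface Moniker {
  scheme: string; identifier: string;
  kind?: "import" | "export" | "local";
}

// Monikers are only comparable within one scheme, so both parts form the key.
function monikerKey(moniker: Moniker): string {
  return `${moniker.scheme}\u0000${moniker.identifier}`;
}

// Hypothetical database view: moniker key -> reference locations from all dumps.
type ReferenceStore = Map<string, string[]>;

// Combines the references of the local dump with those reached through
// referenceLinks monikers, and de-duplicates the final list.
function resolveReferences(
  local: string[], links: Moniker[], store: ReferenceStore
): string[] {
  const all = [...local];
  for (const link of links) {
    all.push(...(store.get(monikerKey(link)) ?? []));
  }
  return Array.from(new Set(all));
}
```

The kind of a moniker is deliberately left out of the key, so that the import moniker emitted for P2 matches the export moniker emitted for P1.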

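Package managers

Changed in 0.5.0

In most programming languages, how exported elements become visible in other projects depends on how files are packaged into a library or program. In TypeScript, the standard package manager is npm.

Assume the following package.json file exists: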

{
  "name": "lsif-ts-sample",
  "version": "1.0.0",
  "description": "",
  "main": "lib/index.js",
  "author": "MS",
  "license": "MIT"
}

With the following TypeScript file (the same as above):

export function func(): void {
}

export class Emitter {
  private doEmit() {
  }

  public emit() {
    this.doEmit();
  }
}

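The monikers can then be translated into monikers that depend on npm. Instead of replacing the monikers, we emit a second set of monikers and link each tsc moniker to the corresponding npm moniker using an attach edge: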

{ id: 991, type: "vertex", label: "packageInformation",
  name: "lsif-ts-sample", manager: "npm", version: "1.0.0"
}

{ id: 987, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm", identifier: "lsif-ts-sample::func",
  unique: "scheme"
}
{ id: 986, type: "edge", label: "packageInformation", outV: 987, inV: 991 }
{ id: 985, type: "edge", label: "attach", outV: 987, inV: 12 }

{ id: 984, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm", identifier: "lsif-ts-sample::Emitter",
  unique: "scheme"
}
{ id: 983, type: "edge", label: "packageInformation", outV: 984, inV: 991 }
{ id: 982, type: "edge", label: "attach", outV: 984, inV: 19 }

{ id: 981, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm",
  identifier: "lsif-ts-sample::Emitter.doEmit", unique: "scheme"
}
{ id: 980, type: "edge", label: "packageInformation", outV: 981, inV: 991 }
{ id: 979, type: "edge", label: "attach", outV: 981, inV: 26 }

{ id: 978, type: "vertex", label: "moniker",
  kind: "export", scheme: "npm",
  identifier: "lsif-ts-sample::Emitter.emit", unique: "scheme"
}
{ id: 977, type: "edge", label: "packageInformation", outV: 978, inV: 991 }
{ id: 976, type: "edge", label: "attach", outV: 978, inV: 33 }

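Things to note:

A special packageInformation vertex is emitted that points to the corresponding npm package information.
The npm moniker refers to the package name.
Its unique value is scheme, indicating that the moniker's identifier is unique among all npm monikers.
Since the file index.ts is the npm main file, the moniker identifier has no file path. This is comparable to importing this module into TypeScript or JavaScript, where only the module name is used and no file path (for example import * as lsif from 'lsif-ts-sample').
The attach edge points from the npm moniker vertex to the tsc moniker vertex.

For LSIF we recommend using a second tool to make the monikers emitted by the indexer specific to a package manager. This supports the use of different package managers and allows incorporating custom build tools. In the TypeScript implementation this is done by an npm-specific tool that attaches the monikers based on the npm package information.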
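
The mapping for the main-file case can be sketched as follows. The identifier format is inferred only from the examples in this section ("lib/index:func" becoming "lsif-ts-sample::func"), so treat it as an assumption. The sketch does not attempt to handle files other than the package's main file, and the function name toNpmIdentifier is hypothetical.

```typescript
interface PackageJson { name: string; version: string; main?: string; }

// Converts a tsc identifier of the assumed form "<path>:<symbol>" into an npm
// identifier, but only for symbols exported from the package's main file.
// For any other file this sketch returns undefined.
function toNpmIdentifier(tscIdentifier: string, pkg: PackageJson): string | undefined {
  const separator = tscIdentifier.lastIndexOf(":");
  if (separator < 0 || pkg.main === undefined) return undefined;
  const path = tscIdentifier.slice(0, separator);
  const symbol = tscIdentifier.slice(separator + 1);
  // Compare against the main entry without its .js extension.
  const mainWithoutExtension = pkg.main.replace(/\.js$/, "");
  return path === mainWithoutExtension ? `${pkg.name}::${symbol}` : undefined;
}
```

A linker built this way would emit a new npm moniker vertex for each converted identifier and connect it to the original tsc moniker with an attach edge, as in the listing above.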

Reporting imported external symbols uses the same approach. LSIF emits monikers of kind import. Consider the following TypeScript example:

import * as mobx from 'mobx';

let map: mobx.ObservableMap = new mobx.ObservableMap();

where mobx is the npm mobx package. Running the tsc index tool produces:

{ id: 41, type: "vertex", label: "document",
  uri: "file:///samples/node_modules/mobx/lib/types/observablemap.d.ts",
  languageId: "typescript", contents: "..."
}
{ id: 55, type: "vertex", label: "resultSet" }
{ id: 57, type: "vertex", label: "moniker",
  kind: "import", scheme: "tsc",
  identifier: "node_modules/mobx/lib/mobx:ObservableMap", unique: 'workspace'
}
{ id: 58, type: "edge", label: "moniker", outV: 55, inV: 57 }
{ id: 59, type: "vertex", label: "range",
  start: { line: 17, character: 538 }, end: { line: 17, character: 551 }
}
{ id: 60, type: "edge", label: "next", outV: 59, inV: 55 }

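Three things are worth noting here. First, TypeScript uses declaration files for externally imported symbols. This has the nice effect that the moniker information can be attached to the declaration ranges in these files. In other languages the information might be attached to the file that actually references the symbol, or a virtual document might be generated for the referenced item. Second, the tool generates this information only for symbols that are actually referenced, not for all available symbols. Third, these monikers are tsc specific and point to the node_modules folder.

However, piping this information through the npm tool generates the following information: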

{ id: 991, type: "vertex", label: "packageInformation",
  name: "mobx", manager: "npm", version: "5.6.0",
  repository: { type: "git", url: "git+https://github.com/mobxjs/mobx.git" }
}
{ id: 978, type: "vertex", label: "moniker",
  kind: "import", scheme: "npm", identifier: "mobx::ObservableMap",
  unique: 'scheme'
}
{ id: 977, type: "edge", label: "packageInformation", outV: 978, inV: 991 }
{ id: 976, type: "edge", label: "attach", outV: 978, inV: 57 }

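which makes the moniker specific to the npm mobx package. In addition, information about the mobx package itself is emitted.

Usually monikers are attached to result sets, since they are the same for all ranges pointing to the result set. However, for dumps that do not use result sets, monikers can also be emitted on ranges.

For tools that process a dump and import it into a database, it is sometimes useful to know whether a result is local to a file or not (for example, function arguments can only be navigated inside the file). To help postprocessing tools decide this efficiently, LSIF generation tools should generate monikers for locals as well. The corresponding kind to use is local. The identifier should still be unique inside the document.

For the following example: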

function foo(x: number): void {
}

the moniker for x looks like this:

{ id: 13, type: "vertex", label: "resultSet" }
{ id: 14, type: "vertex", label: "moniker",
  kind: "local", scheme: "tsc", identifier: "SfeOP6s53Y2HAkcViolxYA==",
  unique: 'document'
}
{ id: 15, type: "edge", label: "moniker", outV: 13, inV: 14 }
{ id: 16, type: "vertex", label: "range",
  start: { line: 0, character: 13 }, end: { line: 0, character: 14 },
  tag: {
    type: "definition", text: "x", kind: 7,
    fullRange: {
      start: { line: 0, character: 13 }, end: { line: 0, character: 22 }
    }
  }
}
{ id: 17, type: "edge", label: "next", outV: 16, inV: 13 }

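In addition, moniker schemes starting with $ are reserved and should not be used by an LSIF tool.

Result ranges

Ranges in LSIF currently have two meanings:

They act as LSP request sensitive areas in a document (for example, we use them to decide whether a corresponding LSP request result exists for a given position).
They act as navigation targets (for example, they are the result of a Go To Declaration navigation).

To fulfill the first, LSIF specifies that ranges cannot overlap or be the same. However, this constraint is not necessary for the second meaning. To support equal or overlapping target ranges we introduce a resultRange vertex. A resultRange must not be used as a target in a contains edge.

The Meta Data Vertex

Changed in 0.5.0

To support versioning, LSIF defines a meta data vertex as follows: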

export interface MetaData {

  /**
   * The label property.
   */
  label: 'metaData';

  /**
   * The version of the LSIF format using semver notation. See
   * https://semver.org/. Please note the version numbers starting with 0
   * don't adhere to semver and adopters have to assume that each new version
   * is breaking.
   */
  version: string;

  /**
   * The string encoding used to compute line and character values in
   * positions and ranges. Currently only 'utf-16' is supported due to the
   * limitations in LSP.
   */
  positionEncoding: 'utf-16';

  /**
   * Information about the tool that created the dump
   */
  toolInfo?: {
    name: string;
    version?: string;
    args?: string[];
  }
}

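Emitting constraints

Extended in 0.6.0

The following emitting constraints exist (some of which have already been mentioned in the document):

A vertex must be emitted before it is referenced in an edge.
A range and a resultRange can only be contained in one document.
A resultRange cannot be used as a target in a contains edge.
After a document end event has been emitted, only result sets, reference results or implementation results emitted through that document can be referenced in edges. It is, for example, not allowed to reference ranges or result ranges from that document. This also includes adding monikers to ranges or result sets. The document data can no longer be altered, so to speak.
If ranges point to result sets and monikers are emitted, they must be emitted on the result set and must not be emitted on individual ranges.
If a range is referenced in an item edge, the range must already have been attached to a document using a contains edge. This ensures that the target document of a range is known. (@since 0.6.0)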
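
The first constraint is easy to verify while streaming a dump. The following sketch reports every edge that refers to a vertex that has not been emitted yet. The Element shape is a reduced, hypothetical view of the dump elements, and findForwardReferences is an illustrative name. The other constraints would need additional bookkeeping that is not shown here.

```typescript
type Id = number | string;
interface Element {
  id: Id; type: "vertex" | "edge"; label: string;
  outV?: Id; inV?: Id; inVs?: Id[];
}

// Returns the ids of all edges that reference a vertex before it was emitted.
function findForwardReferences(elements: Element[]): Id[] {
  const emitted = new Set<Id>();
  const offending: Id[] = [];
  for (const element of elements) {
    if (element.type === "vertex") {
      emitted.add(element.id);
      continue;
    }
    const targets = [element.outV, element.inV, ...(element.inVs ?? [])];
    if (targets.some(t => t !== undefined && !emitted.has(t))) {
      offending.push(element.id);
    }
  }
  return offending;
}
```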

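Additional Information

Tools

lsif-protocol: the protocol defined as TypeScript interfaces
lsif-util: utility tools for LSIF development
lsif-tsc: LSIF indexer for TypeScript
lsif-npm: linker for npm monikers

Open Questions

While implementing this for TypeScript and npm, we collected a list of open questions in the form of GitHub issues that we are already aware of.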