Reading the TiDB SQL Parser Source Code (Part 1)

李小姐的秋田犬 2019-07-22

File tree structure

  • parser
    • ast
      • ast.go
      • ddl.go
      • dml.go
      • expression.go
      • flag.go
      • functions.go
      • misc.go
      • stats.go
    • auth
    • charset
    • format
    • goyacc
    • model
      • ddl.go
      • flags.go
      • model.go
    • mysql
    • opcode
    • terror
    • types
    • lexer.go
    • misc.go
    • parser.go
    • parser.y
    • yy_parser.go

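Module overview

This article explains what the code in the ast and model packages is for and introduces the main interfaces and types. For the yacc part and the overall architecture of the SQL layer, see "TiDB 源码阅读系列文章(五)TiDB SQL Parser 的实现" (TiDB Source Code Reading Series, Part 5: The Implementation of the TiDB SQL Parser).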

parser.ast

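An abstract syntax tree (AST), or simply syntax tree, is an abstract representation of the syntactic structure of source code. It expresses the syntax of a programming language as a tree, and every node of the tree represents one construct in the source code.

This package is the core of the syntax representation. MySQL statements fall into several categories; TiDB's parsing mainly deals with two of them, DDL and DML:

  • ddl

Data Definition Language (DDL) is the part of SQL responsible for defining data structures and database objects. It consists of the CREATE, ALTER, and DROP statements. It originated with the Codasyl (Conference on Data Systems Languages) data model and is now included in SQL as a subset.

  • dml

Data Manipulation Language (DML) is the set of statements used to access and modify the objects and data in a database, usually a subset of a database-specific programming language. In SQL, the industry standard, its core is the INSERT, UPDATE, and DELETE statements, which insert (add or create), update (modify), and delete (destroy) data. Any application built on a database ends up using these statements. Together with SQL's SELECT statement, they are commonly called "CRUD" (Create, Read, Update, Delete); Chinese-speaking developers sometimes abbreviate them with four characters: 增 查 改 删 (add, query, modify, delete).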

Main interfaces

Node: an abstract node in the AST

    // Node is the basic element of the AST.
    // Interfaces embed Node should have 'Node' name suffix.
    type Node interface {
        // Restore returns the sql text from ast tree
        Restore(ctx *RestoreCtx) error
        // Accept accepts Visitor to visit itself.
        // The returned node should replace original node.
        // ok returns false to stop visiting.
        //
        // Implementation of this method should first call visitor.Enter,
        // assign the returned node to its method receiver, if skipChildren returns true,
        // children should be skipped. Otherwise, call its children in particular order that
        // later elements depends on former elements. Finally, return visitor.Leave.
        Accept(v Visitor) (node Node, ok bool)
        // Text returns the original text of the element.
        Text() string
        // SetText sets original text to the Node.
        SetText(text string)
    }

StmtNode: a statement node in the AST (both complete statements and sub-statements); it embeds the Node interface

    // StmtNode represents statement node.
    // Name of implementations should have 'Stmt' suffix.
    type StmtNode interface {
        Node
        statement()
    }

DDLNode: a DDL node in the AST; it embeds the StmtNode interface

    // DDLNode represents DDL statement node.
    type DDLNode interface {
        StmtNode
        ddlStatement()
    }

DMLNode: a DML node in the AST; it embeds the StmtNode interface

    // DMLNode represents DML statement node.
    type DMLNode interface {
        StmtNode
        dmlStatement()
    }
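The unexported methods statement(), ddlStatement() and dmlStatement() act as markers: a type only belongs to a category if it carries the matching marker. The sketch below shows this pattern with reduced stand-in interfaces. The embeddable helpers node, stmtNode, ddlNode and dmlNode are modeled on the dmlNode embedding visible in the structs quoted in this article, but their bodies here are assumptions, and the three statement types are empty placeholders.

```go
package main

import "fmt"

// Reduced stand-ins for the ast interfaces (Node keeps only Text).
type Node interface{ Text() string }

type StmtNode interface {
	Node
	statement()
}

type DDLNode interface {
	StmtNode
	ddlStatement()
}

type DMLNode interface {
	StmtNode
	dmlStatement()
}

// Embeddable helpers that supply the marker methods.
type node struct{ text string }

func (n *node) Text() string { return n.text }

type stmtNode struct{ node }

func (*stmtNode) statement() {}

type ddlNode struct{ stmtNode }

func (*ddlNode) ddlStatement() {}

type dmlNode struct{ stmtNode }

func (*dmlNode) dmlStatement() {}

// Placeholder statement types; the real ones carry many fields.
type DropTableStmt struct{ ddlNode }
type DeleteStmt struct{ dmlNode }
type SetStmt struct{ stmtNode }

// classify uses a type switch on the marker interfaces.
func classify(s StmtNode) string {
	switch s.(type) {
	case DDLNode:
		return "DDL"
	case DMLNode:
		return "DML"
	default:
		return "other"
	}
}

func main() {
	for _, s := range []StmtNode{&DropTableStmt{}, &DeleteStmt{}, &SetStmt{}} {
		fmt.Printf("%T is %s\n", s, classify(s))
	}
}
```

Since the marker methods are unexported, only types declared in the same package (or embedding one of its helpers) can satisfy these interfaces, which keeps the set of statement kinds closed.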

parser.ast.ddl

The ddl file handles SQL that concerns databases, table structure, constraints, and schema. This mainly covers alter, create database, create index, drop, rename, and truncate. The statements it handles can be read off the declarations at the top of ddl.go:

    var (
        _ DDLNode = &AlterTableStmt{}
        _ DDLNode = &CreateDatabaseStmt{}
        _ DDLNode = &CreateIndexStmt{}
        _ DDLNode = &CreateTableStmt{}
        _ DDLNode = &CreateViewStmt{}
        _ DDLNode = &DropDatabaseStmt{}
        _ DDLNode = &DropIndexStmt{}
        _ DDLNode = &DropTableStmt{}
        _ DDLNode = &RenameTableStmt{}
        _ DDLNode = &TruncateTableStmt{}

        _ Node = &AlterTableSpec{}
        _ Node = &ColumnDef{}
        _ Node = &ColumnOption{}
        _ Node = &ColumnPosition{}
        _ Node = &Constraint{}
        _ Node = &IndexColName{}
        _ Node = &ReferenceDef{}
    )

The file mainly contains these Stmt types and their various Option types.
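The `_ DDLNode = &AlterTableStmt{}` lines are a common Go idiom: assigning to the blank identifier makes the compiler verify that the type satisfies the interface, without keeping any variable around. A minimal sketch of the idiom, using a one-method stand-in for DDLNode and a placeholder DropTableStmt (the isDDL helper is hypothetical):

```go
package main

import "fmt"

// DDLNode is a one-method stand-in for the real interface.
type DDLNode interface {
	ddlStatement()
}

// DropTableStmt is a placeholder; the real struct has more fields.
type DropTableStmt struct{ IfExists bool }

// The marker method has a pointer receiver, so only *DropTableStmt
// satisfies DDLNode.
func (*DropTableStmt) ddlStatement() {}

// Compile-time check: the build fails here if *DropTableStmt ever
// stops implementing DDLNode.
var _ DDLNode = &DropTableStmt{}

// An equivalent form that does not construct a value.
var _ DDLNode = (*DropTableStmt)(nil)

// isDDL performs the same check at run time.
func isDDL(v interface{}) bool {
	_, ok := v.(DDLNode)
	return ok
}

func main() {
	fmt.Println(isDDL(&DropTableStmt{}), isDDL(DropTableStmt{}))
}
```

The declarations cost nothing at run time; they double as a readable index of which types the file defines, which is how the list above is being used here.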

parser.ast.dml

The dml file handles CRUD operations, including delete, insert, union, update, select, and show, together with the related group by, join, limit, and order by clauses. These can likewise be seen in the declarations at the top of dml.go:

    var (
        _ DMLNode = &DeleteStmt{}
        _ DMLNode = &InsertStmt{}
        _ DMLNode = &UnionStmt{}
        _ DMLNode = &UpdateStmt{}
        _ DMLNode = &SelectStmt{}
        _ DMLNode = &ShowStmt{}
        _ DMLNode = &LoadDataStmt{}
        _ DMLNode = &SplitRegionStmt{}

        _ Node = &Assignment{}
        _ Node = &ByItem{}
        _ Node = &FieldList{}
        _ Node = &GroupByClause{}
        _ Node = &HavingClause{}
        _ Node = &Join{}
        _ Node = &Limit{}
        _ Node = &OnCondition{}
        _ Node = &OrderByClause{}
        _ Node = &SelectField{}
        _ Node = &TableName{}
        _ Node = &TableRefsClause{}
        _ Node = &TableSource{}
        _ Node = &UnionSelectList{}
        _ Node = &WildCardField{}
        _ Node = &WindowSpec{}
        _ Node = &PartitionByClause{}
        _ Node = &FrameClause{}
        _ Node = &FrameBound{}
    )

Examples of the main types in dml

SelectStmt: the type that represents a select statement; implements DMLNode

    // SelectStmt represents the select query node.
    // See https://dev.mysql.com/doc/refman/5.7/en/select.html
    type SelectStmt struct {
        dmlNode
        resultSetNode

        // SelectStmtOpts wraps around select hints and switches.
        *SelectStmtOpts
        // Distinct represents whether the select has distinct option.
        Distinct bool
        // From is the from clause of the query.
        From *TableRefsClause
        // Where is the where clause in select statement.
        Where ExprNode
        // Fields is the select expression list.
        Fields *FieldList
        // GroupBy is the group by expression list.
        GroupBy *GroupByClause
        // Having is the having condition.
        Having *HavingClause
        // WindowSpecs is the window specification list.
        WindowSpecs []WindowSpec
        // OrderBy is the ordering expression list.
        OrderBy *OrderByClause
        // Limit is the limit clause.
        Limit *Limit
        // LockTp is the lock type
        LockTp SelectLockType
        // TableHints represents the table level Optimizer Hint for join type
        TableHints []*TableOptimizerHint
        // IsAfterUnionDistinct indicates whether it's a stmt after "union distinct".
        IsAfterUnionDistinct bool
        // IsInBraces indicates whether it's a stmt in brace.
        IsInBraces bool
    }
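To make the relationship between these fields and the SQL text more concrete, here is a rough sketch of what a Restore-style method does: it walks the clauses in grammatical order and writes them back out. Everything in it is a simplification made up for illustration. The selectStmt type keeps only a few clauses and stores them as plain strings instead of AST nodes, and Restore writes to a strings.Builder, whereas the real method takes a *RestoreCtx.

```go
package main

import (
	"fmt"
	"strings"
)

// selectStmt is a hypothetical, heavily reduced stand-in for ast.SelectStmt.
type selectStmt struct {
	Distinct bool
	Fields   []string
	From     string
	Where    string
	GroupBy  []string
	OrderBy  []string
	Limit    int // 0 means no LIMIT clause
}

// Restore writes the statement back as SQL text, clause by clause.
func (s *selectStmt) Restore(sb *strings.Builder) error {
	if len(s.Fields) == 0 {
		return fmt.Errorf("select statement has no fields")
	}
	sb.WriteString("SELECT ")
	if s.Distinct {
		sb.WriteString("DISTINCT ")
	}
	sb.WriteString(strings.Join(s.Fields, ", "))
	if s.From != "" {
		sb.WriteString(" FROM " + s.From)
	}
	if s.Where != "" {
		sb.WriteString(" WHERE " + s.Where)
	}
	if len(s.GroupBy) > 0 {
		sb.WriteString(" GROUP BY " + strings.Join(s.GroupBy, ", "))
	}
	if len(s.OrderBy) > 0 {
		sb.WriteString(" ORDER BY " + strings.Join(s.OrderBy, ", "))
	}
	if s.Limit > 0 {
		fmt.Fprintf(sb, " LIMIT %d", s.Limit)
	}
	return nil
}

// restoreSQL is a small convenience wrapper around Restore.
func restoreSQL(s *selectStmt) (string, error) {
	var sb strings.Builder
	if err := s.Restore(&sb); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	stmt := &selectStmt{
		Distinct: true,
		Fields:   []string{"a", "b"},
		From:     "t",
		Where:    "a > 1",
		OrderBy:  []string{"b"},
		Limit:    10,
	}
	sql, err := restoreSQL(stmt)
	fmt.Println(sql, err)
}
```

In the real AST each clause is itself a Node with its own Restore, so the statement-level method mostly delegates to its children in the same order.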

UpdateStmt: the type that represents an update statement; implements DMLNode

    // UpdateStmt is a statement to update columns of existing rows in tables with new values.
    // See https://dev.mysql.com/doc/refman/5.7/en/update.html
    type UpdateStmt struct {
        dmlNode

        TableRefs     *TableRefsClause
        List          []*Assignment
        Where         ExprNode
        Order         *OrderByClause
        Limit         *Limit
        Priority      mysql.PriorityEnum
        IgnoreErr     bool
        MultipleTable bool
        TableHints    []*TableOptimizerHint
    }

parser.model

The model package mainly defines the data structures that correspond to SQL objects and the structure of Job tasks.

parser.model.model.go

model.go mainly contains the structures that describe schema metadata, for example database info (DBInfo), table info (TableInfo), column info (ColumnInfo), view info (ViewInfo), and table lock info (TableLockInfo). Some of these structures are shown below.

Database info: DBInfo

    // DBInfo provides meta data describing a DB.
    type DBInfo struct {
        ID      int64        `json:"id"`      // Database ID
        Name    CIStr        `json:"db_name"` // DB name.
        Charset string       `json:"charset"`
        Collate string       `json:"collate"`
        Tables  []*TableInfo `json:"-"` // Tables in the DB.
        State   SchemaState  `json:"state"`
    }
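The struct tags on DBInfo control how it is serialized with encoding/json. In particular, `json:"-"` on Tables means the table list is left out of the encoded database record. The sketch below demonstrates this with reduced stand-in types: dbInfo and tableInfo are hypothetical copies, Name is a plain string rather than CIStr, and State is a plain int rather than SchemaState.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tableInfo is a tiny stand-in for model.TableInfo.
type tableInfo struct {
	ID   int64  `json:"id"`
	Name string `json:"name"`
}

// dbInfo mirrors the tags of model.DBInfo with simplified field types.
type dbInfo struct {
	ID      int64        `json:"id"`
	Name    string       `json:"db_name"`
	Charset string       `json:"charset"`
	Collate string       `json:"collate"`
	Tables  []*tableInfo `json:"-"` // never written to JSON
	State   int          `json:"state"`
}

// encodeDB returns the JSON form of a database record.
func encodeDB(db *dbInfo) (string, error) {
	b, err := json.Marshal(db)
	return string(b), err
}

func main() {
	db := &dbInfo{
		ID:      1,
		Name:    "test",
		Charset: "utf8mb4",
		Collate: "utf8mb4_bin",
		Tables:  []*tableInfo{{ID: 42, Name: "t"}},
		State:   5,
	}
	s, err := encodeDB(db)
	fmt.Println(s, err)
}
```

The encoded record contains id, db_name, charset, collate and state but no table list, so the tables have to be stored and loaded separately from the database entry.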

Table info: TableInfo

    // TableInfo provides meta data describing a DB table.
    type TableInfo struct {
        ID      int64  `json:"id"`
        Name    CIStr  `json:"name"`
        Charset string `json:"charset"`
        Collate string `json:"collate"`
        // Columns are listed in the order in which they appear in the schema.
        Columns     []*ColumnInfo `json:"cols"`
        Indices     []*IndexInfo  `json:"index_info"`
        ForeignKeys []*FKInfo     `json:"fk_info"`
        State       SchemaState   `json:"state"`
        PKIsHandle  bool          `json:"pk_is_handle"`
        Comment     string        `json:"comment"`
        AutoIncID   int64         `json:"auto_inc_id"`
        MaxColumnID int64         `json:"max_col_id"`
        MaxIndexID  int64         `json:"max_idx_id"`
        // UpdateTS is used to record the timestamp of updating the table's schema information.
        // These changing schema operations don't include 'truncate table' and 'rename table'.
        UpdateTS uint64 `json:"update_timestamp"`
        // OldSchemaID :
        // Because auto increment ID has schemaID as prefix,
        // We need to save original schemaID to keep autoID unchanged
        // while renaming a table from one database to another.
        // TODO: Remove it.
        // Now it only uses for compatibility with the old version that already uses this field.
        OldSchemaID int64 `json:"old_schema_id,omitempty"`

        // ShardRowIDBits specify if the implicit row ID is sharded.
        ShardRowIDBits uint64
        // MaxShardRowIDBits uses to record the max ShardRowIDBits be used so far.
        MaxShardRowIDBits uint64 `json:"max_shard_row_id_bits"`
        // PreSplitRegions specify the pre-split region when create table.
        // The pre-split region num is 2^(PreSplitRegions-1).
        // And the PreSplitRegions should less than or equal to ShardRowIDBits.
        PreSplitRegions uint64 `json:"pre_split_regions"`

        Partition *PartitionInfo `json:"partition"`

        Compression string `json:"compression"`

        View *ViewInfo `json:"view"`
        // Lock represent the table lock info.
        Lock *TableLockInfo `json:"Lock"`

        // Version means the version of the table info.
        Version uint16 `json:"version"`
    }

parser.model.ddl.go

ddl.go mainly contains the logical job structures produced from DDL statements. A job is the set of logical operations generated as the first step from a DDL SQL statement, and it is the bridge between SQL and the storage layer's native API.

SQL is turned into a sequence of jobs; after logical optimization and cost-based optimization this becomes the final set of jobs, which are then executed in order by calling the storage module's interfaces and computing with operators. The main task structure, Job, is defined as follows:

    // Job is for a DDL operation.
    type Job struct {
        ID       int64         `json:"id"`
        Type     ActionType    `json:"type"`
        SchemaID int64         `json:"schema_id"`
        TableID  int64         `json:"table_id"`
        State    JobState      `json:"state"`
        Error    *terror.Error `json:"err"`
        // ErrorCount will be increased, every time we meet an error when running job.
        ErrorCount int64 `json:"err_count"`
        // RowCount means the number of rows that are processed.
        RowCount int64         `json:"row_count"`
        Mu       sync.Mutex    `json:"-"`
        Args     []interface{} `json:"-"`
        // RawArgs : We must use json raw message to delay parsing special args.
        RawArgs     json.RawMessage `json:"raw_args"`
        SchemaState SchemaState     `json:"schema_state"`
        // SnapshotVer means snapshot version for this job.
        SnapshotVer uint64 `json:"snapshot_ver"`
        // StartTS uses timestamp allocated by TSO.
        // Now it's the TS when we put the job to TiKV queue.
        StartTS uint64 `json:"start_ts"`
        // DependencyID is the job's ID that the current job depends on.
        DependencyID int64 `json:"dependency_id"`
        // Query string of the ddl job.
        Query      string       `json:"query"`
        BinlogInfo *HistoryInfo `json:"binlog"`

        // Version indicates the DDL job version. For old jobs, it will be 0.
        Version int64 `json:"version"`

        // ReorgMeta is meta info of ddl reorganization.
        // This field is depreciated.
        ReorgMeta *DDLReorgMeta `json:"reorg_meta"`

        // Priority is only used to set the operation priority of adding indices.
        Priority int `json:"priority"`
    }

