From 2140d05f34f7976ed7f7058baa952490ee3fb859 Mon Sep 17 00:00:00 2001 From: Andrey Makarov Date: Wed, 14 Sep 2022 19:28:01 +0300 Subject: [PATCH] nimgrep: add `--inContext` and `--notinContext` options (#19528) * nimgrep: add `--matchContext` and `--noMatchContext` options * Rename options for uniformity * Revise option names, add `--parentPath` options * Revert --bin deprecation * Copy-paste an original test from quantimnot The origin was: https://gist.githubusercontent.com/quantimnot/5d23b32fe0936ffc453220d20a87b9e2/raw/96544656d52332118295e55aa73718c389e5d194/tnimgrep.nim * Change ! to n * Attempt to fix test * Fix test on Windows * Change --contentsFile -> --inFile, add more tests * Bump * Change --parentPath to --dirpath --- doc/nimgrep.md | 87 +++++++-- doc/nimgrep_cmdline.txt | 48 +++-- tests/tools/tnimgrep.nim | 402 +++++++++++++++++++++++++++++++++++++++ tools/nimgrep.nim | 250 +++++++++++++++++------- 4 files changed, 683 insertions(+), 104 deletions(-) create mode 100644 tests/tools/tnimgrep.nim diff --git a/doc/nimgrep.md b/doc/nimgrep.md index e000efb464..8fb86a9d38 100644 --- a/doc/nimgrep.md +++ b/doc/nimgrep.md @@ -34,6 +34,66 @@ Command line switches .. include:: nimgrep_cmdline.txt +Path filter options +------------------- + +Let us assume we have file `dirA/dirB/dirC/file.nim`. 
+Filesystem path options will match for these parts of the path: + +| option | matches for | +| :------------------ | :-------------------------------- | +| `--[not]extensions` | ``nim`` | +| `--[not]filename` | ``file.nim`` | +| `--[not]dirname` | ``dirA`` and ``dirB`` and ``dirC`` | +| `--[not]dirpath` | ``dirA/dirB/dirC`` | + +Combining multiple filter options together and negating them +------------------------------------------------------------ + +Options for filtering can be provided multiple times so they form a list, +which works as: +* positive filters + `--filename`, `--dirname`, `--dirpath`, `--inContext`, + `--inFile` accept files/matches if *any* pattern from the list is hit +* negative filters + `--notfilename`, `--notdirname`, `--notdirpath`, `--notinContext`, + `--notinFile` accept files/matches if *no* pattern from the list is hit. + +In other words the same filtering option repeated many times means logical OR. + +.. Important:: + Different filtering options are related by logical AND: they all must + be true for a match to be accepted. + E.g. `--filename:F --dirname:D1 --notdirname:D2` means + `filename(F) AND dirname(D1) AND (NOT dirname(D2))`. + +So negative filtering patterns are effectively related by logical OR also: +`(NOT PAT1) AND (NOT PAT2) == NOT (PAT1 OR PAT2)`:literal: in pseudo-code. + +That means you can always use only 1 such an option with logical OR, e.g. +`--notdirname:PAT1 --notdirname:PAT2` is fully equivalent to +`--notdirname:'PAT1|PAT2'`. + +.. Note:: + If you want logical AND on patterns you should compose 1 appropriate pattern, + possibly combined with multi-line mode `(?s)`:literal:. + E.g. 
to require that multi-line context of matches has occurrences of + **both** PAT1 and PAT2 use positive lookaheads (`(?=PAT)`:literal:): + ```cmd + nimgrep --inContext:'(?s)(?=.*PAT1)(?=.*PAT2)' + ``` + +Meaning of `^`:literal: and `$`:literal: +======================================== + +`nimgrep`:cmd: PCRE engine is run in a single-line mode so +`^`:literal: matches the beginning of whole input *file* and +`$`:literal: matches the end of *file* (or whole input *string* for +options like `--filename`). + +Add the `(?m)`:literal: modifier to the beginning of your pattern for +`^`:literal: and `$`:literal: to match the beginnings and ends of *lines*. + Examples ======== @@ -51,23 +111,18 @@ All examples below use default PCRE Regex patterns: + To exclude version control directories (Git, Mercurial=hg, Subversion=svn) from the search: - ```cmd - nimgrep --excludeDir:'^\.git$' --excludeDir:'^\.hg$' --excludeDir:'^\.svn$' - # short: --ed:'^\.git$' --ed:'^\.hg$' --ed:'^\.svn$' + nimgrep --notdirname:'^\.git$' --notdirname:'^\.hg$' --notdirname:'^\.svn$' + # short: --ndi:'^\.git$' --ndi:'^\.hg$' --ndi:'^\.svn$' ``` - -+ To search only in paths containing the `tests` sub-directory recursively: - ++ To search only in paths containing the `tests`:literal: sub-directory + recursively: ```cmd - nimgrep --recursive --includeDir:'(^|/)tests($|/)' - # short: -r --id:'(^|/)tests($|/)' + nimgrep --recursive --dirname:'^tests$' + # short: -r --di:'^tests$' + # or using --dirpath: + nimgrep --recursive --dirpath:'(^|/)tests($|/)' + # short: -r --dirp:'(^|/)tests($|/)' ``` - - .. Attention:: note the subtle difference between `--excludeDir`:option: and - `--includeDir`:option:\: the former is applied to relative directory entries - and the latter is applied to the whole paths - -+ Nimgrep can search multi-line, e.g. 
to find files containing `import`:literal: + and then `strutils`:literal: use pattern `'import(.|\n)*?strutils'`:literal:. diff --git a/doc/nimgrep_cmdline.txt b/doc/nimgrep_cmdline.txt index 4ec344495a..73f29f5245 100644 --- a/doc/nimgrep_cmdline.txt +++ b/doc/nimgrep_cmdline.txt @@ -46,8 +46,7 @@ Options: nimgrep --filenames # In current dir nimgrep --filenames "" DIRECTORY # Note empty pattern "", lists all files in DIRECTORY - -* Interpret patterns: +* Interprete patterns: --peg PATTERN and PAT are Peg --re PATTERN and PAT are regular expressions (default) --rex, -x use the "extended" syntax for the regular expression @@ -62,28 +61,45 @@ Options: * File system walk: --recursive, -r process directories recursively --follow follow all symlinks when processing recursively - --ext:EX1|EX2|... only search the files with the given extension(s), - empty one ("--ext") means files with missing extension - --noExt:EX1|... exclude files having given extension(s), use empty one to - skip files with no extension (like some binary files are) - --includeFile:PAT search only files whose names contain pattern PAT - --excludeFile:PAT skip files whose names contain pattern PAT - --includeDir:PAT search only files with their whole directory path - containing PAT - --excludeDir:PAT skip directories whose name (not path) - contain pattern PAT - --if,--ef,--id,--ed abbreviations of the 4 options above --sortTime, -s[:asc|desc] order files by the last modification time (default: off): ascending (recent files go last) or descending -* Filter file content: - --match:PAT select files containing a (not displayed) match of PAT - --noMatch:PAT select files not containing any match of PAT +* Filter files (based on filesystem paths): + + .. Hint:: Instead of `not` you can type just `n` for negative options below. + + --ex[tensions]:EX1|EX2|... + only search the files with the given extension(s), + empty one (`--ex`) means files with missing extension + --notex[tensions]:EX1|EX2|... 
+ exclude files having given extension(s), use empty one to + skip files with no extension (like some binary files are) + --fi[lename]:PAT search only files whose name matches pattern PAT + --notfi[lename]:PAT skip files whose name matches pattern PAT + --di[rname]:PAT select files that in their path have a directory name + that matches pattern PAT + --notdi[rname]:PAT do not descend into directories whose name (not path) + matches pattern PAT + --dirp[ath]:PAT select only files whose whole relative directory path + matches pattern PAT + --notdirp[ath]:PAT skip files whose whole relative directory path + matches pattern PAT + +* Filter files (based on file contents): + --inF[ile]:PAT select files containing a (not displayed) match of PAT + --notinF[ile]:PAT skip files containing a match of PAT --bin:on|off|only process binary files? (detected by \0 in first 1K bytes) (default: on - binary and text files treated the same way) --text, -t process only text files, the same as `--bin:off` +* Filter matches: + --inC[ontext]:PAT select only matches containing a match of PAT in their + surrounding context (multiline with `-c`, `-a`, `-b`) + --notinC[ontext]:PAT + skip matches containing a match of PAT + in their surrounding context + * Represent results: --nocolor output will be given without any colors --color[:on] force color even if output is redirected (default: auto) diff --git a/tests/tools/tnimgrep.nim b/tests/tools/tnimgrep.nim new file mode 100644 index 0000000000..e97b979f18 --- /dev/null +++ b/tests/tools/tnimgrep.nim @@ -0,0 +1,402 @@ +discard """ + output: ''' + +[Suite] nimgrep filesystem + +[Suite] nimgrep contents filtering +''' +""" +## Authors: quantimnot, a-mr + +import osproc, os, streams, unittest, strutils + +#======= +# setup +#======= + +var process: Process +var ngStdOut, ngStdErr: string +var ngExitCode: int +let previousDir = getCurrentDir() +let tempDir = getTempDir() +let testFilesRoot = tempDir / "nimgrep_test_files" + +template 
nimgrep(optsAndArgs): untyped = + process = startProcess(previousDir / "bin/nimgrep " & optsAndArgs, + options = {poEvalCommand}) + ngExitCode = process.waitForExit + ngStdOut = process.outputStream.readAll + ngStdErr = process.errorStream.readAll + +func fixSlash(s: string): string = + if DirSep == '/': + result = s + else: # on Windows + result = s.replace('/', DirSep) + +func initString(len = 1000, val = ' '): string = + result = newString(len) + for i in 0.. 0: yield Output(kind: justCount, matches: cnt) if yieldContents and found and optCount notin options: yield Output(kind: fileContents, buffer: move(buffer)) - -proc hasRightFileName(path: string, walkOptC: WalkOptComp[Pattern]): bool = +proc hasRightPath(path: string, walkOptC: WalkOptComp[Pattern]): bool = + if not ( + walkOpt.extensions.len > 0 or walkOpt.notExtensions.len > 0 or + walkOpt.filename.len > 0 or walkOpt.notFilename.len > 0 or + walkOpt.notDirPath.len > 0 or walkOpt.dirPath.len > 0): + return true let filename = path.lastPathPart let ex = filename.splitFile.ext.substr(1) # skip leading '.' 
if walkOpt.extensions.len != 0: @@ -875,31 +940,44 @@ proc hasRightFileName(path: string, walkOptC: WalkOptComp[Pattern]): bool = matched = true break if not matched: return false - for x in walkOpt.skipExtensions: + for x in walkOpt.notExtensions: if os.cmpPaths(x, ex) == 0: return false - if walkOptC.includeFile.len != 0: - var matched = false - for pat in walkOptC.includeFile: - if filename.contains(pat): - matched = true - break - if not matched: return false - for pat in walkOptC.excludeFile: - if filename.contains(pat): return false - let dirname = path.parentDir - if walkOptC.includeDir.len != 0: - var matched = false - for pat in walkOptC.includeDir: - if dirname.contains(pat): - matched = true - break - if not matched: return false + ensureIncluded walkOptC.filename, filename: + return false + ensureExcluded walkOptC.notFilename, filename: + return false + let parent = path.parentDir + ensureExcluded walkOptC.notDirPath, parent: + return false + ensureIncluded walkOptC.dirPath, parent: + return false result = true -proc hasRightDirectory(path: string, walkOptC: WalkOptComp[Pattern]): bool = - let dirname = path.lastPathPart - for pat in walkOptC.excludeDir: - if dirname.contains(pat): return false +proc isRightDirectory(path: string, walkOptC: WalkOptComp[Pattern]): bool = + ## --dirname can be only checked when the final path is known + ## so this proc is suitable for files only. 
+ if walkOptC.dirname.len > 0: + var badDirname = false + var (nextParent, dirname) = splitPath(path) + # check that --dirname matches for one of directories in parent path: + while dirname != "": + badDirname = false + ensureIncluded walkOptC.dirname, dirname: + badDirname = true + if not badDirname: + break + (nextParent, dirname) = splitPath(nextParent) + if badDirname: # badDirname was set to true for all the dirs + return false + result = true + +proc descendToDirectory(path: string, walkOptC: WalkOptComp[Pattern]): bool = + ## --notdirname can be checked for directories immediately for optimization to + ## prevent descending into undesired directories. + if walkOptC.notDirname.len > 0: + let dirname = path.lastPathPart + ensureExcluded walkOptC.notDirname, dirname: + return false result = true iterator walkDirBasic(dir: string, walkOptC: WalkOptComp[Pattern]): string @@ -908,22 +986,24 @@ iterator walkDirBasic(dir: string, walkOptC: WalkOptComp[Pattern]): string var timeFiles = newSeq[(times.Time, string)]() while dirStack.len > 0: let d = dirStack.pop() + let rightDirForFiles = d.isRightDirectory(walkOptC) var files = newSeq[string]() var dirs = newSeq[string]() for kind, path in walkDir(d): case kind of pcFile: - if path.hasRightFileName(walkOptC): + if path.hasRightPath(walkOptC) and rightDirForFiles: files.add(path) of pcLinkToFile: - if optFollow in options and path.hasRightFileName(walkOptC): + if optFollow in options and path.hasRightPath(walkOptC) and + rightDirForFiles: files.add(path) of pcDir: - if optRecursive in options and path.hasRightDirectory(walkOptC): + if optRecursive in options and path.descendToDirectory(walkOptC): dirs.add path of pcLinkToDir: if optFollow in options and optRecursive in options and - path.hasRightDirectory(walkOptC): + path.descendToDirectory(walkOptC): dirs.add path if sortTime: # sort by time - collect files before yielding for file in files: @@ -948,10 +1028,12 @@ iterator walkDirBasic(dir: string, walkOptC: 
WalkOptComp[Pattern]): string iterator walkRec(paths: seq[string]): tuple[error: string, filename: string] {.closure.} = declareCompiledPatterns(walkOptC, WalkOptComp): - walkOptC.excludeFile.add walkOpt.excludeFile.compileArray() - walkOptC.includeFile.add walkOpt.includeFile.compileArray() - walkOptC.includeDir.add walkOpt.includeDir.compileArray() - walkOptC.excludeDir.add walkOpt.excludeDir.compileArray() + walkOptC.notFilename.add walkOpt.notFilename.compileArray() + walkOptC.filename.add walkOpt.filename.compileArray() + walkOptC.dirname.add walkOpt.dirname.compileArray() + walkOptC.notDirname.add walkOpt.notDirname.compileArray() + walkOptC.dirPath.add walkOpt.dirPath.compileArray() + walkOptC.notDirPath.add walkOpt.notDirPath.compileArray() for path in paths: if dirExists(path): for p in walkDirBasic(path, walkOptC): @@ -1030,8 +1112,10 @@ template processFileResult(pattern: Pattern; filename: string, proc run1Thread() = declareCompiledPatterns(searchOptC, SearchOptComp): compile1Pattern(searchOpt.pattern, searchOptC.pattern) - compile1Pattern(searchOpt.checkMatch, searchOptC.checkMatch) - compile1Pattern(searchOpt.checkNoMatch, searchOptC.checkNoMatch) + searchOptC.inFile.add searchOpt.inFile.compileArray() + searchOptC.notInFile.add searchOpt.notInFile.compileArray() + searchOptC.inContext.add searchOpt.inContext.compileArray() + searchOptC.notInContext.add searchOpt.notInContext.compileArray() if optPipe in options: processFileResult(searchOptC.pattern, "-", processFile(searchOptC, "-", @@ -1073,8 +1157,10 @@ proc worker(initSearchOpt: SearchOpt) {.thread.} = searchOpt = initSearchOpt # init thread-local var declareCompiledPatterns(searchOptC, SearchOptComp): compile1Pattern(searchOpt.pattern, searchOptC.pattern) - compile1Pattern(searchOpt.checkMatch, searchOptC.checkMatch) - compile1Pattern(searchOpt.checkNoMatch, searchOptC.checkNoMatch) + searchOptC.inFile.add searchOpt.inFile.compileArray() + searchOptC.notInFile.add 
searchOpt.notInFile.compileArray() + searchOptC.inContext.add searchOpt.inContext.compileArray() + searchOptC.notInContext.add searchOpt.notInContext.compileArray() while true: let (fileNo, filename) = searchRequestsChan.recv() var fileResult: FileResult @@ -1197,15 +1283,35 @@ for kind, key, val in getopt(): nWorkers = countProcessors() else: nWorkers = parseNonNegative(val, key) - of "ext": walkOpt.extensions.add val.split('|') - of "noext", "no-ext": walkOpt.skipExtensions.add val.split('|') - of "excludedir", "exclude-dir", "ed": walkOpt.excludeDir.add val - of "includedir", "include-dir", "id": walkOpt.includeDir.add val - of "includefile", "include-file", "if": walkOpt.includeFile.add val - of "excludefile", "exclude-file", "ef": walkOpt.excludeFile.add val - of "match": searchOpt.checkMatch = val - of "nomatch": - searchOpt.checkNoMatch = val + of "extensions", "ex", "ext": walkOpt.extensions.add val.split('|') + of "nextensions", "notextensions", "nex", "notex", + "noext", "no-ext": # 2 deprecated options + walkOpt.notExtensions.add val.split('|') + of "dirname", "di": + walkOpt.dirname.add val + of "ndirname", "notdirname", "ndi", "notdi", + "excludedir", "ed": # 2 deprecated options + walkOpt.notDirname.add val + of "dirpath", "dirp", + "includedir", "id": # 2 deprecated options + walkOpt.dirPath.add val + of "ndirpath", "notdirpath", "ndirp", "notdirp": + walkOpt.notDirPath.add val + of "filename", "fi", + "includefile", "include-file", "if": # 3 deprecated options + walkOpt.filename.add val + of "nfilename", "nfi", "notfilename", "notfi", + "excludefile", "exclude-file", "ef": # 3 deprecated options + walkOpt.notFilename.add val + of "infile", "inf", + "matchfile", "match", "mf": # 3 deprecated options + searchOpt.inFile.add val + of "ninfile", "notinfile", "ninf", "notinf", + "nomatchfile", "nomatch", "nf": # 3 options are deprecated + searchOpt.notInFile.add val + of "incontext", "inc": searchOpt.inContext.add val + of "nincontext", "notincontext", 
"ninc", "notinc": + searchOpt.notInContext.add val of "bin": case val of "on": searchOpt.checkBin = biOn