comm
Updated:
comm
is a unix tool that is useful for doing line diffs between two sorted files.
It let’s you view lines that are unique to file1
, lines unique to file2
, or lines that are in common (set intersection).
It has an invariant check that will warn when the files are not in sorted order. You’re going to want to use this to compare sets, without whipping out a python or node repl. With python:
- Read each file
- Produce two sets
file.readlines()
- Perform set operations
set1 - set2
to get lines unique toset1
set2 - set1
to get lines unique toset2
set1 & set2
to get lines that exist in bothset1 ^ set2
to get lines are not in common
Usage
# Only lines in common
comm -12 1.txt 2.txt
# Lines only found in 1.txt
comm -23 1.txt 2.txt
# Lines only found in 2.txt
comm -13 1.txt 2.txt