zhv/cli/awk.md

7.3 KiB

general

Laenderschluessel

ISO 3166-2:DE

init

  • dload csv and check file
unzip -C ~/Downloads/zhv.zip -d ~/Downloads/
ls -ltha /home/begerad/Downloads/zHV_aktuell_csv.2023-08-04.csv
wc -l /home/begerad/Downloads/zHV_aktuell_csv.2023-08-04.csv
head -n2 ~/Downloads/zHV_aktuell_csv.2023-08-04.csv

awk

manual

  • print special global id
awk '/de:01001:104053/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv
  • print special state code
awk '/;"de:01/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-01.csv
awk '/;"de:02/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-02.csv
awk 'NR == 1 { print }; /;"de:03/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-ni.csv
awk 'NR == 1 { print }; /;"de:04/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-hb.csv
awk 'NR == 1 { print }; /;"de:03/{print}; /;"de:04/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-ni-hb.csv
awk '/;"de:05/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-05.csv
awk '/;"de:06/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-06.csv
awk '/;"de:07/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-07.csv
awk '/;"de:08/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-08.csv
awk '/;"de:09/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-09.csv
awk '/;"de:10/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-10.csv
awk '/;"de:11/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-11.csv
awk '/;"de:12/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-12.csv
awk '/;"de:13/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-13.csv
awk '/;"de:14/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-14.csv
awk '/;"de:15/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-15.csv
awk '/;"de:16/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-16.csv
  • result of wc -l de-?? is equal wc ~/Downloads/zHV_aktuell_csv.2023-08-04.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:01/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-01.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-01.csv 
12852 de-01.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:02/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-02.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-02.csv 
2344 de-02.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:03/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-03.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-03.csv 
59934 de-03.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:04/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-04.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-04.csv 
904 de-04.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:05/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-05.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-05.csv 
86254 de-05.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:06/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-06.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-06.csv 
28664 de-06.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:07/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-07.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-07.csv 
26331 de-07.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:08/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-08.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-08.csv 
9771 de-08.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:09/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-09.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-09.csv 
0 de-09.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:10/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-10.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-10.csv 
11230 de-10.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:11/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-11.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-11.csv 
0 de-11.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:12/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-12.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-12.csv 
3 de-12.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:13/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-13.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-13.csv 
8843 de-13.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:14/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-14.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-14.csv 
47796 de-14.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:15/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-15.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-15.csv 
9103 de-15.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ awk '/;"de:16/{print}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-16.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l de-16.csv 
14157 de-16.csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc ~/Downloads/zHV_aktuell_csv.2023-08-04.csv 
  318187   773416 54800119 /home/begerad/Downloads/zHV_aktuell_csv.2023-08-04.csv
  • print special fields of a special global id
cat ~/Downloads/zHV_aktuell_csv.2023-08-04.csv | awk -F';' '/de:01001:104053/{ print $1, $2, $3, $5, $6, $7, $17;}'
  • split by type
awk -F';' 'NR==1{h=$0; next};!seen[$2]++{f=$2".csv"; print h > f};{f=$2".csv"; print >> f; close(f)}' ~/Downloads/zHV_aktuell_csv.2023-08-04.csv

Caveat: This will not work if there are escaped commas in the first field. Commas in other fields should work fine. Explanation:

  • -F; (field separator) ensures that $1 etc. refer to the CSV columns rather than space separated values.

  • NR==1{h=$0; next} treats the first line specially (NR==1), by storing the full header line in a variable h (h=$0) and skipping the line (next).

  • !seen[$2]++{f=$2".csv"; print h > f} treats the first occurrence of any $2 specially (!seen[$2]) by storing $2 followed by .csv into a filename variable f and saving the header to that file (print h > f).

  • {f=$2".csv"; print >> f; close(f)} adds the current line to the file (print >> f) and closes the file descriptor (close(f)) to avoid keeping it around once processing of all lines with a specific ID is done.

  • result of wc -l ?.csv equals wc -l ~/Downloads/zHV_aktuell_csv.2023-08-04.csv > de-16.csv```

begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l \"A\".csv
49794 "A".csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l \"Q\".csv
85904 "Q".csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc -l \"S\".csv
182491 "S".csv
begerad@ltpt490:~/git/wtf/dancesWithCycles/zhv/awk$ wc ~/Downloads/zHV_aktuell_csv.2023-08-04.csv 
  318187   773416 54800119 /home/begerad/Downloads/zHV_aktuell_csv.2023-08-04.csv