Traversing Directories with Perl and Analyzing Logs with Linux Commands: A Code Example

2022-01-29 17:39:44

The example code is as follows:

#!/usr/bin/perl
use strict;
use warnings;

my $path        = '/root/Documents';              # Current working directory
my $dir         = "$path/images";                 # Directory to traverse
my $log_file    = "$path/access_201209.log";      # nginx log covering 09-03 to 09-07, about 5.4 GB
my $result_file = 'result.f';                     # File that receives the results

-f $log_file or die "Log file $log_file not found\n";

# Open the result file in append mode
open(my $output, '>>', $result_file) or die "Open file failed: $!\n";

find_dir($dir);

sub find_dir {
        my ($base_dir) = @_;    # First argument of the subroutine
        my $dh;
        if (!opendir($dh, $base_dir)) {
                warn "open dir $base_dir failed: $!\n";
                return;         # Skip unreadable directories instead of reading from an invalid handle
        }
        my @entries = readdir($dh);    # Read all entries at once ...
        closedir($dh);                 # ... and close the handle before recursing, so only one handle is open at a time

        $base_dir =~ s{/$}{};   # Remove a trailing / from the directory name
        foreach my $entry (@entries) {
                next if $entry =~ /^\./;    # Skip ., .. and hidden files

                my $full = "$base_dir/$entry";
                if (-d $full) {              # Recurse into subdirectories
                        find_dir($full);
                } elsif (-f $full) {         # Regular file
                        # Prefix a space so the search only matches paths that start at /images/
                        # (" /images/a.png" does not match theme_skin/blue/images/a.png)
                        my $this_file = " $full";
                        $this_file =~ s/\Q$path\E//;    # Strip the /root/Documents prefix

                        # Count the log lines that contain $this_file (grep -c, fixed-string match).
                        # The list form of open runs grep without a shell, so spaces or
                        # special characters in image names cannot break the command.
                        open(my $grep, '-|', 'grep', '-cF', '--', $this_file, $log_file)
                                or die "Cannot run grep: $!\n";
                        my $result = <$grep>;
                        close($grep);    # grep exits with status 1 when nothing matches; that is expected here
                        defined $result or die "grep produced no count for $this_file\n";
                        chomp $result;   # Remove the trailing newline from grep's output
                        print $output "$this_file : $result \n";    # Record "path : count"

                        # Delete each file once it has been recorded, so that if the script is
                        # stopped, the next run picks up where the previous one left off
                        unlink $full or warn "Cannot delete $full: $!\n";
                }
        }
}

close($output);
print "\nFinished\n";

# Next, open result.f and replace /images/ with images/, so that the paths are relative
# and the files can be deleted from the current working directory.
# Then use Linux commands to find the entries that were accessed 0 times in the 5 days and delete them.
# Match " 0 " with a space on each side, so that file names containing 0 are not matched.

# Method 1:
# gawk -F ':' '$2 ~ / 0 / {print $1}' result.f | xargs rm -f

# Method 2 (same result as method 1):
# grep ' 0 ' result.f | gawk -F ':' '{print $1}' | xargs rm -f
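The two shell pipelines above can also be written in Perl. The following is a minimal sketch that is not part of the original article: it assumes result.f uses exactly the " path : count " line format the script writes, and the script name cleanup_zero_hits.pl and the helper zero_hit_files are hypothetical. Instead of editing result.f by hand, it strips the leading / from each path, so it must be run from the directory that contains images/.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse lines written by the script above (" /images/a.png : 0 ")
# and return the paths whose access count is exactly 0, with the leading "/" removed
# so that they are relative to the current working directory ("images/a.png").
sub zero_hit_files {
    my @lines = @_;
    my @paths;
    foreach my $line (@lines) {
        # Take the count from the final ": <number>" so that a ":" inside a
        # file name does not break parsing
        next unless $line =~ /^\s*(.+?)\s*:\s*(\d+)\s*$/;
        my ($file, $count) = ($1, $2);
        next unless $count == 0;
        $file =~ s{^/}{};    # "/images/a.png" -> "images/a.png"
        push @paths, $file;
    }
    return @paths;
}

# Only touch the file system when a result file is given on the command line,
# e.g. "perl cleanup_zero_hits.pl result.f"
if (@ARGV) {
    my $result_file = $ARGV[0];
    open(my $in, '<', $result_file) or die "Cannot open $result_file: $!\n";
    my @lines = <$in>;
    close($in);
    foreach my $file (zero_hit_files(@lines)) {
        unlink $file or warn "Cannot delete $file: $!\n";
    }
}
```

Unlike the xargs pipelines, which split their input on whitespace, this passes each path to unlink directly, so image names that contain spaces are deleted correctly.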