find /var/www/ -type f >> FullFileList.txt
joonas
Posts: 9
I am trying to get some structure into a poorly sorted collection of 100 million html, jpg, flv, swf, doc, pdf, etc. files, which are spread across about a million subfolders.
find /var/www/ -type f >> FullFileList.txt
A whole run would take about 20 hours and fill about 100 million lines,
but for some reason it always fails somewhere around 60%, at a file size of about 2 GB.
I tried it about 10 times in a row.
The filesystem is XFS.
Maybe it is just one corrupted filename or a read error, but it is not always exactly the same line where it stops.
Now I wonder if there is a way to continue,
or, if not, whether there is a completely different method to produce the same listing that can continue instead of restarting.
Thanks a lot in advance,
Jonas
Comments
Have you tried updating your slocate database using the "updatedb" command, then doing the search using locate to bypass the direct find function?
An example would be:
locate /var/www | while IFS= read -r FIL; do if [ -f "$FIL" ]; then echo "$FIL" >> ~/files.txt; fi; done

Hey,
okay.
Right, I did not try this, thank you.
updatedb --database-root /var/www --output /var/WWWMLOCATEDB
Question: will I be able to continue this in case it also fails at a certain point?
It will probably take until tomorrow before I can see and report the results.
The file seems to be growing more slowly than it did with find, and the process is taking fewer resources (maybe it is restricted by a setting?).
By the way, do you guys know any open source/free search-engine-like tool which full-text indexes the whole folder (100 million files, 1 million folders, 5000 GB) with high performance and makes the search available to website visitors?

updatedb seems too slow?
After running for 2 hours it produced only 75,000 lines, whereas find did 10 million lines in that time.
It is constantly using only 20 MB of RAM; maybe that is a setting somewhere?

I'm sorry, I did not realize the scope of your logging. With the quantity of files you have, I do not know of a program or method that can index the files any faster.
To my knowledge using a pre-built slocate database is the quickest method; what is slowing it down is the file verification in the find statement. Unfortunately, if you turn that off it will also display the directories in your output file.
Does anyone else know of a tool that will fit his needs?

Hey,
"To my knowledge using a pre-built slocate database is the quickest method; what is slowing it down is the file verification in the find statement. Unfortunately, if you turn that off it will also display the directories in your output file."
I did not even reach this step, because I first have to create the database...
Getting the files out of the database, if it were there, would be no issue; even a customised regex would only take minutes.
Tries so far:
find /var/www/ -type f >> FullFileList.txt
Writes about 5 million lines (files only) an hour and fails somewhere around 50 million lines, mostly at a similar size (maybe a read error, a corrupted filename, etc.).
updatedb --database-root /var/www --output /var/WWWMLOCATEDB
Only one try so far, but it wrote only 35,000 lines an hour and the process failed after 4 hours.
"Does anyone else know of a tool that will fit his needs?"

The failure may have been due to a file size limitation; have you reviewed the size of the output files?
This just came to me: if this indexing is failing because of size, then a database solution may be the best bet. It may take a while, but it will be easy to search and use.

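One way to test the file size theory is to check the shell's file size limit and to write the listing in chunks, so that no single output file gets anywhere near 2 GB. The following is only a sketch: the function name list_in_chunks and the chunk size are made up for illustration, and whether a size limit is actually the cause here is not established.

```shell
#!/bin/bash
# Sketch only: list_in_chunks is a hypothetical helper, not part of any tool.

# Show the per-process file size limit of this shell.
# "unlimited" means ulimit itself is not capping the output file.
ulimit -f

# Write the file listing in pieces of a fixed number of lines.
#   $1 = directory to scan
#   $2 = prefix for the chunk files (split appends aa, ab, ac, ...)
#   $3 = number of lines per chunk
list_in_chunks() {
    find "$1" -type f | split -l "$3" - "$2"
}
```

It could be called as list_in_chunks /var/www FullFileList.part. 5000000, which would produce FullFileList.part.aa, FullFileList.part.ab and so on. If the run still stops at about the same number of files, a size limit on the output file is unlikely to be the reason.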
mfillpot wrote:
The failure may have been due to a filesize limitation, have you reviewed the size of the output files?
This just came to me, if this indexing is failing because of size then maybe a database solution may be the best bet, it may take a while but it will be easy to search and use.
No, it stopped at only between 2 and 3 GB.
There is a TB-sized image file on the same partition...

joonas wrote:
no, its stoped between 2 and 3 GB only.
there is TB sized image file on the same partition...
That being the case, I agree with you that a corruption or disk I/O error is causing the issue, which will have to be repaired before you can complete the indexing operation.
On a side note, due to the massive number of files you are trying to index, a website search going through a text file can take a while. So I went ahead and made a MySQL schema and bash script that can be used for an introductory indexing; it takes a while but can be beneficial. The benefits you get from database indexing are being able to run searches as queries and being able to code the site to automatically add new files to, or remove deleted files from, the database.
Again, my script can take a while, but if you are interested I can post it here.

I still wonder if there isn't a way to make find more error resistant, or to just continue the file from a certain line.
It would of course be nice to try your script!

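find itself has no resume option, but the listing can be made restartable by scanning one top-level subdirectory at a time and recording which ones finished. The sketch below assumes the tree has many top-level subdirectories; the function name resumable_list and the file naming (root.list, d.NAME.list, d.NAME.done, errors.log) are invented for this example.

```shell
#!/bin/bash
# Sketch only: resumable_list and its file layout are hypothetical.
#   $1 = directory to scan
#   $2 = directory that receives the partial lists and "done" markers
resumable_list() {
    local root="$1" outdir="$2" dir name
    mkdir -p "$outdir"

    # Files that sit directly in the root (cheap, so redone on every run).
    find "$root" -maxdepth 1 -type f > "$outdir/root.list"

    # Note: the glob below does not match hidden (dot) directories.
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        dir=${dir%/}
        name=$(basename "$dir")

        # Skip subdirectories that completed in an earlier run.
        [ -e "$outdir/d.$name.done" ] && continue

        # Errors go to a log; the marker is only written when find
        # exits cleanly, so a failed subdirectory is retried next time.
        if find "$dir" -type f > "$outdir/d.$name.list" 2>> "$outdir/errors.log"; then
            touch "$outdir/d.$name.done"
        fi
    done
}
```

After a crash, running resumable_list /var/www /var/tmp/filelist again only rescans the subdirectories without a .done marker, and cat /var/tmp/filelist/*.list gives the combined listing. The output directory /var/tmp/filelist is just an example path.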
I will deliver my script and MySQL table creation scripts when I have some time available in the next few days.
I personally have never seen find or slocate error out and freeze due to a corruption (I have seen corruptions before and tested indexing of the corrupt files); I am now questioning whether ls would even properly display the bad file. I know the apps do have a certain level of error detection and correction, but until we know exactly what is causing the error, no mechanism can be built to avoid it. Since you stated that the indexing freezes at about the same point, I would advise reviewing where it stopped and navigating to that location to see which file or filename is causing the failure.

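A quick way to find that location is to look at the last path written before the run stopped. This is a minimal sketch; the function name last_listed is made up, and the last line may be an incomplete path if the process died in the middle of a write.

```shell
#!/bin/bash
# Sketch only: last_listed is a hypothetical helper.
#   $1 = listing file written by find
last_listed() {
    local list="$1" last
    last=$(tail -n 1 "$list")
    printf 'last entry: %s\n' "$last"
    printf 'directory:  %s\n' "$(dirname "$last")"
}
```

For example, last_listed FullFileList.txt prints the final entry and its directory. Running ls -la on that directory and checking the end of the dmesg output for I/O errors would be the next steps; find does not list entries in sorted order, so the problem file may be a neighbour of the last entry rather than the entry itself.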
Hello :)
I was now able to complete a file list with find.
Now that the file list is done, I am thinking more about:
creating the database,
and maybe storing the text/html/css files within the database, and, further in the future, buying additional hard drives and setting up a fast full-text search.
About your script:
At first I thought you already had a completed one, but now I am pretty impressed by your offer to help with a script you are making specially for this issue! It does not have to be working already (especially since part of the issue is gone), but I will be curious anyway to look through the code and ideas you have already written down.
Thanks, Joonas

It is good to see that your filesystem finally stabilized. Below are my script to set up the schema and tables, and another script to verify the files and place them into the database; it includes a verification mechanism so you can use it for updating the index.
As with anything else, this is my first attempt at this script and there may be better ways, but we have to start somewhere.
Create schema and tables
#!/bin/bash
#
# This script creates the mysql schema and tables for file indexing
USER="root"
PW="password"
DBNAME="fileindex"
mysql -u "$USER" --password="$PW" --execute="create database $DBNAME;"
mysql -u "$USER" --password="$PW" --database="$DBNAME" --execute="create table files (file_id int not null auto_increment primary key, dir_id int, filename varchar(100),verified bool);"
mysql -u "$USER" --password="$PW" --database="$DBNAME" --execute="create table dirs (dir_id int not null auto_increment primary key, dirname varchar(100),verified bool);"

Write the files and directories into the database
#!/bin/bash
# Index all files and directories from a directory into mysql
# Note: names containing a single quote (') will break the SQL below
USR="user"
PW="password"
DBNAME="fileindex"
BASELOC="/var/www"

# Run one SQL statement; results are printed without the column header
sql() {
    mysql -u "$USR" --password="$PW" --database="$DBNAME" --skip-column-names --execute="$1" </dev/null
}

# Mark all entries as unverified so the missing files will be tagged
sql "UPDATE dirs SET verified=false"
sql "UPDATE files SET verified=false"

# Start storing the directories (read line by line so names with spaces survive)
while IFS= read -r LOC
do
    OUT=$(sql "SELECT dir_id FROM dirs WHERE dirname='$LOC'")
    if [ -z "$OUT" ]; then
        sql "INSERT INTO dirs (dirname,verified) VALUES('$LOC',true)"
    else
        sql "UPDATE dirs SET verified=true WHERE dirname='$LOC'"
    fi
done < <(find "$BASELOC" -type d)

# Start storing the files
while IFS= read -r LOC
do
    DNAME=$(dirname "$LOC")
    LNAME=$(basename "$LOC")
    DNUM=$(sql "SELECT dir_id FROM dirs WHERE dirname='$DNAME'")
    # Look the file up by its base name, which is what gets stored
    OUT=$(sql "SELECT file_id FROM files WHERE filename='$LNAME' AND dir_id='$DNUM'")
    if [ -z "$OUT" ]; then
        sql "INSERT INTO files (dir_id,filename,verified) VALUES('$DNUM','$LNAME',true)"
    else
        sql "UPDATE files SET verified=true WHERE filename='$LNAME' AND dir_id='$DNUM'"
    fi
done < <(find "$BASELOC" -type f)

# Remove the missing files and directories
sql "DELETE FROM dirs WHERE verified=false"
sql "DELETE FROM files WHERE verified=false"