find /var/www/ -type f >> FullFileList.txt
I am trying to get some structure into a poorly sorted collection of 100 million html, jpg, flv, swf, doc, pdf etc. files which are spread across about a million subfolders.
find /var/www/ -type f >> FullFileList.txt
A whole run would take about 20 hours and produce about 100 million lines,
but for some reason it always fails somewhere around 60%, at a file size of about 2 GB.
I tried it about 10 times in a row.
The filesystem is XFS.
Maybe it's just one corrupted filename or a read error, but it doesn't always stop at exactly the same line.
Now I wonder if there is a way to continue,
and if not, whether there is another method to produce the same listing that can resume instead of restarting.
Thanks a lot in advance,
Jonas
Comments
An example would be:
# read paths from the locate database, keep only regular files
locate /var/www | while IFS= read -r FIL; do if [ -f "$FIL" ]; then echo "$FIL" >> ~/files.txt; fi; done
okay
Right, I had not tried this, thank you.
Question: will I be able to continue this in case it also fails at a certain point?
It will probably take until tomorrow before I can see/report the results.
The file seems to be growing more slowly than it did with find and is using fewer resources (maybe it's restricted by a setting?).
BTW, do you know of any open-source/free search-engine-like tool which full-text indexes the whole folder (100 million files, 1 million folders, 5000 GB) with high performance and makes the search available to website visitors?
After running for 2 hours it has produced only 75,000 lines, whereas find did 10 million lines in that time.
It is constantly using only 20 MB of RAM; maybe that's a setting somewhere?
To my knowledge, using a pre-built slocate database is the quickest method. What is slowing it down is the per-file verification (the [ -f ] test) in the loop; unfortunately, if you turn that off, directories will also appear in your output file.
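To illustrate that trade-off, a minimal sketch (not a command from this thread): dumping the locate database directly is fast, but directories end up in the list too.

# fast, but lists directories as well as files
locate /var/www > ~/files.txt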
Does anyone else know of a tool that will fit his needs?
I did not even reach that step, because I first have to create the database...
Getting the files out of the database, once it exists, would be no issue; even a customised regex query would only take minutes.
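For example, something like this (just a sketch; the pattern is only illustrative):

# pull all .pdf paths under /var/www out of the locate database
locate -r '^/var/www/.*\.pdf$' > ~/pdf_files.txt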
Tries so far: only one, and it was writing only 35,000 lines an hour before the process failed after 4 hours.
yes please
This just came to me: if this indexing is failing because of size, then maybe a database solution is the best bet. It may take a while, but it will be easy to search and use.
No, it stopped at only 2 to 3 GB.
There is a TB-sized image file on the same partition...
With that being the case, I agree with you that a corruption or disk I/O error is causing the issue, which will have to be repaired before you can complete the indexing operation.
On a side note, due to the massive number of files you are trying to index, a website search that goes through a text file can take a while. So I went ahead and made a MySQL schema and bash script that can be used for an initial indexing; it takes a while but can be beneficial. The benefits you get from database indexing are the ability to query your searches and to code the site to automatically add new files to, and remove deleted files from, the database.
Again, my script can take a while, but if you are interested I can post it here.
It would of course be nice to try your script!
I have personally never seen find or slocate error out and freeze due to a corruption (I have seen corruptions before and tested indexing of the corrupted files), and I am now questioning whether ls would even properly display the bad file. I know the tools do have a certain level of error detection and correction, but until we know exactly what is causing the error, no mechanism can be built to avoid it. Since you stated the indexing stops at roughly the same point, I would advise reviewing where it stopped and navigating to that location to see what file or filename is causing the failure.
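For example, a rough sketch (assuming the list is still going to FullFileList.txt): check the last path that made it into the output, then inspect that directory.

# show the last path written before the run died
tail -n 1 FullFileList.txt
# look at that directory for odd filenames or I/O errors
ls -la "$(dirname "$(tail -n 1 FullFileList.txt)")"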
I was now able to complete a file list with find.
Now that the file list is done, I am thinking more about:
creating the database,
and maybe storing text/html/css files within the database, and in the somewhat more distant future ultimately buying additional hard drives and doing a fast fulltext search...
About your script:
At first I thought you already had a completed one, but now I am pretty impressed by your offer to help with a script you are writing especially for this issue! It does not have to be working already (especially since part of the issue is gone), but I will in any case be curious to look through the code/ideas you have already written down.
Thanks, Joonas
As with anything else, this is my first attempt at this script and there may be better ways, but we have to start somewhere.
Create schema and tables
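Roughly along these lines (a sketch only; the database name fileindex, the table layout, and the credentials are just placeholders to adapt):

# create a database and a simple files table (names are placeholders)
mysql -u root -p <<'SQL'
CREATE DATABASE IF NOT EXISTS fileindex;
USE fileindex;
CREATE TABLE IF NOT EXISTS files (
    id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    dir  VARCHAR(1024) NOT NULL,
    name VARCHAR(255)  NOT NULL,
    KEY idx_name (name(191))
);
SQL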
Write the files and directories into the database
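For example, split each line of the find output into directory and filename and bulk-load it (again just a sketch; this assumes the list is named FullFileList.txt and that local_infile is enabled):

# split each path into directory and filename, tab-separated
awk '{
    n = split($0, parts, "/");
    name = parts[n];
    dir  = substr($0, 1, length($0) - length(name) - 1);
    printf "%s\t%s\n", dir, name;
}' FullFileList.txt > /tmp/files.tsv

# bulk-load the list into the files table
mysql --local-infile=1 -u root -p fileindex <<'SQL'
LOAD DATA LOCAL INFILE '/tmp/files.tsv'
INTO TABLE files
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(dir, name);
SQL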