Hi!
I'm running several crawls with the same seed file, and I noticed that Heritrix appends lines to this file, modifying it in place. Could this be avoided, for example by copying the seed file to the job directory and modifying that copy instead?
Seed-related configuration:
The problem is that the file started with 51451 lines and now has 1083095 after being reused maybe 20 times. This slows down initialization, but even worse, initialization differs from crawl to crawl. Some of my seeds redirect to another website, or to a specific resource on the same website (not only the common http-to-https redirect, which I guess is the reason this feature was implemented). Each redirect target is written to the seed file, and in the next crawl job that target may redirect again to yet another URL. So, in the end, my seed file keeps accumulating new seeds that I hadn't noticed before.
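As a stopgap, the copy-per-job idea can be approximated outside Heritrix. The following is only a minimal sketch of that workaround, not anything Heritrix provides: the function names, the `seeds.txt` file name, and the paths are all hypothetical, and it assumes the job can be pointed at the copied file. It keeps a pristine master seed list, copies it into the job directory before each crawl, and lists the lines a crawl added to the copy.

```python
import shutil
from pathlib import Path
from typing import List


def stage_seed_file(master: Path, job_dir: Path, name: str = "seeds.txt") -> Path:
    """Copy the pristine seed list into the job directory and return the copy's path.

    `name` is a hypothetical file name; use whatever the job is configured to read.
    """
    job_dir.mkdir(parents=True, exist_ok=True)
    target = job_dir / name
    # Overwrite any stale copy left behind by an earlier crawl.
    shutil.copyfile(master, target)
    return target


def added_lines(master: Path, used: Path) -> List[str]:
    """Return the lines present in the used seed file but not in the master copy."""
    original = set(master.read_text(encoding="utf-8").splitlines())
    return [
        line
        for line in used.read_text(encoding="utf-8").splitlines()
        if line not in original
    ]
```

Calling `stage_seed_file(Path("master-seeds.txt"), Path("jobs/my-job"))` before each launch would leave the master file untouched, and `added_lines` on the copy afterwards would show which seeds the crawl appended.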
Thank you!