Deployment Tips - To other users #88

Open
PiotrZSL opened this issue Mar 19, 2020 · 11 comments

@PiotrZSL

I'm using Woboq on a big project, so here are some tips from me.

Woboq isn't perfect: it's slow, it lacks many features (global search, template support, ...), and it is no longer actively developed. BUT it works, and it speeds up development and issue resolution on older branches a lot.

Problem - Slow execution
Woboq is a single-threaded application, and don't waste time trying to make it multithreaded (clang itself doesn't like it). What I did, and what works for me: split compilation_database.json into (number of CPUs) parts and run woboq on each of them in parallel. Then merge the html files (pick the first one found), merge the index (merge + sort + uniq), and merge the refs (for every duplicated file, merge content + uniq + sort). This sped up the analysis from 9h to 1h (the merge takes some time, though). Everything is done on a RAM disk (/dev/shm) on a server with a sufficient amount of memory (this is important).
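
The split step described above could look roughly like the sketch below. It is only an illustration, not the script used in this post: the function names, the `part_N` directory layout and the round-robin distribution are assumptions, and it only relies on the compilation database being a JSON array of entries.

```python
import json
import os
from pathlib import Path


def split_compilation_database(entries, parts):
    """Distribute entries round-robin into at most `parts` non-empty lists."""
    chunks = [entries[i::parts] for i in range(parts)]
    return [chunk for chunk in chunks if chunk]


def write_chunks(db_path, out_dir, parts=None):
    """Write one smaller database per chunk under out_dir/part_N/ (hypothetical layout)."""
    parts = parts or os.cpu_count() or 1
    entries = json.loads(Path(db_path).read_text())
    written = []
    for n, chunk in enumerate(split_compilation_database(entries, parts)):
        part_dir = Path(out_dir) / f"part_{n}"
        part_dir.mkdir(parents=True, exist_ok=True)
        # Keep the original file name so the generator finds it as usual.
        target = part_dir / Path(db_path).name
        target.write_text(json.dumps(chunk, indent=2))
        written.append(target)
    return written
```

Each resulting file can then be fed to a separate generator process, with every process writing to its own output directory so the results can be merged afterwards.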

Problem - Lot of small files
The project I work on takes 22GB (in html files). Moving this to the server where it can be served to users takes a lot of time (zip/unzip), and I don't have the disk space there to keep all branches anyway.
First I tried NFS, but while it was fast enough for serving, updating was slow: removing 22GB from NFS takes a lot of time, and there were network issues. Now I'm changing to something different: squashfs. It takes 2 minutes to build a squashfs image from that 22GB folder (in RAM) on my 52-core server, and after compression it takes ~1GB. That is nice because now I can move it to another server and publish it to users (mount), and removing is also easy, since it's only one file to remove. That is important because I want users to see an up-to-date version every hour. It's still not perfect: the problem is the refs folder, which is > 128 MB and cannot be put into squashfs. So in python I split that folder into smaller ones based on crc32(ref file name) % 1000, and I did the same in javascript to update the request urls. Works perfectly.
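
As an illustration of the crc32(ref file name) % 1000 split, here is a minimal sketch. The function names and the `refs/<bucket>/<name>` layout are assumptions, not the original script.

```python
import zlib


def shard_for(ref_name, buckets=1000):
    """Return the bucket index for a refs file name: crc32(name) % buckets."""
    # zlib.crc32 returns an unsigned 32-bit value in Python 3.
    return zlib.crc32(ref_name.encode("utf-8")) % buckets


def sharded_path(ref_name, buckets=1000):
    """Build the relative path of a refs file inside its bucket (hypothetical layout)."""
    return f"refs/{shard_for(ref_name, buckets)}/{ref_name}"
```

The javascript side has to produce the same number for the same name, so its CRC32 must use the same polynomial and treat the result as unsigned (in javascript that usually means a final `>>> 0`) before taking the modulo.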

Problem - No global search
I solved this by deploying Hound (with a patch to support local folders) and updating the javascript and a rule in hound.conf. Now, on pressing Enter, I can search in the source files, which I also put (after the directory mapping is applied) into the output folder, so in Hound I can just add .html to get a working search.
From time to time Hound crashes (when files change), but anyway, this was a one-day job.

Problem - No blame
Because I change the output anyway, I decided to add support for blame. Don't waste time editing the existing html, it takes ages. Instead I create a new table in html and save it into a separate file; just make sure that the style is the same so that the rows will match. Then in javascript I load the blame file with ajax and just show it in the browser.

Hope this helps...

@gavinchou

Nice tips, thank you! @PiotrZSL

@kanihal

kanihal commented Nov 14, 2020

@PiotrZSL Can you please publish the code for search and blame enhancement that you have done?

@gavinchou

Here is my approach to global search, which hacks the search box:

  1. The woboq search box proxies the search request to a search backend service, just like the original engine does.
  2. Implement an HTTP service to answer search requests. It can be straightforward: my HTTP server just runs a command like "grep", "the-silver-searcher" or a shell script, the same way you would search code with command-line utilities, over the source or the generated HTML files, and "returns"* stdout/stderr to woboq.
  • * "return" actually means the browser reloads to the page that the HTTP server responds with.
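
A minimal sketch of such a service, using only the Python standard library and the system `grep`, might look like this. It is not the service from the linked fork: the `q` query parameter, the port, `SOURCE_ROOT` and all function names are assumptions made for the example.

```python
import html
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

SOURCE_ROOT = "."  # hypothetical: directory holding the sources or generated HTML


def run_search(query, root=SOURCE_ROOT):
    """Run a recursive fixed-string grep and return stdout followed by stderr."""
    if not query:
        return ""
    result = subprocess.run(
        ["grep", "-rnF", "--", query, root],
        capture_output=True, text=True, check=False,
    )
    return result.stdout + result.stderr


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the search term from ?q=... (assumed parameter name).
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = "<pre>" + html.escape(run_search(query)) + "</pre>"
        data = body.encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8080):
    """Start the search service on localhost (blocks until interrupted)."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

Calling `serve()` starts the service; the search box then only needs to send the browser to `http://127.0.0.1:8080/?q=<term>`. Passing the query as a separate argument after `--` keeps it from being interpreted as a grep option or by a shell.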

Check this out to see more: https://github.com/gavinchou/woboq_codebrowser commit 170e34f.

Please let me know if you need source code of the HTTP service. However, I bet you may want to write your own.

How well the search performs depends on how you implement the HTTP search service.

@PiotrZSL
Author

PiotrZSL commented Dec 7, 2020

I cannot share the source code, as there are some company copyrights involved. Anyway, I did something similar to gavinchou.
First I deployed Hound (https://github.com/hound-search/hound) as an independent search engine, on the same source code that woboq was run on. In the Hound configuration (indexing from disk needs a patch, which can be found in the merge requests) you can pass a url to the "web" view (= the woboq url). This works as the redirect from Hound to woboq. For woboq to Hound, I simply changed the search box so that it searches like normal while I type, but when I press Enter it redirects to Hound.
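
To make the Hound-to-woboq direction concrete, the mapping boils down to appending .html to the source path. The helper below is a hypothetical sketch; the `#<line>` anchor format is an assumption about the generated pages and should be checked against your own output.

```python
from urllib.parse import quote


def woboq_url(base_url, source_path, line=None):
    """Map a source file path to its generated woboq page by appending .html."""
    url = base_url.rstrip("/") + "/" + quote(source_path.lstrip("/")) + ".html"
    if line is not None:
        # Assumption: line anchors in the generated HTML are just the line number.
        url += f"#{line}"
    return url
```

For example, `woboq_url("http://host/code/", "src/main.cpp", 42)` gives `http://host/code/src/main.cpp.html#42`.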

As for blame, I have a separate script that generates, next to every .html file, a .blame file with just the author and commit id. Then in a single .json I keep the commit ids with their dates and messages for the "tip". Then, on page open, I load the blame file via javascript if it exists, and add a separate "div" with a "table" to show the blame. I moved some things from javascript to a lua script on the server.
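
A script of that kind could be sketched as follows, assuming the sources live in a git checkout. This is not the original script: the tab-separated .blame format and the function names are made up for the example; only `git blame --line-porcelain` and its output layout are standard git.

```python
import re
import subprocess

# A porcelain header line: 40-hex commit id, original line, final line, optional count.
HEADER = re.compile(r"^([0-9a-f]{40}) \d+ \d+")


def parse_line_porcelain(text):
    """Return one (commit, author) pair per source line from --line-porcelain output."""
    rows = []
    commit = author = None
    for line in text.split("\n"):
        if line.startswith("\t"):
            # A tab-prefixed line is the source line itself and ends one record.
            rows.append((commit, author))
        elif HEADER.match(line):
            commit = HEADER.match(line).group(1)
        elif line.startswith("author "):
            author = line[len("author "):]
    return rows


def write_blame(source_path, blame_path):
    """Run git blame on source_path and write 'commit<TAB>author' lines (hypothetical format)."""
    result = subprocess.run(
        ["git", "blame", "--line-porcelain", "--", source_path],
        capture_output=True, text=True, check=True,
    )
    with open(blame_path, "w", encoding="utf-8") as handle:
        for commit, author in parse_line_porcelain(result.stdout):
            handle.write(f"{commit}\t{author}\n")
```

Row N of the blame file then corresponds to line N of the source, which is what lets the javascript side render it as a table next to the code.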

Unfortunately I did a lot of this at work, for work, and that's where the copyright problem comes from...

@gavinchou

gavinchou commented Aug 25, 2021

To address the slow execution (single-threaded) problem, I fixed it with multiprocessing; check this commit for more details:
gavinchou@719e7a7

@ogoffart ogoffart removed their assignment Sep 27, 2023
@ogoffart
Contributor

One problem with a parallel build is that, since the code generator doesn't use a real database for its output, there could be corruption when several threads or processes try to write to the same files.
In order to implement multi output, we need to make sure to have file-level locking:

  • in general, hold a filesystem lock when writing to the files in refs
  • before starting to process a file (e.g., a header file), the generator currently checks whether the output html file already exists for this file. It should be changed to create an empty file for it at the same time, so that other processes don't try to process the same file (which would result in duplicated uses or other references)
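
The two points above can be sketched as follows. The generator itself is C++, so this Python version only illustrates the technique on a POSIX system; the function names are hypothetical and nothing here is taken from the codebrowser sources.

```python
import fcntl
import os


def claim_output(html_path):
    """Atomically create html_path; return True only for the process that created it."""
    try:
        # O_CREAT | O_EXCL makes "check that it is missing" and "create it" one atomic step.
        fd = os.open(html_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def append_ref(ref_path, line):
    """Append one line to a refs file while holding an exclusive advisory lock."""
    with open(ref_path, "a", encoding="utf-8") as handle:
        fcntl.flock(handle, fcntl.LOCK_EX)
        try:
            handle.write(line.rstrip("\n") + "\n")
            handle.flush()
        finally:
            fcntl.flock(handle, fcntl.LOCK_UN)
```

A process would call `claim_output()` before generating a page and skip the file when it returns False. Note that `flock` is advisory, so it only helps if every writer takes the lock.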

@Waqar144
Copy link
Collaborator

I tried filesystem locks initially, but it made the generation even slower than a single process (maybe my approach was bad), so I tried another way: each generator process appends a unique suffix to the file name before writing to it; the suffix is supplied via an environment variable when invoking the generator. After the generation of all files finishes, we can combine the output from all the processes into one, remove duplicates, etc. This has worked somewhat reliably so far, and the performance boost is quite big.
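
The combine step for line-based files (such as refs) could look like the sketch below. The `<base>.<suffix>` naming and the function name are assumptions for illustration, not what the actual script does.

```python
from collections import defaultdict
from pathlib import Path


def merge_suffixed(directory, suffixes):
    """Merge files named <base>.<suffix> into <base>, keeping unique sorted lines."""
    groups = defaultdict(list)
    for path in Path(directory).iterdir():
        for suffix in suffixes:
            ending = "." + suffix
            if path.is_file() and path.name.endswith(ending):
                groups[path.name[: -len(ending)]].append(path)
                break
    for base, parts in groups.items():
        lines = set()
        for part in parts:
            lines.update(part.read_text(encoding="utf-8").splitlines())
        merged = "".join(line + "\n" for line in sorted(lines))
        (Path(directory) / base).write_text(merged, encoding="utf-8")
        # Remove the per-process pieces once they are folded into the merged file.
        for part in parts:
            part.unlink()
    return sorted(groups)
```

Because no two processes ever write to the same file, no locking is needed during generation; the price is this extra pass at the end. For html pages, picking any one of the copies would be enough instead of merging lines.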

@gavinchou

@ogoffart You are right about the "concurrency issue" of the multiprocess approach. Not only stdout and stderr but also some output files may be written multiple times. I haven't fixed it yet. However, it's usable if the "concurrency" is set to 4 or 8.

@gavinchou

@Waqar144 Nice try, can you share both of your approaches to solving the multiprocess issue? I am wondering why the lock slows down the generation and how you process the suffix.

@Waqar144
Collaborator

Waqar144 commented Jan 1, 2024

It's a script available here: https://github.com/KDAB/codebrowser/blob/master/scripts/runner.py

The lock version is lost by now.

@gavinchou

@Waqar144 Thank you!


6 participants