Converts large Mercurial repositories to Git/LFS significantly faster by integrating
the LFS conversion into the history export process.
Currently, converting large repositories requires two sequential, long-running steps:
1. Full history conversion (`hg` to `git`).
2. Full history rewrite/import (`git lfs import`).
For huge monorepos (100GiB+, 1M+ files), this sequence can take hours or days.
This commit introduces a new plugin that converts the repository *incrementally*,
just in time (JIT). The plugin identifies large files during the initial `hg` to `git`
conversion and immediately writes LFS pointers in their place, eliminating the second,
time-consuming history rewrite step.
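As a rough illustration of what "writing an LFS pointer" means, here is a minimal
sketch that builds the pointer text for a blob. The function names and the size
threshold are hypothetical and are not taken from the plugin; the sketch also leaves
out storing the real content in the LFS object store, which a working conversion needs.

```python
import hashlib

# Version line used by Git LFS pointer files.
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"

# Hypothetical cut-off; the real plugin may select files differently.
SIZE_THRESHOLD = 10 * 1024 * 1024


def make_lfs_pointer(data):
    """Return the pointer file content (bytes) for the given blob."""
    oid = hashlib.sha256(data).hexdigest()
    text = "version %s\noid sha256:%s\nsize %d\n" % (LFS_SPEC_URL, oid, len(data))
    return text.encode("utf8")


def blob_for_git(data, threshold=SIZE_THRESHOLD):
    """Return the bytes to export: a pointer for large blobs, the data otherwise."""
    if len(data) >= threshold:
        return make_lfs_pointer(data)
    return data
```

With this sketch, `blob_for_git(b"small")` returns the data unchanged, while a blob at
or above the threshold is replaced by its three-line pointer.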
The current conversion process requires an empty target repository.
This rules out optimizations that depend on a commit already being present.
This change introduces the ability to pass a repository root commit hash.
This is necessary to support the immediate next commit (Incremental LFS conversion),
which uses a `.gitattributes` file and LFS pointers to bypass the slow, full-history
rewriting often required on large non-empty monorepos (100GiB+, 1M+ files).
The immediate benefit is allowing conversion to start when a non-empty repo
already contains an orphan commit, laying the groundwork for the optimized LFS
conversion feature.
When plugins are used in a repository that contains largefiles,
the following exception is thrown as soon as the first largefile
is converted:
```
Traceback (most recent call last):
  File "fast-export/hg-fast-export.py", line 728, in <module>
    sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
  File "fast-export/hg-fast-export.py", line 581, in hg2git
    c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
  File "fast-export/hg-fast-export.py", line 366, in export_commit
    export_file_contents(ctx,man,modified,hgtags,fn_encoding,plugins)
  File "fast-export/hg-fast-export.py", line 222, in export_file_contents
    file_data = {'filename':filename,'file_ctx':file_ctx,'data':d}
UnboundLocalError: local variable 'file_ctx' referenced before assignment
```
This commit fixes the error by:
* Initializing `file_ctx` before the largefile handling takes place
* Providing a new `is_largefile` value for plugins so they can detect
if largefile handling was applied (and therefore the `file_ctx`
object may no longer be in sync with the git version of the file)
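A plugin could use the new value roughly as follows. This is only a sketch: it
assumes `is_largefile` is delivered as a key of the `file_data` dict next to
`filename`, `file_ctx` and `data`, and the class name and size bookkeeping are
made up for illustration.

```python
class LargefileAwareFilter:
    """Hypothetical plugin filter that records the size of each exported file."""

    def __init__(self, args):
        self.sizes = {}

    def file_data_filter(self, file_data):
        name = file_data['filename']
        if file_data.get('is_largefile'):
            # file_ctx may be out of sync with what is written to git,
            # so rely only on the data that will actually be exported.
            self.sizes[name] = len(file_data['data'])
        else:
            # For regular files the Mercurial file context is trustworthy.
            self.sizes[name] = file_data['file_ctx'].size()


def build_filter(args):
    # Entry point in the style of the existing plugins.
    return LargefileAwareFilter(args)
```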
Encode the `name` parameter to bytes (using the utf8 codec).
This fixes the `TypeError` in subsequent concatenations in `get_branch`:
```
Traceback (most recent call last):
  # stack omitted for brevity
  File "C:\Dev\git-migration\fast-export\hg2git.py", line 73, in get_branch
    return origin_name + b'/' + name
TypeError: can only concatenate str (not "bytes") to str
```
The conversion is done unconditionally since the passed
parameter is currently always of type `str`.
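The failure and the fix can be reproduced in isolation. The helper below is a
simplified stand-in for the concatenation shown in the traceback, not the actual
code of `hg2git.py`, and the sample values are made up.

```python
def join_branch(origin_name, name):
    # Same expression as in the traceback: all operands must be bytes.
    return origin_name + b'/' + name


origin = "origin"  # a str, as passed in before the fix

try:
    join_branch(origin, b"default")
    failed = False
except TypeError:
    # str + bytes is rejected by Python 3
    failed = True

# Encoding once, up front, keeps every later concatenation in bytes.
fixed = join_branch(origin.encode('utf8'), b"default")
```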
When the length logic for fast-import 'data' commands was updated in
4c10270 (Fix data handling, 2023-03-02), one branch was missed, so
commit messages no longer have a final LF appended in most cases. This
changed the long-standing behavior, which had been consistent since the first
commit of hg2git, 9832035 (Initial import, 2007-03-06), and is expected
by some applications which compare against old conversions from
Mercurial.
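To make the length bookkeeping concrete, here is a small sketch of a fast-import
`data` command whose byte count includes the appended LF. `data_command` is a
hypothetical helper for illustration, not the function used in the exporter.

```python
def data_command(payload, append_lf=True):
    """Format a fast-import 'data' command for the given bytes payload."""
    if append_lf:
        # The LF is part of the payload, so it must be counted in the length.
        payload += b"\n"
    return b"data %d\n" % len(payload) + payload


with_lf = data_command(b"Fix bug")
without_lf = data_command(b"Fix bug", append_lf=False)
```

Here `with_lf` is `b"data 8\nFix bug\n"` and `without_lf` is `b"data 7\nFix bug"`;
the two differ both in the declared length and in the trailing byte, which is why
conversions made with and without the LF do not compare equal.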
The `file_data_filter` method should be called when files are deleted.
In this case the `data` and `file_ctx` keys map to None. This is so
that a filter which modifies file names can apply the same name
transformations before files are deleted.
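A renaming filter that copes with deletions might look like the sketch below. The
prefix logic and the names are invented for illustration; the only behavior taken
from this change is that `data` and `file_ctx` are None for a deleted file.

```python
class PrefixFilter:
    """Hypothetical filter that moves every file under a common directory."""

    def __init__(self, args):
        # Default prefix is an assumption made for this example.
        self.prefix = (args or 'imported/').encode('utf8')

    def file_data_filter(self, file_data):
        # Rename first, so deletions target the same path as earlier writes.
        file_data['filename'] = self.prefix + file_data['filename']
        if file_data['data'] is None:
            # Deleted file: there is no content (and no file_ctx) to process.
            return
        # Content changes apply only to files that still exist.
        file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')


def build_filter(args):
    return PrefixFilter(args)
```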
It's included as a module for a reason.
Also, use "$0" so the tests can be run like `./t/main.t` (or any other
directory).
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>