148 Commits

Author SHA1 Message Date
Frej Drejhammar
c87b66ed7d Merge branch 'frej/fix-gh348' 2026-02-14 20:55:09 +01:00
Frej Drejhammar
76db75d963 Support Mercurial 7.2
In Mercurial 7.2 the iteritems() method of the branchmap has been
removed. Switch to iterating over the branches and then fetching heads
by the branchheads() method. In 7.2, a call to
mercurial.initialization.init() is also needed to process a repo using
the largefiles extension.

Thanks to Michael Cho (@cho-m) for suggesting the initialization fix.

Closes #348

Co-developed-by: Michael Cho <michael@michaelcho.dev>
2026-02-14 20:53:20 +01:00
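
The iteration change described in 76db75d963 can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `FakeBranchMap` is a hypothetical stand-in for the object returned by `repo.branchmap()`, and the sketch assumes that object can be iterated to yield branch names and exposes `branchheads(name)`, as the message describes.

```python
class FakeBranchMap:
    """Hypothetical stand-in for Mercurial's branchmap (assumption)."""

    def __init__(self, entries):
        self._entries = entries

    def __iter__(self):
        # Iterating yields branch names only, not (name, heads) pairs.
        return iter(self._entries)

    def branchheads(self, branch):
        return self._entries[branch]


def iter_branch_heads(branchmap):
    """Yield (name, heads) pairs without relying on iteritems()."""
    for name in branchmap:
        yield name, branchmap.branchheads(name)


bm = FakeBranchMap({b'default': [b'n1', b'n2'], b'stable': [b'n3']})
branch_heads = dict(iter_branch_heads(bm))
```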
Frej Drejhammar
5c9068a1f1 Merge branch 'frej/gh347' 2026-01-17 20:38:59 +01:00
Kévin Lévesque
42d1c89e73 Split plugin data example into multiple lines
The data example was previously a single line, making it difficult to read.
Replaced the one-liner with a multi-line format to improve clarity.
2026-01-12 16:33:37 -05:00
Kévin Lévesque
9d71921ed8 Allow incremental Git LFS conversion via plugin
Converts large Mercurial repositories to Git/LFS significantly faster by integrating
the LFS conversion into the history export process.

Currently, converting large repositories requires two sequential, long-running steps:
1. Full history conversion (`hg` to `git`).
2. Full history rewrite/import (`git lfs import`).

For huge monorepos (100GiB+, 1M+ files), this sequence can take hours or days.

This commit introduces a new plugin that allows the repository to be converted *incrementally*
(JIT: Just-In-Time). The plugin identifies large files during the initial `hg` to `git`
conversion and immediately writes LFS pointers, eliminating the need for the second,
time-consuming history rewrite step.
2026-01-12 16:33:20 -05:00
Kévin Lévesque
f6b72d248f Allow specifying a repository root commit for conversion
The current conversion process mandates an empty repository for a clean start.
This presents a barrier to performance optimization strategies.

This change introduces the ability to pass a repository root commit hash.

This is necessary to support the immediate next commit (Incremental LFS conversion),
which uses a `.gitattributes` file and LFS pointers to bypass the slow, full-history
rewriting often required on large non-empty monorepos (100GiB+, 1M+ files).

The immediate benefit is allowing conversion to start when a non-empty repo
already contains an orphan commit, laying the groundwork for the optimized LFS
conversion feature.
2025-12-22 10:11:33 -05:00
Frej Drejhammar
8e1ba281d4 Merge branch 'frej/gh341'
Closes #341
Closes #342
2025-08-16 15:18:34 +02:00
Günther Nußmüller
d77765a23e Fix UnboundLocalError with plugins and largefiles
When plugins are used in a repository that contains largefiles,
the following exception is thrown as soon as the first largefile
is converted:

```
Traceback (most recent call last):
  File "fast-export/hg-fast-export.py", line 728, in <module>
    sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
  File "fast-export/hg-fast-export.py", line 581, in hg2git
    c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
  File "fast-export/hg-fast-export.py", line 366, in export_commit
    export_file_contents(ctx,man,modified,hgtags,fn_encoding,plugins)
  File "fast-export/hg-fast-export.py", line 222, in export_file_contents
    file_data = {'filename':filename,'file_ctx':file_ctx,'data':d}
UnboundLocalError: local variable 'file_ctx' referenced before assignment
```

This commit fixes the error by:

 * Initializing the file_ctx before the largefile handling takes place
 * Providing a new `is_largefile` value for plugins so they can detect
    if largefile handling was applied (and therefore the file_ctx
    object may no longer be in sync with the git version of the file)
2025-08-11 08:30:17 +02:00
Frej Drejhammar
95459e5599 Merge branch 'gh/340'
Closes #340
2025-07-17 19:45:42 +02:00
Günther Nußmüller
de5c8d9d97 Remove redundant type check in set_default_branch
The passed `name` parameter is always of type `str` in Python 3,
hence the type check is redundant and no longer needed.
2025-07-16 14:44:43 +02:00
Günther Nußmüller
ad96531587 Fix TypeError when using the --origin option
Encode the `name` parameter to bytes (using the utf8 codec).

This fixes the `TypeError` in subsequent concatenations in `get_branch`:

```
Traceback (most recent call last):
  # stack omitted for brevity
  File "C:\Dev\git-migration\fast-export\hg2git.py", line 73, in get_branch
    return origin_name + b'/' + name
TypeError: can only concatenate str (not "bytes") to str
```

The conversion is done unconditionally since the passed
parameter is currently always of type `str`.
2025-07-16 14:41:45 +02:00
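
The failure and fix described in ad96531587 can be reproduced in isolation. The sketch below is an assumption-laden illustration: `qualify_branch` is a hypothetical helper that mirrors only the concatenation shown in the traceback, not the real `get_branch`.

```python
def qualify_branch(origin_name, name):
    # Mirrors the concatenation from the traceback: all parts must be bytes.
    return origin_name + b'/' + name


origin_from_cli = 'origin'  # command line parsers hand out str in Python 3

try:
    qualify_branch(origin_from_cli, b'master')
    failed = False
except TypeError:
    # str + bytes cannot be concatenated.
    failed = True

# Encoding the option value to bytes first makes the concatenation valid.
qualified = qualify_branch(origin_from_cli.encode('utf8'), b'master')
```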
Frej Drejhammar
4af9a33bd6 Merge branch 'frej/gh338' 2025-06-05 16:53:33 +02:00
Frej Drejhammar
f71385ec14 Fix "Warn if one of the marks, mapping, or heads files are empty"
The commit "Warn if one of the marks, mapping, or heads files are
empty" (7224e420a7) mixed up the state and heads caches and reported
that the heads cache was empty if the state cache was. Error found by
Shun-ichi Goto.

Closes #338
2025-06-05 16:50:56 +02:00
Frej Drejhammar
ae21cbf1a2 CI: Bump Ubuntu version used for the CI
Switch to the oldest supported version.
2025-06-05 16:50:37 +02:00
Frej Drejhammar
8762fee403 Merge branch 'gh/337' 2025-03-30 13:47:22 +02:00
Frank Zingsheim
bd707b5d6e Fix: Largefiles ignored #141
Import mercurial large files as ordinary files into git

The basic idea of this fix is based on
https://github.com/planestraveler/fast-export/tree/add-lfs-support-v2
from PR #65

Closes #141
2025-03-29 18:39:27 +01:00
Frej Drejhammar
0afd336d6f Merge branch 'gh/333' 2024-07-13 19:37:00 +02:00
Thalia Archibald
dd1c8f219b Disable core.ignoreCase in tests
When core.ignoreCase is set in the global config, hg-fast-export.sh
warns the user and exits. Override this for tests.
2024-07-06 02:46:07 -07:00
Thalia Archibald
f947189dcc Consistently terminate commit messages with LF
When the length logic for fast-import 'data' commands was updated in
4c10270 (Fix data handling, 2023-03-02), one branch was missed, so
commit messages now do not have a final LF appended in most cases. This
changed the longtime behavior, which had been consistent since the first
commit of hg2git, 9832035 (Initial import, 2007-03-06), and is expected
by some applications which compare against old conversions from
Mercurial.
2024-07-05 05:20:35 -07:00
Frej Drejhammar
2a3806576c Merge branch 'gh/328' 2024-04-07 15:30:23 +02:00
Frej Drejhammar
08e2297853 CI: Add tests to avoid a repeat of #328
Extend tests to cover the file content filter example plugins in order
to avoid a repeat of #328.
2024-04-07 15:25:04 +02:00
Frej Drejhammar
893d6302b7 Fix errors resulting from #318
When commit ddfc3a8300 ("Run file_data_filter on deleted files")
started calling the file_data_filter plugin method, in order to make
deletion of plugin-renamed files work, the example plugins were not
updated. This commit updates the example plugins to not crash when the
file context is None.

Thanks to @hetas for discovering this.

Closes #328
2024-04-07 15:23:08 +02:00
Frej Drejhammar
3de7bcfc18 CI: Remove run-tests script
The script should have been removed in 90c6ad5f87 ("test: use make
to run the tests").
2024-03-02 20:25:29 +01:00
Frej Drejhammar
d72e96b202 Drop manual CodeQL actions
Use default configuration as configured in the web interface instead
of hand-configured ci-actions which give warnings.
2024-02-23 18:11:10 +01:00
Frej Drejhammar
fb225c4700 Merge branch 'gh/321' 2024-02-23 17:07:02 +01:00
Frej Drejhammar
997e8e1a8c Merge branch 'gh/320'
Fixes warnings appearing with Python 3.12.

hg-fast-export.py:231: SyntaxWarning: invalid escape sequence '\.'
2024-02-23 17:04:28 +01:00
Stephan Hohe
ddb574004f Add tests for plugins setting file content to None 2024-02-23 13:43:28 +01:00
Stephan Hohe
e63feee1b9 Don't add file if plugin sets content to None 2024-02-20 17:07:23 +01:00
Stephan Hohe
7b4bb7ff1d Fix escape in regular expression 2024-02-19 23:40:05 +01:00
Frej Drejhammar
53bbe05278 Merge branch 'frej/gh318'
Closes #318
2024-02-16 17:56:17 +01:00
Frej Drejhammar
ddfc3a8300 Run file_data_filter on deleted files
The `file_data_filter` method should be called when files are deleted.
In this case the `data` and `file_ctx` keys map to None. This is so
that a filter which modifies file names can apply the same name
transformations before files are deleted.
2024-02-16 17:12:49 +01:00
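
A plugin that copes with the deletion case from ddfc3a8300 might look like the sketch below. The `build_filter`/`Filter` layout, the bytes filenames, and the prefix argument are assumptions made for illustration; only the `filename`, `file_ctx` and `data` keys, and the fact that the latter two are None on deletion, come from the commit messages.

```python
class Filter:
    def __init__(self, args):
        # Hypothetical plugin argument: a directory prefix for every file.
        self.prefix = args.encode('utf8')

    def file_data_filter(self, file_data):
        # Apply the rename unconditionally so that a later deletion
        # targets the same renamed path as the original addition.
        file_data['filename'] = self.prefix + b'/' + file_data['filename']
        if file_data['data'] is None:
            # Deleted file: there is no content (and no file_ctx) to touch.
            return
        file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')


def build_filter(args):
    return Filter(args)


flt = build_filter('sub')
deleted = {'filename': b'a.txt', 'file_ctx': None, 'data': None}
changed = {'filename': b'b.txt', 'file_ctx': object(), 'data': b'x\r\n'}
flt.file_data_filter(deleted)
flt.file_data_filter(changed)
```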
Frej Drejhammar
21ab3f347b Make plugin loader look in directories relative to cwd
Make the plugin loader also look for plugins using a path relative to
the current working directory.
2024-02-16 17:06:51 +01:00
Frej Drejhammar
878ba44f48 Merge branch 'frej/run-tests-with-different-python-versions' 2023-12-28 13:48:02 +01:00
Frej Drejhammar
2476d08517 Run tests with multiple Python versions
Run the CI tests with both the earliest supported Python version and
the latest stable release.

The intent is to quickly notice when new features require adjusting
the oldest supported Python version and also detect when the latest
stable version breaks old code (as when 3.12 removed `imp` and we
switched to `importlib` in #311).
2023-12-28 13:40:48 +01:00
Frej Drejhammar
d4298a0906 Check for a supported Python version on startup
Check that hg-fast-export is running on a supported version of Python
on startup. This is an attempt to avoid problems like #314 in the
future.
2023-12-28 13:40:48 +01:00
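
A startup check of the kind described in d4298a0906 could be as small as the following sketch. The helper name and `MIN_PYTHON` constant are hypothetical; the 3.7 minimum is taken from efe934e16b.

```python
import sys

MIN_PYTHON = (3, 7)  # minimum version per efe934e16b


def python_is_supported(version_info=None):
    """Return True if the given (or running) interpreter is new enough."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); patch level does not matter here.
    return tuple(version_info[:2]) >= MIN_PYTHON


if not python_is_supported():
    sys.exit('Python %d.%d or later is required' % MIN_PYTHON)
```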
Frej Drejhammar
efe934e16b Update required version of Python to 3.7
Due to problems with handling of Unicode input in Python < 3.7, bump
the required version of Python to 3.7.
2023-12-28 13:40:48 +01:00
Frej Drejhammar
59675eca22 Add command line flag to dump found versions
Add `--debug` command line flag which dumps the detected versions of
Mercurial and Python. This will probably help future debugging when
unexpected versions are used.
2023-12-28 13:40:48 +01:00
Frej Drejhammar
3c694243c4 Merge branch 'frej/fix-314' 2023-12-28 13:39:42 +01:00
Frej Drejhammar
1bbf7028b4 Don't look for a Python 2 interpreter
Don't look for a Python 2 interpreter as Python 2 is no longer
supported. If a Python 2 interpreter was available and had the Mercurial
modules installed, hg-fast-export would use it and fail to import
`importlib.machinery`. This is probably the cause of #314.

Closes #314.
2023-12-27 13:18:56 +01:00
Frej Drejhammar
c8fa290adf Merge branch 'PR/312' 2023-11-18 20:39:44 +01:00
Ekin Dursun
c49dd0cf60 Remove Python 2 compatibility code
Python 2 support was removed recently, so we don't need the
compatibility code anymore.
2023-11-18 20:22:18 +03:00
Frej Drejhammar
4f94d61d84 Merge branch 'PR/311'
Closes #311
2023-11-18 14:54:53 +01:00
Ekin Dursun
a3d0562737 Make pluginloader use importlib instead of imp
Python 3.12 has removed imp and it's recommended to use importlib
instead. Python 2.7 doesn't have importlib, so Python 2.7 support is
ceased (not a big deal since it's been more than 3 years since it was
EOLed) as a part of this change.
2023-11-12 20:41:43 +03:00
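
For readers unfamiliar with the replacement for `imp`, loading a module from a file path with `importlib` can be sketched as below. This shows one standard-library approach and is not necessarily the one the plugin loader uses; `load_module_from_path` and the demo plugin file are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Load a Python source file as a module without using imp."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo_plugin.py')
    with open(path, 'w') as f:
        f.write("def build_filter(args):\n    return 'filter(%s)' % args\n")
    plugin = load_module_from_path('demo_plugin', path)
    result = plugin.build_filter('x')
```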
Frej Drejhammar
0d0e90d328 Merge branch 'PR/305' into frej/felipec-pr-spree
Closes #305
2023-03-27 20:35:36 +02:00
Frej Drejhammar
64ee34dfb0 Merge branch 'PR/303' into frej/felipec-pr-spree
Closes #303
Closes #304
2023-03-27 20:34:17 +02:00
Frej Drejhammar
71834a584c Merge branch 'PR/302' into frej/felipec-pr-spree
Closes #302
2023-03-27 20:33:59 +02:00
Frej Drejhammar
4310e47760 Merge branch 'PR/301' into frej/felipec-pr-spree
Closes #301
2023-03-27 20:33:36 +02:00
Felipe Contreras
278cc9966c github: rename the main action to ci
As in: Continuous Integration.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-27 01:54:00 -06:00
Felipe Contreras
cf66c36a32 github: move CodeQL steps into the main action
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-27 01:53:49 -06:00
Felipe Contreras
269c23c5bb github: cleanup codeql action
Based on the latest walk-through: https://github.com/github/codeql-action.

Gets rid of the warning:

Warning: 1 issue was detected with this workflow: git checkout HEAD^2 is no longer necessary. Please remove this step as Code Scanning recommends analyzing the merge commit for best results.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-27 01:49:29 -06:00
Felipe Contreras
90c6ad5f87 test: use make to run the tests
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-26 20:05:03 -06:00
Felipe Contreras
51db3b4236 test: update default location of sharness
It's included as a module for a reason.

Also, use "$0" so the tests can be run like `./t/main.t` (or any other
directory).

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-26 20:04:38 -06:00
Felipe Contreras
fba03b95fb github: update checkout action
Gets rid of the warning:

Node.js 12 actions are deprecated. Please update the following actions to use Node.js 16: actions/checkout@v2. For more information see: https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-26 19:46:38 -06:00
Felipe Contreras
2cc7db7556 test: bump sharness to 1.2
It's finally released.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-26 19:43:04 -06:00
Frej Drejhammar
a89033b5b1 Merge branch 'PR/299' into frej/sharness-as-submodule-and-smoke-test
Closes #298
Closes #299
2023-03-26 18:40:52 +02:00
Frej Drejhammar
fd5bd48a6c Update codeql to version 2 2023-03-26 16:48:07 +02:00
Frej Drejhammar
84a877d112 Add smoke tests to CI test suite
The added test is an unpublished test, now ported to Sharness, which
has been used by the maintainer to sanity check PRs.
2023-03-26 16:48:07 +02:00
Frej Drejhammar
3f57c4340a Change CI to run tests using test runner 2023-03-24 18:46:53 +01:00
Frej Drejhammar
1e872eb235 Add primitive test runner 2023-03-24 18:11:37 +01:00
Frej Drejhammar
ecdbf0e42e Add Sharness as a submodule 2023-03-24 17:22:23 +01:00
Felipe Contreras
9754a9f3f6 Trivial simplification
Just return the values directly, no need to store them into variables.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
d2f11bd619 Remove multiple parent logic for file changes
This is already what repo.status does.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
3582221efd Compare changes only with the first parent
It's not necessary to check both parents.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
0ae0d20496 Remove no-op check
This code is only executed when there are two parents.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
e09a14a266 Move parents logic inside get_filechanges
This way export_commit is much simpler (already quite complex), and it's
easier to modify the logic.

No functional changes.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
9df2f97f6c Rename variables in get_filechanges
It's easier to understand this way.

No functional changes.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
531fa9b3a2 Simplify split_dict
There's no need to keep track of the left side: if it's modified it's
modified.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
a229b39d66 Coalesce modified files
Git doesn't care if they are added or changed: they are modified.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
c666fd9c95 Trivial style cleanup
Checking the array directly is more idiomatic.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
21fa443b4a Simplify list of files for the first commit
We already have the files.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-14 22:12:50 -06:00
Felipe Contreras
fd6ba361c6 github: enable tests
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-13 20:18:29 -06:00
Felipe Contreras
153ba2a5c1 Add main test
Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-13 20:18:29 -06:00
Frej Drejhammar
df5278f755 Merge branch 'PR/297'
Closes #297
2023-03-13 17:57:20 +01:00
Felipe Contreras
6fbe4d0ad0 Skip earlier
Now that we have ctx easily available, skip early.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-10 12:38:42 -06:00
Felipe Contreras
fa73d8dec9 Share the changectx more
It's used everywhere, might as well pass it along.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-10 12:38:30 -06:00
Felipe Contreras
e1e15b2091 Avoid revsymbol()
We can just do repo[rev].

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
534d2bdd92 Don't deal with the node in get_changeset()
It's not necessary.

It could be fetched with repo[rev].node(), but why bother?

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
23f41c0ff1 Use revision directly instead of revnode
We don't need the revnode.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
8b1fd408ca Use changectx directly
There's no need to call repo[revnode] when repo[rev] works perfectly
fine.

And since we have the context already we can just do ctx.hex() instead
of hexlifying ourselves.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
4a4d242e98 Fetch node directly
No need to call get_changeset() for that.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
432254100b Fetch branch names directly
No need to use get_changeset() for just one thing.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
5e4bc6eb03 Remove cruft
Nothing uses that variable.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Felipe Contreras
7886016978 hg2git: set proper default branch
So that cfg_master is picked up in get_branch().

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-09 19:48:44 -06:00
Frej Drejhammar
18577f559d Merge branch 'PR/296' 2023-03-04 20:21:29 +01:00
Felipe Contreras
88defe7fd1 README: cleanup initial instructions
The `git init` command can create the directory, and HEAD doesn't need
to be specified in `git checkout` (it's the default).

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-04 09:53:25 -06:00
Frej Drejhammar
4edea927fb Merge branch 'PR/295'
Closes #295
2023-03-04 16:12:26 +01:00
Felipe Contreras
bbab981130 Trivial simplification of wr
No need to issue two write commands.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-04 16:08:45 +01:00
Felipe Contreras
c3cbf1e04d Add wr_data helper
No functional changes.

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-03 19:34:29 -06:00
Felipe Contreras
4c10270302 Fix data handling
The length should be exactly the same as the data, for example if the
data is "hello" only 5 characters should be written on the stream. Thus
it should always be `len(data)`, not `len(data)+1` as it currently is in
some places.

Since the first commit of hg2git.py there was a wtf comment; presumably
Rocco was confused about this common discrepancy.

We can shuffle the logic around by adding '\n' to the data, and removing
+1 to the length.

Also, the data should be written without a newline (wr_no_nl).

Signed-off-by: Felipe Contreras <felipe.contreras@gmail.com>
2023-03-03 19:33:45 -06:00
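
The length rule from 4c10270302 can be illustrated with a small sketch of a `data` command writer. The helper name matches the `wr_data` mentioned in c3cbf1e04d, but the body here is an assumption, written against an in-memory stream rather than the real output.

```python
import io


def wr_data(out, data):
    # The count must equal the number of bytes that follow, exactly.
    out.write(b'data %d\n' % len(data))
    out.write(data)  # written as-is, with no extra newline


buf = io.BytesIO()
wr_data(buf, b'hello')
plain = buf.getvalue()

# If a trailing LF is wanted, it becomes part of the data and is counted.
buf = io.BytesIO()
wr_data(buf, b'hello' + b'\n')
with_lf = buf.getvalue()
```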
Frej Drejhammar
723d8032ba Merge branch 'PR/294' 2022-11-25 16:31:18 +01:00
df
268299a358 Fix typo in README
Added dash to match the actual usage of the 'ignore-unnamed-heads' option
2022-11-19 18:15:04 +01:00
Frej Drejhammar
6700b164d0 Merge branch 'PR/293'
Closes #292
2022-10-23 14:47:04 +02:00
chrisjbillington
13c273f10c Resolve unicode escape sequences not being processed correctly
In `process_unicode_escape_sequences()`, any backslash escape sequences
in the original string are escaped upon the first
`.encode('unicode-escape')` and therefore round-trip the sequence of
`.encode('unicode-escape').decode('unicode-escape')`.

That is not what we want - we want these sequences to be passed-through
the `.encode` unchanged, so that they will be converted to the
character they represent upon `.decode()`.

This patch changes the `.encode()` step to pass through any ascii
characters unchanged, only escaping non-ascii characters. This ensures
any existing backslash escape sequences will be interpreted as the
character they represent upon `.decode()`.
2022-10-23 11:51:33 +11:00
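
The round-trip problem from 13c273f10c is easy to demonstrate on plain strings. The sketch below uses the `backslashreplace` error handler as one way to escape only non-ascii characters; whether the patch uses exactly this call is an assumption, and both function names are hypothetical.

```python
def round_trip(s):
    # 'unicode-escape' also escapes existing backslashes, so any escape
    # sequence already in the text survives unchanged.
    return s.encode('unicode-escape').decode('unicode-escape')


def process_escapes(s):
    # Only non-ascii characters are escaped; existing backslash sequences
    # pass through and are interpreted by the decode step.
    return s.encode('ascii', 'backslashreplace').decode('unicode-escape')


# A literal e-acute followed by the six characters \u00e9.
text = 'caf\u00e9 \\u00e9'
unchanged = round_trip(text)
resolved = process_escapes(text)
```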
Frej Drejhammar
667404e836 Merge branch 'PR291' 2022-09-21 18:31:16 +02:00
Nicolas Vanhoren
38e236962d Update README.md to change recommendation for crlf filtering 2022-09-21 01:37:39 +02:00
Frej Drejhammar
dbb8158527 Merge branch 'frej/submodule-doc-improvement' 2022-02-10 20:05:07 +01:00
Frej Drejhammar
bb0bcda7ba Merge branch 'frej/fix-re-future-warning' 2022-02-10 20:04:14 +01:00
Frej Drejhammar
838b654614 Remove inconsistencies from submodule documentation
The submodule documentation is not consistent with regard to the
example directory structure. Update the example to be consistent.

Closes #277.
2022-02-09 15:58:48 +01:00
Frej Drejhammar
f179afce65 Fix FutureWarning about nested sets in re
Since Python 3.7 the re module warns for syntax which could, in the
future, be misparsed as a nested set. Avoid this by escaping the
literal `[` we search for in the regexp.

Reported by Monte Davidoff @mndavidoff

Closes #269.
2022-02-09 15:37:29 +01:00
Frej Drejhammar
5b7ca5aaec Give proper error message when refusing to overwrite existing branch
If fast-export was asked to export a Mercurial branch to Git and a
branch of the same name already existed in the Git repo but it was not
created by fast export, fast-export would crash while trying to format
an error message claiming that the destination branch was modified
behind its back.

This patch extends fast-export to detect the situation above and give
a proper error message which hopefully is less confusing to the user.

Credit for discovering the original crash goes to Shun-ichi Goto
<gotoh@taiyo.co.jp>.

Closes: #269.
2021-08-27 16:04:40 +02:00
Frej Drejhammar
4227621eed Update contribution guidelines and make github display them
Try to make it clear that sloppy, throw-it-over-the-fence patches
won't be accepted without revision and try to make sure a potential
contributor sees the warning while creating a pull request.
2021-07-29 15:28:01 +02:00
Frej Drejhammar
bdfc0c08c7 Merge branch 'frej/issue-258'
Closes #258
2021-02-26 16:44:31 +01:00
Frej Drejhammar
001749e69d Merge branch 'PR/260'
Closes #257
2021-02-26 16:40:12 +01:00
SirIntellegence
20c22a3110 Add plugin support for the 'extra' field
Permits plugins to import other information such as svn conversion revisions
2021-02-22 13:09:48 -07:00
Frej Drejhammar
f741bf39f2 bugfix: Avoid starting incremental conversions from scratch
Keys and values in the state cache are byte strings, therefore a
lookup of 'tip' will always fail. The failure makes the conversion
start over from the beginning, but as fast-export is deterministic the
results are the same, just very inefficient. The bug has existed since
the port to Python 3.

This patch switches the 'tip' lookup to use a byte string which should
make incremental conversions restart at the last converted commit. As
'x' == b'x' in Python 2, this should be a backwards compatible change.

Bug reported and fix suggested by Tomas Kolda.

Fixes #258.
2021-02-19 16:47:53 +01:00
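
The lookup failure behind f741bf39f2 comes down to `str` and `bytes` keys never comparing equal in Python 3. A minimal sketch, with a hypothetical cache value:

```python
# Keys and values in the state cache are byte strings.
state_cache = {b'tip': b'41'}  # the value is made up for illustration

missing = state_cache.get('tip')   # str key: never matches in Python 3
found = state_cache.get(b'tip')    # bytes key: matches
```

With the `str` key the lookup silently returns nothing, which is why the conversion restarted from the beginning instead of failing loudly.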
Frej Drejhammar
427663c766 Merge branch 'PR/254' 2021-01-10 15:18:28 +01:00
Ray Luo
056756f193 Remove some ".py" wording
Avoid confusion about which file is the main entry point to fast-export,
in order to avoid the issue mentioned here

https://github.com/frej/fast-export/issues/158#issuecomment-754482516

Also fix a typo
2021-01-09 02:06:52 -08:00
Frej Drejhammar
588e03bb23 Merge branch 'PR/251' 2020-11-15 15:34:27 +01:00
Jason Winnebeck
89da4ad8af Document --ignore-unnamed-heads option 2020-11-14 21:24:54 -05:00
Frej Drejhammar
b0d5e56c8d Merge branch 'PR/247' 2020-10-29 19:01:04 +01:00
Frej Drejhammar
787e8559b9 Fix typo in README 2020-10-29 19:00:30 +01:00
Henrik Tunedal
ab500a24a7 Add plugin for dropping commits from output 2020-10-29 12:04:27 +01:00
Frej Drejhammar
ead75895b0 Enable code analysis
Merge github generated workflow into master
2020-10-10 16:26:53 +02:00
Frej Drejhammar
bf5f14ddab Create codeql-analysis.yml 2020-10-10 13:15:54 +00:00
Frej Drejhammar
7057ce2c2b Allow plugins to modify the committer
Since they were introduced, plugins have been able to modify the author
of a commit, but not the committer. This patch adds the necessary
support for allowing them to also modify the committer.
2020-09-30 17:47:33 +02:00
Frej Drejhammar
2b6f735b8c Update section about submitting patches in README
Try to cover the most common reasons for requesting changes in PRs.
2020-09-09 14:08:00 +02:00
Frej Drejhammar
71acb42a09 Merge branch 'PR/236-v2' into master
Implement a plugin converting unnamed heads to branches
2020-07-31 17:08:04 +02:00
Ondrej Stanek
a7955bc49b Update head2branch plugin to accept hg commit hash
The revision number isn't a unique identifier of commits across
repository clones and forks, while the hg hash is guaranteed to be stable.
2020-07-31 10:50:57 +02:00
Ondrej Stanek
9c6dea9fd4 Pass original hg commit hash to plugins 2020-07-31 10:50:51 +02:00
Ethan Furman
21827a53f7 Add head2branch plugin
Support converting unnamed heads to named branches during mercurial
conversions.

Co-Authored-By: ostan89@gmail.com
2020-07-31 10:49:08 +02:00
Ethan Furman
5c1cbf82b0 Add revision to commit_data for commit plugins
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:48:33 +02:00
Ondrej Stanek
50631c4b34 Add option --ignore-unnamed-heads
This option allows the user to ignore only unnamed heads (compared to --force
which ignores all non-fatal issues). The intended use is for a future plugin
converting unnamed heads to named branches.
2020-07-31 10:30:53 +02:00
Ethan Furman
2a9dd53d14 Show all unnamed heads at once
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:27:07 +02:00
Frej Drejhammar
597093eaf1 Merge branch 'fix-233'
Closes #233
2020-07-10 16:52:17 +02:00
Frej Drejhammar
3910044a97 Avoid crash during rev-parse when the default encoding is ascii
In some locales the default encoding is ascii in which case
subprocess.check_output() will fail if it is given a non-ascii ref as
one of the arguments. By forcing the ref to be utf8 we will avoid a
crash while still behaving correctly when the default encoding is
utf8.

The credits for this fix go to Nikita Bazhinov for discovering the fix
and Chris J Billington for explaining it.

Co-Authored-By: Nikita Bazhinov <nbazhinov@syntellect.ru>
Co-Authored-By: Chris J Billington <chrisjbillington@gmail.com>
2020-07-10 16:41:38 +02:00
Frej Drejhammar
44c50d0fae Merge branch 'PR/226' 2020-05-07 20:10:24 +02:00
chrisjbillington
d29d30363b Fix backward incompatible change for hg < 5.1
The port to Python 3 in b961f146 changed `repo.branchmap().iteritems()`
to use `.items()` instead. However, the object returned by mercurial
isn't a dictionary and its `.items()` method was only introduced (as an
alias for `iteritems`) in hg 5.1. `iteritems()` still exists, so let's
keep using it for now to retain compatibility with hg < 5.1.
2020-05-06 11:59:49 -04:00
Frej Drejhammar
f102d2a69f Merge branch 'PR/223'
Closes #223
2020-05-06 16:31:13 +02:00
Ondrej Stanek
cf0e5837b6 Allow converting a repository with git and hg subrepos
In the verification phase, fast-export falsely expects that both hg
and git subrepositories should have the appropriate line in the
subrepo-map file. In fact, only hg subrepos need a line in
subrepo-map that references a converted subrepo, while git
subrepositories do not.
2020-05-06 16:30:05 +02:00
Frej Drejhammar
61d22307af Merge branch 'PR/217'
Closes: #215
2020-03-26 20:17:20 +01:00
chrisjbillington
3b3f86b71e Allow utf8 in mappings
We were previously processing entries in mapping files (when
`--mappings-are-raw` is not given) with
`.decode('unicode_escape').encode('utf8')` to replace backslash escape
sequences in bytestrings with the utf-8 encoded characters they
represent. However, it turns out that `.decode('unicode_escape')`
assumes latin-1 encoding if it encounters non-ascii
bytes: https://bugs.python.org/issue21331. So this gave incorrect
results if non-ascii utf8 data was present in the mapping.

To fix this, we now add an extra layer of
`.decode('utf8').encode('unicode-escape')` in order to convert any
non-ascii characters into their backslash escape sequences. Then the subsequent
`.decode('unicode_escape')` only encounters ascii characters and gives
correct results.
2020-03-25 12:33:42 -04:00
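
The latin-1 behaviour that 3b3f86b71e works around can be seen with a single non-ascii character. This is a sketch of the effect described in the message, using only standard codecs; the variable names are illustrative.

```python
raw = '\u00e9'.encode('utf8')  # e-acute as two UTF-8 bytes: b'\xc3\xa9'

# Decoding the UTF-8 bytes directly treats each byte as latin-1.
wrong = raw.decode('unicode_escape')

# Escaping non-ascii characters first leaves only ascii for the decode.
fixed = (raw.decode('utf8')
            .encode('unicode-escape')
            .decode('unicode_escape'))
```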
Frej Drejhammar
e51844cd65 Merge branch 'PR/214'
Closes: #213
2020-03-25 16:09:01 +01:00
Toni Sissala
90eeef2ff4 Fix TypeError when using -M command line argument
hg-fast-export.sanitize_name expects branch name to be a bytes
object. Command line parser gives out str objects. Convert
possible str object to bytes in hg2git.set_default_branch().
2020-03-25 11:19:25 +02:00
Frej Drejhammar
7f4d9c3ad4 Merge branch 'PR/211' 2020-03-10 17:51:47 +01:00
Pi Delport
b37420f404 Fix link markup for hg-export-tool 2020-03-09 16:41:26 +02:00
Frej Drejhammar
f2aa47fdf7 Merge branch 'PR/210'
Closes #210.
2020-03-08 19:43:23 +01:00
chrisjbillington
6361b44c33 Fix bug in ignoring .git files/folders on Windows
Mercurial internally stores (most) filepaths using forward slashes, and
returns them as such from its Python API, even on Windows.

So the splitting up of filepaths with `os.path.sep` was incorrect,
resulting in `.git` files (those within a subdirectory, anyway)
not being ignored on Windows as intended. Splitting on `b'/'` regardless
of OS fixes this.
2020-03-08 19:40:50 +01:00
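
The path-splitting point in 6361b44c33 can be sketched as follows. `has_git_component` is a hypothetical helper, not the function in the tool; it only shows why the separator must be `b'/'` regardless of platform.

```python
def has_git_component(path):
    # Mercurial reports paths with forward slashes on every platform.
    return b'.git' in path.split(b'/')


path = b'sub/.git/config'

# Splitting on the Windows separator leaves the path in one piece,
# so the .git component is never seen.
windows_style = path.split(b'\\')
```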
Frej Drejhammar
afeb58ae95 Merge branch 'PR/209' 2020-03-06 17:30:52 +01:00
chrisjbillington
48508ee299 Fix failure to print error message in verify_heads
On Python 3, `b'%s' % None` fails with a TypeError. In verify_heads,
an error message prints the sha1 of a git commit, but that sha1
can be None.

This commit instead prints `b'<None>'` if sha1 is None.
2020-03-06 11:02:38 -05:00
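
The formatting failure from 48508ee299 and the substitution it describes can be shown in a few lines. `format_sha1` is a hypothetical helper name used only for this sketch.

```python
def format_sha1(sha1):
    # bytes %-formatting with %s needs a bytes-like value, so None
    # is replaced by a printable placeholder.
    return sha1 if sha1 is not None else b'<None>'


try:
    b'sha1: %s' % None
    raised = False
except TypeError:
    raised = True

message = b'sha1: %s' % format_sha1(None)
```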
Frej Drejhammar
56da62847a Merge branch 'PR/208'
Closes #207.
2020-03-01 14:34:38 +01:00
Max Fuqua
750fe6d3e1 Resolve type error resulting from passing an int to b'%s' in python3 2020-02-29 14:55:15 -05:00
Frej Drejhammar
e4d6d433ec Merge branch 'PR/206' 2020-02-29 14:48:46 +01:00
Steven Peters
058c791b75 Check python's mercurial version for compatibility
When checking that python has the mercurial package in hg-fast-export.sh,
use the same import statement that is used in hg-fast-export.py.

hg-fast-export.py imports revsymbol from mercurial.scmutil,
which was introduced in mercurial 4.6, but Ubuntu 18.04 only has
mercurial 4.5.3 using python2, so an incompatible python version may be
chosen without this change.
2020-02-28 15:41:24 -08:00
Frej Drejhammar
13010f7a25 Merge branch 'PR/204'
Closes #203.
2020-02-21 16:34:03 +01:00
chrisjbillington
4071f720b0 Fix issue #203: Resolve stderr encoding issues
In Python 3, `sys.stderr.write()` requires unicode strings, and all
output on standard streams is UTF8 encoded. Therefore in the port to
Python 3, we `.decode()`d all strings that are used in `%` formatting of
strings to be printed to stderr.

However, in Python 2, `sys.stderr` accepts either bytestrings or unicode
strings, and:

- `%s` formatting of a bytestring with a unicode string, i.e.
  `"%s" % u"foo"`, results in a unicode string.
- Writing a unicode string to stderr/stdout uses that stream's encoding
- When the output of the process is being piped somewhere other than a
  terminal (as it is when called with pipes and shell redirection from
  hg-fast-export.sh), that encoding is None, which implies ASCII.
- This raises UnicodeEncodeError if the unicode strings passed to
  `stderr.write()` have non-ascii characters.

We cannot fix this problem simply by encoding UTF8 again before writing
to stderr on Python 2. This is because the *decoding* of filenames with
the UTF8 codec may fail - filenames may not even be valid UTF8 despite
this being the declared filesystem encoding.

We could `fsdecode()` filenames on Python 3, which would use the
`surrogateescape` error handler, but stderr does not use this error
handler for output, meaning we would just have to encode again (with the
same error handler) anyway. And Python 2 lacks the `surrogateescape`
error handler in any case - we would need to reimplement it just to do a
round-trip decode and encode for no reason.

This commit leaves filenames and other repository data as bytestrings,
and simply writes them to `sys.stderr.buffer` on Python 3 or
`sys.stderr` on Python 2 as-is, after `%` formatting with bytestring
literals. This avoids encoding issues of filenames altogether.

Other writing to stderr that does not involve repository data has been
left with "native" strings, i.e.
`sys.stderr.write("a string literal %s" % a_command_line_arg)`. These
will still fail on Python 3 if the user passes a non-UTF filename as a
command line argument or similar. This is acceptable IMHO - although
`hg-fast-export` may encounter invalid UTF8 in mercurial repositories,
it is not too much to impose that the user name their branch mapping
files etc with valid UTF8!
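
A minimal sketch of the approach, using an in-memory `io.BytesIO` as a stand-in for `sys.stderr.buffer` so it can run anywhere. The `warn` helper and the sample filename are assumptions for illustration:

```python
import io


def warn(stream, fmt, *args):
    """Format a bytestring message and write it to a binary stream as-is."""
    stream.write(fmt % args)


# Latin-1 encoded name: these bytes are not valid UTF8.
filename = b'caf\xe9.txt'

# Decoding repository data can fail, which is why it is avoided.
try:
    filename.decode('utf8')
    decodable = True
except UnicodeDecodeError:
    decodable = False

# Writing the raw bytes never needs a codec.
fake_stderr = io.BytesIO()
warn(fake_stderr, b'Warning: skipping %s\n', filename)
```

In real use on Python 3 the stream would be `sys.stderr.buffer`. On Python 2, `sys.stderr` itself accepts bytestrings.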
2020-02-19 12:18:00 -05:00
Frej Drejhammar
160aa3c9ef Add a reference to hg-export-tool in the documentation
Add pointers to hg-export-tool as a way to batch convert multiple
Mercurial repos, and deal with duplicate heads.
2020-02-14 17:16:18 +01:00
Frej Drejhammar
883474184d Merge branch 'PR/201'
Closes #201.
2020-02-14 17:01:35 +01:00
chrisjbillington
b961f146df Support Python 3
Port hg-fast-export to Python 2/3 polyglot code.

Since mercurial accepts and returns bytestrings for all repository data,
the approach I've taken here is to use bytestrings throughout the
hg-fast-export code. All strings pertaining to repository data are
bytestrings. This means the code is using the same string datatype for
this data on Python 3 as it did (and still does) on Python 2.

Repository data coming from subprocess calls to git, or read from files,
is also left as the bytestrings either returned from
subprocess.check_output or as read from the file in 'rb' mode.

Regexes and string literals that are used with repository data have
all had a b'' prefix added.

When repository data is used in error/warning messages, it is decoded
with the UTF8 codec for printing.

With this patch, hg-fast-export.py writes binary output to
sys.stdout.buffer on Python 3 - on Python 2 this doesn't exist and it
still uses sys.stdout.

The only strings that are left as "native" strings and not coerced to
bytestrings are filepaths passed in on the command line, and dictionary
keys for internal data structures used by hg-fast-export.py, that do
not originate in repository data.

Mapping files are read in 'rb' mode, and thus bytestrings are read from
them. When an encoding is given, their contents are decoded with that
encoding, but then immediately encoded again with UTF8 and they are
returned as the resulting bytestrings.

Other necessary changes were:

 - indexing bytestrings with a single index returns an integer on Python 3.
   These indexing operations have been replaced with a one-element
   slice: x[0] -> x[0:1] or x[-1] -> x[-1:] so as to return a bytestring.

 - raw_hash.encode('hex_codec') replaced with binascii.hexlify(raw_hash)

 - str(integer) -> b'%d' % integer

 - 'string_escape' codec replaced with 'unicode_escape' (which was
    backported to python 2.7). Strings decoded with this codec were then
    immediately re-encoded with UTF8.

 - Calls to map() intended to execute their contents immediately were
   unwrapped or converted to list comprehensions, since on Python 3 map()
   returns a lazy iterator and does not execute until iterated over.

hg-fast-export.sh has been modified to not require Python 2. Instead, if
PYTHON has not been defined, it checks python2, python, then python3,
and uses the first one that exists and can import the mercurial module.
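
The porting idioms listed above can be illustrated with a short Python 3 sketch. The variable names and sample values are made up for illustration and do not come from the patch:

```python
import binascii

data = b'hello'

# Indexing bytes yields an integer on Python 3; slicing yields bytes.
as_int = data[0]        # 104
first = data[0:1]       # b'h'
last = data[-1:]        # b'o'

# Replacement for raw_hash.encode('hex_codec').
raw_hash = b'\x12\xab'
hex_hash = binascii.hexlify(raw_hash)   # b'12ab'

# Replacement for str(integer) when a bytestring is needed.
mark = b':%d' % 42      # b':42'

# 'unicode_escape' decoding followed by re-encoding as UTF8.
escaped = b'tab\\there'
unescaped = escaped.decode('unicode_escape').encode('utf8')  # b'tab\there'

# map() is lazy on Python 3: nothing is appended until it is iterated.
seen = []
lazy = map(seen.append, [1, 2, 3])
seen_before_loop = list(seen)           # []
for item in [1, 2, 3]:                  # an explicit loop runs immediately
    seen.append(item)
```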
2020-02-13 14:35:19 -05:00
50 changed files with 2873 additions and 341 deletions

28
.github/contributing.md vendored Normal file

@@ -0,0 +1,28 @@
When submitting a patch make sure the commits in your pull request:
* Have good commit messages
Please read Chris Beams' blog post [How to Write a Git Commit
Message](https://chris.beams.io/posts/git-commit/) on how to write a
good commit message. Although the article recommends at most 50
characters for the subject, up to 72 characters are frequently
accepted for fast-export.
* Adhere to good [commit
hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)
When developing a pull request for hg-fast-export, base your work on
the current `master` branch and rebase your work if it no longer can
be merged into the current `master` without conflicts. Never merge
`master` into your development branch, rebase if your work needs
updates from `master`.
When a pull request is modified due to review feedback, please
incorporate the changes into the proper commit. A good reference on
how to modify history is in the [Pro Git book, Section
7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.

4
.github/requirements-earliest.txt vendored Normal file

@@ -0,0 +1,4 @@
mercurial==5.2
# Required for git_lfs_importer plugin
pathspec==0.11.2

4
.github/requirements-latest.txt vendored Normal file

@@ -0,0 +1,4 @@
mercurial
# Required for git_lfs_importer plugin
pathspec==0.12.1

71
.github/workflows/ci.yml vendored Normal file

@@ -0,0 +1,71 @@
name: CI
on:
push:
branches: [master]
pull_request:
# The branches below must be a subset of the branches above
branches: [master]
jobs:
test-earliest:
name: Run test suite on the earliest supported Python version
runs-on: ubuntu-22.04
steps:
- uses: actions/checkout@v4
name: Checkout repository
with:
fetch-depth: 1
submodules: 'recursive'
- uses: actions/setup-python@v5
id: earliest
with:
python-version: '3.7.x'
check-latest: true
cache: 'pip'
cache-dependency-path: '**/requirements-earliest.txt'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r .github/requirements-earliest.txt
- name: Report selected versions
run: |
echo Selected '${{ steps.earliest.outputs.python-version }}'
./hg-fast-export.sh --debug
- name: Run tests on earliest supported Python version
run: make -C t
test-latest:
name: Run test suite on the latest supported python version
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
name: Checkout repository
with:
fetch-depth: 1
submodules: 'recursive'
- uses: actions/setup-python@v5
id: latest
with:
python-version: '3.x'
check-latest: true
cache: 'pip'
cache-dependency-path: '**/requirements-latest.txt'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install -r .github/requirements-latest.txt
- name: Report selected version
run: |
echo Selected '${{ steps.latest.outputs.python-version }}'
./hg-fast-export.sh --debug
- name: Run tests on 3.x
run: make -C t

3
.gitmodules vendored Normal file

@@ -0,0 +1,3 @@
[submodule "t/sharness"]
path = t/sharness
url = https://github.com/felipec/sharness.git

View File

@@ -27,10 +27,10 @@ command line option.
## Example ## Example
Example mercurial repo folder structure (~/mercurial): Example mercurial repo folder structure (~/mercurial) containing two subrepos:
src/... src/...
subrepo/subrepo1 subrepos/subrepo1
subrepo/subrepo2 subrepos/subrepo2
### Setup ### Setup
Create an empty new folder where all the converted git modules will be imported: Create an empty new folder where all the converted git modules will be imported:
@@ -41,18 +41,18 @@ Create an empty new folder where all the converted git modules will be imported:
mkdir submodule1 mkdir submodule1
cd submodule1 cd submodule1
git init git init
hg-fast-export.sh -r ~/mercurial/subrepo1 hg-fast-export.sh -r ~/mercurial/subrepos/subrepo1
cd .. cd ..
mkdir submodule2 mkdir submodule2
cd submodule2 cd submodule2
git init git init
hg-fast-export.sh -r ~/mercurial/subrepo2 hg-fast-export.sh -r ~/mercurial/subrepos/subrepo2
### Create mapping file ### Create mapping file
cd ~/imported-gits cd ~/imported-gits
cat > submodule-mappings << EOF cat > submodule-mappings << EOF
"subrepo/subrepo1"="../submodule1" "subrepos/subrepo1"="../submodule1"
"subrepo/subrepo2"="../submodule2" "subrepos/subrepo2"="../submodule2"
EOF EOF
### Convert main repository ### Convert main repository
@@ -60,16 +60,16 @@ Create an empty new folder where all the converted git modules will be imported:
mkdir git-main-repo mkdir git-main-repo
cd git-main-repo cd git-main-repo
git init git init
hg-fast-export.sh -r ~/mercurial --subrepo-map=../submodule-mappings hg-fast-export.sh -r ~/mercurial --subrepo-map=~/imported-gits/submodule-mappings
### Result ### Result
The resulting repository will now contain the subrepo/subrepo1 and The resulting repository will now contain the submodules at the paths
subrepo/subrepo1 submodules. The created .gitmodules file will look `subrepos/subrepo1` and `subrepos/subrepo2`. The created .gitmodules
like: file will look like:
[submodule "subrepo/subrepo1"] [submodule "subrepos/subrepo1"]
path = subrepo/subrepo1 path = subrepos/subrepo1
url = ../submodule1 url = ../submodule1
[submodule "subrepo/subrepo2"] [submodule "subrepos/subrepo2"]
path = subrepo/subrepo2 path = subrepos/subrepo2
url = ../submodule2 url = ../submodule2

166
README.md

@@ -1,4 +1,4 @@
hg-fast-export.(sh|py) - mercurial to git converter using git-fast-import hg-fast-export.sh - mercurial to git converter using git-fast-import
========================================================================= =========================================================================
Legal Legal
@@ -29,8 +29,8 @@ first time.
System Requirements System Requirements
------------------- -------------------
This project depends on Python 2.7 and the Mercurial >= 4.6 This project depends on Python (>=3.7) and the Mercurial package (>=
package. If Python is not installed, install it before proceeding. The 5.2). If Python is not installed, install it before proceeding. The
Mercurial package can be installed with `pip install mercurial`. Mercurial package can be installed with `pip install mercurial`.
On windows the bash that comes with "Git for Windows" is known to work On windows the bash that comes with "Git for Windows" is known to work
@@ -42,11 +42,10 @@ Usage
Using hg-fast-export is quite simple for a mercurial repository <repo>: Using hg-fast-export is quite simple for a mercurial repository <repo>:
``` ```
mkdir repo-git # or whatever git init repo-git # or whatever
cd repo-git cd repo-git
git init
hg-fast-export.sh -r <local-repo> hg-fast-export.sh -r <local-repo>
git checkout HEAD git checkout
``` ```
Please note that hg-fast-export does not automatically check out the Please note that hg-fast-export does not automatically check out the
@@ -79,10 +78,10 @@ author information than git, an author mapping file can be given to
hg-fast-export to fix up malformed author strings. The file is hg-fast-export to fix up malformed author strings. The file is
specified using the -A option. The file should contain lines of the specified using the -A option. The file should contain lines of the
form `"<key>"="<value>"`. Inside the key and value strings, all escape form `"<key>"="<value>"`. Inside the key and value strings, all escape
sequences understood by the python `string_escape` encoding are sequences understood by the python `unicode_escape` encoding are
supported. (Versions of fast-export prior to v171002 had a different supported; strings are otherwise assumed to be UTF8-encoded.
syntax, the old syntax can be enabled by the flag (Versions of fast-export prior to v171002 had a different syntax, the
`--mappings-are-raw`.) old syntax can be enabled by the flag `--mappings-are-raw`.)
The example authors.map below will translate `User The example authors.map below will translate `User
<garbage<tab><user@example.com>` to `User <user@example.com>`. <garbage<tab><user@example.com>` to `User <user@example.com>`.
@@ -93,6 +92,9 @@ The example authors.map below will translate `User
-- End of authors.map -- -- End of authors.map --
``` ```
If you have many Mercurial repositories, Chris J Billington's
[hg-export-tool] allows you to batch convert them.
Tag and Branch Naming Tag and Branch Naming
--------------------- ---------------------
@@ -129,10 +131,58 @@ is to convert line endings in text files from CRLF to git's preferred LF:
# $2 = Mercurial's hash of the file # $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0" # $3 = "1" if Mercurial reports the file as binary, otherwise "0"
if [ "$3" == "1" ]; then cat; else dos2unix; fi if [ "$3" == "1" ]; then cat; else dos2unix -q; fi
# -q option in call to dos2unix allows to avoid returning an
# error code when handling non-ascii based text files (like UTF-16
# encoded text files)
-- End of crlf-filter.sh -- -- End of crlf-filter.sh --
``` ```
Mercurial Largefiles Extension
------------------------------
### Handling Mercurial Largefiles during Migration
When migrating from Mercurial to Git, largefiles are exported as ordinary
files by default. To ensure a successful migration and manage repository
size, follow the requirements below.
#### 1. Pre-Export: Ensure File Availability
Before starting the export, you must have all largefiles from all
Mercurial commits available locally. Use one of these methods:
* **For a new clone:** `hg clone --all-largefiles <repo-url>`
* **For an existing repo:** `hg lfpull --rev "all()"`
#### 2. Choosing Your LFS Strategy
If you want your files to be versioned in Git LFS rather than as standard
Git blobs, you have two primary paths:
* **[git_lfs_importer plugin](./plugins/git_lfs_importer/README.md)
(During Conversion)**
Recommended for large repos. This performs Just-In-Time (JIT) conversion
by identifying large files during the export and writing LFS pointers
immediately, skipping the need for a second pass. This also supports
**incremental conversion**, making it much more efficient for ongoing
migrations.
* **[git lfs migrate import](https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-migrate.adoc)
(After Conversion)**
A standard two-step process: first, export the full history from Mercurial
to Git, then run a separate full history rewrite to move files into LFS.
### Why use the git_lfs_importer plugin?
For "monorepos" or very large repositories (100GiB+), the traditional
two-step process can take days. By integrating the LFS conversion
directly into the history export, the plugin eliminates the massive
time overhead of a secondary history rewrite and allows for incremental
progress.
For detailed setup, see the
[git_lfs_importer](./plugins/git_lfs_importer/README.md)
plugin documentation.
Plugins Plugins
----------------- -----------------
@@ -163,9 +213,18 @@ defined filter methods in the [dos2unix](./plugins/dos2unix) and
[branch_name_in_commit](./plugins/branch_name_in_commit) plugins. [branch_name_in_commit](./plugins/branch_name_in_commit) plugins.
``` ```
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc} commit_data = {
'author': author,
'branch': branch,
'committer': 'committer',
'desc': desc,
'extra': extra,
'hg_hash': hg_hash,
'parents': parents,
'revision': revision,
}
def commit_message_filter(self,commit_data): def commit_message_filter(self, commit_data):
``` ```
The `commit_message_filter` method is called for each commit, after parsing The `commit_message_filter` method is called for each commit, after parsing
from hg, but before outputting to git. The dictionary `commit_data` contains the from hg, but before outputting to git. The dictionary `commit_data` contains the
@@ -174,9 +233,14 @@ values in the dictionary after filters have been run are used to create the git
commit. commit.
``` ```
file_data = {'filename':filename,'file_ctx':file_ctx,'d':d} file_data = {
'data': file_contents,
'file_ctx': file_ctx,
'filename': filename,
'is_largefile': largefile_status,
}
def file_data_filter(self,file_data): def file_data_filter(self, file_data):
``` ```
The `file_data_filter` method is called for each file within each commit. The `file_data_filter` method is called for each file within each commit.
The dictionary `file_data` contains the above attributes about the file, and The dictionary `file_data` contains the above attributes about the file, and
@@ -184,6 +248,17 @@ can be modified by any filter. `file_ctx` is the filecontext from the
mercurial python library. After all filters have been run, the values mercurial python library. After all filters have been run, the values
are used to add the file to the git commit. are used to add the file to the git commit.
The `file_data_filter` method is also called when files are deleted,
but in this case the `data` and `file_ctx` keys map to None. This is
so that a filter which modifies file names can apply the same name
transformations when files are deleted.
The `is_largefile` entry within the `file_data` dictionary will contain
`True` if the original file was a largefile and has been converted
to a normal file before the plugins were invoked. In this case, the `file_ctx`
will still point to the filecontext for the original, unconverted file, while
`filename` and `data` will contain the already converted information.
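
A minimal sketch of a filter that respects the deletion case described above. It assumes filenames are bytestrings and uses a hypothetical `Filter` class. How the plugin loader instantiates the class is not shown in this excerpt:

```python
class Filter:
    """Hypothetical plugin filter that lowercases file names."""

    def file_data_filter(self, file_data):
        # Rename unconditionally so that deletions, where 'data' and
        # 'file_ctx' are None, receive the same name transformation.
        file_data['filename'] = file_data['filename'].lower()
        # Only touch the contents when the file actually has some.
        if file_data['data'] is not None:
            file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')


plugin = Filter()

changed = {'filename': b'Src/README.txt', 'data': b'a\r\nb\r\n',
           'file_ctx': object(), 'is_largefile': False}
deleted = {'filename': b'Src/README.txt', 'data': None,
           'file_ctx': None, 'is_largefile': False}

plugin.file_data_filter(changed)
plugin.file_data_filter(deleted)
```

Because both dictionaries end up with the same `filename`, the deletion removes the file that the earlier commit added under its transformed name.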
Submodules Submodules
---------- ----------
See README-SUBMODULES.md for how to convert subrepositories into git See README-SUBMODULES.md for how to convert subrepositories into git
@@ -194,7 +269,15 @@ Notes/Limitations
hg-fast-export supports multiple branches but only named branches with hg-fast-export supports multiple branches but only named branches with
exactly one head each. Otherwise commits to the tip of these heads exactly one head each. Otherwise commits to the tip of these heads
within the branch will get flattened into merge commits. within the branch will get flattened into merge commits. There are a
few options to deal with this:
1. Chris J Billington's [hg-export-tool] can help you to handle branches with
duplicate heads.
2. Use the [head2branch plugin](./plugins/head2branch) to create a new named
branch from an unnamed head.
3. You can ignore unnamed heads with the `--ignore-unnamed-heads` option, which
is appropriate in situations such as the extra heads being close commits
(abandoned, unmerged changes).
hg-fast-export will ignore any files or directories tracked by mercurial hg-fast-export will ignore any files or directories tracked by mercurial
called `.git`, and will print a warning if it encounters one. Git cannot called `.git`, and will print a warning if it encounters one. Git cannot
@@ -213,8 +296,8 @@ possible to use hg-fast-export on remote repositories
Design Design
------ ------
hg-fast-export.py was designed in a way that doesn't require a 2-pass hg-fast-export was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: if just feeds what it mechanism or any prior repository analysis: it just feeds what it
finds into git-fast-import. This also implies that it heavily relies finds into git-fast-import. This also implies that it heavily relies
on strictly linear ordering of changesets from hg, i.e. its on strictly linear ordering of changesets from hg, i.e. its
append-only storage model so that changesets hg-fast-export already append-only storage model so that changesets hg-fast-export already
@@ -223,15 +306,37 @@ saw never get modified.
Submitting Patches Submitting Patches
------------------ ------------------
Please use the [issue-tracker](https://github.com/frej/fast-export) at Please create a pull request at
github to report bugs and submit patches. [Github](https://github.com/frej/fast-export/pulls) to submit patches.
Please read When submitting a patch make sure the commits in your pull request:
[https://chris.beams.io/posts/git-commit/](https://chris.beams.io/posts/git-commit/)
on how to write a good commit message before submitting a pull request * Have good commit messages
for review. Although the article recommends at most 50 characters for
the subject, up to 72 characters are frequently accepted for Please read Chris Beams' blog post [How to Write a Git Commit
fast-export. Message](https://chris.beams.io/posts/git-commit/) on how to write a
good commit message. Although the article recommends at most 50
characters for the subject, up to 72 characters are frequently
accepted for fast-export.
* Adhere to good [commit
hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)
When developing a pull request for hg-fast-export, base your work on
the current `master` branch and rebase your work if it no longer can
be merged into the current `master` without conflicts. Never merge
`master` into your development branch, rebase if your work needs
updates from `master`.
When a pull request is modified due to review feedback, please
incorporate the changes into the proper commit. A good reference on
how to modify history is in the [Pro Git book, Section
7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.
Frequent Problems Frequent Problems
================= =================
@@ -274,3 +379,12 @@ Frequent Problems
By design fast export does not touch your working directory, so to By design fast export does not touch your working directory, so to
git it looks like you have deleted all files, when in fact they have git it looks like you have deleted all files, when in fact they have
never been checked out. Just do a checkout of the branch you want. never been checked out. Just do a checkout of the branch you want.
* `Error: repository has at least one unnamed head: hg r<N>`
By design, hg-fast-export cannot deal with extra heads on a branch.
There are a few options depending on whether the extra heads are
in-use/open or normally closed. See [Notes/Limitations](#noteslimitations)
section for more details.
[hg-export-tool]: https://github.com/chrisjbillington/hg-export-tool

View File

@@ -1,28 +1,21 @@
#!/usr/bin/env python2 #!/usr/bin/env python3
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others. # Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# Copyright (c) 2025 Siemens
# License: MIT <http://www.opensource.org/licenses/mit-license.php> # License: MIT <http://www.opensource.org/licenses/mit-license.php>
from mercurial import node
from mercurial.scmutil import revsymbol
from hg2git import setup_repo,fixup_user,get_branch,get_changeset from hg2git import setup_repo,fixup_user,get_branch,get_changeset
from hg2git import load_cache,save_cache,get_git_sha1,set_default_branch,set_origin_name from hg2git import load_cache,save_cache,get_git_sha1,set_default_branch,set_origin_name
from optparse import OptionParser from optparse import OptionParser
import re import re
import sys import sys
import os import os
from binascii import hexlify
import pluginloader import pluginloader
from hgext.largefiles import lfutil
if sys.platform == "win32":
# On Windows, sys.stdout is initially opened in text mode, which means that
# when a LF (\n) character is written to sys.stdout, it will be converted
# into CRLF (\r\n). That makes git blow up, so use this platform-specific
# code to change the mode of sys.stdout to binary.
import msvcrt
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
# silly regex to catch Signed-off-by lines in log message # silly regex to catch Signed-off-by lines in log message
sob_re=re.compile('^Signed-[Oo]ff-[Bb]y: (.+)$') sob_re=re.compile(b'^Signed-[Oo]ff-[Bb]y: (.+)$')
# insert 'checkpoint' command after this many commits or none at all if 0 # insert 'checkpoint' command after this many commits or none at all if 0
cfg_checkpoint_count=0 cfg_checkpoint_count=0
# write some progress message every this many file contents written # write some progress message every this many file contents written
@@ -36,63 +29,43 @@ submodule_mappings=None
auto_sanitize = None auto_sanitize = None
def gitmode(flags): def gitmode(flags):
return 'l' in flags and '120000' or 'x' in flags and '100755' or '100644' return b'l' in flags and b'120000' or b'x' in flags and b'100755' or b'100644'
def wr_no_nl(msg=''): def wr_no_nl(msg=b''):
assert isinstance(msg, bytes)
if msg: if msg:
sys.stdout.write(msg) sys.stdout.buffer.write(msg)
def wr(msg=''): def wr(msg=b''):
wr_no_nl(msg) wr_no_nl(msg + b'\n')
sys.stdout.write('\n')
#map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n')) #map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))
def wr_data(data):
wr(b'data %d' % (len(data)))
wr(data)
def checkpoint(count): def checkpoint(count):
count=count+1 count=count+1
if cfg_checkpoint_count>0 and count%cfg_checkpoint_count==0: if cfg_checkpoint_count>0 and count%cfg_checkpoint_count==0:
sys.stderr.write("Checkpoint after %d commits\n" % count) sys.stderr.buffer.write(b"Checkpoint after %d commits\n" % count)
wr('checkpoint') wr(b'checkpoint')
wr() wr()
return count return count
def revnum_to_revref(rev, old_marks): def revnum_to_revref(rev, old_marks):
"""Convert an hg revnum to a git-fast-import rev reference (an SHA1 """Convert an hg revnum to a git-fast-import rev reference (an SHA1
or a mark)""" or a mark)"""
return old_marks.get(rev) or ':%d' % (rev+1) return old_marks.get(rev) or b':%d' % (rev+1)
def file_mismatch(f1,f2): def get_filechanges(repo,revision,parents,files):
"""See if two revisions of a file are not equal."""
return node.hex(f1)!=node.hex(f2)
def split_dict(dleft,dright,l=[],c=[],r=[],match=file_mismatch):
"""Loop over our repository and find all changed and missing files."""
for left in dleft.keys():
right=dright.get(left,None)
if right==None:
# we have the file but our parent hasn't: add to left set
l.append(left)
elif match(dleft[left],right) or gitmode(dleft.flags(left))!=gitmode(dright.flags(left)):
# we have it but checksums mismatch: add to center set
c.append(left)
for right in dright.keys():
left=dleft.get(right,None)
if left==None:
# if parent has file but we don't: add to right set
r.append(right)
# change is already handled when comparing child against parent
return l,c,r
def get_filechanges(repo,revision,parents,mleft):
"""Given some repository and revision, find all changed/deleted files.""" """Given some repository and revision, find all changed/deleted files."""
l,c,r=[],[],[] if not parents:
for p in parents: # first revision: feed in full manifest
if p<0: continue return files,[]
mright=revsymbol(repo,str(p)).manifest() else:
l,c,r=split_dict(mleft,mright,l,c,r) # take the changes from the first parent
l.sort() f=repo.status(parents[0],revision)
c.sort() return f.modified+f.added,f.removed
r.sort()
return l,c,r
def get_author(logmessage,committer,authors): def get_author(logmessage,committer,authors):
"""As git distincts between author and committer of a patch, try to """As git distincts between author and committer of a patch, try to
@@ -110,7 +83,7 @@ def get_author(logmessage,committer,authors):
"Signed-off-by: foo" and thus matching our detection regex. Prevent "Signed-off-by: foo" and thus matching our detection regex. Prevent
that.""" that."""
loglines=logmessage.split('\n') loglines=logmessage.split(b'\n')
i=len(loglines) i=len(loglines)
# from tail walk to top skipping empty lines # from tail walk to top skipping empty lines
while i>=0: while i>=0:
@@ -138,23 +111,23 @@ def remove_gitmodules(ctx):
# be to only remove the submodules of the first parent. # be to only remove the submodules of the first parent.
for parent_ctx in ctx.parents(): for parent_ctx in ctx.parents():
for submodule in parent_ctx.substate.keys(): for submodule in parent_ctx.substate.keys():
wr('D %s' % submodule) wr(b'D %s' % submodule)
wr('D .gitmodules') wr(b'D .gitmodules')
def refresh_git_submodule(name,subrepo_info): def refresh_git_submodule(name,subrepo_info):
wr('M 160000 %s %s' % (subrepo_info[1],name)) wr(b'M 160000 %s %s' % (subrepo_info[1],name))
sys.stderr.write("Adding/updating submodule %s, revision %s\n" sys.stderr.buffer.write(
% (name,subrepo_info[1])) b"Adding/updating submodule %s, revision %s\n" % (name, subrepo_info[1])
return '[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name, )
subrepo_info[0]) return b'[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name, name, subrepo_info[0])
def refresh_hg_submodule(name,subrepo_info): def refresh_hg_submodule(name,subrepo_info):
gitRepoLocation=submodule_mappings[name] + "/.git" gitRepoLocation=submodule_mappings[name] + b"/.git"
# Populate the cache to map mercurial revision to git revision # Populate the cache to map mercurial revision to git revision
if not name in subrepo_cache: if not name in subrepo_cache:
subrepo_cache[name]=(load_cache(gitRepoLocation+"/hg2git-mapping"), subrepo_cache[name]=(load_cache(gitRepoLocation+b"/hg2git-mapping"),
load_cache(gitRepoLocation+"/hg2git-marks", load_cache(gitRepoLocation+b"/hg2git-marks",
lambda s: int(s)-1)) lambda s: int(s)-1))
(mapping_cache,marks_cache)=subrepo_cache[name] (mapping_cache,marks_cache)=subrepo_cache[name]
@@ -162,71 +135,110 @@ def refresh_hg_submodule(name,subrepo_info):
if subrepo_hash in mapping_cache: if subrepo_hash in mapping_cache:
revnum=mapping_cache[subrepo_hash] revnum=mapping_cache[subrepo_hash]
gitSha=marks_cache[int(revnum)] gitSha=marks_cache[int(revnum)]
wr('M 160000 %s %s' % (gitSha,name)) wr(b'M 160000 %s %s' % (gitSha,name))
sys.stderr.write("Adding/updating submodule %s, revision %s->%s\n" sys.stderr.buffer.write(
% (name,subrepo_hash,gitSha)) b"Adding/updating submodule %s, revision %s->%s\n"
return '[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name, % (name, subrepo_hash, gitSha)
)
return b'[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name,
submodule_mappings[name]) submodule_mappings[name])
else: else:
sys.stderr.write("Warning: Could not find hg revision %s for %s in git %s\n" % sys.stderr.buffer.write(
(subrepo_hash,name,gitRepoLocation)) b"Warning: Could not find hg revision %s for %s in git %s\n"
return '' % (subrepo_hash, name, gitRepoLocation,)
)
return b''
def refresh_gitmodules(ctx): def refresh_gitmodules(ctx):
"""Updates list of ctx submodules according to .hgsubstate file""" """Updates list of ctx submodules according to .hgsubstate file"""
remove_gitmodules(ctx) remove_gitmodules(ctx)
gitmodules="" gitmodules=b""
# Create the .gitmodules file and all submodules # Create the .gitmodules file and all submodules
for name,subrepo_info in ctx.substate.items(): for name,subrepo_info in ctx.substate.items():
if subrepo_info[2]=='git': if subrepo_info[2]==b'git':
gitmodules+=refresh_git_submodule(name,subrepo_info) gitmodules+=refresh_git_submodule(name,subrepo_info)
elif submodule_mappings and name in submodule_mappings: elif submodule_mappings and name in submodule_mappings:
gitmodules+=refresh_hg_submodule(name,subrepo_info) gitmodules+=refresh_hg_submodule(name,subrepo_info)
if len(gitmodules): if len(gitmodules):
wr('M 100644 inline .gitmodules') wr(b'M 100644 inline .gitmodules')
wr('data %d' % (len(gitmodules)+1)) wr_data(gitmodules)
wr(gitmodules)
def is_largefile(filename):
return filename[:6] == b'.hglf/'
def largefile_orig_name(filename):
return filename[6:]
def largefile_data(ctx, file, filename):
lf_file_ctx=ctx.filectx(file)
lf_hash=lf_file_ctx.data().strip(b'\n')
sys.stderr.write("Detected large file hash %s\n" % lf_hash.decode())
#should detect where the large files are located
file_with_data = lfutil.findfile(ctx.repo(), lf_hash)
if file_with_data is None:
# Autodownloading from the mercurial repository would be an issue as there
# is a good chance that we may need to input some username and password.
# This will surely break fast-export as there will be some unexpected
# output.
sys.stderr.write("Large file wasn't found in local cache.\n")
sys.stderr.write("Please clone with --all-largefiles\n")
sys.stderr.write("or pull all large files with 'hg lfpull --rev "
"\"all()\"'\n")
# closing in the middle of import will revert everything to the last checkpoint
sys.exit(3)
with open(os.path.normpath(file_with_data), 'rb') as file_with_data_handle:
return file_with_data_handle.read()
def export_file_contents(ctx,manifest,files,hgtags,encoding='',plugins={}): def export_file_contents(ctx,manifest,files,hgtags,encoding='',plugins={}):
count=0 count=0
max=len(files) max=len(files)
is_submodules_refreshed=False is_submodules_refreshed=False
for file in files: for file in files:
if not is_submodules_refreshed and (file=='.hgsub' or file=='.hgsubstate'): if not is_submodules_refreshed and (file==b'.hgsub' or file==b'.hgsubstate'):
is_submodules_refreshed=True is_submodules_refreshed=True
refresh_gitmodules(ctx) refresh_gitmodules(ctx)
# Skip .hgtags files. They only get us in trouble. # Skip .hgtags files. They only get us in trouble.
if not hgtags and file == ".hgtags": if not hgtags and file == b".hgtags":
sys.stderr.write('Skip %s\n' % (file)) sys.stderr.buffer.write(b'Skip %s\n' % file)
continue continue
if encoding: if encoding:
filename=file.decode(encoding).encode('utf8') filename=file.decode(encoding).encode('utf8')
else: else:
filename=file filename=file
if '.git' in filename.split(os.path.sep): if b'.git' in filename.split(b'/'): # Even on Windows, the path separator is / here.
sys.stderr.write('Ignoring file %s which cannot be tracked by git\n' % filename) sys.stderr.buffer.write(
b'Ignoring file %s which cannot be tracked by git\n' % filename
)
continue continue
largefile = False
file_ctx=ctx.filectx(file) file_ctx=ctx.filectx(file)
if is_largefile(filename):
largefile = True
filename = largefile_orig_name(filename)
d = largefile_data(ctx, file, filename)
else:
d=file_ctx.data() d=file_ctx.data()
if plugins and plugins['file_data_filters']: if plugins and plugins['file_data_filters']:
file_data = {'filename':filename,'file_ctx':file_ctx,'data':d} file_data = {'filename':filename,'file_ctx':file_ctx,'data':d, 'is_largefile':largefile}
for filter in plugins['file_data_filters']: for filter in plugins['file_data_filters']:
filter(file_data) filter(file_data)
d=file_data['data'] d=file_data['data']
filename=file_data['filename'] filename=file_data['filename']
file_ctx=file_data['file_ctx'] file_ctx=file_data['file_ctx']
wr('M %s inline %s' % (gitmode(manifest.flags(file)), if d is not None:
wr(b'M %s inline %s' % (gitmode(manifest.flags(file)),
strip_leading_slash(filename))) strip_leading_slash(filename)))
wr('data %d' % len(d)) # had some trouble with size() wr(b'data %d' % len(d)) # had some trouble with size()
wr(d) wr(d)
count+=1 count+=1
if count%cfg_export_boundary==0: if count%cfg_export_boundary==0:
sys.stderr.write('Exported %d/%d files\n' % (count,max)) sys.stderr.buffer.write(b'Exported %d/%d files\n' % (count,max))
if max>cfg_export_boundary: if max>cfg_export_boundary:
sys.stderr.write('Exported %d/%d files\n' % (count,max)) sys.stderr.buffer.write(b'Exported %d/%d files\n' % (count,max))
def sanitize_name(name,what="branch", mapping={}): def sanitize_name(name,what="branch", mapping={}):
"""Sanitize input roughly according to git-check-ref-format(1)""" """Sanitize input roughly according to git-check-ref-format(1)"""
@@ -246,164 +258,172 @@ def sanitize_name(name,what="branch", mapping={}):
def dot(name): def dot(name):
if not name: return name if not name: return name
if name[0] == '.': return '_'+name[1:] if name[0:1] == b'.': return b'_'+name[1:]
return name return name
if not auto_sanitize: if not auto_sanitize:
return mapping.get(name,name) return mapping.get(name,name)
n=mapping.get(name,name) n=mapping.get(name,name)
p=re.compile('([[ ~^:?\\\\*]|\.\.)') p=re.compile(b'([\\[ ~^:?\\\\*]|\\.\\.)')
n=p.sub('_', n) n=p.sub(b'_', n)
if n[-1] in ('/', '.'): n=n[:-1]+'_' if n[-1:] in (b'/', b'.'): n=n[:-1]+b'_'
n='/'.join(map(dot,n.split('/'))) n=b'/'.join([dot(s) for s in n.split(b'/')])
p=re.compile('_+') p=re.compile(b'_+')
n=p.sub('_', n) n=p.sub(b'_', n)
if n!=name: if n!=name:
sys.stderr.write('Warning: sanitized %s [%s] to [%s]\n' % (what,name,n)) sys.stderr.buffer.write(
b'Warning: sanitized %s [%s] to [%s]\n' % (what.encode(), name, n)
)
return n return n
def strip_leading_slash(filename): def strip_leading_slash(filename):
if filename[0] == '/': if filename[0:1] == b'/':
return filename[1:] return filename[1:]
return filename return filename
def export_commit(ui,repo,revision,old_marks,max,count,authors, def export_commit(ui,repo,revision,old_marks,max,count,authors,
branchesmap,sob,brmap,hgtags,encoding='',fn_encoding='', branchesmap,sob,brmap,hgtags,encoding='',fn_encoding='',
plugins={}): first_commit_hash="",plugins={}):
def get_branchname(name): def get_branchname(name):
if brmap.has_key(name): if name in brmap:
return brmap[name] return brmap[name]
n=sanitize_name(name, "branch", branchesmap) n=sanitize_name(name, "branch", branchesmap)
brmap[name]=n brmap[name]=n
return n return n
(revnode,_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision,authors,encoding) ctx=repo[revision]
if repo[revnode].hidden():
if ctx.hidden():
return count return count
(_,user,(time,timezone),files,desc,branch,extra)=get_changeset(ui,repo,revision,authors,encoding)
branch=get_branchname(branch) branch=get_branchname(branch)
parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0] parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]
author = get_author(desc,user,authors) author = get_author(desc,user,authors)
hg_hash=ctx.hex()
if plugins and plugins['commit_message_filters']: if plugins and plugins['commit_message_filters']:
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc} commit_data = {'branch': branch, 'parents': parents,
'author': author, 'desc': desc,
'revision': revision, 'hg_hash': hg_hash,
'committer': user, 'extra': extra}
for filter in plugins['commit_message_filters']: for filter in plugins['commit_message_filters']:
filter(commit_data) filter(commit_data)
branch = commit_data['branch'] branch = commit_data['branch']
parents = commit_data['parents'] parents = commit_data['parents']
author = commit_data['author'] author = commit_data['author']
user = commit_data['committer']
desc = commit_data['desc'] desc = commit_data['desc']
if len(parents)==0 and revision != 0: if len(parents)==0 and revision != 0:
wr('reset refs/heads/%s' % branch) wr(b'reset refs/heads/%s' % branch)
wr('commit refs/heads/%s' % branch) wr(b'commit refs/heads/%s' % branch)
wr('mark :%d' % (revision+1)) wr(b'mark :%d' % (revision+1))
if sob: if sob:
wr('author %s %d %s' % (author,time,timezone)) wr(b'author %s %d %s' % (author,time,timezone))
wr('committer %s %d %s' % (user,time,timezone)) wr(b'committer %s %d %s' % (user,time,timezone))
wr('data %d' % (len(desc)+1)) # wtf? wr_data(desc + b'\n')
wr(desc)
wr()
ctx=revsymbol(repo,str(revision))
man=ctx.manifest() man=ctx.manifest()
added,changed,removed,type=[],[],[],''
if len(parents) == 0: if not parents:
# first revision: feed in full manifest
added=man.keys()
added.sort()
type='full' type='full'
if revision == 0 and first_commit_hash:
wr(b'from %s' % first_commit_hash.encode())
type='simple delta'
else: else:
wr('from %s' % revnum_to_revref(parents[0], old_marks)) wr(b'from %s' % revnum_to_revref(parents[0], old_marks))
if len(parents) == 1: if len(parents) == 1:
# later non-merge revision: feed in changed manifest
# if we have exactly one parent, just take the changes from the
# manifest without expensively comparing checksums
f=repo.status(parents[0],revnode)
added,changed,removed=f.added,f.modified,f.removed
type='simple delta' type='simple delta'
else: # a merge with two parents else: # a merge with two parents
wr('merge %s' % revnum_to_revref(parents[1], old_marks)) wr(b'merge %s' % revnum_to_revref(parents[1], old_marks))
# later merge revision: feed in changed manifest
# for many files comparing checksums is expensive so only do it for
# merges where we really need it due to hg's revlog logic
added,changed,removed=get_filechanges(repo,revision,parents,man)
type='thorough delta' type='thorough delta'
sys.stderr.write('%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n' % modified,removed=get_filechanges(repo,revision,parents,files)
(branch,type,revision+1,max,len(added),len(changed),len(removed)))
for filename in removed: sys.stderr.buffer.write(
b'%s: Exporting %s revision %d/%d with %d/%d modified/removed files\n'
% (branch, type.encode(), revision + 1, max, len(modified), len(removed))
)
for file in removed:
if fn_encoding: if fn_encoding:
filename=filename.decode(fn_encoding).encode('utf8') filename=file.decode(fn_encoding).encode('utf8')
filename=strip_leading_slash(filename) else:
if filename=='.hgsub': filename=file
remove_gitmodules(ctx)
wr('D %s' % filename)
export_file_contents(ctx,man,added,hgtags,fn_encoding,plugins) if plugins and plugins['file_data_filters']:
export_file_contents(ctx,man,changed,hgtags,fn_encoding,plugins) file_data = {'filename':filename, 'file_ctx':None, 'data':None}
for filter in plugins['file_data_filters']:
filter(file_data)
filename=file_data['filename']
filename=strip_leading_slash(filename)
if filename==b'.hgsub':
remove_gitmodules(ctx)
if is_largefile(filename):
filename=largefile_orig_name(filename)
wr(b'D %s' % filename)
export_file_contents(ctx,man,modified,hgtags,fn_encoding,plugins)
wr() wr()
return checkpoint(count) return checkpoint(count)
def export_note(ui,repo,revision,count,authors,encoding,is_first): def export_note(ui,repo,revision,count,authors,encoding,is_first):
(revnode,_,user,(time,timezone),_,_,_,_)=get_changeset(ui,repo,revision,authors,encoding) ctx = repo[revision]
if repo[revnode].hidden():
if ctx.hidden():
return count return count
parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0] (_,user,(time,timezone),_,_,_,_)=get_changeset(ui,repo,revision,authors,encoding)
wr('commit refs/notes/hg') wr(b'commit refs/notes/hg')
wr('committer %s %d %s' % (user,time,timezone)) wr(b'committer %s %d %s' % (user,time,timezone))
wr('data 0') wr(b'data 0')
if is_first: if is_first:
wr('from refs/notes/hg^0') wr(b'from refs/notes/hg^0')
wr('N inline :%d' % (revision+1)) wr(b'N inline :%d' % (revision+1))
hg_hash=revsymbol(repo,str(revision)).hex() hg_hash=ctx.hex()
wr('data %d' % (len(hg_hash))) wr_data(hg_hash)
wr_no_nl(hg_hash)
wr() wr()
return checkpoint(count) return checkpoint(count)
wr('data %d' % (len(desc)+1)) # wtf?
wr(desc)
wr()
def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap): def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap):
l=repo.tagslist() l=repo.tagslist()
for tag,node in l: for tag,node in l:
# Remap the branch name # Remap the branch name
tag=sanitize_name(tag,"tag",tagsmap) tag=sanitize_name(tag,"tag",tagsmap)
# ignore latest revision # ignore latest revision
if tag=='tip': continue if tag==b'tip': continue
# ignore tags to nodes that are missing (ie, 'in the future') # ignore tags to nodes that are missing (ie, 'in the future')
if node.encode('hex_codec') not in mapping_cache: if hexlify(node) not in mapping_cache:
sys.stderr.write('Tag %s refers to unseen node %s\n' % (tag, node.encode('hex_codec'))) sys.stderr.buffer.write(b'Tag %s refers to unseen node %s\n' % (tag, hexlify(node)))
continue continue
rev=int(mapping_cache[node.encode('hex_codec')]) rev=int(mapping_cache[hexlify(node)])
ref=revnum_to_revref(rev, old_marks) ref=revnum_to_revref(rev, old_marks)
if ref==None: if ref==None:
sys.stderr.write('Failed to find reference for creating tag' sys.stderr.buffer.write(
' %s at r%d\n' % (tag,rev)) b'Failed to find reference for creating tag %s at r%d\n' % (tag, rev)
)
continue continue
sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref)) sys.stderr.buffer.write(b'Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag, rev, ref))
wr('reset refs/tags/%s' % tag) wr(b'reset refs/tags/%s' % tag)
wr('from %s' % ref) wr(b'from %s' % ref)
wr() wr()
count=checkpoint(count) count=checkpoint(count)
return count return count
def load_mapping(name, filename, mapping_is_raw): def load_mapping(name, filename, mapping_is_raw):
raw_regexp=re.compile('^([^=]+)[ ]*=[ ]*(.+)$') raw_regexp=re.compile(b'^([^=]+)[ ]*=[ ]*(.+)$')
string_regexp='"(((\\.)|(\\")|[^"])*)"' string_regexp=b'"(((\\.)|(\\")|[^"])*)"'
quoted_regexp=re.compile('^'+string_regexp+'[ ]*=[ ]*'+string_regexp+'$') quoted_regexp=re.compile(b'^'+string_regexp+b'[ ]*=[ ]*'+string_regexp+b'$')
def parse_raw_line(line): def parse_raw_line(line):
m=raw_regexp.match(line) m=raw_regexp.match(line)
@@ -411,26 +431,40 @@ def load_mapping(name, filename, mapping_is_raw):
return None return None
return (m.group(1).strip(), m.group(2).strip()) return (m.group(1).strip(), m.group(2).strip())
def process_unicode_escape_sequences(s):
# Replace unicode escape sequences in the otherwise UTF8-encoded bytestring s with
# the UTF8-encoded characters they represent. We need to do an additional
# .decode('utf8').encode('ascii', 'backslashreplace') to convert any non-ascii
# characters into their escape sequences so that the subsequent
# .decode('unicode-escape') succeeds:
return (
s.decode('utf8')
.encode('ascii', 'backslashreplace')
.decode('unicode-escape')
.encode('utf8')
)
def parse_quoted_line(line): def parse_quoted_line(line):
m=quoted_regexp.match(line) m=quoted_regexp.match(line)
if m==None: if m==None:
return None return
return (m.group(1).decode('string_escape'),
m.group(5).decode('string_escape')) return (process_unicode_escape_sequences(m.group(1)),
process_unicode_escape_sequences(m.group(5)))
cache={} cache={}
if not os.path.exists(filename): if not os.path.exists(filename):
sys.stderr.write('Could not open mapping file [%s]\n' % (filename)) sys.stderr.write('Could not open mapping file [%s]\n' % (filename))
return cache return cache
f=open(filename,'r') f=open(filename,'rb')
l=0 l=0
a=0 a=0
for line in f.readlines(): for line in f.readlines():
l+=1 l+=1
line=line.strip() line=line.strip()
if l==1 and line[0]=='#' and line=='# quoted-escaped-strings': if l==1 and line[0:1]==b'#' and line==b'# quoted-escaped-strings':
continue continue
elif line=='' or line[0]=='#': elif line==b'' or line[0:1]==b'#':
continue continue
m=parse_raw_line(line) if mapping_is_raw else parse_quoted_line(line) m=parse_raw_line(line) if mapping_is_raw else parse_quoted_line(line)
if m==None: if m==None:
@@ -452,9 +486,11 @@ def branchtip(repo, heads):
break break
return tip return tip
def verify_heads(ui,repo,cache,force,branchesmap): def verify_heads(ui,repo,cache,force,ignore_unnamed_heads,branchesmap):
branches={} branches={}
for bn, heads in repo.branchmap().iteritems():
for bn in repo.branchmap():
heads = repo.branchmap().branchheads(bn)
branches[bn] = branchtip(repo, heads) branches[bn] = branchtip(repo, heads)
l=[(-repo.changelog.rev(n), n, t) for t, n in branches.items()] l=[(-repo.changelog.rev(n), n, t) for t, n in branches.items()]
l.sort() l.sort()
@@ -465,26 +501,38 @@ def verify_heads(ui,repo,cache,force,branchesmap):
sanitized_name=sanitize_name(b,"branch",branchesmap) sanitized_name=sanitize_name(b,"branch",branchesmap)
sha1=get_git_sha1(sanitized_name) sha1=get_git_sha1(sanitized_name)
c=cache.get(sanitized_name) c=cache.get(sanitized_name)
if sha1!=c: if not c and sha1:
sys.stderr.write('Error: Branch [%s] modified outside hg-fast-export:' sys.stderr.buffer.write(
'\n%s (repo) != %s (cache)\n' % (b,sha1,c)) b'Error: Branch [%s] already exists and was not created by hg-fast-export, '
b'export would overwrite unrelated branch\n' % b)
if not force: return False
elif sha1!=c:
sys.stderr.buffer.write(
b'Error: Branch [%s] modified outside hg-fast-export:'
b'\n%s (repo) != %s (cache)\n' % (b, b'<None>' if sha1 is None else sha1, c)
)
if not force: return False if not force: return False
# verify that branch has exactly one head # verify that branch has exactly one head
t={} t={}
for h in repo.filtered('visible').heads(): unnamed_heads=False
(_,_,_,_,_,_,branch,_)=get_changeset(ui,repo,h) for h in repo.filtered(b'visible').heads():
branch=get_branch(repo[h].branch())
if t.get(branch,False): if t.get(branch,False):
sys.stderr.write('Error: repository has at least one unnamed head: hg r%s\n' % sys.stderr.buffer.write(
repo.changelog.rev(h)) b'Error: repository has an unnamed head: hg r%d\n'
if not force: return False % repo.changelog.rev(h)
)
unnamed_heads=True
if not force and not ignore_unnamed_heads: return False
t[branch]=True t[branch]=True
if unnamed_heads and not force and not ignore_unnamed_heads: return False
return True return True
def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile, def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
authors={},branchesmap={},tagsmap={}, authors={},branchesmap={},tagsmap={},
sob=False,force=False,hgtags=False,notes=False,encoding='',fn_encoding='', sob=False,force=False,ignore_unnamed_heads=False,hgtags=False,
notes=False,encoding='',fn_encoding='',first_commit_hash='',
plugins={}): plugins={}):
def check_cache(filename, contents): def check_cache(filename, contents):
if len(contents) == 0: if len(contents) == 0:
@@ -500,12 +548,12 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
if len(state_cache) != 0: if len(state_cache) != 0:
for (name, data) in [(marksfile, old_marks), for (name, data) in [(marksfile, old_marks),
(mappingfile, mapping_cache), (mappingfile, mapping_cache),
(headsfile, state_cache)]: (headsfile, heads_cache)]:
check_cache(name, data) check_cache(name, data)
ui,repo=setup_repo(repourl) ui,repo=setup_repo(repourl)
if not verify_heads(ui,repo,heads_cache,force,branchesmap): if not verify_heads(ui,repo,heads_cache,force,ignore_unnamed_heads,branchesmap):
return 1 return 1
try: try:
@@ -513,26 +561,26 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
except AttributeError: except AttributeError:
tip=len(repo) tip=len(repo)
min=int(state_cache.get('tip',0)) min=int(state_cache.get(b'tip',0))
max=_max max=_max
if _max<0 or max>tip: if _max<0 or max>tip:
max=tip max=tip
for rev in range(0,max): for rev in range(0,max):
(revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors) ctx=repo[rev]
if repo[revnode].hidden(): if ctx.hidden():
continue continue
mapping_cache[revnode.encode('hex_codec')] = str(rev) mapping_cache[ctx.hex()] = b"%d" % rev
if submodule_mappings: if submodule_mappings:
# Make sure that all submodules are registered in the submodule-mappings file # Make sure that all mercurial submodules are registered in the submodule-mappings file
for rev in range(0,max): for rev in range(0,max):
ctx=revsymbol(repo,str(rev)) ctx=repo[rev]
if ctx.hidden(): if ctx.hidden():
continue continue
if ctx.substate: if ctx.substate:
for key in ctx.substate: for key in ctx.substate:
if key not in submodule_mappings: if ctx.substate[key][2]==b'hg' and key not in submodule_mappings:
sys.stderr.write("Error: %s not found in submodule-mappings\n" % (key)) sys.stderr.write("Error: %s not found in submodule-mappings\n" % (key))
return 1 return 1
@@ -540,14 +588,14 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
brmap={} brmap={}
for rev in range(min,max): for rev in range(min,max):
c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap, c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
sob,brmap,hgtags,encoding,fn_encoding, sob,brmap,hgtags,encoding,fn_encoding,first_commit_hash,
plugins) plugins)
if notes: if notes:
for rev in range(min,max): for rev in range(min,max):
c=export_note(ui,repo,rev,c,authors, encoding, rev == min and min != 0) c=export_note(ui,repo,rev,c,authors, encoding, rev == min and min != 0)
state_cache['tip']=max state_cache[b'tip']=max
state_cache['repo']=repourl state_cache[b'repo']=repourl
save_cache(tipfile,state_cache) save_cache(tipfile,state_cache)
save_cache(mappingfile,mapping_cache) save_cache(mappingfile,mapping_cache)
@@ -591,7 +639,9 @@ if __name__=='__main__':
parser.add_option("-T","--tags",dest="tagsfile", parser.add_option("-T","--tags",dest="tagsfile",
help="Read tags map from TAGSFILE") help="Read tags map from TAGSFILE")
parser.add_option("-f","--force",action="store_true",dest="force", parser.add_option("-f","--force",action="store_true",dest="force",
default=False,help="Ignore validation errors by force") default=False,help="Ignore validation errors by force, implies --ignore-unnamed-heads")
parser.add_option("--ignore-unnamed-heads",action="store_true",dest="ignore_unnamed_heads",
default=False,help="Ignore unnamed head errors")
parser.add_option("-M","--default-branch",dest="default_branch", parser.add_option("-M","--default-branch",dest="default_branch",
help="Set the default branch") help="Set the default branch")
parser.add_option("-o","--origin",dest="origin_name", parser.add_option("-o","--origin",dest="origin_name",
@@ -612,6 +662,8 @@ if __name__=='__main__':
help="Add a plugin with the given init string <name=init>") help="Add a plugin with the given init string <name=init>")
parser.add_option("--subrepo-map", type="string", dest="subrepo_map", parser.add_option("--subrepo-map", type="string", dest="subrepo_map",
help="Provide a mapping file between the subrepository name and the submodule name") help="Provide a mapping file between the subrepository name and the submodule name")
parser.add_option("--first-commit-hash", type="string", dest="first_commit_hash",
help="Allow importing into an existing git repository by specifying the hash of the first commit")
(options,args)=parser.parse_args() (options,args)=parser.parse_args()
@@ -687,6 +739,9 @@ if __name__=='__main__':
sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile, sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
options.headsfile, options.statusfile, options.headsfile, options.statusfile,
authors=a,branchesmap=b,tagsmap=t, authors=a,branchesmap=b,tagsmap=t,
sob=options.sob,force=options.force,hgtags=options.hgtags, sob=options.sob,force=options.force,
ignore_unnamed_heads=options.ignore_unnamed_heads,
hgtags=options.hgtags,
notes=options.notes,encoding=encoding,fn_encoding=fn_encoding, notes=options.notes,encoding=encoding,fn_encoding=fn_encoding,
first_commit_hash=options.first_commit_hash,
plugins=plugins_dict)) plugins=plugins_dict))
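The `process_unicode_escape_sequences` helper that this change adds to `load_mapping` can be tried on its own. The sketch below copies the same decode/encode chain into a standalone function; the sample inputs are made up for illustration and do not come from a real mapping file.

```python
def process_unicode_escape_sequences(s):
    """Expand backslash escape sequences inside a UTF-8 encoded bytestring."""
    return (
        s.decode('utf8')                      # bytes -> str, literal backslashes kept
        .encode('ascii', 'backslashreplace')  # non-ASCII characters become escapes
        .decode('unicode-escape')             # all escape sequences are expanded
        .encode('utf8')                       # back to UTF-8 bytes
    )

# Hypothetical mapping-file fragments (illustrative only).
escaped = b'caf\\u00e9'                # contains a literal backslash-u escape
already_utf8 = 'café'.encode('utf8')   # contains the raw UTF-8 bytes for é

print(process_unicode_escape_sequences(escaped))
print(process_unicode_escape_sequences(already_utf8))
```

Both inputs should come out as the same UTF-8 bytes, which is why the intermediate `backslashreplace` step is needed: without it, raw non-ASCII characters would be mangled by the `unicode-escape` decode.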

@@ -28,29 +28,32 @@ SFX_STATE="state"
GFI_OPTS="" GFI_OPTS=""
if [ -z "${PYTHON}" ]; then if [ -z "${PYTHON}" ]; then
# $PYTHON is not set, so we try to find a working python 2.7 to # $PYTHON is not set, so we try to find a working python with mercurial:
# use. PEP 394 tells us to use 'python2', otherwise try plain for python_cmd in python3 python; do
# 'python'. if command -v $python_cmd > /dev/null; then
if command -v python2 > /dev/null; then $python_cmd -c 'from mercurial.scmutil import revsymbol' 2> /dev/null
PYTHON="python2" if [ $? -eq 0 ]; then
elif command -v python > /dev/null; then PYTHON=$python_cmd
PYTHON="python" break
else
echo "Could not find any python interpreter, please use the 'PYTHON'" \
"environment variable to specify the interpreter to use."
exit 1
fi fi
fi
done
fi fi
if [ -z "${PYTHON}" ]; then
# Check that the python specified by the user or autodetected above is echo "Could not find a python interpreter with the mercurial module >= 4.6 available. " \
# >= 2.7 and < 3. "Please use the 'PYTHON' environment variable to specify the interpreter to use."
if ! ${PYTHON} -c 'import sys; v=sys.version_info; exit(0 if v.major == 2 and v.minor >= 7 else 1)' > /dev/null 2>&1 ; then
echo "${PYTHON} is not a working python 2.7 interpreter, please use the" \
"'PYTHON' environment variable to specify the interpreter to use."
exit 1 exit 1
fi fi
USAGE="[--quiet] [-r <repo>] [--force] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]" "${PYTHON}" -c 'import sys; exit(sys.version_info.major==3 and sys.version_info.minor >= 7)'
if [ $? -eq 0 ]; then
echo "Could not find an interpreter for a supported Python version (>= 3.7)." \
"Please use the 'PYTHON' environment variable to specify the interpreter to use."
exit 1
fi
USAGE="[--quiet] [-r <repo>] [--force] [--ignore-unnamed-heads] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]"
LONG_USAGE="Import hg repository <repo> up to either tip or <max> LONG_USAGE="Import hg repository <repo> up to either tip or <max>
If <repo> is omitted, use last hg repository as obtained from state file, If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default. GIT_DIR/$PFX-$SFX_STATE by default.
@@ -84,6 +87,8 @@ Options:
with <file-path> <hg-hash> <is-binary> as arguments with <file-path> <hg-hash> <is-binary> as arguments
--plugin <plugin=init> Add a plugin with the given init string (repeatable) --plugin <plugin=init> Add a plugin with the given init string (repeatable)
--plugin-path <plugin-path> Add an additional plugin lookup path --plugin-path <plugin-path> Add an additional plugin lookup path
--first-commit-hash <git-commit-hash> Use the given git commit hash as the
first commit's parent (for grafting)
" "
case "$1" in case "$1" in
-h|--help) -h|--help)
@@ -91,6 +96,14 @@ case "$1" in
echo "" echo ""
echo "$LONG_USAGE" echo "$LONG_USAGE"
exit 0 exit 0
;;
--debug)
echo -n "Using Python: "
"${PYTHON}" --version
echo -n "Using Mercurial: "
hg --version
exit 0
esac esac
IS_BARE=$(git rev-parse --is-bare-repository) \ IS_BARE=$(git rev-parse --is-bare-repository) \
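The version check in the wrapper script reads as inverted at first glance. The Python one-liner exits with the truth value of "is this 3.7 or newer", and a true value becomes exit status 1, so status 0 is the failure case that triggers the error message. A small sketch of that behaviour, assuming `sys.executable` stands in for `${PYTHON}`; `version_check_status` is a hypothetical helper name, and `sys.exit` is used in place of the script's bare `exit`:

```python
import subprocess
import sys

# Adapted from the one-liner the wrapper passes to "${PYTHON}" -c.
CHECK = (
    'import sys; '
    'sys.exit(sys.version_info.major==3 and sys.version_info.minor >= 7)'
)

def version_check_status(python=sys.executable):
    """Return the exit status of the version check run under `python`."""
    return subprocess.run([python, '-c', CHECK]).returncode

status = version_check_status()
# A non-zero status means the interpreter is supported; zero means it is not.
supported = status != 0
print(status, supported)
```

An explicit `exit(0 if ok else 1)` would express the same check with the conventional meaning of the status code, at the cost of a slightly longer one-liner.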

@@ -1,4 +1,4 @@
#!/usr/bin/env python #!/usr/bin/env python3
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others. # Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: GPLv2 # License: GPLv2
@@ -7,6 +7,7 @@ from mercurial import node
from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1 from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1
from optparse import OptionParser from optparse import OptionParser
import sys import sys
from binascii import hexlify
def heads(ui,repo,start=None,stop=None,max=None): def heads(ui,repo,start=None,stop=None,max=None):
# this is copied from mercurial/revlog.py and differs only in # this is copied from mercurial/revlog.py and differs only in
@@ -24,7 +25,7 @@ def heads(ui,repo,start=None,stop=None,max=None):
heads = {startrev: 1} heads = {startrev: 1}
parentrevs = repo.changelog.parentrevs parentrevs = repo.changelog.parentrevs
for r in xrange(startrev + 1, max): for r in range(startrev + 1, max):
for p in parentrevs(r): for p in parentrevs(r):
if p in reachable: if p in reachable:
if r not in stoprevs: if r not in stoprevs:
@@ -33,7 +34,7 @@ def heads(ui,repo,start=None,stop=None,max=None):
if p in heads and p not in stoprevs: if p in heads and p not in stoprevs:
del heads[p] del heads[p]
return [(repo.changelog.node(r),str(r)) for r in heads] return [(repo.changelog.node(r), b"%d" % r) for r in heads]
def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max): def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
h=heads(ui,repo,max=max) h=heads(ui,repo,max=max)
@@ -44,11 +45,11 @@ def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
_,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev) _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
del stale[branch] del stale[branch]
git_sha1=get_git_sha1(branch) git_sha1=get_git_sha1(branch)
cache_sha1=marks_cache.get(str(int(rev)+1)) cache_sha1=marks_cache.get(b"%d" % (int(rev)+1))
if git_sha1!=None and git_sha1==cache_sha1: if git_sha1!=None and git_sha1==cache_sha1:
unchanged.append([branch,cache_sha1,rev,desc.split('\n')[0],user]) unchanged.append([branch,cache_sha1,rev,desc.split(b'\n')[0],user])
else: else:
changed.append([branch,cache_sha1,rev,desc.split('\n')[0],user]) changed.append([branch,cache_sha1,rev,desc.split(b'\n')[0],user])
changed.sort() changed.sort()
unchanged.sort() unchanged.sort()
return stale,changed,unchanged return stale,changed,unchanged
@@ -57,20 +58,20 @@ def get_tags(ui,repo,marks_cache,mapping_cache,max):
l=repo.tagslist() l=repo.tagslist()
good,bad=[],[] good,bad=[],[]
for tag,node in l: for tag,node in l:
if tag=='tip': continue if tag==b'tip': continue
rev=int(mapping_cache[node.encode('hex_codec')]) rev=int(mapping_cache[hexlify(node)])
cache_sha1=marks_cache.get(str(int(rev)+1)) cache_sha1=marks_cache.get(b"%d" % (int(rev)+1))
_,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev) _,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
if int(rev)>int(max): if int(rev)>int(max):
bad.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user]) bad.append([tag,branch,cache_sha1,rev,desc.split(b'\n')[0],user])
else: else:
good.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user]) good.append([tag,branch,cache_sha1,rev,desc.split(b'\n')[0],user])
good.sort() good.sort()
bad.sort() bad.sort()
return good,bad return good,bad
def mangle_mark(mark): def mangle_mark(mark):
return str(int(mark)-1) return b"%d" % (int(mark)-1)
if __name__=='__main__': if __name__=='__main__':
def bail(parser,opt): def bail(parser,opt):
@@ -107,7 +108,7 @@ if __name__=='__main__':
state_cache=load_cache(options.statusfile) state_cache=load_cache(options.statusfile)
mapping_cache = load_cache(options.mappingfile) mapping_cache = load_cache(options.mappingfile)
l=int(state_cache.get('tip',options.revision)) l=int(state_cache.get(b'tip',options.revision))
if options.revision+1>l: if options.revision+1>l:
sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l)) sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l))
sys.exit(1) sys.exit(1)
@@ -117,19 +118,39 @@ if __name__=='__main__':
stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1) stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1)
good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1) good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1)
print "Possibly stale branches:" print("Possibly stale branches:")
map(lambda b: sys.stdout.write('\t%s\n' % b),stale.keys()) for b in stale:
sys.stdout.write('\t%s\n' % b.decode('utf8'))
print "Possibly stale tags:" print("Possibly stale tags:")
map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),bad) for b in bad:
sys.stdout.write(
'\t%s on %s (r%s)\n'
% (b[0].decode('utf8'), b[1].decode('utf8'), b[3])  # b[3] is an int revision from get_tags
)
print "Unchanged branches:" print("Unchanged branches:")
map(lambda b: sys.stdout.write('\t%s (r%s)\n' % (b[0],b[2])),unchanged) for b in unchanged:
sys.stdout.write('\t%s (r%s)\n' % (b[0].decode('utf8'),b[2].decode('utf8')))
print "Unchanged tags:" print("Unchanged tags:")
map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),good) for b in good:
sys.stdout.write(
'\t%s on %s (r%s)\n'
% (b[0].decode('utf8'), b[1].decode('utf8'), b[3].decode('utf8'))
)
print "Reset branches in '%s' to:" % options.headsfile print("Reset branches in '%s' to:" % options.headsfile)
map(lambda b: sys.stdout.write('\t:%s %s\n\t\t(r%s: %s: %s)\n' % (b[0],b[1],b[2],b[4],b[3])),changed) for b in changed:
sys.stdout.write(
'\t:%s %s\n\t\t(r%s: %s: %s)\n'
% (
b[0].decode('utf8'),
b[1].decode('utf8'),
b[2].decode('utf8'),
b[4].decode('utf8'),
b[3].decode('utf8'),
)
)
print "Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision) print("Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision))


@@ -11,7 +11,24 @@ SFX_MAPPING="mapping"
SFX_HEADS="heads" SFX_HEADS="heads"
SFX_STATE="state" SFX_STATE="state"
QUIET="" QUIET=""
PYTHON=${PYTHON:-python}
if [ -z "${PYTHON}" ]; then
# $PYTHON is not set, so we try to find a working python with mercurial:
for python_cmd in python2 python python3; do
if command -v $python_cmd > /dev/null; then
$python_cmd -c 'import mercurial' 2> /dev/null
if [ $? -eq 0 ]; then
PYTHON=$python_cmd
break
fi
fi
done
fi
if [ -z "${PYTHON}" ]; then
echo "Could not find a python interpreter with the mercurial module available. " \
"Please use the 'PYTHON' environment variable to specify the interpreter to use."
exit 1
fi
USAGE="[-r <repo>] -R <rev>" USAGE="[-r <repo>] -R <rev>"
LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful


@@ -1,11 +1,11 @@
#!/usr/bin/env python2 #!/usr/bin/env python3
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others. # Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php> # License: MIT <http://www.opensource.org/licenses/mit-license.php>
from mercurial import hg,util,ui,templatefilters from mercurial import hg,util,ui,templatefilters
from mercurial import error as hgerror from mercurial import error as hgerror
from mercurial.scmutil import revsymbol,binnode from mercurial.scmutil import binnode
import re import re
import os import os
@@ -13,47 +13,55 @@ import sys
import subprocess import subprocess
# default git branch name # default git branch name
cfg_master='master' cfg_master=b'master'
# default origin name # default origin name
origin_name='' origin_name=b''
# silly regex to see if user field has email address # silly regex to see if user field has email address
user_re=re.compile('([^<]+) (<[^>]*>)$') user_re=re.compile(b'([^<]+) (<[^>]*>)$')
# silly regex to clean out user names # silly regex to clean out user names
user_clean_re=re.compile('^["]([^"]+)["]$') user_clean_re=re.compile(b'^["]([^"]+)["]$')
def set_default_branch(name): def set_default_branch(name):
global cfg_master global cfg_master
cfg_master = name cfg_master = name.encode('utf8')
def set_origin_name(name): def set_origin_name(name):
global origin_name global origin_name
origin_name = name origin_name = name.encode('utf8')
def setup_repo(url): def setup_repo(url):
try:
# Mercurial >= 7.2 requires explicit initialization for largefile
# support to work.
from mercurial import initialization
initialization.init()
except ImportError:
pass
try: try:
myui=ui.ui(interactive=False) myui=ui.ui(interactive=False)
except TypeError: except TypeError:
myui=ui.ui() myui=ui.ui()
myui.setconfig('ui', 'interactive', 'off') myui.setconfig(b'ui', b'interactive', b'off')
# Avoids a warning when the repository has obsolete markers # Avoids a warning when the repository has obsolete markers
myui.setconfig('experimental', 'evolution.createmarkers', True) myui.setconfig(b'experimental', b'evolution.createmarkers', True)
return myui,hg.repository(myui,url).unfiltered() return myui,hg.repository(myui, os.fsencode(url)).unfiltered()
def fixup_user(user,authors): def fixup_user(user,authors):
user=user.strip("\"") user=user.strip(b"\"")
if authors!=None: if authors!=None:
# if we have an authors table, try to get mapping # if we have an authors table, try to get mapping
# by defaulting to the current value of 'user' # by defaulting to the current value of 'user'
user=authors.get(user,user) user=authors.get(user,user)
name,mail,m='','',user_re.match(user) name,mail,m=b'',b'',user_re.match(user)
if m==None: if m==None:
# if we don't have 'Name <mail>' syntax, extract name # if we don't have 'Name <mail>' syntax, extract name
# and mail from hg helpers. this seems to work pretty well. # and mail from hg helpers. this seems to work pretty well.
# if email doesn't contain @, replace it with devnull@localhost # if email doesn't contain @, replace it with devnull@localhost
name=templatefilters.person(user) name=templatefilters.person(user)
mail='<%s>' % templatefilters.email(user) mail=b'<%s>' % templatefilters.email(user)
if '@' not in mail: if b'@' not in mail:
mail = '<devnull@localhost>' mail = b'<devnull@localhost>'
else: else:
# if we have 'Name <mail>' syntax, everything is fine :) # if we have 'Name <mail>' syntax, everything is fine :)
name,mail=m.group(1),m.group(2) name,mail=m.group(1),m.group(2)
@@ -62,34 +70,25 @@ def fixup_user(user,authors):
m2=user_clean_re.match(name) m2=user_clean_re.match(name)
if m2!=None: if m2!=None:
name=m2.group(1) name=m2.group(1)
return '%s %s' % (name,mail) return b'%s %s' % (name,mail)
def get_branch(name): def get_branch(name):
# 'HEAD' is the result of a bug in mutt's cvs->hg conversion, # 'HEAD' is the result of a bug in mutt's cvs->hg conversion,
# other CVS imports may need it, too # other CVS imports may need it, too
if name=='HEAD' or name=='default' or name=='': if name==b'HEAD' or name==b'default' or name==b'':
name=cfg_master name=cfg_master
if origin_name: if origin_name:
return origin_name + '/' + name return origin_name + b'/' + name
return name return name
def get_changeset(ui,repo,revision,authors={},encoding=''): def get_changeset(ui,repo,revision,authors={},encoding=''):
# Starting with Mercurial 4.6 lookup no longer accepts raw hashes (manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(revision)
# for lookups. Work around it by changing our behaviour depending on
# how it fails
try:
node=repo.lookup(revision)
except hgerror.ProgrammingError:
node=binnode(revsymbol(repo,str(revision))) # We were given a numeric rev
except hgerror.RepoLookupError:
node=revision # We got a raw hash
(manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
if encoding: if encoding:
user=user.decode(encoding).encode('utf8') user=user.decode(encoding).encode('utf8')
desc=desc.decode(encoding).encode('utf8') desc=desc.decode(encoding).encode('utf8')
tz="%+03d%02d" % (-timezone / 3600, ((-timezone % 3600) / 60)) tz=b"%+03d%02d" % (-timezone // 3600, ((-timezone % 3600) // 60))
branch=get_branch(extra.get('branch','master')) branch=get_branch(extra.get(b'branch', b''))
return (node,manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra) return (manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra)
def mangle_key(key): def mangle_key(key):
return key return key
@@ -98,29 +97,35 @@ def load_cache(filename,get_key=mangle_key):
cache={} cache={}
if not os.path.exists(filename): if not os.path.exists(filename):
return cache return cache
f=open(filename,'r') f=open(filename,'rb')
l=0 l=0
for line in f.readlines(): for line in f.readlines():
l+=1 l+=1
fields=line.split(' ') fields=line.split(b' ')
if fields==None or not len(fields)==2 or fields[0][0]!=':': if fields==None or not len(fields)==2 or fields[0][0:1]!=b':':
sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l)) sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
continue continue
# put key:value in cache, key without ^: # put key:value in cache, key without ^:
cache[get_key(fields[0][1:])]=fields[1].split('\n')[0] cache[get_key(fields[0][1:])]=fields[1].split(b'\n')[0]
f.close() f.close()
return cache return cache
def save_cache(filename,cache): def save_cache(filename,cache):
f=open(filename,'w+') f=open(filename,'wb')
map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys()) for key, value in cache.items():
if not isinstance(key, bytes):
key = str(key).encode('utf8')
if not isinstance(value, bytes):
value = str(value).encode('utf8')
f.write(b':%s %s\n' % (key, value))
f.close() f.close()
def get_git_sha1(name,type='heads'): def get_git_sha1(name,type='heads'):
try: try:
# use git-rev-parse to support packed refs # use git-rev-parse to support packed refs
ref="refs/%s/%s" % (type,name) ref="refs/%s/%s" % (type,name.decode('utf8'))
l=subprocess.check_output(["git", "rev-parse", "--verify", "--quiet", ref]) l=subprocess.check_output(["git", "rev-parse", "--verify",
"--quiet", ref.encode('utf8')])
if l == None or len(l) == 0: if l == None or len(l) == 0:
return None return None
return l[0:40] return l[0:40]


@@ -1,19 +1,23 @@
import os import os
import imp import importlib.machinery
import importlib.util
PluginFolder = os.path.join(os.path.dirname(os.path.realpath(__file__)),"..","plugins") PluginFolder = os.path.join(os.path.dirname(os.path.realpath(__file__)),"..","plugins")
MainModule = "__init__" MainModule = "__init__"
def get_plugin(name, plugin_path): def get_plugin(name, plugin_path):
search_dirs = [PluginFolder] search_dirs = [PluginFolder, '.']
if plugin_path: if plugin_path:
search_dirs = [plugin_path] + search_dirs search_dirs = [plugin_path] + search_dirs
for dir in search_dirs: for dir in search_dirs:
location = os.path.join(dir, name) location = os.path.join(dir, name)
if not os.path.isdir(location) or not MainModule + ".py" in os.listdir(location): if not os.path.isdir(location) or not MainModule + ".py" in os.listdir(location):
continue continue
info = imp.find_module(MainModule, [location]) spec = importlib.machinery.PathFinder.find_spec(MainModule, [location])
return {"name": name, "info": info, "path": location} return {"name": name, "spec": spec, "path": location}
raise Exception("Could not find plugin with name " + name) raise Exception("Could not find plugin with name " + name)
def load_plugin(plugin): def load_plugin(plugin):
return imp.load_module(MainModule, *plugin["info"]) spec = plugin["spec"]
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
return module


@@ -15,9 +15,11 @@ class Filter:
raise ValueError("Unknown args: " + ','.join(args)) raise ValueError("Unknown args: " + ','.join(args))
def commit_message_filter(self, commit_data): def commit_message_filter(self, commit_data):
if not (self.skip_master and commit_data['branch'] == 'master'): if not (self.skip_master and commit_data['branch'] == b'master'):
if self.start: if self.start:
sep = ': ' if self.sameline else '\n' sep = b': ' if self.sameline else b'\n'
commit_data['desc'] = commit_data['branch'] + sep + commit_data['desc'] commit_data['desc'] = commit_data['branch'] + sep + commit_data['desc']
if self.end: if self.end:
commit_data['desc'] = commit_data['desc'] + '\n' + commit_data['branch'] commit_data['desc'] = (
commit_data['desc'] + b'\n' + commit_data['branch']
)


@@ -6,6 +6,8 @@ class Filter():
pass pass
def file_data_filter(self,file_data): def file_data_filter(self,file_data):
if file_data['file_ctx'] == None:
return
file_ctx = file_data['file_ctx'] file_ctx = file_data['file_ctx']
if not file_ctx.isbinary(): if not file_ctx.isbinary():
file_data['data'] = file_data['data'].replace('\r\n', '\n') file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')

12
plugins/drop/README.md Normal file

@@ -0,0 +1,12 @@
## Drop commits from output
To use the plugin, add the command line flag `--plugin drop=<spec>`.
The flag can be given multiple times to drop more than one commit.
The `<spec>` value can be either
- a comma-separated list of hg hashes in the full form (40
hexadecimal characters) to drop the corresponding changesets, or
- a regular expression pattern to drop all changesets with matching
descriptions.
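How a `<spec>` value is told apart can be sketched roughly as follows. This minimal example reuses the regular expression from the plugin's `build_filter`; the helper name `classify_spec` is hypothetical and exists only for illustration.

```python
import re

# Same pattern the plugin uses to recognise a list of full hg hashes.
HASH_LIST_RE = re.compile(r'([A-Fa-f0-9]{40}(,|$))+$')

def classify_spec(spec):
    """Classify a --plugin drop=<spec> value (hypothetical helper)."""
    if HASH_LIST_RE.match(spec):
        # Comma-separated list of 40-character hg hashes.
        return 'hashes', spec.split(',')
    # Anything else is treated as a description pattern.
    return 'pattern', spec

print(classify_spec('a' * 40 + ',' + 'b' * 40))
print(classify_spec('^WIP'))
```

Note that the plugin applies a description pattern with `re.match`, so the pattern is anchored at the start of the changeset description.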

61
plugins/drop/__init__.py Normal file

@@ -0,0 +1,61 @@
from __future__ import print_function
import sys, re
def build_filter(args):
if re.match(r'([A-Fa-f0-9]{40}(,|$))+$', args):
return RevisionIdFilter(args.split(','))
else:
return DescriptionFilter(args)
def log(fmt, *args):
print(fmt % args, file=sys.stderr)
sys.stderr.flush()
class FilterBase(object):
def __init__(self):
self.remapped_parents = {}
def commit_message_filter(self, commit_data):
rev = commit_data['revision']
mapping = self.remapped_parents
parent_revs = [rp for p in commit_data['parents']
for rp in mapping.get(p, [p])]
commit_data['parents'] = parent_revs
if self.should_drop_commit(commit_data):
log('Dropping revision %i.', rev)
self.remapped_parents[rev] = parent_revs
# Head commits cannot be dropped because they have no
# children, so detach them to a separate branch.
commit_data['branch'] = b'dropped-hg-head'
commit_data['parents'] = []
def should_drop_commit(self, commit_data):
return False
class RevisionIdFilter(FilterBase):
def __init__(self, revision_hash_list):
super(RevisionIdFilter, self).__init__()
self.unwanted_hg_hashes = {h.encode('ascii', 'strict')
for h in revision_hash_list}
def should_drop_commit(self, commit_data):
return commit_data['hg_hash'] in self.unwanted_hg_hashes
class DescriptionFilter(FilterBase):
def __init__(self, pattern):
super(DescriptionFilter, self).__init__()
self.pattern = re.compile(pattern.encode('ascii', 'strict'))
def should_drop_commit(self, commit_data):
return self.pattern.match(commit_data['desc'])


@@ -0,0 +1,218 @@
# git_lfs_importer Plugin
This plugin automatically converts matching files to use Git LFS
(Large File Storage) during the Mercurial to Git conversion process.
## Overview
The git_lfs_importer plugin intercepts file data during the hg-fast-export
process and converts files matching specified patterns into Git LFS pointers.
This allows you to seamlessly migrate a Mercurial repository to Git while
simultaneously adopting LFS for large files.
### Why use git_lfs_importer?
For large repositories, traditional migration requires two sequential,
long-running steps:
1. Full history conversion from Mercurial to Git.
2. Full history rewrite using `git lfs migrate import`.
This two-step process can take hours or even days for massive
monorepos (e.g., 100GiB+).
This plugin eliminates the second, time-consuming history rewrite. It performs
the LFS conversion incrementally (Just-In-Time). During the initial export, the
plugin identifies large files and immediately writes LFS pointers into the Git
history. This results in significantly faster conversions and allows for
efficient incremental imports of new changesets.
## Prerequisites
### Dependencies
This plugin requires the `pathspec` package:
```bash
pip install pathspec
```
### Git Repository Setup
The destination Git repository must be pre-initialized with:
1. A `.gitattributes` file configured for LFS tracking
2. Git LFS properly installed and initialized
Example `.gitattributes`:
```
*.bin filter=lfs diff=lfs merge=lfs -text
*.iso filter=lfs diff=lfs merge=lfs -text
large_files/** filter=lfs diff=lfs merge=lfs -text
```
## Usage
### Step 1: Create the Destination Git Repository
```bash
# Create a new git repository
git init my-repo
cd my-repo
# Initialize Git LFS
git lfs install
# Create and commit a .gitattributes file
cat > .gitattributes << EOF
*.bin filter=lfs diff=lfs merge=lfs -text
*.iso filter=lfs diff=lfs merge=lfs -text
EOF
git add .gitattributes
git commit -m "Initialize Git LFS configuration"
# Get the commit hash (needed for --first-commit-hash)
git rev-parse HEAD
```
### Step 2: Create an LFS Specification File
Create a file (e.g., `lfs-spec.txt`) listing the patterns of files to convert
to LFS. This uses gitignore-style glob patterns:
```
*.bin
*.iso
*.tar.gz
large_files/**
*.mp4
```
### Step 3: Run hg-fast-export with the Plugin
```bash
hg-fast-export.sh \
-r <mercurial-repo-path> \
--plugin git_lfs_importer=lfs-spec.txt \
--first-commit-hash <git-commit-hash> \
--force
```
Replace `<git-commit-hash>` with the hash obtained from Step 1.
## How It Works
1. **Pattern Matching**: Files are matched against patterns in the
LFS specification file using gitignore-style matching
2. **File Processing**: For each matching file:
- Calculates SHA256 hash of the file content
- Stores the actual file content in `.git/lfs/objects/<first two hex digits>/<next two hex digits>/<hash>`
- Replaces the file data with an LFS pointer containing:
- LFS version specification
- SHA256 hash of the original content
- Original file size
3. **Git Fast-Import**: The LFS pointer is committed instead of the actual
file content
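The processing steps above can be sketched as a small standalone function. This is a simplified illustration modelled on the plugin's `file_data_filter`, not the plugin itself: the name `to_lfs_pointer` and the `git_dir` parameter are hypothetical (the plugin writes relative to `.git` in the current directory), and a temporary directory stands in for a real repository.

```python
import hashlib
import pathlib
import tempfile

def to_lfs_pointer(data, git_dir):
    """Store `data` as an LFS object under git_dir and return the pointer bytes."""
    oid = hashlib.sha256(data).hexdigest()
    # Objects are sharded by the first two pairs of hex digits of the hash.
    obj_dir = pathlib.Path(git_dir) / "lfs" / "objects" / oid[0:2] / oid[2:4]
    obj_dir.mkdir(parents=True, exist_ok=True)
    obj_path = obj_dir / oid
    if not obj_path.is_file():
        # Identical content is written only once.
        obj_path.write_bytes(data)
    pointer = (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )
    return pointer.encode("utf-8")

# Hypothetical usage with a throwaway directory instead of a real .git folder.
with tempfile.TemporaryDirectory() as tmp:
    print(to_lfs_pointer(b"binary payload\n", tmp).decode("utf-8"))
```

Because the object path is derived from the content hash, a file that appears unchanged in many changesets is stored once and only its small pointer is repeated in the Git history.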
## Important Notes
### First Commit Hash Requirement
The `--first-commit-hash` option must be provided with the Git commit hash that
contains your `.gitattributes` file. This allows the plugin to chain from the
existing Git history rather than creating a completely new history.
### Deletions
The plugin handles file deletions safely: when `data` is `None`, it returns without processing the file.
### Large Files and Largefiles
If the Mercurial repository uses Mercurial's largefiles extension, those files
are already converted to their original content before reaching this plugin,
allowing the plugin to apply LFS conversion if they match the patterns.
## Example Workflow
```bash
# Configuration variables
HG_REPO=/path/to/mercurial/repo
GIT_DIR_NAME=my-project-git
LFS_PATTERN_FILE=../lfs-patterns.txt
# 1. Prepare destination git repo
mkdir "$GIT_DIR_NAME"
cd "$GIT_DIR_NAME"
git init
git lfs install
# Create .gitattributes
cat > .gitattributes << EOF
*.bin filter=lfs diff=lfs merge=lfs -text
*.iso filter=lfs diff=lfs merge=lfs -text
EOF
git add .gitattributes
git commit -m "Add LFS configuration"
FIRST_HASH=$(git rev-parse HEAD)
# 2. Create LFS patterns file
cat > "$LFS_PATTERN_FILE" << EOF
*.bin
*.iso
build/artifacts/**
EOF
# 3. Run conversion
/path/to/hg-fast-export.sh \
-r "$HG_REPO" \
--plugin "git_lfs_importer=$LFS_PATTERN_FILE" \
--first-commit-hash $FIRST_HASH \
--force
# 4. Verify
git log --oneline
git lfs ls-files
```
## Troubleshooting
### LFS Files Not Tracked
Verify that:
- The `.gitattributes` file exists in the destination repository
- Patterns in `.gitattributes` match the files being converted
- `git lfs install` was run in the repository
### "pathspec" Module Not Found
Install the required dependency:
```bash
pip install pathspec
```
### Conversion Fails at Import
Ensure the `--first-commit-hash` value is:
- A valid commit hash in the destination repository
- From a commit that exists before the conversion starts
- The hash of the commit containing `.gitattributes`
### Force Requirement
You only need to pass the `--force` option when converting the *first*
Mercurial commit into a non-empty Git repository. By default, `hg-fast-export`
prevents importing Mercurial commits onto a non-empty Git repo to avoid
creating conflicting histories. Passing `--force` overrides that safety check
and allows the exporter to write the LFS pointer objects and integrate the
converted data with the existing Git history.
If you are doing an incremental conversion (i.e., running the script a second
time to import new changesets into an already converted repository),
the `--force` flag is not required.
Omitting `--force` when attempting to import the first Mercurial commit into a
non-empty repository will cause the importer to refuse the operation.
## See Also
- [Git LFS Documentation](https://git-lfs.github.com/)
- [gitignore Pattern Format](https://git-scm.com/docs/gitignore)
- [hg-fast-export Documentation](../README.md)


@@ -0,0 +1,49 @@
import pathlib
import hashlib
import pathspec
def build_filter(args):
with open(args) as f:
lfs_spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, f)
return Filter(lfs_spec)
class Filter:
def __init__(self, lfs_spec):
self.lfs_spec = lfs_spec
def file_data_filter(self, file_data):
"""
file_data: {
'filename': <str>,
'file_ctx': <mercurial.filectx or None>,
'data': <bytes or None>,
'is_largefile': <bool>
}
May be called for deletions (data=None, file_ctx=None).
"""
filename = file_data.get('filename')
data = file_data.get('data')
# Skip deletions or filtered files early
if data is None or not self.lfs_spec.match_file(filename.decode("utf-8")):
return
# Get the file path
sha256hash = hashlib.sha256(data).hexdigest()
lfs_path = pathlib.Path(f".git/lfs/objects/{sha256hash[0:2]}/{sha256hash[2:4]}")
lfs_path.mkdir(parents=True, exist_ok=True)
lfs_file_path = lfs_path / sha256hash
# The binary blob is already in LFS
if not lfs_file_path.is_file():
(lfs_path / sha256hash).write_bytes(data)
# Write the LFS pointer
file_data['data'] = (
f"version https://git-lfs.github.com/spec/v1\n"
f"oid sha256:{sha256hash}\n"
f"size {len(data)}\n"
).encode("utf-8")


@@ -0,0 +1,13 @@
## Convert Head to Branch
`fast-export` can only handle one head per branch. This plugin makes it possible
to create a new branch from a head by specifying the new branch name and
the first divergent commit for that head.
Note: the hg hash must be in the full form, 40 hexadecimal characters.
Note: you must run `fast-export` with the `--ignore-unnamed-heads` option;
otherwise the conversion will fail.
To use the plugin, add the command line flag `--plugin head2branch=name,<hg_hash>`.
The flag can be given multiple times to name more than one head.
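The way the new branch name spreads from the first divergent commit to its descendants can be sketched as follows. This is a simplified model of the plugin's `commit_message_filter`, assuming commits arrive parents-first and that `parents` holds revision numbers; the function name `assign_branch` and the sample history are hypothetical.

```python
def assign_branch(commits, start_hash, branch_name):
    """Relabel start_hash and all of its descendants with branch_name.

    `commits` is assumed to be ordered so that parents come before children.
    """
    on_branch = set()  # revisions already moved to the new branch
    for c in commits:
        if c['hg_hash'] == start_hash or any(p in on_branch for p in c['parents']):
            on_branch.add(c['revision'])
            c['branch'] = branch_name
    return commits

# Hypothetical history: r2 starts a second head on top of r0, r3 continues it.
history = [
    {'revision': 0, 'hg_hash': b'0' * 40, 'parents': [], 'branch': b'master'},
    {'revision': 1, 'hg_hash': b'1' * 40, 'parents': [0], 'branch': b'master'},
    {'revision': 2, 'hg_hash': b'2' * 40, 'parents': [0], 'branch': b'master'},
    {'revision': 3, 'hg_hash': b'3' * 40, 'parents': [2], 'branch': b'master'},
]
for c in assign_branch(history, b'2' * 40, b'feature'):
    print(c['revision'], c['branch'])
```

Only the commit named by `<hg_hash>` and the commits that descend from it change branch; commits on the original head keep their branch name.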


@@ -0,0 +1,24 @@
import sys
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
args = args.split(',')
self.branch_name = args[0].encode('ascii', 'replace')
self.starting_commit_hash = args[1].encode('ascii', 'strict')
self.branch_parents = set()
def commit_message_filter(self, commit_data):
hg_hash = commit_data['hg_hash']
rev = commit_data['revision']
rev_parents = commit_data['parents']
if (hg_hash == self.starting_commit_hash
or any(rp in self.branch_parents for rp in rev_parents)
):
self.branch_parents.add(rev)
commit_data['branch'] = self.branch_name
sys.stderr.write('\nchanging r%s to branch %r\n' % (rev, self.branch_name))
sys.stderr.flush()


@@ -7,9 +7,11 @@ def build_filter(args):
class Filter: class Filter:
def __init__(self, args): def __init__(self, args):
if not isinstance(args, bytes):
args = args.encode('utf8')
self.prefix = args self.prefix = args
def commit_message_filter(self, commit_data): def commit_message_filter(self, commit_data):
for match in re.findall('#[1-9][0-9]+', commit_data['desc']): for match in re.findall(b'#[1-9][0-9]+', commit_data['desc']):
commit_data['desc'] = commit_data['desc'].replace( commit_data['desc'] = commit_data['desc'].replace(
match, '#%s%s' % (self.prefix, match[1:])) match, b'#%s%s' % (self.prefix, match[1:]))


@@ -4,13 +4,13 @@ def build_filter(args):
class Filter: class Filter:
def __init__(self, args): def __init__(self, args):
if args == '': if args == '':
message = '<empty commit message>' message = b'<empty commit message>'
else: else:
message = args message = args.encode('utf8')
self.message = message self.message = message
def commit_message_filter(self,commit_data): def commit_message_filter(self,commit_data):
# Only write the commit message if the recorded commit # Only write the commit message if the recorded commit
# message is null. # message is null.
if commit_data['desc'] == '\x00': if commit_data['desc'] == b'\x00':
commit_data['desc'] = self.message commit_data['desc'] = self.message


@@ -15,6 +15,8 @@ class Filter:
d = file_data['data'] d = file_data['data']
file_ctx = file_data['file_ctx'] file_ctx = file_data['file_ctx']
filename = file_data['filename'] filename = file_data['filename']
if file_ctx == None:
return
filter_cmd = self.filter_contents + [filename, node.hex(file_ctx.filenode()), '1' if file_ctx.isbinary() else '0'] filter_cmd = self.filter_contents + [filename, node.hex(file_ctx.filenode()), '1' if file_ctx.isbinary() else '0']
try: try:
filter_proc = subprocess.Popen(filter_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE) filter_proc = subprocess.Popen(filter_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)

1
t/.gitignore vendored Normal file

@@ -0,0 +1 @@
/test-results/

12
t/Makefile Normal file

@@ -0,0 +1,12 @@
T = $(wildcard *.t)
test: $(T)
@$(MAKE) --silent clean
$(T): clean
./$@ $(TEST_OPTS)
clean:
@rm -fr test-results
.PHONY: test $(T) clean


@@ -0,0 +1,30 @@
blob
mark :1
data 7
good_a
reset refs/heads/master
commit refs/heads/master
mark :2
author Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
data 3
r0
M 100644 :1 good_a.txt
commit refs/heads/master
mark :3
author Grevious Bodily Harmsworth <gbh@example.com> 1679018400 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679018400 +0000
data 3
r1
from :2
commit refs/heads/master
mark :4
author Grevious Bodily Harmsworth <gbh@example.com> 1679022000 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679022000 +0000
data 3
r2
from :3


@@ -0,0 +1,91 @@
#!/bin/bash
#
# Copyright (c) 2023 Felipe Contreras
# Copyright (c) 2023 Frej Drejhammar
# Copyright (c) 2024 Stephan Hohe
#
# Check that files that file_data_filter sets to None are removed from repository
#
test_description='Remove files from file_data_filter plugin test'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
check() {
echo "$3" > expected &&
git -C "$1" show -q --format='%s' "$2" > actual &&
test_cmp expected actual
}
git_create() {
git init -q "$1" &&
git -C "$1" config core.ignoreCase false
}
git_convert() {
(
cd "$2" &&
hg-fast-export.sh --repo "../$1" \
-s --hgtags -n \
--plugin ../../plugins/removefiles_test_plugin
)
}
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = Grevious Bodily Harmsworth <gbh@example.com>
EOF
}
commit0() {
(
# Test initial revision with suppressed file
cd hgrepo &&
echo "good_a" > good_a.txt &&
echo "bad_a" > bad_a.txt &&
hg add good_a.txt bad_a.txt &&
hg commit -d "2023-03-17 01:00Z" -m "r0"
)
}
commit1() {
(
# Test modifying suppressed file
# Test adding suppressed file
cd hgrepo &&
echo "bad_a_modif" > bad_a.txt &&
echo "bad_b" > bad_b.txt &&
hg add bad_b.txt &&
hg commit -d "2023-03-17 02:00Z" -m "r1"
)
}
commit2() {
(
# Test removing suppressed file
cd hgrepo &&
hg rm bad_a.txt &&
hg commit -d "2023-03-17 03:00Z" -m "r2"
)
}
setup
test_expect_success 'all in one' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
commit0 &&
commit1 &&
commit2
) &&
git_create gitrepo &&
git_convert hgrepo gitrepo &&
git -C gitrepo fast-export --all > actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/file_data_filter-removefiles.expected actual
'
test_done


@@ -0,0 +1,29 @@
blob
mark :1
data 7
a_file
blob
mark :2
data 17
a_file_to_rename
reset refs/heads/master
commit refs/heads/master
mark :3
author Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
data 3
r0
M 100644 :1 a.txt
M 100644 :2 c.txt
commit refs/heads/master
mark :4
author Grevious Bodily Harmsworth <gbh@example.com> 1679018400 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679018400 +0000
data 3
r1
from :3
D c.txt

84
t/file_data_filter.t Executable file

@@ -0,0 +1,84 @@
#!/bin/bash
#
# Copyright (c) 2023 Felipe Contreras
# Copyright (c) 2023 Frej Drejhammar
#
# Check that the file_data_filter is called for removed files.
#
test_description='Smoke test'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
check() {
echo "$3" > expected &&
git -C "$1" show -q --format='%s' "$2" > actual &&
test_cmp expected actual
}
git_create() {
git init -q "$1" &&
git -C "$1" config core.ignoreCase false
}
git_convert() {
(
cd "$2" &&
hg-fast-export.sh --repo "../$1" \
-s --hgtags -n \
--plugin ../../plugins/rename_file_test_plugin \
--plugin dos2unix \
--plugin shell_filter_file_contents=../../plugins/id
)
}
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = Grevious Bodily Harmsworth <gbh@example.com>
EOF
}
commit0() {
(
cd hgrepo &&
echo "a_file" > a.txt &&
echo "a_file_to_rename" > b.txt &&
hg add a.txt b.txt &&
hg commit -d "2023-03-17 01:00Z" -m "r0"
)
}
commit1() {
(
cd hgrepo &&
hg remove b.txt &&
hg commit -d "2023-03-17 02:00Z" -m "r1"
)
}
make-branch() {
hg branch "$1"
FILE=$(echo "$1" | sha1sum | cut -d " " -f 1)
echo "$1" > $FILE
hg add $FILE
hg commit -d "2023-03-17 $2:00Z" -m "Added file in branch $1"
}
setup
test_expect_success 'all in one' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
commit0 &&
commit1
) &&
git_create gitrepo &&
git_convert hgrepo gitrepo &&
git -C gitrepo fast-export --all > actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/file_data_filter.expected actual
'
test_done

117
t/first_commit_hash_option.t Executable file

@@ -0,0 +1,117 @@
#!/bin/bash
#
# Copyright (c) 2025
#
test_description='git_lfs_importer plugin integration tests'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = Test User <test@example.com>
EOF
# Git config for the destination repo commits
git config --global user.email "test@example.com"
git config --global user.name "Test User"
}
setup
test_expect_success 'Mercurial history is imported over the provided commit' '
test_when_finished "rm -rf hgrepo gitrepo lfs-patterns.txt" &&
# 1. Create source Mercurial repository
(
hg init hgrepo &&
cd hgrepo &&
echo "regular text file" > readme.txt &&
hg add readme.txt &&
hg commit -m "initial commit"
) &&
# 2. Prepare destination git repo with LFS setup
mkdir gitrepo &&
(
cd gitrepo &&
git init -q &&
git config core.ignoreCase false &&
git lfs install --local &&
git switch --create master &&
cat > .gitattributes <<-EOF &&
* -text
EOF
git add .gitattributes &&
git commit -q -m "Initialize Git configuration"
) &&
FIRST_HASH=$(git -C gitrepo rev-parse HEAD) &&
# 3. Run hg-fast-export
(
cd gitrepo &&
hg-fast-export.sh \
-r "../hgrepo" \
--first-commit-hash "$FIRST_HASH" --force \
-M master
) &&
# 4. Verify git file is still present
git -C gitrepo show HEAD:.gitattributes > gitattributes_check.txt &&
test "$(cat gitattributes_check.txt)" = "* -text" &&
# 5. Verify hg file is imported
git -C gitrepo show HEAD:readme.txt > readme_check.txt &&
test "$(cat readme_check.txt)" = "regular text file"
'
test_expect_success 'Mercurial history has priority over git' '
test_when_finished "rm -rf hgrepo gitrepo lfs-patterns.txt" &&
# 1. Create source Mercurial repository
(
hg init hgrepo &&
cd hgrepo &&
echo "hg readme file" > readme.txt &&
hg add readme.txt &&
hg commit -m "initial commit"
) &&
# 2. Prepare destination git repo with LFS setup
mkdir gitrepo &&
(
cd gitrepo &&
git init -q &&
git config core.ignoreCase false &&
git lfs install --local &&
git switch --create master &&
cat > readme.txt <<-EOF &&
git readme file
EOF
git add readme.txt &&
git commit -q -m "Initialize Git readme file"
) &&
FIRST_HASH=$(git -C gitrepo rev-parse HEAD) &&
# 3. Run hg-fast-export
(
cd gitrepo &&
hg-fast-export.sh \
-r "../hgrepo" \
--first-commit-hash "$FIRST_HASH" --force \
-M master
) &&
# 4. Verify hg file is imported
git -C gitrepo show HEAD:readme.txt > readme_check.txt &&
test "$(cat readme_check.txt)" = "hg readme file"
'
test_done

189
t/git_lfs_importer_plugin.t Executable file

@@ -0,0 +1,189 @@
#!/bin/bash
#
# Copyright (c) 2025
#
test_description='git_lfs_importer plugin integration tests'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = Test User <test@example.com>
EOF
# Git config for the destination repo commits
git config --global user.email "test@example.com"
git config --global user.name "Test User"
}
setup
test_expect_success 'git_lfs_importer converts matched binary files to LFS pointers and pointers are properly smudged when checked out' '
test_when_finished "rm -rf hgrepo gitrepo lfs-patterns.txt" &&
# 1. Create source Mercurial repository with binary files
(
hg init hgrepo &&
cd hgrepo &&
echo "regular text file" > readme.txt &&
echo "binary payload" > payload.bin &&
hg add readme.txt payload.bin &&
hg commit -m "initial commit with binary"
) &&
# 2. Prepare destination git repo with LFS setup
mkdir gitrepo &&
(
cd gitrepo &&
git init -q &&
git config core.ignoreCase false &&
git lfs install --local &&
cat > .gitattributes <<-EOF &&
*.bin filter=lfs diff=lfs merge=lfs -text
EOF
git add .gitattributes &&
git commit -q -m "Initialize Git LFS configuration"
) &&
FIRST_HASH=$(git -C gitrepo rev-parse HEAD) &&
# 3. Create LFS patterns file
cat > lfs-patterns.txt <<-EOF &&
*.bin
EOF
# 4. Run hg-fast-export with git_lfs_importer plugin
(
cd gitrepo &&
hg-fast-export.sh \
-r "../hgrepo" \
--plugin "git_lfs_importer=../lfs-patterns.txt" \
--first-commit-hash "$FIRST_HASH" --force
) &&
# 5. Verify conversion: payload.bin should be an LFS pointer
git -C gitrepo show HEAD:payload.bin > lfs_pointer.txt &&
grep -q "version https://git-lfs.github.com/spec/v1" lfs_pointer.txt &&
grep -q "oid sha256:" lfs_pointer.txt &&
grep -q "size" lfs_pointer.txt &&
# 6. Verify non-matched file is unchanged
git -C gitrepo show HEAD:readme.txt > readme_check.txt &&
test "$(cat readme_check.txt)" = "regular text file" &&
# 7. Make sure the LFS pointer file is smudged back to its content when checked out
git -C gitrepo reset --hard HEAD &&
ls gitrepo &&
test "$(cat gitrepo/payload.bin)" = "binary payload"
'
test_expect_success 'git_lfs_importer skips files not matching patterns' '
test_when_finished "rm -rf hgrepo gitrepo lfs-patterns.txt" &&
# 1. Create source with various files
(
hg init hgrepo &&
cd hgrepo &&
echo "text" > file.txt &&
echo "data" > file.dat &&
echo "iso content" > image.iso &&
hg add . &&
hg commit -m "multiple files"
) &&
# 2. Prepare git repo with LFS
mkdir gitrepo &&
(
cd gitrepo &&
git init -q &&
git config core.ignoreCase false &&
git lfs install --local &&
cat > .gitattributes <<-EOF &&
*.iso filter=lfs diff=lfs merge=lfs -text
EOF
git add .gitattributes &&
git commit -q -m "Initialize Git LFS configuration"
) &&
FIRST_HASH=$(git -C gitrepo rev-parse HEAD) &&
# 3. Only .iso files should be converted
cat > lfs-patterns.txt <<-EOF &&
*.iso
EOF
(
cd gitrepo &&
hg-fast-export.sh \
-r "../hgrepo" \
--plugin "git_lfs_importer=../lfs-patterns.txt" \
--first-commit-hash "$FIRST_HASH" --force
) &&
# 4. Verify .iso is LFS pointer
git -C gitrepo show HEAD:image.iso | grep -q "oid sha256:" &&
# 5. Verify .txt and .dat are unchanged
test "$(git -C gitrepo show HEAD:file.txt)" = "text" &&
test "$(git -C gitrepo show HEAD:file.dat)" = "data"
'
test_expect_success 'git_lfs_importer handles directory patterns' '
test_when_finished "rm -rf hgrepo gitrepo lfs-patterns.txt" &&
# 1. Create repo with files in directory
(
hg init hgrepo &&
cd hgrepo &&
mkdir -p assets/images &&
echo "logo data" > assets/images/logo.bin &&
echo "regular" > readme.txt &&
hg add . &&
hg commit -m "files in directories"
) &&
# 2. Prepare git repo
mkdir gitrepo &&
(
cd gitrepo &&
git init -q &&
git config core.ignoreCase false &&
git lfs install --local &&
cat > .gitattributes <<-EOF &&
assets/** filter=lfs diff=lfs merge=lfs -text
EOF
git add .gitattributes &&
git commit -q -m "Initialize Git LFS configuration"
) &&
FIRST_HASH=$(git -C gitrepo rev-parse HEAD) &&
# 3. Match directory pattern
cat > lfs-patterns.txt <<-EOF &&
assets/**
EOF
(
cd gitrepo &&
hg-fast-export.sh \
-r "../hgrepo" \
--plugin "git_lfs_importer=../lfs-patterns.txt" \
--first-commit-hash "$FIRST_HASH" --force
) &&
# 4. Verify directory file is converted
git -C gitrepo show HEAD:assets/images/logo.bin | grep -q "oid sha256:" &&
# 5. Verify file outside directory is unchanged
test "$(git -C gitrepo show HEAD:readme.txt)" = "regular"
'
test_done
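
The tests above check converted files for three fields: a `version` line, an `oid sha256:` line, and a `size` line. As a rough illustration of where those fields come from, here is a minimal Python sketch that builds such a pointer from raw content. `make_lfs_pointer` is a hypothetical helper name chosen for this example, not the plugin's API. The field layout follows the pointer asserted in the plugin's unit tests.

```python
import hashlib

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def make_lfs_pointer(data: bytes) -> bytes:
    """Build a Git LFS pointer for `data` (hypothetical helper for illustration)."""
    # The object id is the SHA-256 of the raw file content.
    oid = hashlib.sha256(data).hexdigest()
    return (
        f"version {LFS_SPEC_URL}\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    ).encode("utf-8")


# `echo "binary payload"` in the test writes the text plus a trailing newline.
pointer = make_lfs_pointer(b"binary payload\n")
print(pointer.decode("utf-8"), end="")
```

The pointer is what ends up in the Git history; the original bytes are expected to live in the local LFS object store so that a checkout can smudge them back.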

@@ -0,0 +1,20 @@
blob
mark :1
data 7
a_file
blob
mark :2
data 6
large
reset refs/heads/master
commit refs/heads/master
mark :3
author Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
data 3
r0
M 100644 :1 a.txt
M 100644 :2 b.txt

69
t/largefile_plugin.t Executable file

@@ -0,0 +1,69 @@
#!/bin/bash
#
# Copyright (c) 2023 Felipe Contreras
# Copyright (c) 2023 Frej Drejhammar
# Copyright (c) 2025 Günther Nußmüller
#
# Check that plugin invocation works with largefiles.
# This test uses the echo_file_data_test_plugin to verify that the
# file data is passed correctly, including the largefile status.
#
test_description='Largefiles and plugin test'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
git_create() {
git init -q "$1" &&
git -C "$1" config core.ignoreCase false
}
git_convert() {
(
cd "$2" &&
hg-fast-export.sh --repo "../$1" \
-s --hgtags -n \
--plugin ../../plugins/echo_file_data_test_plugin
)
}
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = Grevious Bodily Harmsworth <gbh@example.com>
[extensions]
largefiles =
EOF
}
commit0() {
(
cd hgrepo &&
echo "a_file" > a.txt &&
echo "large" > b.txt &&
hg add a.txt &&
hg add --large b.txt &&
hg commit -d "2023-03-17 01:00Z" -m "r0"
)
}
setup
test_expect_success 'largefile and plugin' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
commit0
) &&
git_create gitrepo &&
git_convert hgrepo gitrepo &&
git -C gitrepo fast-export --all > actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/largefile_plugin.expected actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/largefile_plugin_file_info.expected gitrepo/largefile_info.txt
'
test_done

@@ -0,0 +1,12 @@
filename: b'b.txt'
data size: 6 bytes
ctx rev: 0
ctx binary: False
is largefile: True
filename: b'a.txt'
data size: 7 bytes
ctx rev: 0
ctx binary: False
is largefile: False

144
t/main.t Executable file

@@ -0,0 +1,144 @@
#!/bin/bash
#
# Copyright (c) 2023 Felipe Contreras
#
test_description='Main tests'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
check() {
echo "$3" > expected &&
git -C "$1" show -q --format='%s' "$2" > actual &&
test_cmp expected actual
}
git_clone() {
(
git init -q "$2" &&
cd "$2" &&
git config core.ignoreCase false &&
hg-fast-export.sh --repo "../$1"
)
}
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = H G Wells <wells@example.com>
EOF
}
setup
test_expect_success 'basic' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
cd hgrepo &&
echo zero > content &&
hg add content &&
hg commit -m zero
) &&
git_clone hgrepo gitrepo &&
check gitrepo @ zero
'
test_expect_success 'merge' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
cd hgrepo &&
echo a > content &&
echo a > file1 &&
hg add content file1 &&
hg commit -m "origin" &&
echo b > content &&
echo b > file2 &&
hg add file2 &&
hg rm file1 &&
hg commit -m "right" &&
hg update -r0 &&
echo c > content &&
hg commit -m "left" &&
HGMERGE=true hg merge -r1 &&
hg commit -m "merge"
) &&
git_clone hgrepo gitrepo &&
cat > expected <<-EOF &&
left
c
tree @:
content
file2
EOF
(
cd gitrepo &&
git show -q --format='%s' @^ &&
git show @:content &&
git show @:
) > actual &&
test_cmp expected actual
'
test_expect_success 'hg large file' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
cd hgrepo &&
echo "[extensions]" >> .hg/hgrc &&
echo "largefiles =" >> .hg/hgrc &&
echo a > content &&
echo a > file1 &&
hg add content &&
hg add --large file1 &&
hg commit -m "origin" &&
echo b > content &&
echo b > file2 &&
hg add --large file2 &&
hg rm file1 &&
hg commit -m "right" &&
hg update -r0 &&
echo c > content &&
hg commit -m "left" &&
HGMERGE=true hg merge -r1 &&
hg commit -m "merge"
) &&
git_clone hgrepo gitrepo &&
cat > expected <<-EOF &&
left
c
tree @:
content
file2
EOF
(
cd gitrepo &&
git show -q --format='%s' @^ &&
git show @:content &&
git show @:
) > actual &&
test_cmp expected actual
'
test_done

@@ -0,0 +1,18 @@
import sys
from mercurial import node


def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, _):
        pass

    def file_data_filter(self, file_data):
        # Append one record per file so the test can compare the whole log.
        with open('largefile_info.txt', 'a') as f:
            f.write(f"filename: {file_data['filename']}\n")
            f.write(f"data size: {len(file_data['data'])} bytes\n")
            f.write(f"ctx rev: {file_data['file_ctx'].rev()}\n")
            f.write(f"ctx binary: {file_data['file_ctx'].isbinary()}\n")
            f.write(f"is largefile: {file_data.get('is_largefile', False)}\n")
            f.write("\n")

2
t/plugins/id Executable file

@@ -0,0 +1,2 @@
#!/bin/bash
cat

@@ -0,0 +1,15 @@
import subprocess
import shlex
import sys
from mercurial import node


def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, args):
        self.filter_contents = shlex.split(args)

    def file_data_filter(self, file_data):
        # Clear the payload of every file whose name starts with "bad".
        if file_data['filename'].startswith(b'bad'):
            file_data['data'] = None

@@ -0,0 +1,15 @@
import subprocess
import shlex
import sys
from mercurial import node


def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, args):
        self.filter_contents = shlex.split(args)

    def file_data_filter(self, file_data):
        # Rename b.txt to c.txt on its way into the Git history.
        if file_data['filename'] == b'b.txt':
            file_data['filename'] = b'c.txt'
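
The test plugins above share one shape: an object with a `file_data_filter(file_data)` method that mutates the passed dict in place, returned from a module-level `build_filter(args)`. The sketch below exercises that shape without Mercurial. `apply_filters` and the sample dicts are hypothetical stand-ins for whatever hg-fast-export does internally, and the dicts carry only the two keys these plugins touch.

```python
class RenameFilter:
    """Renames b.txt to c.txt, mirroring the rename test plugin."""

    def file_data_filter(self, file_data):
        if file_data['filename'] == b'b.txt':
            file_data['filename'] = b'c.txt'


class DropBadFilter:
    """Clears the payload of files whose name starts with b'bad'."""

    def file_data_filter(self, file_data):
        if file_data['filename'].startswith(b'bad'):
            file_data['data'] = None


def apply_filters(filters, file_data):
    """Hypothetical driver: run each filter in order on the same dict."""
    for flt in filters:
        flt.file_data_filter(file_data)
    return file_data


filters = [RenameFilter(), DropBadFilter()]
renamed = apply_filters(filters, {'filename': b'b.txt', 'data': b'large\n'})
dropped = apply_filters(filters, {'filename': b'bad_file', 'data': b'x'})
print(renamed['filename'], dropped['data'])
```

Because each filter edits the dict rather than returning a new one, several plugins can be chained and each sees the changes made by the previous one. The LFS importer unit tests treat a `None` payload as a deletion, which is why the second filter clears `data` instead of removing the key.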

42
t/set_origin.expected Normal file

@@ -0,0 +1,42 @@
blob
mark :1
data 5
zero
reset refs/heads/prefix/master
commit refs/heads/prefix/master
mark :2
author H G Wells <wells@example.com> 1679014800 +0000
committer H G Wells <wells@example.com> 1679014800 +0000
data 5
zero
M 100644 :1 content
blob
mark :3
data 8
branch1
commit refs/heads/prefix/branch1
mark :4
author H G Wells <wells@example.com> 1679018400 +0000
committer H G Wells <wells@example.com> 1679018400 +0000
data 29
Added file in branch branch1
from :2
M 100644 :3 b8486c4feca589a4237a1ee428322d7109ede12e
blob
mark :5
data 8
branch2
commit refs/heads/prefix/branch2
mark :6
author H G Wells <wells@example.com> 1679022000 +0000
committer H G Wells <wells@example.com> 1679022000 +0000
data 29
Added file in branch branch2
from :4
M 100644 :5 fe786baee0d76603092c25609f2967b9c28a2cf2

59
t/set_origin.t Executable file

@@ -0,0 +1,59 @@
#!/bin/bash
#
# Copyright (c) 2023 Felipe Contreras
# Copyright (c) 2025 Günther Nußmüller
#
test_description='Set origin tests'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
check() {
git -C "$1" fast-export --all > actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/set_origin.expected actual
}
git_clone() {
(
git init -q "$2" &&
cd "$2" &&
git config core.ignoreCase false &&
hg-fast-export.sh --repo "../$1" --origin "$3"
)
}
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = H G Wells <wells@example.com>
EOF
}
make-branch() {
hg branch "$1" &&
FILE=$(echo "$1" | sha1sum | cut -d " " -f 1) &&
echo "$1" > "$FILE" &&
hg add "$FILE" &&
hg commit -d "2023-03-17 $2:00Z" -m "Added file in branch $1"
}
setup
test_expect_success 'basic' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
cd hgrepo &&
echo zero > content &&
hg add content &&
hg commit -m zero -d "2023-03-17 01:00Z" &&
make-branch branch1 02 &&
make-branch branch2 03
) &&
git_clone hgrepo gitrepo prefix &&
check gitrepo
'
test_done
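
The `make-branch` helper above names each file after the SHA-1 of the branch name as printed by `echo`, that is, including the trailing newline. That is why the expected fast-export output lists 40-character hex file names and blob sizes one byte larger than the branch name. A small Python sketch of the same computation follows; `branch_file_name` and `branch_blob_size` are illustrative names only.

```python
import hashlib


def branch_file_name(branch: str) -> str:
    """File name used for a branch: SHA-1 of the name plus echo's newline."""
    return hashlib.sha1((branch + "\n").encode("utf-8")).hexdigest()


def branch_blob_size(branch: str) -> int:
    """Size in bytes of the blob written by `echo "$1" > $FILE`."""
    return len((branch + "\n").encode("utf-8"))


name = branch_file_name("branch1")
size = branch_blob_size("branch1")
print(name, size)
```

Hashing the name sidesteps the problem that branch names such as `a/` or `/` are not usable as file names, while still giving every branch a distinct, reproducible file.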

1
t/sharness Submodule

Submodule t/sharness added at e457513ae8

15
t/smoke-test.branchmap Normal file

@@ -0,0 +1,15 @@
"feature"="renamed-feature"
"a?"="valid-0"
"a/"="valid-1"
"a/b"="valid-2"
"a/?"="valid-3"
"?a"="valid-4"
"a."="valid-5"
"a.b"="valid-6"
".a"="valid-7"
"/"="valid-8"
"___3"="___a"
"__2"="__b"
"_1"="_c"
"åäö"="abc"
"Feature- 12V Vac \"Venom\""="venom"

300
t/smoke-test.expected Normal file

@@ -0,0 +1,300 @@
blob
mark :1
data 5
r0-a
blob
mark :2
data 5
r0-b
reset refs/heads/master
commit refs/heads/master
mark :3
author Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679014800 +0000
data 3
r0
M 100644 :1 a.txt
M 100644 :2 b.txt
blob
mark :4
data 5
r1-c
blob
mark :5
data 5
r1-d
commit refs/tags/2019_Spring_R2
mark :6
author Grevious Bodily Harmsworth <gbh@example.com> 1679018400 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679018400 +0000
data 3
r1
from :3
M 100644 :4 c.txt
M 100644 :5 d.txt
blob
mark :7
data 56
e92e41dde44f9dbbac08bbb83351a65b6728f128 2019 Spring R2
commit refs/heads/mainline
mark :8
author Grevious Bodily Harmsworth <gbh@example.com> 1679019000 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679019000 +0000
data 52
Added tag 2019 Spring R2 for changeset e92e41dde44f
from :6
M 100644 :7 .hgtags
blob
mark :9
data 5
r2-e
blob
mark :10
data 5
r2-f
commit refs/heads/mainline
mark :11
author Grevious Bodily Harmsworth <gbh@example.com> 1679022000 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679022000 +0000
data 3
r2
from :8
M 100644 :9 e.txt
M 100644 :10 f.txt
commit refs/heads/mainline
mark :12
author badly-formed-user <devnull@localhost> 1679025600 +0000
committer badly-formed-user <devnull@localhost> 1679025600 +0000
data 3
r3
from :11
M 100644 :9 g.txt
M 100644 :10 h.txt
blob
mark :13
data 10
feature-a
blob
mark :14
data 10
feature-b
commit refs/heads/renamed-feature
mark :15
author Grevious Bodily Harmsworth <gbh@example.com> 1679029200 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679029200 +0000
data 8
feature
from :12
M 100644 :13 feature-a.txt
M 100644 :14 feature-b.txt
blob
mark :16
data 3
a?
commit refs/heads/valid-0
mark :17
author Grevious Bodily Harmsworth <gbh@example.com> 1679032800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679032800 +0000
data 24
Added file in branch a?
from :15
M 100644 :16 c1086ce03e4f52aadd1c93b1d097da510138522a
blob
mark :18
data 3
a/
commit refs/heads/valid-1
mark :19
author Grevious Bodily Harmsworth <gbh@example.com> 1679036400 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679036400 +0000
data 24
Added file in branch a/
from :17
M 100644 :18 85ed6fbb96d655df9f194bc9107f2d86210b9263
blob
mark :20
data 4
a/b
commit refs/heads/valid-2
mark :21
author Grevious Bodily Harmsworth <gbh@example.com> 1679040000 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679040000 +0000
data 25
Added file in branch a/b
from :19
M 100644 :20 aae42d317509399fdda80c4d8e46774d152dbd04
blob
mark :22
data 4
a/?
commit refs/heads/valid-3
mark :23
author Grevious Bodily Harmsworth <gbh@example.com> 1679043600 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679043600 +0000
data 25
Added file in branch a/?
from :21
M 100644 :22 ba54a8de7fe91c5e6e0a2dd1b9b37de0976ff5a7
blob
mark :24
data 3
?a
commit refs/heads/valid-4
mark :25
author Grevious Bodily Harmsworth <gbh@example.com> 1679047200 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679047200 +0000
data 24
Added file in branch ?a
from :23
M 100644 :24 d4cde16119b586025976741e87775762a2598984
blob
mark :26
data 3
a.
commit refs/heads/valid-5
mark :27
author Grevious Bodily Harmsworth <gbh@example.com> 1679050800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679050800 +0000
data 24
Added file in branch a.
from :25
M 100644 :26 b4ce96ddcee0706a8c51130917f910b2b29faf77
blob
mark :28
data 4
a.b
commit refs/heads/valid-6
mark :29
author Grevious Bodily Harmsworth <gbh@example.com> 1679054400 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679054400 +0000
data 25
Added file in branch a.b
from :27
M 100644 :28 97051191e1a92daa11165ef10770bf964268c58b
blob
mark :30
data 3
.a
commit refs/heads/valid-7
mark :31
author Grevious Bodily Harmsworth <gbh@example.com> 1679058000 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679058000 +0000
data 24
Added file in branch .a
from :29
M 100644 :30 a667f8feec02fdfa6649772f844a24cf1ad5ebec
blob
mark :32
data 2
/
commit refs/heads/valid-8
mark :33
author Grevious Bodily Harmsworth <gbh@example.com> 1679061600 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679061600 +0000
data 23
Added file in branch /
from :31
M 100644 :32 8f27084b6294ddbe28dbcbf98f798730e8a79289
blob
mark :34
data 5
___3
commit refs/heads/___a
mark :35
author Grevious Bodily Harmsworth <gbh@example.com> 1679065200 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679065200 +0000
data 26
Added file in branch ___3
from :33
M 100644 :34 9b171494eb6e5ce325934b1656e286ca0510a697
blob
mark :36
data 4
__2
commit refs/heads/__b
mark :37
author Grevious Bodily Harmsworth <gbh@example.com> 1679068800 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679068800 +0000
data 25
Added file in branch __2
from :35
M 100644 :36 5dca703b71d2613c6bb3262b9b1741d6165e4a2f
blob
mark :38
data 3
_1
commit refs/heads/_c
mark :39
author Grevious Bodily Harmsworth <gbh@example.com> 1679072400 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679072400 +0000
data 24
Added file in branch _1
from :37
M 100644 :38 2fee90e148a2afbd911b67ced9b6240151f904ec
blob
mark :40
data 25
Feature- 12V Vac "Venom"
commit refs/heads/venom
mark :41
author Grevious Bodily Harmsworth <gbh@example.com> 1679076000 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679076000 +0000
data 46
Added file in branch Feature- 12V Vac "Venom"
from :39
M 100644 :40 b01def8779aed4be2f4b7325a89992a9aa566fec
blob
mark :42
data 7
åäö
commit refs/heads/abc
mark :43
author Grevious Bodily Harmsworth <gbh@example.com> 1679079600 +0000
committer Grevious Bodily Harmsworth <gbh@example.com> 1679079600 +0000
data 28
Added file in branch åäö
from :41
M 100644 :42 a0d01fcbff5d86327d542687dcfd8b299d054147

163
t/smoke-test.t Executable file

@@ -0,0 +1,163 @@
#!/bin/bash
#
# Copyright (c) 2023 Felipe Contreras
# Copyright (c) 2023 Frej Drejhammar
#
# Smoke test used to sanity test changes to fast-export.
#
test_description='Smoke test'
. "${SHARNESS_TEST_SRCDIR-$(dirname "$0")/sharness}"/sharness.sh || exit 1
check() {
echo "$3" > expected &&
git -C "$1" show -q --format='%s' "$2" > actual &&
test_cmp expected actual
}
git_create() {
git init -q "$1" &&
git -C "$1" config core.ignoreCase false
}
git_convert() {
(
cd "$2" &&
hg-fast-export.sh --repo "../$1" \
-s --hgtags -n \
-B "$SHARNESS_TEST_DIRECTORY"/smoke-test.branchmap \
-T "$SHARNESS_TEST_DIRECTORY"/smoke-test.tagsmap
)
}
setup() {
cat > "$HOME"/.hgrc <<-EOF
[ui]
username = Grevious Bodily Harmsworth <gbh@example.com>
EOF
}
commit0() {
(
cd hgrepo &&
echo "r0-a" > a.txt &&
echo "r0-b" > b.txt &&
hg add a.txt b.txt &&
hg commit -d "2023-03-17 01:00Z" -m "r0" &&
hg bookmark bm0
)
}
commit1() {
(
cd hgrepo &&
echo "r1-c" > c.txt &&
echo "r1-d" > d.txt &&
hg branch mainline &&
hg add c.txt d.txt &&
hg commit -d "2023-03-17 02:00Z" -m "r1" &&
hg tag -d "2023-03-17 02:10Z" "2019 Spring R2"
)
}
commit2() {
(
cd hgrepo &&
echo "r2-e" > e.txt &&
echo "r2-f" > f.txt &&
hg add e.txt f.txt &&
hg commit -d "2023-03-17 03:00Z" -m "r2" &&
hg bookmark bm1
)
}
commit3() {
(
cd hgrepo &&
echo "r2-e" > g.txt &&
echo "r2-f" > h.txt &&
hg add g.txt h.txt &&
hg commit -d "2023-03-17 04:00Z" -u "badly-formed-user" -m "r3"
)
}
commit_rest() {
(
cd hgrepo &&
hg branch feature &&
echo "feature-a" > feature-a.txt &&
echo "feature-b" > feature-b.txt &&
hg add feature-a.txt feature-b.txt &&
hg commit -d "2023-03-17 05:00Z" -m "feature" &&
hg bookmark bm2 &&
# Now create strangely named branches
make-branch "a?" 06 &&
make-branch "a/" 07 &&
make-branch "a/b" 08 &&
make-branch "a/?" 09 &&
make-branch "?a" 10 &&
make-branch "a." 11 &&
make-branch "a.b" 12 &&
make-branch ".a" 13 &&
make-branch "/" 14 &&
make-branch "___3" 15 &&
make-branch "__2" 16 &&
make-branch "_1" 17 &&
make-branch "Feature- 12V Vac \"Venom\"" 18 &&
make-branch "åäö" 19 &&
hg bookmark bm-for-the-rest
)
}
make-branch() {
hg branch "$1" &&
FILE=$(echo "$1" | sha1sum | cut -d " " -f 1) &&
echo "$1" > "$FILE" &&
hg add "$FILE" &&
hg commit -d "2023-03-17 $2:00Z" -m "Added file in branch $1"
}
setup
test_expect_success 'all in one' '
test_when_finished "rm -rf hgrepo gitrepo" &&
(
hg init hgrepo &&
commit0 &&
commit1 &&
commit2 &&
commit3 &&
commit_rest
) &&
git_create gitrepo &&
git_convert hgrepo gitrepo &&
git -C gitrepo fast-export --all > actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/smoke-test.expected actual
'
test_expect_success 'incremental' '
test_when_finished "rm -rf hgrepo gitrepo" &&
hg init hgrepo &&
commit0 &&
git_create gitrepo &&
git_convert hgrepo gitrepo &&
commit1 &&
git_convert hgrepo gitrepo &&
commit2 &&
commit3 &&
git_convert hgrepo gitrepo &&
commit_rest &&
git_convert hgrepo gitrepo &&
git -C gitrepo fast-export --all > actual &&
test_cmp "$SHARNESS_TEST_DIRECTORY"/smoke-test.expected actual
'
test_done
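
The `-B` and `-T` options used above point at mapping files whose lines have the form `"source"="target"`, with backslash-escaped quotes allowed inside a name. The following Python sketch reads one such line. It assumes that this quoting is all that needs handling; the actual parser in hg-fast-export may behave differently, and `parse_mapping_line` is a hypothetical name.

```python
import re

# Two double-quoted fields separated by "="; a backslash escapes the next character.
_MAPPING_RE = re.compile(r'^"((?:[^"\\]|\\.)*)"="((?:[^"\\]|\\.)*)"$')


def _unescape(text: str) -> str:
    """Drop the backslash from every escaped character."""
    return re.sub(r'\\(.)', r'\1', text)


def parse_mapping_line(line: str):
    """Return (source, target) for one mapping line (illustrative only)."""
    match = _MAPPING_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a mapping line: {line!r}")
    return _unescape(match.group(1)), _unescape(match.group(2))


simple = parse_mapping_line('"feature"="renamed-feature"')
quoted = parse_mapping_line(r'"Feature- 12V Vac \"Venom\""="venom"')
print(simple, quoted)
```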

1
t/smoke-test.tagsmap Normal file

@@ -0,0 +1 @@
"2019 Spring R2"="2019_Spring_R2"

0
tests/__init__.py Normal file

223
tests/test_drop_plugin.py Normal file

@@ -0,0 +1,223 @@
import sys, os, subprocess
from tempfile import TemporaryDirectory
from unittest import TestCase
from pathlib import Path
class CommitDropTest(TestCase):
    def test_drop_single_commit_by_hash(self):
        hash1 = self.create_commit('commit 1')
        self.create_commit('commit 2')
        self.drop(hash1)
        self.assertEqual(['commit 2'], self.git.log())

    def test_drop_commits_by_desc(self):
        self.create_commit('commit 1 is good')
        self.create_commit('commit 2 is bad')
        self.create_commit('commit 3 is good')
        self.create_commit('commit 4 is bad')
        self.drop('.*bad')
        expected = ['commit 1 is good', 'commit 3 is good']
        self.assertEqual(expected, self.git.log())

    def test_drop_sequential_commits_in_single_plugin_instance(self):
        self.create_commit('commit 1')
        hash2 = self.create_commit('commit 2')
        hash3 = self.create_commit('commit 3')
        hash4 = self.create_commit('commit 4')
        self.create_commit('commit 5')
        self.drop(','.join((hash2, hash3, hash4)))
        expected = ['commit 1', 'commit 5']
        self.assertEqual(expected, self.git.log())

    def test_drop_sequential_commits_in_multiple_plugin_instances(self):
        self.create_commit('commit 1')
        hash2 = self.create_commit('commit 2')
        hash3 = self.create_commit('commit 3')
        hash4 = self.create_commit('commit 4')
        self.create_commit('commit 5')
        self.drop(hash2, hash3, hash4)
        expected = ['commit 1', 'commit 5']
        self.assertEqual(expected, self.git.log())

    def test_drop_nonsequential_commits(self):
        self.create_commit('commit 1')
        hash2 = self.create_commit('commit 2')
        self.create_commit('commit 3')
        hash4 = self.create_commit('commit 4')
        self.drop(','.join((hash2, hash4)))
        expected = ['commit 1', 'commit 3']
        self.assertEqual(expected, self.git.log())

    def test_drop_head(self):
        self.create_commit('first')
        self.create_commit('middle')
        hash_last = self.create_commit('last')
        self.drop(hash_last)
        self.assertEqual(['first', 'middle'], self.git.log())

    def test_drop_merge_commit(self):
        initial_hash = self.create_commit('initial')
        self.create_commit('branch A')
        self.hg.checkout(initial_hash)
        self.create_commit('branch B')
        self.hg.merge()
        merge_hash = self.create_commit('merge to drop')
        self.create_commit('last')
        self.drop(merge_hash)
        expected_commits = ['initial', 'branch A', 'branch B', 'last']
        self.assertEqual(expected_commits, self.git.log())
        self.assertEqual(['branch B', 'branch A'], self.git_parents('last'))

    def test_drop_different_commits_in_multiple_plugin_instances(self):
        self.create_commit('good commit')
        bad_hash = self.create_commit('bad commit')
        self.create_commit('awful commit')
        self.create_commit('another good commit')
        self.drop('^awful.*', bad_hash)
        expected = ['good commit', 'another good commit']
        self.assertEqual(expected, self.git.log())

    def test_drop_same_commit_in_multiple_plugin_instances(self):
        self.create_commit('good commit')
        bad_hash = self.create_commit('bad commit')
        self.create_commit('another good commit')
        self.drop('^bad.*', bad_hash)
        expected = ['good commit', 'another good commit']
        self.assertEqual(expected, self.git.log())

    def setUp(self):
        self.tempdir = TemporaryDirectory()
        self.hg = HgDriver(Path(self.tempdir.name) / 'hgrepo')
        self.hg.init()
        self.git = GitDriver(Path(self.tempdir.name) / 'gitrepo')
        self.git.init()
        self.export = ExportDriver(self.hg.repodir, self.git.repodir)

    def tearDown(self):
        self.tempdir.cleanup()

    def create_commit(self, message):
        self.write_file_data('Data for %r.' % message)
        return self.hg.commit(message)

    def write_file_data(self, data, filename='test_file.txt'):
        path = self.hg.repodir / filename
        with path.open('w') as f:
            print(data, file=f)

    def drop(self, *spec):
        self.export.run_with_drop(*spec)

    def git_parents(self, message):
        matches = self.git.grep_log(message)
        if len(matches) != 1:
            raise Exception('No unique commit with message %r.' % message)
        subject, parents = self.git.details(matches[0])
        return [self.git.details(p)[0] for p in parents]
class ExportDriver:
    def __init__(self, sourcedir, targetdir, *, quiet=True):
        self.sourcedir = Path(sourcedir)
        self.targetdir = Path(targetdir)
        self.quiet = quiet
        self.python_executable = str(
            Path.cwd() / os.environ.get('PYTHON', sys.executable))
        self.script = Path(__file__).parent / '../hg-fast-export.sh'

    def run_with_drop(self, *plugin_args):
        cmd = [self.script, '-r', str(self.sourcedir)]
        # One --plugin option per spec, so each spec gets its own plugin instance.
        for arg in plugin_args:
            cmd.extend(['--plugin', 'drop=' + arg])
        output = subprocess.DEVNULL if self.quiet else None
        subprocess.run(cmd, check=True, cwd=str(self.targetdir),
                       env={'PYTHON': self.python_executable},
                       stdout=output, stderr=output)
class HgDriver:
    def __init__(self, repodir):
        self.repodir = Path(repodir)

    def init(self):
        self.repodir.mkdir()
        self.run_command('init')

    def commit(self, message):
        self.run_command('commit', '-A', '-m', message)
        return self.run_command('id', '--id', '--debug').strip()

    def log(self):
        output = self.run_command('log', '-T', '{desc}\n')
        commits = output.strip().splitlines()
        commits.reverse()
        return commits

    def checkout(self, rev):
        self.run_command('checkout', '-r', rev)

    def merge(self):
        self.run_command('merge', '--tool', ':local')

    def run_command(self, *args):
        p = subprocess.run(('hg', '-yq') + args,
                           cwd=str(self.repodir),
                           check=True,
                           text=True,
                           capture_output=True)
        return p.stdout
class GitDriver:
    def __init__(self, repodir):
        self.repodir = Path(repodir)

    def init(self):
        self.repodir.mkdir()
        self.run_command('init')

    def log(self):
        output = self.run_command('log', '--format=%s', '--reverse')
        return output.strip().splitlines()

    def grep_log(self, pattern):
        output = self.run_command('log', '--format=%H',
                                  '-F', '--grep', pattern)
        return output.strip().splitlines()

    def details(self, commit_hash):
        fmt = '%s%n%P'
        output = self.run_command('show', '-s', '--format=' + fmt,
                                  commit_hash)
        subject, parents = output.splitlines()
        return subject, parents.split()

    def run_command(self, *args):
        p = subprocess.run(('git', '--no-pager') + args,
                           cwd=str(self.repodir),
                           check=True,
                           text=True,
                           capture_output=True)
        return p.stdout

@@ -0,0 +1,156 @@
import sys
sys.path.append("./plugins")
import hashlib
import pathlib
import time
import unittest
import tempfile
import os
import pathspec
from git_lfs_importer import Filter, build_filter
class TestGitLfsImporterPlugin(unittest.TestCase):
    def setUp(self):
        # create an isolated temp dir and chdir into it for each test
        self._orig_cwd = os.getcwd()
        self._tmpdir = tempfile.TemporaryDirectory()
        self.tmp_path = pathlib.Path(self._tmpdir.name)
        os.chdir(self.tmp_path)

    def tearDown(self):
        # restore cwd and cleanup
        os.chdir(self._orig_cwd)
        self._tmpdir.cleanup()

    def empty_spec(self):
        return pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, [])

    # --------------------------------------------------------
    # GIVEN-WHEN-THEN TESTS for Filter.file_data_filter
    # --------------------------------------------------------
    def test_skips_deletions(self):
        flt = Filter(self.empty_spec())
        file_data = {"filename": b"file.txt", "data": None}
        flt.file_data_filter(file_data)
        self.assertIsNone(file_data["data"])
        self.assertFalse((self.tmp_path / ".git").exists())

    def test_skips_files_that_do_not_match_spec(self):
        spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, ["*.bin"])
        flt = Filter(spec)
        original = b"not matched"
        file_data = {"filename": b"file.txt", "data": original}
        flt.file_data_filter(file_data)
        self.assertEqual(file_data["data"], original)
        self.assertFalse((self.tmp_path / ".git").exists())

    def test_converts_only_matched_files_to_lfs_pointer(self):
        spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, ["*.bin"])
        flt = Filter(spec)
        data = b"hello world"
        sha = hashlib.sha256(data).hexdigest()
        expected_pointer = (
            f"version https://git-lfs.github.com/spec/v1\n"
            f"oid sha256:{sha}\n"
            f"size {len(data)}\n"
        ).encode("utf-8")
        file_data = {"filename": b"payload.bin", "data": data}
        flt.file_data_filter(file_data)
        self.assertEqual(file_data["data"], expected_pointer)
        lfs_file = pathlib.Path(".git/lfs/objects") / sha[:2] / sha[2:4] / sha
        self.assertTrue(lfs_file.is_file())
        self.assertEqual(lfs_file.read_bytes(), data)

    def test_does_not_convert_unmatched_directory(self):
        spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, ["assets/**"])
        flt = Filter(spec)
        data = b"outside directory"
        file_data = {"filename": b"src/images/logo.png", "data": data}
        flt.file_data_filter(file_data)
        self.assertEqual(file_data["data"], data)
        self.assertFalse((self.tmp_path / ".git").exists())

    def test_converts_matched_directory(self):
        spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, ["assets/**"])
        flt = Filter(spec)
        data = b"inside directory"
        sha = hashlib.sha256(data).hexdigest()
        file_data = {"filename": b"assets/images/logo.png", "data": data}
        flt.file_data_filter(file_data)
        self.assertIn(b"version https://git-lfs.github.com/spec/v1", file_data["data"])
        lfs_file = pathlib.Path(".git/lfs/objects") / sha[:2] / sha[2:4] / sha
        self.assertTrue(lfs_file.is_file())
        self.assertEqual(lfs_file.read_bytes(), data)

    def test_does_not_overwrite_existing_blob(self):
        spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, ["*.bin"])
        flt = Filter(spec)
        data = b"abc"
        sha = hashlib.sha256(data).hexdigest()
        lfs_dir = pathlib.Path(".git/lfs/objects") / sha[:2] / sha[2:4]
        lfs_dir.mkdir(parents=True, exist_ok=True)
        lfs_file = lfs_dir / sha
        lfs_file.write_bytes(data)
        before_mtime = lfs_file.stat().st_mtime_ns
        time.sleep(0.01)  # Ensure timestamp difference
        file_data = {"filename": b"abc.bin", "data": data}
        flt.file_data_filter(file_data)
        expected_pointer_prefix = b"version https://git-lfs.github.com/spec/v1"
        self.assertTrue(file_data["data"].startswith(expected_pointer_prefix))
        after_mtime = lfs_file.stat().st_mtime_ns
        self.assertEqual(after_mtime, before_mtime)

    def test_empty_file_converted_when_matched(self):
        spec = pathspec.PathSpec.from_lines(pathspec.patterns.GitWildMatchPattern, ["*.bin"])
        flt = Filter(spec)
        data = b""
        sha = hashlib.sha256(data).hexdigest()
        file_data = {"filename": b"empty.bin", "data": data}
        flt.file_data_filter(file_data)
        self.assertIn(b"size 0", file_data["data"])
        lfs_file = pathlib.Path(".git/lfs/objects") / sha[:2] / sha[2:4] / sha
        self.assertTrue(lfs_file.is_file())
        self.assertEqual(lfs_file.read_bytes(), data)

    # --------------------------------------------------------
    # Optional: GIVEN-WHEN-THEN for build_filter
    # --------------------------------------------------------
    def test_build_filter_reads_patterns_file(self):
        patterns_file = self.tmp_path / "lfs_patterns.txt"
        patterns_file.write_text("*.bin\nassets/**\n", encoding="utf-8")
        flt = build_filter(str(patterns_file))
        data_match = b"match me"
        sha_match = hashlib.sha256(data_match).hexdigest()
        fd_match = {"filename": b"assets/payload.bin", "data": data_match}
        flt.file_data_filter(fd_match)
        self.assertIn(b"oid sha256:", fd_match["data"])
        lfs_file = pathlib.Path(".git/lfs/objects") / sha_match[:2] / sha_match[2:4] / sha_match
        self.assertTrue(lfs_file.is_file())
        data_skip = b"skip me"
        fd_skip = {"filename": b"docs/readme.md", "data": data_skip}
        flt.file_data_filter(fd_skip)
        self.assertEqual(fd_skip["data"], data_skip)
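
The unit tests above expect the original bytes of a converted file under `.git/lfs/objects/<first two hex digits>/<next two>/<full sha256>`, and they expect an object that already exists to be left untouched. The sketch below shows one way to get that behaviour with the standard library only. `lfs_object_path` and `store_lfs_object` are hypothetical helpers and are not taken from the plugin; the demonstration writes into a temporary directory rather than a real repository.

```python
import hashlib
import pathlib
import tempfile


def lfs_object_path(root, data: bytes) -> pathlib.Path:
    """Location of the LFS object for `data` below the repository root."""
    sha = hashlib.sha256(data).hexdigest()
    return pathlib.Path(root) / ".git" / "lfs" / "objects" / sha[:2] / sha[2:4] / sha


def store_lfs_object(root, data: bytes) -> pathlib.Path:
    """Write `data` into the object store unless it is already there."""
    path = lfs_object_path(root, data)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)
    return path


with tempfile.TemporaryDirectory() as tmp:
    first = store_lfs_object(tmp, b"abc")
    mtime = first.stat().st_mtime_ns
    second = store_lfs_object(tmp, b"abc")  # same content, so no rewrite
    unchanged = second.stat().st_mtime_ns == mtime
    stored = first.read_bytes()
    relative_parts = first.relative_to(tmp).parts

print(relative_parts[:3], unchanged)
```

Skipping the write when the object exists is safe because the path is derived from the content hash: identical paths imply identical content, so repeated revisions of the same file cost one write.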