108 Commits

Author SHA1 Message Date
Frej Drejhammar
667404e836 Merge branch 'PR291' 2022-09-21 18:31:16 +02:00
Nicolas Vanhoren
38e236962d Update README.md to change recommendation for crlf filtering 2022-09-21 01:37:39 +02:00
Frej Drejhammar
dbb8158527 Merge branch 'frej/submodule-doc-improvement' 2022-02-10 20:05:07 +01:00
Frej Drejhammar
bb0bcda7ba Merge branch 'frej/fix-re-future-warning' 2022-02-10 20:04:14 +01:00
Frej Drejhammar
838b654614 Remove inconsistencies from submodule documentation
The submodule documentation is not consistent with regards to the
example directory structure. Update the example to be consistent.

Closes #277.
2022-02-09 15:58:48 +01:00
Frej Drejhammar
f179afce65 Fix FutureWarning about nested sets in re
Since Python 3.7 the re module warns for syntax which could, in the
future, be misparsed as a nested set. Avoid this by escaping the
literal `[` we search for in the regexp.

Reported by Monte Davidoff @mndavidoff

Closes #269.
2022-02-09 15:37:29 +01:00
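The warning addressed by f179afce65 can be reproduced with a minimal sketch. The patterns below are hypothetical and are not the regexp used in fast-export; they only illustrate that a literal `[` placed directly after the opening bracket of a character set triggers the FutureWarning, while the escaped form does not.

```python
import re
import warnings


def compile_warnings(pattern):
    """Compile `pattern` and return the FutureWarnings raised while parsing it."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        re.compile(pattern)
    return [w for w in caught if issubclass(w.category, FutureWarning)]


# Hypothetical patterns, for illustration only.
UNESCAPED = rb"[[(]\d+"  # '[' right after the set opener looks like a nested set
ESCAPED = rb"[\[(]\d+"   # escaping the literal '[' keeps the same meaning

unescaped_warnings = compile_warnings(UNESCAPED)
escaped_warnings = compile_warnings(ESCAPED)
```

Both patterns match the same input; only the escaped one is future-proof.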
Frej Drejhammar
5b7ca5aaec Give proper error message when refusing to overwrite existing branch
If fast-export was asked to export a Mercurial branch to Git and a
branch of the same name already existed in the Git repo but it was not
created by fast export, fast-export would crash while trying to format
an error message claiming that the destination branch was modified
behind its back.

This patch extends fast-export to detect the situation above and give
a proper error message which hopefully is less confusing to the user.

Credit for discovering the original crash goes to Shun-ichi Goto
<gotoh@taiyo.co.jp>.

Closes: #269.
2021-08-27 16:04:40 +02:00
Frej Drejhammar
4227621eed Update contribution guidelines and make github display them
Try to make it clear that sloppy, throw it over the fence, patches
won't be accepted without revision and try to make sure a potential
contributor sees the warning while creating a pull request.
2021-07-29 15:28:01 +02:00
Frej Drejhammar
bdfc0c08c7 Merge branch 'frej/issue-258'
Closes 258
2021-02-26 16:44:31 +01:00
Frej Drejhammar
001749e69d Merge branch 'PR/260'
Closes 257
2021-02-26 16:40:12 +01:00
SirIntellegence
20c22a3110 Add plugin support for the 'extra' field
Permits plugins to import other information such as svn conversion revisions
2021-02-22 13:09:48 -07:00
Frej Drejhammar
f741bf39f2 bugfix: Avoid starting incremental conversions from scratch
Keys and values in the state cache are byte strings, therefore a
lookup of 'tip' will always fail. The failure makes the conversion
start over from the beginning, but as fast-export is deterministic the
results are the same, just very inefficient. The bug has existed since
the port to Python 3.

This patch switches the 'tip' lookup to use a byte string which should
make incremental conversions restart at the last converted commit. As
'x' == b'x' in Python 2, this should be a backwards compatible change.

Bug reported and fix suggested by Tomas Kolda.

Fixes #258.
2021-02-19 16:47:53 +01:00
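The failure mode described in f741bf39f2 can be sketched as follows. The dictionary and the `last_converted` helper are hypothetical stand-ins for the state cache, assuming only what the commit message states: keys and values are byte strings.

```python
# Hypothetical stand-in for the state cache: keys and values are byte strings.
state_cache = {b"tip": b"1234"}


def last_converted(cache, key):
    """Return the cached revision as an int, or 0 to start from scratch."""
    return int(cache.get(key, b"0"))


# On Python 3 a str key never equals a bytes key, so the lookup silently
# falls back to the default and the conversion would restart from zero.
from_str_key = last_converted(state_cache, "tip")

# A bytes key finds the cached entry.
from_bytes_key = last_converted(state_cache, b"tip")
```

Because the fallback is a valid starting point, nothing crashes, which is likely why the bug went unnoticed: the output is identical, only slower to produce.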
Frej Drejhammar
427663c766 Merge branch 'PR/254' 2021-01-10 15:18:28 +01:00
Ray Luo
056756f193 Remove some ".py" wording
Avoid confusion about which file is the main entry point to fast-export,
in order to avoid the issue mentioned here

https://github.com/frej/fast-export/issues/158#issuecomment-754482516

Also fix a typo
2021-01-09 02:06:52 -08:00
Frej Drejhammar
588e03bb23 Merge branch 'PR/251' 2020-11-15 15:34:27 +01:00
Jason Winnebeck
89da4ad8af Document --ignore-unnamed-heads option 2020-11-14 21:24:54 -05:00
Frej Drejhammar
b0d5e56c8d Merge branch 'PR/247' 2020-10-29 19:01:04 +01:00
Frej Drejhammar
787e8559b9 Fix typo in README 2020-10-29 19:00:30 +01:00
Henrik Tunedal
ab500a24a7 Add plugin for dropping commits from output 2020-10-29 12:04:27 +01:00
Frej Drejhammar
ead75895b0 Enable code analysis
Merge github generated workflow into master
2020-10-10 16:26:53 +02:00
Frej Drejhammar
bf5f14ddab Create codeql-analysis.yml 2020-10-10 13:15:54 +00:00
Frej Drejhammar
7057ce2c2b Allow plugins to modify the committer
Since they were introduced, plugins have been able to modify the
author of a commit, but not the committer. This patch adds the necessary
support for allowing them to also modify the committer.
2020-09-30 17:47:33 +02:00
Frej Drejhammar
2b6f735b8c Update section about submitting patches in README
Try to cover the most common reasons for requesting changes in PRs.
2020-09-09 14:08:00 +02:00
Frej Drejhammar
71acb42a09 Merge branch 'PR/236-v2' into master
Implement a plugin converting unnamed heads to branches
2020-07-31 17:08:04 +02:00
Ondrej Stanek
a7955bc49b Update head2branch plugin to accept hg commit hash
The revision number isn't a unique identifier of commits across
repository clones and forks, while the hg hash is guaranteed to be stable.
2020-07-31 10:50:57 +02:00
Ondrej Stanek
9c6dea9fd4 Pass original hg commit hash to plugins 2020-07-31 10:50:51 +02:00
Ethan Furman
21827a53f7 Add head2branch plugin
Support converting unnamed heads to named branches during mercurial
conversions.

Co-Authored-By:	ostan89@gmail.com
2020-07-31 10:49:08 +02:00
Ethan Furman
5c1cbf82b0 Add revision to commit_data for commit plugins
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:48:33 +02:00
Ondrej Stanek
50631c4b34 Add option --ignore-unnamed-heads
This option allows the user to ignore only unnamed heads (compared to --force
which ignores all non-fatal issues). The intended use is for a future plugin
converting unnamed heads to named branches.
2020-07-31 10:30:53 +02:00
Ethan Furman
2a9dd53d14 Show all unnamed heads at once
Co-Authored-By: ostan89@gmail.com
2020-07-31 10:27:07 +02:00
Frej Drejhammar
597093eaf1 Merge branch 'fix-233'
Closes #233
2020-07-10 16:52:17 +02:00
Frej Drejhammar
3910044a97 Avoid crash during rev-parse when the default encoding is ascii
In some locales the default encoding is ascii in which case
subprocess.check_output() will fail if it is given a non-ascii ref as
one of the arguments. By forcing the ref to be utf8 we will avoid a
crash while still behaving correctly when the default encoding is
utf8.

The credits for this fix go to Nikita Bazhinov for discovering the fix
and Chris J Billington for explaining it.

Co-Authored-By: Nikita Bazhinov <nbazhinov@syntellect.ru>
Co-Authored-By: Chris J Billington <chrisjbillington@gmail.com>
2020-07-10 16:41:38 +02:00
Frej Drejhammar
44c50d0fae Merge branch 'PR/226' 2020-05-07 20:10:24 +02:00
chrisjbillington
d29d30363b Fix backward incompatible change for hg < 5.1
The port to Python 3 in b961f146 changed `repo.branchmap().iteritems()`
to use `.items()` instead. However, the object returned by mercurial
isn't a dictionary and its `.items()` method was only introduced (as an
alias for `iteritems`) in hg 5.1. `iteritems()` still exists, so let's
keep using it for now to retain compatibility with hg < 5.1.
2020-05-06 11:59:49 -04:00
Frej Drejhammar
f102d2a69f Merge branch 'PR/223'
Closes #223
2020-05-06 16:31:13 +02:00
Ondrej Stanek
cf0e5837b6 Allow converting a repository with git and hg subrepos
In the verification phase, fast-export wrongly expects both hg and
git subrepositories to have a corresponding line in the subrepo-map
file. In fact, only hg subrepos need a line in subrepo-map that
references a converted subrepo; git subrepositories do not.
2020-05-06 16:30:05 +02:00
Frej Drejhammar
61d22307af Merge branch 'PR/217'
Closes: #215
2020-03-26 20:17:20 +01:00
chrisjbillington
3b3f86b71e Allow utf8 in mappings
We were previously processing entries in mapping files (when
`--mappings-are-raw` is not given) with
`.decode('unicode_escape').encode('utf8')` to replace backslash escape
sequences in bytestrings with the utf-8 encoded characters they
represent. However, it turns out that `.decode
('unicode_escape')` assumes latin-1 encoding if it encounters non-ascii
bytes: https://bugs.python.org/issue21331. So this gave incorrect
results if non-ascii utf8 data was present in the mapping.

To fix this, we now add an extra layer of `.decode('utf8').encode
('unicode-escape')` in order to convert any non-ascii characters into
their backslash escape sequences. Then the subsequent
`.decode('unicode_escape')` only encounters ascii characters and gives
correct results.
2020-03-25 12:33:42 -04:00
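The problem fixed in 3b3f86b71e can be shown with a short sketch. Note that it is not the call chain from the commit: it swaps in the `backslashreplace` error handler to turn non-ASCII characters into escape sequences before decoding, which leaves backslash sequences already present in the entry untouched. The function names and the sample entry are hypothetical.

```python
def naive_unescape(raw):
    """Old behaviour: non-ASCII bytes are misread as latin-1."""
    return raw.decode("unicode_escape").encode("utf8")


def utf8_safe_unescape(raw):
    """Sketch of the idea: make the input pure ASCII before unescaping."""
    # Non-ASCII characters become \xNN / \uNNNN escapes; existing
    # backslash sequences in the entry are passed through unchanged.
    ascii_only = raw.decode("utf8").encode("ascii", "backslashreplace")
    return ascii_only.decode("unicode_escape").encode("utf8")


# A mapping entry containing UTF-8 data and a literal backslash-t sequence.
entry = "J\u00fcrgen\\tM".encode("utf8")
expected = "J\u00fcrgen\tM".encode("utf8")

naive_result = naive_unescape(entry)
safe_result = utf8_safe_unescape(entry)
```

The naive result contains mojibake for the `ü`, while the safe variant returns the intended UTF-8 bytes with the tab escape expanded.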
Frej Drejhammar
e51844cd65 Merge branch 'PR/214'
Closes: #213
2020-03-25 16:09:01 +01:00
Toni Sissala
90eeef2ff4 Fix TypeError when using -M command line argument
hg-fast-export.sanitize_name expects branch name to be a bytes
object. Command line parser gives out str objects. Convert
possible str object to bytes in hg2git.set_default_branch().
2020-03-25 11:19:25 +02:00
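The conversion described in 90eeef2ff4 amounts to normalising a command-line value to bytes before it reaches code that expects bytes. A minimal sketch, assuming UTF-8 as the encoding; `as_bytes` is a hypothetical helper and not a function from hg2git:

```python
def as_bytes(name, encoding="utf8"):
    """Return `name` as bytes, encoding it only if it is a str."""
    if isinstance(name, bytes):
        return name
    return name.encode(encoding)


from_cli = as_bytes("main")          # argument parsers hand out str objects
already_bytes = as_bytes(b"main")    # bytes pass through unchanged
```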
Frej Drejhammar
7f4d9c3ad4 Merge branch 'PR/211' 2020-03-10 17:51:47 +01:00
Pi Delport
b37420f404 Fix link markup for hg-export-tool 2020-03-09 16:41:26 +02:00
Frej Drejhammar
f2aa47fdf7 Merge branch 'PR/210'
Closes #210.
2020-03-08 19:43:23 +01:00
chrisjbillington
6361b44c33 Fix bug in ignoring .git files/folders on Windows
Mercurial internally stores (most) filepaths using forward slashes, and
returns them as such from its Python API, even on Windows.

So the splitting up of filepaths with `os.path.sep` was incorrect,
resulting in `.git` files (those within a subdirectory, anyway)
not being ignored on Windows as intended. Splitting on `b'/'` regardless
of OS fixes this.
2020-03-08 19:40:50 +01:00
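The check fixed in 6361b44c33 can be sketched like this. `is_git_path` is a hypothetical helper name; the only assumption taken from the commit message is that Mercurial returns paths as byte strings with forward slashes on every platform.

```python
def is_git_path(path):
    """Return True if any component of a Mercurial path is named .git."""
    # Always split on b'/': Mercurial uses forward slashes even on Windows.
    return b".git" in path.split(b"/")


nested = b"vendor/lib/.git/config"

# Splitting on the Windows separator leaves the path in one piece,
# so no component ever equals b'.git'.
windows_style_parts = nested.split(b"\\")
```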
Frej Drejhammar
afeb58ae95 Merge branch 'PR/209' 2020-03-06 17:30:52 +01:00
chrisjbillington
48508ee299 Fix failure to print error message in verify_heads
On Python 3, `b'%s' % None` fails with a TypeError. In verify_heads,
an error message prints the sha1 of a git commit, but that sha1
can be None.

This commit instead prints `b'<None>'` if sha1 is None.
2020-03-06 11:02:38 -05:00
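The behaviour behind 48508ee299 is easy to demonstrate. The `format_sha` helper and the message text are hypothetical; the sketch only shows that bytes formatting rejects `None` on Python 3 and how a placeholder avoids that.

```python
def format_sha(sha1):
    """Return the sha1 bytes, or a placeholder when it is None."""
    return sha1 if sha1 is not None else b"<None>"


try:
    b"%s" % None
    none_is_accepted = True
except TypeError:
    # Python 3: %s in bytes formatting needs a bytes-like object.
    none_is_accepted = False

message = b"git sha1 is %s\n" % format_sha(None)
```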
Frej Drejhammar
56da62847a Merge branch 'PR/208'
Closes #207.
2020-03-01 14:34:38 +01:00
Max Fuqua
750fe6d3e1 Resolve type error resulting from passing an int to b'%s' in python3 2020-02-29 14:55:15 -05:00
Frej Drejhammar
e4d6d433ec Merge branch 'PR/206' 2020-02-29 14:48:46 +01:00
Steven Peters
058c791b75 Check python's mercurial version for compatibility
When checking that python has the mercurial package in hg-fast-export.sh,
use the same import statement that is used in hg-fast-export.py.

hg-fast-export.py imports revsymbol from mercurial.scmutil,
which was introduced in mercurial 4.6, but Ubuntu 18.04 only has
mercurial 4.5.3 using python2, so an incompatible python version may be
chosen without this change.
2020-02-28 15:41:24 -08:00
Frej Drejhammar
13010f7a25 Merge branch 'PR/204'
Closes #203.
2020-02-21 16:34:03 +01:00
chrisjbillington
4071f720b0 Fix issue #203: Resolve stderr encoding issues
In Python 3, `sys.stderr.write()` requires unicode strings, and all
output on standard streams is UTF8 encoded. Therefore in the port to
Python 3, we `.decode()`d all strings that are used in `%` formatting of
strings to be printed to stderr.

However, in Python 2, `sys.stderr` accepts either bytestrings or unicode
strings, and:

- `%s` formatting of a bytestring with a unicode string, i.e. `"%s" %
  u"foo"`, results in a unicode string.
- Writing a unicode string to stderr/stdout uses that stream's encoding
- When the output of the process is being piped somewhere other than a
  terminal (as it is when called with pipes and shell redirection from
  hg-fast-export.sh), that encoding is None, which implies ASCII.
- This raises UnicodeEncodeError if the unicode strings passed to
  `stderr.write()` have non-ascii characters.

We cannot fix this problem simply by encoding UTF8 again before writing
to stderr on Python 2. This is because the *decoding* of filenames with
the UTF8 codec may fail - filenames may not even be valid UTF8 despite
this being the declared filesystem encoding.

We could `fsdecode()` filenames on Python 3, which would use the
`surrogateescape` error handler, but stderr does not use this error
handler for output, meaning we would just have to encode again (with the
same error handler) anyway. And Python 2 lacks the `surrogateescape`
error handler in any case - we would need to reimplement it just to do a
round-trip decode and encode for no reason.

This commit leaves filenames and other repository data as bytestrings,
and simply writes them to `sys.stderr.buffer` on Python 3 or
`sys.stderr` on Python 2 as-is, after `%` formatting with bytestring
literals. This avoids encoding issues of filenames altogether.

Other writing to stderr that does not involve repository data has been
left with "native" strings, i.e.
`sys.stderr.write("a string literal %s" % a_command_line_arg)`. These
will still fail on Python 3 if the user passes a non-UTF filename as a
command line argument or similar. This is acceptable IMHO - although
`hg-fast-export` may encounter invalid UTF8 in mercurial repositories,
it is not too much to impose that the user name their branch mapping
files etc with valid UTF8!
2020-02-19 12:18:00 -05:00
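The approach taken in 4071f720b0, writing bytestrings to the underlying binary stream, can be sketched as below. `write_bytes` is a hypothetical helper, and the in-memory stream stands in for a stderr whose text encoding is ASCII.

```python
import io
import sys


def write_bytes(message, stream=None):
    """Write a bytestring to `stream` without any encode/decode step."""
    stream = sys.stderr if stream is None else stream
    # Python 3 text streams expose the binary layer as .buffer;
    # on Python 2 the stream itself accepts bytestrings.
    target = getattr(stream, "buffer", stream)
    target.write(message)
    target.flush()


# Demonstration against an ASCII-only text stream: the non-UTF8,
# non-ASCII byte passes through untouched.
raw = io.BytesIO()
ascii_stream = io.TextIOWrapper(raw, encoding="ascii")
write_bytes(b"bad filename: %s\n" % b"caf\xe9.txt", ascii_stream)
captured = raw.getvalue()
```

Since the message never becomes a text string, neither a decode error on input nor an encode error on output can occur.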
Frej Drejhammar
160aa3c9ef Add a reference to hg-export-tool in the documentation
Add pointers to hg-export-tool as a way to batch convert multiple
Mercurial repos, and deal with duplicate heads.
2020-02-14 17:16:18 +01:00
Frej Drejhammar
883474184d Merge branch 'PR/201'
Closes 201
2020-02-14 17:01:35 +01:00
chrisjbillington
b961f146df Support Python 3
Port hg-fast-import to Python 2/3 polyglot code.

Since mercurial accepts and returns bytestrings for all repository data,
the approach I've taken here is to use bytestrings throughout the
hg-fast-import code. All strings pertaining to repository data are
bytestrings. This means the code is using the same string datatype for
this data on Python 3 as it did (and still does) on Python 2.

Repository data coming from subprocess calls to git, or read from files,
is also left as the bytestrings either returned from
subprocess.check_output or as read from the file in 'rb' mode.

Regexes and string literals that are used with repository data have
all had a b'' prefix added.

When repository data is used in error/warning messages, it is decoded
with the UTF8 codec for printing.

With this patch, hg-fast-export.py writes binary output to
sys.stdout.buffer on Python 3 - on Python 2 this doesn't exist and it
still uses sys.stdout.

The only strings that are left as "native" strings and not coerced to
bytestrings are filepaths passed in on the command line, and dictionary
keys for internal data structures used by hg-fast-import.py, that do
not originate in repository data.

Mapping files are read in 'rb' mode, and thus bytestrings are read from
them. When an encoding is given, their contents are decoded with that
encoding, but then immediately encoded again with UTF8 and they are
returned as the resulting bytestrings

Other necessary changes were:

 - indexing bytestrings with a single index returns an integer on Python 3.
   These indexing operations have been replaced with a one-element
   slice: x[0] -> x[0:1] or x[-1] -> x[-1:] so as to return a bytestring.

 - raw_hash.encode('hex_codec') replaced with binascii.hexlify(raw_hash)

 - str(integer) -> b'%d' % integer

 - 'string_escape' codec replaced with 'unicode_escape' (which was
    backported to python 2.7). Strings decoded with this codec were then
    immediately re-encoded with UTF8.

 - Calls to map() intended to execute their contents immediately were
   unwrapped or converted to list comprehensions, since on Python 3 map()
   returns an iterator and does not execute until iterated over.

hg-fast-export.sh has been modified to not require Python 2. Instead, if
PYTHON has not been defined, it checks python2, python, then python3,
and uses the first one that exists and can import the mercurial module.
2020-02-13 14:35:19 -05:00
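The list of necessary changes in b961f146df can be illustrated with a few standalone lines. The values are made up; only the language behaviour is being shown, assuming Python 3.5 or later.

```python
import binascii

data = b"abc"

first_as_int = data[0]       # Python 3: indexing yields an integer
first_as_bytes = data[0:1]   # a one-element slice yields a bytestring
last_as_bytes = data[-1:]

hex_hash = binascii.hexlify(b"\x00\xff")  # replaces .encode('hex_codec')
rev = b"%d" % 42                          # replaces str(integer)

# map() is lazy on Python 3: creating it does not call the function.
seen = []
pending = map(seen.append, [1, 2])
seen_before_loop = list(seen)

# An explicit loop runs immediately.
for item in [3, 4]:
    seen.append(item)
```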
Frej Drejhammar
595587b245 Merge branch 'PR/197'
Closes #197, #185, #196
2020-02-09 19:39:21 +01:00
Matthijs van der Burgh
0b6b83c3de Adapt to status becoming an object in Mercurial 5.3
Status has always been a tuple, but since 5.3, commit:
https://www.mercurial-scm.org/repo/hg/rev/c5548b0b6847, it is an object.
Therefore the __getitem__ of the tuple isn't available anymore.

This fix is compatible with mercurial>=4.6, as the old status tuple
still has the same properties.
2020-02-08 17:23:30 +01:00
Frej Drejhammar
29a457eccf Merge branch 'PR/198'
Closes 198
2020-02-08 16:08:56 +01:00
Frej Drejhammar
4bc6dec5eb Merge branch 'PR/199'
Closes #199
2020-02-08 16:05:01 +01:00
Frej Drejhammar
fa8ebd994d Add link to what's expected for commit messages to the README 2020-02-08 15:50:17 +01:00
Frej Drejhammar
e83501d30d Make README issue tracker link a Markdown link 2020-02-08 15:43:10 +01:00
chrisjbillington
8efbb57822 Add additional options to branch_name_in_commit plugin
- Allow skipping writing the branch name if the branch is 'master'.

- Allow writing the branch name on the same line as the first line of
  the commit message separated by a colon, instead of it having its own
  line.
2020-02-07 20:48:49 -05:00
chrisjbillington
8d135fe700 Ignore files and directories called .git
Git cannot track these files. Print a warning if encountering one.

Fixes #166
2020-02-07 17:52:57 -05:00
Frej Drejhammar
ed36227c62 Merge branch 'PR/192'
Closes #192
2020-01-31 17:12:30 +01:00
Frej Drejhammar
507c17cc1b Revert "Handle --force option correctly in any position"
This reverts commit 0c5617bf8d.

The changes turned out to require bash. Traditionally we have tried to
stay compatible with plain old sh, so this is a revert.

Closes #195.
2020-01-31 17:01:04 +01:00
James Douglass
1841ba4be9 Add a plugin to prefix an issue number with a user-defined string. 2020-01-29 14:18:17 -08:00
Frej Drejhammar
30e54cb55c Merge branch 'PR/194'
Closes #194.
2020-01-29 19:20:48 +01:00
Frej Drejhammar
5f7bf7ee71 Merge branch 'PR/193'
Closes #193
2020-01-29 19:15:18 +01:00
Alexander Regueiro
0c5617bf8d Handle --force option correctly in any position 2020-01-29 18:13:54 +00:00
Frej Drejhammar
29ec91970e Merge branch 'PR/189-fixup-subrepo-list-refreshing-v2'
Fixes #187.
2020-01-28 19:43:49 +01:00
James Douglass
601daf60f7 Adding a new plugin to overwrite null messages. 2020-01-26 17:25:07 -08:00
MokhamedDakhraui
9c9669d361 Check .hgsub and .hgsubstate files to detect subrepo changes 2020-01-26 00:36:34 +03:00
Frej Drejhammar
2ba5d77435 Merge PR#183
Closes #183
2019-12-22 19:08:08 +01:00
Justin Murray
e8a681121b Document default branch behavior
Document the default behavior of renaming the `default` hg branch to `master`
on git, and how to override from the command line when this causes problems.

See also: #182
2019-12-21 15:34:30 -05:00
Frej Drejhammar
ffdd27c2da Merge branch 'mossop/PR/git-submodules-v3'
Closes #180
2019-12-07 19:36:31 +01:00
Dave Townsend
ab31fdcbaa Add support for git submodules
Mercurial supports not only submodules which are Mercurial
repositories, but also Git and Subversion repositories. This
patch adds support for submodules which are Git repositories to
hg-fast-export.

As submodules which are Git repositories won't need a mapping
file we trigger the submodule update only on the occurrence of the
`.hgsubstate` file and push the check for a valid
`submodule_mappings` to `refresh_gitmodules(ctx)`
2019-12-07 10:22:23 -08:00
Dave Townsend
acf93a80a9 Only export submodules that exist in the submodule mapping. 2019-12-07 10:21:26 -08:00
Dave Townsend
0f49bfe0db Move hg sub-module updating to its own function, NFC
This refactoring is in preparation to supporting Mercurial
submodules which are git repositories.
2019-12-07 09:39:43 -08:00
Frej Drejhammar
3af916d664 Clarify requirements
Make it clear that python 2.7.x is a hard requirement and that
Mercurial >= 4.6 is required. Also clean up an old editing artefact.
2019-11-12 17:46:08 +01:00
Frej Drejhammar
02c54a5513 Merge branch 'Mossop-obsolete'
Closes #175
2019-10-20 19:54:08 +02:00
Dave Townsend
b54046d3aa Avoid showing a warning when the mercurial repository has obsolete markers. 2019-10-20 19:49:25 +02:00
Dave Townsend
ff1c885305 Ignore obsolete changesets in the source repository
Obsolete changesets are, for example, created by the Evolve
extension. This patch switches to an unfiltered repository (the
filtered one throws on an attempt to access obsolete revisions) and
then filters out the obsolete revisions when it comes across them.

Fixes #173
2019-10-20 19:45:42 +02:00
Frej Drejhammar
0096085b6f Tag maps should use the same syntax as branch and author maps
When version v171002 introduced a new mapping file format for branches
and authors, that change never made it to the remapping of tags
although the README documents it.

Fixes #172.
2019-10-12 21:09:14 +02:00
Frej Drejhammar
6f9bc6517a Merge branch 'pr/FAQ' 2019-09-24 22:56:42 +02:00
Frej Drejhammar
243100eea4 Add a section on frequent problems to the README
This tries to preemptively avoid recurrence of issues #148, #152,
 #155, #165 and #168.
2019-09-19 16:41:04 +02:00
Frej Drejhammar
1181a0af47 Allow name sanitizer to be disabled with --no-auto-sanitize
Make it possible to completely disable the name sanitizer by the
--no-auto-sanitize flag. Previously the sanitizer was run on user
remapped names. As the sanitizer rewrites perfectly legal git
names (such as __.*) this is probably not what the user wants.

Closes #155.
2019-09-13 14:56:32 +02:00
Frej Drejhammar
7ab47e002f Merge branch 'jpaugh-patch-1'
Closes #164
2019-09-12 20:14:42 +02:00
Jonathan Paugh
96762f5474 README: Fix broken links
Use "footnote" style links to prevent future issues whenever the text is formatted to a specific length.
2019-09-11 16:46:55 -05:00
Frej Drejhammar
fcdc91634a Merge branch 'be-non-pep349-tolerant'
Closes: #143
Closes: #160
2019-09-01 18:31:46 +02:00
Frej Drejhammar
f57fba000b Try to do the right thing on non PEP394 compliant systems
PEP 394 [1] tells us that on systems with both a python 2 and 3
installed, the python 2 interpreter should be installed as python2.

Unfortunately not all distributions adhere to PEP 394 (I'm looking at
you, Windows) so to handle that we first try to find a 'python2', then
fall back on plain 'python'. In order to not silently pick a python 3
by mistake, we check sys.version_info using the interpreter we
found.

[1] https://www.python.org/dev/peps/pep-0394/
2019-09-01 18:31:18 +02:00
Frej Drejhammar
b25cbd6753 Merge branch 'pr/157-v3'
Closes #156
2019-08-18 11:57:53 +02:00
MokhamedDakhraui
581b1b3d17 Remove git submodules if .hgsubstate file was removed or emptied 2019-08-18 05:46:46 +03:00
MokhamedDakhraui
7df01ac323 Refactor refresh_gitmodules()
Use the change context substate field instead of manually parsing the `.hgsubstate` file.
2019-08-16 02:42:03 +03:00
MokhamedDakhraui
914f5a0dbe Replaced several lambdas by one loop 2019-08-16 02:41:54 +03:00
MokhamedDakhraui
8779cb5e95 Extract operations with submodules to separate methods 2019-08-16 02:40:44 +03:00
Johannes Carlsson
47d330de83 Add support for mercurial subrepos
This adds a new command line option (--subrepo-map) that will
map mercurial subrepos to git submodules.

The --subrepo-map takes a mapping file as an argument that will
be used to map a subrepo folder to a git submodule.

For more information see the README-SUBMODULES.md.

This commit is inspired by the changes made by daolis in PR#38
that was never merged.

Closes: #51
Closes: #147
2019-01-07 18:41:19 +01:00
Frej Drejhammar
b51c58d3e0 Merge branch 'thetradedesk-master'
Closes #144
2018-12-06 21:51:32 +01:00
Johan Henkens
cadcfcbe90 Move filter_contents to plugin system 2018-12-05 13:25:48 -08:00
Johan Henkens
5e7895ca6b Add branch_name_in_commit plugin 2018-12-05 13:25:48 -08:00
Johan Henkens
679103795b Add dos2unix plugin 2018-12-05 13:25:48 -08:00
Johan Henkens
e895ce087f Add plugin system 2018-12-05 13:25:47 -08:00
Johan Henkens
850094c498 Add gitattributes, additional ignores 2018-12-05 13:25:47 -08:00
Daniel Small
2bb173ef68 hg 4.7: Replace call to util.email with templatefilters.email
This change is required for Mercurial 4.7 support and fixes #137.
2018-08-11 15:49:08 +02:00
Frej Drejhammar
ac60034ba3 Adhere to PEP 394
From PEP 394 [1]:

* python2 will refer to some version of Python 2.x.

* end users should be aware that python refers to python3 on at least
  Arch Linux (that change is what prompted the creation of this PEP),
  so python should be used in the shebang line only for scripts that
  are source compatible with both Python 2 and 3.

So to make sure that we run correctly on a system where python refers
to python3 and avoid problems like issue #11 we change the shebangs.

[1] https://www.python.org/dev/peps/pep-0394/
2018-08-11 15:07:19 +02:00
Frej Drejhammar
eca99b61eb Merge branch 'atykhyy-as-binary'
This closes #95
2018-06-22 16:46:10 +02:00
Anton Tykhyy
89db1d93cf Add --filter-contents 2018-06-17 21:09:59 +03:00
Frej Drejhammar
e200cec39f Adapt to changes in Mercurial 4.6
Starting with Mercurial 4.6 repo.lookup() no longer accepts raw hashes
for lookups.
2018-06-10 15:51:09 +02:00
Gabriel
51d5f893db Add a section about system requirements to the README
Add @rinu's suggestion on how to run fast-export on Windows to the
README, this fixes #121.
2018-06-10 15:44:46 +02:00
28 changed files with 1391 additions and 171 deletions

2
.gitattributes vendored Normal file

@@ -0,0 +1,2 @@
# Set the default behavior, in case people don't have core.autocrlf set.
* text=auto

28
.github/contributing.md vendored Normal file

@@ -0,0 +1,28 @@
When submitting a patch make sure the commits in your pull request:

* Have good commit messages

  Please read Chris Beams' blog post [How to Write a Git Commit
  Message](https://chris.beams.io/posts/git-commit/) on how to write a
  good commit message. Although the article recommends at most 50
  characters for the subject, up to 72 characters are frequently
  accepted for fast-export.

* Adhere to good [commit
  hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)

  When developing a pull request for hg-fast-export, base your work on
  the current `master` branch and rebase your work if it no longer can
  be merged into the current `master` without conflicts. Never merge
  `master` into your development branch, rebase if your work needs
  updates from `master`.

  When a pull request is modified due to review feedback, please
  incorporate the changes into the proper commit. A good reference on
  how to modify history is in the [Pro Git book, Section
  7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).

Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.

71
.github/workflows/codeql-analysis.yml vendored Normal file

@@ -0,0 +1,71 @@
# For most projects, this workflow file will not need changing; you simply need
# to commit it to your repository.
#
# You may wish to alter this file to override the set of languages analyzed,
# or to provide custom queries or build logic.
name: "CodeQL"
on:
  push:
    branches: [master]
  pull_request:
    # The branches below must be a subset of the branches above
    branches: [master]
  schedule:
    - cron: '0 15 * * 4'
jobs:
  analyze:
    name: Analyze
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Override automatic language detection by changing the below list
        # Supported options are ['csharp', 'cpp', 'go', 'java', 'javascript', 'python']
        language: ['python']
        # Learn more...
        # https://docs.github.com/en/github/finding-security-vulnerabilities-and-errors-in-your-code/configuring-code-scanning#overriding-automatic-language-detection
    steps:
    - name: Checkout repository
      uses: actions/checkout@v2
      with:
        # We must fetch at least the immediate parents so that if this is
        # a pull request then we can checkout the head.
        fetch-depth: 2
    # If this run was triggered by a pull request event, then checkout
    # the head of the pull request instead of the merge commit.
    - run: git checkout HEAD^2
      if: ${{ github.event_name == 'pull_request' }}
    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v1
      with:
        languages: ${{ matrix.language }}
        # If you wish to specify custom queries, you can do so here or in a config file.
        # By default, queries listed here will override any specified in a config file.
        # Prefix the list here with "+" to use these queries and those in the config file.
        # queries: ./path/to/local/query, your-org/your-repo/queries@main
    # Autobuild attempts to build any compiled languages (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v1
    # Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl
    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language
    #- run: |
    #   make bootstrap
    #   make release
    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v1

2
.gitignore vendored

@@ -1,2 +1,4 @@
*.orig
*.pyc
.dotest
.idea/

75
README-SUBMODULES.md Normal file

@@ -0,0 +1,75 @@
# How to convert Mercurial Repositories with subrepos

## Introduction

hg-fast-export supports migrating mercurial subrepositories in the
repository being converted into git submodules in the converted repository.

Git submodules must be git repositories while mercurial's subrepositories can
be git, mercurial or subversion repositories. hg-fast-export will handle any
git subrepositories automatically, any other kinds must first be converted
to git repositories. Currently hg-fast-export does not support the conversion
of subversion subrepositories. The rest of this page covers the conversion of
mercurial subrepositories which require some manual steps:

The first step for mercurial subrepositories involves converting the
subrepository into a git repository using hg-fast-export. When all
subrepositories have been converted, a mapping file that maps the mercurial
subrepository path to a converted git submodule path must be created. The
format for this file is:

    "<mercurial subrepo path>"="<git submodule path>"
    "<mercurial subrepo path2>"="<git submodule path2>"
    ...

The path of this mapping file is then provided with the --subrepo-map
command line option.

## Example

Example mercurial repo folder structure (~/mercurial) containing two subrepos:

    src/...
    subrepos/subrepo1
    subrepos/subrepo2

### Setup

Create an empty new folder where all the converted git modules will be imported:

    mkdir ~/imported-gits
    cd ~/imported-gits

### Convert all submodules to git:

    mkdir submodule1
    cd submodule1
    git init
    hg-fast-export.sh -r ~/mercurial/subrepos/subrepo1
    cd ..
    mkdir submodule2
    cd submodule2
    git init
    hg-fast-export.sh -r ~/mercurial/subrepos/subrepo2

### Create mapping file

    cd ~/imported-gits
    cat > submodule-mappings << EOF
    "subrepos/subrepo1"="../submodule1"
    "subrepos/subrepo2"="../submodule2"
    EOF

### Convert main repository

    cd ~/imported-gits
    mkdir git-main-repo
    cd git-main-repo
    git init
    hg-fast-export.sh -r ~/mercurial --subrepo-map=~/imported-gits/submodule-mappings

### Result

The resulting repository will now contain the submodules at the paths
`subrepos/subrepo1` and `subrepos/subrepo2`. The created .gitmodules
file will look like:

    [submodule "subrepos/subrepo1"]
          path = subrepos/subrepo1
          url = ../submodule1
    [submodule "subrepos/subrepo2"]
          path = subrepos/subrepo2
          url = ../submodule2

229
README.md

@@ -1,29 +1,42 @@
hg-fast-export.(sh|py) - mercurial to git converter using git-fast-import
hg-fast-export.sh - mercurial to git converter using git-fast-import
=========================================================================
Legal
-----
Most hg-* scripts are licensed under the [MIT license]
(http://www.opensource.org/licenses/mit-license.php) and were written
Most hg-* scripts are licensed under the [MIT license] and were written
by Rocco Rutte <pdmef@gmx.net> with hints and help from the git list and
\#mercurial on freenode. hg-reset.py is licensed under GPLv2 since it
copies some code from the mercurial sources.
The current maintainer is Frej Drejhammar <frej.drejhammar@gmail.com>.
[MIT license]: http://www.opensource.org/licenses/mit-license.php
Support
-------
If you have problems with hg-fast-export or have found a bug, please
create an issue at the [github issue tracker]
(https://github.com/frej/fast-export/issues). Before creating a new
create an issue at the [github issue tracker]. Before creating a new
issue, check that your problem has not already been addressed in an
already closed issue. Do not contact the maintainer directly unless
you want to report a security bug. That way the next person having the
same problem can benefit from the time spent solving the problem the
first time.
[github issue tracker]: https://github.com/frej/fast-export/issues
System Requirements
-------------------
This project depends on Python 2.7 or 3.5+, and the Mercurial >= 4.6
package (>= 5.2, if Python 3.5+). If Python is not installed, install
it before proceeding. The Mercurial package can be installed with `pip
install mercurial`.
On Windows the bash that comes with "Git for Windows" is known to work
well.
Usage
-----
@@ -67,10 +80,10 @@ author information than git, an author mapping file can be given to
hg-fast-export to fix up malformed author strings. The file is
specified using the -A option. The file should contain lines of the
form `"<key>"="<value>"`. Inside the key and value strings, all escape
sequences understood by the python `string_escape` encoding are
supported. (Versions of fast-export prior to v171002 had a different
syntax, the old syntax can be enabled by the flag
`--mappings-are-raw`.)
sequences understood by the python `unicode_escape` encoding are
supported; strings are otherwise assumed to be UTF8-encoded.
(Versions of fast-export prior to v171002 had a different syntax, the
old syntax can be enabled by the flag `--mappings-are-raw`.)
The example authors.map below will translate `User
<garbage<tab><user@example.com>` to `User <user@example.com>`.
@@ -81,6 +94,9 @@ The example authors.map below will translate `User
-- End of authors.map --
```
If you have many Mercurial repositories, Chris J Billington's
[hg-export-tool] allows you to batch convert them.
Tag and Branch Naming
---------------------
@@ -89,12 +105,116 @@ name the -B and -T options allow a mapping file to be specified to
rename branches and tags (respectively). The syntax of the mapping
file is the same as for the author mapping.
When the -B and -T flags are used, you will probably want to use the
-n flag to disable the built-in (broken in many cases) sanitizing of
branch/tag names. In the future -n will become the default, but in
order to not break existing incremental conversions, the default
remains with the old behavior.
By default, the `default` mercurial branch is renamed to the `master`
branch on git. If your mercurial repo contains both `default` and
`master` branches, you'll need to override this behavior. Use
`-M <newName>` to specify what name to give the `default` branch.
Content filtering
-----------------
hg-fast-export supports filtering the content of exported files.
The filter is supplied to the --filter-contents option. hg-fast-export
runs the filter for each exported file, pipes its content to the filter's
standard input, and uses the filter's standard output in place
of the file's original content. The prototypical use of this feature
is to convert line endings in text files from CRLF to git's preferred LF:
```
-- Start of crlf-filter.sh --
#!/bin/sh
# $1 = pathname of exported file relative to the root of the repo
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"
if [ "$3" = "1" ]; then cat; else dos2unix -q; fi
# The -q option keeps dos2unix from returning an error code when it
# handles text files that are not ASCII-based (such as UTF-16
# encoded text files)
-- End of crlf-filter.sh --
```
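
Since the filter is just a program that reads standard input and writes standard output, it could also be written in Python. The sketch below is an assumption-laden alternative to the shell script, not something shipped with hg-fast-export: the names `filter_contents` and `main` are hypothetical, and a plain byte replacement is used instead of `dos2unix`, so corner cases may be handled differently.

```python
import sys


def filter_contents(data, is_binary):
    """Return data unchanged for binary files, else convert CRLF to LF.

    is_binary is the third filter argument: "1" for binary, otherwise "0".
    """
    if is_binary == "1":
        return data
    return data.replace(b"\r\n", b"\n")


def main(argv, stdin, stdout):
    # argv[1] = pathname of the exported file relative to the repo root
    # argv[2] = Mercurial's hash of the file
    # argv[3] = "1" if Mercurial reports the file as binary, otherwise "0"
    is_binary = argv[3] if len(argv) > 3 else "0"
    stdout.write(filter_contents(stdin.read(), is_binary))


# To use this as a filter script, end the file with:
#   main(sys.argv, sys.stdin.buffer, sys.stdout.buffer)
```

The binary streams (`sys.stdin.buffer`, `sys.stdout.buffer`) matter here, because text-mode streams could themselves translate line endings.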
Plugins
-----------------
hg-fast-export supports plugins to manipulate the file data and commit
metadata. The plugins are enabled with the --plugin option. The value
of said option is a plugin name (by folder in the plugins directory),
and optionally, an equals-sign followed by an initialization string.
There is a readme accompanying each of the bundled plugins, with a
description of the usage. To create a new plugin, one must simply
add a new folder under the `plugins` directory, with the name of the
new plugin. Inside, there must be an `__init__.py` file, which contains
at a minimum:
```
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
pass
#Or don't pass, if you want to do some init code here
```
Beyond the boilerplate initialization, you can see the two different
defined filter methods in the [dos2unix](./plugins/dos2unix) and
[branch_name_in_commit](./plugins/branch_name_in_commit) plugins.
```
commit_data = {'branch': branch, 'parents': parents, 'author': author, 'desc': desc, 'revision': revision, 'hg_hash': hg_hash, 'committer': committer, 'extra': extra}
def commit_message_filter(self,commit_data):
```
The `commit_message_filter` method is called for each commit, after parsing
from hg, but before outputting to git. The dictionary `commit_data` contains the
above attributes about the commit, and can be modified by any filter. The
values in the dictionary after filters have been run are used to create the git
commit.
```
file_data = {'filename':filename,'file_ctx':file_ctx,'data':d}
def file_data_filter(self,file_data):
```
The `file_data_filter` method is called for each file within each commit.
The dictionary `file_data` contains the above attributes about the file, and
can be modified by any filter. `file_ctx` is the filecontext from the
mercurial python library. After all filters have been run, the values
are used to add the file to the git commit.
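
Putting the two filter methods together, a complete plugin `__init__.py` might look like the sketch below. It is illustrative rather than one of the bundled plugins: the trailer text and the `.txt` rule are made up, the init string is assumed to arrive as a text string (or `None`), and the values in both dictionaries are assumed to be byte strings. The file contents are read from the `data` key, which is the key hg-fast-export.py uses when it builds `file_data`.

```python
def build_filter(args):
    return Filter(args)


class Filter:
    def __init__(self, args):
        # args is the optional initialization string given after '=' on
        # the --plugin option (assumed here to be text, or None if absent).
        self.trailer = (args or "Converted from Mercurial").encode("utf8")

    def commit_message_filter(self, commit_data):
        # Append a trailer line to every commit message.
        commit_data['desc'] = commit_data['desc'] + b"\n\n" + self.trailer

    def file_data_filter(self, file_data):
        # Normalize line endings, but only for .txt files.
        if file_data['filename'].endswith(b'.txt'):
            file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')


plugin = build_filter(None)
```

Both methods modify the dictionary in place and return nothing, since hg-fast-export reads the values back from the dictionary after all filters have run.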
Submodules
----------
See README-SUBMODULES.md for how to convert subrepositories into git
submodules.
Notes/Limitations
-----------------
hg-fast-export supports multiple branches but only named branches with
exactly one head each. Otherwise commits to the tip of these heads
within the branch will get flattened into merge commits.
within the branch will get flattened into merge commits. There are a
few options to deal with this:
1. Chris J Billington's [hg-export-tool] can help you to handle branches with
duplicate heads.
2. Use the [head2branch plugin](./plugins/head2branch) to create a new named
branch from an unnamed head.
3. You can ignore unnamed heads with the `--ignore-unnamed-heads` option, which
is appropriate in situations such as the extra heads being close commits
(abandoned, unmerged changes).
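
The "exactly one head per named branch" rule can be pictured with a small Python sketch. It only mimics the idea of the check and does not query a real repository: `find_extra_heads` is a hypothetical helper, and its input is assumed to be a list of (revision number, branch name) pairs, one per head.

```python
def find_extra_heads(heads):
    """Return the revisions of heads that are not the first head seen
    on their branch, i.e. the heads reported as unnamed.

    heads is an iterable of (revision number, branch name) pairs.
    """
    seen_branches = set()
    extra = []
    for rev, branch in heads:
        if branch in seen_branches:
            extra.append(rev)  # a second head on an already seen branch
        seen_branches.add(branch)
    return extra


extra = find_extra_heads([(5, b"default"), (7, b"stable"), (9, b"default")])
```

In this example revision 9 is a second head on `default`, so it is the one that would need to be merged, turned into a named branch, or ignored.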
hg-fast-export will ignore any files or directories tracked by mercurial
called `.git`, and will print a warning if it encounters one. Git cannot
track such files or directories. This is not to be confused with submodules,
which are described in README-SUBMODULES.md.
As each git-fast-import run creates a new pack file, it may be
required to repack the repository quite often for incremental imports
@@ -108,8 +228,8 @@ possible to use hg-fast-export on remote repositories
Design
------
hg-fast-export.py was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: if just feeds what it
hg-fast-export was designed in a way that doesn't require a 2-pass
mechanism or any prior repository analysis: it just feeds what it
finds into git-fast-import. This also implies that it heavily relies
on strictly linear ordering of changesets from hg, i.e. its
append-only storage model so that changesets hg-fast-export already
@@ -118,6 +238,85 @@ saw never get modified.
Submitting Patches
------------------
Please use the issue-tracker at github
https://github.com/frej/fast-export to report bugs and submit
patches.
Please create a pull request at
[Github](https://github.com/frej/fast-export/pulls) to submit patches.
When submitting a patch make sure the commits in your pull request:
* Have good commit messages
Please read Chris Beams' blog post [How to Write a Git Commit
Message](https://chris.beams.io/posts/git-commit/) on how to write a
good commit message. Although the article recommends at most 50
characters for the subject, up to 72 characters are frequently
accepted for fast-export.
* Adhere to good [commit
hygiene](http://www.ericbmerritt.com/2011/09/21/commit-hygiene-and-git.html)
When developing a pull request for hg-fast-export, base your work on
the current `master` branch and rebase your work if it no longer can
be merged into the current `master` without conflicts. Never merge
`master` into your development branch, rebase if your work needs
updates from `master`.
When a pull request is modified due to review feedback, please
incorporate the changes into the proper commit. A good reference on
how to modify history is in the [Pro Git book, Section
7.6](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History).
Please do not submit a pull request if you are not willing to spend
the time required to address review comments or revise the patch until
it follows the guidelines above. A _take it or leave it_ approach to
contributing wastes both your and the maintainer's time.
Frequent Problems
=================
* git fast-import crashes with: `error: cannot lock ref 'refs/heads/...`
Branch names in git behave as file names (as they are just files and
sub-directories under `refs/heads/`), and a path cannot name both a
file and a directory, i.e. the branches `a` and `a/b` can never
exist at the same time in a git repo.
Use a mapping file to rename the troublesome branch names.
* `Branch [<branch-name>] modified outside hg-fast-export` but I have
not touched the repo!
If you are running fast-export on a case-preserving but
case-insensitive file system (Windows and OSX), this will make git
treat `A` and `a` as the same branch. The solution is to use a
mapping file to rename branches which only differ in case.
* My mapping file does not seem to work when I rename the branch `git
fast-import` crashes on!
fast-export (imperfectly) mangles branch names it thinks won't be
valid. The mechanism cannot be removed as it would break already
existing incremental imports that expect it. When fast-export
mangles a name, it prints out a warning of the form `Warning:
sanitized branch [<unmangled>] to [<mangled>]`. If `git fast-import`
crashes on `<mangled>`, you need to put `<unmangled>` into the
mapping file.
* fast-export mangles valid git branch names which I have remapped!
Use the `-n` flag to hg-fast-export.sh.
* `git status` reports that all files are scheduled for deletion after
the initial conversion.
By design fast export does not touch your working directory, so to
git it looks like you have deleted all files, when in fact they have
never been checked out. Just do a checkout of the branch you want.
* `Error: repository has at least one unnamed head: hg r<N>`
By design, hg-fast-export cannot deal with extra heads on a branch.
There are a few options depending on whether the extra heads are
in-use/open or normally closed. See [Notes/Limitations](#noteslimitations)
section for more details.
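
The first two problems in this list (`cannot lock ref` and branches that differ only in case) can be spotted before running a conversion. The Python sketch below shows one way to check a list of branch names; both function names are hypothetical and the code is not part of hg-fast-export.

```python
def find_ref_conflicts(names):
    """Return sorted (prefix, name) pairs where one branch name is a
    directory prefix of another, such as `a` and `a/b`."""
    name_set = set(names)
    conflicts = []
    for name in name_set:
        parts = name.split("/")
        # Test every proper leading path of the name against the set.
        for i in range(1, len(parts)):
            prefix = "/".join(parts[:i])
            if prefix in name_set:
                conflicts.append((prefix, name))
    return sorted(conflicts)


def find_case_collisions(names):
    """Return sorted groups of branch names that differ only in case."""
    groups = {}
    for name in set(names):
        groups.setdefault(name.lower(), []).append(name)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


conflicts = find_ref_conflicts(["a", "a/b", "feature/x"])
collisions = find_case_collisions(["Release", "release", "default"])
```

Any name reported by either function is a candidate for an entry in the branch mapping file.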
[hg-export-tool]: https://github.com/chrisjbillington/hg-export-tool


@@ -1,17 +1,23 @@
#!/usr/bin/env python
#!/usr/bin/env python2
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>
from mercurial import node
from mercurial.scmutil import revsymbol
from hg2git import setup_repo,fixup_user,get_branch,get_changeset
from hg2git import load_cache,save_cache,get_git_sha1,set_default_branch,set_origin_name
from optparse import OptionParser
import re
import sys
import os
from binascii import hexlify
import pluginloader
PY2 = sys.version_info.major == 2
if PY2:
str = unicode
if sys.platform == "win32":
if PY2 and sys.platform == "win32":
# On Windows, sys.stdout is initially opened in text mode, which means that
# when a LF (\n) character is written to sys.stdout, it will be converted
# into CRLF (\r\n). That makes git blow up, so use this platform-specific
@@ -20,36 +26,47 @@ if sys.platform == "win32":
msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)
# silly regex to catch Signed-off-by lines in log message
sob_re=re.compile('^Signed-[Oo]ff-[Bb]y: (.+)$')
sob_re=re.compile(b'^Signed-[Oo]ff-[Bb]y: (.+)$')
# insert 'checkpoint' command after this many commits or none at all if 0
cfg_checkpoint_count=0
# write some progress message every this many file contents written
cfg_export_boundary=1000
subrepo_cache={}
submodule_mappings=None
# True if fast export should automatically try to sanitize
# author/branch/tag names.
auto_sanitize = None
stdout_buffer = sys.stdout if PY2 else sys.stdout.buffer
stderr_buffer = sys.stderr if PY2 else sys.stderr.buffer
def gitmode(flags):
return 'l' in flags and '120000' or 'x' in flags and '100755' or '100644'
return b'l' in flags and b'120000' or b'x' in flags and b'100755' or b'100644'
def wr_no_nl(msg=''):
def wr_no_nl(msg=b''):
assert isinstance(msg, bytes)
if msg:
sys.stdout.write(msg)
stdout_buffer.write(msg)
def wr(msg=''):
def wr(msg=b''):
wr_no_nl(msg)
sys.stdout.write('\n')
stdout_buffer.write(b'\n')
#map(lambda x: sys.stderr.write('\t[%s]\n' % x),msg.split('\n'))
def checkpoint(count):
count=count+1
if cfg_checkpoint_count>0 and count%cfg_checkpoint_count==0:
sys.stderr.write("Checkpoint after %d commits\n" % count)
wr('checkpoint')
stderr_buffer.write(b"Checkpoint after %d commits\n" % count)
wr(b'checkpoint')
wr()
return count
def revnum_to_revref(rev, old_marks):
"""Convert an hg revnum to a git-fast-import rev reference (an SHA1
or a mark)"""
return old_marks.get(rev) or ':%d' % (rev+1)
return old_marks.get(rev) or b':%d' % (rev+1)
def file_mismatch(f1,f2):
"""See if two revisions of a file are not equal."""
@@ -78,7 +95,7 @@ def get_filechanges(repo,revision,parents,mleft):
l,c,r=[],[],[]
for p in parents:
if p<0: continue
mright=repo.changectx(p).manifest()
mright=revsymbol(repo,b"%d" %p).manifest()
l,c,r=split_dict(mleft,mright,l,c,r)
l.sort()
c.sort()
@@ -101,7 +118,7 @@ def get_author(logmessage,committer,authors):
"Signed-off-by: foo" and thus matching our detection regex. Prevent
that."""
loglines=logmessage.split('\n')
loglines=logmessage.split(b'\n')
i=len(loglines)
# from tail walk to top skipping empty lines
while i>=0:
@@ -122,28 +139,108 @@ def get_author(logmessage,committer,authors):
return r
return committer
def export_file_contents(ctx,manifest,files,hgtags,encoding=''):
def remove_gitmodules(ctx):
"""Removes all submodules of ctx parents"""
# Removing all submodules coming from all parents is safe, as the submodules
# of the current commit will be re-added below. A possible optimization would
# be to only remove the submodules of the first parent.
for parent_ctx in ctx.parents():
for submodule in parent_ctx.substate.keys():
wr(b'D %s' % submodule)
wr(b'D .gitmodules')
def refresh_git_submodule(name,subrepo_info):
wr(b'M 160000 %s %s' % (subrepo_info[1],name))
stderr_buffer.write(
b"Adding/updating submodule %s, revision %s\n" % (name, subrepo_info[1])
)
return b'[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name, name, subrepo_info[0])
def refresh_hg_submodule(name,subrepo_info):
gitRepoLocation=submodule_mappings[name] + b"/.git"
# Populate the cache to map mercurial revision to git revision
if not name in subrepo_cache:
subrepo_cache[name]=(load_cache(gitRepoLocation+b"/hg2git-mapping"),
load_cache(gitRepoLocation+b"/hg2git-marks",
lambda s: int(s)-1))
(mapping_cache,marks_cache)=subrepo_cache[name]
subrepo_hash=subrepo_info[1]
if subrepo_hash in mapping_cache:
revnum=mapping_cache[subrepo_hash]
gitSha=marks_cache[int(revnum)]
wr(b'M 160000 %s %s' % (gitSha,name))
stderr_buffer.write(
b"Adding/updating submodule %s, revision %s->%s\n"
% (name, subrepo_hash, gitSha)
)
return b'[submodule "%s"]\n\tpath = %s\n\turl = %s\n' % (name,name,
submodule_mappings[name])
else:
stderr_buffer.write(
b"Warning: Could not find hg revision %s for %s in git %s\n"
% (subrepo_hash, name, gitRepoLocation,)
)
return b''
def refresh_gitmodules(ctx):
"""Updates list of ctx submodules according to .hgsubstate file"""
remove_gitmodules(ctx)
gitmodules=b""
# Create the .gitmodules file and all submodules
for name,subrepo_info in ctx.substate.items():
if subrepo_info[2]==b'git':
gitmodules+=refresh_git_submodule(name,subrepo_info)
elif submodule_mappings and name in submodule_mappings:
gitmodules+=refresh_hg_submodule(name,subrepo_info)
if len(gitmodules):
wr(b'M 100644 inline .gitmodules')
wr(b'data %d' % (len(gitmodules)+1))
wr(gitmodules)
def export_file_contents(ctx,manifest,files,hgtags,encoding='',plugins={}):
count=0
max=len(files)
is_submodules_refreshed=False
for file in files:
if not is_submodules_refreshed and (file==b'.hgsub' or file==b'.hgsubstate'):
is_submodules_refreshed=True
refresh_gitmodules(ctx)
# Skip .hgtags files. They only get us in trouble.
if not hgtags and file == ".hgtags":
sys.stderr.write('Skip %s\n' % (file))
if not hgtags and file == b".hgtags":
stderr_buffer.write(b'Skip %s\n' % file)
continue
d=ctx.filectx(file).data()
if encoding:
filename=file.decode(encoding).encode('utf8')
else:
filename=file
wr('M %s inline %s' % (gitmode(manifest.flags(file)),
if b'.git' in filename.split(b'/'): # Even on Windows, the path separator is / here.
stderr_buffer.write(
b'Ignoring file %s which cannot be tracked by git\n' % filename
)
continue
file_ctx=ctx.filectx(file)
d=file_ctx.data()
if plugins and plugins['file_data_filters']:
file_data = {'filename':filename,'file_ctx':file_ctx,'data':d}
for filter in plugins['file_data_filters']:
filter(file_data)
d=file_data['data']
filename=file_data['filename']
file_ctx=file_data['file_ctx']
wr(b'M %s inline %s' % (gitmode(manifest.flags(file)),
strip_leading_slash(filename)))
wr('data %d' % len(d)) # had some trouble with size()
wr(b'data %d' % len(d)) # had some trouble with size()
wr(d)
count+=1
if count%cfg_export_boundary==0:
sys.stderr.write('Exported %d/%d files\n' % (count,max))
stderr_buffer.write(b'Exported %d/%d files\n' % (count,max))
if max>cfg_export_boundary:
sys.stderr.write('Exported %d/%d files\n' % (count,max))
stderr_buffer.write(b'Exported %d/%d files\n' % (count,max))
def sanitize_name(name,what="branch", mapping={}):
"""Sanitize input roughly according to git-check-ref-format(1)"""
@@ -163,54 +260,76 @@ def sanitize_name(name,what="branch", mapping={}):
def dot(name):
if not name: return name
if name[0] == '.': return '_'+name[1:]
if name[0:1] == b'.': return b'_'+name[1:]
return name
if not auto_sanitize:
return mapping.get(name,name)
n=mapping.get(name,name)
p=re.compile('([[ ~^:?\\\\*]|\.\.)')
n=p.sub('_', n)
if n[-1] in ('/', '.'): n=n[:-1]+'_'
n='/'.join(map(dot,n.split('/')))
p=re.compile('_+')
n=p.sub('_', n)
p=re.compile(b'([\\[ ~^:?\\\\*]|\.\.)')
n=p.sub(b'_', n)
if n[-1:] in (b'/', b'.'): n=n[:-1]+b'_'
n=b'/'.join([dot(s) for s in n.split(b'/')])
p=re.compile(b'_+')
n=p.sub(b'_', n)
if n!=name:
sys.stderr.write('Warning: sanitized %s [%s] to [%s]\n' % (what,name,n))
stderr_buffer.write(
b'Warning: sanitized %s [%s] to [%s]\n' % (what.encode(), name, n)
)
return n
def strip_leading_slash(filename):
if filename[0] == '/':
if filename[0:1] == b'/':
return filename[1:]
return filename
def export_commit(ui,repo,revision,old_marks,max,count,authors,
branchesmap,sob,brmap,hgtags,encoding='',fn_encoding=''):
branchesmap,sob,brmap,hgtags,encoding='',fn_encoding='',
plugins={}):
def get_branchname(name):
if brmap.has_key(name):
if name in brmap:
return brmap[name]
n=sanitize_name(name, "branch", branchesmap)
brmap[name]=n
return n
(revnode,_,user,(time,timezone),files,desc,branch,_)=get_changeset(ui,repo,revision,authors,encoding)
(revnode,_,user,(time,timezone),files,desc,branch,extra)=get_changeset(ui,repo,revision,authors,encoding)
if repo[revnode].hidden():
return count
branch=get_branchname(branch)
parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]
author = get_author(desc,user,authors)
hg_hash=revsymbol(repo,b"%d" % revision).hex()
if plugins and plugins['commit_message_filters']:
commit_data = {'branch': branch, 'parents': parents,
'author': author, 'desc': desc,
'revision': revision, 'hg_hash': hg_hash,
'committer': user, 'extra': extra}
for filter in plugins['commit_message_filters']:
filter(commit_data)
branch = commit_data['branch']
parents = commit_data['parents']
author = commit_data['author']
user = commit_data['committer']
desc = commit_data['desc']
if len(parents)==0 and revision != 0:
wr('reset refs/heads/%s' % branch)
wr(b'reset refs/heads/%s' % branch)
wr('commit refs/heads/%s' % branch)
wr('mark :%d' % (revision+1))
wr(b'commit refs/heads/%s' % branch)
wr(b'mark :%d' % (revision+1))
if sob:
wr('author %s %d %s' % (get_author(desc,user,authors),time,timezone))
wr('committer %s %d %s' % (user,time,timezone))
wr('data %d' % (len(desc)+1)) # wtf?
wr(b'author %s %d %s' % (author,time,timezone))
wr(b'committer %s %d %s' % (user,time,timezone))
wr(b'data %d' % (len(desc)+1)) # wtf?
wr(desc)
wr()
ctx=repo.changectx(str(revision))
ctx=revsymbol(repo, b"%d" % revision)
man=ctx.manifest()
added,changed,removed,type=[],[],[],''
@@ -220,88 +339,91 @@ def export_commit(ui,repo,revision,old_marks,max,count,authors,
added.sort()
type='full'
else:
wr('from %s' % revnum_to_revref(parents[0], old_marks))
wr(b'from %s' % revnum_to_revref(parents[0], old_marks))
if len(parents) == 1:
# later non-merge revision: feed in changed manifest
# if we have exactly one parent, just take the changes from the
# manifest without expensively comparing checksums
f=repo.status(repo.lookup(parents[0]),revnode)[:3]
added,changed,removed=f[1],f[0],f[2]
f=repo.status(parents[0],revnode)
added,changed,removed=f.added,f.modified,f.removed
type='simple delta'
else: # a merge with two parents
wr('merge %s' % revnum_to_revref(parents[1], old_marks))
wr(b'merge %s' % revnum_to_revref(parents[1], old_marks))
# later merge revision: feed in changed manifest
# for many files comparing checksums is expensive so only do it for
# merges where we really need it due to hg's revlog logic
added,changed,removed=get_filechanges(repo,revision,parents,man)
type='thorough delta'
sys.stderr.write('%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n' %
(branch,type,revision+1,max,len(added),len(changed),len(removed)))
stderr_buffer.write(
b'%s: Exporting %s revision %d/%d with %d/%d/%d added/changed/removed files\n'
% (branch, type.encode(), revision + 1, max, len(added), len(changed), len(removed))
)
if fn_encoding:
removed=[r.decode(fn_encoding).encode('utf8') for r in removed]
for filename in removed:
if fn_encoding:
filename=filename.decode(fn_encoding).encode('utf8')
filename=strip_leading_slash(filename)
if filename==b'.hgsub':
remove_gitmodules(ctx)
wr(b'D %s' % filename)
removed=[strip_leading_slash(x) for x in removed]
map(lambda r: wr('D %s' % r),removed)
export_file_contents(ctx,man,added,hgtags,fn_encoding)
export_file_contents(ctx,man,changed,hgtags,fn_encoding)
export_file_contents(ctx,man,added,hgtags,fn_encoding,plugins)
export_file_contents(ctx,man,changed,hgtags,fn_encoding,plugins)
wr()
return checkpoint(count)
def export_note(ui,repo,revision,count,authors,encoding,is_first):
(revnode,_,user,(time,timezone),_,_,_,_)=get_changeset(ui,repo,revision,authors,encoding)
if repo[revnode].hidden():
return count
parents = [p for p in repo.changelog.parentrevs(revision) if p >= 0]
wr('commit refs/notes/hg')
wr('committer %s %d %s' % (user,time,timezone))
wr('data 0')
wr(b'commit refs/notes/hg')
wr(b'committer %s %d %s' % (user,time,timezone))
wr(b'data 0')
if is_first:
wr('from refs/notes/hg^0')
wr('N inline :%d' % (revision+1))
hg_hash=repo.changectx(str(revision)).hex()
wr('data %d' % (len(hg_hash)))
wr(b'from refs/notes/hg^0')
wr(b'N inline :%d' % (revision+1))
hg_hash=revsymbol(repo,b"%d" % revision).hex()
wr(b'data %d' % (len(hg_hash)))
wr_no_nl(hg_hash)
wr()
return checkpoint(count)
wr('data %d' % (len(desc)+1)) # wtf?
wr(desc)
wr()
def export_tags(ui,repo,old_marks,mapping_cache,count,authors,tagsmap):
l=repo.tagslist()
for tag,node in l:
# Remap the branch name
tag=sanitize_name(tag,"tag",tagsmap)
# ignore latest revision
if tag=='tip': continue
if tag==b'tip': continue
# ignore tags to nodes that are missing (ie, 'in the future')
if node.encode('hex_codec') not in mapping_cache:
sys.stderr.write('Tag %s refers to unseen node %s\n' % (tag, node.encode('hex_codec')))
if hexlify(node) not in mapping_cache:
stderr_buffer.write(b'Tag %s refers to unseen node %s\n' % (tag, hexlify(node)))
continue
rev=int(mapping_cache[node.encode('hex_codec')])
rev=int(mapping_cache[hexlify(node)])
ref=revnum_to_revref(rev, old_marks)
if ref==None:
sys.stderr.write('Failed to find reference for creating tag'
' %s at r%d\n' % (tag,rev))
stderr_buffer.write(
b'Failed to find reference for creating tag %s at r%d\n' % (tag, rev)
)
continue
sys.stderr.write('Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag,rev,ref))
wr('reset refs/tags/%s' % tag)
wr('from %s' % ref)
stderr_buffer.write(b'Exporting tag [%s] at [hg r%d] [git %s]\n' % (tag, rev, ref))
wr(b'reset refs/tags/%s' % tag)
wr(b'from %s' % ref)
wr()
count=checkpoint(count)
return count
def load_mapping(name, filename, mapping_is_raw):
raw_regexp=re.compile('^([^=]+)[ ]*=[ ]*(.+)$')
string_regexp='"(((\\.)|(\\")|[^"])*)"'
quoted_regexp=re.compile('^'+string_regexp+'[ ]*=[ ]*'+string_regexp+'$')
raw_regexp=re.compile(b'^([^=]+)[ ]*=[ ]*(.+)$')
string_regexp=b'"(((\\.)|(\\")|[^"])*)"'
quoted_regexp=re.compile(b'^'+string_regexp+b'[ ]*=[ ]*'+string_regexp+b'$')
def parse_raw_line(line):
m=raw_regexp.match(line)
@@ -309,26 +431,34 @@ def load_mapping(name, filename, mapping_is_raw):
return None
return (m.group(1).strip(), m.group(2).strip())
def process_unicode_escape_sequences(s):
# Replace unicode escape sequences in the otherwise UTF8-encoded bytestring s with
# the UTF8-encoded characters they represent. We need to do an additional
# .decode('utf8').encode('unicode-escape') to convert any non-ascii characters into
# their escape sequences so that the subsequent .decode('unicode-escape') succeeds:
return s.decode('utf8').encode('unicode-escape').decode('unicode-escape').encode('utf8')
def parse_quoted_line(line):
m=quoted_regexp.match(line)
if m==None:
return None
return (m.group(1).decode('string_escape'),
m.group(5).decode('string_escape'))
return
return (process_unicode_escape_sequences(m.group(1)),
process_unicode_escape_sequences(m.group(5)))
cache={}
if not os.path.exists(filename):
sys.stderr.write('Could not open mapping file [%s]\n' % (filename))
return cache
f=open(filename,'r')
f=open(filename,'rb')
l=0
a=0
for line in f.readlines():
l+=1
line=line.strip()
if l==1 and line[0]=='#' and line=='# quoted-escaped-strings':
if l==1 and line[0:1]==b'#' and line==b'# quoted-escaped-strings':
continue
elif line=='' or line[0]=='#':
elif line==b'' or line[0:1]==b'#':
continue
m=parse_raw_line(line) if mapping_is_raw else parse_quoted_line(line)
if m==None:
@@ -350,7 +480,7 @@ def branchtip(repo, heads):
break
return tip
def verify_heads(ui,repo,cache,force,branchesmap):
def verify_heads(ui,repo,cache,force,ignore_unnamed_heads,branchesmap):
branches={}
for bn, heads in repo.branchmap().iteritems():
branches[bn] = branchtip(repo, heads)
@@ -363,26 +493,38 @@ def verify_heads(ui,repo,cache,force,branchesmap):
sanitized_name=sanitize_name(b,"branch",branchesmap)
sha1=get_git_sha1(sanitized_name)
c=cache.get(sanitized_name)
if sha1!=c:
sys.stderr.write('Error: Branch [%s] modified outside hg-fast-export:'
'\n%s (repo) != %s (cache)\n' % (b,sha1,c))
if not c and sha1:
stderr_buffer.write(
b'Error: Branch [%s] already exists and was not created by hg-fast-export, '
b'export would overwrite unrelated branch\n' % b)
if not force: return False
elif sha1!=c:
stderr_buffer.write(
b'Error: Branch [%s] modified outside hg-fast-export:'
b'\n%s (repo) != %s (cache)\n' % (b, b'<None>' if sha1 is None else sha1, c)
)
if not force: return False
# verify that branch has exactly one head
t={}
for h in repo.heads():
unnamed_heads=False
for h in repo.filtered(b'visible').heads():
(_,_,_,_,_,_,branch,_)=get_changeset(ui,repo,h)
if t.get(branch,False):
sys.stderr.write('Error: repository has at least one unnamed head: hg r%s\n' %
repo.changelog.rev(h))
if not force: return False
stderr_buffer.write(
b'Error: repository has an unnamed head: hg r%d\n'
% repo.changelog.rev(h)
)
unnamed_heads=True
if not force and not ignore_unnamed_heads: return False
t[branch]=True
if unnamed_heads and not force and not ignore_unnamed_heads: return False
return True
def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
authors={},branchesmap={},tagsmap={},
sob=False,force=False,hgtags=False,notes=False,encoding='',fn_encoding=''):
sob=False,force=False,ignore_unnamed_heads=False,hgtags=False,notes=False,encoding='',fn_encoding='',
plugins={}):
def check_cache(filename, contents):
if len(contents) == 0:
sys.stderr.write('Warning: %s does not contain any data, this will probably make an incremental import fail\n' % filename)
@@ -402,7 +544,7 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
ui,repo=setup_repo(repourl)
if not verify_heads(ui,repo,heads_cache,force,branchesmap):
if not verify_heads(ui,repo,heads_cache,force,ignore_unnamed_heads,branchesmap):
return 1
try:
@@ -410,27 +552,41 @@ def hg2git(repourl,m,marksfile,mappingfile,headsfile,tipfile,
except AttributeError:
tip=len(repo)
min=int(state_cache.get('tip',0))
min=int(state_cache.get(b'tip',0))
max=_max
if _max<0 or max>tip:
max=tip
for rev in range(0,max):
(revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors)
mapping_cache[revnode.encode('hex_codec')] = str(rev)
(revnode,_,_,_,_,_,_,_)=get_changeset(ui,repo,rev,authors)
if repo[revnode].hidden():
continue
mapping_cache[hexlify(revnode)] = b"%d" % rev
if submodule_mappings:
# Make sure that all mercurial submodules are registered in the submodule-mappings file
for rev in range(0,max):
ctx=revsymbol(repo,b"%d" % rev)
if ctx.hidden():
continue
if ctx.substate:
for key in ctx.substate:
if ctx.substate[key][2]=='hg' and key not in submodule_mappings:
sys.stderr.write("Error: %s not found in submodule-mappings\n" % (key))
return 1
c=0
brmap={}
for rev in range(min,max):
c=export_commit(ui,repo,rev,old_marks,max,c,authors,branchesmap,
sob,brmap,hgtags,encoding,fn_encoding)
sob,brmap,hgtags,encoding,fn_encoding,
plugins)
if notes:
for rev in range(min,max):
c=export_note(ui,repo,rev,c,authors, encoding, rev == min and min != 0)
state_cache['tip']=max
state_cache['repo']=repourl
state_cache[b'tip']=max
state_cache[b'repo']=repourl
save_cache(tipfile,state_cache)
save_cache(mappingfile,mapping_cache)
@@ -448,6 +604,9 @@ if __name__=='__main__':
parser=OptionParser()
parser.add_option("-n", "--no-auto-sanitize",action="store_false",
dest="auto_sanitize",default=True,
help="Do not perform built-in (broken in many cases) sanitizing of names")
parser.add_option("-m","--max",type="int",dest="max",
help="Maximum hg revision to import")
parser.add_option("--mapping",dest="mappingfile",
@@ -471,7 +630,9 @@ if __name__=='__main__':
parser.add_option("-T","--tags",dest="tagsfile",
help="Read tags map from TAGSFILE")
parser.add_option("-f","--force",action="store_true",dest="force",
default=False,help="Ignore validation errors by force")
default=False,help="Ignore validation errors by force, implies --ignore-unnamed-heads")
parser.add_option("--ignore-unnamed-heads",action="store_true",dest="ignore_unnamed_heads",
default=False,help="Ignore unnamed head errors")
parser.add_option("-M","--default-branch",dest="default_branch",
help="Set the default branch")
parser.add_option("-o","--origin",dest="origin_name",
@@ -484,10 +645,19 @@ if __name__=='__main__':
help="Assume file names from Mercurial are encoded in <filename_encoding>")
parser.add_option("--mappings-are-raw",dest="raw_mappings", default=False,
help="Assume mappings are raw <key>=<value> lines")
parser.add_option("--filter-contents",dest="filter_contents",
help="Pipe contents of each exported file through FILTER_CONTENTS <file-path> <hg-hash> <is-binary>")
parser.add_option("--plugin-path", type="string", dest="pluginpath",
help="Additional search path for plugins ")
parser.add_option("--plugin", action="append", type="string", dest="plugins",
help="Add a plugin with the given init string <name=init>")
parser.add_option("--subrepo-map", type="string", dest="subrepo_map",
help="Provide a mapping file between the subrepository name and the submodule name")
(options,args)=parser.parse_args()
m=-1
auto_sanitize = options.auto_sanitize
if options.max!=None: m=options.max
if options.marksfile==None: bail(parser,'--marks')
@@ -496,6 +666,14 @@ if __name__=='__main__':
if options.statusfile==None: bail(parser,'--status')
if options.repourl==None: bail(parser,'--repo')
if options.subrepo_map:
if not os.path.exists(options.subrepo_map):
sys.stderr.write('Subrepo mapping file not found %s\n'
% options.subrepo_map)
sys.exit(1)
submodule_mappings=load_mapping('subrepo mappings',
options.subrepo_map,False)
a={}
if options.authorfile!=None:
a=load_mapping('authors', options.authorfile, options.raw_mappings)
@@ -506,7 +684,7 @@ if __name__=='__main__':
t={}
if options.tagsfile!=None:
t=load_mapping('tags', options.tagsfile, True)
t=load_mapping('tags', options.tagsfile, options.raw_mappings)
if options.default_branch!=None:
set_default_branch(options.default_branch)
@@ -522,8 +700,36 @@ if __name__=='__main__':
if options.fn_encoding!=None:
fn_encoding=options.fn_encoding
plugins=[]
if options.plugins!=None:
plugins+=options.plugins
if options.filter_contents!=None:
plugins+=['shell_filter_file_contents='+options.filter_contents]
plugins_dict={}
plugins_dict['commit_message_filters']=[]
plugins_dict['file_data_filters']=[]
if plugins and options.pluginpath:
sys.stderr.write('Using additional plugin path: ' + options.pluginpath + '\n')
for plugin in plugins:
split = plugin.split('=')
name, opts = split[0], '='.join(split[1:])
i = pluginloader.get_plugin(name,options.pluginpath)
sys.stderr.write('Loaded plugin ' + i['name'] + ' from path: ' + i['path'] +' with opts: ' + opts + '\n')
plugin = pluginloader.load_plugin(i).build_filter(opts)
if hasattr(plugin,'file_data_filter') and callable(plugin.file_data_filter):
plugins_dict['file_data_filters'].append(plugin.file_data_filter)
if hasattr(plugin, 'commit_message_filter') and callable(plugin.commit_message_filter):
plugins_dict['commit_message_filters'].append(plugin.commit_message_filter)
sys.exit(hg2git(options.repourl,m,options.marksfile,options.mappingfile,
options.headsfile, options.statusfile,
authors=a,branchesmap=b,tagsmap=t,
sob=options.sob,force=options.force,hgtags=options.hgtags,
notes=options.notes,encoding=encoding,fn_encoding=fn_encoding))
sob=options.sob,force=options.force,
ignore_unnamed_heads=options.ignore_unnamed_heads,
hgtags=options.hgtags,
notes=options.notes,encoding=encoding,fn_encoding=fn_encoding,
plugins=plugins_dict))


@@ -26,9 +26,26 @@ SFX_MARKS="marks"
SFX_HEADS="heads"
SFX_STATE="state"
GFI_OPTS=""
PYTHON=${PYTHON:-python}
USAGE="[--quiet] [-r <repo>] [--force] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]"
if [ -z "${PYTHON}" ]; then
# $PYTHON is not set, so we try to find a working python with mercurial:
for python_cmd in python2 python python3; do
if command -v $python_cmd > /dev/null; then
$python_cmd -c 'from mercurial.scmutil import revsymbol' 2> /dev/null
if [ $? -eq 0 ]; then
PYTHON=$python_cmd
break
fi
fi
done
fi
if [ -z "${PYTHON}" ]; then
echo "Could not find a python interpreter with the mercurial module >= 4.6 available. " \
"Please use the 'PYTHON' environment variable to specify the interpreter to use."
exit 1
fi
USAGE="[--quiet] [-r <repo>] [--force] [--ignore-unnamed-heads] [-m <max>] [-s] [--hgtags] [-A <file>] [-B <file>] [-T <file>] [-M <name>] [-o <name>] [--hg-hash] [-e <encoding>]"
LONG_USAGE="Import hg repository <repo> up to either tip or <max>
If <repo> is omitted, use last hg repository as obtained from state file,
GIT_DIR/$PFX-$SFX_STATE by default.
@@ -48,6 +65,8 @@ Options:
-B <file> Read branch map from file
-T <file> Read tags map from file
-M <name> Set the default branch name (defaults to 'master')
-n Do not perform built-in (broken in many cases) sanitizing
of branch/tag names.
-o <name> Use <name> as branch namespace to track upstream (eg 'origin')
--hg-hash Annotate commits with the hg hash as git notes in the
hg namespace.
@@ -56,6 +75,10 @@ Options:
--fe <filename_encoding> Assume filenames from Mercurial are encoded
in <filename_encoding>
--mappings-are-raw Assume mappings are raw <key>=<value> lines
--filter-contents <cmd> Pipe contents of each exported file through <cmd>
with <file-path> <hg-hash> <is-binary> as arguments
--plugin <plugin=init> Add a plugin with the given init string (repeatable)
--plugin-path <plugin-path> Add an additional plugin lookup path
"
case "$1" in
-h|--help)


@@ -7,6 +7,7 @@ from mercurial import node
from hg2git import setup_repo,load_cache,get_changeset,get_git_sha1
from optparse import OptionParser
import sys
from binascii import hexlify
def heads(ui,repo,start=None,stop=None,max=None):
# this is copied from mercurial/revlog.py and differs only in
@@ -24,7 +25,7 @@ def heads(ui,repo,start=None,stop=None,max=None):
heads = {startrev: 1}
parentrevs = repo.changelog.parentrevs
for r in xrange(startrev + 1, max):
for r in range(startrev + 1, max):
for p in parentrevs(r):
if p in reachable:
if r not in stoprevs:
@@ -33,7 +34,7 @@ def heads(ui,repo,start=None,stop=None,max=None):
if p in heads and p not in stoprevs:
del heads[p]
return [(repo.changelog.node(r),str(r)) for r in heads]
return [(repo.changelog.node(r), b"%d" % r) for r in heads]
def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
h=heads(ui,repo,max=max)
@@ -44,11 +45,11 @@ def get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,max):
_,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
del stale[branch]
git_sha1=get_git_sha1(branch)
cache_sha1=marks_cache.get(str(int(rev)+1))
cache_sha1=marks_cache.get(b"%d" % (int(rev)+1))
if git_sha1!=None and git_sha1==cache_sha1:
unchanged.append([branch,cache_sha1,rev,desc.split('\n')[0],user])
unchanged.append([branch,cache_sha1,rev,desc.split(b'\n')[0],user])
else:
changed.append([branch,cache_sha1,rev,desc.split('\n')[0],user])
changed.append([branch,cache_sha1,rev,desc.split(b'\n')[0],user])
changed.sort()
unchanged.sort()
return stale,changed,unchanged
@@ -57,20 +58,20 @@ def get_tags(ui,repo,marks_cache,mapping_cache,max):
l=repo.tagslist()
good,bad=[],[]
for tag,node in l:
if tag=='tip': continue
rev=int(mapping_cache[node.encode('hex_codec')])
cache_sha1=marks_cache.get(str(int(rev)+1))
if tag==b'tip': continue
rev=int(mapping_cache[hexlify(node)])
cache_sha1=marks_cache.get(b"%d" % (int(rev)+1))
_,_,user,(_,_),_,desc,branch,_=get_changeset(ui,repo,rev)
if int(rev)>int(max):
bad.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user])
bad.append([tag,branch,cache_sha1,rev,desc.split(b'\n')[0],user])
else:
good.append([tag,branch,cache_sha1,rev,desc.split('\n')[0],user])
good.append([tag,branch,cache_sha1,rev,desc.split(b'\n')[0],user])
good.sort()
bad.sort()
return good,bad
def mangle_mark(mark):
return str(int(mark)-1)
return b"%d" % (int(mark)-1)
if __name__=='__main__':
def bail(parser,opt):
@@ -107,7 +108,7 @@ if __name__=='__main__':
state_cache=load_cache(options.statusfile)
mapping_cache = load_cache(options.mappingfile)
l=int(state_cache.get('tip',options.revision))
l=int(state_cache.get(b'tip',options.revision))
if options.revision+1>l:
sys.stderr.write('Revision is beyond last revision imported: %d>%d\n' % (options.revision,l))
sys.exit(1)
@@ -117,19 +118,39 @@ if __name__=='__main__':
stale,changed,unchanged=get_branches(ui,repo,heads_cache,marks_cache,mapping_cache,options.revision+1)
good,bad=get_tags(ui,repo,marks_cache,mapping_cache,options.revision+1)
print "Possibly stale branches:"
map(lambda b: sys.stdout.write('\t%s\n' % b),stale.keys())
print("Possibly stale branches:")
for b in stale:
sys.stdout.write('\t%s\n' % b.decode('utf8'))
print "Possibly stale tags:"
map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),bad)
print("Possibly stale tags:")
for b in bad:
sys.stdout.write(
'\t%s on %s (r%s)\n'
% (b[0].decode('utf8'), b[1].decode('utf8'), b[3].decode('utf8'))
)
print "Unchanged branches:"
map(lambda b: sys.stdout.write('\t%s (r%s)\n' % (b[0],b[2])),unchanged)
print("Unchanged branches:")
for b in unchanged:
sys.stdout.write('\t%s (r%s)\n' % (b[0].decode('utf8'),b[2].decode('utf8')))
print "Unchanged tags:"
map(lambda b: sys.stdout.write('\t%s on %s (r%s)\n' % (b[0],b[1],b[3])),good)
print("Unchanged tags:")
for b in good:
sys.stdout.write(
'\t%s on %s (r%s)\n'
% (b[0].decode('utf8'), b[1].decode('utf8'), b[3].decode('utf8'))
)
print "Reset branches in '%s' to:" % options.headsfile
map(lambda b: sys.stdout.write('\t:%s %s\n\t\t(r%s: %s: %s)\n' % (b[0],b[1],b[2],b[4],b[3])),changed)
print("Reset branches in '%s' to:" % options.headsfile)
for b in changed:
sys.stdout.write(
'\t:%s %s\n\t\t(r%s: %s: %s)\n'
% (
b[0].decode('utf8'),
b[1].decode('utf8'),
b[2].decode('utf8'),
b[4].decode('utf8'),
b[3].decode('utf8'),
)
)
print "Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision)
print("Reset ':tip' in '%s' to '%d'" % (options.statusfile,options.revision))


@@ -11,7 +11,24 @@ SFX_MAPPING="mapping"
SFX_HEADS="heads"
SFX_STATE="state"
QUIET=""
PYTHON=${PYTHON:-python}
if [ -z "${PYTHON}" ]; then
# $PYTHON is not set, so we try to find a working python with mercurial:
for python_cmd in python2 python python3; do
if command -v $python_cmd > /dev/null; then
$python_cmd -c 'import mercurial' 2> /dev/null
if [ $? -eq 0 ]; then
PYTHON=$python_cmd
break
fi
fi
done
fi
if [ -z "${PYTHON}" ]; then
echo "Could not find a python interpreter with the mercurial module available. " \
"Please use the 'PYTHON' environment variable to specify the interpreter to use."
exit 1
fi
USAGE="[-r <repo>] -R <rev>"
LONG_USAGE="Print SHA1s of latest changes per branch up to <rev> useful


@@ -1,26 +1,36 @@
#!/usr/bin/env python
#!/usr/bin/env python2
# Copyright (c) 2007, 2008 Rocco Rutte <pdmef@gmx.net> and others.
# License: MIT <http://www.opensource.org/licenses/mit-license.php>
from mercurial import hg,util,ui,templatefilters
from mercurial import error as hgerror
from mercurial.scmutil import revsymbol,binnode
import re
import os
import sys
import subprocess
PY2 = sys.version_info.major < 3
if PY2:
str = unicode
fsencode = lambda s: s.encode(sys.getfilesystemencoding())
else:
from os import fsencode
# default git branch name
cfg_master='master'
cfg_master=b'master'
# default origin name
origin_name=''
origin_name=b''
# silly regex to see if user field has email address
user_re=re.compile('([^<]+) (<[^>]*>)$')
user_re=re.compile(b'([^<]+) (<[^>]*>)$')
# silly regex to clean out user names
user_clean_re=re.compile('^["]([^"]+)["]$')
user_clean_re=re.compile(b'^["]([^"]+)["]$')
def set_default_branch(name):
global cfg_master
cfg_master = name
cfg_master = name.encode('utf8') if not isinstance(name, bytes) else name
def set_origin_name(name):
global origin_name
@@ -31,24 +41,26 @@ def setup_repo(url):
myui=ui.ui(interactive=False)
except TypeError:
myui=ui.ui()
myui.setconfig('ui', 'interactive', 'off')
return myui,hg.repository(myui,url)
myui.setconfig(b'ui', b'interactive', b'off')
# Avoids a warning when the repository has obsolete markers
myui.setconfig(b'experimental', b'evolution.createmarkers', True)
return myui,hg.repository(myui, fsencode(url)).unfiltered()
def fixup_user(user,authors):
user=user.strip("\"")
user=user.strip(b"\"")
if authors!=None:
# if we have an authors table, try to get mapping
# by defaulting to the current value of 'user'
user=authors.get(user,user)
name,mail,m='','',user_re.match(user)
name,mail,m=b'',b'',user_re.match(user)
if m==None:
# if we don't have 'Name <mail>' syntax, extract name
# and mail from hg helpers. this seems to work pretty well.
# if email doesn't contain @, replace it with devnull@localhost
name=templatefilters.person(user)
mail='<%s>' % util.email(user)
if '@' not in mail:
mail = '<devnull@localhost>'
mail=b'<%s>' % templatefilters.email(user)
if b'@' not in mail:
mail = b'<devnull@localhost>'
else:
# if we have 'Name <mail>' syntax, everything is fine :)
name,mail=m.group(1),m.group(2)
@@ -57,25 +69,33 @@ def fixup_user(user,authors):
m2=user_clean_re.match(name)
if m2!=None:
name=m2.group(1)
return '%s %s' % (name,mail)
return b'%s %s' % (name,mail)
def get_branch(name):
# 'HEAD' is the result of a bug in mutt's cvs->hg conversion,
# other CVS imports may need it, too
if name=='HEAD' or name=='default' or name=='':
if name==b'HEAD' or name==b'default' or name==b'':
name=cfg_master
if origin_name:
return origin_name + '/' + name
return origin_name + b'/' + name
return name
def get_changeset(ui,repo,revision,authors={},encoding=''):
node=repo.lookup(revision)
# Starting with Mercurial 4.6 lookup no longer accepts raw hashes
# for lookups. Work around it by changing our behaviour depending on
# how it fails
try:
node=repo.lookup(revision)
except (TypeError, hgerror.ProgrammingError):
node=binnode(revsymbol(repo, b"%d" % revision)) # We were given a numeric rev
except hgerror.RepoLookupError:
node=revision # We got a raw hash
(manifest,user,(time,timezone),files,desc,extra)=repo.changelog.read(node)
if encoding:
user=user.decode(encoding).encode('utf8')
desc=desc.decode(encoding).encode('utf8')
tz="%+03d%02d" % (-timezone / 3600, ((-timezone % 3600) / 60))
branch=get_branch(extra.get('branch','master'))
tz=b"%+03d%02d" % (-timezone // 3600, ((-timezone % 3600) // 60))
branch=get_branch(extra.get(b'branch', b'master'))
return (node,manifest,fixup_user(user,authors),(time,tz),files,desc,branch,extra)
def mangle_key(key):
@@ -85,29 +105,35 @@ def load_cache(filename,get_key=mangle_key):
cache={}
if not os.path.exists(filename):
return cache
f=open(filename,'r')
f=open(filename,'rb')
l=0
for line in f.readlines():
l+=1
fields=line.split(' ')
if fields==None or not len(fields)==2 or fields[0][0]!=':':
fields=line.split(b' ')
if fields==None or not len(fields)==2 or fields[0][0:1]!=b':':
sys.stderr.write('Invalid file format in [%s], line %d\n' % (filename,l))
continue
# put key:value in cache, key without ^:
cache[get_key(fields[0][1:])]=fields[1].split('\n')[0]
cache[get_key(fields[0][1:])]=fields[1].split(b'\n')[0]
f.close()
return cache
def save_cache(filename,cache):
f=open(filename,'w+')
map(lambda x: f.write(':%s %s\n' % (str(x),str(cache.get(x)))),cache.keys())
f=open(filename,'wb')
for key, value in cache.items():
if not isinstance(key, bytes):
key = str(key).encode('utf8')
if not isinstance(value, bytes):
value = str(value).encode('utf8')
f.write(b':%s %s\n' % (key, value))
f.close()
def get_git_sha1(name,type='heads'):
try:
# use git-rev-parse to support packed refs
ref="refs/%s/%s" % (type,name)
l=subprocess.check_output(["git", "rev-parse", "--verify", "--quiet", ref])
ref="refs/%s/%s" % (type,name.decode('utf8'))
l=subprocess.check_output(["git", "rev-parse", "--verify",
"--quiet", ref.encode('utf8')])
if l == None or len(l) == 0:
return None
return l[0:40]

pluginloader/__init__.py Normal file

@@ -0,0 +1,19 @@
import os
import imp
PluginFolder = os.path.join(os.path.dirname(os.path.realpath(__file__)),"..","plugins")
MainModule = "__init__"
def get_plugin(name, plugin_path):
search_dirs = [PluginFolder]
if plugin_path:
search_dirs = [plugin_path] + search_dirs
for dir in search_dirs:
location = os.path.join(dir, name)
if not os.path.isdir(location) or not MainModule + ".py" in os.listdir(location):
continue
info = imp.find_module(MainModule, [location])
return {"name": name, "info": info, "path": location}
raise Exception("Could not find plugin with name " + name)
def load_plugin(plugin):
return imp.load_module(MainModule, *plugin["info"])


@@ -0,0 +1,20 @@
## Branch Name in Commit Message
Mercurial has a much stronger notion of branches than Git,
and some parties may not wish to lose the branch information
during the migration to Git. You can use this plugin to either
prepend or append the branch name from the Mercurial
commit to the commit message in Git.
Valid arguments are:
- `start`: write the branch name at the start of the commit message
- `end`: write the branch name at the end of the commit message
- `sameline`: if `start` is specified, put a colon and a space
after the branch name, such that the commit message reads
`branch_name: first line of commit message`. Otherwise, the
branch name is on the first line of the commit message by itself.
- `skipmaster`: Don't write the branch name if the branch is `master`.
To use the plugin, add
`--plugin branch_name_in_commit=<comma_separated_list_of_args>`.
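The effect of the arguments can be sketched as follows. This is a minimal illustration of the behaviour described above, not the plugin itself; the function name `apply_branch_name` and the sample commit are hypothetical, and commit fields are assumed to be byte strings.

```python
def apply_branch_name(commit_data, start=False, end=False,
                      sameline=False, skipmaster=False):
    """Illustrative stand-in for the plugin's documented behaviour."""
    if sameline and not start:
        raise ValueError("sameline option only allowed if 'start' given")
    branch = commit_data['branch']
    if skipmaster and branch == b'master':
        return commit_data  # leave commits on master untouched
    if start:
        # 'sameline' joins with ": ", otherwise the branch gets its own line
        sep = b': ' if sameline else b'\n'
        commit_data['desc'] = branch + sep + commit_data['desc']
    if end:
        commit_data['desc'] = commit_data['desc'] + b'\n' + branch
    return commit_data


# Hypothetical commit, equivalent to passing "start,sameline"
example = {'branch': b'stable', 'desc': b'Fix crash on empty repo'}
apply_branch_name(example, start=True, sameline=True)
print(example['desc'])
```

With `start,sameline` the message of the sample commit becomes `stable: Fix crash on empty repo`.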


@@ -0,0 +1,25 @@
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
args = {arg: True for arg in args.split(',')}
self.start = args.pop('start', False)
self.end = args.pop('end', False)
self.sameline = args.pop('sameline', False)
self.skip_master = args.pop('skipmaster', False)
if self.sameline and not self.start:
raise ValueError("sameline option only allowed if 'start' given")
if args:
raise ValueError("Unknown args: " + ','.join(args))
def commit_message_filter(self, commit_data):
if not (self.skip_master and commit_data['branch'] == b'master'):
if self.start:
sep = b': ' if self.sameline else b'\n'
commit_data['desc'] = commit_data['branch'] + sep + commit_data['desc']
if self.end:
commit_data['desc'] = (
commit_data['desc'] + b'\n' + commit_data['branch']
)


@@ -0,0 +1,9 @@
## Dos2unix filter
This plugin converts CRLF line endings to LF in text files in the repo.
It is recommended that you add a .gitattributes file that keeps LF line
endings in place once the repository has been converted.
To use the plugin, add
`--plugin dos2unix`.
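A minimal sketch of the conversion, assuming file contents are handled as bytes and that a flag says whether the file is binary. The helper name `normalize_line_endings` is hypothetical.

```python
def normalize_line_endings(data, is_binary):
    """Replace CRLF with LF in text content; leave binary content as-is."""
    if is_binary:
        return data
    return data.replace(b'\r\n', b'\n')


print(normalize_line_endings(b'first\r\nsecond\r\n', is_binary=False))
```

Only CRLF pairs are rewritten in this sketch; a lone carriage return is left untouched.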


@@ -0,0 +1,11 @@
def build_filter(args):
return Filter(args)
class Filter():
def __init__(self, args):
pass
def file_data_filter(self,file_data):
file_ctx = file_data['file_ctx']
if not file_ctx.isbinary():
file_data['data'] = file_data['data'].replace(b'\r\n', b'\n')

plugins/drop/README.md Normal file

@@ -0,0 +1,12 @@
## Drop commits from output
To use the plugin, add the command line flag `--plugin drop=<spec>`.
The flag can be given multiple times to drop more than one commit.
The <spec> value can be either
- a comma-separated list of hg hashes in the full form (40
hexadecimal characters) to drop the corresponding changesets, or
- a regular expression pattern to drop all changesets with matching
descriptions.
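How a `<spec>` is told apart can be sketched like this. The hash-list pattern and the use of `match` follow the plugin source in this change; the helper name `classify_spec` and the sample values are hypothetical.

```python
import re

# One or more full 40-character hex hashes separated by commas
HASH_LIST = re.compile(r'([A-Fa-f0-9]{40}(,|$))+$')


def classify_spec(spec):
    """Return ('hashes', [...]) or ('description', compiled_pattern)."""
    if HASH_LIST.match(spec):
        return 'hashes', spec.split(',')
    # Descriptions are bytes, so the pattern is compiled from bytes too
    return 'description', re.compile(spec.encode('ascii'))


kind, hashes = classify_spec('a' * 40 + ',' + 'b' * 40)
print(kind, len(hashes))

kind, pattern = classify_spec('.*bad')
print(kind, bool(pattern.match(b'commit 2 is bad')))
```

Because the description pattern is applied with `match`, it is anchored at the start of the description: a spec of `bad` alone would not drop a commit described as `commit 2 is bad`, while `.*bad` would.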

plugins/drop/__init__.py Normal file

@@ -0,0 +1,61 @@
from __future__ import print_function
import sys, re
def build_filter(args):
if re.match(r'([A-Fa-f0-9]{40}(,|$))+$', args):
return RevisionIdFilter(args.split(','))
else:
return DescriptionFilter(args)
def log(fmt, *args):
print(fmt % args, file=sys.stderr)
sys.stderr.flush()
class FilterBase(object):
def __init__(self):
self.remapped_parents = {}
def commit_message_filter(self, commit_data):
rev = commit_data['revision']
mapping = self.remapped_parents
parent_revs = [rp for p in commit_data['parents']
for rp in mapping.get(p, [p])]
commit_data['parents'] = parent_revs
if self.should_drop_commit(commit_data):
log('Dropping revision %i.', rev)
self.remapped_parents[rev] = parent_revs
# Head commits cannot be dropped because they have no
# children, so detach them to a separate branch.
commit_data['branch'] = b'dropped-hg-head'
commit_data['parents'] = []
def should_drop_commit(self, commit_data):
return False
class RevisionIdFilter(FilterBase):
def __init__(self, revision_hash_list):
super(RevisionIdFilter, self).__init__()
self.unwanted_hg_hashes = {h.encode('ascii', 'strict')
for h in revision_hash_list}
def should_drop_commit(self, commit_data):
return commit_data['hg_hash'] in self.unwanted_hg_hashes
class DescriptionFilter(FilterBase):
def __init__(self, pattern):
super(DescriptionFilter, self).__init__()
self.pattern = re.compile(pattern.encode('ascii', 'strict'))
def should_drop_commit(self, commit_data):
return self.pattern.match(commit_data['desc'])


@@ -0,0 +1,13 @@
## Convert Head to Branch
`fast-export` can only handle one head per branch. This plugin makes it possible
to create a new branch from a head by specifying the new branch name and
the first divergent commit for that head.
Note: the hg hash must be in the full form, 40 hexadecimal characters.
Note: you must run `fast-export` with the `--ignore-unnamed-heads` option;
otherwise the conversion will fail.
To use the plugin, add the command line flag `--plugin head2branch=name,<hg_hash>`.
The flag can be given multiple times to name more than one head.
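The way the new branch name spreads from the first divergent commit to its descendants can be sketched as follows. This is an illustration under the assumption that commits are visited in export order, parents before children; `assign_branch` and the sample history are hypothetical.

```python
def assign_branch(commits, starting_hash, branch_name):
    """commits: iterable of (rev, hg_hash, parent_revs), parents first.

    Returns a dict mapping each renamed revision to the new branch name.
    """
    on_branch = set()
    renamed = {}
    for rev, hg_hash, parents in commits:
        # A commit joins the branch if it is the starting commit or if
        # any of its parents already belongs to the branch.
        if hg_hash == starting_hash or any(p in on_branch for p in parents):
            on_branch.add(rev)
            renamed[rev] = branch_name
    return renamed


# Hypothetical history: r1 and r2 are two heads growing from r0
history = [
    (0, b'a' * 40, []),
    (1, b'b' * 40, [0]),  # stays on the original branch
    (2, b'c' * 40, [0]),  # first divergent commit of the unnamed head
    (3, b'd' * 40, [2]),  # descendant of r2
    (4, b'e' * 40, [1]),  # descendant of r1
]
print(assign_branch(history, b'c' * 40, b'feature'))
```

In this sample only r2 and r3 end up on `feature`; r1 and r4 keep their original branch.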


@@ -0,0 +1,24 @@
import sys
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
args = args.split(',')
self.branch_name = args[0].encode('ascii', 'replace')
self.starting_commit_hash = args[1].encode('ascii', 'strict')
self.branch_parents = set()
def commit_message_filter(self, commit_data):
hg_hash = commit_data['hg_hash']
rev = commit_data['revision']
rev_parents = commit_data['parents']
if (hg_hash == self.starting_commit_hash
or any(rp in self.branch_parents for rp in rev_parents)
):
self.branch_parents.add(rev)
commit_data['branch'] = self.branch_name
sys.stderr.write('\nchanging r%s to branch %r\n' % (rev, self.branch_name))
sys.stderr.flush()


@@ -0,0 +1,19 @@
## Issue Prefix
When migrating to other source code hosting sites, there are cases where a
project maintainer might want to reset their issue tracker and not have old
issue numbers in commit messages referring to the wrong issue. One way around
this is to prefix issue numbers with some other string.
If migrating to GitHub, this issue prefixing can be paired with GitHub's
autolinking capability to link back to a different issue tracker:
https://help.github.com/en/github/administering-a-repository/configuring-autolinks-to-reference-external-resources
To use this plugin, add:
`--plugin=issue_prefix=<some_prefix>`
Example:
`--plugin=issue_prefix=BB-`
This will prefix issue numbers with the string `BB-`. Example: `#123` will
change to `#BB-123`.
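The rewrite can be sketched with a single regular-expression substitution. This is an illustration only: the helper name `prefix_issue_numbers` and the pattern `#(\d+)` are assumptions made for the example, and commit messages are assumed to be byte strings.

```python
import re


def prefix_issue_numbers(desc, prefix):
    """Insert `prefix` between '#' and the issue number in a message."""
    # A function is used as the replacement so that the prefix is
    # inserted literally, whatever characters it contains.
    return re.sub(rb'#(\d+)', lambda m: b'#' + prefix + m.group(1), desc)


print(prefix_issue_numbers(b'Fixes #123 and #45', b'BB-'))
```

Each message is scanned once, so a reference that has already been rewritten is not prefixed a second time within the same call.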


@@ -0,0 +1,17 @@
# encoding=UTF-8
"""__init__.py"""
import re
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
if not isinstance(args, bytes):
args = args.encode('utf8')
self.prefix = args
def commit_message_filter(self, commit_data):
for match in re.findall(b'#[1-9][0-9]+', commit_data['desc']):
commit_data['desc'] = commit_data['desc'].replace(
match, b'#%s%s' % (self.prefix, match[1:]))


@@ -0,0 +1,23 @@
## Overwrite Null Commit Messages
There are cases (such as when creating a new, empty snippet on Bitbucket
before they deprecated Mercurial repositories) where you could create a
new repo with a single commit in it, but the message would be null. Then,
when attempting to convert this repository to a git repo and pushing to
a new host, the git push would fail with an error like this:
error: a NUL byte in commit log message not allowed
To get around this, you may provide a string that will be used in place of
a null byte in commit messages.
To use the plugin, add
--plugin overwrite_null_messages=""
This will use the default commit message `"<empty commit message>"`.
Or to specify a different commit message, you may pass this in at the
command line like so:
--plugin overwrite_null_messages="use this message instead"
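The substitution can be sketched as follows, assuming commit messages are byte strings and that a null message consists of a single NUL byte. The helper name `replace_null_message` is hypothetical.

```python
DEFAULT_MESSAGE = b'<empty commit message>'


def replace_null_message(desc, replacement=''):
    """Return `desc`, or the replacement text if `desc` is a lone NUL byte."""
    message = replacement.encode('utf8') if replacement else DEFAULT_MESSAGE
    return message if desc == b'\x00' else desc


print(replace_null_message(b'\x00'))
print(replace_null_message(b'\x00', 'use this message instead'))
print(replace_null_message(b'Regular message'))
```

Messages that are not null pass through unchanged, so the plugin argument only matters for the affected commits.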


@@ -0,0 +1,16 @@
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
if args == '':
message = b'<empty commit message>'
else:
message = args.encode('utf8')
self.message = message
def commit_message_filter(self,commit_data):
# Only write the commit message if the recorded commit
# message is null.
if commit_data['desc'] == b'\x00':
commit_data['desc'] = self.message


@@ -0,0 +1,30 @@
## Shell Script File Filter
This plugin uses shell scripts in order to perform filtering of files.
If your preferred scripting is done via shell, this tool is for you.
Note, though, that this method can slow the export down by an order of
magnitude. For small repositories, this won't be an issue.
To use the plugin, add
`--plugin shell_filter_file_contents=path/to/shell/script.sh`.
The filter script is supplied to the plugin option after the plugin name,
and is in turn passed to the plugin initialization. hg-fast-export
runs the filter for each exported file, pipes its content to the filter's
standard input, and uses the filter's standard output in place
of the file's original content. An example use of this feature
is to convert line endings in text files from CRLF to git's preferred LF,
although the native plugin performs this task faster.
The script is called with the following syntax:
`FILTER_CONTENTS <file-path> <hg-hash> <is-binary>`
```
-- Start of crlf-filter.sh --
#!/bin/sh
# $1 = pathname of exported file relative to the root of the repo
# $2 = Mercurial's hash of the file
# $3 = "1" if Mercurial reports the file as binary, otherwise "0"
if [ "$3" = "1" ]; then cat; else dos2unix; fi
-- End of crlf-filter.sh --
```


@@ -0,0 +1,28 @@
# Pipe contents of each exported file through FILTER_CONTENTS <file-path> <hg-hash> <is-binary>
import subprocess
import shlex
import sys
from mercurial import node
def build_filter(args):
return Filter(args)
class Filter:
def __init__(self, args):
self.filter_contents = shlex.split(args)
def file_data_filter(self,file_data):
d = file_data['data']
file_ctx = file_data['file_ctx']
filename = file_data['filename']
filter_cmd = self.filter_contents + [filename, node.hex(file_ctx.filenode()), '1' if file_ctx.isbinary() else '0']
try:
filter_proc = subprocess.Popen(filter_cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
d, _ = filter_proc.communicate(d)
except:
sys.stderr.write('Running filter-contents %s:\n' % filter_cmd)
raise
filter_ret = filter_proc.poll()
if filter_ret:
raise subprocess.CalledProcessError(filter_ret, filter_cmd)
file_data['data'] = d

tests/__init__.py Normal file

tests/test_drop_plugin.py Normal file

@@ -0,0 +1,223 @@
import sys, os, subprocess
from tempfile import TemporaryDirectory
from unittest import TestCase
from pathlib import Path
class CommitDropTest(TestCase):
def test_drop_single_commit_by_hash(self):
hash1 = self.create_commit('commit 1')
self.create_commit('commit 2')
self.drop(hash1)
self.assertEqual(['commit 2'], self.git.log())
def test_drop_commits_by_desc(self):
self.create_commit('commit 1 is good')
self.create_commit('commit 2 is bad')
self.create_commit('commit 3 is good')
self.create_commit('commit 4 is bad')
self.drop('.*bad')
expected = ['commit 1 is good', 'commit 3 is good']
self.assertEqual(expected, self.git.log())
def test_drop_sequential_commits_in_single_plugin_instance(self):
self.create_commit('commit 1')
hash2 = self.create_commit('commit 2')
hash3 = self.create_commit('commit 3')
hash4 = self.create_commit('commit 4')
self.create_commit('commit 5')
self.drop(','.join((hash2, hash3, hash4)))
expected = ['commit 1', 'commit 5']
self.assertEqual(expected, self.git.log())
def test_drop_sequential_commits_in_multiple_plugin_instances(self):
self.create_commit('commit 1')
hash2 = self.create_commit('commit 2')
hash3 = self.create_commit('commit 3')
hash4 = self.create_commit('commit 4')
self.create_commit('commit 5')
self.drop(hash2, hash3, hash4)
expected = ['commit 1', 'commit 5']
self.assertEqual(expected, self.git.log())
def test_drop_nonsequential_commits(self):
self.create_commit('commit 1')
hash2 = self.create_commit('commit 2')
self.create_commit('commit 3')
hash4 = self.create_commit('commit 4')
self.drop(','.join((hash2, hash4)))
expected = ['commit 1', 'commit 3']
self.assertEqual(expected, self.git.log())
def test_drop_head(self):
self.create_commit('first')
self.create_commit('middle')
hash_last = self.create_commit('last')
self.drop(hash_last)
self.assertEqual(['first', 'middle'], self.git.log())
def test_drop_merge_commit(self):
initial_hash = self.create_commit('initial')
self.create_commit('branch A')
self.hg.checkout(initial_hash)
self.create_commit('branch B')
self.hg.merge()
merge_hash = self.create_commit('merge to drop')
self.create_commit('last')
self.drop(merge_hash)
expected_commits = ['initial', 'branch A', 'branch B', 'last']
self.assertEqual(expected_commits, self.git.log())
self.assertEqual(['branch B', 'branch A'], self.git_parents('last'))
def test_drop_different_commits_in_multiple_plugin_instances(self):
self.create_commit('good commit')
bad_hash = self.create_commit('bad commit')
self.create_commit('awful commit')
self.create_commit('another good commit')
self.drop('^awful.*', bad_hash)
expected = ['good commit', 'another good commit']
self.assertEqual(expected, self.git.log())
def test_drop_same_commit_in_multiple_plugin_instances(self):
self.create_commit('good commit')
bad_hash = self.create_commit('bad commit')
self.create_commit('another good commit')
self.drop('^bad.*', bad_hash)
expected = ['good commit', 'another good commit']
self.assertEqual(expected, self.git.log())
def setUp(self):
self.tempdir = TemporaryDirectory()
self.hg = HgDriver(Path(self.tempdir.name) / 'hgrepo')
self.hg.init()
self.git = GitDriver(Path(self.tempdir.name) / 'gitrepo')
self.git.init()
self.export = ExportDriver(self.hg.repodir, self.git.repodir)
def tearDown(self):
self.tempdir.cleanup()
def create_commit(self, message):
self.write_file_data('Data for %r.' % message)
return self.hg.commit(message)
def write_file_data(self, data, filename='test_file.txt'):
path = self.hg.repodir / filename
with path.open('w') as f:
print(data, file=f)
def drop(self, *spec):
self.export.run_with_drop(*spec)
def git_parents(self, message):
matches = self.git.grep_log(message)
if len(matches) != 1:
raise Exception('No unique commit with message %r.' % message)
subject, parents = self.git.details(matches[0])
return [self.git.details(p)[0] for p in parents]
class ExportDriver:
def __init__(self, sourcedir, targetdir, *, quiet=True):
self.sourcedir = Path(sourcedir)
self.targetdir = Path(targetdir)
self.quiet = quiet
self.python_executable = str(
Path.cwd() / os.environ.get('PYTHON', sys.executable))
self.script = Path(__file__).parent / '../hg-fast-export.sh'
def run_with_drop(self, *plugin_args):
cmd = [self.script, '-r', str(self.sourcedir)]
for arg in plugin_args:
cmd.extend(['--plugin', 'drop=' + arg])
output = subprocess.DEVNULL if self.quiet else None
subprocess.run(cmd, check=True, cwd=str(self.targetdir),
env={'PYTHON': self.python_executable},
stdout=output, stderr=output)
class HgDriver:
def __init__(self, repodir):
self.repodir = Path(repodir)
def init(self):
self.repodir.mkdir()
self.run_command('init')
def commit(self, message):
self.run_command('commit', '-A', '-m', message)
return self.run_command('id', '--id', '--debug').strip()
def log(self):
output = self.run_command('log', '-T', '{desc}\n')
commits = output.strip().splitlines()
commits.reverse()
return commits
def checkout(self, rev):
self.run_command('checkout', '-r', rev)
def merge(self):
self.run_command('merge', '--tool', ':local')
def run_command(self, *args):
p = subprocess.run(('hg', '-yq') + args,
cwd=str(self.repodir),
check=True,
text=True,
capture_output=True)
return p.stdout
class GitDriver:
def __init__(self, repodir):
self.repodir = Path(repodir)
def init(self):
self.repodir.mkdir()
self.run_command('init')
def log(self):
output = self.run_command('log', '--format=%s', '--reverse')
return output.strip().splitlines()
def grep_log(self, pattern):
output = self.run_command('log', '--format=%H',
'-F', '--grep', pattern)
return output.strip().splitlines()
def details(self, commit_hash):
fmt = '%s%n%P'
output = self.run_command('show', '-s', '--format=' + fmt,
commit_hash)
subject, parents = output.splitlines()
return subject, parents.split()
def run_command(self, *args):
p = subprocess.run(('git', '--no-pager') + args,
cwd=str(self.repodir),
check=True,
text=True,
capture_output=True)
return p.stdout